Extremely slow render to texture performance

This topic contains 7 replies, has 2 voices, and was last updated by Joe Davis 4 years, 7 months ago.

Viewing 8 posts - 1 through 8 (of 8 total)
  • Author
    Posts
  • #31154

    registerme
    Member

    I ran into a performance problem with render to texture. To verify it, I modified the HelloAPI code to render to a texture instead of to the screen. The screen resolution is 1920×1080, and I changed the triangle to a quad that covers the whole screen (2 triangles). Since the driver will not render if it sees that the target texture is never used, I have to do a small read with glReadPixels of just one pixel. With normal rendering to the screen, each frame takes 27 ms. With render to texture, it takes 87 ms. Is there anything wrong here?

    #36715

    Joe Davis
    Member

    glReadPixels is a very expensive operation that serializes the render. For this reason, it should not be used to force renders to complete. As we’re discussing in this thread, rendering to two FBOs that reference each other should force the driver to kick renders without impacting performance.

    Thanks,
    Joe

    #36716

    registerme
    Member

    Thank you. I will try not to use glReadPixels then. But one question: how come it takes 27 ms to render a 1920×1080 rectangle? It is doing almost nothing, so isn’t that too long?

    #36717

    Joe Davis
    Member

    Hi,

    I’ve done some calculations, and the triangle rendered full screen on your platform should take ~17ms.
    How have you calculated your render time? If you’ve done it in your application, did you disable vsync?

    I’d recommend using PVRTune’s timing data to find the cost of your render. Even with VSync enabled, PVRTune will be able to show you the cost of your render in ms.

    Thanks,
    Joe

    #36718

    registerme
    Member

    I just run a while loop that renders 2000 times and measure the begin time and the end time. The result is (end time – begin time) / 2000.

    I took the code from HelloAPI and modified it. It does not seem to have vsync.

    I will try PVRTune when I get a chance.

    Just curious: how did you do the calculation and come up with 17ms? How fast does it run on your system with an SGX 540?
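
    For readers following along, the averaging described in this post can be sketched in C roughly as below. This is only an illustration under assumptions: `render_frame` is a hypothetical callback standing in for whatever the modified HelloAPI loop does per frame, and the names `average_frame_ms` and `time_frames` are not from the original code. It measures wall-clock time, so any VSync wait inside the callback is included in the result.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Convert a timespec to milliseconds. */
double timespec_to_ms(struct timespec t) {
    return (double)t.tv_sec * 1000.0 + (double)t.tv_nsec / 1.0e6;
}

/* Average per-frame time: (end - begin) / frame_count, in ms. */
double average_frame_ms(struct timespec begin, struct timespec end,
                        int frame_count) {
    return (timespec_to_ms(end) - timespec_to_ms(begin)) / (double)frame_count;
}

/* Call render_frame (hypothetical callback) frame_count times and
   return the average wall-clock time per call in ms. */
double time_frames(void (*render_frame)(void), int frame_count) {
    struct timespec begin, end;
    clock_gettime(CLOCK_MONOTONIC, &begin);
    for (int i = 0; i < frame_count; ++i) {
        render_frame();
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    return average_frame_ms(begin, end, frame_count);
}
```

    A monotonic clock is used so that system clock adjustments during the run do not skew the average.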

    #36719

    Joe Davis
    Member

    Your platform may still be VSync limited (which would account for the additional time). PVRTune’s timing data will make it clear if this is the case.

    I used a clock speed of 304MHz (from the Pandaboard Wikipedia page). I also used the number of USSE pipes in the 540 (4) and the resolution you’re rendering to.

    Thanks,
    Joe
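
    The exact formula behind the 17ms figure is not given in this thread. As a rough sketch only, a fill-time estimate from the three inputs mentioned above (clock speed, number of USSE pipes, resolution) might look like the following. The `cycles_per_pixel` parameter is an assumption introduced here for illustration, not a number from the post, and the function name `estimate_fill_ms` is hypothetical.

```c
/* Rough estimate of the time (ms) to shade width*height pixels.
   Assumes the work splits evenly across `pipes` and that each pixel
   costs `cycles_per_pixel` clock cycles on one pipe (an assumed
   parameter, not a figure from the thread). */
double estimate_fill_ms(long width, long height, double clock_hz,
                        int pipes, double cycles_per_pixel) {
    double pixels = (double)width * (double)height;
    double cycles = pixels * cycles_per_pixel / (double)pipes;
    return cycles / clock_hz * 1000.0;
}
```

    Under this simple model, `estimate_fill_ms(1920, 1080, 304e6, 4, 1.0)` comes to about 1.7 ms, so a 17ms estimate would correspond to a per-pixel cost of roughly ten cycles. Whether that matches the calculation actually used here is not stated.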

    #36720

    registerme
    Member

    I got PVRTune running, but what “timing data” am I supposed to look at for VSync? I read the document and it did not mention VSync.

    Could you elaborate on how the 17ms is calculated? What are USSE pipes?

    #36721

    Joe Davis
    Member

    If v-sync is disabled, then the GPU should be constantly busy. For example, a fragment processing limited application should not have any gaps between its 3D (fragment processing) tasks.

    If there are periods of time when there are no TA or 3D tasks being processed, then the render is either v-sync limited or CPU limited (a CPU limit can be confirmed by dragging the CPU load counter onto PVRTune’s graph).

    Our “Getting Great Graphics Performance with the PowerVR Insider SDK” presentation from GDC 2012 gives an overview of how PVRTune can be used to identify bottlenecks.
