eglSwapBuffers is slow on SGX 530

This topic contains 11 replies, has 2 voices, and was last updated by  Joe Davis 3 years, 7 months ago.

Viewing 12 posts - 1 through 12 (of 12 total)
  • #31631

    videoguy
    Member

    We have an embedded system built on TI 8168-based hardware. This SoC uses an SGX 530 as its GPU, and we are running Android 4.0.3 on it. The frame rate of Android apps is low compared to the same apps running on a similarly powered tablet.

    After timing various functions in the Android activity and in the Android platform code, we noticed that eglSwapBuffers() takes 45+ ms most of the time.
    From what I have read in various forums, it should take no more than 16 ms. The resolution of the frame buffer is 1920×1080.
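
    For reference, a minimal sketch of where the 16 ms figure comes from and how a swap call can be timed. It assumes a 60 Hz display, which is not stated in this post; `frameBudgetMs`, `impliedFps` and `timeCallMs` are hypothetical helper names, and in real code the callable passed to `timeCallMs` would wrap the eglSwapBuffers() call.

```cpp
#include <chrono>
#include <functional>

// Frame budget in milliseconds for a given display refresh rate.
// At an assumed 60 Hz this is about 16.67 ms.
double frameBudgetMs(double refreshHz) {
    return 1000.0 / refreshHz;
}

// Frame rate implied if every frame takes frameMs milliseconds.
double impliedFps(double frameMs) {
    return 1000.0 / frameMs;
}

// Time a single call of fn with a monotonic clock and return milliseconds.
// Hypothetical helper: fn stands in for the code that calls eglSwapBuffers().
double timeCallMs(const std::function<void()>& fn) {
    const auto start = std::chrono::steady_clock::now();
    fn();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}
```

    Under that assumption, a 45 ms swap corresponds to roughly 22 frames per second, against a budget of about 16.7 ms per frame.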

    I appreciate any suggestions to fix this bottleneck.

    The contents of egl.cfg from /system/lib/egl folder:

    0 0 android
    0 1 POWERVR_SGX530_125

    The contents of the /system/lib/egl folder:

    egl.cfg
    libEGL_POWERVR_SGX530_125.so
    libGLES_android.so
    libGLESv1_CM_POWERVR_SGX530_125.so
    libGLESv2_POWERVR_SGX530_125.so

    The contents of the rc.pvr script from the /system/bin/sgx folder:

    #!/system/bin/sh

    # PowerVR SGX DDK for Embedded Linux - installation script
    #
    # Copyright 2004-2006 by Imagination Technologies Limited.
    # All rights reserved. No part of this software, either
    # material or conceptual may be copied or distributed,
    # transmitted, transcribed, stored in a retrieval system
    # or translated into any human or computer language in any
    # form by any means, electronic, mechanical, manual or
    # other-wise, or disclosed to third parties without the
    # express written permission of Imagination Technologies
    # Limited, Unit 8, HomePark Industrial Estate,
    # King's Langley, Hertfordshire, WD4 8LZ, U.K.

    # Auto-generated for omap4430_android from
    # $RCSfile: common.m4 $ $Revision: 1.8 $
    # $RCSfile: rc.pvr.m4 $ $Revision: 1.26 $
    # $RCSfile: rc.pvr.m4 $ $Revision: 1.5 $
    #

    load_pvr()
    {
        /system/bin/devmem2 0x48180F04 w 0x0
        /system/bin/devmem2 0x48180900 w 0x2
        /system/bin/devmem2 0x48180920 w 0x2

        insmod /system/bin/sgx/pvrsrvkm.ko
        insmod /system/bin/sgx/omaplfb.ko
        sleep 1
        chmod 0666 /dev/pvrsrvkm
        /system/bin/pvrsrvinit
        echo "Loaded PowerVR consumer services."
        return 0;
    }

    unload_pvr()
    {
        if rmmod omaplfb; then :; else return 1; fi
        if rmmod pvrsrvkm; then :; else return 1; fi
        echo "Unloaded PowerVR consumer services."
        return 0;
    }

    # Deal with the type of invocation we get.
    #
    case "$1" in
    "start")
        load_pvr
        ;;
    stop)
        if ! unload_pvr; then
            echo "Couldn't unload modules" >&2;
        fi
        ;;
    reload|restart)
        if unload_pvr; then
            load_pvr
        else
            echo "Couldn't unload modules" >&2;
        fi
        ;;
    *)
        echo "$0: unknown argument $1." >&2;
        ;;
    esac

    #38473

    videoguy
    Member

    Below are the SurfaceFlinger logs from system boot. They show which GL extensions are supported, the EGL version, and so on.

    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): starting up service SurfaceFlinger
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): setting display to 0
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): SurfaceFlinger is starting
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): Returning SufraceFlinger service
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): Initializing thread for: SurfaceFlinger
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): SurfaceFlinger’s main thread ready to run. Initializing graphics H/W…
    2014-02-15 12:54:35 WARNING logcat: hd[0]: SurfaceFlinger(687): ro.sf.lcd_density not defined, using 160 dpi by default.
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): EGL informations:
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): # of configs : 30
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): vendor : Android
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): version : 1.4 Android META-EGL
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): extensions: EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_base EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_gl_renderbuffer_image EGL_KHR_fence_sync EGL_ANDROID_image_native_buffer EGL_ANDROID_image_native_buffer
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): Client API: OpenGL ES
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): EGLSurface: 8-8-8-8, config=0x1
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): OpenGL informations:
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): vendor : Imagination Technologies
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): renderer : PowerVR SGX 530
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): version : OpenGL ES-CM 1.1
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): extensions: GL_OES_byte_coordinates GL_OES_fixed_point GL_OES_single_precision GL_OES_matrix_get GL_OES_read_format GL_OES_compressed_paletted_texture GL_OES_point_sprite GL_OES_point_size_array GL_OES_matrix_palette GL_OES_draw_texture GL_OES_query_matrix GL_OES_texture_env_crossbar GL_OES_texture_mirrored_repeat GL_OES_texture_cube_map GL_OES_blend_subtract GL_OES_blend_func_separate GL_OES_blend_equation_separate GL_OES_stencil_wrap GL_OES_extended_matrix_palette GL_OES_framebuffer_object GL_OES_rgb8_rgba8 GL_OES_depth24 GL_OES_stencil8 GL_OES_compressed_ETC1_RGB8_texture GL_OES_mapbuffer GL_OES_EGL_image GL_OES_EGL_image_external GL_EXT_multi_draw_arrays GL_OES_required_internalformat GL_IMG_read_format GL_IMG_texture_compression_pvrtc GL_IMG_texture_format_BGRA8888 GL_EXT_texture_format_BGRA8888 GL_OES_egl_sync GL_IMG_vertex_array_object
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): GL_MAX_TEXTURE_SIZE = 2048
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): GL_MAX_VIEWPORT_DIMS = 2048 x 2048
    2014-02-15 12:54:35 INFO logcat: hd[0]: SurfaceFlinger(687): flags = 00010000
    2014-02-15 12:54:35 WARNING logcat: hd[0]: SurfaceFlinger(687): This platform does not support HW composer

    #38474

    videoguy
    Member

    The driver version, from /proc/pvr/version:

    / # cat /proc/pvr/version
    Version blaze_android_sgx_ogles1_ogles2_GPL sgxddk 18 1.8@789263 (release) omap4430_android
    System Version String: SGX revision = 1.2.5

    #38475

    Joe Davis
    Member

    Hi,

    Can you capture and share a PVRTrace recording of your application with us? If you would like to share the recorded file with us confidentially, please attach it to a ticket in our Support system.

    Regards,
    Joe

    #38476

    videoguy
    Member

    Interesting comments for this post on StackOverflow: http://stackoverflow.com/questions/22028742/eglswapbuffer-on-sgx-530-is-slow

    #38477

    videoguy
    Member

    I uploaded the .pvrt file to a Google Docs folder because the support ticketing system doesn’t allow files bigger than 2 MB.
    The link is https://drive.google.com/file/d/0B1LCVMNS8uWVRTNRckFZb1JTeUU/edit?usp=sharing

    The Android app came up; I waited 10 minutes and then used it in a way that makes the app draw a couple of points in a 5×5 pixel area. Then I copied the .pvrt file to my Mac and uploaded it.

    We have a custom UI launcher that doesn’t allow me to launch the PVRTrace setup app on the device. I ended up modifying egl.cfg, pvrtrace.cfg, etc. to generate a .pvrt file for every GL app.
    Even though SurfaceFlinger is an OpenGL app, I didn’t see it generate a .pvrt file.

    Please look at the .pvrt file for the app though. When I looked at it in PVRTrace GUI, I saw 400+ gl calls with 30+ texture operations for the frame that I thought touched 5×5 pixel area. It is interesting data.

    #38478

    videoguy
    Member

    I am wondering if there is finer control to start and stop PVRTrace recording directly from code. I would like to start and stop it from the SurfaceFlinger C++ code based on a runtime condition.

    #38479

    Joe Davis
    Member

    Hi,

    I’ve downloaded your PVRT file, but my PVRTraceGUI (SDK 3.2) is unable to load it. Can you share the version number of your PVRTraceGUI with us? It can be found in the “Help -> Feedback...” toolbar dialog.

    There is a network recording mode, where PVRTraceGUI can be used to send start and stop signals to the PVRTrace recorder library. The PVRTrace User Manual explains how to configure the recorder library for this mode.

    As you’re investigating a performance issue, I would also recommend running PVRTune to profile your app. It may be the case that eglSwapBuffers() is blocking for a long period of time because the driver is waiting for the next VSync event. PVRTune will help you understand where the bottleneck in your render is.
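
    To illustrate the VSync point, here is a simplified model rather than a description of the actual driver. It assumes a 60 Hz display and double buffering, neither of which is stated in this thread, and `vsyncPeriodsUsed` and `vsyncLockedFrameMs` are hypothetical names. When presentation is locked to VSync, a swap cannot complete before the next VSync after rendering finishes, so the effective frame time rounds up to a whole number of refresh periods.

```cpp
#include <cmath>

// Number of whole refresh periods a frame occupies when presentation is
// locked to VSync (simplified model: frame time rounds up to the next period).
int vsyncPeriodsUsed(double renderMs, double refreshHz) {
    const double periodMs = 1000.0 / refreshHz;
    return static_cast<int>(std::ceil(renderMs / periodMs));
}

// Effective frame time in milliseconds under the same assumption.
double vsyncLockedFrameMs(double renderMs, double refreshHz) {
    return vsyncPeriodsUsed(renderMs, refreshHz) * (1000.0 / refreshHz);
}
```

    In this model a frame that needs 45 ms occupies three 60 Hz periods, so it is presented about every 50 ms, while a 10 ms frame still occupies one full period.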

    Regards,
    Joe

    #38480

    videoguy
    Member

    I updated my SDK to 3.2 and regenerated the .pvrtune and .pvrt files.

    I launched the app and clicked on a couple of buttons. While doing this, I had the PVRTune app connected and logging the results. After the test, I disconnected the app and saved the results to the surfaceflingertune.pvrtune file.

    It is available at https://drive.google.com/file/d/0B1LCVMNS8uWVeDgxSnNReUlFVUk/edit?usp=sharing

    I pulled the generated .pvrt file and uploaded it to https://drive.google.com/file/d/0B1LCVMNS8uWVOVJkeUNYbUpiTUk/edit?usp=sharing

    Please keep in mind that these files are for SurfaceFlinger, not for my app. SurfaceFlinger is the window compositor on Android, responsible for putting the final frame on the display.

    Please look at the last 4 frames in the .pvrt file to match the PVRTune results. The PID of SurfaceFlinger is 987. The PID of my app is 2698. Please scroll the timeline to 3762.7 secs to see the logs for the above activity.

    I see that the GPU 3D core was busy for 33 ms and the GPU TA was busy for 4.5 ms.

    Do you see anything suspicious in PVRT file for these frames?

    Thanks for your help.

    #38481

    Joe Davis
    Member

    I’ve had a look at your PVRTune recording. I take it the large idle periods are by design, i.e. you’re pausing for ~0.5 seconds between your rendering?

    At the time period you’ve mentioned, SurfaceFlinger is causing the performance bottleneck. The reason that your application is blocking in eglSwapBuffers() for so long is that it has to wait for SurfaceFlinger to finish before the next frame’s render can begin.

    On early SGX cores (such as the SGX530), the GPU can only process a single context at a time. This means that the TA and 3D tasks of a single process can overlap, but the GPU cannot process the TA task of one process and the 3D of another. This is the reason why the TA and 3D tasks are processed serially (not overlapping) in your recording.
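
    As a rough illustration of why this matters, the sketch below compares the two cases using an idealised model: a fully serial frame costs the TA time plus the 3D time, while perfect overlap would cost only the longer of the two in steady state. The function names are hypothetical, and real scheduling has overheads this ignores.

```cpp
#include <algorithm>

// Per-frame GPU time when the TA task and the 3D task run back to back.
double serialFrameMs(double taMs, double threeDMs) {
    return taMs + threeDMs;
}

// Idealised steady-state per-frame time if the TA task of the next frame
// could fully overlap the 3D task of the current one.
double overlappedFrameMs(double taMs, double threeDMs) {
    return std::max(taMs, threeDMs);
}
```

    With the figures reported earlier in the thread (TA 4.5 ms, 3D 33 ms), the serial case comes to 37.5 ms per frame and the idealised overlapped case to 33 ms, so the 3D time dominates either way.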

    It will not be possible to optimize the processing time of SurfaceFlinger, as it is simply blending visible layers. The 3D time will purely be the bandwidth cost of sending surface data to the GPU for composition. However, composition can be moved to a dedicated 2D core (if there is one available) or you may be able to use TI’s Android composition bypass. If you would like to know more about composition bypass and if it can be enabled for your platform, you should discuss this with TI.

    Regards,
    Joe

    #38482

    videoguy
    Member

    Thanks for your analysis, Joe. I am going to ask the TI folks about composition bypass.

    What I don’t quite understand is the amount of time the GPU takes to handle one frame. All SurfaceFlinger is doing is pushing the window that was already rendered by my app. In our system, we have only one foreground application, and we don’t show the top status bar or the other gadgets seen in the standard Android UI. The foreground app is a full-screen application.

    This is the chronology of events with Android apps:

    1) App renders to its surface using OpenGL
    2) App signals surfaceflinger to post its surface
    3) SurfaceFlinger composites windows based on their z-order. In our environment, there should be only one surface.
    4) SurfaceFlinger posts it to GPU eventually by calling eglSwapBuffers.

    As each surface is created using gralloc, the same buffer should be available to the GPU without any need for an extra copy. Basically, steps (2), (3) and (4) should be no-ops. But in all the PVRTune logs, I see that the SGX 3D core was consistently busy for 34-36 ms during SurfaceFlinger posts. I am wondering why.

    Thanks

    #38483

    Joe Davis
    Member

    With SurfaceFlinger enabled, the GPU will have to read from all surfaces that need to be blended and then write into SurfaceFlinger’s target surface. These read-write operations are very memory bandwidth intensive.
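
    A back-of-the-envelope sketch of that bandwidth cost follows. It assumes 4 bytes per pixel (consistent with the 8-8-8-8 EGLSurface in the boot log), a single full-screen layer, and a 60 fps target, and it ignores caching, overdraw and any other memory traffic. The function names are hypothetical.

```cpp
#include <cstdint>

// Bytes in one full-screen surface.
std::uint64_t surfaceBytes(std::uint32_t width, std::uint32_t height,
                           std::uint32_t bytesPerPixel) {
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Minimum bytes moved per composited frame in this model:
// read each visible layer once, then write the target surface once.
std::uint64_t compositionBytesPerFrame(std::uint64_t bytesPerSurface,
                                       std::uint32_t layers) {
    return bytesPerSurface * (layers + 1u);
}

// Resulting memory traffic in MB/s (1 MB = 1e6 bytes).
double compositionMBPerSecond(std::uint64_t bytesPerFrame, double fps) {
    return static_cast<double>(bytesPerFrame) * fps / 1.0e6;
}
```

    For a 1920×1080 surface this gives 8,294,400 bytes per surface, 16,588,800 bytes per composited frame with one layer, and roughly 995 MB/s at 60 fps, on top of whatever the application itself uses to render.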

    Composition bypass will allow your system to write directly into the surface that will be output to the display, thus removing the composition overhead.

    Joe
