SGX Hardware Recovery triggered

This topic contains 8 replies, has 2 voices, and was last updated by Joe Davis 3 years, 5 months ago.

Viewing 9 posts - 1 through 9 (of 9 total)
  • #31618

    Viks19
    Member

    Hi,

    I am using an OMAP5 EVM-based platform running Linux. While running one of the applications, I get the following SGX lockup detection messages. Is it always necessary to fix such an error even when the platform recovers successfully, or is it acceptable as long as it always recovers?

    [13262.587249] PVR_K: HWRecoveryResetSGX: SGX Hardware Recovery triggered
    [13262.594451] PVR_K: SGX debug (SGX_DDK_Linux_CustomerTI sgxddk 19 1.9@2166536)
    [13262.603027] PVR_K: (P0) EUR_CR_CORE_ID: 01191201
    [13262.608856] PVR_K: (P0) EUR_CR_CORE_REVISION: 00010106
    [13262.614562] PVR_K: (P0) EUR_CR_EVENT_STATUS: 243C2780
    [13262.620727] PVR_K: (P0) EUR_CR_EVENT_STATUS2: 000000A0
    [13262.626525] PVR_K: (P0) EUR_CR_BIF_CTRL: 00000000
    [13262.632415] PVR_K: (P0) EUR_CR_BIF_BANK0: 00000007
    [13262.638183] PVR_K: (P0) EUR_CR_BIF_INT_STAT: 00080000
    [13262.644104] PVR_K: (P0) EUR_CR_BIF_FAULT: 00000000
    [13262.649841] PVR_K: (P0) EUR_CR_BIF_MEM_REQ_STAT: 00000000
    [13262.655761] PVR_K: (P0) EUR_CR_CLKGATECTL: 002AA6AA
    [13262.661529] PVR_K: (P1) EUR_CR_EVENT_STATUS: 043C2780
    [13262.667419] PVR_K: (P1) EUR_CR_EVENT_STATUS2: 000000A8
    [13262.673248] PVR_K: (P1) EUR_CR_BIF_CTRL: 00000000
    [13262.679199] PVR_K: (P1) EUR_CR_BIF_BANK0: 00000007
    [13262.684936] PVR_K: (P1) EUR_CR_BIF_INT_STAT: 00080000
    [13262.690856] PVR_K: (P1) EUR_CR_BIF_FAULT: 00000000
    [13262.696624] PVR_K: (P1) EUR_CR_BIF_MEM_REQ_STAT: 00000000
    [13262.702514] PVR_K: (P1) EUR_CR_CLKGATECTL: 002AA6AA
    [13262.708282] PVR_K: Checking EDM memory context (index = 7, PD = 0xaf786000)
    [13262.715820] PVR_K: Found MMU context for page fault 0x00000000
    [13262.722137] PVR_K: GPU memory context is for PID=646 (insmod)
    [13262.728363] PVR_K: No PDE found
    [13262.731719] PVR_K: Checking TA memory context (index = 0, PD = 0x9c880000)
    [13262.739196] PVR_K: Found MMU context for page fault 0x00000000
    [13262.745483] PVR_K: GPU memory context is for PID=1191 (viewmanager_Map)
    [13262.752685] PVR_K: No PDE found
    [13262.756072] PVR_K: Checking 3D memory context (index = 0, PD = 0x9c880000)
    [13262.769134] PVR_K: Found MMU context for page fault 0x00000000
    [13262.776672] PVR_K: GPU memory context is for PID=1191 (viewmanager_Map)
    [13262.783752] PVR_K: No PDE found
    [13262.787231] PVR_K: Checking PTLA memory context (index = 0, PD = 0x9c880000)
    [13262.794738] PVR_K: Found MMU context for page fault 0x00000000
    [13262.801055] PVR_K: GPU memory context is for PID=1191 (viewmanager_Map)
    [13262.808044] PVR_K: No PDE found
    [13262.811553] PVR_K: SGX Host control:
    [13262.815307] PVR_K: (HC-0) 0x00000001 0x00000000 0x00000000 0x00000004
    [13262.822784] PVR_K: (HC-10) 0x00000000 0x0000000A 0x00068010 0x00000003
    [13262.829864] PVR_K: (HC-20) 0x00000001 0x00000001 0x00000000 0x000036F7
    [13262.836975] PVR_K: (HC-30) 0x000D4766 0x0493FD94 0x00000000 0x00000000
    [13262.844024] PVR_K: (HC-40) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.851104] PVR_K: (HC-50) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.858306] PVR_K: (HC-60) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.865417] PVR_K: (HC-70) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.872406] PVR_K: (HC-80) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.879486] PVR_K: SGX TA/3D control:
    [13262.883422] PVR_K: (T3C-0) 0xF4003000 0xF4003120 0xF4002000 0xF4129800
    [13262.890533] PVR_K: (T3C-10) 0x00000000 0x00000000 0x00000000 0xF4002980
    [13262.897705] PVR_K: (T3C-20) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.904876] PVR_K: (T3C-30) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.912078] PVR_K: (T3C-40) 0x00000000 0x00000000 0x00000000 0x00000002
    [13262.919250] PVR_K: (T3C-50) 0x00000000 0x00000000 0x00000001 0x00003291
    [13262.926361] PVR_K: (T3C-60) 0x000034A0 0xF41DF810 0xF41DF7B8 0xF4000000
    [13262.933563] PVR_K: (T3C-70) 0xAF786000 0xF4004000 0xF41A8C00 0xF41DF810
    [13262.940673] PVR_K: (T3C-80) 0xF4121F60 0xF41AC6A0 0xF41DF7B8 0xF4122440
    [13262.948028] PVR_K: (T3C-90) 0x91FF34FF 0x90FF34FF 0x00000000 0x00000000
    [13262.955108] PVR_K: (T3C-A0) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.965087] PVR_K: (T3C-B0) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.972320] PVR_K: (T3C-C0) 0x00000000 0x00000000 0x00000000 0x00000000
    [13262.979583] PVR_K: (T3C-D0) 0x000088CE 0x000088CD 0xF4005000 0xF4010820
    [13262.986694] PVR_K: (T3C-E0) 0xF4002020 0xF411FCC0 0xF411FCC0 0x00000000
    [13262.993927] PVR_K: (T3C-F0) 0x00000000 0x000004A7 0x000004A7 0x00000000
    [13263.001037] PVR_K: (T3C-100) 0x00000001 0x00000001 0xDE676F24 0x00000000
    [13263.008392] PVR_K: (T3C-110) 0x55DE2482 0x00000000 0x00000000 0x00000000
    [13263.017089] PVR_K: SGX Kernel CCB WO:0xF7 RO:0xF7

    Thanks & Regards,
    Viks
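
For anyone comparing dumps like the one above, here is a minimal Python sketch that parses the `EUR_CR_*` register lines and lists the registers whose values differ between the `(P0)` and `(P1)` blocks. It assumes the line layout shown in this log; the function names are hypothetical, and it does not attempt to decode what individual bits mean.

```python
import re
from collections import defaultdict

# Matches register lines such as:
#   [13262.603027] PVR_K: (P0) EUR_CR_CORE_ID: 01191201
REG_LINE = re.compile(
    r"PVR_K:\s+\((P\d+)\)\s+(EUR_CR_\w+):\s+([0-9A-Fa-f]{8})\b"
)


def parse_register_dump(log_text):
    """Return {unit: {register_name: value}} for every register line found."""
    units = defaultdict(dict)
    for line in log_text.splitlines():
        match = REG_LINE.search(line)
        if match:
            unit, name, value = match.groups()
            units[unit][name] = int(value, 16)
    return dict(units)


def diff_units(units, a="P0", b="P1"):
    """Return registers present in both blocks whose values differ."""
    regs_a, regs_b = units.get(a, {}), units.get(b, {})
    return {
        name: (regs_a[name], regs_b[name])
        for name in sorted(regs_a.keys() & regs_b.keys())
        if regs_a[name] != regs_b[name]
    }


# A few lines copied from the dump in this thread.
SAMPLE = """\
[13262.614562] PVR_K: (P0) EUR_CR_EVENT_STATUS: 243C2780
[13262.620727] PVR_K: (P0) EUR_CR_EVENT_STATUS2: 000000A0
[13262.638183] PVR_K: (P0) EUR_CR_BIF_INT_STAT: 00080000
[13262.661529] PVR_K: (P1) EUR_CR_EVENT_STATUS: 043C2780
[13262.667419] PVR_K: (P1) EUR_CR_EVENT_STATUS2: 000000A8
[13262.684936] PVR_K: (P1) EUR_CR_BIF_INT_STAT: 00080000
"""

for name, (p0, p1) in diff_units(parse_register_dump(SAMPLE)).items():
    print(f"{name}: P0={p0:08X} P1={p1:08X} differing bits={p0 ^ p1:08X}")
```

On these lines, only `EUR_CR_EVENT_STATUS` and `EUR_CR_EVENT_STATUS2` differ between the two blocks; interpreting those bits needs the register documentation from TI or Imagination.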

    #38427

    Joe Davis
    Member

    Hi Viks,

    The platform should recover, but it’s best to avoid these issues if you can as they will reduce your application’s performance and may introduce instability.

    It’s possible that relying on undefined API behaviour is causing the issue. I’d recommend recording your application with PVRTrace and reviewing the output of PVRTraceGUI’s Static Analysis to see if any errors or warnings have been identified. If the application is OK, then you should report the issue to TI to see if you’re hitting a known driver bug.

    Regards,
    Joe
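
To judge how often recovery is being triggered, a small script along these lines could count the events in a kernel log capture. This is only a sketch: it assumes the message text and timestamp format shown in the first post, and the function names are hypothetical.

```python
import re

# Matches the recovery banner from the log in this thread, e.g.
#   [13262.587249] PVR_K: HWRecoveryResetSGX: SGX Hardware Recovery triggered
RECOVERY = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+PVR_K:\s+HWRecoveryResetSGX: "
    r"SGX Hardware Recovery triggered"
)


def recovery_times(log_text):
    """Return the kernel timestamps (seconds) of every recovery event."""
    return [float(m.group(1)) for m in RECOVERY.finditer(log_text)]


def recovery_intervals(times):
    """Return the gaps in seconds between consecutive recovery events."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# The single recovery line quoted in this thread.
SAMPLE = (
    "[13262.587249] PVR_K: HWRecoveryResetSGX: "
    "SGX Hardware Recovery triggered\n"
)

times = recovery_times(SAMPLE)
print(f"{len(times)} recovery event(s) at {times}")
print(f"intervals between events: {recovery_intervals(times)}")
```

Run against a full log capture, the interval list gives a rough idea of whether the lockups are rare or recurring.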

    #38428

    Viks19
    Member

    Thanks Joe for your quick answer.

    Thanks & Regards,
    Vikash

    #38429

    Viks19
    Member

    Hi Joe,

    I have noticed the following errors while running the application, at the time of the lockup. Do they point to a known issue or give any hint on how to resolve the lockup?

    PVR:(Error): Render Timeout! LastFrame: 565 [1657, /sgxkick_client.c]
    PVR:(Warning): PB Watermark Info – Alloc: 0x22a , Free: 0x5f8 [486, /sgxrender_targets.c]
    PVR:(Warning): PB Watermark Info – Alloc: 0x22a , Free: 0x5f8 [486, /sgxrender_targets.c]

    Regards,
    Vikash
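
When collecting these messages for a bug report, it can help to pull the numbers out programmatically. The sketch below assumes the exact message formats quoted above and uses hypothetical function names. It only converts the `Alloc` and `Free` values from hex to decimal; the thread does not state their units, so no meaning is attached to them here.

```python
import re

# Matches: PVR:(Error): Render Timeout! LastFrame: 565 [1657, /sgxkick_client.c]
TIMEOUT = re.compile(r"Render Timeout!\s+LastFrame:\s+(\d+)")

# Matches: PVR:(Warning): PB Watermark Info – Alloc: 0x22a , Free: 0x5f8 [...]
# \W+ accepts whichever dash character the log or forum software produced.
WATERMARK = re.compile(
    r"PB Watermark Info\W+Alloc:\s*(0x[0-9A-Fa-f]+)\s*,"
    r"\s*Free:\s*(0x[0-9A-Fa-f]+)"
)


def parse_pvr_messages(log_text):
    """Return (timeout_frames, watermarks) found in the user-mode driver log.

    timeout_frames is a list of LastFrame numbers; watermarks is a list of
    (alloc, free) integer pairs in the order they were logged.
    """
    timeout_frames, watermarks = [], []
    for line in log_text.splitlines():
        match = TIMEOUT.search(line)
        if match:
            timeout_frames.append(int(match.group(1)))
            continue
        match = WATERMARK.search(line)
        if match:
            watermarks.append((int(match.group(1), 16), int(match.group(2), 16)))
    return timeout_frames, watermarks


# The messages quoted in the post above.
SAMPLE = """\
PVR:(Error): Render Timeout! LastFrame: 565 [1657, /sgxkick_client.c]
PVR:(Warning): PB Watermark Info – Alloc: 0x22a , Free: 0x5f8 [486, /sgxrender_targets.c]
PVR:(Warning): PB Watermark Info – Alloc: 0x22a , Free: 0x5f8 [486, /sgxrender_targets.c]
"""

frames, marks = parse_pvr_messages(SAMPLE)
print("render timeouts at frames:", frames)
for alloc, free in marks:
    print(f"PB watermark: alloc={alloc} (0x{alloc:x}), free={free} (0x{free:x})")
```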

    #38430

    Joe Davis
    Member

    Hi Viks,

    It sounds like you’re filling the Parameter Buffer and are subsequently hitting a driver bug. You should ask TI if there’s a newer graphics driver available for the platform in case this resolves the issue. However, for optimal performance you should always try to avoid filling the Parameter Buffer.

    In the latest version of PVRTune, there is an SPM counter (part of counter group #2) that identifies when Parameter Buffer overflow events have occurred. To use this, I would recommend dropping the counter onto the Graph View and using the Counter Properties dialog to change the Y axis value to a small value (e.g. 4) to make it easier to see when the value of the graphed counter changes.

    Regards,
    Joe

    #38431

    Viks19
    Member

    Hi Joe,

    Previously I was using the SGX Linux DDK 1.9@2166536 from TI. When I asked for the latest DDK, TI gave me 1.9@2253347, and with this version I still see the SGX lockup after running the navigation application.
    Is this latest version expected to be sufficient to fix the issue?

    I am currently capturing a PVRTrace recording of the lockup so that I can analyse it in PVRTraceGUI, as you suggested in your previous reply.

    Regards,
    Vikash

    #38432

    Joe Davis
    Member

    Hi Viks,

    Looking at the DDK commit messages, there was a fix in 2257028 that modified the Parameter Buffer resize heuristics. I suspect that the driver’s PB resizing is responsible for the issues you’re seeing. TI should be able to help you to configure the driver to disable PB resize to see if it resolves the issue.

    Regards,
    Joe
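
To make the version comparison explicit, here is a small sketch that checks whether a DDK build string predates the fix mentioned above. It assumes the number after the `@` is a build or changelist number that only increases over time, and that the fix in 2257028 applies to the same 1.9 branch; neither assumption is confirmed in this thread. The helper names are hypothetical.

```python
import re

# Build number of the Parameter Buffer resize fix mentioned in the reply above.
PB_RESIZE_FIX_BUILD = 2257028


def parse_ddk_version(version):
    """Split a string such as '1.9@2166536' into ('1.9', 2166536)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)*)@(\d+)\s*", version)
    if match is None:
        raise ValueError(f"unrecognised DDK version string: {version!r}")
    return match.group(1), int(match.group(2))


def predates_fix(version, fix_build=PB_RESIZE_FIX_BUILD):
    """Return True if the build number is lower than the fix's build number."""
    _, build = parse_ddk_version(version)
    return build < fix_build


# The two DDK versions reported in this thread.
for ddk in ("1.9@2166536", "1.9@2253347"):
    status = "predates" if predates_fix(ddk) else "includes or follows"
    print(f"{ddk} {status} build {PB_RESIZE_FIX_BUILD}")
```

Under those assumptions, both DDK builds mentioned so far predate the fix, which would be consistent with the lockup still occurring on 1.9@2253347.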

    #38433

    Viks19
    Member

    Hi Joe,

    I tried to observe the PB overflow with the latest PVRTune (SDK 3.3), but I can’t find the SPM counter there. Does it require a newer driver?

    Also, the last microkernel trace entries I am seeing are as follows:

    [ 241.097167] PVR_K: (MKT-1FB) 000A1C09 01600800 0000F91E AD000184 MKTC_3DLB_END
    [ 241.104919] PVR_K: (MKT-1FC) 000A1C5F 01600800 0100FA1D AD000281 MKTC_TALB_FINDTA
    [ 241.113220] PVR_K: (MKT-1FD) 000A1D01 01600800 0100FA1D AD000282 MKTC_TALB_END
    [ 241.121185] PVR_K: (MKT-1FE) 000EBBB9 01600800 0101041F AD000A01 MKTC_TIMER_POTENTIAL_3D_LOCKUP
    [ 241.130493] PVR_K: (MKT-1FF) 000EBC0C 01600800 0101041F AD000A0A MKTC_TIMER_LOCKUP
    [ 241.138671] PVR_K: SGX Kernel CCB WO:0xE3 RO:0xE3

    Thanks for this valuable information. I will contact TI to see if they can help in this case.

    Best Regards,
    Vikash
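
For longer microkernel traces, a sketch like the following can pull out the entries that lead up to the lockup marker. It assumes the `(MKT-<index>)` line layout shown above, with four 32-bit hex words followed by an `MKTC_*` tag. The meaning of the individual words is not documented in this thread, so they are kept as raw integers, and the names used here are hypothetical.

```python
import re
from typing import NamedTuple


class MktEntry(NamedTuple):
    index: int    # trace slot, parsed from the hex value after "MKT-"
    words: tuple  # the four raw 32-bit words, uninterpreted
    name: str     # the MKTC_* tag at the end of the line


# Matches: PVR_K: (MKT-1FB) 000A1C09 01600800 0000F91E AD000184 MKTC_3DLB_END
MKT_LINE = re.compile(
    r"\(MKT-([0-9A-Fa-f]+)\)\s+((?:[0-9A-Fa-f]{8}\s+){4})(MKTC_\w+)"
)


def parse_mkt(log_text):
    """Return the microkernel trace entries in the order they were logged."""
    entries = []
    for line in log_text.splitlines():
        match = MKT_LINE.search(line)
        if match:
            words = tuple(int(word, 16) for word in match.group(2).split())
            entries.append(MktEntry(int(match.group(1), 16), words, match.group(3)))
    return entries


def context_before(entries, marker="MKTC_TIMER_LOCKUP", count=3):
    """Return up to `count` entries logged just before the first `marker`."""
    for pos, entry in enumerate(entries):
        if entry.name == marker:
            return entries[max(0, pos - count):pos]
    return []


# The trace lines quoted in the post above.
SAMPLE = """\
[ 241.097167] PVR_K: (MKT-1FB) 000A1C09 01600800 0000F91E AD000184 MKTC_3DLB_END
[ 241.104919] PVR_K: (MKT-1FC) 000A1C5F 01600800 0100FA1D AD000281 MKTC_TALB_FINDTA
[ 241.113220] PVR_K: (MKT-1FD) 000A1D01 01600800 0100FA1D AD000282 MKTC_TALB_END
[ 241.121185] PVR_K: (MKT-1FE) 000EBBB9 01600800 0101041F AD000A01 MKTC_TIMER_POTENTIAL_3D_LOCKUP
[ 241.130493] PVR_K: (MKT-1FF) 000EBC0C 01600800 0101041F AD000A0A MKTC_TIMER_LOCKUP
"""

for entry in context_before(parse_mkt(SAMPLE)):
    print(f"MKT-{entry.index:X}: {entry.name}")
```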

    #38434

    Joe Davis
    Member

    Hi Vikash,

    Sorry for the massive delay here. I’m doing a clean up of the forum and spotted this discussion was unresolved.

    I’d forgotten to mention before that the latest version of PVRTune includes a Search widget. From here, you can search for a string in all counter and timing data that has been captured. If an SPM event has occurred, a search for the term will list all SPM events in the recording.

    Regards,
    Joe
