Question on shader internal execution


This topic contains 2 replies, has 2 voices, and was last updated by chris_interealtime 3 years, 6 months ago.

Viewing 3 posts - 1 through 3 (of 3 total)
  • #31659

    I’m working on iOS, so series 5 and 6 GPUs. I’m wondering about a possible optimisation opportunity for an image processing system I’ve written.

    At the moment, the rendering happens in 2 passes, using 2 separate shaders, all in lowp or mediump. The passes are independent of each other, so in principle they could be done in parallel.

    However, the recent disclosures about the ALU layouts got me thinking. There’s a mix of 16-bit and 32-bit ALUs. If I combined my 2 render passes into 1 shader, I could handle 1 pass in lowp or mediump, and the other in highp. Since they’re completely separable, they could run in parallel.

    So, the question is, how well would that actually work, from the shader side at least? I think it mainly comes down to this: is the compiler likely already adapting my current shader so that it runs on both the 16-bit and 32-bit ALUs? If it’s already filling the pipeline well, combining the passes is going to be a net loss, I’m sure, but if the 32-bit ALUs are currently sitting empty it could be a big win.

    Any hints on how that might play out would be welcome 🙂

    #38591

    Joe Davis
    Member

    Hi Chris,

    In Series6 and Series6XT, the pipelines can only take one precision path at a given point in time, i.e. it’s not possible to process F32 & F16 work during the same cycle. The reason for this is that there is shared hardware logic between the different precision paths.

    Thanks,
    Joe
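
The consequence of the reply above can be put into a rough back-of-the-envelope sketch. This is a toy model only, not a description of the real scheduler: the function names and the cycle counts are invented for illustration, and it ignores everything except ALU issue cycles (bandwidth, texture fetches, and any differences in F16 versus F32 throughput are left out). It just contrasts the overlap the question hoped for with the exclusive-path behaviour described for Series6 and Series6XT.

```python
# Toy model only: all cycle counts are invented, not measured on any GPU.

def cycles_if_paths_overlap(f16_cycles, f32_cycles):
    """Hypothetical best case from the question: both precision paths busy at once."""
    return max(f16_cycles, f32_cycles)


def cycles_if_paths_exclusive(f16_cycles, f32_cycles):
    """Behaviour described in the reply: only one precision path is active in a
    given cycle, so the two kinds of work are issued one after the other."""
    return f16_cycles + f32_cycles


pass_a = 100  # hypothetical ALU cycles for the pass kept in lowp/mediump
pass_b = 100  # hypothetical ALU cycles for the pass promoted to highp

hoped = cycles_if_paths_overlap(pass_a, pass_b)
actual = cycles_if_paths_exclusive(pass_a, pass_b)
print("hoped-for cycles:", hoped)
print("cycles with exclusive paths:", actual)
```

Under these assumptions, promoting one pass to highp buys no overlap: the combined shader still pays for both passes in sequence.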

    #38592

    Thanks Joe, that answers it. Makes perfect sense.
