Alpha Test VS Alpha Blend

This topic contains 13 replies, has 2 voices, and was last updated by  jarod 2 years, 6 months ago.

Viewing 14 posts - 1 through 14 (of 14 total)
  • #48685

    jarod
    Member

    I’m sorry to ask, but after reading many imgtec documents and the performance guide, I still have a question about the relative performance of alpha test and alpha blend on PowerVR.

    With HSR, opaque pixels get the best performance. However, alpha test and alpha blend still have to process every fragment because their visibility is unknown, so I would expect them to perform almost the same.

    In this topic: https://community.imgtec.com/forums/topic/question-about-alpha-test-performance/ Joe said: “Because the hardware does not have to re-determine fragment visibility when alpha blending”. I think this “re-determine” refers to the alpha value comparison in alpha test. That should be fast and shouldn’t add overhead even if it happens twice (once in the HSR stage, once in the real fragment processing), should it?

    #48699

    Joe Davis
    Member

    Hi Jarod,

    Apologies that this point wasn’t clear enough in our current documentation. We will aim to improve our overview for a future release.

    When the GPU processes a blended object, it knows that every pixel in that primitive will be processed. This means that any depth/stencil writes that need to be performed by the ISP happen immediately for the entire primitive.

    When discard is used, pixel visibility isn’t known until the corresponding fragment shader executes. Because of this, depth and stencil writes must be deferred until pixel visibility is known. This reduces performance as the pixel visibility information has to be fed back to the ISP unit after shader execution to determine which depth/stencil positions need to be written to. The cost of this can vary, but in the worst case the entire fragment processing pipeline will stall until the deferred depth/stencil writes complete.

    #48731

    jarod
    Member

    Thanks for your detailed explanation. I made this diagram to confirm my current understanding.
    [Attached image: PVRDemostration]
    So for opaque, it goes through HSR and saves real fragment processing time;
    for alpha test, it goes through HSR and still has to process every fragment;
    for alpha blend, it skips HSR and processes every fragment. (I misunderstood this earlier and thought it went through HSR as well.)
    Am I right?

    #48742

    Joe Davis
    Member

    Hi Jarod,

    All primitives, regardless of blend/discard state, will go through the HSR process.

    Consider a case where a small town with buildings and trees is rendered. A well optimized application will submit all of the opaque draws first (buildings and tree trunks). The leaves of the tree may then be drawn as a textured quad, where the discard keyword is used to punch through transparent regions of the texture so only the leaves are drawn. If a building is partially obscuring the leaves of a tree, then the obscured leaves can be rejected from the render by the HSR unit when depth testing is performed.
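
    The submission order described above can be sketched in a few lines of Python. This is only an illustration: the `Draw` type, the `kind` labels and `submission_order` are hypothetical names, and placing blended draws after the alpha tested ones is an assumption that the post does not state (it only says opaque draws go first).

```python
from typing import NamedTuple

class Draw(NamedTuple):
    name: str
    kind: str  # "opaque", "alpha_test" or "blend" (hypothetical labels)

# Opaque first, as described in the post. Putting "blend" last is an assumption.
ORDER = {"opaque": 0, "alpha_test": 1, "blend": 2}

def submission_order(draws):
    # sorted() is stable, so draws of the same kind keep their relative
    # order, which matters for blended draws.
    return sorted(draws, key=lambda d: ORDER[d.kind])

scene = [
    Draw("leaves", "alpha_test"),
    Draw("building", "opaque"),
    Draw("window glass", "blend"),
    Draw("tree trunk", "opaque"),
]
ordered_names = [d.name for d in submission_order(scene)]
```

    With the opaque buildings submitted first, the depth buffer already holds their depths by the time the leaves reach the HSR unit, so obscured leaf fragments can be rejected before any shading.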

    The statue in figure 10 of our PowerVR Hardware Architecture Overview for Developers is a good example of the GPU’s ability to discard blended fragments that are obscured by opaque fragments.

    The reason alpha test/discard is more costly is that it goes through the HSR stage twice; once to perform depth/stencil tests to see if any of the fragments are obscured, and a second time to write depth/stencil values for any fragments that were not discarded when the fragment shader executed.

    #48751

    jarod
    Member

    [Attached image: stageO]
    I got it. As I understand it now, the following two figures show the flow through the architecture: the upper one is alpha blend, the other is alpha test.
    [Attached image: stageA]

    [Attached image: stageB]

    In the preprocess stage the fragment shader isn’t run, so it’s fast, but it also can’t know which pixels are discarded. So alpha test writes depth after fragment shader processing, and alpha blend writes depth before it. Is this right?

    #48762

    Joe Davis
    Member

    Hi Jarod,

    Your description isn’t quite right. In your diagrams, you’ve shown two passes through the HSR stage for blended objects and three passes for alpha test. This should actually be one pass through it for blended primitives and two passes through it for alpha test.

    In the blended primitive case, depth tests and writes can be performed before the shader executes (one pass through HSR). For alpha tested primitives, depth testing can be performed on the initial pass, but a second pass through HSR is required after shader execution to update the on-chip depth buffer with data for the visible fragments.

    Hope this helps,
    Joe

    #48766

    jarod
    Member

    The start point of the flow line should not be counted as a “pass-through”; that was my drawing mistake.
    [Attached image: stageA1]
    [Attached image: stageB1]
    I think my understanding is closer to yours now.
    For alpha blend:
    1. It preprocesses the fragments (no preprocessing involves fragment shaders, so is that why it’s fast?)
    2. Then it obtains depths (since no discard is used), goes back to the HSR unit, does HSR, and writes the depth buffer. (PASS 1, as you said)
    3. It processes the fragment shader in the traditional way, then blends and outputs.

    For alpha test:
    1. It preprocesses the fragments.
    2. Then it obtains depths and goes back to the HSR unit (but the depths can’t be used yet since discarding exists). (PASS 1)
    3. It processes the fragment shader in the traditional way.
    4. The real depths are obtained, then it goes back to the HSR unit and writes the depth buffer. (PASS 2)
    5. Then it outputs.

    Is this right?

    #48784

    Joe Davis
    Member

    Hi Jarod,

    >1. it preprocess the fragment [all preprocessing won’t involve fragment shaders, so that’s why it’s fast?]
    There is no preprocessing stage. The ISP unit (where HSR is performed) has access to the position data of all primitives within the tile. With this information, it can perform depth and stencil reads/writes immediately. Blended primitives go through the exact same stages as opaque primitives. As all fragments of a blended object are considered to be visible, depth tests and writes can be performed up front. The only difference is that blended primitives must be sent down the pipeline one at a time to ensure they are processed in the submission order specified by the application.

    Blended primitives will do the following:
    1. ISP HSR: Depth and stencil tests and writes
    2. Shading: Colours are calculated for fragments that pass the tests

    For alpha testing, there isn’t any preprocessing either. The only difference between alpha test and the blended path is that depth and stencil writes must be deferred until the shader has executed.

    Alpha tested primitives will do the following:
    1. ISP HSR: Depth and stencil tests (no writes)
    2. Shading: Colours are calculated for fragments that pass the tests
    3. Visibility feedback to ISP: After the shader has executed, the GPU knows which fragments were discarded and which were kept. Visibility information is fed back to the ISP so depth and stencil writes can be performed for the fragments that passed the alpha test
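
    The two step lists above can be modelled with a small Python sketch. It is a toy software model, not how the hardware is implemented: the `Fragment` type, the one-dimensional tile, the "smaller depth is closer" convention and the 0.5 alpha threshold are all assumptions made for illustration, and stencil is left out.

```python
from dataclasses import dataclass

FAR = 1.0  # assumed cleared depth value; smaller depth means closer

@dataclass
class Fragment:
    x: int        # pixel position within the tile (1D for brevity)
    depth: float
    alpha: float  # alpha the shader would read, only known after shading

def draw_blended(depth_buffer, fragments, log):
    # 1. ISP HSR: depth test and depth write happen together, up front.
    visible = []
    for f in fragments:
        if f.depth < depth_buffer[f.x]:
            depth_buffer[f.x] = f.depth
            visible.append(f)
    log.append("isp:test+write")
    # 2. Shading: colours are calculated for fragments that passed.
    log.append("shade")
    return [f.x for f in visible]

def draw_alpha_tested(depth_buffer, fragments, log, threshold=0.5):
    # 1. ISP HSR: depth test only, the write is deferred.
    candidates = [f for f in fragments if f.depth < depth_buffer[f.x]]
    log.append("isp:test")
    # 2. Shading: the shader decides which fragments are discarded.
    kept = [f for f in candidates if f.alpha >= threshold]
    log.append("shade")
    # 3. Visibility feedback: depth is written only for kept fragments.
    for f in kept:
        depth_buffer[f.x] = f.depth
    log.append("isp:feedback-write")
    return [f.x for f in kept]

frags = [Fragment(0, 0.5, 1.0), Fragment(1, 0.5, 0.1), Fragment(2, 0.5, 0.9)]

test_buf = [0.2, FAR, FAR, FAR]   # pixel 0 is already covered by an opaque draw
test_log = []
test_kept = draw_alpha_tested(test_buf, frags, test_log)

blend_buf = [0.2, FAR, FAR, FAR]
blend_log = []
blend_kept = draw_blended(blend_buf, frags, blend_log)
```

    In both paths the fragment at pixel 0 is rejected by the depth test before shading. The difference is the third step in the alpha tested path, where the depth buffer cannot be updated until the shader has run.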

    #48785

    jarod
    Member

    I think I’m clear now, but is going through HSR very expensive? That is,
    alphatest: [HSR1(depth tests) + HSR2(depth writes)] > alphablend: [HSR1(depth tests + depth writes)]? I don’t have much knowledge of the chip architecture.

    #48787

    Joe Davis
    Member

    Hi Jarod,

    The HSR process itself is very cheap. The overhead of alpha test comes from the ISP being blocked until alpha tested primitive visibility information has been fed back to the ISP. This blocking has to occur so depth and stencil buffers can be updated with the alpha tested primitive before subsequent draws are processed.

    #48794

    jarod
    Member

    Hi Joe, I finally got the answer. Thanks a lot for your patience in explaining the technical background of this question to me. 🙂

    #48811

    jarod
    Member

    I still want to ask one more thing: when does the ISP unit become available for the next primitive? After the depth writes of the current primitive end? So in the pipeline, for alpha blend the next primitive can enter the ISP earlier than for alpha test?

    #48829

    Joe Davis
    Member

    >after depth writing of current primitive ends?

    As soon as the ISP has finished processing a primitive, it can begin processing another.

    >so in the pipeline for alphablend the next primitive can enter ISP earlier than alphatest?

    That’s correct. In the blend scenario, the ISP can continue to process new primitives while the USCs are calculating colours for primitives that have already propagated through the pipeline.

    #48836

    jarod
    Member

    Nice! awesome hierarchy 🙂
