Using textures with premultiplied alpha efficiently

This topic contains 5 replies, has 2 voices, and was last updated by Martin Kraus 7 years, 1 month ago.

Viewing 6 posts - 1 through 6 (of 6 total)
  • #30339

    th_in_gs
    Member

    I’m attempting to switch to using textures with premultiplied alpha in my fragment shader (because the source of the texture images uses premultiplied alpha, and it seems wasteful to un-premultiply them). My intuition tells me that this should be slightly /more/ efficient than using regular non-premultiplied textures, but I’m seeing lower performance, presumably because my calculation is not as efficient as the built-in mix() function.

    My testing’s being done on an Apple iPad.

    This is the (straightforward) code I’m using for non-premultiplied textures:

    [CODE]

        lowp vec4 contentsColor = texture2D(sContentsTexture, vContentsCoordinate);

        lowp vec4 zoomedColor = texture2D(sZoomedContentsTexture, vZoomedContentsCoordinate);

        contentsColor = mix(contentsColor, zoomedColor, zoomedColor);

    [/CODE]

    Using the premultiplied sZoomedContentsTexture, I switch the mix() line to:

    [CODE]

        contentsColor = (contentsColor * (1.0 - zoomedColor)) + zoomedColor;

    [/CODE]

    And see my frame rate drop.

    Is there anything I can do to make this more efficient (beyond giving up on the premultiplied textures, of course)?


    #34495

    I assume the lines should be

    [CODE]

        contentsColor = mix(contentsColor, zoomedColor, zoomedColor.a);

    [/CODE]

    and

    [CODE]

        contentsColor = (contentsColor * (1.0 - zoomedColor.a)) + zoomedColor;

    [/CODE]

    How does the following perform?

    [CODE]

        lowp float transparency = 1.0 - zoomedColor.a;
        contentsColor = contentsColor * transparency + zoomedColor;

    [/CODE]

    In the good old days of assembler-like shading languages, there was an instruction for linear interpolation (corresponding to mix) and also one that combines a multiplication and an addition (often called MAD). Maybe the compiler is clever enough to use it when the expressions are simple enough.

    Also, if you don’t need the alpha channel of the result, it might be worth avoiding the computation of the alpha channel.

    Well, I’m just guessing… 🙂

    #34496

    th_in_gs
    Member

    Ah, yes, you’re correct about the ‘.a’s – not sure how those went missing when I was editing the post…

    Unfortunately pulling out the subtraction doesn’t seem to help much. Maybe a /little/, but I suspect it’s just rounding:

    Speeds are:

    With mix(): 30fps

    My premultiply-aware code: 17fps

    With Martin’s extracted transparency subtraction: 18fps

    #34497

    Hmmm, is it possible to move the computation of transparency further up in the fragment shader? In general, it is a good idea to place instructions that depend on each other as far apart as possible. Since transparency is used in the computation of contentsColor, the shader unit might otherwise have to wait for the result of transparency before it can compute contentsColor. Of course, the computation of transparency itself depends on a texture lookup, so it should also be as far away from that lookup as possible. 🙂 However, many compilers are pretty good at reordering instructions, so changing the order might not make any difference. In any case, it shouldn’t hurt to give the compiler a hint.

    #34498

    th_in_gs
    Member

    I tried moving it as far away as possible (which admittedly is not very far, since this shader doesn’t do much beyond blending textures), but it made no difference. Isn’t that to be expected, though? Presumably a similar subtraction has to occur in the mix() function, and it’s plenty fast.

    Is mix() a hardware routine? I’m quite confused as to why it’s faster when I’m /trying/ to do a logically simpler operation (hence my thinking there must be a better way to do what I’m trying to do).

    #34499

    It depends on the hardware, but it probably is. Have a look at the GL_ARB_fragment_program extension (http://www.opengl.org/registry/specs/ARB/fragment_program.txt); one of its instructions is LRP:

        3.11.5.14  LRP: Linear Interpolation

        The LRP instruction performs a component-wise linear interpolation
        between the second and third operands using the first operand as the
        blend factor.

        tmp0 = VectorLoad(op0);
        tmp1 = VectorLoad(op1);
        tmp2 = VectorLoad(op2);
        result.x = tmp0.x * tmp1.x + (1 - tmp0.x) * tmp2.x;
        result.y = tmp0.y * tmp1.y + (1 - tmp0.y) * tmp2.y;
        result.z = tmp0.z * tmp1.z + (1 - tmp0.z) * tmp2.z;
        result.w = tmp0.w * tmp1.w + (1 - tmp0.w) * tmp2.w;

    Of course, it’s up to the hardware vendor how they implement the LRP instruction, but it’s likely to be rather efficient. You could also try PowerVR’s shader compiler tools (I forgot the name), which should give you at least a rough idea of the number of actual instructions.

    Martin Kraus 2010-11-04 22:55:55
