Monday, January 18, 2010

2.0 gamma textures and full-range scalars in YCoCg DXT5

A few years back there was a publication on real-time YCoCg DXT5 texture compression. There are two improvements on the technique I feel I should present:

There's a pretty clear problem right off the bat: it's not particularly friendly to linear textures. If you simply convert sRGB values into linear space and store the result in YCoCg, you will get severe banding, owing largely to the loss of precision at lower values. Gamma space spends a lot of its precision on lower intensity values, where the human visual system is more sensitive, and an 8-bit linear encoding throws most of that away.
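To put a rough number on that precision loss, here is a minimal sketch (my own illustration, not something from the original publication) that counts how many distinct 8-bit codes survive when the darkest sRGB codes are re-encoded with a given exponent. `SrgbToLinear` and `CountDistinctCodes` are hypothetical helper names, and the sRGB constants are the standard ones.

```cpp
#include <cmath>
#include <set>

// Standard sRGB decode: 8-bit code -> linear intensity in [0, 1].
double SrgbToLinear(int code)
{
    double s = code / 255.0;
    return (s <= 0.04045) ? s / 12.92 : std::pow((s + 0.055) / 1.055, 2.4);
}

// Hypothetical helper: number of distinct 8-bit codes left after re-encoding
// sRGB codes [0, maxCode] as round(255 * linear^exponent).
// exponent = 1.0 stores linear values, exponent = 0.5 stores 2.0 gamma values.
int CountDistinctCodes(int maxCode, double exponent)
{
    std::set<int> codes;
    for (int c = 0; c <= maxCode; ++c)
    {
        double encoded = std::pow(SrgbToLinear(c), exponent);
        codes.insert((int)std::floor(encoded * 255.0 + 0.5));
    }
    return (int)codes.size();
}
```

Comparing `CountDistinctCodes(63, 1.0)` against `CountDistinctCodes(63, 0.5)` shows how many of the 64 darkest input codes collapse together in each encoding; the linear encoding keeps far fewer of them apart.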

sRGB texture modes exist as a way to cheaply convert from gamma space to linear, and they are pretty fast since GPUs can just use a look-up table to get the linear values. YCoCg can't be treated as an sRGB texture, though, because the stored channels are luma, chroma, and a scale factor rather than gamma-encoded RGB. Doing the sRGB decode in the shader is fairly slow since it involves a divide, a power raise, and a conditional.

This can be resolved first by simply converting from a 2.2-ish sRGB gamma ramp to a 2.0 gamma ramp, which preserves most of the original gamut: 255 input values map to 240 output values, low intensity values maintain most of their precision, and they can be linearized by simply squaring the result in the shader.
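As a small sketch of that idea, assuming the standard sRGB transfer function: the encoder decodes sRGB to linear and stores the square root, and the decoder only has to square. `SrgbToGamma2`, `Gamma2ToLinear`, and `CountGamma2Codes` are names I made up for illustration.

```cpp
#include <cmath>

// sRGB 8-bit code -> 2.0 gamma 8-bit code: a standard sRGB decode
// followed by a square root.
unsigned char SrgbToGamma2(unsigned char code)
{
    double s = code / 255.0;
    double linear = (s <= 0.04045) ? s / 12.92
                                   : std::pow((s + 0.055) / 1.055, 2.4);
    return (unsigned char)std::floor(std::sqrt(linear) * 255.0 + 0.5);
}

// 2.0 gamma 8-bit code -> linear intensity. Squaring is the only work
// left for the shader.
double Gamma2ToLinear(unsigned char code)
{
    double g = code / 255.0;
    return g * g;
}

// Hypothetical helper: count the distinct 2.0 gamma codes produced by
// all 256 possible input codes.
int CountGamma2Codes()
{
    bool used[256] = {};
    int count = 0;
    for (int c = 0; c < 256; ++c)
    {
        unsigned char g = SrgbToGamma2((unsigned char)c);
        if (!used[g]) { used[g] = true; ++count; }
    }
    return count;
}
```

Running `CountGamma2Codes()` is a quick way to check how many output values the remapping actually keeps.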


Another concern is the limited scale factor. It isn't really an issue if you're aiming for speed and compressing in real time, but it is if you're considering the technique for offline processing. The scale factor lives in the blue channel, which DXT5 stores with 5 bits, so there is enough resolution for 32 possible scale factor values and no reason to limit it to 1, 2, or 4 if you don't have to. Using the full range gives you more color resolution to work with.
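The quantization step on its own can be sketched like this, assuming the scale factor ranges over [1, 4] as it does in the sample code. The function names are hypothetical; the arithmetic is the same as in the encoder.

```cpp
#include <cmath>

// Quantize a scale factor in [1, 4] to 5 bits (0..31). Rounding up keeps the
// reconstructed factor from falling below the requested one, so the scaled
// chroma stays in range.
unsigned char QuantizeScale(float scaleFactor)
{
    return (unsigned char)std::ceil((scaleFactor - 1.0f) * 31.0f / 3.0f);
}

// Reconstruct the scale factor that the decoder will effectively use.
float UnquantizeScale(unsigned char q)
{
    return 1.0f + (q / 31.0f) * 3.0f;
}

// Expand the 5-bit value to 8 bits by replicating the top bits into the
// low bits, which is how a 5-bit channel is normally widened.
unsigned char ExpandTo8Bits(unsigned char q)
{
    return (unsigned char)((q << 3) | (q >> 2));
}
```

For example, a requested scale of 2.0 quantizes to 11, which unquantizes to roughly 2.06: slightly more headroom than asked for, never less.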


Here's some sample code:


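#include <math.h>

// Despite the name, this returns a 2.0 gamma value: the sRGB input is decoded
// to linear and then square-rooted, so the shader can linearize by squaring.
unsigned char Linearize(unsigned char inByte)
{
    float srgbVal = ((float)inByte) / 255.0f;
    float linearVal;

    if(srgbVal < 0.04045f)
        linearVal = srgbVal / 12.92f;
    else
        linearVal = pow( (srgbVal + 0.055f) / 1.055f, 2.4f);

    return (unsigned char)(floor(sqrt(linearVal) * 255.0 + 0.5));
}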

void ConvertBlockToYCoCg(const unsigned char inPixels[16*3], unsigned char outPixels[16*4])
{
    // Convert sRGB to 2.0 gamma values
    unsigned char linearizedPixels[16*3];

    for(int i=0;i<16*3;i++)
        linearizedPixels[i] = Linearize(inPixels[i]);

    // Calculate Co and Cg extents
    int extents = 0;
    int iY, iCo, iCg;
    int blockCo[16];
    int blockCg[16];
    const unsigned char *px = linearizedPixels;
    for(int i=0;i<16;i++)
    {
        iCo = (px[0]<<1) - (px[2]<<1);
        iCg = (px[1]<<1) - px[0] - px[2];
        if(-iCo > extents) extents = -iCo;
        if( iCo > extents) extents = iCo;
        if(-iCg > extents) extents = -iCg;
        if( iCg > extents) extents = iCg;

        blockCo[i] = iCo;
        blockCg[i] = iCg;

        px += 3;
    }

    // Co = -510..510
    // Cg = -510..510
    float scaleFactor = 1.0f;
    if(extents > 127)
        scaleFactor = (float)extents * 4.0f / 510.0f;

    // Convert to quantized scalefactor (5 bits, rounded up so the scaled chroma stays in range)
    unsigned char scaleFactorQuantized = (unsigned char)(ceil((scaleFactor - 1.0f) * 31.0f / 3.0f));

    // Unquantize
    scaleFactor = 1.0f + ((float)scaleFactorQuantized / 31.0f) * 3.0f;

    // Expand the 5-bit scale to 8 bits by bit replication
    unsigned char bVal = (unsigned char)((scaleFactorQuantized << 3) | (scaleFactorQuantized >> 2));

    unsigned char *outPx = outPixels;

    px = linearizedPixels;
    for(int i=0;i<16;i++)
    {
        // Calculate components
        iY = ( px[0] + (px[1]<<1) + px[2] + 2 ) / 4;
        iCo = (int)((blockCo[i] / scaleFactor) + 128);
        iCg = (int)((blockCg[i] / scaleFactor) + 128);

        if(iCo < 0) iCo = 0; else if(iCo > 255) iCo = 255;
        if(iCg < 0) iCg = 0; else if(iCg > 255) iCg = 255;
        if(iY < 0) iY = 0; else if(iY > 255) iY = 255;

        px += 3;

        // Channel layout: R = Co, G = Cg, B = scale, A = Y
        outPx[0] = (unsigned char)iCo;
        outPx[1] = (unsigned char)iCg;
        outPx[2] = bVal;
        outPx[3] = (unsigned char)iY;

        outPx += 4;
    }
}




And to decode it in the shader:



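float3 DecodeYCoCg(float4 inColor)
{
    // Texture layout: r = Co, g = Cg, b = scale, a = Y.
    // The encoder biases chroma by 128, so subtract 128/255 (not exactly 0.5) to re-center it.
    float3 base = inColor.arg + float3(0, -128.0/255.0, -128.0/255.0);
    // b holds (scaleFactor - 1) / 3, so this evaluates to scaleFactor / 4
    float scale = (inColor.b*0.75 + 0.25);
    float4 multipliers = float4(1.0, 0.0, scale, -scale);
    float3 result;

    result.r = dot(base, multipliers.xzw);  // Y + Co - Cg
    result.g = dot(base, multipliers.xyz);  // Y + Cg
    result.b = dot(base, multipliers.xww);  // Y - Co - Cg

    // Convert from 2.0 gamma to linear
    return result*result;
}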
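
For checking encoder output offline, it can help to mirror the shader math on the CPU. The following is only a sketch under my own assumptions: `Rgb` and `DecodeYCoCgReference` are hypothetical names, the inputs are the four stored bytes before DXT5 compression, and the chroma bias is taken as 128/255 so that it exactly undoes the +128 the encoder adds.

```cpp
struct Rgb { float r, g, b; };

// CPU mirror of the shader decode. Arguments are the stored bytes:
// co (red), cg (green), scaleByte (blue), y (alpha).
Rgb DecodeYCoCgReference(unsigned char co, unsigned char cg,
                         unsigned char scaleByte, unsigned char y)
{
    const float bias = 128.0f / 255.0f;  // assumption: inverse of the encoder's +128
    float fy  = y  / 255.0f;
    float fco = co / 255.0f - bias;
    float fcg = cg / 255.0f - bias;

    // Maps the stored scale byte to scaleFactor / 4, as in the shader.
    float scale = (scaleByte / 255.0f) * 0.75f + 0.25f;

    Rgb out;
    out.r = fy + scale * (fco - fcg);
    out.g = fy + scale * fcg;
    out.b = fy - scale * (fco + fcg);

    // Convert from 2.0 gamma to linear by squaring.
    out.r *= out.r;
    out.g *= out.g;
    out.b *= out.b;
    return out;
}
```

Feeding it the bytes the encoder produces for a known color (a neutral gray, or pure red) and comparing against the expected linear value is a cheap sanity check before involving the GPU.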
