There's another realization of it I made recently: As a YUV-style color space, the Co and Cg channels are constrained to a range that's directly proportional to the Y channel. The addition of the scalar blue channel was mainly introduced to deal with resolution issues that caused banding artifacts on colored objects changing value, but the entire issue there can be sidestepped by simply using the Y channel as a multiplier for the Co and Cg channels, causing them to only respect tone and saturation while the Y channel becomes fully responsible for intensity.
This is not a quality improvement, in fact it nearly doubles PSNR in testing. However, it does result in considerable simplification of the algorithm, both on the encode and decode sides, and the perceptual loss compared to the old algorithm is very minimal.
This also simplifies the algorithm considerably:
int iY = px[0] + 2*px[1] + px[2]; // 0..1020
int iCo, iCg;
if (iY == 0)
{
iCo = 0;
iCg = 0;
}
else
{
iCo = (px[0] + px[1]) * 255 / iY;
iCg = (px[1] * 2) * 255 / iY;
}
px[0] = (unsigned char)iCo;
px[1] = (unsigned char)iCg;
px[2] = 0;
px[3] = (unsigned char)((iY + 2) / 4);
... And to decode:
float3 DecodeYCoCgRel(float4 inColor)
{
return (float3(4.0, 0.0, -4.0) * inColor.r
+ float3(-2.0, 2.0, -2.0) * inColor.g
+ float3(0.0, 0.0, 4.0)) * inColor.a;
}
While this does the job with much less perceptual loss than DXT1, and eliminates banding artifacts almost entirely, it is not quite as precise as the old algorithm, so using that is recommended if you need the quality.