Most of the images in Glider PRO's resources are in PICT format.

The PICT format is basically a bunch of serialized QuickDraw opcodes and can contain a combination of both image and vector data.

The first goal is to get all of the known resources to parse. The good news is that none of the resources in the Glider PRO application resources or any of the houses contain vector data, so it's 100% bitmaps. The bad news is that the bitmaps have quite a bit of variation in their internal structure, and sometimes they don't match the display format.

Several images contain multiple images spliced together within the image data, and at least one image is 16-bit color even though the rest of the images are indexed color. One is 4-bit indexed color instead of 8-bit. Many of them are 1-bit, and the bit scheme for 1-bit images is also inverted compared to the usual expectations (i.e. 1 is black, 0 is white).

Adding to these complications, while it looks like all of the images are using the standard system palette, there's no guarantee that they will - It's actually even possible to make a PICT image that combines multiple images with different color palettes, because the palette is defined per picture op, not per image file.

There's also a fun quirk where the PICT image frame doesn't necessarily have 0,0 as the top-left corner.

I think the best solution to this will simply be to change the display type to 32-bit and unpack PICT images to a single raster bitmap on load. The game appears to use QuickDraw abstractions for all of its draw operations, so while it presumes that the color depth should be 8-bit, I don't think there's anything that will prevent GlidePort from using 32-bit instead.

In the meantime, I've been able to convert all of the resources in the open source release to PNG format as a test, so it should be possible to now adapt that to a runtime PICT loader.

## Saturday, November 23, 2019

## Thursday, October 10, 2019

### Porting Glider - Part 1

Recently found out that Classic Mac game Glider PRO's source code was released, so I'm starting a project called GlidePort to bring it to Windows, ideally in as faithful of a reproduction as possible and using the original data files. Some additions like gamepad support may come at a later time if this stays on track.

While this is a chance to restore of the few iconic Mac-specific games of the era to, it's also a chance to explore a lot of the era technology, so I'll be doing some dev diaries about the process.

Porting Glider has a number of technical challenges: It's very much coded for the Mac platform, which has a lot of peculiarities compared to POSIX and Windows. The preferred language for Mac OS was originally Pascal, so the C standard library is often mostly or entirely unused, and the Macintosh Toolbox (the operating system API) has differences like preferring length-prefixed strings instead of C-style null terminated strings.

Data is in big endian format, as it was originally made for Motorola 68k and PowerPC CPUs. Data files are split into two "forks," one as a flat data stream and the other as a resource database that the toolbox provides parsing facilities for. In Mac development, parsing individual data elements was generally the preferred style vs. reading in whole structures, which leads to data formats often having variable-length strings and no padding for character buffer space or alignment.

Rendering is done using QuickDraw, the system-provided multimedia infrastructure. Most images use the system-native PICT format, a vector format that is basically a list of QuickDraw commands.

At minimum, this'll require parsing a lot of Mac native resource formats, some Mac interchange formats (i.e. BinHex 4), reimplementation of a subset of QuickDraw and QuickTime, substitution of copyrighted fonts, and switch-out of numerous Mac-specific compiler extensions like dword literals and Pascal string escapes.

The plan for now is to implement the original UI in Qt, but I might rebuild the UI instead if that turns out to be impractical.

## Wednesday, September 4, 2019

### Efficient ETC compression using unique cumulative offsets

When adding ETC support to Convection Texture Tools, I decided to try adapting the cluster fit algorithm used for desktop formats to ETC.

Cluster fit works by sorting the pixels into an order based on a color axis, and then repeatedly evaluating each possible combination of counts of the number of pixels assigned to each index. It does so by taking the pixels and applying a least-squares fit to produce the endpoint line.

For ETC, this is is simplified in a few ways: The axis is always 1,1,1, so the step of picking a good axis is unnecessary. There is only one base color and the offsets are determined by the table index, so the clustering step would only solve the base color.

Assuming that you know what the offsets for each pixel are, the least squares fit amounts to simply subtracting the offset from each of the input pixels and averaging the result.

For a 4x2 block, there are 165 possible cluster configurations, but it turns out that some of those are redundant, given certain assumptions. The base color is derived from the formula ((color1-offset1)+(color2-offset2)+...)/8, but since the adds are commutative, that's identical to ((color1+color2+...)-(offset1+offset2+...))/8

The first half of that is the total of the colors, which is constant. The second is the total of the offsets.

Fortunately, not all of the possible combinations produce unique offsets. Some of them cancel out, since adding 1 to or subtracting 1 from the count of the offsets that are negatives of each other produces no change. In an example case, the count tuples (5,0,1,2) and (3,2,3,0) are the same, since 5*-L + 0*-S + 1*S + 2*L = 3*-L + 2*-S + 3*S + 0*L.

For most of the tables, this results in only 81 possible offset combinations. For the first table, the large value is divisible by the small value, causing even more cancellations, and only 57 possible offset combinations.

Finally, most of the base colors produced by the offset combinations are not unique after quantization: Differential mode only has 5-bit color resolution, and differential mode only has 4-bit resolution, so after quantization, many of the results get mapped to the same color. Deduplicating them is also inexpensive: If the offsets are checked in ascending order, then once the candidate color progresses past the threshold where the result could map to a specific quantized color, it will never cross back below that threshold, so deduplication only needs to inspect the last appended quantized color.

Together, these reduce the candidate set of base colors to a fairly small number, creating a very optimal search space at low cost.

There are a few circumstances where these assumptions don't hold:

One is when the clamping behavior comes into effect, particularly when a pixel channel's value is near 0 or 255. In that case, this algorithm can't account for the fact that changing the value of the base color would have no effect on some of the offset colors.

One is when the pixels are not of equal importance, such as when using weight-by-alpha, which makes the offset additions non-commutative, but that only invalidates the cancellation part of the algorithm. The color total can be pre-weighted, and the rest of the algorithm would have to rely on working more like cluster fit: Sort the colors along the 1,1,1 line and determine the weights for the pixels in that order, generate all 165 cluster combinations, and compute the weight totals for each one. Sort them into ascending order, and then the rest of the algorithm should work.

One is when dealing with differential mode constraints, since not all base color pairs are legal. There are some cases where a base color pair that is just barely illegal could be made legal by nudging the colors closer together, but in practice, this is rare: Usually, there is already a very similar individual mode color pair, or another differential mode pair that is only slightly worse.

In CVTT, I deal with differential mode by evaluating all of the possibilities and picking the best legal pair. There's a shortcut case when the best base color for both blocks produces a legal differential mode pair, but this is admittedly a bit less than optimal: It picks the first evaluation in the case of a tie when searching for the best, but since blocks are evaluated starting with the largest combined negative offset, it's a bit more likely to pick colors far away from the base than colors close to the base, even though colors closer to the average tend to produce smaller offsets and are more likely to be legal, so this could be improved by making the tie-breaking function prefer smaller offsets.

In practice though, the differential mode search is not where most of the computation time is spent: Evaluating the actual base colors is.

As with the rest of CVTT's codecs, brute force is still key: The codec is designed to use 8-wide SSE2 16-bit math ops wherever possible to processing 8 blocks at once, but this creates a number of challenges since sorting and list creation are not amenable to vectorization. I solve this by careful insertion of scalar ops, and the entire differential mode part is scalar as well. Fortunately, as stated, the parts that have to be scalar are not major contributors to the encoding time.

You can grab the stand-alone CVTT encoding kernels here: https://github.com/elasota/ConvectionKernels

Cluster fit works by sorting the pixels into an order based on a color axis, and then repeatedly evaluating each possible combination of counts of the number of pixels assigned to each index. It does so by taking the pixels and applying a least-squares fit to produce the endpoint line.

For ETC, this is is simplified in a few ways: The axis is always 1,1,1, so the step of picking a good axis is unnecessary. There is only one base color and the offsets are determined by the table index, so the clustering step would only solve the base color.

Assuming that you know what the offsets for each pixel are, the least squares fit amounts to simply subtracting the offset from each of the input pixels and averaging the result.

For a 4x2 block, there are 165 possible cluster configurations, but it turns out that some of those are redundant, given certain assumptions. The base color is derived from the formula ((color1-offset1)+(color2-offset2)+...)/8, but since the adds are commutative, that's identical to ((color1+color2+...)-(offset1+offset2+...))/8

The first half of that is the total of the colors, which is constant. The second is the total of the offsets.

Fortunately, not all of the possible combinations produce unique offsets. Some of them cancel out, since adding 1 to or subtracting 1 from the count of the offsets that are negatives of each other produces no change. In an example case, the count tuples (5,0,1,2) and (3,2,3,0) are the same, since 5*-L + 0*-S + 1*S + 2*L = 3*-L + 2*-S + 3*S + 0*L.

For most of the tables, this results in only 81 possible offset combinations. For the first table, the large value is divisible by the small value, causing even more cancellations, and only 57 possible offset combinations.

Finally, most of the base colors produced by the offset combinations are not unique after quantization: Differential mode only has 5-bit color resolution, and differential mode only has 4-bit resolution, so after quantization, many of the results get mapped to the same color. Deduplicating them is also inexpensive: If the offsets are checked in ascending order, then once the candidate color progresses past the threshold where the result could map to a specific quantized color, it will never cross back below that threshold, so deduplication only needs to inspect the last appended quantized color.

Together, these reduce the candidate set of base colors to a fairly small number, creating a very optimal search space at low cost.

There are a few circumstances where these assumptions don't hold:

One is when the clamping behavior comes into effect, particularly when a pixel channel's value is near 0 or 255. In that case, this algorithm can't account for the fact that changing the value of the base color would have no effect on some of the offset colors.

One is when the pixels are not of equal importance, such as when using weight-by-alpha, which makes the offset additions non-commutative, but that only invalidates the cancellation part of the algorithm. The color total can be pre-weighted, and the rest of the algorithm would have to rely on working more like cluster fit: Sort the colors along the 1,1,1 line and determine the weights for the pixels in that order, generate all 165 cluster combinations, and compute the weight totals for each one. Sort them into ascending order, and then the rest of the algorithm should work.

One is when dealing with differential mode constraints, since not all base color pairs are legal. There are some cases where a base color pair that is just barely illegal could be made legal by nudging the colors closer together, but in practice, this is rare: Usually, there is already a very similar individual mode color pair, or another differential mode pair that is only slightly worse.

In CVTT, I deal with differential mode by evaluating all of the possibilities and picking the best legal pair. There's a shortcut case when the best base color for both blocks produces a legal differential mode pair, but this is admittedly a bit less than optimal: It picks the first evaluation in the case of a tie when searching for the best, but since blocks are evaluated starting with the largest combined negative offset, it's a bit more likely to pick colors far away from the base than colors close to the base, even though colors closer to the average tend to produce smaller offsets and are more likely to be legal, so this could be improved by making the tie-breaking function prefer smaller offsets.

In practice though, the differential mode search is not where most of the computation time is spent: Evaluating the actual base colors is.

As with the rest of CVTT's codecs, brute force is still key: The codec is designed to use 8-wide SSE2 16-bit math ops wherever possible to processing 8 blocks at once, but this creates a number of challenges since sorting and list creation are not amenable to vectorization. I solve this by careful insertion of scalar ops, and the entire differential mode part is scalar as well. Fortunately, as stated, the parts that have to be scalar are not major contributors to the encoding time.

You can grab the stand-alone CVTT encoding kernels here: https://github.com/elasota/ConvectionKernels

Subscribe to:
Posts (Atom)