The Code Deposit

Saturday, June 8, 2024

A postmortem on failing to adapt FSE to GPU compute

I think failures are educational and worth documenting, so here's the story of how FSE-on-GPU failed.

One of the things I'm working on right now is Gstd, an attempt to make a Zstandard variant that runs on GPU compute.

Zstandard uses a newer entropy coding method called FSE, which is a variant of tabled asymmetric numeral systems (tANS). Using this scheme requires two steps, both of which have challenges in how to implement them on GPU compute: Decoding the probabilities and converting those probabilities into a decode table.

GPUs are designed for highly parallel instruction execution, especially executing the same operation on many different inputs at once (a "vector" of values). Like Brotli-G and GDEFLATE, Gstd uses multiple bitstreams simultaneously to allow values to be parsed from all of them at once.

One of the reasons this type of scheme is so efficient is that GPUs have instructions to compute a running total for each of the inputs, so if you can create a vector containing 0 for lanes that don't need to be refilled and 1 for lanes that do need to be refilled, and do a prefix-sum op, then you'll get a vector where all of the lanes contain the offset from the current read position that they need to be refilled from.
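As a rough illustration of that refill trick (plain Python standing in for a wave-level prefix-sum op, not actual GPU code), here is a minimal sketch. The function name `refill_offsets` and the choice of an exclusive prefix sum are assumptions made for the example.

```python
from itertools import accumulate

def refill_offsets(needs_refill):
    """Exclusive prefix sum: for each lane, how many earlier lanes also refill."""
    inclusive = list(accumulate(needs_refill))
    # Subtracting the lane's own flag turns the inclusive sum into an exclusive one.
    return [total - flag for total, flag in zip(inclusive, needs_refill)]

# Eight lanes; 1 marks a lane that needs to be refilled.
flags = [0, 1, 1, 0, 0, 1, 0, 1]
offsets = refill_offsets(flags)

# Each refilling lane reads from (read position + its offset),
# and the read position then advances by the number of refilling lanes.
refill_slots = [offsets[lane] for lane, flag in enumerate(flags) if flag]
consumed = sum(flags)
```

Here the four refilling lanes end up with offsets 0, 1, 2 and 3, so each one reads a different unit of input without any lane having to wait on another.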

Doing things in parallel this way however requires minimizing the dependency of sequential values. Running totals ARE dependent on preceding values, but they can be done fast because the GPU has an operation to do it.

This creates a problem for decoding FSE because it has multiple steps that are sequentially dependent.

Here's a quick summary of the FSE table decoding and generation algorithm:

Determine how many table slots have yet to be assigned.
Determine the maximum possible value that the next value to read can have - It is equal to the number of remaining slots + 1.
Load enough bits to encode that value.
If the value is small enough (how small is determined by the number of remaining slots), return 1 bit to the bitstream (or, depending on how you phrase it, don't load it in the first place). This step prevents any value from being encoded that would be larger than the number of remaining slots.
Subtract 1 from the decoded number
If the resulting number is -1, then it is a special "less than one" probability with a slot usage of 1. Otherwise the slot usage is the decoded value.
Repeat the preceding steps until there are no slots remaining.
Once all symbols are decoded, position all "less than one" probabilities at the end.
For each symbol, scatter the slots by adding a prime number to the last slot index, and then take that sum modulo the table size. If it overlaps a less-than-one entry, repeat this process until an empty slot is found.
Sort the table positions in ascending order.
Assign each slot a baseline and bit usage. Basically, when a value is decoded, the slot corresponding to a "state" value is read from the table. That slot indicates a number of bits to load and a baseline value that it must be added to.

The example given in the specification looks like this:

state order	0	1	2	3	4
state value	1	39	77	84	122
width	32	32	32	16	16
`Number_of_Bits`	5	5	5	4	4
range number	2	4	6	0	1
`Baseline`	32	64	96	0	16
range	32-63	64-95	96-127	0-15	16-31

Note that the baselines start at zero where bit usage is low and wrap around to the high bit usage ones, which are lower-numbered and come first. The size of the baseline range is always 2^Number_of_Bits, which ensures there is always a slot that will decode to any given state value. Like all ANS variants, these are encoded in the reverse order that they are decoded, so there always needs to be a slot that will decode to a symbol that, combined with the bits, equals the desired state value.
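The table above can be reproduced with a short sketch. This is only an illustration of the baseline rule as described here, assuming a table size of 128; the function name `assign_baselines` is made up for the example.

```python
def assign_baselines(states, table_size):
    """Return {state: (number_of_bits, baseline)} for one symbol's slots."""
    states = sorted(states)
    count = len(states)
    # Smallest power of two that is >= the number of slots.
    rounded = 1
    while rounded < count:
        rounded *= 2
    narrow_bits = (table_size // rounded).bit_length() - 1
    wide_count = rounded - count      # slots that get one extra bit
    wide = states[:wide_count]        # the lowest-numbered states
    narrow = states[wide_count:]
    result = {}
    baseline = 0
    # Baselines start at 0 on the narrow slots, then wrap to the wide ones.
    for group, bits in ((narrow, narrow_bits), (wide, narrow_bits + 1)):
        for state in group:
            result[state] = (bits, baseline)
            baseline += 1 << bits
    return result

table = assign_baselines([1, 39, 77, 84, 122], 128)
```

The ranges add up to exactly the table size (3 x 32 + 2 x 16 = 128), which is what guarantees that every state value has a slot that can reach it.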

There are several serial dependencies here:

The low-bit-usage cutoff of each probability depends on the slot usage remaining after the previous symbol.
The table index depends on how many less-than-one probabilities were skipped by preceding symbols (otherwise it is just a multiple of the slot index in the series).
The bit ranges depend on how many slots of the same symbol precede it in numerical order.

The last one has a few implications. There are basically two ways to do it: Run a loop over the post-placement slots and figure out which occurrence of the symbol each entry is, or run a loop over pre-placement slots for each symbol to sort them. Both are seriously problematic due to low utilization. What if we could just assign the baselines to pre-scatter values instead of post-scatter?

This goes against the theory, but if it didn't hurt the compression ratio enough, it might make it work. A clever idea that turned out to be a big mistake. It's actually pretty nice in terms of implementation possibilities:

First, fill an empty table with zeroes and then pack the baseline, bits, and symbol into a value and put it into the table. On hardware with WaveMatch and equivalent ops, it's possible to create a bitfield containing 0 for lanes with zeroes and 1 for lanes with non-zero values, which can be used in tandem with masking and high-bit search ops to quickly find which lane preceding a given lane has a non-zero value. On hardware without it, it's possible to do it in logarithmic time by packing and transforming the values in a way that their desired order is numerically ascending and repeatedly using WaveMax.
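The bitfield trick can be sketched on the CPU like this. It is a Python stand-in, not shader code: the loop that builds `mask` plays the role of whatever wave op produces the bitfield, and `bit_length()` plays the role of the high-bit search. Treating "preceding" as strictly earlier lanes is an assumption.

```python
def preceding_nonzero_lane(values):
    """For each lane, index of the nearest earlier lane with a non-zero value, or -1."""
    # Bitfield with bit i set when lane i holds a non-zero value.
    mask = 0
    for lane, value in enumerate(values):
        if value != 0:
            mask |= 1 << lane
    result = []
    for lane in range(len(values)):
        below = mask & ((1 << lane) - 1)        # keep only the lanes before this one
        result.append(below.bit_length() - 1)   # highest set bit, or -1 if none
    return result

lanes = [7, 0, 0, 3, 0, 5, 0, 0]
nearest = preceding_nonzero_lane(lanes)
```

Each lane does a constant amount of work once the bitfield exists, which is why this version is attractive compared to walking backwards through the table.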

The baselines are ascending, but were monotonically ascending, so this basically involved negating the bit usage and coding a value that could be subtracted from the slot index and then multiplied by 2^BitUsage to get the actual baseline.

Once that's all done, loop over that table and multiply each index by the prime distribution value and take that value modulo the table size to get the final position. Great!
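A minimal sketch of that final placement step, assuming the step value from Zstandard's symbol spread, (tableSize >> 1) + (tableSize >> 3) + 3, and ignoring the less-than-one entries entirely. The function name is hypothetical.

```python
def scatter_positions(table_size):
    """Final position of each pre-scatter index (table_size must be a power of two)."""
    # The step is odd, so it shares no factor with a power-of-two table size.
    step = (table_size >> 1) + (table_size >> 3) + 3
    return [(index * step) % table_size for index in range(table_size)]

positions = scatter_positions(32)
```

Because the step is coprime with the table size, every index lands on a distinct slot, so each lane can compute its own destination independently with one multiply and one mask.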

So, I tested this out with some sample data from OpenArena. The results were coming out pretty favorably (~15%) ahead of GDEFLATE and changing from post-scatter to pre-scatter was not having a big impact on the compression ratio.

However, FSE is still difficult to work with on GPU: It requires a big table, and that big table has to be indexed from groupshared memory because of how big it is, unless it can be crammed into a register somehow... But there's a lot of data that would have to be crammed into registers!

For 15%, it would be worth it though! Unfortunately, I hit a painful realization while working on the decoder: The encoder was dropping the contents of incompressible blocks. Once that was fixed, the savings were only about 3%.

3% was not great to begin with, but was especially odd because GDEFLATE is not functionally different from Deflate, and Zstandard usually outperforms Deflate by a measurable margin. Upon running tests, there were a good number of blocks where Zstandard was compressing to about 10% larger than GDEFLATE. My best guess was, and still is, that GDEFLATE uses libdeflate as its compressor, and libdeflate is quite good... so, to test this possibility, I needed to make some code that could convert a Deflate block into a Zstandard block!

Now, in order to do that, I need to generate FSE probability tables for the match, literal length, and offset codes, and then rescale them to total a power of 2 as the scheme requires. How do you do that?

Well, normally for an arithmetic or range coder, the cost per occurrence of a symbol is -log(Probability/SumOfProbabilities)/log(2). As probability increases, the bit usage declines rapidly at first and then slower. Through some math, this can be simplified into smaller calculations, but there's one thing that's weird about FSE: The reduction of bit cost seems linear, doesn't it? Maybe the bit usage is just approximately-logarithmic but not actually? Close enough?
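To make the shape of that curve concrete, here is a small sketch of the ideal cost formula and how much each extra count saves. The total of 128 is just an example value.

```python
import math

def bit_cost(count, total):
    """Ideal cost in bits of one occurrence of a symbol with the given count."""
    return -math.log2(count / total)

total = 128
# Bits saved per occurrence by raising a symbol's count from n to n + 1.
savings = [bit_cost(n, total) - bit_cost(n + 1, total) for n in range(1, 8)]
```

Going from 1 to 2 saves a full bit, going from 2 to 3 saves about 0.585 bits, and the savings keep shrinking from there. That shrinking is what a linear cost reduction would fail to reproduce.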

Suppose the successor state for a symbol is uniformly random (and they do behave pretty randomly). Have a look at the table above. If you bump the slot usage for a symbol by 1, the effect is to split one of those 5-bit ranges into two 4-bit ranges, but those 4-bit ranges cover the same range of successor states, so really that fraction of the successor state ranges was reduced by 1 bit in cost. The problem is that all of those 5-bit ranges are the same size, so each time you add 1 slot, it keeps reducing the bits-per-symbol by the same amount, not an amount that's changing according to a logarithm curve. It continues doing this until it hits a power of 2, at which point it has to start splitting 4-bit ranges into 3-bit ranges.

But that means the ideal slot to bump the probability of is the same until it hits a power-of-2 usage, resulting in a table that contains only power-of-2 values except for at most one probability - and all of those powers of 2 have integral bit usage, which is just Huffman.
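The linear behavior is easy to check numerically. This sketch computes the expected bits per symbol under the uniform-successor assumption described above, for a table size of 128. The function name is made up, and weighting each slot by the share of states its range covers is the assumption being tested, not how FSE actually behaves.

```python
def expected_bits_uniform(count, table_size):
    """Expected bits per symbol if the successor state is uniformly random."""
    rounded = 1
    while rounded < count:
        rounded *= 2
    narrow_bits = (table_size // rounded).bit_length() - 1
    wide_count = rounded - count          # slots using narrow_bits + 1
    narrow_count = count - wide_count     # slots using narrow_bits
    # Each slot is weighted by the fraction of successor states in its range.
    wide_share = wide_count * (1 << (narrow_bits + 1)) / table_size
    narrow_share = narrow_count * (1 << narrow_bits) / table_size
    return wide_share * (narrow_bits + 1) + narrow_share * narrow_bits

costs = [expected_bits_uniform(n, 128) for n in range(4, 9)]
```

Between 4 and 8 slots the cost drops by exactly 0.25 bits per added slot (5.0, 4.75, 4.5, 4.25, 4.0), a straight line between the two powers of 2 rather than a logarithmic curve.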

This seemed very problematic so I had to run a few tests, and the answer was that FSE doesn't have this problem because given uniform successor states, FSE will produce predecessor states that are disproportionately lower values. (This is why the less-than-one distribution works in the first place.) It's somewhat easy to see why: Lower-numbered states take up more numeric range.

Doing baseline distribution before scattering breaks that assumption though, and it makes the predecessor states uniformly distributed.

Some further tests confirmed my suspicions: If you use a properly-scaled probability distribution with cell properties determined before scattering, then the result is worse than Huffman. It's only better than Huffman if the cell properties are determined AFTER scattering.

That means that in order to do this right, it was necessary to sort the entries, which I don't think there is any particularly good algorithm for.

But tests showed it was only a small amount of overhead?

Yeah, but Huffman is only a small amount of overhead too. FSE and arithmetic coding make it possible to get fractional bit usage, but the benefit of that depends a lot on the distribution of values. The fewer bits used by the most-probable symbols, the more likely it is to help, because a fractional bit of accuracy is a bigger portion of the values being encoded.

The silver lining

Ultimately, this realization is pretty lethal to using FSE on GPU, since the remaining options for table construction are quite bad, and the table is a problem in the first place.

The good news is that rANS should still work. In fact, the way GDEFLATE decodes Huffman codes is by using binary search within a vector register, and rANS symbol lookup would work the same way.

So, Gstd isn't dead yet (it will be dead if it turns out that the larger blocks are due to format inefficiencies instead of libdeflate being really good), but it will not be using FSE, and I think the work of getting FSE to work on GPU compute was unfortunately wasted.

Saturday, February 24, 2024

Dare to Dream Again - Bringing Obsidian back to life - Part 4 of 4 (Mirror)

This is part 4 of a 4-part series, mirrored from the Gale Force Games page.

Part 1 is here.

Part 2 is here.

Part 3 is here.

Better than it was before. Better. Stronger. Faster.

In the last installment, I discussed the process of getting Obsidian back to a point where it was, at least, no buggier than the original game. That is still not quite the case: there is one bug that doesn't occur in the original version, which is inventory items getting swept across the screen during pan transitions.

Ironically, that bug isn't actually visible on modern machines running the original game because the pan transitions were all set to run at maximum rate, which is as fast as possible. If you don't like the bug, you can always turn off the "Enable short transitions" option in the game options. Today though, we're not here to talk about bugs, we are here to talk about making Obsidian in ScummVM the best Obsidian experience that ever Obsidianed.

The first topic though falls into both categories: MIDI

A cacophony of bugs

Obsidian uses MIDI for music, played back through QuickTime. This seems like an odd choice, given that Myst and most other games were using digital music, and QuickTime's tinny MIDI instruments were very limiting, but based on interviews, this may have been driven by Thomas Dolby's love of technology and MIDI in particular. It sounds like a lot of the rationale was driven by a desire to have a dynamic music system, which didn't quite pan out. Dynamic music is used in exactly two places in the full game: During the church puzzle, and after solving the statue puzzle (which cuts out the piano track).

Before we start, we have to talk about what MIDI is. MIDI is an acronym for Musical Instrument Digital Interface, and it's designed to connect MIDI-compatible controllers (such as keyboards or programmable sequencers) to devices that consume MIDI commands, such as synthesizers, computers, and drum machines.

It was also commonly used in the early DOS days for various reasons related to being able to decode it using low-CPU-usage algorithms, or offloading it to sound hardware where it didn't use CPU at all, combined with its small size on disk, so ScummVM already had ample support for playing back MIDI - Sort of.

When you request a MIDI device from ScummVM, it might be a software backend like FluidSynth, it might be a physical MIDI playback device, it might be anything, but you only get one. This is a problem because Obsidian often has multiple MIDI files playing at once. The vidbot on/off sound for instance is a MIDI sound, and that plays simultaneously with the music. There's also one even more surprising one which I'll get to in a moment.

This is a problem because how MIDI works is that there are several different "channels" each of which responds to a command, and a MIDI controller sends commands assigned to channels. This allows a MIDI device to play multiple instruments at once by assigning instruments to different channels.

Being a protocol designed for real-time use, MIDI commands that play notes also do not have an associated duration; instead, there are separate commands for activating and deactivating a playing note. In turn, the file format used for MIDI in Obsidian, known as SMF (Standard MIDI File), basically encodes commands to send to the MIDI device and what time to send them.

However, this creates a problem: If we stop sending commands from a MIDI file while a note is playing, then we'll never send the command to stop playing the note. When using QuickTime's software synthesizer, this isn't a problem, because it just stops running the software synthesizer for that MIDI source. If we only have one continuously-active MIDI output though, then we can't do that. Fortunately, most notes have finite duration anyway, but sometimes they don't, and for that, Obsidian has a gnarly hack: A MIDI file that sends an "all notes off" command to every channel in response to various commands, but also periodically on a 30-second timer.

In fact, you may notice a moment in the intro where the music pauses dramatically before displaying the title - that pause is at exactly the 30 second mark, presumably timed exactly to prevent that music note-stopper from ruining the theme.

So, the good news is that despite not having a proper multi-source output, the game is still capable of functioning with a single-source output, as long as you're willing to tolerate abrupt cuts in the music every 30 seconds.

There is another problem: MIDI sources have individually-controlled volume, so how do we handle that? Well, the simple way is that when a MIDI note is sent, it is sent with a "velocity" which roughly corresponds to the volume of the note, but really means something like "intensity." It's not exactly a volume scale, because you might get more attack in the sound with higher velocity, for instance, with a higher-quality MIDI renderer. But, it's close enough for what we need, so it was done by intercepting note on and note off commands, modulating the velocity, stuffing the new velocity back into the command, and sending it out.
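A minimal sketch of that velocity interception, assuming messages arrive as (status, data1, data2) byte tuples and the source volume is a 0-1 value. The function name is hypothetical and this is not the ScummVM code.

```python
def scale_note_velocity(message, volume):
    """Scale the velocity of note on/off messages; pass everything else through."""
    status, data1, data2 = message
    kind = status & 0xF0          # upper nibble is the message type
    if kind == 0x90 and data2 > 0:
        # Note on. Keep the result at 1 or more, because a note on
        # with velocity 0 is treated as a note off.
        return (status, data1, max(1, min(127, round(data2 * volume))))
    if kind == 0x80:
        # Note off carries a release velocity, which is scaled the same way.
        return (status, data1, min(127, round(data2 * volume)))
    return message

quiet = scale_note_velocity((0x90, 60, 100), 0.5)
```

One consequence of clamping to 1 is that a source at zero volume still produces very quiet notes; a fuller version would probably drop those notes instead.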

Mixing things up a bit

Just because it works doesn't really mean it's good though, and the music periodically getting chopped is really annoying. But, like I said, I like digging through file formats and standards almost as much as I hate having free time, so I decided to create the secret weapon to solve this problem: The dynamic MIDI mixer. This fun feature is enabled if you enable the "Improved music mixing" option in the game options (it's on by default).

So, what does this actually do? Well, basically we are going to implement multiple MIDI drivers that funnel into a single MIDI driver in a more intelligent way. First, we have a way of tracking the full state of every single MIDI channel:

We track this for every channel of the output device and also every channel of each MIDI source.

Each MIDI source in the game is assigned to MidiCombinerSource, which is a thing that looks like a MIDI driver to it, but in the dynamic music mixer, is a funnel into the combiner.

Aside from starting and stopping notes though, there are several other types of MIDI messages: There are controllers, which can alter the qualities of a channel in some way, there's the program (which affects what instrument to play), and also two features called "sustain" and "sostenuto." So far, ScummVM doesn't support any games known to use sostenuto, but I wanted to get this right the first time. Sustain and sostenuto are normally controlled by pedals, and if the pedal is triggered while active, then any note playing continues playing after the "note off" command until the pedal is released, so to track an active note, we need to track whether or not it is sustained as well.

Internally, in order for anything to happen, a MIDI source channel has to be dynamically assigned to an available "physical" channel on the output device. The way that works is: All channels are unassigned by default. If a controller change happens, and the channel is unassigned, then it updates the MidiChannelState of the source channel, but otherwise does nothing.

When a note is played on an unassigned channel, the combiner tries to find a channel that is the right channel type (i.e. one channel is typically reserved for percussion), and otherwise stopped playing its last note the longest time ago. If there is still an active source assigned to that channel, then that source channel is unassigned. Then, the new source channel is assigned to that output channel, the channel state of the output channel and the source's channel are compared, and anything that is different is adjusted by sending the necessary MIDI commands to do so, and then the note is played. If a source channel is still actively assigned to an output channel, then control commands are sent to the output immediately.
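The allocation policy described above can be sketched roughly as follows. All of the names here (`OutputChannel`, `assign_source`, `state_diff`) and the dictionary representation of channel state are assumptions made for illustration; the real combiner tracks far more state than this.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class OutputChannel:
    index: int
    is_percussion: bool
    last_note_off_time: float = 0.0
    source: Optional[str] = None                           # id of the assigned source channel
    state: Dict[int, int] = field(default_factory=dict)    # controller number -> value

def assign_source(channels, source_id, want_percussion):
    """Pick the matching-type channel that went quiet longest ago and take it over."""
    candidates = [c for c in channels if c.is_percussion == want_percussion]
    chosen = min(candidates, key=lambda c: c.last_note_off_time)
    evicted = chosen.source        # previous owner, if any, becomes unassigned
    chosen.source = source_id
    return chosen, evicted

def state_diff(output_state, source_state):
    """Controller changes needed to make the output channel match the source."""
    return [(cc, value) for cc, value in sorted(source_state.items())
            if output_state.get(cc) != value]

channels = [
    OutputChannel(0, False, last_note_off_time=12.0, source="music:0"),
    OutputChannel(1, False, last_note_off_time=3.0, source="vidbot:0", state={7: 100}),
    OutputChannel(9, True, last_note_off_time=1.0),
]
chosen, evicted = assign_source(channels, "music:1", want_percussion=False)
commands = state_diff(chosen.state, {7: 64, 10: 32})
```

In this example the vidbot's channel is the one that has been quiet the longest, so it gets reassigned, and only the controllers that differ are sent before the note is played.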

Because of this, the "all notes off" trigger actually doesn't do anything any more. It updates the MIDI combiner source's channel state, but since nothing in that MIDI file ever plays a note, none of it ever goes to the output driver.

We can also use the MIDI gain control, which is an actual volume control, instead of having to modulate velocity, and since we know which output channels are being used by a source, we can completely quiet a source by just sending an "all sound off" command (which bypasses sustain) to silence the channel, and then deallocate it.

La la la, I can't hear you!

Well that's unfortunate. Individual sounds in Obsidian can have their own volume level, which is modulated by the global volume level, so setting the volume to the maximum doesn't necessarily mean that the MIDI source is set to the maximum possible volume. As mentioned earlier, the vidbot sounds are MIDI, and by default their volume is set to 50%, but they do seem significantly quieter when using gain control vs. modulating the velocity. What's going on here?

After a bunch of trawling around, it turns out the answer is in the General MIDI Level 2 specification at page 7: "gain in dB = 40 * log₁₀(cc7/127)"

... What does that mean? Unfortunately, I'm actually finding out that I did the fix wrong as I'm writing this, and need to fix it again! "cc7" is the MIDI channel volume value. The volume scale on MIDI sources is a linear scale, meaning a value half as large causes the amplitude to be halved. Decibels (dB) are a logarithmic scale, meaning any 2 values the same distance apart have the same proportional magnitude. One problem with this though is figuring out what we're measuring. Many measurements in electrical and sound engineering are the square or square root of other measurements due to various physics interactions. Decibels are a scale designed around factor-of-10 changes, but whether a 10x change is +10 or +20 depends on what quantity is being measured, due to that problem of quantities being squared.

What we want is the measure that the volume is supposed to be scaling, which is amplitude, which is on the +20 = 10x scale.

So converting normalized MIDI volume (a.k.a. the volume rescaled to a 0-1 scale) to modulation involves squaring it, so we need to figure out how to convert scaled modulation back into a new normalized volume.

... okay, great, so basically all we have to do to compute the new MIDI volume is scale the MIDI source volume to a 0-1 scale, and then multiply the original MIDI volume by the square root of that value. Cool. This means our vidbots playing at 50% volume are now a 0.7071x multiplier instead of 0.5. Unfortunately, the original implementation of the decibel scale was wrong and was using a 4th-root, which I guess is better than it being too quiet, but still wrong.
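The arithmetic can be checked with a few lines. This sketch uses the gain formula quoted above and the +20 dB = 10x amplitude convention; the function names are made up for the example.

```python
import math

def gain_db(cc7):
    """Gain formula quoted from the General MIDI Level 2 spec (cc7 must be > 0)."""
    return 40.0 * math.log10(cc7 / 127.0)

def amplitude(cc7):
    """Amplitude multiplier on the +20 dB = 10x scale; works out to (cc7 / 127) ** 2."""
    return 10.0 ** (gain_db(cc7) / 20.0)

def scaled_channel_volume(cc7, source_volume):
    """Channel volume that scales the amplitude linearly by source_volume (0-1)."""
    return cc7 * math.sqrt(source_volume)

original = 100
adjusted = scaled_channel_volume(original, 0.5)
ratio = amplitude(adjusted) / amplitude(original)
```

With a source volume of 0.5, the channel volume is multiplied by about 0.7071 and the resulting amplitude ratio comes out to 0.5, which is the linear scaling we were after.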

Anyway, it's fixed-er now!

I've got 32 problems and my bit depth isn't one

Most monitors today try to run games in 32-bit 8-bits-per-channel (and 8 bits of waste to help with memory alignment) color mode, or sometimes HDR if the game supports it. Attempting to run Obsidian in 32-bit mode will greet you with this error though:

Obsidian was released at a time when 16-bit color depth (or "thousands of colors" as it was called on Mac) was fairly new. Most things ran in 8-bit depth with a color lookup table, so 16-bit was fairly cutting edge.

All of Obsidian's images are 16-bit images though, so we're not really losing anything by running with a 16-bit render target, are we? Actually, we are. The images are 16-bit. The videos are not. The videos are encoded with Cinepak, which in full-color mode can render out to 8 bits per channel, and even if the input images to Cinepak were in 16-bit color, the averaging out that occurs during the encoding process (and from the YUV-to-RGB perceptual transform) produces output with more accuracy per color channel. So, we want to render in 32-bit.

That's accomplished by having an override that just lies to the scripts about what color depth the game is being run at.

A more cinematic experience

When Obsidian came out, most desktop monitors were 640x480, which is a 4:3 aspect ratio. Most displays today are 16:9 widescreen. However, the images and videos in Obsidian are all 640x360, also 16:9. Well that's pretty neat. It would be cool to play the game in widescreen and get rid of the letterboxing on both sides of the monitor, wouldn't it?

Okay, let's just offset the game frame up 60 pixels, cut down the resolution, and lie to the game about the resolution it's running at so it doesn't throw a startup error! That's a great start, but what about the inventory items that display below the frame?

There's an internal system for overriding object behaviors, which can somewhat handle this. It's more complicated than it could be, because with how Obsidian is broken up, sometimes you carry the item into different sections and subsections, which means the elements that display them are duplicated.

Fortunately, the inventory items are color-keyed already even though they're on a black background, so I didn't need to do anything to make them not render a black box outline, but they did run into a problem with the security survey in the maze.

Uh oh, the keycards are overlapping the survey, and I really need to let him know that I eat my ice cream straight from a bowl! Well I guess we could just change the layer order so the form's on top of the cards, right? Actually, no, because the security form image isn't just the security form, it also includes a bit to the left.

Ultimately, this was resolved by detecting this specific situation. When the security form is displayed, the cards are moved off-screen, and moved back when the security form is dismissed. Now I can eat my ice cream in peace.

There was one last problem in widescreen mode:

... the Rocket Science Games logo is too big! Now you might be thinking, "so what, that company hasn't existed for 25 years, who caaaaaares!?" Unfortunately, I care, so we need to fix this! But what can we do? Shrink it? But then we get letterboxing on the sides again, and that doesn't look nice! What we need, which you may have seen if you watch old TV shows re-broadcast on widescreen, is an anamorphic filter.

An anamorphic filter works by stretching out the sides of the image more than the center areas. This was done by computing an exponential curve that has a derivative of 0 at the point where it's supposed to stop (meaning, basically, the rate of pixel coordinate change becomes normal where the curve tops, preventing a noticeable seam), and applying that to the pixel grid. Here's what the filter applied to an 8x8 grid pattern looks like:

And here's the filter applied to the logo video:

Much better!

Forgetting to save is half the adventure

While practically a foreign concept these days, checkpoints weren't always a thing. Like most games of its time, if you wanted to keep your progress, you had to save the game. The game even reminds you to save when you try to quit! Having auto-save would be really nice though, wouldn't it?

The auto-save feature works partly on a timer, like most ScummVM games, but there's an option (enabled by default) to also auto-save at progress points. Most puzzle solutions in Obsidian set some variable, but you aren't always allowed to save, so this creates a bit of a problem: Finding how to detect puzzle solutions, and then finding a safe place to save.

Auto-save detection is done in two ways: One way is by detecting arrival in a specific scene while coming from a specific other scene, which is used to detect things like chapter transitions, but also things like beating the maze without the proper document. The other way is detecting arrival in a specific scene with a puzzle completion variable set differently than what it was the last time the game was loaded or restarted. Normally, the latter category is done by triggering in the scene you would wind up in after the puzzle.

There is a minor omission in this scheme: If you complete a puzzle, save the game, then reload it, the autosave won't trigger... but in that case, you've pretty much saved right there anyway, so who cares?

I'm playing a game, not a menu!

ScummVM shows screenshots of the game where you saved it in its save game UI, but there's one problem: If you save the game from the in-game UI, then you're not looking at the game, you're looking at a menu. So, some hooks had to be added to take a screenshot before transitioning to the menu and using that screenshot for the save instead.

Some light reading

People seem to like subtitles, so why not add them too? Well, they were added, and you can download them from ScummVM's add-ons page.

Subtitles mostly work by detecting when a specific sound asset is activated and popping up the subtitles. There is however an option for a very small number of subtitles:

Without spoiling too much, there's one puzzle that involves sound, but popping up subtitles for the sound at all kind of gives away the answer, and it's one of the neatest puzzles in the game. So, depending on what your rationale is for enabling subtitles, you can keep this option off if you want to have subtitles, and can hear sound, but don't want to spoil the challenge.

Overall, there are 686 voice lines in the game. Deciding when to split a subtitle up, and where to split the lines, was an ongoing challenge that I lack the expertise to do, but I did the best I could.

In some cases, this involved getting help for figuring out lines I couldn't make out. I'd never actually heard the term "lousing up" before in my life. Also, I added speaker names to the subtitles, but in some cases, we don't really know the names of these characters and just have to guess.

The identity of the characters also clearly has some story implications, but without knowing for sure who they are, that has to be danced around, a problem made more difficult by Obsidian's casting. While some of the characters are professional actors, many of them (especially the vidbots) are Rocket Science employees.

The bureau chief, for instance, is almost certainly Howard Cushnir, who also appears in a celebration video, and I think is also a character in a chapter intro cinematic, but it's not clear if that's because it's the same character, the same person playing multiple characters, or if I'm mistaken and it's not even the same person. Adding to this, the brief appearance in a cinematic is as Max's teacher, but he appears to be the project administrator in the journal. (At least, that seems to be the implication - that his appearance as the chief in the bureau realm is a reflection of his bureaucratic authority status in the real world.) It would be odd for that to be the same person, then, but maybe he's a graduate professor at a research university and it is the same character? There's no way to tell.

Another amusing case was the eye test in the bureau maze. The voiceover there drops to a soft, barely audible level, and it's supposed to be a gag that it's unintelligible, so he tells you to go to the hearing test booth (which sends you back for an eye test). The problem is that it is intelligible if you isolate the sound and turn the volume up, so the intent is that it's unintelligible, but there are actual coherent words in the line. So, should the line show the words, or something like "<Unintelligible>" to keep the gag? I kind of split the difference.

Eventually, it's time to move on

Doing this involved a lot of testing. In many cases, I found things in the logic that looked like they caused bugs, confirmed that they caused bugs, and verified that the same bugs occurred in the original game. That type of bug is much harder to fix, and ultimately only one of them was (a progression blocker that occurs if you save before logging into the journal).

The inventory panning bug is the only bug left that hasn't been fixed, and due to how the widescreen mod works, giving it a proper fix is difficult. In non-widescreen mode, the fix is to only pan the part of the screen with main-scene elements.

In the end though, the two most important lessons I've learned in life are that shipping beats perfection, and motivation is a finite resource. Working on this was always a race against burnout, and eventually it started reaching the point where it was hard to justify any further work vs. moving on to other things. Eventually it's time to say "it's good, and that's good enough," stick a fork in it, and move on to other things.

The future of the mTropolis engine in ScummVM

mTropolis was used to ship a few dozen titles, and a few of them have been on the list as possible additions: Muppet Treasure Island, S.P.Q.R. - The Empire's Darkest Hour, Star Trek: The Game Show, and MindGym. The first one is done and was added in ScummVM in the 2.8.0 release, the next two I have copies of. S.P.Q.R. is the next one on the to-do.

However, as mentioned above, motivation is a finite resource, and unlike some members of the team, I don't really approach things with nostalgia, or as a historian that thinks it's not their place to judge the quality of things from the past. My goal is to save valuable things from oblivion, and the further down the to-do list I go, the more questionable that "value" gets, in my opinion. So, we'll see what happens, and when it happens, but that's the plan for now.

Jank beyond the dream worlds

It wouldn't be a good rant if I didn't leave you with some tales from other mTropolis games, would it? Muppet Treasure Island has numerous hacks to deal with duplicated aliased compound variables needing to be linked up in a way that I still haven't made any sense of, and music doesn't work in S.P.Q.R. right now because it depends on sending messages to objects in a scene that is never loaded.

Also, I finally figured out what those extra 8 bytes in the catalog header are: Eventually mFactory realized that it was a problem to have separate formats for Mac and Windows, so they decided to make a cross-platform format. Does the cross-platform format work by being in a common format that works on both platforms? Of course not. It exports the Mac and Windows versions into the same file, and de-duplicates the asset data.

That's all for now. See you in the future, somewhere.

Fin

Dare to Dream Again - Bringing Obsidian back to life - Part 3 of 4 (Mirror)

This is part 3 of a 4-part series, mirrored from the Gale Force Games page.

Part 1 is here.

Part 2 is here.

Part 4 is here.

A quick foreword on "un-exporting" projects

The only other engine in ScummVM that really resembles mTropolis in terms of the development style and challenges is Director, and ScummVM's Director engine is able to take advantage of a tool called ProjectorRays that converts exported and protected Director movies back into a form that Director itself can load, including decompiling Lingo bytecode into source code.

Wouldn't something like that for mTropolis be useful too? Unfortunately, no, for multiple reasons:

Running things in mTropolis is buggy. If you change the state of a scene object while running the game, the state can persist into the editor. The documentation even tells you to do this for list variables because there is no way to edit their contents in the editor for some reason. Scenes may be loaded when playing in the editor when they wouldn't be loaded in-game.
Setting up test cases to determine its behavior doesn't usually require using the actual title.
Plug-ins have editor-only files that aren't distributed with titles. For instance, here's the Resource directory of the demo:

The files ending with ".ePP" provide editor functionality; only the ".rPP" and ".cPP" files are distributed with the built title. That means if a title is using a non-standard plug-in, the editor won't be able to use it, because the ".ePP" file for it is missing.
mTropolis has a lot of non-standard plug-ins. Remember what I said earlier about it seeming like mFactory was using a business-to-business model? A lot of plug-ins have shown up in retail titles that seem like pre-release or customized versions of the standard ones. There is very little consistency.
I made a debugger that is way more informative than the mTropolis editor anyway, which we'll get to very soon!

We can rebuild it. We have the technology.

In the last installment, I covered how MTDisasm came into being and ultimately was able to fully dump Obsidian's game data. At that point, making it work again was clearly an achievable goal, it was just going to take a lot of work to do it.

My initial plan was to convert MTDisasm into a library and use that as the loader for a new program called MTEmu (mUlator was a bit too on-the-nose). Unfortunately, while I had done a bunch of OS interface stuff for my Glider PRO port, very little of it except for the resource loader was going to be useful for Obsidian. Even the StuffIt unpack tool, ported from The Unarchiver, wasn't going to be too useful - the installer format used a different algorithm, so I'd have to port that too. I was basically going to be starting from 2D drawing boilerplate, and writing boilerplate sucks.

Trying to add it to ScummVM was somewhat under consideration, and it already had things I needed: QuickTime parsing and decoding, PE-COFF resource parsing (needed to get cursors out of DLLs), MIDI output, and all of the OS boilerplate. My biggest reservation though was the license. I am not a GPL fan, everything I do is MIT/Apache licensed if I can help it because I think reaching more platforms is more important than contributor reciprocation.

I ultimately decided to try doing it as a ScummVM addition anyway for three reasons:

It was already in-demand for being added to ScummVM, and doing it as a separate project was probably going to result in it being brought in anyway with a bunch of duplicated work.
I didn't think my reservations about the licensing were going to matter because I didn't think anyone was actually going to untangle the IP rights mess needed to put it back into print. Surprise!
I figured it was probably technically possible to do in some kind of arms-length way that the mTropolis player code could be easily hoisted out if I wanted to do it later. It was, in fact, coded that way.

Obsidian on console when?

I want to keep the possibility open, but bringing it to console would mean re-implementing the things mentioned above (basically going the MTEmu route) and paying for certification. It wouldn't be cheap or easy, and given the level of interest in it, I think it would be difficult to justify.

I didn't think it would be re-released in the first place though, so who knows? If the people with the rights to it want ink on a dotted line, they know where to find me.

Foundational work and design

ScummVM has a "create_engine" tool now, but at the time the mTropolis work was started, the recommendation was to clone the engine used for acclaimed 1993 Game of the Year Plumbers Don't Wear Ties, so that was used to get a simple foundation. One immediate problem was that the initial plan of bringing MTDisasm in as a library was going to be a no-no with the project's code standards (which, among other things, prefer using the project's own integer types instead of stdint.h).

Because of this, the MTDisasm data object loaders were mostly brought in by hand. One thing I knew I wanted to do from the start was separate the loading of object data from instantiation of the objects, partly for cleanliness (though at the cost of duplication), partly to deal with disconnects between how data was stored on disk vs. how it should exist in a loaded object.

One of the first technical challenges was planning for linking up object references, which has 2 problematic aspects: The first problem was that if objects are loaded in some order, then it's possible that an object references another object that hasn't been loaded yet, so object link-up has to be its own phase. The second, and much bigger problem, was mTropolis's system of "aliases."

As briefly mentioned in an earlier part, a modifier in mTropolis may be converted into an "alias" that allows the modifier to be reused in multiple different objects. This was especially important in tandem with "behaviors," which are modifiers containing other modifiers. One thing that might not be apparent from that though is that the behaviors have to be cloned when they are brought into the scene as aliases. If a behavior includes a variable inside of the behavior, for instance, then each instance of that behavior has its own copy of the variable - and other things inside of the behavior reference that variable!

On top of that, if a variable is converted to an alias, then the aliased variable has another special behavior: Changes to any instance of the alias apply to all of them, making it effectively a variable reference.

Aliased variables were actually implemented via an incorrect assumption for quite a while. Initially, they were implemented via a somewhat complicated approach where the aliased variable modifier existed globally, but references to it could be put in multiple parts of the project. It later turned out that aliased variable modifiers are actually distinct objects - If you add them to the scene twice, each one has a different GUID. The engine was eventually changed to handle this correctly, cloning variable modifiers and making them reference a shared storage object instead.

As a legacy of that though, the mTropolis engine's loader has several steps to making an object exist:

The definition of an object is loaded from disk as a DataObject.
An object corresponding to the DataObject's type (which may be an alias!) is created.
The object is initialized by loading the data in the DataObject.
The object is, at some point, added to the scene.
After some operation that adds objects to the scene, the object is "materialized."

"Materializing" an object does 3 things:

Assigns it a new runtime GUID
Replaces any alias modifiers with clones of the global object referenced by the alias.
Resolves any references inside of the object. In some cases, this means references with static GUIDs are resolved up the scene hierarchy. In other cases, it means that it detects that a reference is to an object that was cloned, and the reference is replaced with a reference to the clone.
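
To make the three materialization steps a bit more concrete, here's a minimal Python sketch. It is not the ScummVM code: SceneObject, AliasModifier, the palette dictionary, and the GUID counter are all hypothetical names, and it assumes references are resolved by searching the innermost scope first and then walking outward. It also leaves out the shared storage that aliased variables use.

```python
import copy
import itertools
from dataclasses import dataclass, field
from typing import Optional

_runtime_guids = itertools.count(0x1000)  # hypothetical runtime GUID allocator


@dataclass
class AliasModifier:
    alias_id: int  # key into the global alias palette


@dataclass
class SceneObject:
    name: str
    static_guid: int                            # GUID as stored on disk
    ref_guid: Optional[int] = None              # static GUID this object refers to, if any
    children: list = field(default_factory=list)
    runtime_guid: Optional[int] = None          # assigned by materialize()
    ref_target: Optional["SceneObject"] = None  # filled in by materialize()


def resolve(guid, scopes):
    """Search scopes from innermost to outermost for a matching static GUID."""
    for scope in scopes:
        for candidate in scope:
            if candidate.static_guid == guid:
                return candidate
    return None


def materialize(obj, alias_palette, scopes=()):
    # 1. Assign a new runtime GUID.
    obj.runtime_guid = next(_runtime_guids)
    # 2. Replace alias placeholders with private clones of the palette entry.
    obj.children = [
        copy.deepcopy(alias_palette[c.alias_id]) if isinstance(c, AliasModifier) else c
        for c in obj.children
    ]
    # 3. Resolve references. The clone's own siblings are searched first, so a
    #    reference inside a cloned behavior lands on the clone's copy.
    scopes = (obj.children,) + scopes
    for child in obj.children:
        if child.ref_guid is not None:
            child.ref_target = resolve(child.ref_guid, scopes)
        materialize(child, alias_palette, scopes)


# A behavior holding a variable and a script that references that variable.
palette = {0: SceneObject("Counter", 10, children=[
    SceneObject("count", 1),
    SceneObject("increment", 2, ref_guid=1),
])}
element = SceneObject("Button", 100, children=[AliasModifier(0), AliasModifier(0)])
materialize(element, palette)
```

After this runs, each of the two cloned behaviors has its own "count" variable, and each "increment" script points at the copy inside its own clone rather than at the palette original or the other clone.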

Even now, it's not entirely clear how object references are supposed to work though. Muppet Treasure Island, another mTropolis title, had numerous problems with variables having duplicate names but different contents and behaving in a way where the correct behavior must have involved resolving the references to a different one than what the GUID pointed to, but in other cases, the one pointed to by the GUID was the correct one.

Investing in the development experience

Dealing with a complex thing like this is really hard if you can't actively see what's happening. The first line of defense is logging messages. Since mTropolis depends heavily on message-passing for logic, it's really important to be able to see what messages are being sent and where they are going. If you run the game with debug level 3 or higher (e.g. by putting "-d 3" in the command line), then ScummVM will log all message propagation to the console.

This is extremely important, but printing things to the console doesn't provide a lot of information about the state of the scene. To help with a lot of development problems at once, one feature that went in very early was the debug overlay. It was so high-priority that its position in the to-do list was right after being able to reach the first screen.

If you've paid attention while adding Obsidian to ScummVM (or if you go to the options by hitting Ctrl-F5 in the ZOOM/Steam version to exit to the launcher...), you may notice that there's an option marked "Start with debugger."

This launches the game with a debug overlay and a few buttons on the side, and a display that shows you the name of the active scene and active shared scene.

Unfortunately, the step-through debugger part was planned (and much of the internal architecture built around it), but never actually came to fruition because it turned out to not be the right tool for debugging most of the logic bugs that popped up.

Step-through debugging is useful if you want to be able to analyze the state of a program while it's in the middle of running scripts, for example, but most problems with game logic working properly were not due to scripts executing incorrectly, they were due to problems with messaging - sending messages in the wrong order, sending messages that weren't supposed to be sent, not sending messages that were supposed to be sent, and so on.

Debugging message problems, on the other hand, didn't benefit a lot from having a step-through debugger. It mostly depended on looking at the disassembled scene to figure out how it was supposed to work, comparing that with the message log to figure out what it was actually doing instead, and then setting up test scenarios in mTropolis to see what messages actually get sent - and in what order - in a similar situation.

One example of how messaging went wrong is having to figure out the exact point where queued messages were discharged during a scene transition, something that was causing the wrong music to play in the Statue lower level.

The project viewer and inspector, however, were incredibly helpful:

These let you see all kinds of information in real-time: What is loaded, what the values of variables are set to, what the GUIDs of objects in the scene are (which allows them to be cross-referenced with MTDisasm output), and any other information that's been exposed for that object.

There are also some toast notifications that pop up at the bottom:

Warnings are colored yellow and errors are colored red. You won't see many warnings these days, but the main source of warnings is that every modifier and element in the ScummVM mTropolis engine has a "support level," which is either unimplemented, partially-finished, or finished.

Entering a scene with a partially-finished or unimplemented element or modifier results in a warning notification. This is to make it clear that a scene with things in an unfinished state has been entered, and if something isn't working correctly, one of those unfinished things is likely to be the culprit!
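
The support-level idea is simple enough to sketch in a few lines of Python. This is only an illustration: the enum, the function name, and the choice to warn once per object type are assumptions, not the engine's actual code.

```python
from enum import Enum


class SupportLevel(Enum):
    UNIMPLEMENTED = 0
    PARTIAL = 1
    DONE = 2


def scene_entry_warnings(objects):
    """objects: iterable of (type_name, SupportLevel) pairs found in the scene.

    Returns one warning string per unfinished type, in first-seen order.
    Warning once per type is an assumption made for this sketch.
    """
    warnings = []
    seen = set()
    for type_name, level in objects:
        if level is SupportLevel.DONE or type_name in seen:
            continue
        seen.add(type_name)
        warnings.append(f"{type_name}: {level.name.lower()}")
    return warnings


scene = [
    ("Path Motion Modifier", SupportLevel.UNIMPLEMENTED),
    ("Text Label", SupportLevel.PARTIAL),
    ("Text Label", SupportLevel.PARTIAL),
    ("Graphic Element", SupportLevel.DONE),
]
toasts = scene_entry_warnings(scene)
```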

(You may be wondering why there's a warning about text labels in a scene with no text labels. That's because Obsidian has its own debug overlay text label, but in the retail version, it's moved off-screen.)

Error popups mostly occur due to Miniscript errors. If you play Obsidian, you'll notice several of these. That's not actually a problem with ScummVM - the game has some scripts that do invalid things, and in all of the cases that I'm aware of, I've confirmed that the proper behavior is in fact to throw an error and stop the script.

Making this also ran into a bit of a problem with ScummVM's internal architecture. It has its own GUI system, which you can see in the in-game settings and launcher. It's not a great GUI architecture, but it's not really bad for what it does either. It does have one problem though: It assumes exclusive control until you leave it, which means it is completely unusable while the game is running. Dealing with that required rolling a new small UI kit to make the windows, scroll bars, hierarchy tree, etc. that you see in the debug overlay.

Having that UI kit also required everything else to be aware of it. For example, if the game is supposed to change your mouse cursor, it actually doesn't, it changes the cursor assignment of the main game "window" so that the mouse is visible, and not the game cursor, when it's over one of the debug overlays. Same goes for detecting mouse movement if the mouse isn't in the game window.

Of course, the big irony of all of this is that these debugging capabilities are considerably more advanced and informative than what mTropolis gives you, so I actually had way more information available to me than the developers of the game did!

I know your deepest, darkest secrets

One fun thing about being able to see everything in the game data is finding things that were obscure or, in one case, really not meant to be found at all, and learning all kinds of things about the game that were either not well-known or not known at all.

Realm names

The internal name of the bureau realm is "Labyrinth" and the name of the spider realm is "Abraxas."

Space bar skip

It turns out pressing the space bar skips most cinematics, except for important story cinematics, letting you get through the game much faster. Huh. Doing this causes some bugs sometimes though, like preventing the music from stopping when you beat the Bureau chapter.

I heard you like FMV games so I put an FMV game in your FMV game

You may have noticed that there is a help booth in the first part of the Bureau named "Sources" that shows a falling book, referencing the Myst intro, and if you click the screen, you get a crazed man telling you to bring him "the blue pages," another Myst reference. What you may not know is that all of the booths are reachable from the phone puzzle, and if you call the Sources booth, the phone is answered by Henry Stauf from The 7th Guest.

Speaking of the phone puzzle, I've looked at the code for it, and I still don't understand it. Each of the dials actually represents a positive or negative coordinate, and a coordinate is only valid if either all three are negative or positive (determined by which half of the slider it's on), but that means there are actually 2 valid coordinates for any point you want to reach.

Fireflies in the Bismuth junkyard

There are supposed to be fireflies in the Bismuth junkyard, but you probably won't see them on a normal playthrough because when you first land in it, you're actually in a duplicate of it that belongs to the previous chapter, and the duplicate doesn't have the fireflies. Normally, leaving that landing scene triggers a disk change, which puts you in the actual section. Try going back down and looking around!

The dirtiest dirty secret of them all

Not only was this one completely unknown previously, it was actually discovered by accident due to the unfinished modifier toast popup mentioned earlier. Viewing the zoetrope popped up a warning about an unimplemented "Path Motion Modifier," which was a bit strange because the wobbling frame was working just fine and nothing seemed to be unusual, let alone having problems because of a missing modifier that basically existed to move sprites around.

Looking at the scene in the debugger showed some interesting things.

The path motion modifier responds to a message named B206_Start_Drop. Drop what though? Further down there is a "Click Behavior" but it's actually not a click behavior, it has 5 key detection triggers and a counter that fires when all 5 are reached.

After looking through the logic, it turns out that if you type "Voxel" after beating the puzzle, then a head appears and the animated bird drops a turd on it.

To make matters worse, the path motion modifier is not used for anything else in the game except for this, which meant I had to implement its behavior just for this bird poop Easter egg that nobody even knew about.

You thought nobody would ever find out...

Spaghetti no-code

No game can survive without reusable systems, but making systems out of objects in a scene hierarchy is particularly weird.

Much of Obsidian's navigation system is handled by a list of "nodes," and you're usually on one of those nodes. The node list is usually supplied by a compound variable that has to be named "cSR" in the subsection.

This is a fairly common recurring pattern: Behaviors are placed in the hierarchy alongside variables that the behaviors have to consume.

A pretty gnarly piece of "what were they thinking?" here though is how those values get in the list in the first place. The editor lets you set the initial value of variables, so I bet you're thinking that you can edit lists in the editor right? Well you would be wrong, the actual way is that you have a script set the values that you want, run your project, and then when you go back into the editor, the values that you set while running it have persisted into the value. Intentionally leaking play state back into editor-persistent state as intended design? Whee!

Some other mTropolis users would later come up with their own patterns, like Muppet Treasure Island populating lists by broadcasting a message to part of the scene hierarchy and having all of the responders ping back. Why this was better than just giving the scripting language a "for" loop, I have no idea.

You're probably wondering then, if coding was so cumbersome in this, how did the library terminals work? Well, for one thing, the terminals are done via a combination of several different parts that handle specific functions, from text output to actually updating things. It's basically a finite state machine, so most of the logic behavior simply handles all possible states.

The game also includes a custom plug-in called "RSGKit" that includes a few modifiers used to do more complex tasks, like string manipulation (which Miniscript has no built-in support for) and creating WordMixer answers.

Speaking of WordMixer, the dictionary data is also completely stored in the RSGKit plug-in. The ScummVM mTropolis engine has its own internal plug-in system to implement these modifiers, including parsing out the dictionary data from hard-coded offsets in the DLLs, and the dictionaries are used for both WordMixer and the filing cabinets. Naturally, getting the word list exactly correct (including short words) is mandatory, since some important files are referenced by number in the scripts.

Trouble is on the menu

Sometimes, the way the game and its logic system work create problems that clash with how ScummVM is supposed to work in ways that are really hard to do anything about. A big one is the menus.

If you go to the game options in ScummVM and change the sound and music volume, you might notice that it... doesn't really work.

That's because the game logic manages the sound levels, and it kind of has to because there is no global volume, every sound emitter has its own volume and the game logic has to update them itself.

You might be thinking "well so what, just scale the volume of sounds with the SFX level and the MIDI music with the music level!" That's easier said than done. The volume levels are stored in the save files (by the game logic), and syncing them up with ScummVM's settings requires manually hooking into the game logic. Additionally, the sound volume system used in Obsidian was designed to evaluate volume levels when entering a scene - not while you're already in the scene! But one quirk you may not be expecting is that MIDI was used for some non-music sounds, like the keycard sound in the maze, and (surprise!) the vidbot screen on/off sound. (Remember that last one for the next installment!)

Similar problems come up when trying to load from the in-game menu:

Simply put, the game expects loads to happen at a certain time (i.e. only from the menu), and follow through on them a certain way. Saving a game in mTropolis is not done by the engine tracking everything that needs to be saved, it's done by the game logic, which has its own ideas about what needs to be saved and loaded.

Fortunately, I was able to implement saving from the menus for Obsidian specifically, since whether or not you can press the Esc button to go to the menu is controlled by a single boolean variable, and you can save from anywhere that the menu is available.

Even supporting a menu is more complicated than it may seem. The game has to be able to support transitioning out of and then back into just about any scene, which is a problem if there is game state from puzzles persisting through scenes. That's probably why the piazza puzzle has a blocker that prevents you from saving anywhere in the main puzzle area, and there is at least one bug where reloading the game in a particular scene skips the puzzle (which is not a ScummVM bug - it was present in the retail version).

Room to grow

I think this has covered most of the process involved in getting the ScummVM mTropolis engine into a working state. The next and final installment of this series will cover how it went beyond the original, with better MIDI support, widescreen mode, subtitles, improved color depth, and a tiny little quirk to get the save screenshots right! It'll also have some final thoughts about wrapping this project up, and the future of the mTropolis engine in ScummVM.

Dare to Dream Again - Bringing Obsidian back to life - Part 2 of 4 (Mirror)

This is part 2 of a 4-part series, mirrored from the Gale Force Games page.

Part 1 is here.

Part 3 is here.

Part 4 is here.

Everyone needs some structure

In the last part, I went over the basics of analyzing values in an exported project, but this doesn't explain how to get actual data out of the scene. To skip ahead a bit, one problem is that objects in mTropolis aren't the same size. Without any knowledge of the structure, we aren't really even sure how to determine where objects start and end, and that can be a bit difficult.

A simple way of dealing with variable-sized objects is to just make multiple identical ones and see how far apart they wind up in the data. It also isn't entirely clear yet how the stream catalog even associates with scenes in the project tree.

Let's export it and use MTDisasm's "bin" mode to dump the streams, then open up stream-3-1.scene for the Mac and Win versions in XVI32 again.

A bunch of useful things to note here. We can see that the element ends with a name and a null byte, and by noting the spacing, we can guess where the first one starts, and determine that they're 61 bytes in size. 61 is 3D in hexadecimal, and we can see the size marked in the data as well. The scene stream also seems to start with a scene definition of some sort too, maybe?

Also of interest, there is a 32-bit value that is 8 only in the last one - This turns out to be a flags field, and the 8 is a bit flag indicating that the element is the last one in the list of elements for that level of the tree.

(Fun fact: One of the flag bits is for if the object is expanded in the structure window, which is useless outside of the editor.)

The start of the element also provides some useful clues, which led to figuring out the format after investigating different types of objects: Most (but not all!) object types start with a 32-bit object type ID, a 16-bit revision number, a 32-bit field of unknown usage, and a 32-bit size that includes the preceding values.

Unfortunately, while the first 2 values are common across all objects, the next 2 are not always followed. That turned out to be a problem when a fallback handler for unknown object types was sometimes flying off outside of the data due to the size field not actually being there. Plug-ins (which we'll get to later) are particularly weird, with the size value being garbled, but still decodable, only on Windows.
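
As a rough sketch of what reading that common header looks like, here's some Python. The field widths come from the description above; everything else is an assumption for illustration: the field order is taken as listed, the byte order is passed in by the caller (big-endian for Mac data, little-endian for Windows), and the type ID in the example is made up. It also shows the kind of bounds check that keeps a size-based fallback from running off the end of the data.

```python
import struct
from typing import NamedTuple

HEADER_SIZE = 14  # 4 (type ID) + 2 (revision) + 4 (unknown) + 4 (size)


class ObjectHeader(NamedTuple):
    type_id: int
    revision: int
    unknown: int
    size: int  # total object size, including this header


def read_header(data: bytes, offset: int, big_endian: bool) -> ObjectHeader:
    if offset + HEADER_SIZE > len(data):
        raise ValueError("truncated header")
    fmt = (">" if big_endian else "<") + "IHII"  # no padding with explicit byte order
    header = ObjectHeader(*struct.unpack_from(fmt, data, offset))
    # Not every object type really has a size here, so never trust it blindly.
    if header.size < HEADER_SIZE or offset + header.size > len(data):
        raise ValueError(f"implausible size {header.size} at offset {offset}")
    return header


def walk_objects(data: bytes, big_endian: bool):
    """Yield (offset, header) for a run of objects, skipping by the size field."""
    offset = 0
    while offset < len(data):
        header = read_header(data, offset, big_endian)
        yield offset, header
        offset += header.size


# Two 61-byte (0x3D) objects back to back, Windows-style little-endian.
# The type ID 5 is a made-up value.
one_object = struct.pack("<IHII", 5, 1, 0, 61) + bytes(61 - HEADER_SIZE)
offsets = [off for off, _ in walk_objects(one_object * 2, big_endian=False)]
```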

That's no scene

There is actually an unusual quirk here: That piece at the start of the file is not the scene definition. One way that you can tell that it isn't is that if you add modifiers to the scene, they don't show up in the scene stream, they show up in... the boot stream?

Well, now we get to one of the weirder aspects of mTropolis's architecture, a quirk that actually causes an annoying bug in the Windows version of Obsidian. The object at the start is some kind of stream header, of unknown purpose. The definition of the actual scene object is not stored in the scene stream, it's stored in the boot stream and it always exists, but the objects inside of it are loaded on-demand.

Not only that, but scenes aren't really even scenes. Projects have their own type, as do sections and subsections, but scenes are just graphic elements... or at least they are most of the time. If you link them to an image or QuickTime movie, they change type into that kind of element, so scenes are actually the same type of object as a scene element.

So what is the bug in Obsidian that this causes? It's the bug where if you trigger the rebellion music in the first chapter, and then go the wrong way, the music keeps playing forever, even if you restart the game, because the MIDI modifier that plays the music is directly under the scene in the project hierarchy, preventing it from ever being unloaded. The Mac version fixes this issue by... deleting that music track.

An informative experience

The strategy for MTDisasm for the most part has been to create structures for all of the data types, byte arrays by default if all of the values seem to be the same, and larger values if they have visible byte swaps. Fields with unknown use are named "Unknown" with a number after it, and for convenience, if an unknown field turns out to be multiple values, it is not renumbered, but rather split so other values maintain their number. For example, if a field named unknown3 turns out to be 2 values, it is split into unknown3_1 and unknown3_2.

Here's what our test scene looks like cranked through MTDisasm's text mode today:

An inconsistent experience

Let's take one of the more minor examples of how this can go wrong by creating a floating point variable modifier and setting its initial value to pi.

So far so good, export the files, dump the streams with MTDisasm bin mode and...

Uh oh, something is very wrong here! The Windows version has a value in it that XVI32 is telling us looks like a double-precision float encoding pi at the end, but the Mac version is 2 bytes longer and looks pretty different. What is going on?

Let's think like a Mac programmer here: Motorola 68000 CPUs didn't originally have floating point support, and when they did, it was added via the MC68881 FPU add-on, later incorporated into the 68040 CPU. That FPU preferred float data in an 80-bit format: a sign bit and 15-bit exponent (16 bits together), followed by a 63-bit mantissa with an explicit integer bit above it, which has to be 1 for normalized numbers.

That format has a few advantages, such as being relatively efficient to work with on CPUs that only support integer arithmetic, but the specifics of floating point behavior are well-documented in other places, and the important thing here is that the Mac and Windows versions are using completely different number formats for floats.
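
For reference, converting one of those 80-bit values into an ordinary double is not much code. This Python sketch assumes the 10 bytes are stored big-endian (as you'd expect from a 68k Mac) and uses the standard extended-precision layout: 1 sign bit, a 15-bit exponent with a bias of 16383, and a 64-bit mantissa whose top bit is the explicit integer bit. The function name is made up, and values outside the range of a double are not handled specially.

```python
import math
import struct


def decode_extended80(raw: bytes) -> float:
    """Decode a big-endian 80-bit extended float into a Python float."""
    if len(raw) != 10:
        raise ValueError("expected exactly 10 bytes")
    sign_exponent, mantissa = struct.unpack(">HQ", raw)
    sign = -1.0 if sign_exponent & 0x8000 else 1.0
    exponent = sign_exponent & 0x7FFF
    if exponent == 0x7FFF:
        # All-ones exponent: infinity if the fraction bits are zero, else NaN.
        fraction = mantissa & 0x7FFFFFFFFFFFFFFF
        return sign * math.inf if fraction == 0 else math.nan
    if mantissa == 0:
        return sign * 0.0
    # The mantissa is an integer scaled by 2**63 (the integer bit is bit 63),
    # so the value is mantissa * 2**(exponent - bias - 63).
    return sign * math.ldexp(mantissa, exponent - 16383 - 63)


# Pi in 80-bit extended precision.
pi_value = decode_extended80(bytes.fromhex("4000C90FDAA22168C235"))
```

The result is rounded to the nearest double, which for pi matches the 8-byte value the Windows export stores.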

This behavior exists in some other places too: Color values on Mac are 6 bytes per color (due to Macs having 16-bit precision per channel), but 4 byte BGRA format on Windows. The auxiliary data in QuickTime elements is particularly different, with 14 extra bytes in the header on Mac and 12 extra bytes in the footer on Windows.

Interactivity: A machine built out of mail

Now that we've covered much of how the data is structured, it's time to talk about what is either the best or worst feature of mTropolis: Object behavior and communication.

Almost everything that makes objects interesting in mTropolis is accomplished via modifiers and messages. Modifiers are a special type of object that can be attached to scene elements (and, in a few specific cases, other modifiers), making them do things. Importantly, many modifiers do not really modify the object, but provide other functionality, and simply exist as modifiers. Variables, for instance, are modifiers. Saving and loading games is a modifier. Scripts are modifiers.

Also central to this is one of mTropolis's most important features: Behaviors, which are containers for other modifiers. Behaviors can be switchable, allowing them to be turned on and off, potentially turning off all of the modifiers inside of the behavior, and they can also be aliased, allowing you to save a behavior into the alias palette and reuse it across multiple objects.

Inter-object and inter-modifier communication is done via messages. Most modifiers trigger when they respond to a message, and the message is either a built-in message type (such as a "Mouse Over" message firing when the mouse moves over an element) or a custom message type created by the author. Messages can usually be sent with a value as well, potentially affecting how the recipient responds.

Recreating mTropolis's behavior requires extensively testing what messages fire when, and in what order. Sometimes the order is complicated. For instance, message senders usually have an "immediate" flag that determines whether the message sends immediately (before the message currently being propagated continues propagating), or is added to a message queue. Exactly when the message queue gets processed was behind a bug where the Statue area was playing the wrong music: it turns out that if a non-immediate message is sent, and then something causes a scene transition, the queued-up message has to be sent after the scene transition.
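
Here's a toy Python model of that ordering rule. It is a sketch under simplifying assumptions, not how the engine is actually structured: messages are plain callables, the class and scene names are made up, and the only rule it encodes is that a pending scene transition is applied before any queued message is delivered.

```python
from collections import deque


class MessageDispatcher:
    def __init__(self, scene):
        self.scene = scene
        self.pending_scene = None
        self.queue = deque()

    def send(self, handler, immediate=False):
        if immediate:
            handler(self)               # runs now, before the current handler continues
        else:
            self.queue.append(handler)  # deferred until pump()

    def change_scene(self, scene):
        self.pending_scene = scene      # applied by pump(), not right away

    def pump(self):
        while True:
            if self.pending_scene is not None:
                # Transitions win: queued messages are delivered afterwards.
                self.scene, self.pending_scene = self.pending_scene, None
            elif self.queue:
                self.queue.popleft()(self)
            else:
                break


log = []

def play_music(d):
    log.append(("music", d.scene))

def note(d):
    log.append(("immediate", d.scene))

def on_click(d):
    d.send(play_music)             # queued
    d.send(note, immediate=True)   # delivered before on_click continues
    d.change_scene("statue_lower")
    log.append(("click done", d.scene))

dispatcher = MessageDispatcher("statue_upper")
dispatcher.send(on_click, immediate=True)
dispatcher.pump()
```

In this model, the queued music message ends up being handled in the new scene, even though it was sent before the transition was requested.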

Weird code in the land of no code

It turns out that having a spaghetti of message senders and recipients isn't really enough to express your game logic efficiently all of the time, so mTropolis does supply a scripting system in the form of the Miniscript modifier.

Miniscript is infamous for being hobbled by intentionally not having any loop constructs, to encourage developers to use things other than code instead. If you want a loop, you have to queue up a message of some kind to re-trigger the modifier, and possibly chain things together. That's really only the beginning of its problems though: it's full of weird cases where the semantics aren't really clear at all. For example, as mentioned earlier, variables are modifiers, but if you use a variable from a script, it may treat it as referencing the value of the variable or referencing the variable modifier itself, depending on what you're doing. What triggers the implicit conversion of a variable to the value it contains? Who knows, just have to endure the bugs and find out.

One case involves a bunch of things being miscompiled in Obsidian due to a design fault that is invisible, and impossible to fix without trashing the project and starting over. For example, let's look at the Miniscript modifier with GUID 0034164d, named "Init on PE". Here's what MTDisasm currently tells us that it says when decompiled:

if element.width <> 640 then
    set element.width to 640
end if
if element.height <> 360 then
    set element.height to 360
end if
if element.visible <> true then
    set element.visible to true
end if
if element.position <> (0, 0) then
    set element.position to (0, 0)
end if
if element.direct <> true then
    set element.direct to true
end if
if element.paused <> true then
    set element.paused to true
end if
if label:(4:3ae) <> true then
    set element.loop to true
end if

... What?

What is going on with this last one? What is a label?

Well, let's say you write this code in Miniscript:

if loop <> true then
    set element.loop to true
end if

What have you actually written here? When you save a Miniscript modifier, if there is an identifier like "loop" in this case, then it first tries to resolve it to a global value. If that fails, then it has to look it up, which internally is done using a subscript (i.e. by looking up a value "contained in" something else). At first I thought the thing it was subscripting was something that meant, basically, "go look for it." Nope. The default thing to subscript is the current element, but if you subscript an element by name, it will still look further up in the scene hierarchy for that object.

The exact search order had to be sussed out through repeated tests, because you really don't want to be looking up the wrong thing, do you?

MTDisasm will add some additional annotations to note how a value is being resolved. But what is a label? Well, mTropolis supports a feature where you can embed time markers in AIFF sound files, and then allow those time markers to trigger a media cue modifier, among other things. Those time markers are named, and those names go into the global label table. In this case, someone imported a sound file with a time marker named "loop."

There is no way to view the global label table, there is no way to delete a label after it's been imported, and the labels permanently become names that identifiers in scripts resolve to.

So, some poor sap is now comparing the value of a sound marker to "true" instead of the element's "loop" property like he would have been if that sound file was never imported.
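The failure mode is easier to see in a toy resolver. This is only a sketch: `resolve_identifier` and the dictionary standing in for the global label table are names I made up, the label ID is just copied as text from the decompile above, and the real lookup has more steps (including the scene hierarchy search).

```python
def resolve_identifier(name, global_labels):
    """Describe what a bare identifier would compile to in this toy model."""
    if name in global_labels:
        # A global name wins, so the script now refers to the label.
        return ("label", global_labels[name])
    # Otherwise fall back to a lookup on the current element.
    return ("subscript", "element", name)


labels = {}
before = resolve_identifier("loop", labels)

# Importing a sound file with a marker named "loop" adds a global label.
labels["loop"] = "4:3ae"
after = resolve_identifier("loop", labels)

print(before)  # ('subscript', 'element', 'loop')
print(after)   # ('label', '4:3ae')
```

Same source text, different meaning, and the only thing that changed was an unrelated asset import.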

Another thing that must be handled properly when running Miniscript is errors. Many errors will terminate a script execution, while some things (like sending a message to an object that doesn't exist) will not, and games have code that doesn't work correctly if that behavior isn't mirrored exactly.

Garbage in, un-garbage out

How are we even getting these scripts back though? Do the Miniscript modifiers just have the program source in them? Well, in very early versions of mTropolis, yes. In every shipped title I've seen, no, the export process removes the Miniscript code and only leaves in a compiled version.

Let's take a look at the Miniscript object that we just mentioned in Obsidian.

After isolating the values, we see what we're looking at. Quick note: The reason for the "program header" is that there are actually two modifiers that use Miniscript programs: The Miniscript modifier, and the "If Messenger Modifier" which uses a Miniscript program to evaluate a condition.

In this case, there are no references. A reference is just a name and an object GUID. There is a field indicating the size of the instruction data, which looks like a blob of... stuff. It's not aligned, so the instructions aren't fixed-size. Let's look a bit closer.

If we look carefully, there is a bit of a pattern here. Each instruction has a size field 4 bytes in, which makes it possible to isolate individual instructions. Convenient!

The first guess of how this works, which turned out to be correct, is that this is a stack machine, one of the easiest ways to write a scripting system. The way a stack machine works, for the unfamiliar, is that instructions do some combination of removing ("popping") values from the end of a list and appending ("pushing") new values to the end of the list ("stack"). For example, if you had to evaluate the expression:

((A*B)+(C*D))

Then the stack machine instructions would look something like:

push A
push B
multiply
push C
push D
multiply
add
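If you want to see those instructions actually run, here's a minimal stack machine. It's a generic sketch and not Miniscript's real instruction set: the tuple encoding and the `run` function are just for illustration.

```python
def run(program, variables):
    """Evaluate a list of (opcode, *args) tuples on a value stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(variables[args[0]])
        elif op == "multiply":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op == "add":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()


program = [
    ("push", "A"),
    ("push", "B"),
    ("multiply",),
    ("push", "C"),
    ("push", "D"),
    ("multiply",),
    ("add",),
]

result = run(program, {"A": 2, "B": 3, "C": 4, "D": 5})
print(result)  # 26
```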

After a lot of messing around, this is what MTDisasm outputs for the instructions of that first condition block:

0: push_global default
1: get_child 00000000
2: push_value double 640
3: cmp_neq
4: jump conditional unknown=62 skip=5
5: push_global,args=1 default
6: get_child,args=1 00000000
7: push_value double 640
8: set

Okay, so what are these? "default" in the disassembly really means "element", but what we are doing here is pushing the value of "element" and then getting the attribute with index 0 from the attribute list (which is "width"), then pushing the value 640.0, and doing a comparison (which pops the last 2 values from the stack and pushes true or false onto the stack). Then it does a conditional jump based on the value, and if it's false, skips 5 instructions ahead. If that skip didn't happen, it pushes the element (with a write-intention flag in the arguments!), gets the child (also with a write-intention flag), pushes the value 640.0, and then runs a set operation that applies the value.

One irony here is that it would have been extremely easy to add support for loops into this by just allowing negative offsets in the jump instruction. The developers simply chose not to because... reasons?

This is kind of hard to read though, so what can we do? We decompile it!

Stacking boxes

Decompilation for something like this normally involves 2 steps: Converting stack machine instructions back into expression trees, and converting the control flow graph back into structured control flow.

The first one is straightforward, since Miniscript never has branches in the middle of an expression. Remember that pseudocode for the stack machine I had earlier? Well, if you just keep track of each input to something that consumes values from the stack, you can just regurgitate them into expression text. The only thing complicated there is some precedence analysis, which is done to avoid having to put parentheses around everything... It also doesn't quite work right, if I remember, but it works enough that I can tell what's going on.
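Here's roughly what that looks like for the toy push/multiply/add instructions from earlier. This is a sketch and not MTDisasm's actual code: the opcode names, the precedence table, and `decompile_expression` are assumptions for illustration.

```python
PRECEDENCE = {"add": 1, "multiply": 2}
SYMBOL = {"add": "+", "multiply": "*"}
ATOM = 99  # plain names bind tighter than any operator


def decompile_expression(program):
    """Turn stack instructions back into infix text.

    The stack holds (text, precedence) pairs instead of values.
    """
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append((args[0], ATOM))
            continue
        right_text, right_prec = stack.pop()
        left_text, left_prec = stack.pop()
        prec = PRECEDENCE[op]
        # Parenthesize an operand only if it binds more loosely than this
        # operator (or equally loosely, on the right-hand side).
        if left_prec < prec:
            left_text = f"({left_text})"
        if right_prec <= prec:
            right_text = f"({right_text})"
        stack.append((f"{left_text} {SYMBOL[op]} {right_text}", prec))
    return stack.pop()[0]


first = decompile_expression([
    ("push", "A"), ("push", "B"), ("multiply",),
    ("push", "C"), ("push", "D"), ("multiply",),
    ("add",),
])
second = decompile_expression([
    ("push", "A"), ("push", "B"), ("add",),
    ("push", "C"), ("multiply",),
])
print(first)   # A * B + C * D
print(second)  # (A + B) * C
```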

Control flow analysis, which involves converting jumps back into if/else/end if blocks, is the harder part. Let's say we write a script that roughly looks like this:

if A then
    do something A
else
    do something B
end if
do something C

This will compile into stack machine code that looks something like this:

0: push A
1: jump conditional to instruction 4
2: Do something A
3: jump to instruction 5
4: Do something B
5: Do something C

These "Do something ..." blocks can be any number of instructions. In order to decompile this, we need to convert it into a control flow graph. Control flow graphs in modern compilers can be quite complicated, especially if they form loops, but since Miniscript doesn't have loops, we're left with what's called a directed acyclic graph, meaning no path through the graph ever flows through the same node more than once.

To do this, we split sets of instructions out into blocks. A block starts with any instruction that is the target of a jump, and ends with either a jump instruction or where another block starts.

Note that there is an "end of program" block. This block is always added to the end and contains no instructions, but it's useful for the decompile algorithm.
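The block-splitting rule can be sketched like this. The instruction encoding here is made up (`(kind, target)` pairs where kind is "op", "jump", or "cjump"), and I'm assuming a conditional jump either falls through or goes to its target.

```python
def split_blocks(instrs):
    """Split instructions into blocks keyed by their first instruction index."""
    n = len(instrs)
    leaders = {0}
    for i, (kind, target) in enumerate(instrs):
        if kind in ("jump", "cjump"):
            leaders.add(target)  # a jump target starts a block
            leaders.add(i + 1)   # a jump ends a block, so the next one starts
    starts = sorted(s for s in leaders if s < n)

    blocks = {}
    for idx, start in enumerate(starts):
        end = starts[idx + 1] if idx + 1 < len(starts) else n
        kind, target = instrs[end - 1]
        if kind == "jump":
            succ = [target]
        elif kind == "cjump":
            succ = [end, target]  # fall through, or take the jump
        else:
            succ = [end]
        blocks[start] = {"instrs": list(range(start, end)), "succ": succ}

    blocks[n] = {"instrs": [], "succ": []}  # empty "end of program" block
    return blocks


# The if/else example above: instruction 1 jumps to 4, instruction 3 jumps to 5.
example = [
    ("op", None), ("cjump", 4),
    ("op", None), ("jump", 5),
    ("op", None),
    ("op", None),
]
blocks = split_blocks(example)
for start in sorted(blocks):
    print(start, blocks[start])
```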

So how do we work with this?

First, for each one of these nodes, we have to determine several pieces of information about it:

Immediate predecessors (which nodes directly flow into the node)
Immediate successors (which nodes it can directly flow into)
Post-dominators (which nodes must be flowed through after the node)
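As a rough sketch of gathering that information: successors come straight from the block splitting, predecessors are the same edges reversed, and since the graph has no loops, post-dominators can be computed with a simple memoized recursion. The `analyze` function is my own illustration, it counts a node as post-dominating itself, and it assumes every block except the exit has at least one successor.

```python
def analyze(succ, exit_node):
    """Return (predecessors, post_dominators) for an acyclic successor map."""
    pred = {node: [] for node in succ}
    for node, targets in succ.items():
        for target in targets:
            pred[target].append(node)

    pdom = {}

    def visit(node):
        if node not in pdom:
            if node == exit_node:
                pdom[node] = {node}
            else:
                # Whatever every successor must pass through, plus the node.
                common = set.intersection(*[visit(s) for s in succ[node]])
                pdom[node] = {node} | common
        return pdom[node]

    for node in succ:
        visit(node)
    return pred, pdom


# Blocks from the if/else example, with block 6 as "end of program".
succ = {0: [2, 4], 2: [5], 4: [5], 5: [6], 6: []}
pred, pdom = analyze(succ, exit_node=6)
print(pred[5])  # [2, 4]
print(pdom[0])  # contains 0, 5 and 6
```

Block 0's post-dominators skip 2 and 4 because only one of them runs, but 5 and 6 are always reached.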

I'm sure there are better ways to do this, but the way MTDisasm does this is by creating "islands" that each correspond to an emittable block of code. All islands start with a block of code and end in a "sink" which exits the island. Initially, there is one island that starts with the first block and sinks into the "end of program" block. It then repeatedly processes islands until it's done, and each time an island is processed, it can produce additional islands that also must be processed.

Processing an island is done as follows:

First, check if there is more than one post-dominator of the starting block. If there is, then we need to find one that terminates the island. Post-dominators are always reached in a fixed sequence, so we can find it by just following the first successor until we hit the first post-dominator. The island is then split into 2 islands, one that sinks into the post-dominator, and one that starts at the post-dominator (with the original sink).

Then, the successor list of the block is checked. For each successor, it checks if there is an island starting at that successor already. Doing that handles "if" statements that have no "else" block and go to a successor of the "else" block when the condition is false, for instance. If there is no island starting at that successor, a new island is created, starting with that successor's first block and ending in the original island's sink. Since we know the island is now post-dominated by its sink, we know that is the end of the control flow path of the new island.

This process is repeated until there are no islands left.

After that, we convert the islands into code. Starting with the first island, the island's instructions are converted back into an expression tree and emitted. Then, if the block ends in a condition, an "if" statement with the condition expression is emitted, followed by the "true" island, and an "else" followed by the "false" island if present, and then an "end if", followed by the sink island (which can chain into more conditions and sink islands).
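To show the overall idea end to end, here's a compact recursive version. To be clear, this is not MTDisasm's island algorithm: it's a simpler stand-in that treats the nearest post-dominator of a conditional block as the point where the branches rejoin. The block labels and the rule that the first successor is the "true" branch are assumptions for illustration.

```python
def post_dominators(succ, exit_node):
    """Post-dominator sets for an acyclic graph (each node includes itself)."""
    pdom = {}

    def visit(node):
        if node not in pdom:
            if node == exit_node:
                pdom[node] = {node}
            else:
                common = set.intersection(*[visit(s) for s in succ[node]])
                pdom[node] = {node} | common
        return pdom[node]

    for node in succ:
        visit(node)
    return pdom


def emit(node, stop, succ, pdom, labels, depth=0, out=None):
    """Append structured lines for the blocks from `node` up to `stop`."""
    if out is None:
        out = []
    pad = "    " * depth
    while node != stop:
        if len(succ[node]) == 2:
            # The nearest post-dominator is the one with the most
            # post-dominators of its own; that is where the branches rejoin.
            merge = max(pdom[node] - {node}, key=lambda m: len(pdom[m]))
            true_target, false_target = succ[node]
            out.append(f"{pad}if {labels[node]} then")
            emit(true_target, merge, succ, pdom, labels, depth + 1, out)
            if false_target != merge:
                out.append(f"{pad}else")
                emit(false_target, merge, succ, pdom, labels, depth + 1, out)
            out.append(f"{pad}end if")
            node = merge
        else:
            out.append(f"{pad}{labels[node]}")
            node = succ[node][0]
    return out


succ = {0: [2, 4], 2: [5], 4: [5], 5: [6], 6: []}
labels = {0: "A", 2: "do something A", 4: "do something B", 5: "do something C"}
lines = emit(0, 6, succ, post_dominators(succ, 6), labels)
print("\n".join(lines))
```

Run on the example graph, this prints the original if/else script back out.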

That's it! We can now see the code!

The last format standing

Most of the other data can be figured out using this test-case analysis. Create a bunch of the same thing, change one thing, see what changes. Some things are formatted interestingly, like a lot of things have a "rate" parameter which is actually exported as a large value divided by the rate, turning it into a delay instead.

Most of the asset formats are fairly straightforward too. Images are flipped and have different channel order on Windows, but otherwise are uncompressed. Audio is uncompressed. Sometimes things are compressed, but are compressed with a QuickTime codec and have the codec ID included, and nearly all QuickTime codecs have been reverse-engineered already.

What hasn't been reverse-engineered is that "mFactory Animation" codec. mTropolis has an animation format called "mToon" designed for possibly-moving image sequences. The mTropolis demo doesn't allow saving projects, but it does allow creating and saving mToon assets. I guess it would have to, otherwise you couldn't try it out, and it's not like you can do much with an mToon file if you can't save your project, right? (... Right? ... Yeah.)

In the state of the art of the mid-90's, there were basically three ways of compressing animation in a way that could feasibly be played back in real time: Vector quantization, run-length encoding (RLE), or other simple lossless schemes like LZ77. Vector quantization involves taking chunks of the image and converting them into a smaller set of similar chunks (the codebook), and replacing parts of the image with a lookup into the codebook. It's similar to creating a palette for an image (do people even know what that means these days?), except it creates lookups for blocks of pixels instead of single pixels. Run-length encoding just replaces series of identical pixels with a code indicating how long the series is.

How do we poke at this to start guessing what it does? Well, we can pop open Color It!, one of the many programs of the era designed to be an affordable Photoshop replacement, and try doing some things. Let's make an arrow, and give it a weird size like 12x13 so we can find the size easily, then make 2 identical frames so we can determine the frame stride and hopefully match that up with the size somewhere.

Okay, looks like we've got it. Let's set the compression to "None" and export our project and crack it open!

Important thing to note here: The segment asset data appears after all of the scene objects, so the way mTropolis handles some things here is by storing the metadata with the scene object, but the frame data is in the asset data area.

We have a problem though. We told it to be uncompressed, which means we should be seeing something that looks like an arrow shape in the hex data, but we don't. It compressed it anyway. The good news is, we can see from the codec ID and frame headers that this looks like an RLE codec, which is really good news because RLE codecs are way easier to figure out than VQ codecs.

There are a lot of ways to write an RLE codec. You can set a minimum run length to increase the numeric range of runs by adding it to the run count, you can expect that runs and non-run byte sequences alternate, you can let runs go past the end of a row, and so on, but it's very easy to correlate because you can just make test images with runs of given lengths and see how they're encoded.
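For reference, here's about the simplest RLE scheme there is: pairs of (count, value) bytes. This is not the mToon format, just a generic sketch to show why test images with known run lengths make the encoding easy to correlate.

```python
def rle_encode(pixels):
    """Encode bytes as (count, value) pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(pixels):
        run = 1
        while i + run < len(pixels) and pixels[i + run] == pixels[i] and run < 255:
            run += 1
        out += bytes([run, pixels[i]])
        i += run
    return bytes(out)


def rle_decode(data):
    """Expand (count, value) pairs back into raw bytes."""
    out = bytearray()
    for i in range(0, len(data), 2):
        count, value = data[i], data[i + 1]
        out += bytes([value]) * count
    return bytes(out)


row = bytes([0] * 5 + [7] * 3 + [0] * 4)
encoded = rle_encode(row)
print(list(encoded))  # [5, 0, 3, 7, 4, 0]
```

Draw a row with runs of 5, 3, and 4 pixels, and those same numbers show up in the hex dump.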

This did turn out to be slightly more complicated: Most animations have "temporal compression" which allows for frames where runs of pixels are skipped and reused from the previous frame, and later it turned out that there was a special code for vertical displacement, which skips many rows of pixels at once. The 16-bit version is just the 8-bit version, but uses 16-bit values!
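The skip codes are easier to picture with a sketch. Everything here is hypothetical: the code names ("skip", "run", "literal", "skip_rows") and the tuple encoding are invented for illustration and don't reflect the actual byte layout of the mToon codec, and runs in this toy version never cross a row boundary.

```python
def decode_frame(prev, codes, width):
    """Apply a list of codes on top of a copy of the previous frame."""
    frame = [row[:] for row in prev]  # untouched pixels carry over
    x = y = 0
    for code in codes:
        kind = code[0]
        if kind == "skip_rows":       # vertical displacement
            y += code[1]
            x = 0
        elif kind == "skip":          # reuse pixels from the previous frame
            x += code[1]
        elif kind == "run":           # repeat one value
            _, count, value = code
            frame[y][x:x + count] = [value] * count
            x += count
        elif kind == "literal":       # copy pixels as-is
            values = code[1]
            frame[y][x:x + len(values)] = values
            x += len(values)
        if x >= width:                # move to the next row when one fills up
            x = 0
            y += 1
    return frame


prev = [[5, 5, 5, 5] for _ in range(3)]
codes = [
    ("skip_rows", 1),
    ("skip", 1),
    ("run", 2, 9),
    ("skip", 1),
    ("literal", [1, 2, 3, 4]),
]
frame = decode_frame(prev, codes, width=4)
for row in frame:
    print(row)
```

The first row and the edges of the second row never get written, so they keep whatever the previous frame had.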

Internally, this was handled in a somewhat sneaky way too: ScummVM's mTropolis engine converts the RLE-16 format to RLE-32 so it can decode it on the fly and blast pixels straight into a 32-bit frame buffer... but that's for another time, because...

It looks like we've hit a turning point?

With all of this information, MTDisasm reached a point where it could dump all of Obsidian's data in readable, viewable, and listenable formats. I finally had something resembling the "source code" to the game. That's still a long way from actually running it, but it meant that there was no more information about how it's supposed to work that was out of reach.

Upon reaching this point, I still wasn't on the ScummVM team or really looking to join it. I had just come off of porting Glider PRO, which involved re-implementing tons of MacOS stuff by hand, so I really was not shy about spending excessive amounts of my free time on this type of thing and doing it myself.

That story is for the next installment though. Next up, I'll take you down the path of turning an mTropolis player re-implementation into reality.
