Wednesday, October 12, 2005

The Surprising Endurance of Self-Modifying Code

I recently had the pleasure of meeting a retrocomputing enthusiast, who has asked me to respect his anonymity. For personal amusement, I'm going to call him "Phil." Phil has done some work on software systems that run old video games -- Ms. PacMan, Asteroids, that sort of thing. While these software systems are often called "emulators", most of the good ones are a fair bit more sophisticated than the simple pseudocode that comes to mind when the term "emulator" is thrown around:
for (;;) {
    unsigned char opcode = memory[CPU->programCounter];
    switch (opcode) {
        /* ... one case per opcode ... */
    }
}
Many of these systems are binary translators of one form or another. They take your Ms. PacMan ROM image, and produce a translation of Ms. PacMan retargeted to run natively on your machine. Phil had designed and built such a system, and we talked about its internals at some length. After a while, it dawned on me that his system would not be able to handle self-modifying code, and I became convinced I was missing something. Surely, if you're running these incredibly hairy machine-language programs that rely on such intimate machine details as the exact cycle counts of individual instructions, you'd run into lots of self-modifying code. If any emulator in the world has to get self-modifying code right, it would be an emulator for old video games, right? Right??!?

But no, Phil confirmed that the system was completely broken in dealing with self-modifying code. Yet, his system had no problem running all sorts of old games.
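To make the failure mode concrete, here is a minimal sketch in C. It is not Phil's system: the four-opcode guest instruction set, the 16-byte memory, and every name in it are invented for illustration. It contrasts an interpreter that re-fetches each opcode from memory with a "translator" that decodes the program once and then runs only its own decoded copy.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy guest ISA, invented for this sketch:
 *   OP_HALT            stop and return the accumulator
 *   OP_INC             acc += 1
 *   OP_DEC             acc -= 1
 *   OP_POKE addr val   memory[addr] = val (three bytes in total)
 */
enum { OP_HALT = 0, OP_INC = 1, OP_DEC = 2, OP_POKE = 3 };
enum { MEM_SIZE = 16 };

/* Interpreter: fetches every opcode from guest memory at the moment it
 * executes, so it observes any store the guest makes into its own code. */
static int run_interpreter(uint8_t *memory) {
    int acc = 0;
    unsigned pc = 0;
    while (pc < MEM_SIZE) {
        uint8_t opcode = memory[pc++];
        switch (opcode) {
        case OP_INC: acc++; break;
        case OP_DEC: acc--; break;
        case OP_POKE:
            if (pc + 2 > MEM_SIZE) return acc;   /* truncated operand */
            memory[memory[pc] % MEM_SIZE] = memory[pc + 1];
            pc += 2;
            break;
        default:                                 /* OP_HALT or unknown */
            return acc;
        }
    }
    return acc;
}

typedef struct { uint8_t op, addr, val; } DecodedOp;

/* Naive translator: decodes the whole program up front, then executes the
 * decoded copy. Guest stores still land in memory, but the decoded copy is
 * never refreshed, so stores into code go unnoticed. */
static int run_translated(uint8_t *memory) {
    DecodedOp code[MEM_SIZE];
    unsigned n = 0, pc = 0;
    while (pc < MEM_SIZE) {
        DecodedOp d = { memory[pc++], 0, 0 };
        if (d.op == OP_POKE) {
            if (pc + 2 > MEM_SIZE) break;        /* truncated operand */
            d.addr = memory[pc++];
            d.val = memory[pc++];
        }
        code[n++] = d;
        if (d.op != OP_INC && d.op != OP_DEC && d.op != OP_POKE) break;
    }

    int acc = 0;
    for (unsigned i = 0; i < n; i++) {
        switch (code[i].op) {
        case OP_INC: acc++; break;
        case OP_DEC: acc--; break;
        case OP_POKE: memory[code[i].addr % MEM_SIZE] = code[i].val; break;
        default: return acc;
        }
    }
    return acc;
}

/* Demo guest program: overwrite the INC at address 3 with a DEC, then fall
 * through into the instruction that was just overwritten. */
static void load_demo_program(uint8_t *memory) {
    static const uint8_t program[] = { OP_POKE, 3, OP_DEC, OP_INC, OP_HALT };
    memset(memory, OP_HALT, MEM_SIZE);
    memcpy(memory, program, sizeof program);
}
```

Load the demo program into two separate buffers and run each one: run_interpreter returns -1, because it executes the patched DEC, while run_translated returns 1, because it is still running the INC it decoded before the patch. A game that never writes to its own code, because that code sits in ROM, cannot tell the two apart.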

Why? In the lore of systems, self-modifying code is exactly the sort of bizarre space- and time-optimization that only makes sense in the semi-mythologically constrained environments of old computers. These games were extremely performance-critical, written in assembly language, under harrowing space constraints, often on 8-bit computers with a single general-purpose register. Yet they apparently didn't mutilate their program text by even a single byte.

After watching me squirm for a while, Phil let me off the hook. These are console games; since these systems would only ever run a single game, the code lived in ROM. It would have been prohibitively expensive to provide enough RAM to copy the code out of ROM, so self-modifying code was, ironically, a luxury unavailable to many old-time assembly language video game hackers, the very group with which most people associate self-modifying code.

Today, most developers regard self-modifying code as an occasionally interesting, but thankfully obsolete curiosity. After all, very little significant software is written in assembly anymore; even when it is, space constraints are rarely what they used to be, and the performance argument would now go against self-modifying code, since it interferes with the instruction cache and pipeline on modern processors. Yet, if you peek under the hood of your running PC, today, in the year 2005, you'll find gobs of self-modifying code looking up at you. From dynamic linkers to JVM/CLRs, to various system instrumentation frameworks, to debuggers, to profilers, on and on ad infinitum, there's a whole heck of a lot of code getting rewritten in dribs and drabs on a modern system. So, whole-system monitors, like the one I work on, need to deal with self-modifying code correctly. In fact, code modification is so prevalent now that monitor engineers must worry not only about its correctness when running in a VM, but also its performance!
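One widely used way for a translating monitor to stay correct is to track which guest pages hold code it has already translated, and to throw those translations away when the guest writes to such a page. The C sketch below illustrates the bookkeeping only. It is an assumption about the general technique, not a description of any particular monitor: the Guest structure, the 256-byte page size, and the function names are all hypothetical, and a per-page flag stands in for the hardware write-protection a real system would more likely use.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    PAGE_SHIFT = 8,                        /* hypothetical 256-byte pages */
    PAGE_SIZE = 1 << PAGE_SHIFT,
    NUM_PAGES = 4,
    GUEST_MEM_SIZE = PAGE_SIZE * NUM_PAGES
};

typedef struct {
    uint8_t  memory[GUEST_MEM_SIZE];
    bool     page_has_translation[NUM_PAGES]; /* stand-in for write-protection */
    unsigned invalidations;                   /* pages flushed so far */
} Guest;

/* Record that the monitor has translated code on the page containing addr. */
static void note_translated(Guest *g, uint32_t addr) {
    g->page_has_translation[(addr % GUEST_MEM_SIZE) >> PAGE_SHIFT] = true;
}

/* Every guest store goes through here. A store to a page with cached
 * translations discards them first, so the next execution of that page
 * has to re-translate from the updated bytes. */
static void guest_store(Guest *g, uint32_t addr, uint8_t value) {
    addr %= GUEST_MEM_SIZE;
    unsigned page = addr >> PAGE_SHIFT;
    if (g->page_has_translation[page]) {
        g->page_has_translation[page] = false; /* a real monitor frees code here */
        g->invalidations++;
    }
    g->memory[addr] = value;
}
```

The sketch also hints at where the performance worry comes from: a guest that keeps patching a page it is also executing forces a discard and a fresh translation on every round trip, so the cost of each invalidation matters as much as getting it right.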

So, the next time you're bored around the coffee machine, bend some of your colleagues' minds by asking them which system is running more self-modifying code: a Z80 running Ms. PacMan, or their Windows XP laptop. As a rule of thumb, the more modern the system, the more self-modifying code you'll find.

