Know Your Roots: Optimizing for the 286 and 386

I just read this old Mike Abrash article (warning, 3.5M PDF) from Byte magazine about optimizing for the 286 and 386. If you don’t know Mike Abrash’s name, you almost certainly know his work.

This is fascinating stuff.

Probably the most amazing thing is how much instruction fetch dominates 286/386 performance. Abrash frequently counts instruction bytes and includes this in his calculations for expected execution speed. After a bit of Googling, this made more sense: these were the days before on-chip instruction caches, so every instruction byte had to come in over the bus before it could execute.

(Although it’s still possible, you rarely see fetch-limited code on something like a K8.)
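
To make the byte counting concrete, here is a rough sketch of that kind of estimate in Python. The helper name and the bus parameters are my own assumptions for illustration, not figures from Abrash's article. It assumes the bus delivers a fixed number of bytes per bus cycle, that each bus cycle costs a fixed number of clocks, and that nothing else overlaps with or competes against instruction fetch.

```python
import math

def fetch_bound_cycles(code_bytes, bytes_per_bus_cycle, clocks_per_bus_cycle):
    """Lower bound on clocks spent just fetching `code_bytes` of instructions.

    Hypothetical helper: ignores the prefetch queue, alignment, and any
    data accesses competing for the bus.
    """
    bus_cycles = math.ceil(code_bytes / bytes_per_bus_cycle)
    return bus_cycles * clocks_per_bus_cycle

# Illustrative numbers only: a 16-bit bus moving 2 bytes per bus cycle,
# with 3 clocks per bus cycle (assumed: 2 clocks plus one wait state).
loop_bytes = 10
print(fetch_bound_cycles(loop_bytes, 2, 3))  # 5 bus cycles * 3 clocks = 15
```

If the instruction timings in the manual add up to less than this number, the code is probably fetch-limited, and shaving bytes is likely to matter more than shaving nominal cycles.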

So, if you think x86 is an ugly instruction set, consider this heritage. Variable instruction length actually made a lot of sense back then. It was a form of compression. For the same reason, Intel added complex instructions like BOUND and REP MOVSW. These compactly express a whole bunch of work.

I guess some things never change. I still do the same kind of measurements that Abrash did all those years ago. There are small differences — he calls a timer routine, where I can simply execute rdtsc — but the method is the same. I find this remarkable, considering how different the machines are.
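
The shape of that measurement is simple enough to sketch: read a clock, run the code, read the clock again. The Python below is only an illustration of the method, with `time.perf_counter_ns` standing in for rdtsc or Abrash's timer routine. The names `best_time_ns` and `work` are made up for this example, and keeping the fastest of many runs is my own assumption about how to filter out noise, not something taken from the article.

```python
import time

def best_time_ns(fn, repeats=1000):
    """Run fn() `repeats` times and return the fastest observed time in ns."""
    best = None
    for _ in range(repeats):
        start = time.perf_counter_ns()   # read the clock before
        fn()                             # code under test
        elapsed = time.perf_counter_ns() - start  # read it again after
        if best is None or elapsed < best:
            best = elapsed
    return best

def work():
    # Placeholder workload: sum the integers 0..999.
    total = 0
    for i in range(1000):
        total += i
    return total

print(best_time_ns(work))
```

The number printed depends entirely on the machine, which is the point: you measure, you don't guess.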

(via Osterman’s blog)


3 Responses to “Know Your Roots: Optimizing for the 286 and 386”

  1. veridicus February 7, 2007 at 9:08 am

    Granted, x86 is ugly due to legacy. But it’s time something completely new and much better is put together for general computing. It’s getting up-front support from MS and/or Apple that would be the dealbreaker, which is the only reason we’re still on x86.

  2. Mark February 7, 2007 at 9:28 am

    I think there are actually precious few good reasons to switch away from x86.

    I certainly don’t think Microsoft and Apple are stopping us.

    Microsoft has, at least once, ported their entire OS to a new ISA (Alpha). And Apple is the *king* of this kind of thing. They’ve done it twice, most recently *to* x86 (away from the “newer” and “better put together” PPC).

    I’d be interested to hear your arguments in favor of a switch.

  3. Larry Osterman February 7, 2007 at 3:33 pm

    Back in the day, we had a simple rule: Except for multiply and divide (which cost hundreds of clocks), the performance of a piece of code is directly proportional to the number of bytes it occupies.

    That was it. This metric worked all the way up until basically the 486 series of processors (when the prefetch algorithms got smart enough to defeat that simple metric).

    The reason was that memory cost overwhelmed processor cost on those old machines.

