2.9. A Word About Optimization

Earlier in this session, I promised to give you some hints as to how we can improve the quality of the generated code. As I said, the production of tight code is not the main purpose of this series of articles. But you need to at least know that we aren't just wasting our time here … that we can indeed modify the parser further to make it produce better code, without throwing away everything we've done to date. As usual, it turns out that some optimization is not that difficult to do … it simply takes some extra code in the parser.

There are two basic approaches we can take:

There is one more type of optimization worth mentioning, that seems to promise pretty tight code without too much hassle. It's my “invention” in the sense that I haven't seen it suggested in print anywhere, though I have no illusions that it's original with me.

This is to avoid such a heavy use of the stack, by making better use of the CPU registers. Remember back when we were doing only addition and subtraction, that we used registers D0 and D1, rather than the stack? It worked, because with only those two operations, the “stack” never needs more than two entries.

Well, the 68000 has eight data registers. Why not use them as a privately managed stack? The key is to recognize that, at any point in its processing, the parser knows how many items are on the stack, so it can indeed manage it properly. We can define a private “stack pointer” that keeps track of which stack level we're at, and addresses the corresponding register. Procedure Factor, for example, would not cause data to be loaded into register D0, but into whatever the current “top-of-stack” register happened to be.

What we're doing in effect is to replace the CPU's RAM stack with a locally managed stack made up of registers. For most expressions, the stack level will never exceed eight, so we'll get pretty good code out. Of course, we also have to deal with those odd cases where the stack level does exceed eight, but that's no problem either. We simply let the stack spill over into the CPU stack. For levels beyond eight, the code is no worse than what we're generating now, and for levels less than eight, it's considerably better.

For the record, I have implemented this concept, just to make sure it works before I mentioned it to you. It does. In practice, it turns out that you can't really use all eight levels … you need at least one register free to reverse the operand order for division (sure wish the 68000 had an XTHL, like the 8080!). For expressions that include function calls, we would also need a register reserved for them. Still, there is a nice improvement in code size for most expressions.

So, you see, getting better code isn't that difficult, but it does add complexity to the our translator … complexity we can do without at this point. For that reason, I strongly suggest that we continue to ignore efficiency issues for the rest of this series, secure in the knowledge that we can indeed improve the code quality without throwing away what we've done.

Next lesson, I'll show you how to deal with variables factors and function calls. I'll also show you just how easy it is to handle multicharacter tokens and embedded white space.