Installment published 24th July 1988.
In the first three chapters of this series, we've looked at parsing and compiling math expressions, and worked our way gradually and methodically from dealing with very simple one-term, one-character “expressions” up through more general ones, finally arriving at a very complete parser that could parse and translate complete assignment statements, with multi-character tokens, embedded white space, and function calls. This time, I'm going to walk you through the process one more time, only with the goal of interpreting rather than compiling object code.
Since this is a series on compilers, why should we bother with interpreters? Simply because I want you to see how the nature of the parser changes as we change the goals. I also want to unify the concepts of the two types of translators, so that you can see not only the differences, but also the similarities.
Consider the assignment statement
x = 2 * y + 3
In a compiler, we want the target CPU to execute this assignment at execution time. The translator itself doesn't do any arithmetic … it only issues the object code that will cause the CPU to do it when the code is executed. For the example above, the compiler would issue code to compute the expression and store the results in variable x.
For an interpreter, on the other hand, no object code is generated. Instead, the arithmetic is computed immediately, as the parsing is going on. For the example, by the time parsing of the statement is complete, x will have a new value.
The approach we've been taking in this whole series is called “syntax-driven translation”. As you are aware by now, the structure of the parser is very closely tied to the syntax of the productions we parse. We have built Pascal procedures that recognize every language construct. Associated with each of these constructs (and procedures) is a corresponding “action”, which does whatever makes sense to do once a construct has been recognized. In our compiler so far, every action involves emitting object code, to be executed later at execution time. In an interpreter, every action involves something to be done immediately.
What I'd like you to see here is that the layout … the structure … of the parser doesn't change. It's only the actions that change. So if you can write an interpreter for a given language, you can also write a compiler, and vice versa. Yet, as you will see, there are differences, and significant ones. Because the actions are different, the procedures that do the recognizing end up being written differently. Specifically, in the interpreter the recognizing procedures end up being coded as functions that return numeric values to their callers. None of the parsing routines for our compiler did that.
Our compiler, in fact, is what we might call a “pure” compiler. Each time a construct is recognized, the object code is emitted immediately. (That's one reason the code is not very efficient.) The interpreter we'll be building here is a pure interpreter, in the sense that there is no translation, such as “tokenizing,” performed on the source code. These represent the two extremes of translation. In the real world, translators are rarely so pure, but tend to have bits of each technique.
I can think of several examples. I've already mentioned one: most interpreters, such as Microsoft BASIC, for example, translate the source code (tokenize it) into an intermediate form so that it'll be easier to parse real time.
Another example is an assembler. The purpose of an assembler, of course, is to produce object code, and it normally does that on a one-to-one basis: one object instruction per line of source code. But almost every assembler also permits expressions as arguments. In this case, the expressions are always constant expressions, and so the assembler isn't supposed to issue object code for them. Rather, it “interprets” the expressions and computes the corresponding constant result, which is what it actually emits as object code.
As a matter of fact, we could use a bit of that ourselves. The translator we built in the previous chapter will dutifully spit out object code for complicated expressions, even though every term in the expression is a constant. In that case it would be far better if the translator behaved a bit more like an interpreter, and just computed the equivalent constant result.
There is a concept in compiler theory called “lazy” translation. The idea is that you typically don't just emit code at every action. In fact, at the extreme you don't emit anything at all, until you absolutely have to. To accomplish this, the actions associated with the parsing routines typically don't just emit code. Sometimes they do, but often they simply return information back to the caller. Armed with such information, the caller can then make a better choice of what to do.
For example, given the statement
x = x + 3 - 2 - (5 - 4)
our compiler will dutifully spit out a stream of 18 instructions to load each parameter into registers, perform the arithmetic, and store the result. A lazier evaluation would recognize that the arithmetic involving constants can be evaluated at compile time, and would reduce the expression to
x = x + 0
An even lazier evaluation would then be smart enough to figure out that this is equivalent to
x = x
which calls for no action at all. We could reduce 18 instructions to zero!
Note that there is no chance of optimizing this way in our translator as it stands, because every action takes place immediately.
Lazy expression evaluation can produce significantly better object code than we have been able to so far. I warn you, though: it complicates the parser code considerably, because each routine now has to make decisions as to whether to emit object code or not. Lazy evaluation is certainly not named that because it's easier on the compiler writer!
Since we're operating mainly on the KISS principle here, I won't go into much more depth on this subject. I just want you to be aware that you can get some code optimization by combining the techniques of compiling and interpreting. In particular, you should know that the parsing routines in a smarter translator will generally return things to their caller, and sometimes expect things as well. That's the main reason for going over interpretation in this installment.