2.3. Binary Expressions

Now that we have that under our belt, let's branch out a bit. Admittedly, an “expression” consisting of only one character is not going to meet our needs for long, so let's see what we can do to extend it. Suppose we want to handle expressions of the form:

1+2

or

4-3

or, in general,

<term> +/- <term>

Note

That's a bit of Backus-Naur Form, or BNF.

To do this we need a procedure that recognizes a term and leaves its result somewhere, and another that recognizes and distinguishes between a + and a - and generates the appropriate code. But if Expression is going to leave its result in D0, where should Term leave its result? Answer: the same place. We're going to have to save the first result of Term somewhere before we get the next one.

OK, basically what we want to do is have procedure Term do what Expression was doing before. So just rename procedure Expression as Term, and enter the following new version of Expression:

{ Parse and Translate an Expression }
procedure Expression;
begin
   Term;
   EmitLn('MOVE D0,D1');
   case Look of
    '+': Add;
    '-': Subtract;
   else Expected('Addop');
   end;
end;

Next, just above Expression enter these two procedures:

{ Recognize and Translate an Add }
procedure Add;
begin
   Match('+');
   Term;
   EmitLn('ADD D1,D0');
end;

{ Recognize and Translate a Subtract }
procedure Subtract;
begin
   Match('-');
   Term;
   EmitLn('SUB D1,D0');
end;

When you're finished with that, the order of the routines should be:

  1. Term (The old Expression)
  2. Add
  3. Subtract
  4. Expression

Now run the program. Try any combination you can think of of two single digits, separated by a + or a -. You should get a series of four assembler-language instructions out of each run. Now try some expressions with deliberate errors in them. Does the parser catch the errors?

Take a look at the object code generated. There are two observations we can make. First, the code generated is not what we would write ourselves. The sequence

        MOVE #n,D0
        MOVE D0,D1

is inefficient. If we were writing this code by hand, we would probably just load the data directly to D1.

There is a message here: code generated by our parser is less efficient than the code we would write by hand. Get used to it. That's going to be true throughout this series. It's true of all compilers to some extent. Computer scientists have devoted whole lifetimes to the issue of code optimization, and there are indeed things that can be done to improve the quality of code output. Some compilers do quite well, but there is a heavy price to pay in complexity, and it's a losing battle anyway … there will probably never come a time when a good assembler-language programmer can't out-program a compiler. Before this session is over, I'll briefly mention some ways that we can do a little optimization, just to show you that we can indeed improve things without too much trouble. But remember, we're here to learn, not to see how tight we can make the object code. For now, and really throughout this series of articles, we'll studiously ignore optimization and concentrate on getting out code that works.

Speaking of which: ours doesn't! The code is wrong! As things are working now, the subtraction process subtracts D1 (which has the first argument in it) from D0 (which has the second). That's the wrong way, so we end up with the wrong sign for the result. So let's fix up procedure Subtract with a sign-changer, so that it reads

{ Recognize and Translate a Subtract }
procedure Subtract;
begin
   Match('-');
   Term;
   EmitLn('SUB D1,D0');
   EmitLn('NEG D0');
end;

Now our code is even less efficient, but at least it gives the right answer! Unfortunately, the rules that give the meaning of math expressions require that the terms in an expression come out in an inconvenient order for us. Again, this is just one of those facts of life you learn to live with. This one will come back to haunt us when we get to division.

OK, at this point we have a parser that can recognize the sum or difference of two digits. Earlier, we could only recognize a single digit. But real expressions can have either form (or an infinity of others). For kicks, go back and run the program with the single input line.

1

Didn't work, did it? And why should it? We just finished telling our parser that the only kinds of expressions that are legal are those with two terms. We must rewrite procedure Expression to be a lot more broadminded, and this is where things start to take the shape of a real parser.