6.5. The Parser

Now that we've gotten through the decision-making process, we can press on with development of a parser. You've done this with me several times now, so you know the drill: we begin with a fresh copy of the cradle, and begin adding procedures one by one. So let's do it.

We begin, as we did in the arithmetic case, by dealing only with Boolean literals rather than variables. This gives us a new kind of input token, so we're also going to need a new recognizer, and a new procedure to read instances of that token type. Let's start by defining the two new procedures:

{ Recognize a Boolean Literal }
function IsBoolean(c: char): Boolean;
begin
   IsBoolean := UpCase(c) in ['T', 'F'];
end;

{ Get a Boolean Literal }
function GetBoolean: Boolean;
var c: char;
begin
   if not IsBoolean(Look) then Expected('Boolean Literal');
   GetBoolean := UpCase(Look) = 'T';
   GetChar;
end;

Type these routines into your program. You can test them by adding into the main program the print statement

   WriteLn(GetBoolean);

OK, compile the program and test it. As usual, it's not very impressive so far, but it soon will be.

Now, when we were dealing with numeric data we had to arrange to generate code to load the values into D0. We need to do the same for Boolean data. The usual way to encode Boolean variables is to let 0 stand for FALSE, and some other value for TRUE. Many languages, such as C, use an integer 1 to represent it. But I prefer FFFF hex (or -1), because a bitwise NOT also becomes a Boolean NOT. So now we need to emit the right assembler code to load those values. The first cut at the Boolean expression parser (BoolExpression, of course) is:

{ Parse and Translate a Boolean Expression }
procedure BoolExpression;
begin
   if not IsBoolean(Look) then Expected('Boolean Literal');
   if GetBoolean then
      EmitLn('MOVE #-1,D0')
   else
      EmitLn('CLR D0');
end;

Add this procedure to your parser, and call it from the main program (replacing the print statement you had just put there). As you can see, we still don't have much of a parser, but the output code is starting to look more realistic.

Next, of course, we have to expand the definition of a Boolean expression. We already have the BNF rule:

 <b-expression> ::= <b-term> [<orop> <b-term>]*

I prefer the Pascal versions of the "orops", OR and XOR. But since we are keeping to single-character tokens here, I'll encode those with | and ~. The next version of BoolExpression is almost a direct copy of the arithmetic procedure Expression:

{ Recognize and Translate a Boolean OR }
procedure BoolOr;
begin
   Match('|');
   BoolTerm;
   EmitLn('OR (SP)+,D0');
end;

{ Recognize and Translate an Exclusive Or }
procedure BoolXor;
begin
   Match('~');
   BoolTerm;
   EmitLn('EOR (SP)+,D0');
end;

{ Parse and Translate a Boolean Expression }
procedure BoolExpression;
begin
   BoolTerm;
   while IsOrOp(Look) do begin
      EmitLn('MOVE D0,-(SP)');
      case Look of
       '|': BoolOr;
       '~': BoolXor;
      end;
   end;
end;

Note the new recognizer IsOrOp, which is also a copy, this time of IsAddOp:

{ Recognize a Boolean Orop }
function IsOrop(c: char): Boolean;
begin
   IsOrop := c in ['|', '~'];
end;

OK, rename the old version of BoolExpression to BoolTerm, then enter the code above. Compile and test this version. At this point, the output code is starting to look pretty good. Of course, it doesn't make much sense to do a lot of Boolean algebra on constant values, but we'll soon be expanding the types of Booleans we deal with.

You've probably already guessed what the next step is: The Boolean version of Term.

Rename the current procedure BoolTerm to NotFactor, and enter the following new version of BoolTerm. Note that is is much simpler than the numeric version, since there is no equivalent of division.

{ Parse and Translate a Boolean Term }
procedure BoolTerm;
begin
   NotFactor;
   while Look = '&' do begin
      EmitLn('MOVE D0,-(SP)');
      Match('&');
      NotFactor;
      EmitLn('AND (SP)+,D0');
   end;
end;

Now, we're almost home. We are translating complex Boolean expressions, although only for constant values. The next step is to allow for the NOT. Write the following procedure:

{ Parse and Translate a Boolean Factor with NOT }
procedure NotFactor;
begin
   if Look = '!' then begin
      Match('!');
      BoolFactor;
      EmitLn('EOR #-1,D0');
      end
   else
      BoolFactor;
end;

And rename the earlier procedure to BoolFactor. Now try that. At this point the parser should be able to handle any Boolean expression you care to throw at it. Does it? Does it trap badly formed expressions?

If you've been following what we did in the parser for math expressions, you know that what we did next was to expand the definition of a factor to include variables and parens. We don't have to do that for the Boolean factor, because those little items get taken care of by the next step. It takes just a one line addition to BoolFactor to take care of relations:

{ Parse and Translate a Boolean Factor }
procedure BoolFactor;
begin
   if IsBoolean(Look) then
      if GetBoolean then
         EmitLn('MOVE #-1,D0')
      else
         EmitLn('CLR D0')
      else Relation;
end;

You might be wondering when I'm going to provide for Boolean variables and parenthesized Boolean expressions. The answer is, I'm not! Remember, we took those out of the grammar earlier. Right now all I'm doing is encoding the grammar we've already agreed upon. The compiler itself can't tell the difference between a Boolean variable or expression and an arithmetic one … all of those will be handled by Relation, either way.

Of course, it would help to have some code for Relation. I don't feel comfortable, though, adding any more code without first checking out what we already have. So for now let's just write a dummy version of Relation that does nothing except eat the current character, and write a little message:

{ Parse and Translate a Relation }
procedure Relation;
begin
   WriteLn('<Relation>');
   GetChar;
end;

OK, key in this code and give it a try. All the old things should still work … you should be able to generate the code for ANDs, ORs, and NOTs. In addition, if you type any alphabetic character you should get a little <Relation> place-holder, where a Boolean factor should be. Did you get that? Fine, then let's move on to the full-blown version of Relation.

To get that, though, there is a bit of groundwork that we must lay first. Recall that a relation has the form

<relation>     ::= | <expression> [<relop> <expression]

Since we have a new kind of operator, we're also going to need a new Boolean function to recognize it. That function is shown below. Because of the single-character limitation, I'm sticking to the four operators that can be encoded with such a character (the “not equals” is encoded by #).

{ Recognize a Relop }
function IsRelop(c: char): Boolean;
begin
   IsRelop := c in ['=', '#', '<', '>'];
end;

Now, recall that we're using a zero or a -1 in register D0 to represent a Boolean value, and also that the loop constructs expect the flags to be set to correspond. In implementing all this on the 68000, things get a a little bit tricky.

Since the loop constructs operate only on the flags, it would be nice (and also quite efficient) just to set up those flags, and not load anything into D0 at all. This would be fine for the loops and branches, but remember that the relation can be used anywhere a Boolean factor could be used. We may be storing its result to a Boolean variable. Since we can't know at this point how the result is going to be used, we must allow for both cases.

Comparing numeric data is easy enough … the 68000 has an operation for that … but it sets the flags, not a value. What's more, the flags will always be set the same (zero if equal, etc.), while we need the zero flag set differently for the each of the different relops.

The solution is found in the 68000 instruction Scc, which sets a byte value to 0000 or FFFF (funny how that works!) depending upon the result of the specified condition. If we make the destination byte to be D0, we get the Boolean value needed.

Unfortunately, there's one final complication: unlike almost every other instruction in the 68000 set, Scc does not reset the condition flags to match the data being stored. So we have to do one last step, which is to test D0 and set the flags to match it. It must seem to be a trip around the moon to get what we want: we first perform the test, then test the flags to set data into D0, then test D0 to set the flags again. It is sort of roundabout, but it's the most straightforward way to get the flags right, and after all it's only a couple of instructions.

I might mention here that this area is, in my opinion, the one that represents the biggest difference between the efficiency of hand-coded assembler language and compiler-generated code. We have seen already that we lose efficiency in arithmetic operations, although later I plan to show you how to improve that a bit. We've also seen that the control constructs themselves can be done quite efficiently … it's usually very difficult to improve on the code generated for an IF or a WHILE. But virtually every compiler I've ever seen generates terrible code, compared to assembler, for the computation of a Boolean function, and particularly for relations. The reason is just what I've hinted at above. When I'm writing code in assembler, I go ahead and perform the test the most convenient way I can, and then set up the branch so that it goes the way it should. In effect, I “tailor” every branch to the situation. The compiler can't do that (practically), and it also can't know that we don't want to store the result of the test as a Boolean variable. So it must generate the code in a very strict order, and it often ends up loading the result as a Boolean that never gets used for anything.

In any case, we're now ready to look at the code for Relation. It's shown below with its companion procedures:

{ Recognize and Translate a Relational "Equals" }
procedure Equals;
begin
   Match('=');
   Expression;
   EmitLn('CMP (SP)+,D0');
   EmitLn('SEQ D0');
end;

{ Recognize and Translate a Relational "Not Equals" }
procedure NotEquals;
begin
   Match('#');
   Expression;
   EmitLn('CMP (SP)+,D0');
   EmitLn('SNE D0');
end;

{ Recognize and Translate a Relational "Less Than" }
procedure Less;
begin
   Match('<');
   Expression;
   EmitLn('CMP (SP)+,D0');
   EmitLn('SGE D0');
end;

{ Recognize and Translate a Relational "Greater Than" }
procedure Greater;
begin
   Match('>');
   Expression;
   EmitLn('CMP (SP)+,D0');
   EmitLn('SLE D0');
end;

{ Parse and Translate a Relation }
procedure Relation;
begin
   Expression;
   if IsRelop(Look) then begin
      EmitLn('MOVE D0,-(SP)');
      case Look of
       '=': Equals;
       '#': NotEquals;
       '<': Less;
       '>': Greater;
      end;
   EmitLn('TST D0');
   end;
end;

Now, that call to Expression looks familiar! Here is where the editor of your system comes in handy. We have already generated code for Expression and its buddies in previous sessions. You can copy them into your file now. Remember to use the single-character versions. Just to be certain, I've duplicated the arithmetic procedures below. If you're observant, you'll also see that I've changed them a little to make them correspond to the latest version of the syntax. This change is not necessary, so you may prefer to hold off on that until you're sure everything is working.

{ Parse and Translate an Identifier }
procedure Ident;
var Name: char;
begin
   Name:= GetName;
   if Look = '(' then begin
      Match('(');
      Match(')');
      EmitLn('BSR ' + Name);
      end
   else
      EmitLn('MOVE ' + Name + '(PC),D0');
end;

{ Parse and Translate a Math Factor }
procedure Expression; Forward;
procedure Factor;
begin
   if Look = '(' then begin
      Match('(');
      Expression;
      Match(')');
      end
   else if IsAlpha(Look) then
      Ident
   else
      EmitLn('MOVE #' + GetNum + ',D0');
end;

{ Parse and Translate the First Math Factor }
procedure SignedFactor;
begin
   if Look = '+' then
      GetChar;
   if Look = '-' then begin
      GetChar;
      if IsDigit(Look) then
         EmitLn('MOVE #-' + GetNum + ',D0')
      else begin
         Factor;
         EmitLn('NEG D0');
      end;
   end
   else Factor;
end;

{ Recognize and Translate a Multiply }
procedure Multiply;
begin
   Match('*');
   Factor;
   EmitLn('MULS (SP)+,D0');
end;

{ Recognize and Translate a Divide }
procedure Divide;
begin
   Match('/');
   Factor;
   EmitLn('MOVE (SP)+,D1');
   EmitLn('EXS.L D0');
   EmitLn('DIVS D1,D0');
end;

{ Parse and Translate a Math Term }
procedure Term;
begin
   SignedFactor;
   while Look in ['*', '/'] do begin
      EmitLn('MOVE D0,-(SP)');
      case Look of
       '*': Multiply;
       '/': Divide;
      end;
   end;
end;

{ Recognize and Translate an Add }
procedure Add;
begin
   Match('+');
   Term;
   EmitLn('ADD (SP)+,D0');
end;

{ Recognize and Translate a Subtract }
procedure Subtract;
begin
   Match('-');
   Term;
   EmitLn('SUB (SP)+,D0');
   EmitLn('NEG D0');
end;

{ Parse and Translate an Expression }
procedure Expression;
begin
   Term;
   while IsAddop(Look) do begin
      EmitLn('MOVE D0,-(SP)');
      case Look of
       '+': Add;
       '-': Subtract;
      end;
   end;
end;

There you have it … a parser that can handle both arithmetic and Boolean algebra, and things that combine the two through the use of relops. I suggest you file away a copy of this parser in a safe place for future reference, because in our next step we're going to be chopping it up.