12.1.2. Dealing With Semicolons

There are two distinct ways in which semicolons are used in popular languages. In Pascal, the semicolon is regarded as a statement separator. No semicolon is required after the last statement in a block. The syntax is:

<block> ::= <statement> ( ';' <statement>)*
<statement> ::= <assignment> | <if> | <while> ... | null

Note

The null statement is important!

Pascal also defines some semicolons in other places, such as after the PROGRAM statement.

In C and Ada, on the other hand, the semicolon is considered a statement terminator, and follows all statements (with some embarrassing and confusing exceptions). The syntax for this is simply:

<block> ::= ( <statement> ';')*

Of the two syntaxes, the Pascal one seems on the face of it more rational, but experience has shown that it leads to some strange difficulties. People get so used to typing a semicolon after every statement that they tend to type one after the last statement in a block, too. That usually doesn't cause any harm … it just gets treated as a null statement. Many Pascal programmers, including yours truly, do just that. But there is one place you absolutely cannot type a semicolon, and that's right before an ELSE. This little gotcha has cost me many an extra compilation, particularly when the ELSE is added to existing code. So the C/Ada choice turns out to be better. Apparently Niklaus Wirth thinks so, too: in his Modula-2, he abandoned the Pascal approach.
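For readers who haven't been bitten by it, here is a small sketch of the ELSE gotcha in ordinary Pascal. The program name and variables are made up for illustration; the offending form is kept inside a comment so that the program still compiles.

```pascal
program ElseGotcha;   { hypothetical name, for illustration only }
var
   x, y: Integer;
begin
   x := 1;

   { Legal: no semicolon between the THEN-statement and ELSE }
   if x > 0 then
      y := 1
   else
      y := 2;

   { Illegal in Pascal: the ';' ends the IF statement, so the ELSE
     that follows has nothing to attach to and the compiler rejects it:

        if x > 0 then
           y := 1;
        else
           y := 2;
   }

   WriteLn(y);   { the ';' before END is harmless: a null statement follows }
end.
```

The semicolon after WriteLn(y) is the habit described above: it is accepted only because the grammar allows a null statement between it and the END.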

Given either of these two syntaxes, it's an easy matter (now that we've reorganized the parser!) to add these features to our parser. Let's take the second case, the C/Ada statement terminator, first, since it's simpler.

To begin, I've made things easy by introducing a new recognizer:

{ Match a Semicolon }
procedure Semi;
begin
   MatchString(';');
end;

This procedure works very much like our old Match. It insists on finding a semicolon as the next token. Having found it, it advances to the token that follows.

Since a semicolon follows a statement, procedure Block is almost the only one we need to change:

{ Parse and Translate a Block of Statements }
procedure Block;
begin
   Scan;
   while not(Token in ['e', 'l']) do begin
      case Token of
       'i': DoIf;
       'w': DoWhile;
       'R': DoRead;
       'W': DoWrite;
       'x': Assignment;
      end;
      Semi;
      Scan;
   end;
end;

Note carefully the subtle change in the case statement. The call to Assignment is now guarded by a test on Token: it is made only when the token is an identifier ('x'). This keeps us from calling Assignment when the token is a semicolon, which can happen if the statement is null.

Since declarations are also statements, we also need to add a call to Semi within procedure TopDecls:

{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
   Scan;
   while Token = 'v' do begin
      Alloc;
      while Token = ',' do
         Alloc;
      Semi;
   end;
end;

Finally, we need one for the PROGRAM statement:

{ Main Program }
begin
   Init;
   MatchString('PROGRAM');
   Semi;
   Header;
   TopDecls;
   MatchString('BEGIN');
   Prolog;
   Block;
   MatchString('END');
   Epilog;
end.

It's as easy as that. Try it with a copy of TINY and see how you like it.

The Pascal version is a little trickier, but it still only requires minor changes, and those only to procedure Block. To keep things as simple as possible, let's split the procedure into two parts. The following procedure handles just one statement:

{ Parse and Translate a Single Statement }
procedure Statement;
begin
   Scan;
   case Token of
    'i': DoIf;
    'w': DoWhile;
    'R': DoRead;
    'W': DoWrite;
    'x': Assignment;
   end;
end;

Using this procedure, we can now rewrite Block like this:

{ Parse and Translate a Block of Statements }
procedure Block;
begin
   Statement;
   while Token = ';' do begin
      Next;
      Statement;
   end;
end;

That sure didn't hurt, did it? We can now parse semicolons in Pascal-like fashion.
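To compare the two grammars side by side without touching TINY, here is a minimal, self-contained sketch. It is not part of the TINY source. All the names in it (SemiDemo, Look, Expect, BlockTerminated, BlockSeparated, Accepts) are invented for this illustration. It also makes a simplifying assumption: each lowercase letter other than 'e' stands for one complete statement, and 'e' plays the role of END.

```pascal
program SemiDemo;
{ Toy recognizer, NOT the TINY parser. Each lowercase letter other
  than 'e' stands for one whole statement; 'e' stands for END. }
var
   Src: string;     { text being recognized }
   Idx: Integer;    { index of the current character }
   Ok: Boolean;     { cleared when an expected character is missing }

{ Return the current character, or #0 at end of input }
function Look: Char;
begin
   if Idx <= Length(Src) then Look := Src[Idx] else Look := #0;
end;

{ Require a specific character, as Match does for tokens }
procedure Expect(c: Char);
begin
   if Look = c then Inc(Idx) else Ok := False;
end;

{ <statement> ::= letter | null }
procedure Statement;
begin
   if (Look >= 'a') and (Look <= 'z') and (Look <> 'e') then Inc(Idx);
end;

{ C/Ada style: <block> ::= ( <statement> ';' )* }
procedure BlockTerminated;
begin
   while Ok and (Look <> 'e') and (Look <> #0) do begin
      Statement;
      Expect(';');
   end;
end;

{ Pascal style: <block> ::= <statement> ( ';' <statement> )* }
procedure BlockSeparated;
begin
   Statement;
   while Look = ';' do begin
      Inc(Idx);
      Statement;
   end;
end;

{ True if s is a valid block followed by 'e' under the chosen rule }
function Accepts(s: string; PascalStyle: Boolean): Boolean;
begin
   Src := s;
   Idx := 1;
   Ok := True;
   if PascalStyle then BlockSeparated else BlockTerminated;
   Expect('e');
   Accepts := Ok and (Idx > Length(Src));
end;

begin
   WriteLn('a;b;e as terminator: ', Accepts('a;b;e', False));
   WriteLn('a;be  as terminator: ', Accepts('a;be', False));
   WriteLn('a;be  as separator:  ', Accepts('a;be', True));
   WriteLn('a;b;e as separator:  ', Accepts('a;b;e', True));
end.
```

Only the second case is rejected: the terminator rule demands a semicolon after b. The last case shows why the null statement matters. Under the separator rule, the semicolon after b is legal only because an empty statement can follow it.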