7.10. Returning A Character

Essentially every scanner I've ever seen that was written in Pascal used the mechanism of an enumerated type that I've just described. It is certainly a workable mechanism, but it doesn't seem the simplest approach to me.

For one thing, the list of possible symbol types can get pretty long. Here, I've used just one symbol, Operator, to stand for all of the operators, but I've seen other designs that actually return different codes for each one.

There is, of course, another simple type that can be returned as a code: the character. Instead of returning the enumeration value Operator for a + sign, what's wrong with just returning the character itself? A character is just as good a variable for encoding the different token types, it can be used in case statements easily, and it's sure a lot easier to type. What could be simpler?

Besides, we've already had experience with the idea of encoding keywords as single characters. Our previous programs are already written that way, so using this approach will minimize the changes to what we've already done.
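To make the case-statement point concrete, here is a minimal sketch of dispatching on a character token. The function name Describe and its messages are made up for illustration, and the code 'i' for IF is assumed from the single-character keyword encoding used in the earlier programs.

```pascal
program CharTokenDemo;

{ Hypothetical helper: describes a one-character token code }
function Describe(Token: char): string;
begin
   case Token of
      '+', '-': Describe := 'additive operator';
      '*', '/': Describe := 'multiplicative operator';
      'i':      Describe := 'keyword IF';   { assumed code for IF }
   else
      Describe := 'something else';
   end;
end;

begin
   Writeln(Describe('+'));
   Writeln(Describe('*'));
   Writeln(Describe('i'));
   Writeln(Describe('x'));
end.
```

The case labels are the characters themselves, so no separate list of symbol names has to be declared or kept in sync.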

Some of you may feel that this idea of returning character codes is too mickey-mouse. I must admit it gets a little awkward for multi-character operators like <=. If you choose to stay with the enumerated type, fine. For the rest, I'd like to show you how to change what we've done above to support that approach.

First, you can delete the SymType declaration now … we won't be needing that. And you can change the type of Token to char.

Next, to replace SymType, add the following constant string:

   const KWcode: string[5] = 'xilee';

Note: I'll be encoding all idents with the single character x.
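The way KWcode is indexed may be easier to see in isolation. The sketch below assumes the keyword table from earlier in this chapter ('IF', 'ELSE', 'ENDIF', 'END') and uses a hypothetical FindKW function as a stand-in for Lookup, so that the program is self-contained. A result of 0 (not a keyword) picks the first character of KWcode, which is why 1 is added to the index.

```pascal
program KWcodeDemo;

{ Assumed declarations, modeled on the earlier scanner in this chapter }
type
   Symbol = string[8];

const
   NKW = 4;
   KWlist: array[1..NKW] of Symbol = ('IF', 'ELSE', 'ENDIF', 'END');
   KWcode: string[5] = 'xilee';

{ Hypothetical stand-in for Lookup: index of s in KWlist, or 0 if absent }
function FindKW(s: Symbol): integer;
var i: integer;
begin
   FindKW := 0;
   for i := 1 to NKW do
      if KWlist[i] = s then FindKW := i;
end;

{ Index 0 selects 'x'; indexes 1..4 select 'i', 'l', 'e', 'e' }
function CodeFor(s: Symbol): char;
begin
   CodeFor := KWcode[FindKW(s) + 1];
end;

begin
   Writeln('IF    -> ', CodeFor('IF'));
   Writeln('ELSE  -> ', CodeFor('ELSE'));
   Writeln('ENDIF -> ', CodeFor('ENDIF'));
   Writeln('END   -> ', CodeFor('END'));
   Writeln('COUNT -> ', CodeFor('COUNT'));
end.
```

Notice that ENDIF and END share the code e, so a caller that needs to tell them apart still has to look at Value.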

Lastly, modify Scan and its relatives as follows:

{ Get an Identifier }
procedure GetName;
begin
   Value := '';
   if not IsAlpha(Look) then Expected('Name');
   while IsAlNum(Look) do begin
     Value := Value + UpCase(Look);
     GetChar;
   end;
   Token := KWcode[Lookup(Addr(KWlist), Value, 4) + 1];
end;

{ Get a Number }
procedure GetNum;
begin
   Value := '';
   if not IsDigit(Look) then Expected('Integer');
   while IsDigit(Look) do begin
     Value := Value + Look;
     GetChar;
   end;
   Token := '#';
end;

{ Get an Operator }
procedure GetOp;
begin
   Value := '';
   if not IsOp(Look) then Expected('Operator');
   while IsOp(Look) do begin
     Value := Value + Look;
     GetChar;
   end;
   if Length(Value) = 1 then
      Token := Value[1]
   else
      Token := '?';
end;

{ Lexical Scanner }
procedure Scan;
begin
   while Look = CR do
      Fin;
   if IsAlpha(Look) then
      GetName
   else if IsDigit(Look) then
      GetNum
   else if IsOp(Look) then
      GetOp
   else begin
      Value := Look;
      Token := '?';
      GetChar;
   end;
   SkipWhite;
end;

{ Main Program }
begin
   Init;
   repeat
      Scan;
      case Token of
        'x': Write('Ident ');
        '#': Write('Number ');
        'i', 'l', 'e': Write('Keyword ');
        else Write('Operator ');
      end;
      Writeln(Value);
   until Value = 'END';
end.

This program should work the same as the previous version. A minor difference in structure, maybe, but it seems more straightforward to me.
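As for the awkwardness with multi-character operators: GetOp above gives any operator longer than one character the code ?, so the caller has to inspect Value to find out which one it was. One possible way around that, sketched here purely as an assumption, is to assign each multi-character operator a code character of its own. The function OpCode and the codes L, G and N are hypothetical and are not used anywhere else in this series.

```pascal
program OpCodeDemo;

{ Hypothetical helper, not part of the scanner above: chooses a
  one-character token code for the operator text collected in Value }
function OpCode(S: string): char;
begin
   if Length(S) = 1 then
      OpCode := S[1]          { a single character stands for itself }
   else if S = '<=' then
      OpCode := 'L'           { assumed code for <= }
   else if S = '>=' then
      OpCode := 'G'           { assumed code for >= }
   else if S = '<>' then
      OpCode := 'N'           { assumed code for <> }
   else
      OpCode := '?';          { unrecognized operator text }
end;

begin
   Writeln('+  -> ', OpCode('+'));
   Writeln('<= -> ', OpCode('<='));
   Writeln('<> -> ', OpCode('<>'));
   Writeln('** -> ', OpCode('**'));
end.
```

The cost of this approach is that the codes are arbitrary and have to be remembered, which is exactly the kind of bookkeeping the enumerated type handles by name.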