3.5. Multi-Character Tokens

Throughout this series, I've been carefully restricting everything we do to single-character tokens, all the while assuring you that it wouldn't be difficult to extend to multi-character ones. I don't know if you believed me or not … I wouldn't really blame you if you were a bit skeptical. I'll continue to use that approach in the sessions that follow, because it helps keep complexity out of the way. But I'd like to back up those assurances, and wrap up this portion of the parser, by showing you just how easy that extension really is. In the process, we'll also provide for embedded white space. Before you make the next few changes, though, save the current version of the parser under another name. I have more uses for it in the next installment, where we'll go back to working with the single-character version.

Most compilers move the handling of the input stream into a separate module called the lexical scanner. The idea is that the scanner deals with all the character-by-character input and returns the separate units (tokens) of the stream. There may come a time when we'll want to do something like that, too, but for now there is no need. We can handle the multi-character tokens we need with very slight and very local modifications to GetName and GetNum.

The usual definition of an identifier is that the first character must be a letter, but the rest can be alphanumeric (letters or digits). To deal with this, we need one more recognizer function:

{ Recognize an Alphanumeric }
function IsAlNum(c: char): boolean;
begin
   IsAlNum := IsAlpha(c) or IsDigit(c);
end;

Add this function to your parser. I put mine just after IsDigit. While you're at it, you might as well make it a permanent member of Cradle, too.

Now, we need to modify function GetName to return a string instead of a character:

{ Get an Identifier }
function GetName: string;
var Token: string;
begin
   Token := '';
   if not IsAlpha(Look) then Expected('Name');
   while IsAlNum(Look) do begin
      Token := Token + UpCase(Look);
      GetChar;            { advance to the next input character }
   end;
   GetName := Token;
end;

Similarly, modify GetNum to read:

{ Get a Number }
function GetNum: string;
var Value: string;
begin
   Value := '';
   if not IsDigit(Look) then Expected('Integer');
   while IsDigit(Look) do begin
      Value := Value + Look;
      GetChar;            { advance to the next input character }
   end;
   GetNum := Value;
end;
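If you'd like to exercise the new routines in isolation before touching the rest of the parser, here is a minimal, self-contained sketch for Free Pascal or Turbo Pascal. It is not the author's code: Abort, Expected, IsAlpha and IsDigit are modeled on Cradle, but Cradle reads Look from the keyboard, whereas this sketch assumes a hypothetical in-memory buffer (Src, SrcPos, the Init helper and the EOS marker are all my own additions) so that it runs without interactive input.

```pascal
program MultiCharDemo;

{ ASSUMPTION: unlike Cradle, characters come from a hypothetical
  in-memory buffer (Src, SrcPos) instead of Read(Look), so the
  sketch runs without interactive input. }

const
   EOS = #0;          { hypothetical end-of-input marker }

var
   Look: char;        { lookahead character }
   Src: string;       { hypothetical input buffer }
   SrcPos: integer;   { index of the next character in Src }

{ Read new character from the buffer; EOS once it is exhausted }
procedure GetChar;
begin
   if SrcPos <= Length(Src) then begin
      Look := Src[SrcPos];
      Inc(SrcPos);
   end
   else
      Look := EOS;
end;

{ Load an input string and prime the lookahead (hypothetical helper) }
procedure Init(const S: string);
begin
   Src := S;
   SrcPos := 1;
   GetChar;
end;

{ Report an error and halt }
procedure Abort(s: string);
begin
   WriteLn;
   WriteLn('Error: ', s, '.');
   Halt(1);
end;

{ Report what was expected }
procedure Expected(s: string);
begin
   Abort(s + ' Expected');
end;

{ Recognize an alpha character }
function IsAlpha(c: char): boolean;
begin
   IsAlpha := UpCase(c) in ['A'..'Z'];
end;

{ Recognize a decimal digit }
function IsDigit(c: char): boolean;
begin
   IsDigit := c in ['0'..'9'];
end;

{ Recognize an alphanumeric }
function IsAlNum(c: char): boolean;
begin
   IsAlNum := IsAlpha(c) or IsDigit(c);
end;

{ Get an identifier: a letter followed by letters or digits }
function GetName: string;
var Token: string;
begin
   Token := '';
   if not IsAlpha(Look) then Expected('Name');
   while IsAlNum(Look) do begin
      Token := Token + UpCase(Look);
      GetChar;
   end;
   GetName := Token;
end;

{ Get a number: one or more decimal digits }
function GetNum: string;
var Value: string;
begin
   Value := '';
   if not IsDigit(Look) then Expected('Integer');
   while IsDigit(Look) do begin
      Value := Value + Look;
      GetChar;
   end;
   GetNum := Value;
end;

begin
   Init('Alpha12+345');
   WriteLn(GetName);   { prints ALPHA12; Look is now '+' }
   GetChar;            { step over the '+' }
   WriteLn(GetNum);    { prints 345 }
end.
```

Running it should print ALPHA12 and then 345. Notice that when GetName returns, Look already holds the first character past the identifier, which is why the driver calls GetChar once to step over the '+'.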

Amazingly enough, those are virtually all the changes required to the parser! The local variable Name in procedures Ident and Assignment was originally declared as char, and must now be declared as string[8]. (Clearly, we could make the string length longer if we chose, but most assemblers limit the length anyhow.) Make this change, then recompile and test. Now do you believe that it's a simple change?
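One side effect of the string[8] declaration is worth checking for yourself: Pascal short strings silently truncate a longer value on assignment, so two identifiers that agree in their first eight characters become the same name. The little program below is only an illustration; the identifier names in it are made up, and Long stands in for the result of GetName.

```pascal
program NameLengthDemo;

{ Illustration only: Long plays the role of GetName's result, and the
  identifier names are hypothetical. Assigning to a string[8] keeps
  only the first eight characters. }

var
   Long: string;              { stands in for GetName's result }
   Name1, Name2: string[8];   { declared like Name in Ident/Assignment }
begin
   Long := 'TOTALSUM1';
   Name1 := Long;             { truncated to 'TOTALSUM' }
   Long := 'TOTALSUM2';
   Name2 := Long;             { also truncated to 'TOTALSUM' }
   if Name1 = Name2 then
      WriteLn('Collision: both are ', Name1);
end.
```

It should print "Collision: both are TOTALSUM". If that matters for the programs you compile, lengthening the declaration pushes the limit out, subject to whatever your assembler accepts.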