13.4. Declaring A Procedure

If you're satisfied that our little program works, then it's time to deal with the procedures. Since we haven't talked about parameters yet, we'll begin by considering only procedures that have no parameter lists.

As a start, let's consider a simple program with a procedure, and think about the code we'd like to see generated for it:

     PROGRAM FOO;
     .
     .
     PROCEDURE BAR;                     BAR:
     BEGIN                                   .
     .                                       .
     .                                       .
     END;                                    RTS

     BEGIN { MAIN PROGRAM }             MAIN:
     .                                       .
     .                                       .
     BAR;                                    BSR BAR
     .                                       .
     .                                       .
     END.                                    END MAIN

Here I've shown the high-order language constructs on the left, and the desired assembler code on the right. The first thing to notice is that we certainly don't have much code to generate here! For the great bulk of both the procedure and the main program, our existing constructs take care of the code to be generated.

The key to dealing with the body of the procedure is to recognize that although a procedure may be quite long, declaring it is really no different than declaring a variable. It's just one more kind of declaration. We can write the BNF:

     <declaration> ::= <data decl> | <procedure>

This means that it should be easy to modify TopDecls to deal with procedures. What about the syntax of a procedure? Well, here's a suggested syntax, which is essentially that of Pascal:

     <procedure> ::= PROCEDURE <ident> <begin-block>

There is practically no code generation required, other than that generated within the begin-block. We need only emit a label at the beginning of the procedure, and an RTS at the end.

Here's the required code:

{ Parse and Translate a Procedure Declaration }
procedure DoProc;
var N: char;
begin
     Match('p');
     N := GetName;
     Fin;
     if InTable(N) then Duplicate(N);
     ST[N] := 'p';
     PostLabel(N);
     BeginBlock;
     Return;
end;

Note that I've added a new code generation routine, Return, which merely emits an RTS instruction. The creation of that routine is "left as an exercise for the student."

To finish this version, add the following line within the Case statement in TopDecls:

            'p': DoProc;
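Putting DoProc, Return and the new 'p' arm together, the following is a minimal self-contained sketch, not the tutorial's actual framework. It assumes Free Pascal in objfpc mode (for exceptions and Result) and a hypothetical toy input: single-character tokens ('v' for VAR, 'p' for PROCEDURE, 'b' and 'e' for BEGIN and END, '.' to finish), blanks skipped, and statement bodies skipped rather than translated. ParseError, Declare, Translate and the 'DC 0' storage line are illustrative inventions; Return is just the one-line routine that emits RTS.

```pascal
program ProcDecls;
{$mode objfpc}{$H+}
{ Hypothetical toy: 'v' VAR, 'p' PROCEDURE, 'b' BEGIN, 'e' END, '.' ends input }
uses SysUtils;

var
  Src, Code: string;             { input text; generated assembler text }
  Idx: Integer;                  { position of Look within Src }
  Look: Char;                    { lookahead character }
  ST: array['a'..'z'] of Char;   { symbol table; ' ' means undeclared }

procedure ParseError(const Msg: string);
begin raise Exception.Create(Msg); end;
procedure GetChar;               { next non-blank char; '.' past the end }
begin
  repeat
    Inc(Idx);
    if Idx <= Length(Src) then Look := Src[Idx] else Look := '.';
  until Look <> ' ';
end;
procedure Match(X: Char);
begin
  if Look <> X then ParseError('''' + X + ''' expected');
  GetChar;
end;
function GetName: Char;
begin
  if not (Look in ['a'..'z']) then ParseError('Name expected');
  Result := Look;
  GetChar;
end;
procedure EmitLn(const S: string);      { tab-indented instruction }
begin Code := Code + #9 + S + LineEnding; end;
procedure PostLabel(const L: string);   { label in column 1 }
begin Code := Code + L + ':' + LineEnding; end;
procedure Return;                       { the "exercise": just emit RTS }
begin EmitLn('RTS'); end;

procedure Declare(N, Kind: Char);       { reject duplicates, then record }
begin
  if ST[N] <> ' ' then ParseError('Duplicate identifier ' + N);
  ST[N] := Kind;
end;

procedure Decl;                         { <data decl> ::= VAR <ident> }
var N: Char;
begin
  Match('v');
  N := GetName;
  Declare(N, 'v');
  Code := Code + N + ':' + #9 + 'DC 0' + LineEnding;
end;

procedure BeginBlock;                   { BEGIN ... END; statements skipped }
begin
  Match('b');
  while not (Look in ['e', '.']) do GetChar;
  Match('e');
end;

procedure DoProc;                       { PROCEDURE <ident> <begin-block> }
var N: Char;
begin
  Match('p');
  N := GetName;
  Declare(N, 'p');
  PostLabel(N);
  BeginBlock;
  Return;
end;

procedure TopDecls;                     { <data decl> | <procedure>, until '.' }
begin
  while Look <> '.' do
    case Look of
      'v': Decl;
      'p': DoProc;
    else ParseError('Unrecognized keyword ' + Look);
    end;
end;

function Translate(const S: string): string;
var C: Char;
begin
  Src := S; Idx := 0; Code := '';
  for C := 'a' to 'z' do ST[C] := ' ';
  try
    GetChar;
    TopDecls;
    Result := Code;
  except
    on E: Exception do Result := 'Error: ' + E.Message;
  end;
end;

begin
  Write(Translate('v x p a b e p c b e .'));
  WriteLn(Translate('p a b e p a b e .'));
end.
```

Translating 'v x p a b e p c b e .' yields a DC 0 line for x followed by a label and an RTS for each of a and c, while declaring a twice gives 'Error: Duplicate identifier a'.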

I should mention that this structure for declarations, and the BNF that drives it, differs from standard Pascal. In the Jensen & Wirth definition of Pascal, variable declarations, in fact ALL kinds of declarations, must appear in a specific sequence, i.e. labels, constants, types, variables, procedures, and main program. To follow such a scheme, we would separate the data and procedure declarations, and have code in the main program something like

     DoVars;
     DoProcs;
     DoMain;
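For comparison, here is a rough sketch of that stricter Jensen & Wirth arrangement, using the same kind of hypothetical one-character toy (Free Pascal, objfpc mode; no symbol table; statement bodies skipped). Only the names DoVars, DoProcs and DoMain come from the text; everything else is illustrative. Each phase is a loop that consumes only its own kind of declaration, so an out-of-order declaration is left for a later phase, which rejects it.

```pascal
program StrictOrder;
{$mode objfpc}{$H+}
{ Hypothetical toy: 'v' VAR, 'p' PROCEDURE, 'b' BEGIN, 'e' END, '.' ends input.
  No symbol table here, and statement bodies are skipped. }
uses SysUtils;

var
  Src, Code: string;   { input text; generated assembler text }
  Idx: Integer;        { position of Look within Src }
  Look: Char;          { lookahead character }

procedure ParseError(const Msg: string);
begin raise Exception.Create(Msg); end;

procedure GetChar;     { next non-blank char; '.' past the end }
begin
  repeat
    Inc(Idx);
    if Idx <= Length(Src) then Look := Src[Idx] else Look := '.';
  until Look <> ' ';
end;

procedure Match(X: Char);
begin
  if Look <> X then ParseError('''' + X + ''' expected');
  GetChar;
end;

function GetName: Char;
begin
  if not (Look in ['a'..'z']) then ParseError('Name expected');
  Result := Look;
  GetChar;
end;

procedure EmitLn(const S: string);
begin Code := Code + #9 + S + LineEnding; end;
procedure PostLabel(const L: string);
begin Code := Code + L + ':' + LineEnding; end;

procedure BeginBlock;  { BEGIN ... END; statement bodies are skipped }
begin
  Match('b');
  while not (Look in ['e', '.']) do GetChar;
  Match('e');
end;

{ Jensen & Wirth style: one phase per kind of declaration }
procedure DoVars;      { all VAR declarations come first }
var N: Char;
begin
  while Look = 'v' do begin
    Match('v');
    N := GetName;
    Code := Code + N + ':' + #9 + 'DC 0' + LineEnding;
  end;
end;

procedure DoProcs;     { then all PROCEDURE declarations }
begin
  while Look = 'p' do begin
    Match('p');
    PostLabel(GetName);
    BeginBlock;
    EmitLn('RTS');
  end;
end;

procedure DoMain;      { then the main program's BEGIN-block }
begin
  PostLabel('MAIN');
  BeginBlock;
  EmitLn('END MAIN');
end;

function Translate(const S: string): string;
begin
  Src := S; Idx := 0; Code := '';
  try
    GetChar;
    DoVars;
    DoProcs;
    DoMain;
    if Look <> '.' then ParseError('''.'' expected');
    Result := Code;
  except
    on E: Exception do Result := 'Error: ' + E.Message;
  end;
end;

begin
  Write(Translate('v x p a b e b e .'));
  WriteLn(Translate('p a b e v x b e .'));
end.
```

Here 'v x p a b e b e .' is accepted, but 'p a b e v x b e .' fails with 'b' expected: by the time the VAR turns up, the parser has already moved on to the main program.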

However, most implementations of Pascal, including Turbo, don't require that order and let you freely mix up the various declarations, as long as you still don't try to refer to something before it's declared. Although it may be more aesthetically pleasing to declare all the global variables at the top of the program, it certainly doesn't do any HARM to allow them to be sprinkled around. In fact, it may do some GOOD, in the sense that it gives you the opportunity to do a little rudimentary information hiding. Variables that should be accessed only by the main program, for example, can be declared just before it and will thus be inaccessible by the procedures.

OK, try this new version out. Note that we can declare as many procedures as we choose (as long as we don't run out of single-character names!), and the labels and RTS's all come out in the right places.

It's worth noting here that I do not allow for nested procedures. In TINY, all procedures must be declared at the global level, the same as in C. There has been quite a discussion about this point in the Computer Language Forum of CompuServe. It turns out that there is a significant penalty in complexity that must be paid for the luxury of nested procedures. What's more, this penalty gets paid at RUN TIME, because extra code must be added and executed every time a procedure is called. I also happen to believe that nesting is not a good idea, simply on the grounds that I have seen too many abuses of the feature.

Before going on to the next step, it's also worth noting that the "main program" as it stands is incomplete, since it doesn't have the label and END statement. Let's fix that little oversight:

{ Parse and Translate a Main Program }
procedure DoMain;
begin
     Match('b');
     Fin;
     Prolog;
     DoBlock;
     Epilog;
end;
.
.
.
{ Main Program }
begin
     Init;
     TopDecls;
     DoMain;
end.

Note that DoProc and DoMain are not quite symmetrical. DoProc uses a call to BeginBlock, whereas DoMain cannot. That's because a procedure is signaled by the keyword PROCEDURE (abbreviated by a 'p' here), while the main program gets no keyword other than the BEGIN itself.

And that brings up an interesting question: why?

If we look at the structure of C programs, we find that all functions are treated just alike, except that the main program happens to be identified by its name, “main”. Since C functions can appear in any order, the main program can also be anywhere in the compilation unit.

In Pascal, on the other hand, all variables and procedures must be declared before they're used, which means that there is no point putting anything after the main program … it could never be accessed. The "main program" is not identified at all, other than being that part of the code that comes after the global BEGIN. In other words, if it ain't anything else, it must be the main program.

This causes no small amount of confusion for beginning programmers, and for big Pascal programs sometimes it's difficult to find the beginning of the main program at all. This leads to conventions such as identifying it in comments:

     BEGIN { of MAIN }

This has always seemed to me to be a bit of a kludge. The question comes up: Why should the main program be treated so much differently than a procedure? In fact, now that we've recognized that procedure declarations are just that … part of the global declarations … isn't the main program just one more declaration, also?

The answer is yes, and by treating it that way, we can simplify the code and make it considerably more orthogonal. I propose that we use an explicit keyword, PROGRAM, to identify the main program (Note that this means that we can't start the file with it, as in Pascal). In this case, our BNF becomes:

     <declaration> ::= <data decl> | <procedure> | <main program>

     <procedure> ::= PROCEDURE <ident> <begin-block>

     <main program> ::= PROGRAM <ident> <begin-block>

The code also looks much better, at least in the sense that DoMain and DoProc look more alike:

{ Parse and Translate a Main Program }
procedure DoMain;
var N: char;
begin
     Match('P');
     N := GetName;
     Fin;
     if InTable(N) then Duplicate(N);
     Prolog;
     BeginBlock;
end;
.
.
.
{ Parse and Translate Global Declarations }
procedure TopDecls;
begin
     while Look <> '.' do begin
          case Look of
            'v': Decl;
            'p': DoProc;
            'P': DoMain;
          else Abort('Unrecognized Keyword ' + Look);
          end;
          Fin;
     end;
end;


{ Main Program }
begin
     Init;
     TopDecls;
     Epilog;
end.

Since the declaration of the main program is now within the loop of TopDecls, that does present some difficulties. How do we ensure that it's the last thing in the file? And how do we ever exit from the loop? My answer for the second question, as you can see, was to bring back our old friend the period. Once the parser sees that, we're done.

To answer the first question: it depends on how far we're willing to go to protect the programmer from dumb mistakes. In the code that I've shown, there's nothing to keep the programmer from adding code after the main program … even another main program. The code will just not be accessible. However, we COULD access it via a FORWARD statement, which we'll be providing later. As a matter of fact, many assembler language programmers like to use the area just after the program to declare large, uninitialized data blocks, so there may indeed be some value in not requiring the main program to be last. We'll leave it as it is.

If we decide that we should give the programmer a little more help than that, it's pretty easy to add some logic to kick us out of the loop once the main program has been processed. Or we could at least flag an error if someone tries to include two mains.
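As a sketch of the second option, again assuming a hypothetical one-character toy in Free Pascal (objfpc mode), with 'P' standing for PROGRAM, no symbol table, and statement bodies skipped, TopDecls can carry a MainSeen flag and reject a second PROGRAM while still permitting other declarations after the main program. MainSeen, ParseError and Translate are illustrative names, not part of the tutorial's code.

```pascal
program OneMain;
{$mode objfpc}{$H+}
{ Hypothetical toy: 'v' VAR, 'p' PROCEDURE, 'P' PROGRAM, 'b'/'e' BEGIN/END, '.' ends input }
uses SysUtils;
var
  Src, Code: string;   { input text; generated assembler text }
  Idx: Integer;        { position of Look within Src }
  Look: Char;          { lookahead character }

procedure ParseError(const Msg: string);
begin raise Exception.Create(Msg); end;
procedure GetChar;     { next non-blank char; '.' past the end }
begin
  repeat
    Inc(Idx);
    if Idx <= Length(Src) then Look := Src[Idx] else Look := '.';
  until Look <> ' ';
end;
procedure Match(X: Char);
begin
  if Look <> X then ParseError('''' + X + ''' expected');
  GetChar;
end;
function GetName: Char;
begin
  if not (Look in ['a'..'z']) then ParseError('Name expected');
  Result := Look;
  GetChar;
end;
procedure EmitLn(const S: string);
begin Code := Code + #9 + S + LineEnding; end;
procedure PostLabel(const L: string);
begin Code := Code + L + ':' + LineEnding; end;
procedure BeginBlock;  { BEGIN ... END; statement bodies are skipped }
begin
  Match('b');
  while not (Look in ['e', '.']) do GetChar;
  Match('e');
end;

procedure Decl;        { VAR <ident> }
var N: Char;
begin
  Match('v');
  N := GetName;
  Code := Code + N + ':' + #9 + 'DC 0' + LineEnding;
end;

procedure DoProc;      { PROCEDURE <ident> <begin-block> }
begin
  Match('p');
  PostLabel(GetName);
  BeginBlock;
  EmitLn('RTS');
end;

procedure DoMain;      { PROGRAM <ident> <begin-block> }
var N: Char;
begin
  Match('P');
  N := GetName;        { program name is read but not used further here }
  PostLabel('MAIN');   { Prolog }
  BeginBlock;
end;

procedure TopDecls;
var MainSeen: Boolean; { has a PROGRAM already been translated? }
begin
  MainSeen := False;
  while Look <> '.' do
    case Look of
      'v': Decl;
      'p': DoProc;
      'P': begin
             if MainSeen then ParseError('Duplicate main program');
             MainSeen := True;
             DoMain;
           end;
    else ParseError('Unrecognized keyword ' + Look);
    end;
end;

function Translate(const S: string): string;
begin
  Src := S; Idx := 0; Code := '';
  try
    GetChar;
    TopDecls;
    EmitLn('END MAIN');  { Epilog }
    Result := Code;
  except
    on E: Exception do Result := 'Error: ' + E.Message;
  end;
end;

begin
  Write(Translate('v x p a b e P m b e .'));
  WriteLn(Translate('P m b e P n b e .'));
end.
```

With this change 'P m b e P n b e .' is rejected with 'Duplicate main program', whereas 'P m b e p c b e .' is still accepted, matching the decision above to leave code after the main program legal. Kicking out of the loop instead would roughly be a matter of also testing MainSeen in the while condition.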