16.5. Booleans

The next step, as we've learned several times before, is to add Boolean algebra. In the past, this step has at least doubled the amount of code we've had to write. As I've gone over this step in my mind, I've found myself diverging more and more from what we did in previous installments. To refresh your memory, I noted that Pascal treats the Boolean operators pretty much identically to the way it treats arithmetic ones. A Boolean “and” has the same precedence level as multiplication, and the “or” as addition. C, on the other hand, sets them at different precedence levels, and all told has a whopping 17 levels. In our earlier work, I chose something in between, with seven levels. As a result, we ended up with things called Boolean expressions, paralleling in most details the arithmetic expressions, but at a different precedence level. All of this, as it turned out, came about because I didn't like having to put parentheses around the Boolean expressions in statements like:

IF (c >= 'A') and (c <= 'Z') then ...

In retrospect, that seems a pretty petty reason to add many layers of complexity to the parser. Perhaps more to the point, I'm not sure I was even able to avoid the parens.

For kicks, let's start anew, taking a more Pascal-ish approach, and just treat the Boolean operators at the same precedence level as the arithmetic ones. We'll see where it leads us. If it seems to be down the garden path, we can always backtrack to the earlier approach.

For starters, we'll add the “addition-level” operators to Expression. That's easily done; first, modify the function IsAddop in unit Scanner to include two extra operators: '|' for “or,” and '~' for “exclusive or”:

function IsAddop(c: char): boolean;
begin
        IsAddop := c in ['+','-', '|', '~'];
end;

Next, we must include the parsing of the operators in procedure Expression:

procedure Expression;
begin
        SignedTerm;
        while IsAddop(Look) do
                case Look of
                        '+': Add;
                        '-': Subtract;
                        '|': _Or;
                        '~': _Xor;
                end;
        end; 
end;

Note

The underscores are needed, of course, because “or” and “xor” are reserved words in Turbo Pascal.

Next, the procedures _Or and _Xor:

{ Parse and Translate a Subtraction Operation }
procedure _Or;
begin
        Match('|');
        Push;
        Term;
        PopOr;
end;
 
{ Parse and Translate a Subtraction Operation }
procedure _Xor;
begin
        Match('~');
        Push;
        Term;
        PopXor;
end;

And, finally, the new code generator procedures:

{ Or TOS with Primary }
procedure PopOr;
begin
        EmitLn('OR (SP)+,D0');
end;
 
{ Exclusive-Or TOS with Primary }
procedure PopXor;
begin
        EmitLn('EOR (SP)+,D0');
end;

Now, let's test the translator (you might want to change the call in Main back to a call to Expression, just to avoid having to type "x=" for an assignment every time).

So far, so good. The parser nicely handles expressions of the form:

x|y~z

Unfortunately, it also does nothing to protect us from mixing Boolean and arithmetic algebra. It will merrily generate code for:

(a+b)*(c~d)

We've talked about this a bit, in the past. In general the rules for what operations are legal or not cannot be enforced by the parser itself, because they are not part of the syntax of the language, but rather its semantics. A compiler that doesn't allow mixed-mode expressions of this sort must recognize that c and d are Boolean variables, rather than numeric ones, and balk at multiplying them in the next step. But this “policing” can't be done by the parser; it must be handled somewhere between the parser and the code generator. We aren't in a position to enforce such rules yet, because we haven't got either a way of declaring types, or a symbol table to store the types in. So, for what we've got to work with at the moment, the parser is doing precisely what it's supposed to do.

Anyway, are we sure that we don't want to allow mixed-type operations? We made the decision some time ago (or, at least, I did) to adopt the value 0000 as a Boolean “false,” and -1, or FFFFh, as a Boolean “true.” The nice part about this choice is that bitwise operations work exactly the same way as logical ones. In other words, when we do an operation on one bit of a logical variable, we do it on all of them. This means that we don't need to distinguish between logical and bitwise operations, as is done in C with the operators & and &&, and | and ||. Reducing the number of operators by half certainly doesn't seem all bad.

From the point of view of the data in storage, of course, the computer and compiler couldn't care less whether the number FFFFh represents the logical TRUE, or the numeric -1. Should we? I sort of think not. I can think of many examples (though they might be frowned upon as “tricky” code) where the ability to mix the types might come in handy. Example, the Dirac delta function, which could be coded in one simple line:

-(x=0)

or the absolute value function (definitely tricky code!):

x*(1+2*(x<0))

Please note, I'm not advocating coding like this as a way of life. I'd almost certainly write these functions in more readable form, using IFs, just to keep from confusing later maintainers. Still, a moral question arises: Do we have the right to enforce our ideas of good coding practice on the programmer, but writing the language so he can't do anything else? That's what Nicklaus Wirth did, in many places in Pascal, and Pascal has been criticized for it — for not being as “forgiving” as C.

An interesting parallel presents itself in the example of the Motorola 68000 design. Though Motorola brags loudly about the orthogonality of their instruction set, the fact is that it's far from orthogonal. For example, you can read a variable from its address:

        MOVE X,D0 (where X is the name of a variable)

but you can't write in the same way. To write, you must load an address register with the address of X. The same is true for PC-relative addressing:

        MOVE X(PC),DO	(legal)
        MOVE D0,X(PC)	(illegal)

When you begin asking how such non-orthogonal behavior came about, you find that someone in Motorola had some theories about how software should be written. Specifically, in this case, they decided that self-modifying code, which you can implement using PC-relative writes, is a Bad Thing. Therefore, they designed the processor to prohibit it. Unfortunately, in the process they also prohibited all writes of the forms shown above, however benign. Note that this was not something done by default. Extra design work had to be done, and extra gates added, to destroy the natural orthogonality of the instruction set.

One of the lessons I've learned from life: If you have two choices, and can't decide which one to take, sometimes the best thing to do is nothing. Why add extra gates to a processor to enforce some stranger's idea of good programming practice? Leave the instructions in, and let the programmers debate what good programming practice is. Similarly, why should we add extra code to our parser, to test for and prevent conditions that the user might prefer to do, anyway? I'd rather leave the compiler simple, and let the software experts debate whether the practices should be used or not.

All of which serves as rationalization for my decision as to how to prevent mixed-type arithmetic: I won't. For a language intended for systems programming, the fewer rules, the better. If you don't agree, and want to test for such conditions, we can do it once we have a symbol table.