OK, assuming that you're willing to accept the grammar I've shown here, we now have syntax rules for both arithmetic and Boolean algebra. The sticky part comes in when we have to combine the two. Why do we have to do that? Well, the whole subject came up because of the need to process the “predicates” (conditions) associated with control statements such as the IF. The predicate is required to have a Boolean value; that is, it must evaluate to either TRUE or FALSE. The branch is then taken or not taken, depending on that value. What we expect to see going on in procedure Condition, then, is the evaluation of a Boolean expression.
But there's more to it than that. A pure Boolean expression can indeed be the predicate of a control statement … things like
IF a AND NOT b THEN …
But more often, we see Boolean algebra show up in such things as
IF (x >= 0) and (x <= 100) THEN …
Here, the two terms in parens are Boolean expressions, but the individual terms being compared: x, 0, and 100, are numeric in nature. The relational operators >= and <= are the catalysts by which the Boolean and the arithmetic ingredients get merged together.
Now, in the example above, the terms being compared are just that: terms. However, in general each side can be a math expression. So we can define a relation to be:
<relation> ::= <expression> <relop> <expression> |
where the expressions we're talking about here are the old numeric type, and the relops are any of the usual symbols
=, <> ( or !=), <, >, <=, and >=
If you think about it a bit, you'll agree that, since this kind of predicate has a single Boolean value, TRUE or FALSE, as its result, it is really just another kind of factor. So we can expand the definition of a Boolean factor above to read:
<b-factor> ::= <b-literal> | <b-variable> | (<b-expression>) | <relation> |
That's the connection! The relops and the relation they define serve to wed the two kinds of algebra. It is worth noting that this implies a hierarchy where the arithmetic expression has a higher precedence that a Boolean factor, and therefore than all the Boolean operators. If you write out the precedence levels for all the operators, you arrive at the following list:
Level | Syntax Element | Operator |
---|---|---|
0 | factor | literal, variable |
1 | signed factor | unary minus |
2 | term | *, / |
3 | expression | +, - |
4 | b-factor | literal, variable, relop |
5 | not-factor | NOT |
6 | b-term | AND |
7 | b-expression | OR, XOR |
If we're willing to accept that many precedence levels, this grammar seems reasonable. Unfortunately, it won't work! The grammar may be great in theory, but it's no good at all in the practice of a top-down parser. To see the problem, consider the code fragment:
IF ((((((A + B + C) < 0 ) AND …. |
When the parser is parsing this code, it knows after it sees the IF token that a Boolean expression is supposed to be next. So it can set up to begin evaluating such an expression. But the first expression in the example is an arithmetic expression, A + B + C. What's worse, at the point that the parser has read this much of the input line:
IF ((((((A |
it still has no way of knowing which kind of expression it's dealing with. That won't do, because we must have different recognizers for the two cases. The situation can be handled without changing any of our definitions, but only if we're willing to accept an arbitrary amount of backtracking to work our way out of bad guesses. No compiler writer in his right mind would agree to that.
What's going on here is that the beauty and elegance of BNF grammar has met face to face with the realities of compiler technology.
To deal with this situation, compiler writers have had to make compromises so that a single parser can handle the grammar without backtracking.