14.16. To Coerce Or Not To Coerce

In case you haven't gotten this message yet, it sure appears that TINY and KISS will probably not be strongly typed languages, since I've allowed for automatic mixing and conversion of just about any type. Which brings up the next issue:

Is this really what we want to do?

The answer depends on what kind of language you want, and the way you'd like it to behave. What we have not addressed is the issue of when to allow and when to deny the use of operations involving different data types. In other words, what should be the semantics of our compiler? Do we want automatic type conversion for all cases, for some cases, or not at all?

Let's pause here to think about this a bit more. To do so, it will help to look at a bit of history.

FORTRAN II supported only two simple data types: Integer and Real. It allowed implicit type conversion between real and integer types during assignment, but not within expressions. All data items (including literal constants) on the right-hand side of an assignment statement had to be of the same type. That made things pretty easy … much simpler than what we've had to do here.

This was changed in FORTRAN IV to support “mixed-mode” arithmetic. If an expression had any real data items in it, they were all converted to reals and the expression itself was real. To round out the picture, functions were provided to explicitly convert from one type to the other, so that you could force an expression to end up as either type.

This led to two things: code that was easier to write, and code that was less efficient. That's because sloppy programmers would write expressions with simple constants like 0 and 1 in them, which the compiler would dutifully compile to convert at execution time. Still, the system worked pretty well, which would tend to indicate that implicit type conversion is a Good Thing.

C is also a weakly typed language, though it supports a larger number of types. C won't complain if you try to add a character to an integer, for example. Partly, this is helped by the C convention of promoting every char to integer when it is loaded, or passed through a parameter list. This simplifies the conversions quite a bit. In fact, in subset C compilers that don't support long or float types, we end up back where we were in our earlier, simple-minded first try: every variable has the same representation, once loaded into a register. Makes life pretty easy!

The ultimate language in the direction of automatic type conversion is PL/I. This language supports a large number of data types, and you can mix them all freely. If the implicit conversions of FORTRAN seemed good, then those of PL/I should have been Heaven, but it turned out to be more like Hell! The problem was that with so many data types, there had to be a large number of different conversions, and a correspondingly large number of rules about how mixed operands should be converted. These rules became so complex that no one could remember what they were! A lot of the errors in PL/I programs had to do with unexpected and unwanted type conversions. Too much of a Good Thing can be bad for you!

Pascal, on the other hand, is a language which is “strongly typed,” which means that in general you can't mix types, even if they differ only in name, and yet have the same base type! Niklaus Wirth made Pascal strongly typed to help keep programmers out of trouble, and the restrictions have indeed saved many a programmer from himself, because the compiler kept him from doing something dumb. Better to find the bug in compilation rather than the debug phase. The same restrictions can also cause frustration when you really want to mix types, and they tend to drive an ex-C-programmer up the wall.

Even so, Pascal does permit some implicit conversions. You can assign an integer to a real value. You can also mix integer and real types in expressions of type Real. The integers will be automatically coerced to real, just as in FORTRAN (and with the same hidden cost in run-time overhead).

You can't, however, convert the other way, from real to integer, without applying an explicit conversion function, Trunc. The theory here is that, since the numerical value of a real number is necessarily going to be changed by the conversion (the fractional part will be lost), you really shouldn't do it in “secret.

In the spirit of strong typing, Pascal will not allow you to mix Char and Integer variables, without applying the explicit coercion functions Chr and Ord.

Turbo Pascal also includes the types Byte, Word, and LongInt. The first two are basically the same as unsigned integers. In Turbo, these can be freely intermixed with variables of type Integer, and Turbo will automatically handle the conversion. There are run-time checks, though, to keep you from overflowing or otherwise getting the wrong answer. Note that you still can't mix Byte and Char types, even though they are stored internally in the same representation.

The ultimate in a strongly-typed language is Ada, which allows no implicit type conversions at all, and also will not allow mixed-mode arithmetic. Jean Ichbiah's position is that conversions cost execution time, and you shouldn't be allowed to build in such cost in a hidden manner. By forcing the programmer to explicitly request a type conversion, you make it more apparent that there could be a cost involved.

I have been using another strongly-typed language, a delightful little language called Whimsical, by John Spray. Although Whimsical is intended as a systems programming language, it also requires explicit conversion every time. There are NEVER any automatic conversions, even the ones supported by Pascal.

This approach does have certain advantages: The compiler never has to guess what to do: the programmer always tells it precisely what he wants. As a result, there tends to be a more nearly one-to-one correspondence between source code and compiled code, and John's compiler produces very tight code.

On the other hand, I sometimes find the explicit conversions to be a pain. If I want, for example, to add one to a character, or and it with a mask, there are a lot of conversions to make. If I get it wrong, the only error message is “Types are not compatible.” As it happens, John's particular implementation of the language in his compiler doesn't tell you exactly which types are not compatible … it only tells you which LINE the error is in.

I must admit that most of my errors with this compiler tend to be errors of this type, and I've spent a lot of time with the Whimsical compiler, trying to figure out just where in the line I've offended it. The only real way to fix the error is to keep trying things until something works.

So what should we do in TINY and KISS? For the first one, I have the answer: TINY will support only the types Char and Integer, and we'll use the C trick of promoting Chars to Integers internally. That means that the TINY compiler will be much simpler than what we've already done. Type conversion in expressions is sort of moot, since none will be required! Since longwords will not be supported, we also won't need the MUL32 and DIV32 run-time routines, nor the logic to figure out when to call them. I like it!

KISS, on the other hand, will support the type Long.

Should it support both signed and unsigned arithmetic? For the sake of simplicity I'd rather not. It does add quite a bit to the complexity of type conversions. Even Niklaus Wirth has eliminated unsigned (Cardinal) numbers from his new language Oberon, with the argument that 32-bit integers should be long enough for anybody, in either case.

But KISS is supposed to be a systems programming language, which means that we should be able to do whatever operations that can be done in assembler. Since the 68000 supports both flavors of integers, I guess KISS should, also. We've seen that logical operations need to be able to extend integers in an unsigned fashion, so the unsigned conversion procedures are required in any case.