Secrets and Lies

A few days ago I helped a coworker with an oddly behaving Makefile. I am a long-time user of this tool and I am no longer surprised when ‘make’ does the unexpected in many subtle ways. This time the problem was that a bunch of source files in a recursively invoked Makefile were compiled with the host C compiler rather than with the configured cross-compiler. Make, in an attempt to ease the poor programmer’s life, pre-defines a set of implicit rules that describe how to rebuild common kinds of targets. One of these implicit rules states how to build an object file (.o) from a C source file (.c). The rule looks something like this:

%.o: %.c
    $(CC) $(CPPFLAGS) $(CFLAGS) -c -o $@ $<

By default the CC variable is set to ‘cc’, i.e. the default C compiler on Unix systems. Bear in mind that this rule lives in a recursively invoked make, so it is hidden at least one level away from the programmer. On the other hand, the build configures the top-level make to use the cross-compiler arm-linux-gcc. The problem arises because ‘make’ variables have local scope, i.e. they are not exported by default to recursively invoked makefiles, so the sub-make silently falls back to the default ‘cc’.

The hard part in spotting the problem is that everything appears to work as expected: the build completes without a glitch and you are left wondering why your shared libraries are not loaded on the target system.

Once you know what is going on, the problem is easily fixed (export the variable, or pass it explicitly on the sub-make command line), but if you are an occasional Makefile user you may spend some bad hours figuring out what the heck is happening.

Hiding isn’t always bad: you need to hide details to build abstractions and to treat complex objects as black boxes that are simpler to handle. One of the three pillars of OOP is “encapsulation”, which basically translates to data opaqueness: the user of an object is not allowed to peek inside it.
So the question arises: how much “hiding” is good and how much is harmful?

The C compiler hides the nuts and bolts of assembly programming from the programmer, so that he or she can reason about the problem with a higher-level set of primitives (variables instead of registers, structs instead of raw memory, and so on).

If you want to move up the abstraction level you must accept two things:

  • you lose control over details;
  • something will happen under the hood, beyond your (immediate) knowledge.

Going up another level we meet the C++ language, with a great deal more going on below the horizon. For example, constructors implicitly call the parent class constructor, and destructors of objects instantiated as automatic variables (i.e. on the stack) are invoked when execution leaves the scope where the objects were created.
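To make the point concrete, here is a minimal sketch (the class names are mine, just for illustration). Nothing in main() ever mentions Base, and no destructor is called explicitly, yet all four messages are printed:

#include <iostream>

struct Base {
    Base()  { std::cout << "Base constructor\n"; }
    ~Base() { std::cout << "Base destructor\n"; }
};

struct Derived : Base {
    Derived()  { std::cout << "Derived constructor\n"; }
    ~Derived() { std::cout << "Derived destructor\n"; }
};

int main()
{
    Derived d;   // prints "Base constructor", then "Derived constructor"
    return 0;
}                // leaving main prints "Derived destructor", then "Base destructor"

The compiler silently inserts the calls to the Base constructor and to both destructors; the source never spells them out.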

If you are reasonably fluent in C++ these implicit rules are unlikely to surprise or harm you. But if you consider a traditional programming language such as C, Pascal, or even Basic (!), you will notice quite a difference: in those languages you cannot define code that is executed without an explicit invocation. C++ (and Java, for that matter) gains power and expressiveness by hiding the explicit invocation.

In many scripting languages (such as Python, Lua, Unix shell, PHP… I think the list could go on for quite a while) you don’t have to declare variables. Moreover, in several of them, if you use a variable that has not yet been assigned you get a default value: usually an empty string, a null value, or zero, depending on the language. This could be considered handy, since the programmer saves a bunch of keystrokes and can concentrate on the core of the algorithm. I consider it harmful, because it can hide one or more potential errors. Take the following pseudo-code as an example:

# the array a[] is filled somewhere with numbers and terminated by a 0;
# note that neither 'index' nor 'total' is ever explicitly initialized:
# the code relies on the implicit defaults.
while( a[index] != 0 )
{
    total += a[index];
    index++;
}
print total;

If uninitialized variables are converted to the number 0, then the script will correctly print the sum of the array content. But what if, some days later, I add some code that uses the ‘total’ variable before that loop?

I will get a hard-to-spot error: hard because the effect I see can be very far from its cause.
Another possible error comes from mistyping. If the last line were written as:

print tota1;

(where the last character of “tota1” is a one instead of a lowercase L)

I would get no parsing error and no execution error, but the printed total would always be zero (or, with some variations of the code, the last non-zero element of the a[] array). That’s evil.
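For contrast, here is roughly the same loop in C++; the array contents and the trailing zero are my own assumptions, added only to make the sketch self-contained. The declarations cost a few keystrokes, but the mistyped name cannot slip through:

#include <iostream>

int main()
{
    int a[] = { 3, 5, 7, 0 };   // filled "somewhere" with numbers, 0-terminated
    int index = 0;              // the compiler forces me to declare both variables
    int total = 0;              // and to choose their initial values

    while (a[index] != 0)
    {
        total += a[index];
        ++index;
    }

    // Writing "tota1" here is rejected at compile time
    // ("'tota1' was not declared in this scope") instead of silently printing zero.
    std::cout << total << "\n";
    return 0;
}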

I think one of the worst implicit variable definitions is the one found in Rexx, where by default an unassigned variable evaluates to its own name in upper case. At least 0 or nil are fairly recognizable default values.

Time to draw some conclusions. A pattern emerges: evil hiding aims to save the programmer some coding time but doesn’t scale; good hiding removes details that would prevent the program from scaling up.

As you may have noticed, the world is not black or white: there are many shades, and compromises are like the Force, with both a light side and a dark side. For example, C++ exceptions offer an error-handling abstraction, at the cost of defensive programming nearly everywhere to avoid resource leaks or worse.
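Here is a minimal sketch of what that defensive programming amounts to, assuming an invented Resource class and a may_throw() helper. The raw pointer in leaky() is lost as soon as the exception flies through, while the smart pointer in safe() is released during stack unwinding:

#include <iostream>
#include <memory>
#include <stdexcept>

struct Resource {
    Resource()  { std::cout << "resource acquired\n"; }
    ~Resource() { std::cout << "resource released\n"; }
};

// stand-in for any call that can fail
void may_throw() { throw std::runtime_error("something went wrong"); }

void leaky()
{
    Resource *r = new Resource;
    may_throw();                // throws: the next line is never reached...
    delete r;                   // ...and the Resource leaks
}

void safe()
{
    auto r = std::make_unique<Resource>();
    may_throw();                // throws: the unique_ptr destructor still runs
}                               // during stack unwinding and frees the Resource

int main()
{
    try { leaky(); } catch (const std::exception&) { std::cout << "after leaky()\n"; }
    try { safe();  } catch (const std::exception&) { std::cout << "after safe()\n"; }
    return 0;
}

Note that safe() leans on exactly the implicit destructor call discussed above: the same hiding, used deliberately, is what keeps exception handling manageable.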

Knowing your tools and adopting a set of well-defined idioms (e.g. always initialize variables explicitly, or use constructors and destructors according to the OOP tenets) are your best friends.
