Engineers deal with systems: usually large, complex devices that are expected to work. Working at all is the bare minimum; the common expectation is that they keep working reliably under a wide range of circumstances, mostly regardless of the environment. Consider any product of an engineering process (say, a TV set or a car) from this point of view: it is not enough for the device to work only at the seller’s location. This should put in the right light one of the most used programmer defenses: “It works on my PC”.
For us, as programmers and programming team leaders, the concern is how to deliver robust products that work reliably even outside clean-room conditions. A lot of entropy has been sacrificed to this goal, and I don’t want to start anything new or push any existing process methodology. I’d just like to recall what a friend of mine reported from one of his senior coworkers (thanks Xté).
It is a quick analysis that lets you understand how good a shape your project (or task) is in.
Consider two nearly binary variables: working and understanding. They give four combinations, which identify four states of the system made up of the object the engineer is working on and the engineer. Let’s go through the four states (I swear, I’ll be brief).
It doesn’t work and you don’t know why. This is the typical state at the start of a project: you have a black box that does not work as expected, and you are supposed to fix or implement it. This is not so bad if you have enough time to study, ask other people, analyze, and learn what you need to do to fix it. Summing up, this can be fair or bad depending on the time you have.
It doesn’t work and you know why. This is quite good. After all, knowledge is power. With that knowledge you can devise strategies and solutions and figure out the time and means you need. Moreover, you have plenty of facts to explain the situation to management and ask for the resources best suited to the task.
It works and you know why. Perfect, you have achieved your goal. You fully understand your system and why it works, so you can predict to a good extent when and how it is going to work.
It works, but you don’t know why. This is the worst case of all. Unfortunately, rushing, together with the wrong assumption that if it works then who cares, can lead to this terrible situation. The real problem here is that you cannot predict when the system will cease to work, and you have no clue how to repair, move, or change it. You have no guarantee that the system will continue to work.
The fact that it works usually leads project management to consider it complete and to press the team to move on. A false sense of security may take hold of the team, clearing the way for greater disasters.
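If it helps to see the grid spelled out, here is a minimal Python sketch of the four states. The names ProjectState, VERDICTS, and classify are made up for illustration and are not part of the original analysis; the verdict strings simply paraphrase the descriptions above.

```python
from enum import Enum


class ProjectState(Enum):
    """The four combinations of 'working' and 'understanding'."""
    BROKEN_UNKNOWN = "It doesn't work and you don't know why"
    BROKEN_KNOWN = "It doesn't work and you know why"
    WORKING_KNOWN = "It works and you know why"
    WORKING_UNKNOWN = "It works, but you don't know why"


# Hypothetical one-line verdicts, paraphrased from the text (an assumption
# made for this sketch, not an official scale).
VERDICTS = {
    ProjectState.BROKEN_UNKNOWN: "fair or bad, depending on the time you have",
    ProjectState.BROKEN_KNOWN: "quite good: you can plan and ask for resources",
    ProjectState.WORKING_KNOWN: "perfect: goal achieved",
    ProjectState.WORKING_UNKNOWN: "worst case: no guarantee it keeps working",
}


def classify(works: bool, understood: bool) -> ProjectState:
    """Map the two (nearly) binary variables to one of the four states."""
    if works:
        return ProjectState.WORKING_KNOWN if understood else ProjectState.WORKING_UNKNOWN
    return ProjectState.BROKEN_KNOWN if understood else ProjectState.BROKEN_UNKNOWN


if __name__ == "__main__":
    # Walk through the whole grid and print each state with its verdict.
    for works in (False, True):
        for understood in (False, True):
            state = classify(works, understood)
            print(f"{state.value}: {VERDICTS[state]}")
```

The only point of the sketch is that the state depends on both variables at once: a system that works is not automatically in better shape than one that doesn't.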
There are some quick corollaries to this analysis. First, always try to understand what you are doing, even if it may seem like a waste of time. Second, always include learning time in your estimates. Third, poking around at random to fix things is a dangerous way to further damage a system while creating the illusion of work.
Programmers rely heavily on tools. Basically, there is no direct contact with the matter we design and develop; our tools are our manipulators and probes into the hidden work of electrons. We have to know our tools by heart. We cannot get away with a rough understanding of the language we use, because we cannot afford what our ignorance would let in.
The natural question is: “How far does this knowledge need to extend?” Does programming require me to understand OS internals? Digital electronics? Semiconductor physics?
Well, it depends. Gone are the times when a single person could embrace the whole of human knowledge and art in a lifetime. Nowadays we have to stop at a given interface, taking for granted that what goes on behind it is good enough. I think you should have at least a good knowledge of the first interface you are using, be it OS system calls, environment libraries, or the hardware if you are working closer to the metal. Beyond that, an average knowledge up to the next barrier can only help when you are hunting for problems, since it lets you better exploit the environment and gives you helpful hints for getting out of trouble… trouble being where engineers spend most of their time.