Tag: programming

Not just Chrome

The next step in Google’s march towards world domination appears to be Chrome. It is not that odd that Google aims at the web browser market: after all, they provide web services, and tight integration with the client counterpart can only benefit their business. What is odd is that they are trying to enter a crowded market with a brand-new product. The most widely used web browser by far, at least according to the statistics, is Internet Explorer, which has the power of defaultness – i.e. it is the default browser on every Windows machine. Then, at a distance, Firefox is the next most widely used, with a 16% share. Next is Safari at 3%, and all the rest are at 1% or below. For Chrome to succeed, two factors are essential – Google needs to show marketing muscle, and Chrome needs to be a superior product.

And this appears to be perfectly clear to Google; just have a look at the Chrome presentation.
Chrome is impressive for several reasons, the main one being that it has been built from scratch. The Internet today is a very different place from what it was 10 years ago, and architectures tailored for a simpler world may no longer suit today’s needs.

Computers, too, have come a long way in memory and computing power. What was mandatory 10 years ago may be ignored today. Take for example the “threads are faster than processes” assumption: though still valid in principle, it can be pointless on today’s hardware for most applications. Processes are safer – we have known that since the dawn of computer science – because different operations are insulated from each other.

Chrome will be open source. I think this is more of a necessity than a free decision. Today Google handles a lot of sensitive information – they have your mail, your documents, your photos, they know which sites you have visited, and so on. In exchange, you get free services, and their word that they are not going to do any evil with all that knowledge about you.
A closed browser from Google would raise privacy concerns that could seriously hinder its acceptance.

Finding the Right Path

After yesterday’s post, PaoloMan reminded me how he and Piggi changed the CEngine pathfinding system, simplifying level design by at least an order of magnitude. In fact I had intended to write about their breadcrumb pathfinding algorithm, but the good intention got lost in some interruption of my editing. Anyway, the matter is interesting and it would be a pity to settle it with a few lines embedded in another topic, so I am happy that my subattentive mind left it out yesterday.

https://youtube.com/watch?v=Kshhr7Ds7_4
In the beginning it was [https://www.maxpagani.org/CV-projectDesc-RS.html|Rainbow Six: Rogue Spear]. This was the studio’s first game on the then-new Game Boy Advance platform, with the quite ambitious goal of recreating the playing experience of the PC version of the game.
Pathfinding was needed for a number of actions of non-player characters. Terrorists had to be able to find their way toward noises they heard, or toward the last known position of their pals. Counter-terrorists had to manage to stay together, following the player-controlled team leader.
A full A* algorithm would likely have been daunting on the poor ARM7, so I devised a lighter system for all these pathfinding needs (well, the idea was mine, but I later discovered it was already employed by other game engines). The idea was to define a routing network of nodes connected by straight “walkable” rails. When a character needed to go from its current position P to a remote position Q, it queried the pathfinding system for the routing node closest to P. Then it asked that node for the next node in the network in order to get closer to Q. In overly simplified pseudocode it would turn out like this:

// walk to the routing node nearest to the start position
currentNode = findNodeClosestTo( P );
targetNode = findNodeClosestTo( Q );
walkTo( getNodePosition( currentNode ));
// hop from node to node until the node nearest to Q is reached
while( currentNode != targetNode )
{
    nextNode = getNextNode( currentNode, targetNode );
    walkTo( getNodePosition( nextNode ));
    currentNode = nextNode;
}
// finally leave the network and walk straight to the destination
walkTo( Q );

Although the Game Boy side of the algorithm was simple (even with extra code for more realistic character behaviour) and fast (provided there was a way to quickly find the nearest node you can walk to), two non-trivial problems had to be solved on the editor side.
First, networks had to be hand-drawn on each map, and second, routing tables had to be computed for each node.
The reason for requiring manual work to draw networks was simply that we didn’t have enough programmer time to put the right amount of AI into the editors. One of the constraints of the project was the maximum reuse of the Game Boy Color editors and tools we had developed for [https://www.maxpagani.org/CV-projectDesc-RM.html|Rayman GBC] and minimally improved over the next two GBC platform games. This meant that the editors didn’t understand the third dimension.
So level designers had to hand-draw the network, taking care to connect nodes with walkable lines and being very careful when more than one terrain level was involved. It also wasn’t easy to test and debug the network: first you had to flash the game onto a GBA and play it, and a quick play through the level might not reveal any flaw even with a flawed network.
A properly designed routing network affected how realistic the movement of the characters would be, so networks were subject to several rounds of fine-tuning.
Routing table creation was not that hard, but the “easy way” my fellow programmers took at first involved several hours of number crunching on the editor PCs. Level designers were quite mad at us for this daunting burden.
Lear then produced a new and much more efficient construction algorithm, trimming the network computation down to a handful of seconds.
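As a side note, here is a minimal sketch of how such a next-hop routing table could be built – this is my own illustration using a per-destination breadth-first search, not the actual editor code, and all names are invented:

#include <vector>
#include <queue>

// nextHop[from][to] = neighbour to step to when travelling from 'from' to 'to'
// adjacency[n] lists the nodes directly connected to n by a walkable rail
std::vector< std::vector<int> > buildRoutingTable(
    std::vector< std::vector<int> > const& adjacency )
{
    int const nodeCount = static_cast<int>( adjacency.size() );
    std::vector< std::vector<int> > nextHop(
        nodeCount, std::vector<int>( nodeCount, -1 ));

    // one breadth-first search per destination: walking the BFS tree
    // backwards tells every node which neighbour leads toward the target
    for( int target = 0; target < nodeCount; ++target )
    {
        std::queue<int> frontier;
        frontier.push( target );
        nextHop[target][target] = target;
        while( !frontier.empty() )
        {
            int const current = frontier.front();
            frontier.pop();
            for( size_t k = 0; k < adjacency[current].size(); ++k )
            {
                int const neighbour = adjacency[current][k];
                if( nextHop[neighbour][target] == -1 )
                {
                    // from 'neighbour', stepping to 'current' gets closer to 'target'
                    nextHop[neighbour][target] = current;
                    frontier.push( neighbour );
                }
            }
        }
    }
    return nextHop;
}

// getNextNode( currentNode, targetNode ) from the pseudocode above then
// becomes a simple table lookup: nextHop[currentNode][targetNode]
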
But we were talking about Lara and the prophecy. Well, it was clear that laying out routing networks was a big problem; luckily the AI needs for this game were quite modest. Pathfinding would be employed only for enemies chasing the player character, and only after a walkable contact had been established.
PaoloMan and Piggi came up with a clever mechanism that allowed us to get rid of pathfinding networks and the related burden. The idea was to have the main character drop virtual (and invisible on screen) breadcrumbs. When a chasing enemy loses direct sight of Lara (maybe she has just turned a corner), he searches for the most recently dropped crumb he can walk to. If the most recent one is not accessible, he goes backward in time until he finds a suitable one. It is as if the main character defines a subset of a routing network just for the area where she is.
Talking about pseudocode, this approach would be something like:

// breadcrumb 0 is the current character position,
// 1 is the most recent and so on
breadcrumb = getBreadCrumb( Lara, 0 );
// find the most recent breadcrumb you can walk to
i=0;
while( !canWalkTo( breadcrumb ) && i < MAX_BREADCRUMB)
{
    ++i;
    breadcrumb = getBreadCrumb( Lara, i );
}
if( i==MAX_BREADCRUMB )
{
    // give up: no reachable breadcrumb, bail out
}
// now just follow the trail
while( i >= 0 && ! canWalkTo( Lara ))
{
    walkTo( getBreadCrumb( Lara, i ));
    --i;
}
if( canWalkTo( Lara ))
{
    walkTo( Lara );
}

What I find notable is that, when working on games, standard solutions to known problems (e.g. A* for pathfinding) can be overwhelming to implement, run and manage. Having a clear vision of the needs of the game can point you to a more appropriate, more efficient and simpler solution to the problem.

Free as in beer

Writing free software is like being paid for doing something you would do anyway, but without the “being paid” part. I suspect that the double meaning of the English word “free” is causing major damage to the software industry. Although I fully support the “free-as-in-speech” concept for software, I am quite opposed to “free-as-in-beer” being wildly applied to every kind of software.
I consider “free-as-in-speech”, as I understand it, a sort of right of the customer – she/he is entitled (possibly for an extra charge) to have the source of the software they bought. That makes sense because your needs may be different from anyone else’s, and in this way you can customize the software to suit them.
You pay professionals to write an industrial-strength, well-polished product, then you twiddle the ends to match your environment.
“Free-as-in-beer” is quite the opposite: you get the software for free, sources and all, and then, if you need to, you pay someone to fix the loose ends.
From the customer’s point of view this is great. It would be like someone designing and building your house for free, after which you decide to keep it as is or pay someone to move a wall or a door.
For the software industry it is major damage at two entangled levels – money and competition. This model puts much less money in developers’ pockets, because customers pick free alternatives, and even if they decide to pay someone to customize them, the total is less than it would be if everybody paid for a non-free product.
Over the past 30 or so years, “free-as-in-beer” has moved from filling empty niches with small utilities (where it could make sense) to competing against full-featured applications. As Netscape teaches, you can’t compete with something given away for free.
Competing against a free product is hard, not to say impossible. You have to compete on quality, innovation and features, and all of those cost money. Though there are exceptions, free software usually tends to copy innovation from industry leaders. Quality is hard to achieve, but quality itself does not sell. What sells is “perceived quality”, i.e. the quality the customer believes your product has. This is even harder to achieve, because you have to run focus groups and interviews, and you have to work on your brand and promote it. That means a lot of money, too.
Features are another hard field, because most users exploit only a few percent of the feature load that comes with an application. It is hard to invent something new that would lead the customer to choose your product. New features come either from increased computing power or from research. Computing power is provided by the hardware manufacturers and is out of the developers’ control. Research is expensive if done in-house.
This train of thought brings me to the following question: why are we doing this? Why are programmers happy to work for free while, say, dentists are not? Ok, let’s take a less critical example – plumbers. Why don’t plumbers do their job for free? I suppose there are two reasons. The first is that a plumber’s work consumes materials: they need to buy and lay pipes, which are not free. A programmer does not consume any material; it is just a matter of time. Second, a plumber’s work is not freely replicable: if a plumber plumbs one building, he can’t copy and paste his work onto the next one.
It is all about perception – programmers are caught in the same trend that is grinding down the music and movie industries. Of the three, the software industry is in the worst position; the other two can count on well-established labels and brands – you don’t usually consider novice singers and amateur musicians a free alternative, a valid replacement for the work of known composers and performers.
Then there are economic interests – IBM and Sun are two of the most prominent supporters of free-as-in-beer software, and it has nothing to do with philanthropy. IBM’s and Sun’s business is selling hardware, usually expensive hardware, and free software helps sell more of it because customers do not need to pay software licenses when upgrading or expanding their installed base. Google also sponsors good free-as-in-beer (but not free-as-in-speech) software, because their business is advertising and free software provides the vehicle for it.
All said, what are the chances for us programmers to be paid for programming in the future? I think there are three options – work on integration, i.e. customizing free software to suit specific user needs; work in niches where no free software exists; or work for IBM, Sun or Google, where people get paid to write free-as-in-beer software.

c++0x and auto

According to the name C++0x, we shouldn’t have to wait more than a year and a half for the next language standard. According to the uneasiness I read in messages from the committee, the wait could last slightly longer. Anyway, the next C++ is about to debut. Reading about what is going to pop up in the next standard, I got the strong impression that this is another step away from the original language. Let’s face it, C++98 (ISO/IEC 14882:1998) was a first step away from the original, non-standard language that had included templates as an afterthought. It took years for compiler vendors to reach compliance, leaving the developer community in a dangerous interregnum, a no man’s land where portability and maintainability concerns were stronger than writing proper code. The standardization process also left a number of lame aspects in the language – the iostream library, inconsistencies between the string class and the other containers, mind-boggling i18n support, no complete replacement for the C standard library, just to name the first that come to mind.
The next standard seems to take a number of actions to plug the holes left by the previous one, and a number of actions to define a new and different language. For example, there will be a new way to declare functions that will make the transition from K&R style to ANSI C look pale and inoffensive by comparison. What today is declared as:

int f( char* a, int b )

is going to be declared also as:

auto f( char* a, int b ) -> int

I understand there’s a reason for this, it’s not just madness; nonetheless, it is going to puzzle the average Joe developer. Once the astonishment at the new notation has worn off, how is he supposed to declare functions?
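As far as I understand, the main motivation is that with the trailing form the return type can be written in terms of the parameters, which are not yet visible in the traditional position. A minimal sketch (function and variable names are mine):

// with the trailing form the parameters are already visible where the
// return type is written, so decltype can use them
template< typename T, typename U >
auto add( T a, U b ) -> decltype( a + b )
{
    return a + b;
}

int main()
{
    auto x = add( 2, 3.5 );   // x is deduced as double
    (void) x;
    return 0;
}
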
My impression that C++0x is going to be a different language has also been reinforced by a sort of backpedaling on the “implicit” stuff.
C++ has a lot going on under the hood. Good or bad, you decide; nonetheless you get by default a number of implicit things, e.g. a set of default methods (default constructor, destructor, copy constructor and assignment operator), and a number of implicit behaviours, such as using constructors with a single argument as conversion constructors.
Now this has been considered no longer apt for the language, so modifiers to get rid of all this implicitness have been introduced. E.g. conversion operators may be declared “explicit”, meaning that they will not be used implicitly when an object is evaluated in a suitable context (a sketch follows the examples below). In a class, each default method can be either disabled:

class Foo
{
    public:
        Foo() = delete;
};

Or explicitly defined as the default behaviour:

class Foo
{
    public:
        Foo() = default;
};
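
As for the “explicit” conversion operators mentioned a few paragraphs above, here is a minimal sketch of how I understand they are meant to work (class and member names are invented):

class Handle
{
    public:
        // without 'explicit' this conversion could silently kick in
        // wherever a bool is expected; with it, a cast or a direct
        // test in an if/while is required
        explicit operator bool() const
        {
            return valid;
        }
    private:
        bool valid = true;
};

// Handle h;
// bool ok1 = h;                        // error: no implicit conversion
// bool ok2 = static_cast<bool>( h );   // fine: explicit cast
// if( h ) { /* ... */ }                // fine: contextual conversion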

Again, I see the rationale behind this, but changing the rules of the language 30 years after its inception is going to surprise many a developer.
One of the most welcome additions in the new standard, at least in my humble opinion, is the new semantics of the auto keyword. If you use the STL part of the standard on a daily basis, I’m quite sure you are going to agree. Let’s take for example something like:

std::vector< std::pair<bool,std::string> > bar;

After some manipulation, say you want to sweep through the vector with the iterator idiom. You can wear out your keyboard a bit by writing the short poem:

for( std::vector< std::pair<bool,std::string> >::iterator i=bar.begin(); i != bar.end(); ++i ) ...

I usually go for a couple of typedefs so that the iterator type can be written more succinctly. The new standard allows the programmer to take a shortcut: since the type of the iterator is determined by the return type of bar.begin(), it can be deduced by the compiler and used to declare i. That turns out as:

for( auto i=bar.begin(); i != bar.end(); ++i ) ...

As you can see, this is far more readable (and writable, for that matter).
Well, well, well, too bad we have to wait at least a year for the standard and an unknown number of years before vendors update their compilers. But if you use GNU C++, you may not be helpless.
GNU C++ implements the typeof extension. Just like sizeof evaluates to the size of the type resulting from the expression to which it is applied, typeof evaluates to the type resulting from the expression to which it is applied. E.g.:

int i;
typeof( &i ) // evaluates to int*.

(This works much like the decltype keyword of the next C++ standard.) Since the expression to which typeof is applied is not evaluated, no side effects can happen. And this calls for the handy preprocessor:

#define AUTO(V__,E__) typeof(E__) V__ = (E__)

Now this macro does much of what the yet-to-come auto keyword does:

for( AUTO(i,bar.begin()); i != bar.end(); ++i ) ...

Note that typeof doesn’t do well with references, so in some cases (such as the example below) it could behave unexpectedly:

#include <iostream>
#include <string>
// AUTO macro as defined above

std::string foo( "abc" );

std::string const& f()
{
    return foo;
}

int main()
{
    AUTO( a, f() );                // typeof drops the reference: a is an independent copy
    std::string const& b = f();    // b is a true reference to foo
    std::cout << "a=" << a << "\n";
    std::cout << "b=" << b << "\n";
    foo = "cde";
    std::cout << "a=" << a << "\n";   // a still shows the old value
    std::cout << "b=" << b << "\n";   // b reflects the change to foo

    return 0;
}

If your compiler of choice doesn’t support typeof (or decltype), then you have to wait for it to become C++0x compliant.

Deadlock in practice

You can hardly get a better idea of what a deadlock is than by looking at this picture:
(from The daily WTF)
Incidentally, this appears to be not just a real-life deadlock, but also the result of a software bug, as most of the deadlocks I face turn out to be.
I also guess that most of the drivers and passengers involved in the jam would have wholeheartedly approved of that extra bit of testing, or that extra money required to hire an experienced professional, that would have prevented a deadlock-avoiding device such as a traffic light from turning itself into a deadlock-causing device.
Talking about problem-solving, how would you recover from this situation?

I am very old

It is official, at least according to the young man (he’s 20, I can’t call him a boy) I give private programming lessons to. He said that his class is full of “old people” – most of them are even 30 or older! Scary! Anyway, I have been programming for more years of my life than not, and most of them have been spent on the C language. Maybe that classifies me as very old nonetheless. After all, C was designed and developed nearly four decades ago, when I was 4.
One of the questions in an exam taken by the young man I tutor was:

Define the result of the following expression given x=N and y=k

x = (y+1>x) ? x++ : y++;

Suddenly a bell rang in my head. A red alert was buzzing at full volume, but another bell was ringing as well.
The main red alert was labeled “Undefined Behavior”. Anything bad can happen when U.B. is invoked by an unknowing programmer. If he is lucky he just gets something out of the order he expects; otherwise he can blow everything up.
Back in the days when Real Men wrote their own compilers, the C designers decided to relax some constraints in the language semantics so that more aggressive optimizations could be implemented in compilers. So within a C expression the relative order of side effects is not defined. If two or more side effects apply to the same operand, you have a problem. Moreover, the assignment operator (=) in the C language is just another operator with a side effect, so the expression:

x=x++;

is undefined behavior.
The day after, I looked into the C language FAQ and found that the ternary operator (?:) introduces sequence points (jargon meaning that all pending side effects have to happen before a given point in a language phrase), so the expression may, after all, be well defined.
The scary part is that a random C programmer like me, with over 20 years of practice in the language, can be baffled by such an expression and unable to tell what the result of such a line is.
And that’s why the second buzzer triggered. Why do young, blank minds have to be troubled with such visions? If a program contains an expression like that, chances are high that it is a bug. If it is not a bug, you may as well count it as one, because you are going to have a hard time figuring out what’s going on in that code.
There are some values in programming that ought to be taught before hands ever touch the keyboard. Simplicity is one of the most valuable principles – keep it simple. You have to understand your code, even after months, and others have to read and understand it as well.
Simplicity is meant to fight the common distortion of the young programmer trying to assert himself by considering (and, unfortunately, writing) “concise, hard to read” code as good code. “Hey, I bet you can’t figure out what this piece of code does.”
I think teachers should make a serious effort, especially when dealing with languages such as C, where cryptic and short code is easily written, to discourage one-liner approaches and to stress that the great programmer is the one able to tackle complex subjects with simple and clear code.
Writing this could be another sign of my age…

Out of the way

For a few days now this thought has been going in and out of my head. There is no particular fact that triggered it off; I think it’s just some random observations. We programmers like computers and software. That’s obvious – we find them entertaining, fun, sometimes even a sort of lifestyle. Were it not the case, we would have chosen something different, more economically rewarding (such as plumbing), more socially oriented (e.g. plumbing) or even more fun (water’n’pipes are fun, aren’t they?).
I find that, more often than not, our programs have features and behaviors that are there just to amuse other programmers or even ourselves, forgetting that real users, those who will use our program to do some actual work, aren’t so fond of computers beyond having some actual work to do.
I think we should enforce a new paradigm for program development: the computer is actually an obstacle, a barrier between the user and her goal. The software has to clean the road and keep off the way.
Just ask yourself whether that “Error blah blah” dialog box you are coding is something that helps the user reach his goal and perform his task, or just something that will annoy him. Is it really the user’s fault if the file has the wrong name? Are we really sure the user has to know what a file is?
For another example, consider a user who needs to deliver a certain amount of data (files) to another user. The obvious choice, since everyone has an email account, is to use it. That is the user’s need and the clear, logical solution she sees.
So the size limit on attachments is just an obstacle to the activity she has to perform. But the real solution is not to raise the limit, it is to take the limit away. If the mail protocol is not suitable for bulk transfers, then the software has to use other means. Why can’t it arrange an FTP transfer? Sure, it’s not as easy as I put it, but it definitely can be done.
On the same issue, there are users who fall in love with technology and forget any other sensible way of doing things, so their activities can become extremely time-wasting. I needed a reference on how to attach cables to an electronic board for testing. I looked in the usual repository only to find every kind of documentation but the one I was looking for. I asked the guy who was supposed to provide that kind of information and he told me he was unable to provide the board layout because the software couldn’t yet do it and we needed another module for… and so on. I took pen and paper and sketched the layout on a sheet, then scanned it and put it online on the server. It took me about 20 minutes, because I had to look up the schematics. The document I produced is not polished, but… who cares?

Software project failures

Yesterday I stumbled upon an article about why software projects fail. Though you can hardly find any surprising claim in it, it is nonetheless a quick paper you can hand over to your colleagues and/or managers when trying to improve the chances of a successful project. What I find quite interesting is the table near the beginning, the one titled “Project Challenged Factors”. What drew my attention is that the “other” item weighs more than any single entry in the top ten. In other words, the causes of challenged projects are so widespread and so equally important that there is no single main culprit.
We programmers work on such a complex and delicate mechanism that any one of hundreds of factors can drive the whole thing crazy. On the other hand, the number of variables involved is so high that you cannot aim at a single target and expect to ensure project reliability.
Take software requirements, for example (missing or incomplete requirements account for 12.3%): while I am not advocating entering a project without them, it is clear that in some circumstances the lack of such documents is not dooming per se. Take an expert team that knows all the ins and outs of the domain and has a crystal-clear idea of what is needed; in this case the project could perhaps get along with no SRS.
So one might be tempted to strictly follow every “best practice” to minimize the risk of failure. But this is not likely to work either – every practice has its cost and some practices (take code reviews) are extremely demanding.
Therefore it is all a matter of balance: making the right decisions, having good reflexes and being willing to occasionally work harder to fix what has gone wrong.
Following this line of thought, the best insurance you can have for your projects is an experienced and well-jelled team with a) an experienced leader and b) a good track record. This isn’t rocket science either; you can find similar assertions in many software project management books (Peopleware is the first that comes to mind).
On the other hand, projects keep failing. This leads me to two considerations. First, even with the best premises a challenging project can fail – we have to accept this for today’s projects and try to minimize the impact of failure. Second, how is the failure rate changing over time? The first study quoted in the article is more than 13 years old! That’s a long time in the software industry.

Spaces within

“Unix doesn’t work properly with filenames with spaces.” This assertion from a coworker of mine prompted my harsh reply: “Unix works perfectly with spaces; it’s just programmer sloppiness that prevents it from properly handling blanks.” I was right (as always, I would humbly add), but my pal wasn’t completely wrong, at least when talking about the specific subpart of Unix called the shell. Although far superior to the MS-DOS batch shell (which is about the same command line you find in Windows), I bet it originated in more or less the same way – a core onto which functionality clustered over time in response to new needs and new opportunities.
The Unix shell (be it bash, ksh, zsh or fish) is nowadays a powerful programming tool allowing the programmer to craft rather complex artifacts. This scope is certainly much broader than the one envisioned by the first developers, and it results in multiple ways to do the same thing, different ways to do similar things and cryptic ways to do simple things.
The conception of the Unix command line dates back nearly 40 years! Things were pretty different then, but I won’t annoy you with details, just let your imagination run wild… it was likely even worse. Fish is a recent attempt to overcome most of the shell’s problems, but it is not as widespread as bash. As a professional put it some time ago: emacs may have tons of neat features, but you are SURE you’ll always find vi on any Unix, while you are not certain you’ll have emacs, so better to invest your learning time in vi.
Well, back to the shell. What’s wrong with blanks? The main problem is that a space is a valid character in a file name and, at the same time, a separator for command line arguments. Back when every single byte could make a difference, making quotes optional when the filename contains no spaces seemed like the right thing to do. So you can write:

$ ls foo/

To get the listing of directory foo, but you have to write:

$ ls "bar baz/"

if you want the listing of directory “bar baz” (or you can escape the space with a backslash). This can be boring in interactive shells, but it is usually overcome by the auto-completion feature (type ‘ba’ then tab and the line gets completed with the available options, in this case bar baz).
In shell scripts it turns from boring into something between annoying and irritating, because variables are not real variables like those you are used to in high-level languages, but just convoluted macros. For example:

a="bar baz"
ls $a

is processed and interpreted as:

ls bar baz

As you can see, the quotes disappear because they are processed by the assignment, which puts bar+space+baz into the ‘a’ variable. Once ‘$a’ is expanded, the quotes are just a forgotten memory. In order to write proper shell scripts you have to do something like:

a="bar baz"
ls "$a"

Of course this is error-prone, because not only is the unquoted syntax valid, and not only is the script likely to work perfectly in the simple test case the programmer uses to test it, but it is also likely to work fine most of the time. After all, the space character is only used by those naïve Windows users who aren’t aware of the blank-hating device they keep hidden under their desks.
Well, I prided myself on writing space-safe shell scripts, at least until I tried to write a script to find duplicated files on a filesystem.
The goal is simple: after many virtual relocations, multiple pet projects and homework assignments, I have many files scattered around with the same content. It is not a matter of saving space; rather it is a question of order – avoiding redundant information, or making sure that it really is the same stuff.
My design was to have a command similar to ‘find’, something that accepts any number of directories or files on the command line, such as:

$ find_dupes dir1/ dir2/ file …

The shell has two ways of handling this pattern – use the shift command, or use one of the special variables $@ and $*.
The first way is useful if you want the shell to process one argument at a time, while the latter is handy when you want to relay the command line to another command. In my case I wanted to pass the entire command line to the ‘find’ command, something like:

$ find $@ -type f -exec md5sum {} \;

This line works fine until a filename with a space is encountered. In that case, since variables are indeed macros, a single argument containing a space is expanded into two (or more) distinct arguments. And there seems to be no way to work around the limitation, unless you read the manual. In this area, the discoverability of bash is quite lacking. The man page states that, within double quotes, $* expands to the sequence of arguments separated by the first character of the IFS environment variable. E.g. if IFS is set to dash (‘-’) and the command line has the arguments foo bar baz, then "$*" expands to foo-bar-baz.
Conversely, $@ expands to a space-separated sequence of arguments, but if you enclose it in double quotes, each argument is expanded as if it were individually quoted. E.g. $@ expands to foo bar baz, while “$@” expands to “foo” “bar” “baz”. Eventually this is the solution: find "$@" -type f -exec md5sum {} \; works regardless of spaces in the arguments.
So, while it is basically true that Unix has no problem whatsoever with spaces inside filenames, and it is also true that shell programming can handle them, and ultimately it is programmer sloppiness if the script fails, it has to be recognized that a great deal of effort and investment is required of the programmer to climb out of that sloppiness.

Mind if I comment?

When I worked at Ubisoft I conducted some job interviews for programmer positions. As the interviewer, I was provided by the company with a grid of questions/topics to ease candidate evaluation. A list of questions, when not taken too rigidly or in a quiz-like fashion, offers the interviewer a good source for firing up some interesting conversations. One of these questions was “What are the qualities of good source code?” This question is open-ended enough to let the interviewee express himself (or, on very rare occasions, herself) on what he (or she) considers good or bad programming habits, helping the interviewer understand the programming maturity of the job applicant.

Counter-intuitively, one of the most common first answers I got was not “don’t use goto”, as you might expect from programmers fresh out of school, but “comments”.

The virtues of commenting are taught in every entry-level programming class, and I think that “commenting” is even less controversial than “don’t use goto”. After all, it only takes a few days away from code you were writing to recognize the importance of stuffing your work with the code equivalent of post-it notes.

What most teachers actually fail to convey to their audience is that comments are just a tool, one that is quite easy to abuse, causing the opposite of the desired effect: misuse of comments reduces code readability.

I’m going to demonstrate how to actually remove comments while IMPROVING readability. Just don’t misquote me; I am not against comments at all. If in doubt, comment your code! What I’m going to show you is that by removing comments AND carefully crafting your code, readability will improve.

First: terse is beautiful. Consider prose, for example: the shorter the sentences, the more comprehensible they are. Comments add repetition and extra information, and dilute what’s going on with redundancy. If we wrote good code in the first place, then maybe we could leave out most of the comments.

Moreover, comments burden maintenance. An old programmers’ saying holds that if a comment disagrees with the code it comments, then both are wrong. Maybe this is not always true, at least not in the most trivial cases, but it certainly makes a point – comments have to be maintained. An outdated comment ranges from distracting to misleading, causing code maintenance problems. Rather than helping you figure out how the clockwork turns, it confuses your understanding.

Let’s start trimming with variables. It is way too common to comment variable declarations:

int b; // number of bits set to one

Why can’t b be named oneBitCount? By picking the right name, not only do we avoid the comment, but we also make the lines where this variable is used clearer – our brain doesn’t need to remember what b is, and we don’t have to look it up in the declaration section.

Using b instead of a more meaningful name denotes two kinds of laziness – typing laziness and naming laziness. About the first, a good editor can do much: my editor of choice lets me type just the first letters of any symbol and then cycle through all the available completions in the file, the included files or the ctags file.

About the second, there’s no excuse: take your time to choose the right names, the ones that best describe the purpose of what they identify. Names are powerful handles to concepts; the right name is not just for better readability, it is for better grasping and processing of ideas.

This could be the right time for a little digression on Hungarian Notation, but I don’t want to get this post too crowded; better to save ol’ HN for another time.

Another common misuse of comments is to mark the semantics of a block of code within a function. E.g.:

// count bits set to 1
b=0;
while( v != 0 )
{
    if( v & 1 )
    {
        ++b;
    }
    v >>= 1;
}
// ... the function continues

This kind of code is easily turned into a function, and the block can be reduced to the single, very clear line:

b = countBitsAtOne( v );
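
For completeness, a possible shape for the extracted helper – this is just a sketch, countBitsAtOne being the illustrative name used above:

// returns the number of bits set to 1 in v
unsigned countBitsAtOne( unsigned v )
{
    unsigned count = 0;
    while( v != 0 )
    {
        if( v & 1 )
        {
            ++count;
        }
        v >>= 1;
    }
    return count;
}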

This also has a number of beneficial side effects, such as reducing the size of the enclosing function, clearly defining the inputs and outputs of the code block, and clearly marking where the code block ends.

In other words, not only is the code more readable, it automatically becomes more maintainable, because the maintainer has fewer chances to mess something up.

I have heard two objections to this approach – execution inefficiency and programmer inefficiency. The first is very unlikely, since a common optimization performed by every compiler is to inline short functions where they are called, and for long functions the call cost is not relevant. Anyway, the old optimization tenet should always be remembered and applied – measure before optimizing. The programmer’s judgment is far from accurate in all but trivial programs. (There is another good rule about optimization: don’t.)

About programmer efficiency: it may be true that a newbie codes like a rocket and feels that spending time on anything but writing code is a waste, but the result is usually write-only source code that no one is willing, nor able, to fix or modify. Moreover, the time taken to write good code the first time is much less than the time taken to write bad code and come back later to refactor it into something more convenient. In fact, for most code blocks it is very difficult to pick out the inputs and outputs and be sure no side effect has been missed in the refactoring.

This very same technique can be employed to reduce the size and improve the readability of large switch-case statements, as sketched below.
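
A hedged illustration of what I mean – the message types and handler names are invented for the example:

#include <iostream>

enum MessageType { MSG_CONNECT, MSG_DATA, MSG_DISCONNECT };

// each case body, however long it used to be inline, becomes a named function
void handleConnect()    { std::cout << "connect\n"; }
void handleData()       { std::cout << "data\n"; }
void handleDisconnect() { std::cout << "disconnect\n"; }

void dispatchMessage( MessageType type )
{
    switch( type )
    {
        case MSG_CONNECT:    handleConnect();    break;
        case MSG_DATA:       handleData();       break;
        case MSG_DISCONNECT: handleDisconnect(); break;
    }
}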

There is another use of comments, similar to the one just described: an obscure piece of code is commented with the intended functionality. This falls into quite a grey area. I mean, there are times when code has to be obscure because it is handling unusual or complicated stuff, so explanations for, say, floating-point math implementations are more than welcome. In all the other cases it is just smoke hiding what’s really going on. There may be several causes (ego, job insurance, pretending to be smart…), but the result is always the same: unreadable code. Here the principle holds: don’t do anything that may surprise the reader.

Write simple code unless there is a compelling reason to do otherwise.