Blog

Spaces within

“Unix doesn’t work properly with filenames containing spaces”. This assertion from a coworker of mine prompted my harsh reply: “Unix works perfectly with spaces, it’s just programmer sloppiness that prevents it from properly handling blanks”. I was right (as always, I would humbly add), but my pal wasn’t completely wrong, at least when talking about the specific subpart of Unix named the shell. Although far superior to the MS-DOS batch shell (roughly the same command line you find in Windows), I bet it originated in more or less the same way – a core with functionality that accreted over time in response to new needs or new opportunities.
The Unix shell (be it bash, ksh, zsh or fish) is nowadays a powerful programming tool that allows the programmer to craft rather complex artifacts. This scope is surely much broader than the one envisioned by its first developers, and the result is multiple ways to do the same thing, different ways to do similar things and cryptic ways to do simple things.
The Unix command line was conceived nearly 40 years ago! Things were pretty different back then, but I won’t annoy you with the details – just let your imagination run wild… it was likely even worse. Fish is a recent attempt to overcome most of the shell’s problems, but it is not as widespread as bash. As a professional put it to me some time ago: emacs may have tons of neat features, but you are SURE you’ll always find vi on any Unix, while you are not certain you’ll have emacs, so better to invest your learning time in vi.
Well, back to the shell. What’s wrong with blanks? The main problem is that a space is a valid character in a file name and, at the same time, a separator for command line arguments. Back when every single byte could make a difference, making quotes optional for filenames without spaces seemed the right thing to do. So you can write:

$ ls foo/

To get the listing of directory foo, but you have to write:

$ ls "bar baz/"

if you want the listing of directory “bar baz” (or you could escape the space with a backslash). This can be tedious in interactive shells, but it is usually mitigated by the auto-completion feature (type ‘ba’, then tab, and the line gets completed with the available options – in this case: bar baz).
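For the record, the escaped form mentioned above would look like:

$ ls bar\ baz/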
In shell scripts this goes from annoying to irritating, because variables are not real variables like those you are used to in high-level languages, but just convoluted macros. For example:

a="bar baz"
ls $a

is processed and interpreted as:

ls bar baz

As you can see, the quotes disappear: they are processed by the assignment, which puts bar+space+baz into the ‘a’ variable. Once ‘$a’ is expanded, the quotes are just forgotten memories. In order to write proper shell scripts you have to do something like:

a="bar baz"
ls "$a"

Of course this is error prone: the unquoted syntax is still valid, the script is likely to work perfectly in the simple test cases the programmer uses, and it will probably keep working fine most of the time. After all, the space character is only used by those naïve Windows users who aren’t aware of the blanks-hating device they keep hidden under their desks.
Well, I prided myself on writing space-safe shell scripts, at least until I tried to write a script to find duplicated files on a filesystem.
The goal is simple: after many virtual relocations, multiple pet projects and homework assignments, I have many files with the same content scattered around. It is not a matter of saving space; rather, it is a question of order: avoid redundant information, or at least make sure it really is the same stuff.
My design was to have a command similar to ‘find’, something that accepts any number of directories or files on the command line, such as:

$ find_dupes dir1/ dir2/ file …

The shell has two ways of handling this pattern – the shift command, or one of the special variables $@ and $*.
The first is useful if you want the shell to process one argument at a time, while the latter is handy when you want to relay the whole command line to another command. In my case I wanted to pass the entire command line to the ‘find’ command, something like:

$ find $@ -type f -exec md5sum {} \;

This line works fine until a filename with a space is encountered. In that case, since variables are indeed macros, a single argument containing a space is expanded into two (or more) distinct arguments. And there is no way to work around the limitation, unless you read the manual 🙂. In this area the discoverability of bash is quite lacking. The man page states that “$*” expands to the sequence of arguments separated by the first character of the IFS variable. E.g. if IFS is set to dash (‘-‘) and the command line has the arguments foo bar baz, then “$*” expands to foo-bar-baz.
Conversely, $@ expands to the arguments as separate words, and if you enclose it in quotes, each argument is expanded as if it were individually quoted. E.g. $@ expands to foo bar baz, while “$@” expands to “foo” “bar” “baz”. And this, in the end, is the solution.
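To see the difference in practice, here is a minimal test sketch (the show_args helper is mine, just for illustration):

#!/bin/bash
# show_args prints each argument it receives on its own line
show_args() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"
    done
}

set -- "bar baz" qux    # simulate a command line where one argument contains a space

show_args $@      # word splitting strikes: <bar> <baz> <qux>
show_args "$*"    # a single word joined by the first IFS character: <bar baz qux>
show_args "$@"    # each original argument preserved: <bar baz> <qux>

So the find invocation in my script eventually became:

find "$@" -type f -exec md5sum {} \;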
So, basically, it is true that Unix has no problem whatsoever with spaces inside filenames, and it is also true that shell programming can handle them; ultimately it is programmer sloppiness if the script fails. But it has to be recognized that a fair amount of effort and investment is required of the programmer to climb out of that sloppiness.

One more rant

Oh, I forgot this yesterday; maybe it wasn’t yet clear in my mind and I needed some more digging and head-scratching over laws and funds to make it vivid. Our government is trying to protect Italians from harming themselves by not subscribing to a supplementary pension fund. This is surely a laudable intention – I warn you, if you want to survive old age, you have to put some money aside.
The problem is that this strongly contrasts with the current fund schemes, which offer several lines of investment with different degrees of aggressiveness and different features. In other words, the money the government forces you to put aside to guarantee a comfortable old age is allowed to go into the slot machine of the financial markets, and you could end up with hands even emptier than if you had stuffed the money in your mattress. The law only requires the most conservative line to return the sum you paid in (that is, 0% interest: you are nonetheless losing money to taxes and losing purchasing power to inflation). The law sets the TFR interest rate as a goal, not as a requirement, for the most conservative line: nothing happens if the goal is not reached.
Even worse, the law lets you choose more aggressive investment lines that could bring higher returns or drag you into losing money, perhaps serving fund managers’ interests but working against the noble goal of avoiding social problems when the current working class reaches retirement age.
There is an underlying problem of trust – a lack of trust. If the government wants Italians to trust this new form of pension, then the new form must guarantee everything the old one guaranteed. The law must guarantee at least the same rate, and at least the chance to get all the money back at retirement.
After all, if everyone is so sure that the TFR performs worse than the financial funds, why don’t they guarantee that the funds will perform at least as badly as the TFR?

A Bleak Future?

What awaits us in the future? How are current choices going to impact our future existence? Public pensions have long been a solid foundation for Italian families. Pensions allowed parents to keep their sons and daughters with them, providing a roof and warm meals while the children occasionally earned 1000€ per month. Basically, the elderly’s pensions balanced precarious and occasional work, unemployment and dramatically high house prices.
There’s no way for a young couple to start a life together in any Northern Italian city on such poor wages, with the spectre of sudden unemployment and such high apartment rents, let alone the cost of buying a place.
(Well, not that a young couple can really start a life together while living with their parents, either… which may be part of the reason Italy has such a low birth rate.)
Unfortunately the Italian pension system is lame.
Rather than implementing a system where everyone puts aside a bit of money that is wisely invested and then given back as a life annuity after retirement, politicians and unions settled long ago on an inter-generational agreement. Under this agreement the current generation of workers pays the pensions of the current generation of retirees. Brilliant? Maybe, as long as the contributions of the current working class are enough to cover the current pension bill.
What is going wrong is that Italians have fewer and fewer babies and tend to live longer (despite the public health care). So the agreement between generations is broken, and the promise that we workers will receive the same treatment we are providing to our elders is void.
According to current law, when the time comes for me to retire I’ll get an annuity of about 40% of my average salary over the previous 5 years. Not enough to survive on, unless I start an extra-bright career right now.
Although this trend has been clear for about 15 years, only now have drastic decisions been taken. Workers’ money (the so-called TFR, a form of deferred salary paid out when leaving a job) has been confiscated, leaving little or no choice to its former owners. Well, that isn’t completely true: you can leave everything as it is, but you are on the verge of a cliff. The idea is to push most workers to move that money into complementary, contribution-based pension funds. And once it is in a fund, that money is no longer completely yours: you cannot move it into another kind of investment or return it to its former deferred-salary form. Moreover, you get no real guarantee. If you choose a conservative fund line you are only guaranteed to get your money back, with no interest. Given current inflation levels, that means you could lose 2.5% of the purchasing power of your money per year. The deferred salary, by contrast, is bound by law to yield an interest of 1.5% + 0.75 × inflation.
All these funds are privately held, and what you get is a lighter tax on the earnings, a bit more convenient than the TFR (19% against 23%).
The worst nightmare is not only having a ludicrous public pension, but losing every bit of the TFR money thanks to financial market volatility and irresponsible fund choices.
We have until the end of the month to decide what to do with our money. What is certain is that there will be a lot of problems (of the social kind) when the time comes for us to retire – precarious workers won’t have any pension at all, excessively conservative people won’t take up any complementary scheme and will have just the TFR to add to the 40% of their last salaries, and a number of funds will run into trouble (Cirio, Parmalat, Enron and 9/11 don’t ring any bells?) and return little or no money.
But there is a loop: the only way to provide a better future is to increase current wages so that everyone can save enough for at least a decent retirement. Traditional manufacturing is in a difficult position against emerging economies with extremely low labour costs. We have to do like the rest of Europe and move towards advanced services and technologies; we have to focus on industries where the cost of labour is negligible compared to the cost of the delivered good or service. And to do this we need public investment in research and universities, easier ways to start a business, and ways to attract foreign investment in advanced industries.
Are we hopeless?

Alsace on-line

The more ESP-talented among you will have already noticed that I put our [images/photoalbum/Alsace/200704/index.html|last vacation photos] on-line. For those who can read Italian, I have just uploaded the [downloads/Diario_Alsazia_2007.pdf|travel log] as well. I hope you enjoy both. As usual, comments and criticism are welcome.

RSS broadcasting

This feature is still in beta, but I am so proud of it that I’m going to tell you about it without waiting for a completely bug-free version. If you point your favourite feed reader to [https://www.maxpagani.org/blog.php?rss] you will receive my blog posts without having to visit my website daily (or so).
If your favourite feed reader is Firefox Live Bookmarks, then maybe you HAVE to wait for a more bug-free version 🙂

Mind if I comment?

When I worked at Ubisoft I conducted some job interviews for programmer positions. As the interviewer, I was provided by the company with a grid of questions/topics to ease candidate evaluation. A list of questions, when not taken too rigidly or in a quiz-like fashion, gives the interviewer a good source for firing up some interesting conversations. One of these questions was “What are the qualities of good source code?” This question is open-ended enough to let the interviewee express himself (or, on very rare occasions, herself) on what he (she) considers good or bad programming habits, helping the interviewer understand the candidate’s maturity as a programmer.

Counter-intuitively, one of the most common first answers I got was not “don’t use goto”, as you might expect from programmers fresh out of school, but “comments”.

The virtues of commenting are taught in every entry-level programming class, and I think that “commenting” is even more universally agreeable than “don’t goto”. After all, it only takes a few days away from the code you were writing to recognize the importance of stuffing your work with the code equivalent of post-it notes.

What most teachers actually fail to convey to their audience is that comments are just a tool, one that is quite easy to abuse, causing the opposite of the desired effect. Misuse of comments reduces code readability.

I’m going to demonstrate how to actually remove comments while IMPROVING readability. Just don’t misquote me; I am not against comments at all. If in doubt, comment your code! What I’m going to show you is that by removing comments AND carefully crafting your code the readability will improve.

First: terse is beautiful. Consider prose, for example: the shorter the sentences, the more comprehensible they are. Comments add repetition and extra information, and dilute what’s going on with redundancy. If we could write good code in the first place, then, maybe, we could leave out most of the comments.

Moreover, comments burden maintenance. An old programmers’ saying goes that if a comment disagrees with the code it comments, then both are wrong. Maybe this is not always true, at least not in the most trivial cases, but it certainly makes a point – comments have to be maintained. An outdated comment ranges from distracting to misleading, causing code maintenance problems: rather than helping you figure out how the clockwork turns, it confuses your understanding.

Let’s start trimming with variables. It is way too common to comment variable declarations:

int b; // number of bits set to one

Why can’t b be named oneBitCount? By picking the right name, not only do we avoid the comment, but we make the lines where this variable is used clearer – our brain doesn’t need to remember what b is and we don’t have to look it up in the declaration section.

Using b instead of a more meaningful name denotes two levels of laziness – typing laziness and naming laziness. About the first, a good editor can do much. My editor of choice lets me type just the first letters of any symbol and then cycle through all the available completions in the file, in the included files, or in the ctags file.

About the second, there’s no excuse: take your time to choose the right names, the ones that best describe the purpose of what they identify. Names are powerful handles to concepts; the right name is not just about better readability, it is about better grasping and processing of ideas.

This could be the right time for a little digression on Hungarian Notation, but I don’t want to make this post too crowded; better to save ol’ HN for another time.

Another common misuse of comments is to mark the semantics of a block of code within a function. E.g.:

// count bits set to 1
b=0;
while( v != 0 )
{
    if( v & 1 )
    {
        ++b;
    }
    v >>= 1;
}
// ... the function continues

This kind of code is easily turned into a function, and the block can be reduced to a single, very clear line:

b = countBitsAtOne( v );

This also has a number of beneficial side effects, such as reducing the size of the enclosing function, clearly defining the inputs and outputs of the code block, and clearly marking where the code block ends.
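For completeness, here is a minimal sketch of what the extracted function could look like (the unsigned parameter type is my assumption, since the original snippet does not show how v is declared):

int countBitsAtOne( unsigned int v )
{
    int oneBitCount = 0;    // number of bits set to one
    while( v != 0 )
    {
        if( v & 1 )
        {
            ++oneBitCount;
        }
        v >>= 1;
    }
    return oneBitCount;
}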

In other words, not only is the code more readable, but it automatically becomes more maintainable, because the maintainer has less chance of messing something up.

I have heard two objections to this approach – execution inefficiency and programmer inefficiency. The first is very unlikely to matter, since a common optimization performed by every compiler is to inline short functions at their call sites, and for long functions the call cost is irrelevant. Anyway, the old optimization tenet should always be remembered and applied – measure before optimizing. The programmer’s intuition is far from reliable in all but the most trivial programs. (There is another good rule about optimization: don’t.)

About programmer efficiency: it may be true that a newbie codes like a rocket and feels that spending time on anything but writing code is a waste, but the result is usually write-only source code that no one is willing, or able, to fix or modify. Moreover, the time it takes to write good code the first time is much less than the time it takes to write bad code and come back later to refactor it into something more convenient. In fact, for most code blocks it is very difficult to identify the inputs and outputs and be sure that no side effect has been missed during refactoring.

This very same technique can be employed to reduce the size and improve readability of large switch-case statements.
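As a purely hypothetical illustration (the event names and handler functions below are invented), each case body shrinks to a single call to a well-named function:

// Hypothetical example: each case body has been extracted into its own small function.
enum Event { EventOpen, EventClose, EventUnknown };

void handleOpen()    { /* ...code that used to live in the "open" case... */ }
void handleClose()   { /* ...code that used to live in the "close" case... */ }
void handleUnknown() { /* ...fallback handling... */ }

void dispatch( Event event )
{
    switch( event )
    {
        case EventOpen:  handleOpen();    break;
        case EventClose: handleClose();   break;
        default:         handleUnknown(); break;
    }
}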

There is another use of comments similar to the one just described: an obscure piece of code commented with its intended functionality. This falls into a rather grey area. I mean, there are times when code has to be obscure because it is handling unusual or complicated stuff; an explanation for, say, a floating-point math implementation is more than welcome. In all other cases it is just smoke hiding what’s really going on. There may be several causes (ego, job insurance, pretending to be smart…), but the result is always the same: unreadable code. Here the principle holds: don’t do anything that may surprise the reader.

Write simple code unless there is a compelling reason to do otherwise.

Friends and namespaces

There are C++ behaviors that may leave you a bit astonished, staring at the lines on the monitor and wondering why the code isn’t compiling, or doesn’t work as expected. I just stumbled into one of these cases.
I usually follow these steps to recover from the puzzled face. First I write a minimal example that reproduces the behavior: it should be a bunch of lines in a single file. Sometimes this can be a daunting task, but I have found it is always worth it in order to grasp the problem.
In fact, once you have the minimal code, you can easily experiment, changing and twiddling bits to see how the behavior changes.
Then you have two options – you can ask your local C++ guru about the problem (if you have one), or you can google a clever selection of keywords that describe your problem.
So what happened today?
I decided to move some code I had developed into a namespace-constrained library. Everything compiled happily outside the namespace, but failed to do so inside it. After some head-scratching, I started cutting and shaping a minimal file exhibiting the same odd behavior. Here it is:

/** prova.cc
 *
 * @author Massimiliano Pagani
 * @version 1.0
 * @date 24/04/2007
 *
 * @notes
 * @history
 *
*/

#if defined( USE_NAMESPACE )
namespace NS
{
#endif

    class A
    {
        public:
        private:
            struct B { int x; };
            friend bool fn( B const& b );
    };

#if defined( USE_NAMESPACE )
}

using namespace NS;

#endif

bool fn( A::B const& b )
{
    return b.x != 0;
}

Now, if you compile it with the symbol USE_NAMESPACE defined (e.g. via g++ -Wall -DUSE_NAMESPACE -c prova.cc), then you get the odd-looking error:

prova.cc: In function 'bool fn(const NS::A::B&)':
prova.cc:21: error: 'struct NS::A::B' is private
prova.cc:31: error: within this context

If you compile without the namespace, everything works as expected. Since the error was quite meaningless to me, I started investigating friend declarations and namespaces. After some mailing-list browsing, I figured it out. And it was simpler than it appeared – just a case of a misleading error.
In fact, the friend statement declares a function fn somewhere in the NS namespace, while fn is actually defined in the global namespace: the using directive only brings the NS names into scope, it does not put fn into NS. To fix the problem, just move the fn definition into the NS namespace.
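Applied to the example above, the fix looks like this:

namespace NS
{
    bool fn( A::B const& b )
    {
        return b.x != 0;
    }
}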
And I figured it out on my own, without needing to call my uber-C++-guru friend Alberto.
On a completely unrelated topic, today is the 25th anniversary of the marvelous ZX Spectrum. Happy Birthday Dear Speccy.

Don’t try this

Ok, I warned you. If you try it out, chances are high that your productivity could seize up. On the other hand, with sensible usage, you’ll waste less time while gaining in knowledge (and since knowledge is power, all sorts of savings are within reach). Reader is the name, Google Reader. It is a news feed reader that accesses Atom feeds and presents them in a Gmail-like fashion in your favorite browser.
You can even mark interesting news items for sharing and have them published on your website within an applet, or you can look them up on a web page, have them served as an RSS feed, or mail them via Gmail. Pretty impressive!

This or That?

In one of my recent rants I talked about how difficult it is to choose a camcorder nowadays. Now I’d like to talk a bit more specifically about the camcorder models that made it to the finals. Let’s start with the Canon HV20. This camcorder is really new; it is just coming through the usual distribution channels, so chances are you can order it via the Internet but it is still not available at your local dealer. The camera is high definition with miniDV storage and a color viewfinder. The Canon brochure is lengthy and equally split between video and still-shot functions, depicting it as a very hybrid device.
Reviews are very good, but I am not geeky enough to jump on the HD train. First, I don’t have the right TV set, nor do any of my friends or relatives; then, the computational cost of HD rendering far exceeds the computational resources of Pagani’s Manor; finally, I don’t have the proper storage media for playing it back… worse, it is still undecided what the storage medium for high definition will be. Blu-ray? HD-DVD? I think some 10 years are still needed for these technologies to spread widely, and it makes no sense for me to pay now for something I could only use at the end of the camcorder’s life cycle.
Also, talking about quality, I have some doubts that a consumer device, even a top-of-the-range one, can match the quality of a professional device of the previous generation. I mean that the Canon XM2 will probably outperform the HV20 in terms of quality even though it cannot achieve the same pixel resolution.
So, since I have already mentioned it, let’s talk about the Canon XM2. This is an entry-level professional camera costing around twice as much as the other models I considered. I found a good deal on a used unit, with a 1-year shop warranty, in the right price range.
The XM2 is miniDV based and has standard-definition “broadcast quality”, whatever that means. It has plenty of features, and although you can use it in fully automatic mode, you can also set shot parameters individually. Reading the specifications I found a lot of stuff I don’t exactly understand but which, I suppose, is there to improve quality in difficult situations.
I think the most appealing feature is the optics: large fluorite lenses that promise to deal well even with the darkest scenes, zoom and wide-angle capabilities far superior to the consumer models, and a sun shield.
Counting against this camera are its size (it is 30 cm long) and its look, which is very professional and could attract too much unwanted (and possibly ill-intentioned) attention. Given the dimensions, I reckon it would be quite uncomfortable to carry on treks and walks.
Lastly, I got a suggestion about the Panasonic NV-GS500. This is a standard-definition, miniDV-based, top-of-the-range consumer camera. It has large lenses and 3 CCDs of more than 1M pixels each – nearly double that of other standard-resolution devices. It weighs about half a kilo. It is certainly easier to handle than the Canon XM2 and looks much less expensive.
I read a couple of reviews (1 and 2) of the NV-GS500. It is praised for its video and audio quality, while its weaknesses are the lack of a headphone jack and the fact that some commands are not directly accessible via buttons – you have to navigate the camera menu to operate them.
Since I have never used the headphone jack on my current camera in the past 8 years, I don’t think I’ll miss it much. As for the other problem, I guess it is more of a shortcoming for those used to professional or semi-professional cameras. For example, I didn’t even know that something called zebra pattern was a camcorder feature. (No, I don’t think it is for filming pedestrian crossings.)
Well, all that said, yesterday my wife and I went to the shop where the XM2 was on sale. We wanted to have another look at the camera, and more specifically at how portable it is. Once in the shop I told the clerk: “we’re here to have another look at the Canon XM2”. The clerk answered: “It’s been sold”. Me: “… , … ? … ! … ”. He: “yes, sold”.
And that was the end of the decision process, yesterday evening I ordered the NV-GS500.

Ten minutes of your time

(English below) Strano scrivere qui in italiano, ma visto che devo segnalare un’iniziativa in questa lingua mi sembra la maniera più semplice.
Fino al 19 Aprile c’è un questionario in linea sulle donazioni. Ci vogliono circa 10 minuti per compilarlo e alla fine è possibile scegliere la destinazione di un euro (donato da SWG, la società che esegue il sondaggio) ad una ONLUS a scelta da un elenco. La segnalazione mi giunge da un’associazione che ha la mia piena fiducia.

(English)
It is odd to write here in Italian, but since I need to point out an initiative in that language, this seemed the simplest way.
Until April 19th there is an on-line poll about donations. It takes about 10 minutes to fill in, and at the end you can choose to direct one euro (donated by SWG, the company that runs the poll) to one non-profit organization from a proposed list. I learned about this from an association that has my full trust.