Some time ago, there was a TV ad about a syringe named "Pic indolor" (which, if it's not clear, could be translated as "Pic painless"). Fast forward a few decades, that ad is long gone – now Pic, to me, is only Pic painful, the regrettable MCU from Microchip that I am so unwillingly forced to use in my daily job. I already wrote about it, but there are still complaints.
The current device I am working on sports quite a comprehensive set of complex features that are expected to run at once:
• proprietary field bus communication with failure detection and avoidance;
• distributed monitoring;
• USB communication;
• Graphical User Interface, with status and menus;
• Audio (analog, thank goodness).
We have the top-of-the-range PIC18F, meaning 128k of program memory and about 3.5k of RAM (or, as Microchip engineers meant us to say, general purpose registers, or GPRs for short).
The PIC is renowned for its code density or, to put it better, is known for the lack thereof. The Harvard architecture makes things even worse. In fact there is no such thing as a generic pointer: the good ol' void* is simply not there. Pointers have to be differentiated into Program Memory pointers (24 bits wide) and GPR pointers (16 bits wide). But the difference does not end here: it goes down to the assembly level – different instructions and registers have to be used depending on whether you want to access Program Memory or GPR. That means that the same algorithm, which in C can be coded once with the help of some macro tricks, has to be translated into two copies – or into a combinatorial explosion of copies if more than one pointer is involved. A perfect example of this nightmare is in the standard library that comes with the Microchip C Compiler (MCC18) – take strcpy and you will find four versions, since you may want to copy from RAM to RAM, RAM to ROM, ROM to ROM or ROM to RAM. That is annoying only up to the point where you run out of memory. At that point you can no longer afford the flexibility.
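Just to picture the combinatorial explosion, here is what the four variants boil down to (the identifiers below are illustrative, not the exact MCC18 library names, and the rom qualifier is the Microchip extension, not standard C):

/* One algorithm, four signatures: a program-memory (rom) pointer and a
   data-memory (ram) pointer are different beasts on the PIC18. */
char*     copy_ram2ram( char* dst, const char* src );          /* ram -> ram */
char*     copy_rom2ram( char* dst, const rom char* src );      /* rom -> ram */
rom char* copy_ram2rom( rom char* dst, const char* src );      /* ram -> rom */
rom char* copy_rom2rom( rom char* dst, const rom char* src );  /* rom -> rom */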
That was my case for the image blitting functions – I had left open the chance to copy images either from GPR or from Program Memory. Now that chance is gone.
The troubles do not end here.
The PIC18 architecture provides two levels of interrupts – high and low priority; since the names tell it all, I will not insult your wits by explaining them.
The high priority interrupt has some hardware facilities to save and restore the non-interrupt context, but we cannot even think of using it since it is strictly dedicated to the proprietary field bus driver. That ISR has some hard time constraints, needing to run every few microseconds, but this is another story.
So we use the low priority interrupt vector for anything else, most notably timers.
Since we are a gang of C programmers we try to stay away from assembly as much as possible. That's fine, but writing interrupt code on PIC18 with MCC18 comes with a hefty price.
On entering the interrupt, the C run-time support dumps 63 bytes of context onto the stack. Given that the stack size is 256 bytes (if you don't want to incur extra penalties), the CRT eats up one quarter of the stack.
To be fair it is not just MCC18's fault, it is more how the PIC architecture has been designed – rather than having real CPU registers and operations that work on those registers, the PIC has memory-mapped hardware registers that implement both CPU registers and addressing modes.
For example, when saving the context you have to save:
• FSR0, FSR2 – indirect addressing into RAM registers;
• TBLPTR – indirect addressing into Program Memory;
• TABLAT – the value read via Program Memory indirect addressing;
• PRODH, PRODL – multiplication operands.
By comparison, you save the whole Z80 context in 20 bytes – and if you restrict the code to not use the alternate register set, you save just 12 bytes; moreover, you don't have a dumb limitation on the stack size.
Well, enough with the digression. What is in my ISR?
There is an interrupt source detection routine that calls the specific ISR for the interrupt that occurred. If the specific ISR is a timer tick, the timer list is swept and triggered timers are notified by using a deferred procedure call.
That's another piece of code I am proud of having written. Basically, rather than performing the callbacks from within the interrupt, you just register your callback with your arguments and let a handler perform the call later from non-interrupt context.
This has the main advantage that the callback code can take its time to do what it is supposed to do, since interrupts are enabled. Also, the called-back code can mostly ignore interrupt re-entrance problems since it is always called synchronously with the non-interrupt main loop.
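To give an idea, here is a minimal sketch of the deferred procedure call mechanism – names and queue size are illustrative, not the actual project code:

/* A deferred procedure call is just "function pointer + argument" queued
   from the ISR and executed later from the main loop. */
typedef void DpcFn( void* arg );

typedef struct
{
    DpcFn* fn;
    void*  arg;
} DpcEntry;

#define DPC_QUEUE_SIZE 8

static volatile DpcEntry      dpcQueue[ DPC_QUEUE_SIZE ];
static volatile unsigned char dpcHead;
static volatile unsigned char dpcTail;

/* Called from the ISR: just record what has to be done. */
void dpc_post( DpcFn* fn, void* arg )
{
    unsigned char next = (unsigned char)(( dpcHead + 1 ) % DPC_QUEUE_SIZE);

    if( next != dpcTail )               /* silently drop the call if full */
    {
        dpcQueue[ dpcHead ].fn  = fn;
        dpcQueue[ dpcHead ].arg = arg;
        dpcHead = next;
    }
}

/* Called from the non-interrupt main loop: perform the deferred calls. */
void dpc_dispatch( void )
{
    while( dpcTail != dpcHead )
    {
        dpcQueue[ dpcTail ].fn( dpcQueue[ dpcTail ].arg );
        dpcTail = (unsigned char)(( dpcTail + 1 ) % DPC_QUEUE_SIZE);
    }
}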
This proved to be a life saver on this occasion too, keeping the stack from growing too much in interrupt context on top of those 63 bytes. To ease the impact I moved all the timer code from interrupt to non-interrupt context via deferred procedure calls.
This bought me some oxygen, but the application was still suffocating in the limited amount of stack.
I turned my attention to the display driver and its functions. Their weight on the stack was considerable and they basically don't need to be re-entrant. Consider a masked blit: aside from offset registers, coordinates, and the read and write values, you have one pointer to the data, one pointer to the mask and, in my case, two pointers to the previous data and mask lines. That's quite a lot of stuff living in stack space.
With a deep sense of sadness I moved all those auto variables into the static universe and almost by magic I solved both the stack problem and freed enough Program Memory to fit in the memory space even without optimization.
The emergency alert just went off, but I am not sleeping relaxed sleeps, because I know it is just a question of time: sooner or later, well before the end of the project, the problem will rear its ugly head again.
The cost impact on the project is unlikely to be light…
Managing Project
The current project I am working on could be considered medium-sized. It involves 5 software engineers and 1 hardware engineer for about 8 months. I contributed to the planning of the software components, but I am quite critical about my skill in predicting the future. Considering the pressure from top and middle management to complete the project by a given date, I would take my own planning predictions with great care. So I dared to ask the project manager if he was doing some sort of risk assessment. His answer – "What the heck! If I had to do _even_ risk assessment then I would have no time at all for anything".
Harmful + Evil = RAII for C
Some days ago I read an article about goto heresy that triggered me to write about my personal experience with the infamous goto instruction in C. The only reason I found for employing goto in C was error handling. Valentino provided me with a good idea on which I elaborate a bit. Thanks to this idea you can achieve a passable approximation of RAII in C.
It is not free (in the beer sense) as it is in C++ or other languages that support automatic object destruction, but if you stick to a set of IMHO acceptable conventions you should be happy with it. Should these conventions not fit you, you may easily bend the solution to satisfy your taste.
Before delving into the technical description of the idea I am going to list the conventions that are requested for the idea to work.
First, class names have to be defined with the typedef keyword, e.g.:
typedef struct { int x; } Foo;
Then each class needs a constructor named like the class name with a trailing “_ctor”. In the same way, the destructor has a trailing “_dtor”. The first argument of the constructor is a pointer to the object to construct. Moreover, the constructor returns true if the operation has been successful or false in case of construction failure. It is up to the constructor to clean up in case of failure and not to leak any resources.
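For the toy Foo class above, the convention boils down to something like this (the bodies are purely illustrative – a real constructor would acquire resources and might actually fail):

#include <stdbool.h>

/* Turn the raw memory pointed to by self into a valid Foo.
   Return true on success, false (after cleaning up) on failure. */
bool Foo_ctor( Foo* self, int x )
{
    self->x = x;
    return true;
}

/* Release whatever Foo_ctor acquired (nothing, in this trivial case). */
void Foo_dtor( Foo* self )
{
    (void)self;
}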
In the same way, the destructor takes a single argument – the pointer to the object to destruct. By the way, by constructing I mean receiving a chunk of raw memory and turning it into a well-formed, invariant-ready, usable, and valid object. It has nothing to do with memory allocation – memory is provided by the code that calls the constructor. The destructor does the opposite – it takes a valid object and, by freeing the associated resources, turns it back into a useless bunch of raw bytes, ready to be recycled by someone else.
Now the idea is simple (as most ideas are once you know them) – you need a way to keep track of what you construct, so that when an error occurs you can go back and call the destructor for each object already built. Since you don't know how many objects are going to be constructed, the data structure that fits best is the linked list. And, if you are clever enough, you may avoid dynamic allocation altogether by employing cleverly crafted node names. When an object is successfully built, a node of the list is created. Inside the node, the pointer to the built object is stored along with the pointer to the destructor. You know which destructor to use because you have the object type. When a constructor fails, execution jumps (via a goto) to the error handling trap. The trap simply sweeps the linked list and processes each node by calling the destructor on the object. Thanks to the C preprocessor, the implementation is not so convoluted.
#define RAII_INIT \
    typedef void DtorFn( void* ); \
    struct DtorNode \
    { \
        DtorFn* dtor; \
        void* object; \
        struct DtorNode* next; \
    } * dtorHead__ = NULL

#define RAII_CTOR( x__, T__, ... ) \
    RAII_CTOR_WITH_LINE( __LINE__, x__, T__, __VA_ARGS__ )

#define RAII_CTOR_WITH_LINE( L__, x__, T__, ... ) \
    struct DtorNode dtor_##T__##_##L__; \
    if( T__##_ctor( x__, __VA_ARGS__ ) ) \
    { \
        dtor_##T__##_##L__.dtor = (DtorFn*)T__##_dtor; \
        dtor_##T__##_##L__.object = x__; \
        dtor_##T__##_##L__.next = dtorHead__; \
        dtorHead__ = &dtor_##T__##_##L__; \
    } \
    else \
    { \
        goto failureTrap__; \
    }

#define RAII_TRAP \
    failureTrap__: \
    while( dtorHead__ != NULL ) \
    { \
        dtorHead__->dtor( dtorHead__->object ); \
        dtorHead__ = dtorHead__->next; \
    }
RAII_INIT initializes the mechanism by defining the type of the linked-list node and the pointer to the head of the list. Note that a singly linked list is enough since I want LIFO behavior (the first constructed object is the last to be destroyed). Also, the name of the type is local to the function where this macro is instantiated, therefore there won't be any collision in the global namespace.
The RAII_CTOR macro is used to invoke an object constructor. The real work is done by RAII_CTOR_WITH_LINE, which accepts the same arguments as RAII_CTOR plus the line where the macro is expanded. The line number is needed to create unique node identifiers within the same function.
RAII_CTOR needs the name of the object type in order to build the name of the constructor and the name of the destructor. From this information the macro is able to call the constructor and add a node to the destruction list if it succeeds, or jump to the destruction trap if the constructor fails.
RAII_TRAP is the trap, to be located at the end of the function. It intercepts a constructor failure and performs the required destruction by scanning the list.
In order to use the macros you lay out the function according to the following canvas:
bool f( /* whatever */ )
{
    RAII_INIT;

    // some code

    RAII_CTOR( ... );   // one or more ctor(s)

    return true;        // everything was fine

    RAII_TRAP;
    // code below is executed only in case of error.
    return false;
}
As you can see, the trap performs the destruction but leaves you room to add your own code (in the example, the "return false;" statement).
So far so good, but you may argue that memory allocation and file open/close already have their conventions set by the standard library, and those don't fit my macro requirements.
Don't worry, it is quite straightforward to hammer malloc/free and fopen/fclose into the _ctor/_dtor schema. It is as simple as:
#define malloc_ctor(X__,Y__)          (((X__) = malloc( Y__ )) != NULL)
#define malloc_dtor                   free
#define fopen_ctor(X__,NAME__,MODE__) (((X__) = fopen( NAME__, MODE__ )) != NULL)
#define fopen_dtor                    fclose
Here is an example of how the code that employs my RAII macros could look:
bool f( void )
{
    RAII_INIT;

    Foo foo;
    FILE* file;
    void* memory;

    RAII_CTOR( memory, malloc, 100 );
    RAII_CTOR( file, fopen, "zippo", "w" );
    RAII_CTOR( &foo, Foo, 0 );

    return true;

    RAII_TRAP;
    return false;
}
This code has some great advantages over the solutions I presented in my old post. First, it has no explicit goto (the goto is hidden, as much as it is in any other structured statement). Then you don't have to care about the construction order or explicitly write the destructor calls.
Though there are some drawbacks. First, the linked list has an overhead that I don’t think the optimizer will be able to avoid. The space overhead is 1 function pointer and 2 data pointers (plus alignment padding) for each constructed object. This space is taken from the stack, but it is completely released when the function returns.
The code requires a C99-compliant compiler or, at least, a compiler that allows you to declare variables anywhere in the code (and not just at the beginning of a block). I think that the function pointer and argument pointer juggling is a bit on (or maybe beyond) the edge of standard compliance. I tested the code on a PC, but maybe it fails on more exotic architectures.
So, what do you think?
Considering Goto Harmful, but…
From when I started programming in C until a few months ago, I religiously practiced the rule "Don't use goto" (totaling about 23 years of abstinence). I remember I was puzzled at first – coming from BASIC programming, I could hardly believe you could get along without the infamous instruction. I took this as a challenge to change my habits, and in a short time I was able to get rid of the evil statement.
In practice I was helped by a bunch of C statements that are basically disguised goto instructions: break, continue, and return.
Break and continue allow you to jump out of a loop or to the next iteration, while return is a jump out of the current function.
Single exit point (i.e. just one return per function) is often preached as a “Right Thing”, but when programming in C, single exit fights madly with error management, forcing you either to deeply nest conditionals or to add boolean variables with the sole purpose of skipping code in case of error.
The Amiga was the first computer I programmed in C. It was an advanced machine for its time, but experimental in many ways. For example, the Amiga operating system provided full multitasking capabilities, but the hardware lacked an MMU, therefore no memory protection was in place. This forced the programmer to be very careful about error conditions – one unhandled error and the entire system could be nuked by a single failing program.
That's probably why I have always been attentive to error handling and graceful exit.
It was back then that I started using the idiom:
if( f1() && f2() && f3() )
{
    // f1(), f2() and f3() returned ok.
}
else
{
    // something failed... but which call?
}
This helps to avoid some nesting but fails in tracking which function succeeded and which didn’t. That could be fine in some situations, but not in others. For example, if you have to free some resources allocated in f2(), you must know if f2() succeeded.
Conversely, the idiom below:
bool ok1;
bool ok2;
bool ok3;

ok1 = f1();
ok2 = f2();
ok3 = f3();

if( ok1 && ok2 && ok3 )
{
    // f1(), f2() and f3() returned ok.
}

if( ok1 ) free1();
if( ok2 ) free2();
if( ok3 ) free3();
This performs proper cleanup, but fails to capture that f2() has to be executed if, and only if, f1() succeeds.
Then I went the C++ way for several years and gained a markedly object-oriented approach.
Using C++ you don't have to worry much about these details if you happen to use the RAII idiom. That is, an automatic object (i.e. a local instance) gets automatically destroyed when the scope is left, regardless of the reason that causes the execution to leave the scope.
In other words, if a function fails, be it with an exception or by reporting a specific error and triggering a return, objects that were built are destroyed, leaving the system in a good, non-leaking state.
Fast forward some years, and I am back to C programming with a heavy legacy of the object-oriented approach. This means that I try to design modules in an object-oriented way – modules define classes, and each class has one constructor that prepares the instance for usage. Each class also has one destructor (which may be empty, but that is an implementation detail, so if it changes in the future you don't have to change the calling code).
This is the setting where the C error management issue arose again. I want to mimic C++-like behavior, so that when there are 3 "sub-objects" to construct in a constructor, proper cleanup (i.e. the destructor calls) is invoked in case of error.
If you follow a strictly structured approach (without exception support), you get very convoluted code:
if( f1_ctor() )
{
    if( f2_ctor() )
    {
        if( f3_ctor() )
        {
            // successful
            return true;
        }
        else
        {
            f2_dtor();
            f1_dtor();
        }
    }
    else
    {
        f1_dtor();
    }
}
return false;
The lack of "fall through" semantics forces you to duplicate code and therefore makes coding and maintenance more error-prone. In fact, suppose you have to add another call, f0_ctor(), that must be invoked before f1_ctor(). Then you have to change nearly everything, indentation included.
It was time to reconsider my mental framework. I needed something that selects a portion of the "destructor" sequence. Something like a switch with fall-through:
progress = 1;
if( f1_ctor() )
{
    progress = 2;
    if( f2_ctor() )
    {
        progress = 3;
        if( f3_ctor() )
        {
            progress = 0;
        }
    }
}

switch( progress )
{
    case 0:
        return true;

    case 3:
        f2_dtor();
        // fall through

    case 2:
        f1_dtor();
        // fall through

    case 1:
        return false;
}
This can do, but it is somewhat error-prone when writing and/or changing the code. If you duplicate one of the progress codes you get a wrong cleanup that can go undetected.
Moreover, it doesn’t seem to add much to the goto-based error management:
if( !f1_ctor() )
{
    goto error1;
}
if( !f2_ctor() )
{
    goto error2;
}
if( !f3_ctor() )
{
    goto error3;
}
return true;

error3:
    f2_dtor();
error2:
    f1_dtor();
error1:
    return false;
This notation is more terse (thus more readable) and appears to be more robust than the previous ones.
So why should I refrain from the goto statement in this case? There isn’t any good motivation.
I don't want to provide any sort of "free for all" authorization for the wild usage of the goto instruction. On the contrary, my claim is that first you have to come of age without using goto (i.e. write programs for at least 18 years), practice Object Oriented Programming, and carefully deal with error handling; then, if you find yourself in a language lacking suitable error management, you may use goto… only if everything else is worse.
Z80 vs. PIC
Yesterday I wrote some lines of PIC assembly code to manage the interrupt service routine so that I can select from C the code to execute when an interrupt occurs. Just to give you an idea of the pain, I will show you a comparison between a Z80 (1976) and a PIC18 (2002) doing an indirect call. Let's say that you want to jump to a program address stored in two bytes at addresses TL and TH.
Z80:

      jr   L1
L2:   ld   hl,(TL)
      jp   (hl)
L1:   call L2

PIC18:

      bra    L1
L2    movff  TH,PCLATH
      movlb  bank(TL)
      movf   TL,W,B
      movwf  PCL
L1    rcall  L2
The Z80 routine is 9 bytes long while the PIC18 one spreads over 14 bytes. When it comes to execution times things are not so bad for the PIC18 – 36 machine cycles compared to 53 for the Z80. My guess is that a 2002 architecture involves a pipeline that allows the CPU to crank out an instruction per machine cycle. In fact, modern incarnations of the Zilog CPU have a revised architecture that runs 4 times faster (or more) than the original Z80.
I’ve got the PIC
It was 1971 when the first single-chip CPU hit the shelves. 4004 was the name. Rough by today's standards, it nonetheless had several impressive features – among which 16 registers and 5 instructions operating on 16 bits. One year later it was the time of the 8008, with 6 registers of 8 bits each. This chip was the basis for the 8080, the Z80 and the 8086. I am quite familiar with the 8080 (basically the CPU powering the GameBoy Color) and with the Z80 (the heart of many 80s home computers – the ZX Spectrum and the Amstrad CPC). Later it was time for extremely elegant and rational architectures – the MC68000 and the ARM.
So I supposed that the evolution of CPUs led to better chips with rational architectures, with legacy kludges slowly fading into oblivion. I was happy.
Then I met the Microchip PIC. To give you an idea, I would say that the PIC is to CPUs what Cobol is to programming languages.
The PIC has basically one single register, plus a set of memory locations with hardwired operations. For example, if you want to access a memory location indirectly, you write the target address into a specific address, then you read another specific address and you get the indirect access.
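In code it looks more or less like this (a sketch using the FSR0/INDF0 pair as exposed by the usual MCC18 device header; treat it as illustrative):

#include <p18cxxx.h>   /* provides FSR0H, FSR0L, INDF0 */

/* Indirect read of a data-memory byte: write the address into the
   memory-mapped FSR0 pair, then read INDF0 to dereference it. */
unsigned char read_indirect( unsigned int address )
{
    FSR0H = (unsigned char)( address >> 8 );
    FSR0L = (unsigned char)( address & 0xFF );
    return INDF0;
}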
The PIC features a Harvard architecture, that is, program memory is separate from data memory. Program memory addresses can be up to 24 bits wide, while data memory holds no more than 64 Kbytes – though usually you get just a few kilobytes.
The CPU has a 31-level hardware stack for calls. That means that only return addresses can be stored on this stack. If you want to use a stack to pass parameters and/or to store local variables, you have to implement your own software stack. In the latest PICs you get some specialized memory addresses that help you in this task.
But obviously this architecture was not designed for modern languages (if you can call C, with its 40 years of history, modern). So at Microchip they decided that some extended instruction set was needed. I think they had the best intentions and that, being engineers, they took the best decisions. But the result leaves me head-scratching… Basically they added a static configuration bit to the CPU. This bit is stored in program memory, so you can't change it without rebooting. When this bit is set, the meaning of nearly half of the instruction set is altered, so that rather than accessing a fixed memory address, that address is used as a displacement from a pointer held in a given memory location.
Kiss backward compatibility goodbye.
I would add that the Harvard architecture doesn't mate well with C (at least with the compiler you can buy from Microchip). In standard C, pointers may have different sizes according to the pointed-to type, but a void pointer is large enough to accommodate any of them. With the PIC C compiler this is not true – the size of a pointer depends on a non-standard modifier, "rom" or "ram" (ram is the default). So if you point into RAM the pointer is 16 bits wide, but if you point into ROM the pointer is 24 bits wide. If you move a rom pointer into a void pointer you lose the 8 most significant bits. The drawback is that you cannot write code agnostic to the location of the pointed data.
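A sketch of the trap (rom is the MCC18 extension; exact syntax may depend on compiler settings):

rom const unsigned char table[4] = { 1, 2, 3, 4 };   /* lives in program memory */
unsigned char buffer[4];                             /* lives in data memory    */

void pointer_trap( void )
{
    rom const unsigned char* pp = table;  /* 24-bit program-memory pointer */
    unsigned char* dp = buffer;           /* 16-bit data-memory pointer    */

    void* fine = dp;                      /* a ram pointer fits in void*   */
    void* oops = (void*)pp;               /* the upper 8 bits of the rom   */
                                          /* address are silently dropped  */
}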
Considering all this, the fact that the compiler requires "main" to be declared as "void main(void)" can be safely ignored.
The Web of Doom
I started programming a couple of days before yesterday, so I have had several occasions to peek at code written by others. According to Jeff, the best programmers are those who answer "mine" to the question "Which is the worst code you have ever seen?". I probably wouldn't qualify for the elite since I consider my code not so bad. For example, in the past days I stumbled upon a piece of quite crappy code. Mostly out of curiosity, I analyzed the dependencies among the modules. Note I am talking about real, straightforward dependency – the one defined by module A calling a function in module B. I got the pretty picture you see here:
You may argue that F.c is quite independent, but the truth is that F.c is empty and not used in the project, I just included it in the graph because it was in the directory.
Now, it would just be an awesome example of dependency hell applied to programming (others may call it a "Big Ball of Mud"), were it not the case that I have the task of adding features to this mess.
As in every ball of mud of respectable dimensions, a number of anti-patterns have been applied consistently and with generosity. For example, around 110 global variables have been defined in a .h file.
Ok, read that sentence again.
Variables have been defined in a header file, not just declared. This brings us to an interesting anti-pattern: each .c file has a corresponding .h file which is intended to be included only once, by that very .c file. I.e. F.h is the header file for F.c in the sense that only F.c includes F.h. In this way, each header file has a section for the prototypes of the module and a section for externs.
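For the record, this is the difference between the anti-pattern and the usual convention (file and variable names are mine, for illustration):

/* Anti-pattern: F.h defines the variable, so every .c file that includes
   the header allocates (and clashes on) the very same symbol. */
int errorCount;

/* Usual convention: the header only declares, exactly one .c file defines. */
/* F.h */
extern int errorCount;
/* F.c */
int errorCount;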
One of the most relevant members of the global community in this code is the variable 'i'. You may argue about the problem of having such a short name in such a large scope. That was my reaction too, until I realized that this variable was used everywhere for loop indexing. What the f*! I mean… why?! The compiler is expected to optimize away the index of a loop, possibly by moving the variable into a CPU register or by unrolling the loop. This is simply not possible when a global is involved, since the compiler has no guarantee that someone else isn't using the global or that it isn't affected by some side effect of the loop.
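Just to make the point concrete, a hypothetical reconstruction (names are mine, not the project's):

extern int i;                    /* the project-wide loop index, defined in a header */
void refresh_channel( int channel );

void refresh_all_channels( void )
{
    /* With a global index the compiler cannot keep i in a register:
       refresh_channel() might read or modify it behind our back. */
    for( i = 0; i < 8; ++i )
    {
        refresh_channel( i );
    }
}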
I tried to enter the mind of the original programmer… and I imagined a youngster at his (or her) first job, arriving one day at the workplace and claiming: "Yo! I had a wonderful idea! Why don't we use a global for 'i' so that we don't need to declare it everywhere!"
I won't talk about common anti-patterns such as the never-ending function body (900+ lines) or the deep nesting (16 levels… a record, I guess). The other aspects that struck me are the hilariously long lines and the tactical comments everywhere.
80 columns per line may be a bit old-fashioned, but I like it – it forces you to be concise and to avoid too much nesting. Long lines in this code easily reach 160 columns and I suppose there are some around 200 columns.
Tactical comments are those notes left by the programmer explaining what every single line is supposed to do. I am not fond of this practice, which actually reduces readability, but this code reaches really unexplored worlds of tactics. Several functions have every line commented (from column 100 to column 160). If a single comment doesn't fit in a single line (which can happen when you hit column 200) it is split along multiple lines, even if the code no longer relates to the description.
Yes, I am convinced that I am not the best programmer in the world (nor one in the top ten), but I strive as I can not to write code such as this. And the reason is that a) I don’t want to lose my sanity trying to debug it (and possibly change it in the future) and b) I want everyone to live in love and peace… programmers included.
Simple I/O Messing Up
Basic I/O should be simple. I guess you agree. And I guess that's why many C++ or Java programmers look back at the humble printf with some nostalgia. In fact, it is hard to beat the conciseness and clarity of something like:
printf("%03d", x );
When it comes to creating formatted output, C's printf is usually one of the best tools available.
Unfortunately things are not so simple. One of the first limitations acknowledged for this little gem is that it lacks robustness or, put from a different perspective, it doesn't type check.
What happens if 'x' in the above example is a float? Or a pointer? Or, worse, if the format string specifies a string pointer and an integer is passed?
This problem is mostly overcome in the GNU compiler via a custom extension that allows the compiler to check for consistency between format string and arguments.
The mechanism is flexible enough to be applied to user-defined functions. Suppose you have a logging function that behaves like printf, something like:
void log( LogLevel level, char const* message, ... );
That's handy: you don't have to perform string processing to build your message when you just want to log. If you use GCC and declare the function like:
void log( LogLevel level, char const* message, ... ) __attribute__((format(printf,2,3)));
Then the compiler will kindly check all the invocations of the log function in your code to ensure that specifiers and arguments match.
So far so good, but enter C99. In the old days there was one integer type (int) with two available modifiers (short and long). That was reflected in the printf specifiers/modifiers: %d is for straightforward ints, %hd for shorts and %ld for longs.
And this is fine as long as you work with the same compiler and platform. If your code needs to be portable, some complications are waiting for you.
The last standard (C99) mandates a header, namely stdint.h, where a number of typedefs provide a wealth of integer types: grouped by size and by efficiency, you have (if I count correctly) some 30 types.
On one side this is jolly good, since it puts an end to the criticism that C lacks integer types with a declared bit size valid for all platforms (like Java has).
Unfortunately, on the other side, printf is not able to autodetect types and thus you have to write a different format string depending on whether your int32_t is defined as long int or just int.
To leave the nightmare behind C99 mandates another header file – inttypes.h that provides the proper specifier for each one of those 30 integer types. For example, if you want to print an int32_t, you have to write:
printf( "here's an int32_t: %" PRId32 " and that's alln", x );
As you can see, it relies on adjacent string literals being merged into a single string.
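Put together, a minimal self-contained version looks like this (the output is the same whether int32_t maps to int or to long on the target):

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

int main( void )
{
    int32_t x = 42;

    /* PRId32 expands to the right conversion specifier for this platform;
       the adjacent string literals are then merged into one format string. */
    printf( "here's an int32_t: %" PRId32 " and that's all\n", x );
    return 0;
}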
That does the job, but, IMO some simplicity of the original idea is lost.
So long XML, welcome Json!
There was a time, some years ago, when, thanks to the then-young HTML, the markup idea with angle brackets was considered cool. It wasn't infrequent to <humor> abuse it </humor> everywhere. Then, as every trend does, it went out of fashion and markup is no longer considered trendy. For bad or good we have some legacies from those days, the most notable being XML.
XML originated from a good idea – standardize a way to describe complex data. XML also has a great plus: if you do everything right, you may be able to validate that an XML description is compliant with your definition. In other words, you can check whether that 3D scene you are loading really is a 3D scene and not something else or something garbled.
Despite these good intentions, I never felt at ease with XML. First, I find it very verbose: although you can write it by hand, you surely don't want to – it's too easy to miss a closing tag, and you are likely to continuously look up the specification to check whether a given piece of information should be an element or an attribute.
Moreover it’s not straightforward to parse an XML file. Likely you have to employ an existing library (there are dozens for each language) and learn it.
When confronted with XML I always thought there should be a better way to store something in a structured form. Something more readable, more writable, and that doesn't require too much brain power to write parsers and interpreters.
On some occasions I sketched out a sort of pseudo-C with braces to group values and so on.
Some time ago a coworker pointed me to JSON which, in fact, is a simple standard for structured data that comes from the JavaScript notation. JSON files are easy for a human being to read and write, and writing a C parser for this format takes a couple of evenings (but, if you prefer, you can use a ready-made parser).
Btw, as we are in this time of the year, I wish you a Merry Christmas and Happy New Year!
Requirement baseline
I thought that nowadays two main schools of thought had been established about requirements. The XP-ish "waste-of-time" school, which pretends this stuff belongs to NASA and similarly priced development and is clearly NAH (Not Applicable Here). And the "we-need-them-right" school, which believes in properly written and managed requirements documents. Of course, I am biased: I belong to the second school, since I don't buy the XP gibberish and I am a firm believer that a sound methodology may not be The Solution to all software development troubles, but it is surely part of it.
So I was a bit surprised when I received a 35-page requirements document that's actual crap. Joel Spolsky wrote a sort of basic requirements for requirements in four parts. Well, below is my list. I don't want to compete with Joel (really, do read his post, and parts 1, 2 and 3 as well, it's worth it), but mine is shorter and should fit even a tight schedule, should you be requested to write specifications.
Think of it as a baseline set of rules of thumb:
- start by describing the system to which the specification refers: what it does, which kinds of existing systems do more or less the same;
- don't go any further. These are specifications, not software design. Leave out the "how"s;
- define every acronym you use. Not everyone knows even the most basic acronyms, let alone the more exotic ones (I'm still wondering what "TBP" means). Also define terms that are meaningful only to those who already know what you are talking about;
- list requirements in a way you can refer to them. Number them or, better, use textual tags so that you can refer to them in the same document or in the documentation that follows;
- don't use the following words: "some", "etc.", "good", "bad", "fine", or ellipses. The idea is that you have to be precise and define the entire system without leaving open holes;
- use a spell checker. It's easy: every modern word processor has one, just switch it on and fix the words underlined with the red squiggle (you should really do this for everything, not just the requirements document);
- re-read everything. People are going to read and work on that document; just re-read and fix, and iterate until you no longer fix anything.
As trivial as they seem, all these rules were broken at least once in the document I was handed.