During the past week PIC 18f continued to be my main headaches supplier. I think I already wrote enough about the messiness of the Harvard architecture combined with “modern” languages such as C. Well, since two separate address spaces didn’t seem enough, the chip designers opted to add a third for call return stack. The place where the CPU stores the return address before jumping to a subroutine is into a distinct address space… that’s not so addressable – since you can read and write only the topmost location.
Being our model the top of the range, it features 31 levels in the stack. That means that you can sub-call no more than 31 times without breaking the stack (or, you can’t compute factorials greater than 31! :-)).
Well 31 is a reasonable amount, or at least I thought so. Then we ran out of Program Memory, so we enabled the compiler space optimization. I must admit that these optimizations are quite effective being able to trim down the size of your binary of some 30%. Most of this percentage is achieved by the so-called “procedural abstraction” optimization. This procedure factorize common sequence of assembly into sub-routine. By repeatedly applying the technique, more and more blocks get extracted and replaced by a CALL instruction.
The side effect is that this technique has a dramatic impact on the stack – it may doubles the number of nested calls.
In our case the stack exploded when a deeply nested graphic routine was interrupted by the low-level interrupt that in turn was interrupted by the high level interrupt.
I had the terrible “game-over” feeling – I had to use the optimization to fit in the memory, but I had no control over the nesting of the calls. In fact every action I could take to soften the nesting would be undone automatically by the optimizer.
I don’t like to give up, so I fired up acrobat reader on the datasheet trying to dig a way out of troubles.
PIC 18f datasheet about call return stack isn’t terribly clear, so I had to read it several times. Eventually I concluded that stack overflow and underflow have two kinds of handling selectable once for all by configuration bits (sort of fuses you set when programming the device). In the first way when the execution attempts to push the 31st entry in the stack, just flips a flag. Next pushes just overwrite the entry. I thought hard about this, but I found no use whatsoever.
Next mode provides a sort of interrupt. On the 31st push the execution jumps to 0x000000 (the reset vector) with the overflow condition flagged.
Woha, I realized, I could trap this address and check for the overflow before handing the execution flow to the standard firmware. If I am here for an overflow, I could dump the hardware stack (one entry at time, it is possible) into somewhere in RAM and continue the execution where…
Exactly, where?
The CPU discards the target address of the call, it just pushes the return address into the stack. Well I could peek around the return address to see whether a CALL either relative or absolute has been issued and recover the target address.
I really enjoyed the idea, but the more I though of it, the more I found critical cases. What if the jump was caused by a call to a pointer? The PIC code for such an addressing is all but straightforward. Maybe I could detect the call pattern and perform some recovery.
Eventually I found the real killer of my idea in interrupts. In fact the call return stack is shared for normal execution and interrupts. The code for recovering the target address would have to be rather convoluted – check for a CALL/RCALL instruction, check for a CALL to a pointer, check for interrupt flags (there are tons of them) and decided whether a low or high priority interrupt has triggered…
Dead end. I wondered what was the idea of the chip designer when implementing the stack overflow mechanism. I found it useless, done this way, and it could have worked fine with some additional circuitry – it needed just to save the target address and it would have been great.
Back to paper and headache.
I did some search on internet and found another compiler (a bit too experimental to be employed in an industrial project) that save the return address from the hardware stack into the software stack. Doing this, the hardware stack never grew over 3 entries. Good! But the MCC18 compiler is really basic, it doesn’t allow this technique, nor it allows the programmer to specify user defined prologue/epilogue for functions.
At this point occurred to me that the stack explosion had always been during interrupt time. So, I thought, I may save the stack when I enter the low level interrupt and the stack level is over a certain threshold, and then restore everything before leaving the interrupt.
The implementation was quite straightforward (almost flawless, if you pardon my ubiquitous off-by-one bug). The only problem is that sometimes the low level interrupt incurs in a bad penalty of copying some 100 bytes from the stack into RAM in order to avoid the stack explosion.
But it makes a good war story.