Fun with Functors in C

C language is old, actually, a very old language that resisted all the innovations that impacted the software development industries. Even ISO standard Cobol integrated object-oriented capabilities in 2002!

This didn’t prevent programmers from applying object-oriented programming to C. With some macro juggling and a lot of self-control not to break framework rules, you can have inheritance and dynamic dispatching.

Generic programming is, in the same way, foreign to C language1. You can use preprocessor macros to implement generic containers or functions that can be instantiated on the type you need, but it is a painful endeavor.

Nonetheless, when I worked in C on the PIC18 I developed both OO-system (mainly for the UI) and a generic container library. It turned out to be useful in trimming down development time and lowering the bug count despite requiring some discipline to use this approach. My conclusion was that if you can, you should use a language in which these concepts are built-in, your life will be easier and the compiler will be able to produce better code because it will understand the higher-level intent of your code.

In the past few days, I have wondered whether the same rules-and-macros approach could be used for functional programming in C. I tried some code and the answer is that – with some limitations – you could do something, but not everything. Let’s have a look.

The first question I tried to answer was “Can I write a functor in C?”. Quick refresh – a functor is a generic container object that exposes the map function. Invoke map on a container of X with a function that takes an X and gives you a Y and you will get a container of Y:

extern float foo( int x );
List_int li = ...;
List_float lf = map( li, foo );

Map implementation is usually simple – take one item at a time from the source container, apply the function to it, and store the resulting value in the destination container.

If you have a C background with no functional programming exposure (and you made it this far), you may wonder why I care about implementing this. The reason is that the map operation is very convenient because it lets you decouple from the details of iteration, creation, and population of the new container so that you can focus just on the transformation. Other transformations (such as filter and fold) may be used to construct complex operations.

First things first, let’s define the list type. I want it to be a typed list (i.e. a list of a type, not just a list with a void*). I want this because types are important, they prevent you from sending apples where oranges are expected. Moreover, types express their constraint at a very convenient moment – the compile time. By employing and enforcing types you can write code that will never incur run-time surprises about unexpected values.

Also, I don’t want to write the same code more than once. In C++ you can use template classes for this, but in C you have to invent something using preprocessor macros.

Before that, the first problem you face is name clashing – in C++ (again) this is avoided by using namespaces, i.e. a named scope that can be used to create a hierarchy of scopes and a way to reach the symbols contained there. This is a critical feature for integrating software components. It somewhat amazes me that C hasn’t added anything in this regard, considering that even PHP added namespaces in 2009 (version 5.3.0).

The only way to get around this limitation is by manually prefixing symbols with scope identifiers. The preprocessor may help by using the concatenation operator (##).

Let’s start by defining the type of the list:

#define DEFINE_LIST(T)                                      \
    typedef struct List_##T                                 \
    {                                                       \   
        struct List_##T* next;                              \ 
        T data;                                             \ 
    }                                                       \
    List_##T;                                               \
                                                            \
    typedef T List_##T##_Type;                              \
...   

This is just the beginning of a very long preprocessor macro, I’ll present it in chunks for better understanding, nonetheless picture it as a single long macro. The source code is available in this Gitlab repository.

To keep the verbosity low, we just give the macro the type (T) we want to store in the list, the name of the list type is automatically produced to be List_stored-type. It is a bit forced, but also in C++, the name of a specific list type is std::list<stored-type>.

No surprise is coming from the struct content. I added a typedef that given the name of the List type produces the type of the stored-type. This may come in handy in other macros. Now that the type is defined, let’s try to construct a node.

    static inline List_##T* List_##T##_new( bool (*construct)(T*) )   \
    {                                                           \
        List_##T * node = malloc( sizeof( List_##T ));          \
        if( node == NULL )                                      \
        {                                                       \
            return NULL;                                        \
        }                                                       \
        if( !construct( &node->data ))                          \
        {                                                       \
            free( node );                                       \
            return NULL;                                        \
        }                                                       \
        node->next = NULL;                                      \
        return node;                                            \
    }                                                           \
                                                                \
...

(the backslash on the right side is because we are continuing the macro).

The name of the list node constructor is fixed. This function takes a function pointer that will be called to initialize the new instance of the contained type. We could have avoided the problem by requiring the list not to contain data by value, but just holding a reference to it. But this solution would have been less performing requiring extra indirection to reach data. Moreover, it would have required some form of memory management to host the contained data.

The other choice I had to make was to perform dynamic memory allocation via malloc. The alternative was to either add a macro argument for an allocator/deallocator or to leave the memory management entirely out of the component. The first solution would have made the macro more verbose, while the latter would have broken the encapsulation principle opening the door to memory mismanagement bugs.

Now let’s go on by adding a list removal function –

    static inline void List_##T##_delete( List_##T* list )      \
    {                                                           \
        while( list != NULL )                                   \
        {                                                       \
            List_##T* current = list;                           \
            list = list->next;                                  \
            T##_dtor( current->data );                          \
            free( current );                                    \
        }                                                       \
    }                                                           \
...

The catch in this function is that we may want to deinitialize the contained data. We could pass a function for destroying the object as we did in the constructor. Here I opted for another approach, requiring the existence of a symbol (either a function or a preprocessor macro), called T_dtor to invoke on the object to destroy.

This convention is quite common in OOP frameworks for C because they face the same deinitialization problem.

Now that I have the tools for building and destroying, I would like to have something for initializing a list with some content, without the need to write too much boilerplate code. In many languages, you can write something like: list<int> x = { 1, 2, 3, 4}. Can we do something like that in C? Well, yes, sort of.

We can use a code like:

List_int list = ListInit( int, { 1, 2, 3, 4 } );

ListInit must be a macro because it accepts the name of a type (needed to produce the list of the right type) and the initialization list that has no type (unless you cast it to something).

#define ListInit( T, ... )                                      \
    ({                                                          \
        typedef List_##T Node;                                  \
        T source[] = __VA_ARGS__;                               \
        size_t const Size = sizeof(source)/sizeof(T);           \
        Node* head = NULL;                                      \
        Node** prev = &head;                                    \
        for( size_t i=0; i != Size; ++i )                       \
        {                                                       \
            bool move( T* dest ) { *dest = source[i]; return true; };   \
            *prev = List_##T##_new( move );                     \
            if( *prev == NULL )                                 \
            {                                                   \
                /* destroy everything & return */               \
                List_##T##_delete( head );                      \
                head = NULL;                                    \
                break;                                          \
            }                                                   \
            prev = &(*prev)->next;                              \
        }                                                       \
        head;                                                   \
    })

There are some tricks in this macro so it is worth to spend a few words.

First, the macro expands to something delimited by ({ … }). This is a GCC extension known as a compound statement. This is an expression that can contain any kind of code (loops, variable declarations, switches, …), but that produces a result that can be used in an expression. The last expression of the statement is the value. In the code above, the value is the value of the head variable at the end of the statement.

Another relevant trick is the use of __VAR_ARGS__. This is needed because the preprocessor doesn’t understand the C language, so the sequence { 1, 2, 3, 4 } is interpreted as 4 comma-separated arguments: ‘{ 1’, ‘2’, ‘3’ and ‘4 }’ (note that the first and last argument is made by the brace, the space and the number Using a variadic macro preserves the initialization argument structure.

Last, but not least, is the nested function at the first line of the for loop – bool move( T* dest ) { ... This is a GCC extension that allows you to define a function inside a function. This has two reasons –

  • helps to avoid global symbols pollution, and
  • provides a lambda-like function that can be passed to the list constructor.

The list constructor takes a function with the signature bool (*)( T* ), meaning that only one argument is passed to the function. Here we need to assign a number to the constructed node and we can do this by capturing the value (i) in the nested function. The nested functions are not a full replacement for lambda functions, mainly because you must use them before the enclosing function is returned. By looking at the generated assembly, I doubt that nested functions can be re-entrant but I wasn’t able to find any reference about this.

Before implementing the map macro, it would be nice to have a foreach construct to help us print a list. Here is such a macro:

#define ListForeach( L, Fn )                                    \
    {                                                           \
        for( typeof(L) scan = (L);                              \
            scan != NULL;                                       \
            scan = scan->next )                                 \
        {                                                       \
            Fn( &scan->data );                                  \
        }                                                       \
    }

Foreach is an imperative construct and, as such, there are no big challenges in implementing it (once you get the multi-line macro definition right).

We are now ready to write the map macro:

#define ListMap( T, L, Fn )                                     \
    ({                                                          \
        typedef typeof(*(L)) Lf;                                \
        typedef List_##T Lt;                                    \
        Lt* head = NULL;                                        \
        Lt** prev = &head;                                      \
        Lf const* scan = L;                                     \
        for( ; scan != NULL; scan = scan->next )                \
        {                                                       \
            bool f( T* p ) {                                    \
                *p = Fn( &scan->data );                         \
                return true;                                    \
            };                                                  \
            *prev = List_##T##_new( f );                        \
            if( *prev == NULL )                                 \
            {                                                   \
                /* destroy everything & return */               \
                List_##T##_delete( head );                      \
                head = NULL;                                    \
                break;                                          \
            }                                                   \
            prev = &(*prev)->next;                              \
        }                                                       \
        head;                                                   \
    })

Map is another compound statement. The Fn macro argument is the function that takes an item of the source list and converts it into an item of the target list. Regretfully here is where the language agnosticism of the preprocessor hits harder. C language lets you determine the return type of a function, but it offers it to you as a type. You can use it to define language objects. All this is beyond the understanding of the preprocessor which works only on textual replacement. You have no way to lexically compose the name of the result type using a typeof operator. So we need to pass the destination type name as an argument.

And now we finally can test the functor –

static inline void int_dtor( int* x ) { (void)x; }
static inline void char_dtor( char* x ) { (void)x; }

DEFINE_LIST(int)
DEFINE_LIST(char)

#define lambda(return_type, function_body) \
  ({ \
    return_type anon_func_name_ function_body \
    anon_func_name_; \
  })

int main()
{
   List_int* a = ListInit( int, { 1, 2, 3, 4 } );
   ListForeach( a, lambda( void, (int const* x){ printf( "%d\n", *x );} ));

   List_char* b = ListMap( char, a, lambda(char, (int const* x){ return 'A'+*x; }) );
   ListForeach( b, lambda( void, (char const* x){ printf( "%c\n", *x );} ));
}

the test is a bit silly, but I didn’t want to implement some dynamic strings for testing only.

In the end, I could say that a sprinkle of functional is possible in C if you can endure the pain of endless macro and rigorous application of rules (e.g. defining the destructors). Is the price worth it? Possibly not. Functional tools are like legos, the real power comes from composing them to create wonderful structures. C is too limited to allow you to create those tools while staying in the C language2.

After the success with the list functor, I tried to explore other functors such as options and either, but the limitations in type deduction were too vexing making macros require much more argument than what was sane.

I wouldn’t advise copying and pasting this code into your production codebase unless you want your coworkers to be mad at you.

Nonetheless, I think that here there are some useful techniques and tricks, that may come in handy for C coding, such as the compound statement, the local function declaration, the __VA_ARGS__ to deal with initializers, and the use of macro to build generic code.

  1. TBH, C11 introduced generic selection – a way to select a function based on type matching. Although a step in the generic direction, the capabilities are minimal and cannot be applied to types that were not fully known and to types not expressed by the programmer in that context. ↩︎
  2. I mean that you could write a library for functional programming whose concepts are opaque to C and C would be just a glue for library operations. This would be a bit like cheating 🙂 You would have some performance penalties and would miss the compile-time type checking. ↩︎

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.