TIL what Catamorphism and Anamorphism, and about a new useful function. Functional programmers are not shy to use mathematical terms that could frighten the casual programmer listening. Functor, monoid, applicative just to name the first that comes to mind. Usually, they turn out to be relatively simple concepts (which get combined together until they are no longer simple).
Let’s start with Catamorphism. I’m going to give a totally informal, intuitive, and possibly imprecise description of the concept, if you want the correct definition and description you know where to go.
Suppose you have a sequence of items such as a list or an array and you need to aggregate them to produce a single result. Let’s say we want to sum the integers in a sequence. The imperative programmer would write something like this:
int a = 0;
for( int i=0; i != v.size(); ++i ) {
a += v[i];
}
BTW, you could write this as a range-for reducing the verbosity a bit, but that’s not important. What is important is that you are mixing two different logic – the logic to scan the container and the logic to compute the result. The example is so simple that here the mixup barely matters, but for sure the approach gets in the way if the sequence you want to operate on is identified by two iterators. In this case you need to change the code, even if the relevant part is how the result is computed, not the details on how to iterate the sequence.
C++ STL recognized this need and proposed the std::accumulate:
#include <numeric>
//...
int result = std::accumulate( v.begin(), v.end(), 0, []( int a, int b ){ return a+b; } );
If you can get over the hideous lambda syntax you can appreciate the simplification – the iteration logic is hidden and you need just to set the relevant parts: the initial value and the aggregation function.
As with much of the C++ library, this is a short-sighted API, which has evolved by integrating range support in C++20. The problem with std::accumulate
is that it doesn’t compose. You can’t actually apply the accumulation chaining to a temporary range
With range, you can chain in a brief and elegant form code such as:
#include <ranges>
int n = std::views::iota( 0, 9 ) |
std::filter( [](int n){ return n%2 == 0; }) |
std::accumulate( 0, [](int a, int b){ return a+b;} );
Nice, isn’t it? Pity that’s not supported. If you want to perform accumulation you need to wait for a C++23 compiler and library and use the fold_left operation:
include <ranges>
int n = std::views::iota( 0, 9 ) |
std::filter( [](int n){ return n%2 == 0; }) |
std::fold_left( 0, [](int a, int b){ return a+b;} );
Eventually, we got to the right name: fold (left just means that the operation folds left to right). Fold is a very interesting tool – it scans a sequence, keeps a “state” and combines the current element with the state to produce the new state.
It is not clear why such a generic tool has been archived under the name “accumulate” and put in the “numeric” header.
The power of the tool stays in having no constraints on the state. As long as you provide a function with the signature State f( State, T ). When summing integers the state is, of course, the sum, but you could accumulate whatever thing you want –
std::vector<char> v = { 'a', 'b', 'c' };
std::string r = std::accumulate(
v.begin(),
v.end(),
std::string{},
[]( std::string& s, char c ){
return s += c;
}
);
Interestingly you can use fold to implement filter and map, but this is another story. Now the important part is that functions like fold are called catamorphisms. If the definition “identifies functions like fold” is not enough, you can look up the definition.
Is there a reversed function? Something that I call repeatedly passing a state to produce a value and the new state? In other words, is there a function “unfold” that takes an initial state and produces a sequence by repeatedly calling a function with the signature std::pair<State,T> f( State )
?
Well, indeed the concept is known, named, and used in languages that actually support the functional programming paradigm and don’t just claim to support it (any reference to an actual language is absolutely intended).
This function is usually called unfold and the closest thing you can get in C++ is std::generate
and its sibling std::generate_n
.
Regretfully this is not functional. Here is the signature:
template< class ForwardIt, class Generator >
void generate( ForwardIt first, ForwardIt last, Generator g );
Where Generator is the function that gets repeatedly called to get the sequence provided by iterators. What’s bad is that Generator is called without an argument, so it cannot be a pure function, since it has to store a state somewhere and every time is called a (potentially) different value is returned.
The unfold operation is as useful as the fold, as it allows you to model several algorithms. Consider pseudo-random number generation. The algorithm uses a seed value from which a pseudo-random number and a new seed are computed.
Mathematicians like greek and pompous names, so they call unfold anamorphism.
Those with a background in EE will have noticed that catamorphism and anamorphis are somewhat resembling the name of diode pins – cathode and anode. These terms come from the Greek κάθοδος (kathodos) and ἄνοδος (anodos) which means way down and way up respectively. The word μορφή (morphi) means “form, shape”. So catamorphism may be memorized as shape downward (meaning that from many terms in the collection a single term in computed) and anamorphism can be thought as shape upward (from one term, multiple terms are produced).
The Good Programmer is expected to know what catamorphism (or anamorphism) is to be a good programmer? Possibly not. But surely fold and unfold are two powerful tools that are worth knowing and using.
I have to credit the excellent podcast Corecursive for triggering the idea of this post. Should you pick a single programming podcast to list, consider this.