Recently I had to spend some time trying to adapt my imperative/OO background to a piece of code I need to write in functional paradigm. The problem is quite simple and can be described briefly. You have a collection of pairs containing an id and a quantity. When a new pair arrives you have to add the quantity to the pair with the same id in the collection, if exists, otherwise you have to add the pair to the collection.
Functional programming is based on three principles (as seen from an OO programmer) – variables are never modified once assigned, no side-effects (at least, avoid them as much as possible), no loops – work on collections with collective functions. Well maybe I missed something like monad composition, but that’s enough for this post.
Thanks to a coworker I wrote my Scala code that follows all the aforementioned principles and is quite elegant as well. It relies on the “partition” function that transforms (in a functional fashion) a collection into two collections containing the elements of the first one partitioned according to a given criteria. The criteria is the equality of the id so that I find the element with the same id if it exists, or just an empty collection if it doesn’t.
Here’s the code:
type Collection = List[Data]
def merge( collection : Collection, data : Data ) : Collection = {
val result = collection.partition( _.id == data.id )
result match {
case (List(),other) =>
data :: other
case (List(old),other) =>
Data( data.id, data.quantity+old.quantity ) :: other
}
}
Yes, I could have written more concisely, but that would have been too much write-only for me to be comfortable with.
Once the pleasant feeling of elegance wore off a bit I wondered what is the cost of this approach. Each time you invoke merge the collection is rebuilt and, unless the compile optimizer be very clever, also each list item is cloned and the old one goes to garbage recycling.
Partitioning scans and rebuild, but since I’m using an immutable collection, also adding an item to an existing list causes a new list to be generated.
Performance matters in some 20% of your code, so it could acceptable to sacrifice performance in order to get a higher abstraction level and thus a higher coding speed. But then I wonder what about premature pessimization? Premature pessimization, at least in context where I read the them, means the widespread adoption of idioms that lead to worse performances (the case was for C++ use of pre or post increment operator). Premature pessimization may cause the application to run generally slower and makes more difficult to spot and optimize the cause.
This triggered the question – how is language idiomatic approach impacts on performances?
To answer the question I started coding the same problem in different languages.
I started from my language of choice – C++. In this language it is likely you approach a similar problem by using std::vector. This is the preferred collection and the recommended one. Source is like this:
typedef std::vector<Data> Collection;
void
merge( Collection& collection, Data data )
{
auto iter = std::find_if( collection.begin(),
collection.end(),
[&]( Data const& probe ){ return probe.id == data.id; } );
if( iter == collection.end() )
{
collection.push_back( data );
}
else
{
iter->quantity += data.quantity;
}
}
Code is slightly longer (consider that in C++ I prefer opening brace on a line alone, while in Scala “they” forced me to have opening braces at the end of the statement line). Having mutable collections doesn’t require to warp your mind around data to find which aggregate function could transform your input data into the desired output – just find what you are looking for and change it. Seems simpler to explain to a child.
Then I turned to Java. I’m not so fond of this language, but it is quite popular and has a comprehensive set of standard libraries that really allow you to tackle every problem confidently. Not sure what a Java programmer would consider idiomatic, so I staid traditional and went for a generic List. The code follows:
static class Data
{
public int id;
public int quantity;
}
private static void merge( List<Data> collection, Data data )
{
ListIterator<Data> iterator = collection.listIterator();
while( iterator.hasNext() )
{
Data probe = iterator.next();
if( probe.id == data.id )
{
probe.quantity += data.quantity;
return;
}
}
collection.add( 0, data );
}
I’m not sure why the inner class Data needs to be declared static, but it seems that otherwise the instance has a reference to the outer class instance. Anyway – code is decidedly more complex. There is no function similar to C++ find_if nor to Scala partition. The loop is simple, but it offers some chances to add bugs to your code. Anyway explaining the code is straightforward once the iterator concept is clear.
Eventually I wrote a version in C. This language is hampered by the lack of basic standard library – beside some functions on strings and files you have nothing. This could have been fine in the 70s, but today is a serious problem. Yes there are non-standard libraries providing all the needed functionalities, you have plenty of them, gazillions of them, all incompatible. Once you chose one you are locked in… Well clearly C shows the signs of age. So I write my own single linked list implementation:
struct Data
{
int id;
int quantity;
};
struct ListNode
{
struct ListNode* next;
struct Data data;
};
typedef struct ListNode* List;
void merge( List* list, struct Data data )
{
for( struct ListNode* scan = *list; scan != NULL; scan = scan->next )
{
if( scan->data.id == data.id )
{
scan->data.quantity += data.quantity;
return;
}
}
pushFront( list, data );
}
Note that once cleaned of braces, merge function is shorter in C than in Java! This is a hint that Java is possibly verbose.
I just wrote here the merge function. The rest of the sources is not relevant for this post, but it basically consists in parsing the command line (string to int conversion), getting some random numbers and getting the time. The simplest frameworks for this operation are those based on the JVM. The most complex is C++ – it allows a wide range of configuration (random and time), but I had to look up on internet how to do it and… I am afraid I wouldn’t know how to exploit the greater range of options. Well, in my career as a programmer (some 30+ years counting since when I started coding) I think I never had the need to use a specific random number generator, or some clock different from a “SystemTimeMillis” or Wall Clock Time. I don’t mean that because of this no one should ask for greater flexibility, but that I find daunting that every programmer should pay this price because there is case for using a non default RNG.
Anyway back to my test. In the following table I reported the results.
|
C++ |
Scala |
Java |
C |
C++ |
|
vector |
|
|
|
list |
time (ms) |
170,75 |
11562,45 |
2230,05 |
718,75 |
710,9 |
lines |
81 |
35 |
69 |
112 |
81 |
Times have been taken performing 100000 insertions with max id 10000. The test has been repeated 20 times and the results have been averaged in the table. The difference in timing between C++ and Scala is dramatic – with the first faster about 70 times the latter. Wildly extrapolating you can say that if you code in C++ you need 1/70 of the hardware you need to run Scala… there’s no surprise (still guessing wildly) that IBM backs this one.
Java is about 5 times faster than Scala. I’m told this is more or less expected and possibly it is something you may be willing to pay for higher level.
In the last column I reported the results for a version of the C++ code employing std::list for a more fair comparison (all the other implementations use a list after all). What I didn’t expected was that C++ is faster (even if slightly) than C despite using the same data structure. It is likely because of some template magic.
The other interesting value I wrote in the table is the number of lines (total, not just the merge function) of each implementation. From my studies (that now are quite aged) I remember that some researches reported that the speed of software development (writing, testing and debugging), stated as lines of code per unit of time, is the same regardless of the language. I’m starting having some doubt because my productivity in Scala is quite low if compared with other languages, but … ipse dixit.
Let’s say that you spend 1 for the Scala program, then you would pay 2.31 for C++, 1.97 for Java and 3.20 for C.
Wildly extrapolating again you could draw a formula to decide whether it is better to code in C++ or in Scala. Be H the cost of the CPU and hardware to comfortably run the C++ program. Be C the cost of writing the program in Scala. So the total cost of the project is:
(C++) H+C×2.31
(Scala) 68×H+C
(C++) > (Scala) ⇒ H+C×2.31 > 68×H+C ⇒ C×1.31 >67×H ⇒ C > 51.14×H
That is, you’d better using Scala when the cost of the hardware you want to use will not exceed the cost of Scala development by a factor of 50. If hardware is going to cost more, then you’d better use C++.
Beside of being a wild guess, this also assumes that there is no hardware constraint and that you can easily scale the hardware of the platform.
(Thanks to Andrea for pointing out my mistake in inequality)