Category: Blog

Step Choreography Danger Level

Choreographic step is one of my favorite fitness activities. As I wrote in the past I like step because it requires both coordination, memory and challenges your muscle memory at the rhythm of dance music. In the past year and half I was so lucky to have a stellar step instructor. She is able to propose new and exciting and sometimes crazy choreographies at each lesson. Her choreographies are usually a lot of fun but sometimes require you to perform some moves that may yield some danger level. If you get distracted or you are not fast enough to retrain your muscle memory you may miss the step or smash into another stepper. Here is a table (similar to my Week-end Effectiveness Degree table) that defines a number of skull for each dangerous feature of the choreography. Just add one for each of the feature in the choreography and you get the danger level. Today we reached 3 skulls.

Step across the step – you need to extend you leg across the step risking to hit the step edge with your talon, losing your balance.
Turn backward on the floor and end with a feet on the step – you can’t see the step though you need a fairly good idea about where the step is, otherwise you may hit with the feet or put your feet half on the step, losing your balance.
two steps per person – usually steps are quite close each other and the choreography expects you to move placing a feet between the two steps. You can stumble into a step, or half place a feet on a step. If the steps are really close you may stumble into your own legs.
close steps – when your step is adjacent to another person’s step, you need to properly synchronize, otherwise you may smash into each other. Knowing left from right is usually helpful.
shared steps – at some point of the choreography two people share the same step. Usually the choreography manages to get each one enough space to move, but again you need good timing and good sync to move the right way.
exchanging steps – each one has his/her own step, but at a given time everyone moves to another step all together. Usually the choreography manages to create a timed corridor to reach your destination without smashing into anyone. “Timed” is the key.
close synchro – two persons are close enough, likely on the same step and performs different moves in turn. Moves combine together at the right time so that there is no collision. This is one of the most dangerous since you can’t rely on symmetry or get any hint from your pal; also any slight error will cause a collision.

Reacting to the movies: reactive systems and the lessons of narrative.

Another Reactive Summit 2016 talk. Speaker was Steve Moore – IBM story strategist. My comments at the end. (When writing about heros I used “she” as pronoun, if you are sensitive to this use sed s/she/she/he/g).

Thinking differently helps you gaining new perspectives on your work. That’s the work for story strategist and this is the idea behind this talk. Finding analogies between reactive systems and movies.

Stories behave like reactive systems you can see three analogies:

  1. hero as scalability,
  2. plot as resilience,
  3. genre as elasticity.

Scalability : heroes are under constant pressure to scale. Scaling is usually achieved via some resource. The best resource a hero has at her disposal is a new understanding of the world.

There are three form of escaping in the scaling up of the hero.

Escape the skin. The hero is trying to escape the fake reality she’s living to reach the real and more complex reality behind. Paranoia and questioning foundation of the system are the mean to achieve this escape. This can be summed by the question: “Are you being paranoid enough”?

Escape stoicisms. The hero lives in a simple world that avoids emotional entanglement. The hero takes more responsibility that she think she can bear. Scalability and generosity go hand in hand. Question: “Are you being generous enough?”

Escape the story. The hero lives in a world she wants to escape. This is different from the deception of the escape the skin (first form). The question is “are you being patient enough?” Sometime you need to take the time to observe and understand.

plotasresilience

The plot directs the hero where is needed by the story. Stories are all about failures – the plot is just a sequence of failures that get the hero further and further away from his goal.

Heroes never ever give up. Likeability is not required by heros.

3 ways the hero fails:

Lure of the invisible. This is a kind of accumulation of something that is somewhat hiding. Can be a log file that is increasing without anyone noticing, or some peril that grows unnoticed over time. “What is the system accumulating?” This is not an issue until it is. This way of failure includes also non technical accumulation such as management attention, customer complaints…

Lure of inclination. The hero would like something, but she has to put away that to pursue his goal. Following inclinations may get the hero in hot water, but in the end brings benefits.  “Can you make the system fragile?” [NdM sorry, it is likely I missed something in the translation]

Lure of independence. The story has a lot of moving parts apparently in an unrelated way. Using micro services the system may be better managed and written by smaller teams. “Could coordination be fun?”

Coordination among the teams is needed.

Genre is elasticity.

Elasticity means that response does not change when workload changes or accidents happen. What defines the genre is a set of rules. Genres are categories of antagonists. An antagonist puts strain on the hero.

Three different kinds of genres:

Human v. Human. The analogy on system is threats posed by human hostile actions. “Can you change the game?”

Human v. Nature. The natural world is challenging the hero. Natural antagonists are simple, unbeatable and unchangeable. These stories are about survival. If you live, you deserve to live. “do you deserve to live?” If the system were in danger do these people would do something to protect it?

Human vs self. Hero is also the antagonist. The hero doesn’t want to push away something that is abundant in his life. This kind of abundance forces the hero to questioning the reality “can you endure abundance?” We need to transform ourselves and the system at the same time.

This is the only way the system has to survive.

This talk was very interesting even if not very practical or even theorical. I like the idea that adding word to vocabulary and concepts to the programmer resources may help to devise better system or to enhance the understanding of the system. It makes food for lateral thinking (lateral food?).

One of the first person we met at the luminaries barbecue the day we arrived, had the following job title “Chief Storyteller”. Maybe this is a sign of times.

A Journey to Modern Apps Through Containers and Microservices

Here I am, warped through jet leg in the sunny Texas. Austin. The reason is to attend the Reactive summit 2016. Two days crammed with interesting speeches by luminaries of the reactive community. I’ll try my best to provide a brief summary of each talk I will attend, reportin faithfully what the speaker said and adding my thoughts. If you know about reactive systems and you read something unreasonable it is likely I missed something in the process.

This talk is by Edward Hwu – mesosphere.

(I’ll leave out the quote about Justin Bibier being the reason for having reactive systems… in part because I missed something).

If you compare the number of Internet users today with the same number a few years ago you will notice that it has increased manyfolds. Almost everyone owns a smartphone and almost everyone use it to interact with Internet. Regardless companies that make business over Internet need to be sure to be able to comply with these large number of interactions in order not to lose customers and business opportunities.

Because of the increase in the number of Internet users an architecture shift was needed: distributed systems are the only way for industry to cope with this growth.

Modern applications that runs websites are made from open source software, VMs have been replaced by containers that have shorter startup time and are lighter to run.

Additionally containers provide the means for elasticity – they can be used just for the time needed for run the service – and allows for fast and frequent releases.

Data services provide connection and persistence.

Mesos is the core component of DC/OS whose purpose is to manage and abstract data center in the same way a traditional OS abstracts from hardware. Where the traditional OS operates on a single computer, DC/OS operates on the whole data center.

Operations such as installing Kafka or Spark on a cluster, with the proper configuration and replica count, is handled with a single click. You can find services ready to install and use on the open universe app store. As of today you can find 40+ apps

Here’s my thought. Well DC/OS seems to be the real shift in distributed systems. My impression is that the era of custom made software to handle the specific aspects of distributed system (such as load balancing, or provisioning) is going the way of the dodo since structured and standard solutions are being developed.

Having the facility to handle the whole datacenter with simple commands is possibly going to revolutionize datacenter in the same way docker revolutionized application deployment. The latter avoiding developer to keep track manually of all the dependencies and providing a sensible installer, the former avoiding manual install on multiple node with configuration twiddling.

Maybe this technology is a bit ahead – I’m saying this because I found no trace of the universe app store on the internet (maybe I didn’t search enough), but for sure is very promising and being open source, free for all, is something that for sure will provide widespread adoption.

Comparisons

Recently I had to spend some time trying to adapt my imperative/OO background to a piece of code I need to write in functional paradigm. The problem is quite simple and can be described briefly. You have a collection of pairs containing an id and a quantity. When a new pair arrives you have to add the quantity to the pair with the same id in the collection, if exists, otherwise you have to add the pair to the collection.

Functional programming is based on three principles (as seen from an OO programmer) – variables are never modified once assigned, no side-effects (at least, avoid them as much as possible), no loops – work on collections with collective functions. Well maybe I missed something like monad composition, but that’s enough for this post.

Thanks to a coworker I wrote my Scala code that follows all the aforementioned principles and is quite elegant as well. It relies on the “partition” function that transforms (in a functional fashion) a collection into two collections containing the elements of the first one partitioned according to a given criteria. The criteria is the equality of the id so that I find the element with the same id if it exists, or just an empty collection if it doesn’t.

Here’s the code:

    type Collection = List[Data]

    def merge( collection : Collection, data : Data ) : Collection = {
        val result = collection.partition( _.id == data.id )
        result match {
            case (List(),other) =>
                data :: other
            case (List(old),other) =>
                Data( data.id, data.quantity+old.quantity ) :: other
        }
    }

Yes, I could have written more concisely, but that would have been too much write-only for me to be comfortable with.

Once the pleasant feeling of elegance wore off a bit I wondered what is the cost of this approach. Each time you invoke merge the collection is rebuilt and, unless the compile optimizer be very clever, also each list item is cloned and the old one goes to garbage recycling.

Partitioning scans and rebuild, but since I’m using an immutable collection, also adding an item to an existing list causes a new list to be generated.

Performance matters in some 20% of your code, so it could acceptable to sacrifice performance in order to get a higher abstraction level and thus a higher coding speed. But then I wonder what about premature pessimization? Premature pessimization, at least in context where I read the them, means the widespread adoption of idioms that lead to worse performances (the case was for C++ use of pre or post increment operator). Premature pessimization may cause the application to run generally slower and makes more difficult to spot and optimize the cause.

This triggered the question – how is language idiomatic approach impacts on performances?

To answer the question I started coding the same problem in different languages.

I started from my language of choice – C++. In this language it is likely you approach a similar problem by using std::vector. This is the preferred collection and the recommended one. Source is like this:

typedef std::vector<Data> Collection;

void
merge( Collection& collection, Data data )
{
    auto iter = std::find_if( collection.begin(),
                              collection.end(),
                              [&]( Data const& probe ){ return probe.id == data.id; } );
    if( iter == collection.end() )
    {
        collection.push_back( data );
    }
    else
    {
        iter->quantity += data.quantity;
    }
}

Code is slightly longer (consider that in C++ I prefer opening brace on a line alone, while in Scala “they” forced me to have opening braces at the end of the statement line). Having mutable collections doesn’t require to warp your mind around data to find which aggregate function could transform your input data into the desired output – just find what you are looking for and change it. Seems simpler to explain to a child.

Then I turned to Java. I’m not so fond of this language, but it is quite popular and has a comprehensive set of standard libraries that really allow you to tackle every problem confidently. Not sure what a Java programmer would consider idiomatic, so I staid traditional and went for a generic List. The code follows:

    static class Data
    {
        public int id;
        public int quantity;
    }

    private static void merge( List<Data> collection, Data data )
    {
        ListIterator<Data> iterator = collection.listIterator();
        while( iterator.hasNext() )
        {
            Data probe = iterator.next();
            if( probe.id == data.id )
            {
                probe.quantity += data.quantity;
                return;
            }
        }
        collection.add( 0, data );
    }

I’m not sure why the inner class Data needs to be declared static, but it seems that otherwise the instance has a reference to the outer class instance. Anyway – code is decidedly more complex. There is no function similar to C++ find_if nor to Scala partition. The loop is simple, but it offers some chances to add bugs to your code. Anyway explaining the code is straightforward once the iterator concept is clear.

Eventually I wrote a version in C. This language is hampered by the lack of basic standard library – beside some functions on strings and files you have nothing. This could have been fine in the 70s, but today is a serious problem. Yes there are non-standard libraries providing all the needed functionalities, you have plenty of them, gazillions of them, all incompatible. Once you chose one you are locked in… Well clearly C shows the signs of age. So I write my own single linked list implementation:

struct Data
{
    int id;
    int quantity;
};

struct ListNode
{
    struct ListNode* next;
    struct Data data;
};

typedef struct ListNode* List;

void merge( List* list, struct Data data )
{
    for( struct ListNode* scan = *list; scan != NULL; scan = scan->next )
    {
        if( scan->data.id == data.id )
        {
            scan->data.quantity += data.quantity;
            return;
        }
    }
    pushFront( list, data );
}

Note that once cleaned of braces, merge function is shorter in C than in Java! This is a hint that Java is possibly verbose.

I just wrote here the merge function. The rest of the sources is not relevant for this post, but it basically consists in parsing the command line (string to int conversion), getting some random numbers and getting the time. The simplest frameworks for this operation are those based on the JVM. The most complex is C++ – it allows a wide range of configuration (random and time), but I had to look up on internet how to do it and… I am afraid I wouldn’t know how to exploit the greater range of options. Well, in my career as a programmer (some 30+ years counting since when I started coding) I think I never had the need to use a specific random number generator, or some clock different from a “SystemTimeMillis” or Wall Clock Time. I don’t mean that because of this no one should ask for greater flexibility, but that I find daunting that every programmer should pay this price because there is case for using a non default RNG.

Anyway back to my test. In the following table I reported the results.

C++ Scala Java C C++
vector list
time (ms) 170,75 11562,45 2230,05 718,75 710,9
lines 81 35 69 112 81

Times have been taken performing 100000 insertions with max id 10000. The test has been repeated 20 times and the results have been averaged in the table. The difference in timing between C++ and Scala is dramatic – with the first faster about 70 times the latter. Wildly extrapolating you can say that if you code in C++ you need 1/70 of the hardware you need to run Scala… there’s no surprise (still guessing wildly) that IBM backs this one.

Java is about 5 times faster than Scala. I’m told this is more or less expected and possibly it is something you may be willing to pay for higher level.

In the last column I reported the results for a version of the C++ code employing std::list for a more fair comparison (all the other implementations use a list after all). What I didn’t expected was that C++ is faster (even if slightly) than C despite using the same data structure. It is likely because of some template magic.

The other interesting value I wrote in the table is the number of lines (total, not just the merge function) of each implementation. From my studies (that now are quite aged) I remember that some researches reported that the speed of software development (writing, testing and debugging), stated as lines of code per unit of time, is the same regardless of the language. I’m starting having some doubt because my productivity in Scala is quite low if compared with other languages, but … ipse dixit.

Let’s say that you spend 1 for the Scala program, then you would pay 2.31 for C++, 1.97 for Java and 3.20 for C.

Wildly extrapolating again you could draw a formula to decide whether it is better to code in C++ or in Scala. Be H the cost of the CPU and hardware to comfortably run the C++ program. Be C the cost of writing the program in Scala. So the total cost of the project is:

(C++) H+C×2.31

(Scala) 68×H+C

(C++) > (Scala) ⇒ H+C×2.31 > 68×H+C ⇒ C×1.31 >67×H ⇒ C > 51.14×H

That is, you’d better using Scala when the cost of the hardware you want to use will not exceed the cost of Scala development by a factor of 50. If hardware is going to cost more, then you’d better use C++.

Beside of being a wild guess, this also assumes that there is no hardware constraint and that you can easily scale the hardware of the platform.

(Thanks to Andrea for pointing out my mistake in inequality)

Scala Days 2016 my thoughts

Scala Days 2016 is over. I’m sorry I didn’t make it to take notes of all the talks I attended, but some speakers spoke very fluently and my phone is not the best way to write down notes. English doesn’t help as well, and sometimes my understanding of the matter was lagging behind.
So what are my impressions? That’s a good question and I’m not sure about the answer. I think Scala is a very fitting solution for a specific niche. Then as every other programming language can be used for everything (I just heard that Dropbox employs 2.7 million lines of Python code).
The niche I have in mind is made of large, distributed applications, handling zettabytes of streaming data, performing math functions over them. A part of the niche could also be composed by high traffic, highly reliable web services (where service is just a general term and refers to any kind of service, including serving web pages).
In this niche using Scala, with actors and possibly Spark makes a lot of sense.
In other contexts you risk to pay the extra run for something you don’t really need – not every server software needs to scale up, not every process needs to be modeled using events. Although functional paradigm eases writing code that copes with these contexts, you still pay an extra cost.
It is hard to quantify how much. The naïve experiment of having two programmers of comparable proficiency in two different languages working at the same task is very hard to setup.
According to an old research the productivity of a programmer, intended as line per unit of time, is more or less the same regardless of the language. That means, for example, that it is cheaper to write programs in C than in assembly, because C is more expressive and higher level than assembly. I’d like to know if the same holds for Scala, where lines tend to be long concatenation of function applications. For sure this is at a more abstract level, but it requires quite an effort to write, lot of effort to understand and it is close to impossible to debug, at least with today tools.
Well back to my impressions on the state of Scala. Scala is comparatively young and suffers from its youth. There are pitfalls and shortcomings in its design (just as naturally is for every other language) that are starting to be acknowledged by the language owners. The solution they seem to prefer is to rewrite and go for a next incompatible version. This is a dangerous move as Python would teach. Also is somewhat that does not acknowledge the industry. It makes sense for a teaching tool, for a research language, but it impacts badly on industry investments.
In several talks the tenet was that although Scala allows the programmer to chose any of the supported paradigms, only functional programming is the proper way to code. Surprisingly, at least to me, Martin Odersky, the language father, doesn’t agree, when attending a talk, he claimed that multi paradigm was a bless when programming the Scala compiler.
Industry needs pragmatism, but I see it only partially. Enthusiasts may crosses the border to become zealots. And crusades are not something that could bring stability and reciprocal understanding. When I hear about functional programming revolution I am somewhat scared, I prefer evolution, acknowledging the goods of existing stuff and building over them. In revolutions many heads roll, included those of innocent people.
The most widespread background for programmers here was Java. I understand that Java is not the most exciting language. Coming from C++ I find Java quite boring. And, in fact, some, if not many, of the advantages of Scala over Java can be found in C++ as well. Unfortunately C++ falls short of open source industry standard libraries. You won’t find anything in C++ that comes close to what Spark or Akka are for Scala. Also Play – even if it doesn’t encounters a unanimous consensus – is the de facto companion library for web services and web development.
Back to Scala days (again) – it has been a positive experience, some talks needed some preliminary study even if they were marked as beginner (everything pretending to explain implicits). Other talks were quite marketing advertisements in disguise. And some were genuinely fun.
I think I got closer to this language and had great opportunities to change my point of view. My wish is for an ecosystem more attentive to the industry and that values back-compatibility  rather than see anything that breaks with the past a way to make easy money by selling technical support.

Implicits inspected and explained

Talk by Tim Soethout. The average age among attendees seem to be higher than other talks, maybe we elders are looking for understanding finally what the hell are implicits (then we would need something similar for monads…). Here we go.
Implicit enables you to use values without explicit reference. Implicit enables the relationship “is viewable as”.
As a parameter the value is taken from the calling context (eg akka sender).
Incomprehensible examples follow.
Implicits are used for DSL, type evidence (whatever this be), reduce verbosity… Other stuff I didn’t make to copy.
Caveats – resolution can be difficult to understand. Automatic conversions. For these reasons better to not overuse.
Implicit parameter just takes the only variable with a matching type as argument.
If two or more variables with a matching types are available an error is reported by the compiler.
Implicit conversion (please avoid them) allows to automatically convert from a type A to a type B every time B is required. This comes handy to convert from and to Java collection types, but it is deprecated. The new approach is to add the asScala method to the Java types.
Implicit view is not understood by yours truly, sorry.
Implicit class. This is used when a class is missing a method. You can write a wrapper to add what you need. If this is declared implicit then when the compiler finds an invocation to the new method the wrapper class is constructed and the method is invoked.
Scoping. Some explanation I didn’t understand. Interactive moment turned quite to an epic failure with most of the attendees having no idea on what was going to happen in the example shown.
Type classes. This concept is used for ad-hoc polymorphism and to extend libraries. A naive approach to serialize to json uses subclasses. All the classes need to inherit from a JsonSerializable and to define the serialization abstract method.
With type class the class to serialize do not need to derive from the same base. A generic trait is defined for the serialization method. By defining implicit variables and assigning them the class which defines the serialization for the specific type then type serializers will be uses when needed.
Other examples just make things worse for me, I though I got it, but now I’m no longer sure (also, now, rereading what I wrote, I’m unsure I understood anything at all).
Wrap up time. I still don’t get this stuff, at least not completely and only slightly more than “danger – do not use”. Type class seems to be really powerful in the sense that they lift Java from its pond to the static polymorphism of C++. But the mechanics that power this specific kind of implicit are still obscure to me. Likely nothing that a good book and some months of hard study couldn’t solve. Just it seems not to be some knowledge you can transfer in a conference.

Transactional event sourcing using slick

Talk by Adam Warski. Event sourcing means that all the changes in the system are captured as a sequence of events.
Reasons for event sourcing: keeping information (not losing info), auditing the information in the system. But it is also useful to recreate the system state.
Hibernate was a technology developed by Adam for Java.
The goal is to have event sourcing with rdbms. There are other approaches (event store, akka persistence, eventuate).
Event is immutable, is the primary source of truth. The payload is an arbitrary json, the name is the case class name. Transaction ID, event ID and user ID.
Events are stored in a dedicated table. A read model is created on the event in the crud. This gives consistency.
DBIOAction[T] gets a description of actions to be done.
User data arrives in the system, within a command (a Scala method). The command uses read to validate the data, if data is valid the command generates a series of actions to perform that updates the model. Event listeners are triggered at this time. Each event listener may trigger further events.
Demo follows for ordering a Tesla model 3.
Summary – no library, just a small adapter. Event listener performs side effects.

image

The system could have been written with free monads. That would have improved testability. But free monads don’t come for free – you have to pay a price to decode them from the code and figure out what the code is trying to do. As Adam said, maybe it’s just me.
And saying this here while presenting a relational db approach takes really a brave heart.