It took quite a while to edit the second part, but I hope it is worth the wait.
Optional Semicolons
Once upon a time, BASIC didn’t need any instruction termination symbol. If you wanted to stick two or more instructions on the same line, you had to separate them with a colon (yes, this was before semicolons). Then came Pascal and C, and the termination/separation character made its appearance (well, maybe history didn’t unfold exactly like this, but this is, more or less, how my relationship with instruction termination evolved).
Scala, Python, and other languages do not need semicolons or make their use optional in most contexts. This isn’t a great saving, but it does make me wonder why we need semicolons in C++; isn’t the “missing semicolon” one of the most frequent syntax errors? And if the compiler can tell that a semicolon is missing, couldn’t the compiler put it there for me?
Well, I guess the problem is backward compatibility. The semicolon-free parser would give a different meaning to existing code. Consider, for example, expressions that are split over multiple lines. In C++, it is ok to evaluate an expression and throw the result away. So, introducing a new statement separation syntax would be a mess – code that used to work may now present subtle problems hard to spot in debugging and code reviews.
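A minimal sketch of the kind of code that would change meaning. The function names are made up for illustration, and the "newline-terminated parser" is purely hypothetical: the second function just spells out by hand what such a parser would read from the first one.

```cpp
// Today: the statement ends at the semicolon, so both lines form one expression.
int parsedWithSemicolons() {
    int total = 10
        - 3;
    return total;   // total is 7
}

// What a hypothetical newline-terminated parser would see: two statements.
// The second one is a legal expression statement whose result is thrown away.
int parsedWithNewlines() {
    int total = 10;
    -3;
    return total;   // total is 10
}
```

Both versions compile (the second one with, at most, a warning about an unused value), which is exactly why the change would be so hard to spot in a review.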
Nonetheless, coding without semicolons is somewhat liberating, and remembering to put that character at the end of lines is a habit that takes me a while to get back into when switching from Scala to C++.
Garbage Collection
C++ has a strange relationship with garbage collection. This may come as a surprise to many, but in the first C++ book, The C++ Programming Language, Stroustrup wrote that C++ could optionally support garbage collection. Microsoft, in the early years of .NET, introduced a C++ extension (managed C++, then C++/CLI) to handle managed pointers – a different class of pointers for garbage-collected objects.
C++ even had minimal support for GC, introduced in C++11, and libraries such as the Boehm-Demers-Weiser collector bring garbage collection to C++ programs. So, C++ is no stranger to garbage collection, but this automatic way of deallocating objects has never caught on. In C++23, the minimal GC support was removed.
The common way for modern C++ to manage memory is via automatic objects and smart pointers. Automatic objects are allocated on the stack, and they are automatically destroyed when the execution leaves the scope where they were allocated. Smart pointers are defined by the standard library: std::unique_ptr models single ownership and destroys the pointed object when the pointer itself is destroyed, while std::shared_ptr is a reference-counting pointer that disposes of the pointed object when the last reference goes away. By properly using std::unique_ptr and std::shared_ptr, memory management headaches are mostly gone.
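As a quick reminder of how the two smart pointers behave, here is a small sketch. Texture, loadTexture and sharedOwners are hypothetical names invented for the example.

```cpp
#include <memory>
#include <string>

struct Texture {              // hypothetical resource type
    std::string name;
};

// Single owner: the Texture is destroyed when the unique_ptr that owns it
// goes out of scope (or is reset).
std::unique_ptr<Texture> loadTexture(const std::string& name) {
    return std::make_unique<Texture>(Texture{name});
}

// Shared ownership: the Texture lives until the last shared_ptr is gone.
long sharedOwners() {
    auto first = std::make_shared<Texture>(Texture{"grass"});
    auto second = first;          // copying bumps the reference count to 2
    return first.use_count();
}
```

The choice between the two is made by the programmer at every allocation site, which is the kind of decision a garbage-collected language does not ask for.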
Many languages went the other way, having garbage-collected objects as the default way to handle memory, with an optional way to allocate and manually free a bunch of memory.
So, what are the advantages of garbage collection? Well, there are three main advantages:
- no reference counting management penalty (paid each time you copy/assign a shared pointer around);
- thread safety (the reference count of a std::shared_ptr is already updated atomically, but concurrent reads and writes of the same pointer instance need extra synchronization; starting from C++20, the std::atomic<std::shared_ptr<T>> partial specialization can be used for that, at the price, of course, of extra time on each access);
- GC works fine with reference loops – such as circular lists – while reference counting has troubles with these data structures.
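The reference-loop problem can be shown with a small sketch. The node types and the liveNodes counter are hypothetical, made up only to observe which objects get destroyed. With std::shared_ptr links the two nodes keep each other alive; replacing the link with std::weak_ptr breaks the loop.

```cpp
#include <memory>

int liveNodes = 0;   // number of node objects currently alive (illustration only)

struct StrongNode {
    std::shared_ptr<StrongNode> next;   // owning link
    StrongNode() { ++liveNodes; }
    ~StrongNode() { --liveNodes; }
};

struct WeakNode {
    std::weak_ptr<WeakNode> next;       // non-owning link
    WeakNode() { ++liveNodes; }
    ~WeakNode() { --liveNodes; }
};

// Builds a two-element circular list, drops the local pointers and
// reports how many nodes are still alive afterwards.
template <typename Node>
int nodesLeftAfterCycle() {
    liveNodes = 0;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;
    }   // a and b go out of scope here
    return liveNodes;
}
```

nodesLeftAfterCycle<StrongNode>() reports 2 (both nodes are leaked on purpose), while nodesLeftAfterCycle<WeakNode>() reports 0. A tracing garbage collector would reclaim the first pair as well, since nothing reachable points to it anymore.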
Garbage collection lets the object exist with no per-object reference count to store and update, and the time overhead is paid once in a while for a periodic memory scan that finds unreachable objects and disposes of them.
There are two main problems with GC:
- Periodic execution of the collector may impact the performance of the application. GC indeed made huge advances in this area; still, for real-time applications, it may be an issue to keep under control.
- Object disposal happens after the object’s last use, but you don’t control when. C++’s predictable destruction time allows C++ programmers to implement the RAII idiom.
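To make the second point concrete, here is a minimal RAII sketch. ScopeLog and runScopes are hypothetical names; the guard just records when its scope is entered and left, standing in for a lock, a file handle, or any other resource.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical guard: acquires in the constructor, releases in the destructor.
class ScopeLog {
public:
    ScopeLog(std::vector<std::string>& log, std::string name)
        : log_(log), name_(std::move(name)) {
        log_.push_back("enter " + name_);
    }
    ~ScopeLog() { log_.push_back("leave " + name_); }   // runs at scope exit
    ScopeLog(const ScopeLog&) = delete;
    ScopeLog& operator=(const ScopeLog&) = delete;
private:
    std::vector<std::string>& log_;
    std::string name_;
};

std::vector<std::string> runScopes() {
    std::vector<std::string> log;
    {
        ScopeLog outer(log, "outer");
        {
            ScopeLog inner(log, "inner");
        }   // inner is destroyed exactly here
        log.push_back("between");
    }       // outer is destroyed exactly here
    return log;
}
```

The log always reads: enter outer, enter inner, leave inner, between, leave outer. With a garbage collector, the "leave" entries would show up at some later, unspecified time, which is why GC languages need a separate construct for releasing non-memory resources.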
So there are pros and cons. What I like about GC is that you don’t have to care about dynamic memory. In C++, I have to think about whether the object is referenced only here (unique_ptr) or may be accessed by several parts of the code (shared_ptr); then maybe I have naked pointers around that I should take care of, and maybe I have to convert one kind of smart pointer into another. As you can see, it is not as straightforward as allocating the object and letting the GC do the work.
Lazy Values
This one is a bit unusual for the C++ programmer, but it definitely makes sense. Consider a variable with an expensive initialization:
class Foo
{
    val bar = f()
}
In this code, the call to f() happens each time an instance of Foo is created. Now, suppose that according to the execution context, the bar variable is never used. That’s a pity; the code is unnecessarily performing computationally heavy tasks.
The lazy attribute can be used like this:
class Foo
{
    lazy val bar = f()
}
This means that the function f() will be called at the first reference to the variable bar, and only then. Should we want to rewrite this in C++, it would be something like:
#include <optional>

// T and f() stand for the value type and its expensive initializer.
class Foo {
public:
    T getBar() const {
        if( bar == std::nullopt ) {
            bar = f();   // computed on first access only
        }
        return *bar;
    }
private:
    mutable std::optional<T> bar = std::nullopt;
};
Ugly and not very readable; the mutable specifier is really the flashing warning sign that something bad is going on.
Lazy evaluation is also useful for creating infinite data structures or processing a subset of a large amount of data without the need to compute or retrieve all the data of the superset.
Of course, more is needed to make this work properly in a multithreaded environment, with shared resources and an initialization order defined by access. The only “undefined behaviour” is with recursive initialization (i.e., to initialize a, you need b, but to initialize b, you need a).
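For the multithreaded case, one possible C++ approximation is sketched below. Lazy is a hypothetical helper written for this post, not a standard library class, and it is not meant to mirror how Scala implements lazy val; it simply guards the first computation with a mutex.

```cpp
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical helper: computes the value on first access, at most once,
// even when get() is called from several threads.
template <typename T>
class Lazy {
public:
    explicit Lazy(std::function<T()> init) : init_(std::move(init)) {}

    const T& get() const {
        std::lock_guard<std::mutex> lock(mutex_);   // serializes callers
        if (!value_) {
            value_.emplace(init_());                // first access only
        }
        return *value_;
    }

private:
    std::function<T()> init_;
    mutable std::mutex mutex_;
    mutable std::optional<T> value_;
};
```

A member declared as Lazy<int> bar{[] { return f(); }} is then read with bar.get(). Every access pays for the lock, and a recursive initialization (a needs b, b needs a) would try to lock the same mutex twice, which is undefined behaviour for std::mutex; so the recursive case stays a problem here too.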
Object
The C++ language has no native notion of a singleton, so one is typically implemented as:
class TheTypeIWantJustOneInstance {
public:
    static TheTypeIWantJustOneInstance& get() {
        static TheTypeIWantJustOneInstance instance;
        return instance;
    }
    ...
};
Before C++11, this was not thread safe: if the method get() was concurrently called by two threads, you could get instance initialized twice (at the same address… not good). Since C++11, the initialization of a function-local static is guaranteed to happen exactly once, even with concurrent callers. But even with the thread-safety problem out of the way, the reader still has to decode a pattern of code to identify this as a singleton.
Scala offers the singleton construct natively. It is called “object”, and it looks like this:
object InstanceOfTheTypeIWantJustOneInstance {
    ...
}
The object construct offers a different perspective on class data. In C++, you can define a member variable or a member function to be static so that it is shared among all the instances of a class. In Scala, there is no such concept, but you can use the companion object idiom.
A companion object is an object that has the same name as an existing class and is defined in the same source file. A class and its companion can access each other’s private members, but the companion’s members are not automatically in scope inside the class: the class still needs to import the symbols (or qualify them with the object name) to use them. From the user’s point of view, you can use the Class.member notation to access a member of the companion object. This gives quite a precise feeling of accessing something that is related to the class and not to the instance.
This example is from my solutions to the Advent of Code:
object Range {
    final val Universe = Range( 1, 4000 )
}
case class Range( start: Int, count: Int ) {
    def end = start+count
    def lastValue = end-1
    //...
    def complement : List[Range] =
        import Range.Universe
        assert( start >= Universe.start )
        assert( end < Universe.end )
        val firstStart = Universe.start
        val firstCount = start-Universe.start
        val secondStart = end
        val secondCount = Universe.end-end
        List( Range( firstStart, firstCount), Range(secondStart, secondCount ))
            .filter( _.isNonEmpty )
}
In this example, the class Range defines a numerical range (first value, count). The companion object contains a constant (Universe). The complement operation needs to access the Universe to compute the complement of a range. As you can see, to use the Universe symbol, the Range class needs to import it.
Another interesting application is to use the companion object to provide additional constructors for the class. Using the apply method (that works like C++ operator()), you can create a factory:
import scala.reflect.ClassTag

object SimpleGrid
{
    // Builds a width x height grid with every cell set to emptyValue.
    def apply[A: ClassTag]( width: Int, height: Int, emptyValue: A ) : SimpleGrid[A] =
        val theGrid: Array[Array[A]] = Array.ofDim[A](height, width)
        theGrid.indices.foreach(
            y => theGrid(y).indices.foreach(
                x => theGrid(y)(x) = emptyValue
            )
        )
        new SimpleGrid(theGrid)

    // Builds a grid from text lines, converting each character into a cell.
    def apply[A: ClassTag]( data: List[String], convert: Char => A ) : SimpleGrid[A] =
        val theGrid: Array[Array[A]] = Array.ofDim[A](data.length, data.head.length)
        data.indices.foreach(
            y => data(y).indices.foreach(
                x => theGrid(y)(x) = convert(data(y)(x))
            )
        )
        new SimpleGrid(theGrid)
}
Here, the companion object for the SimpleGrid class provides two alternate constructors. The first accepts grid width and height, and the default content for a cell. The second accepts a list of strings (one string per row) and a function that converts each character into the content of the corresponding cell.
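For comparison, the closest C++ idiom I can think of is a pair of static factory functions. The sketch below is only an approximation: the names filled, fromLines and at are made up for the example, and std::vector replaces the Scala arrays.

```cpp
#include <string>
#include <utility>
#include <vector>

template <typename A>
class SimpleGrid {
public:
    // Counterpart of the first apply: a width x height grid of emptyValue.
    static SimpleGrid filled(int width, int height, const A& emptyValue) {
        return SimpleGrid(
            std::vector<std::vector<A>>(height, std::vector<A>(width, emptyValue)));
    }

    // Counterpart of the second apply: one row per string, one cell per char.
    template <typename Convert>
    static SimpleGrid fromLines(const std::vector<std::string>& data, Convert convert) {
        std::vector<std::vector<A>> cells;
        for (const auto& line : data) {
            std::vector<A> row;
            for (char c : line) {
                row.push_back(convert(c));
            }
            cells.push_back(std::move(row));
        }
        return SimpleGrid(std::move(cells));
    }

    A at(int x, int y) const { return cells_[y][x]; }

private:
    explicit SimpleGrid(std::vector<std::vector<A>> cells) : cells_(std::move(cells)) {}
    std::vector<std::vector<A>> cells_;
};
```

Usage would be SimpleGrid<int>::filled(3, 2, 0) or SimpleGrid<int>::fromLines(lines, convert). Unlike apply, the factories need distinct names to be told apart at a glance, and they live inside the class rather than in a separate construct.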
I find this approach interesting because it provides a native singleton concept and, at the same time, simplifies the class construct, removing the burden of class-level (static) methods and fields.
Conclusions
In this post, we have explored several key concepts and constructs that distinguish C++ and Scala. Some are just syntactic sugar, like lazy vals and objects. You can argue that you can define your own CRTP classes to implement them in a C++ library, but having them in the language sets the standard way of using these constructs; it defines a shared vocabulary, if you will.
Other concepts are more drastically different, memory management (alongside the principle that everything structured is accessed by reference) being the most evident. I am not a big fan of GC, having delved more than once into optimizing memory usage to keep garbage collection from spoiling the game (literally, a game). But aside from relieving the programmer of low-level memory management chores, garbage collection allows for better handling of objects.
In the next installment, we’ll go into the more advanced functional direction.