I have recently attended the Advanced Scala Training Course by TypeSafe during the Scala Days Conference in Amsterdam. During the course we discussed a lot on how to write cleaner and more performant Scala code: one of the parameters that can greatly influence your performance is the type of collections used. Which type of collection shall we use? In this article we try to answer this question by comparing the runtime performance of three immutable Scala collections: Seq, List, Vector.
Our experiment consists in creating increasingly bigger collections of random integers and measure the average execution time of a specific operation. The (quick and dirty!) script used and the generated data can be found here. The script was run on a standard MacBook Pro with a 2.8 GHz Intel Core i5 processor. After each iteration, the number of allocated elements has been increased exponentially with base 2. Each operation has been performed 10 times and the average execution time has been considered for the purposes of this experiment.
My poor Mac managed to analyse up to collections with 2^27 elements before starting screaming the pain of hell — and that is when I decided to stop!
Although, this test cannot be considered valid for any statical significance due to the limited amount of retries and the fact that our collections have been limited to be of type Int, I believe that the results of our experiment are interesting enough to provide some guidance of what type of immutable Scala collection to use according to the feature of our system.
The analysed operation is the apply operation used to access a random object. By looking at thegraphs, we can see that List didn’t perform well. Lists don’t have randomised element access: every time we need to go through the list until we reach the element at the index we are looking for. Although both Seq and Vector behaved quite well, our winner for this round is Seq. None that the default implementation for Seq for Scala 2.11 is based on LinkedLists, while Vectors are implemented using a bit-map with a factor branching of 32 (a weird structure that can be seen as a tree with up to 32 children nodes).
For the append operation, the clear winner is Vector: because it has a tree structure that make really efficient to append elements. On the other side, List and Seq have a linked structure that makes this operation quite expensive.
List is unbeatable when prepending an element to the collection: all it has to do is add a new pointer and connect the new element with the head of existing list…easy! A Vector has quite a good performance, very similar to the append study case thanks to its tree-ish structure. One the other side, Seq has a disastrous results due to the fact that all the indexes need to be updated when a new element is added at the beginning of the collection.
Who’s the winner?
The results of our test suggests that unless our system requires an intensive use of specific operations like append or prepend, we should avoid list and sequences in favour of vectors as they have an overall better performance. Note that this is particularly true when performing operations on big collections: when dealing with small ones (i.e. less than 10 elements), there no significant performance difference between one collection type and the other. This is consistent with the Collection Performance Characteristics described in the Scala Documentation.
Choosing the correct type of Scala collection can have a big impact on the performance of our code. This article has analysed the results of an experiment where the performance of Seq, List, Vector have been compared when accessing, appending and prepending an element. Our experiments are consistent with the Scala Documentation and suggest that Vectors are the collection with the overall better performance.