twitter4s: A Scala client for the Twitter API

A few months ago, I started looking into the Twitter API and I have developed twitter4s, an asynchronous non-blocking Twitter client in Scala.

In this article, we introduce twitter4s with examples of how to download tweets from a user timeline and how to perform some simple data analysis.

The code shown in this tutorial is available on GitHub.

Getting Started

The Twitter API can only be accessed through a registered app, so first of all we need to register ours. Log in with your Twitter account and have a look at the Twitter API terms and conditions.
If you are happy with them, register your app at http://apps.twitter.com.

In order to do so, you will need to provide an app name and a brief description of what it does. After the registration, a consumer key and consumer secret will be provided: save them, as we will need them when setting up twitter4s.
Also, generate an access key and access secret and make sure you have the correct permissions: for this tutorial “Read Only” is enough.

Finally, please note that the Twitter API has rate limits: have a look at Twitter’s developer website for more information.
The rate limits chart summarizes the rate limit for each endpoint.

Setup

If it is not already there, add Maven Central as a resolver in your SBT configuration:

resolvers += "Maven central" at "http://repo1.maven.org/maven2/"

Also, you need to include the library as a dependency:

libraryDependencies ++= Seq(
  "com.danielasfregola" %% "twitter4s" % "0.2.1"
)

Usage

Add your consumer and access tokens to your configuration file and initialize your Twitter client:

import com.danielasfregola.twitter4s.TwitterClient

val client = new TwitterClient()

Alternatively, you can also specify your tokens directly when creating the client:

import com.danielasfregola.twitter4s.TwitterClient
import com.danielasfregola.twitter4s.entities.{AccessToken, ConsumerToken}

val consumerToken = ConsumerToken(key = "my-consumer-key", secret = "my-consumer-secret")
val accessToken = AccessToken(key = "my-access-key", secret = "my-access-secret")
val client = new TwitterClient(consumerToken, accessToken)

Now that our Twitter client has been initialized, we are ready to use it! 😀
Have a look at its documentation for a complete list of the supported functionalities.

Top Hashtags in Timeline

As an example, let’s collect the tweets in a user timeline and display the top 10 hashtags used. In this tutorial, we will download and analyze tweets by Martin Odersky (the creator of Scala).

First, we need to get the tweets from the user timeline:

client.getUserTimelineForUser(screen_name = "odersky", count = 200)

The return type of the method getUserTimelineForUser (see scaladoc) is Future[Seq[Tweet]].
Note that Tweet is quite a rich case class that contains a lot of information (see its scaladoc): it has more than 22 fields!
The need for such large case classes is the reason why this library doesn’t support Scala versions older than 2.11: previous versions allow at most 22 fields in a case class.

In order to retrieve the hashtags used in a Tweet, we don’t have to parse the text of the tweet, as the Twitter API has already done all the hard work for us: we just need to access the Entities field and count how many times each hashtag is used:

  def getTopHashtags(tweets: Seq[Tweet], n: Int = 10): Seq[(String, Int)] = {
    val hashtags: Seq[Seq[HashTag]] = tweets.map { tweet =>
      tweet.entities.map(_.hashtags).getOrElse(Seq.empty)
    }
    val hashtagTexts: Seq[String] = hashtags.flatten.map(_.text.toLowerCase)
    val hashtagFrequencies: Map[String, Int] = hashtagTexts.groupBy(identity).mapValues(_.size)
    hashtagFrequencies.toSeq.sortBy { case (entity, frequency) => -frequency }.take(n)
  }
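
The counting step does not depend on twitter4s, so it can be tried in isolation. The following is a minimal sketch on plain strings; the object name HashtagCount and the alphabetical tie-break are assumptions added for illustration, not part of the library or of the code above.

```scala
object HashtagCount {

  // Counts case-insensitive occurrences and returns the n most frequent tags.
  // Ties are broken alphabetically (an assumption made to keep the output stable).
  def topN(tags: Seq[String], n: Int = 10): Seq[(String, Int)] =
    tags
      .map(_.toLowerCase)
      .groupBy(identity)
      .map { case (tag, occurrences) => tag -> occurrences.size }
      .toSeq
      .sortBy { case (tag, count) => (-count, tag) }
      .take(n)
}
```

For instance, HashtagCount.topN(Seq("Scala", "scala", "aws"), 2) should rank scala first with 2 occurrences, followed by aws with 1.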

Let’s put everything together and add some code to print the results with a nice layout:

import com.danielasfregola.twitter4s.TwitterClient
import com.danielasfregola.twitter4s.entities.{HashTag, Tweet}

import scala.concurrent.ExecutionContext.Implicits.global

object UserTopHashtags extends App {

  def getTopHashtags(tweets: Seq[Tweet], n: Int = 10): Seq[(String, Int)] = {
    val hashtags: Seq[Seq[HashTag]] = tweets.map { tweet =>
      tweet.entities.map(_.hashtags).getOrElse(Seq.empty)
    }
    val hashtagTexts: Seq[String] = hashtags.flatten.map(_.text.toLowerCase)
    val hashtagFrequencies: Map[String, Int] = hashtagTexts.groupBy(identity).mapValues(_.size)
    hashtagFrequencies.toSeq.sortBy { case (entity, frequency) => -frequency }.take(n)
  }

  val client = new TwitterClient()

  val user = "odersky"

  client.getUserTimelineForUser(screen_name = user, count = 200).map { tweets =>
    val topHashtags: Seq[((String, Int), Int)] = getTopHashtags(tweets).zipWithIndex
    val rankings = topHashtags.map { case ((entity, frequency), idx) => s"[${idx + 1}] $entity (found $frequency times)"}
    println(s"${user.toUpperCase}'S TOP HASHTAGS:")
    println(rankings.mkString("\n"))
  }

}

At the time of writing, running the code above generates the following output:

ODERSKY'S TOP HASHTAGS:
[1] scala (found 25 times)
[2] scaladays (found 5 times)
[3] scalajs (found 4 times)
[4] progfun (found 3 times)
[5] coursera (found 2 times)
[6] scalax (found 1 times)
[7] community (found 1 times)
[8] aws (found 1 times)
[9] iexpectmoreofapple (found 1 times)
[10] scalamatsuri (found 1 times)

Summary

In this article we have introduced a new asynchronous non-blocking Scala client for Twitter, called twitter4s.
We have described how to register our app and set up the Twitter client, and we have provided sample code to download tweets from a user timeline and analyze their hashtags.

The code shown in this tutorial can be found here.

How to build a Scala REST CRUD application with Spray

In previous articles we have described how to build a REST API with Spray and how to (de)serialize case classes with json4s. However, in order to keep things simple, we didn’t always do things as suggested by spray.io.

In this article we will redeem ourselves and describe how to build a REST CRUD application in Spray, taking full advantage of the tools offered by Spray’s toolkit.

All the code shown in this tutorial can be found on GitHub.

Our CRUD Application

A REST CRUD application is an application that manipulates entities using 4 key operations: create, retrieve, update, delete.
In this tutorial we will describe how to create a simple REST CRUD application to manage question entities.
A question has the following fields: id, title, text. We are going to use json4s to translate it to the following case class (for more information on how to use json4s, have a look here):

case class Question(id: String, title: String, text: String)

Also, to keep things simple, we are not going to store the entities in a database but we will simply keep them in memory. In this tutorial the class QuestionService simulates a persistent storage by storing all the questions in a Vector (see its complete code here):

package com.danielasfregola.quiz.management.services

import com.danielasfregola.quiz.management.entities.{Question, QuestionUpdate}
import scala.concurrent.{ExecutionContext, Future}

class QuestionService(implicit val executionContext: ExecutionContext) {

  var questions = Vector.empty[Question]

  def createQuestion(question: Question): Future[Option[String]] = ...

  def getQuestion(id: String): Future[Option[Question]] = ...

  def updateQuestion(id: String, update: QuestionUpdate): Future[Option[Question]] = ...
    
  def deleteQuestion(id: String): Future[Unit] = ...

}
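
Since the bodies of QuestionService are elided here, the following is only a hypothetical sketch of how an in-memory store with the same result types could behave for the create, retrieve and delete operations. The class InMemoryQuestions, its method names and the demo object are assumptions made for illustration; this is not the author’s implementation.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

case class Question(id: String, title: String, text: String)

// Hypothetical in-memory store: NOT the QuestionService from the tutorial.
class InMemoryQuestions(implicit ec: ExecutionContext) {

  private var questions = Vector.empty[Question]

  // Returns the id of the new question, or None if the id is already taken.
  def create(question: Question): Future[Option[String]] = Future {
    synchronized {
      if (questions.exists(_.id == question.id)) None
      else {
        questions = questions :+ question
        Some(question.id)
      }
    }
  }

  def get(id: String): Future[Option[Question]] = Future {
    synchronized(questions.find(_.id == id))
  }

  // Deleting a missing id is a no-op, so the operation is idempotent.
  def delete(id: String): Future[Unit] = Future {
    synchronized { questions = questions.filterNot(_.id == id) }
  }
}

object InMemoryQuestionsDemo {
  import scala.concurrent.ExecutionContext.Implicits.global

  // Creates the same question twice: the second attempt should be rejected.
  def createTwice(): (Option[String], Option[String]) = {
    val store = new InMemoryQuestions()
    val question = Question("test", "MyTitle", "The text of my question")
    val first = Await.result(store.create(question), 5.seconds)
    val second = Await.result(store.create(question), 5.seconds)
    (first, second)
  }
}
```

Returning None on a duplicate id is what lets a route distinguish between “created” and “already exists” without throwing exceptions.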

POST – Create a Question

Usage

The first task of our application is to define an endpoint to create a question entity.
By REST convention, an entity is created through a POST request that should reply with a 201 (Created) HTTP status code. Also, a Location Header with the URI that identifies the location of the new entity should be returned.
Note that a POST request is non-idempotent: if the entity already exists or cannot be created, we should return an HTTP error status code.

For our questions application, this can be translated into the following curl command:

curl -v -H "Content-Type: application/json" \
   -X POST http://localhost:5000/questions \
   -d '{"id": "test", "title": "MyTitle", "text":"The text of my question"}'

The first time we make the request, we should get a reply similar to the following:

*   Trying ::1...
* Connected to localhost (::1) port 5000 (#0)
> POST /questions HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.43.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 68
> 
* upload completely sent off: 68 out of 68 bytes
< HTTP/1.1 201 Created
< Server: Quiz Management Service REST API
< Date: Sat, 21 Nov 2015 11:37:11 GMT
< Location: http://localhost:5000/questions/test
< Content-Length: 0
<
* Connection #0 to host localhost left intact

If we repeat the request, we will get an HTTP response with a 409 (Conflict) status code as the entity already exists:

*   Trying ::1...
* Connected to localhost (::1) port 5000 (#0)
> POST /questions HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.43.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 68
> 
* upload completely sent off: 68 out of 68 bytes
< HTTP/1.1 409 Conflict
< Server: Quiz Management Service REST API
< Date: Sat, 21 Nov 2015 11:53:34 GMT
< Content-Length: 0
< 

Implementation

Spray has several methods to complete a generic result and convert it into a Route (see the Spray Documentation for more information). However, there isn’t a standard function to transform a result into a Location Header…so we are going to write one! 😀
Note that our implementation is tailored to the behaviour of our system: when QuestionService creates a question, it returns a Future[Option[T]] and, if the returned option is not defined, we want to return a different HTTP status code.

package com.danielasfregola.quiz.management.routing

import com.danielasfregola.quiz.management.serializers.JsonSupport
import spray.http.HttpHeaders
import spray.routing._
import scala.concurrent.{ExecutionContext, Future}

trait MyHttpService extends HttpService with JsonSupport {

  implicit val executionContext: ExecutionContext

  def completeWithLocationHeader[T](resourceId: Future[Option[T]], ifDefinedStatus: Int, ifEmptyStatus: Int): Route =
    onSuccess(resourceId) { maybeT =>
      maybeT match {
        case Some(t) => completeWithLocationHeader(ifDefinedStatus, t)
        case None => complete(ifEmptyStatus, None)
      }
    }

  def completeWithLocationHeader[T](status: Int, resourceId: T): Route =
    requestInstance { request =>
      val location = request.uri.copy(path = request.uri.path / resourceId.toString)
      respondWithHeader(HttpHeaders.Location(location)) {
        complete(status, None)
      }
    }
}
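
The decision taken by completeWithLocationHeader can also be looked at without Spray. The sketch below models it with the standard library only; the Response case class, the object name and the use of java.net.URI are assumptions made for illustration and do not reflect Spray’s API.

```scala
import java.net.URI

object CreatedResponse {

  // Simplified stand-in for an HTTP response: a status code and an optional Location.
  final case class Response(status: Int, location: Option[URI])

  // Appends the resource id to the request URI, e.g. /questions -> /questions/test.
  // Query strings and fragments are deliberately ignored in this sketch.
  def locationFor(requestUri: URI, resourceId: String): URI =
    URI.create(requestUri.toString.stripSuffix("/") + "/" + resourceId)

  // A defined id yields the "defined" status plus a Location; None yields the other status.
  def respond[T](requestUri: URI,
                 maybeId: Option[T],
                 ifDefinedStatus: Int = 201,
                 ifEmptyStatus: Int = 409): Response =
    maybeId match {
      case Some(id) => Response(ifDefinedStatus, Some(locationFor(requestUri, id.toString)))
      case None     => Response(ifEmptyStatus, None)
    }
}
```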

We can now put everything together and define the endpoint to create a question entity:

package com.danielasfregola.quiz.management.resources

import com.danielasfregola.quiz.management.entities.{QuestionUpdate, Question}
import com.danielasfregola.quiz.management.routing.MyHttpService
import com.danielasfregola.quiz.management.services.QuestionService
import spray.routing._

trait QuestionResource extends MyHttpService {

  val questionService: QuestionService

  def questionRoutes: Route = pathPrefix("questions") {
    pathEnd {
      post {
        entity(as[Question]) { question =>
          completeWithLocationHeader(
            resourceId = questionService.createQuestion(question),
            ifDefinedStatus = 201, ifEmptyStatus = 409)
        }
      }
    } ~
    ...
  }

}

GET – Retrieve a Question

Usage

Now that we have created a question, we can retrieve it by performing a GET request to the URI that identifies the entity (i.e.: the one returned in the Location Header). The request should respond with either a 200 (OK) HTTP status code with a body containing the question entity or a 404 (NotFound) HTTP status code with empty body.

For example, we can get an existing question with the following curl command…

curl -v http://localhost:5000/questions/test

…and it should return something similar to the following:

*   Trying ::1...
* Connected to localhost (::1) port 5000 (#0)
> GET /questions/test HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: Quiz Management Service REST API
< Date: Sat, 21 Nov 2015 12:23:34 GMT
< Content-Type: application/json; charset=UTF-8
< Content-Length: 64
< 
* Connection #0 to host localhost left intact
{"id":"test","title":"MyTitle","text":"The text of my question"}

Moreover, if we request an entity that doesn’t exist…

curl -v http://localhost:5000/questions/non-existing-question

…we should get a 404 error code:

*   Trying ::1...
* Connected to localhost (::1) port 5000 (#0)
> GET /questions/non-existing-question HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 404 Not Found
< Server: Quiz Management Service REST API
< Date: Sat, 21 Nov 2015 12:25:43 GMT
< Content-Length: 0
< 
* Connection #0 to host localhost left intact

Implementation

This behaviour can easily be achieved with Spray as it will automatically complete optional values:
– if the option is defined, complete will transform it into an HTTP response with a 200 (OK) status code and a body containing the JSON representation of the entity.
– if the option is empty, it will just return an HTTP response with a 404 (NotFound) status code.

package com.danielasfregola.quiz.management.resources

import com.danielasfregola.quiz.management.entities.{QuestionUpdate, Question}
import com.danielasfregola.quiz.management.routing.MyHttpService
import com.danielasfregola.quiz.management.services.QuestionService
import spray.routing._

trait QuestionResource extends MyHttpService {

  val questionService: QuestionService

  def questionRoutes: Route = pathPrefix("questions") {
    ... ~
    path(Segment) { id =>
      get {
        complete(questionService.getQuestion(id))
      } ~
      ...
    }
  }
}

PUT – Update a Question

Usage

When updating an entity, we should use a PUT request. Also, we should send only the fields that we want to update, not the whole object. Not only will this make our API easier to use, but it will also reduce potential concurrency issues.
If the update goes through, we should get an HTTP response with a 200 (OK) status code and the updated entity in the body. On the other hand, if the update was not possible, for example because the entity no longer exists, we should get an HTTP response with status 404 (NotFound) and an empty body.
Note that a PUT request is idempotent: performing the same update multiple times should always return the same result.

In our application we can update the question entity with the following curl command…

curl -v -H "Content-Type: application/json" \
   -X PUT http://localhost:5000/questions/test \
   -d '{"text":"Another text"}'

…and get the following reply:

*   Trying ::1...
* Connected to localhost (::1) port 5000 (#0)
> PUT /questions/test HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.43.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 23
> 
* upload completely sent off: 23 out of 23 bytes
< HTTP/1.1 200 OK
< Server: Quiz Management Service REST API
< Date: Sat, 21 Nov 2015 12:44:03 GMT
< Content-Type: application/json; charset=UTF-8
< Content-Length: 53
< 
* Connection #0 to host localhost left intact
{"id":"test","title":"MyTitle","text":"Another text"}

Similarly, if we try to update a resource that doesn’t exist…

curl -v -H "Content-Type: application/json" \
   -X PUT http://localhost:5000/questions/non-existing-question \
   -d '{"text":"Another text"}'

…we should get a 404 (NotFound) error code:

*   Trying ::1...
* Connected to localhost (::1) port 5000 (#0)
> PUT /questions/non-existing-question HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.43.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 23
> 
* upload completely sent off: 23 out of 23 bytes
< HTTP/1.1 404 Not Found
< Server: Quiz Management Service REST API
< Date: Sat, 21 Nov 2015 12:46:15 GMT
< Content-Length: 0
< 
* Connection #0 to host localhost left intact

Implementation

As explained in the previous section, we want the client of our API to send just the fields to update, not the whole entity. In order to achieve this, we will deserialize the body of our PUT request to the following case class:

case class QuestionUpdate(title: Option[String], text: Option[String])

Note that we decided not to allow our clients to update the field id, as it is used to locate the entity.
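
To make the partial update concrete, here is a minimal sketch of how the optional fields could be merged into an existing question. The object QuestionMerge and its merge function are hypothetical names used for illustration; the tutorial’s QuestionService may do this differently.

```scala
case class Question(id: String, title: String, text: String)
case class QuestionUpdate(title: Option[String], text: Option[String])

object QuestionMerge {

  // Fields left as None keep their current value; the id is never changed.
  def merge(question: Question, update: QuestionUpdate): Question =
    question.copy(
      title = update.title.getOrElse(question.title),
      text = update.text.getOrElse(question.text)
    )
}
```

Applying the same update twice gives the same question as applying it once, which matches the idempotency expected from a PUT request.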

Similarly to what we did for the GET request, Spray does all the work for us:

package com.danielasfregola.quiz.management.resources

import com.danielasfregola.quiz.management.entities.{QuestionUpdate, Question}
import com.danielasfregola.quiz.management.routing.MyHttpService
import com.danielasfregola.quiz.management.services.QuestionService
import spray.routing._

trait QuestionResource extends MyHttpService {

  val questionService: QuestionService

  def questionRoutes: Route = pathPrefix("questions") {
    ... ~
    path(Segment) { id =>
      ... ~
      put {
        entity(as[QuestionUpdate]) { update =>
          complete(questionService.updateQuestion(id, update))
        }
      } ~
      ...
    }
  }
}

DELETE – Delete a Question

Usage

Finally, we want to have an endpoint to delete a question entity. This can be achieved by sending a DELETE request to the URI that identifies the entity; the request should reply with a 204 (NoContent) status code once the operation has been completed. Note that DELETE is idempotent, so deleting a resource that has already been deleted should still return an HTTP response with a 204 (NoContent) status code and an empty body.

For example, we can delete the question test with the following…

curl -v -X DELETE http://localhost:5000/questions/test

…and get the following result back:

*   Trying ::1...
* Connected to localhost (::1) port 5000 (#0)
> DELETE /questions/test HTTP/1.1
> Host: localhost:5000
> User-Agent: curl/7.43.0
> Accept: */*
> 
< HTTP/1.1 204 No Content
< Server: Quiz Management Service REST API
< Date: Sat, 21 Nov 2015 12:58:30 GMT
< Content-Type: application/json; charset=UTF-8
< Content-Length: 2
< 
* Excess found in a non pipelined read: excess = 2 url = /questions/non-existing-question (zero-length body)
* Connection #0 to host localhost left intact

Implementation

Once again, Spray makes our life really easy: all we have to do to define an endpoint that deletes a question is reuse functions already defined in Spray’s toolkit:

package com.danielasfregola.quiz.management.resources

import com.danielasfregola.quiz.management.entities.{QuestionUpdate, Question}
import com.danielasfregola.quiz.management.routing.MyHttpService
import com.danielasfregola.quiz.management.services.QuestionService
import spray.routing._

trait QuestionResource extends MyHttpService {

  val questionService: QuestionService

  def questionRoutes: Route = pathPrefix("questions") {
    ... ~
    path(Segment) { id =>
      ... ~
      delete {
        complete(204, questionService.deleteQuestion(id))
      }
    }
  }
}

Summary

In this article we have described what a REST CRUD application is. Also, we have provided a simple tutorial on how to create a simple CRUD application using Spray. The code of the application analysed can be found on GitHub.

Static Duck Typing in Scala

Structural types are a neat feature of Scala that not many developers know or use. This article provides a brief introduction to what they are and how to use them.

What are Structural Types?

Types can be really powerful in Scala: they can define classes, abstract classes, objects, traits, functions…and a lot more!
What if we don’t really care what our instance is as long as it has a particular structure?

This problem is also called Duck Typing. The duck test by Mr Riley asserts that

When I see a bird that walks like a duck and swims like a duck and quacks like a duck, I call that bird a duck.

A Structural Type allows you to specify a desired set of properties and to define a type for them: the compiler will guarantee that the given type is compatible with the defined structure, and reflection is used at runtime on the instance to call the method. It is important to mention that reflection is quite expensive, so you should be careful when using structural types if performance is a big concern.

How to use them

Let’s simplify the duck test and see how the problem can be easily solved using structural types.

For the sake of this tutorial, we consider any entity that can quack to be a duck.

This requirement can be translated into the following structural type:

scala> type Duck = { def quack(): String }
defined type alias Duck

Every time we use the type Duck, the compiler will guarantee that the instance associated with it is compatible with the given structure:

scala> class Bird { def quack() = "quack"}
defined class Bird

scala> val d1: Duck = new Bird
d1: Duck = Bird@541a9917
// it works!

scala> class Bro { def quack() = "YOOOO QUAAAACKKK" }
defined class Bro

scala> val d2: Duck = new Bro
d2: Duck = Bro@7db1592b
// it works!

scala> class Person
defined class Person

scala> val d3: Duck = new Person
<console>:9: error: type mismatch;
 found   : Person
 required: Duck
    (which expands to)  AnyRef{def quack(): String}
       val d3: Duck = new Person
// it doesn't work because Person doesn't quack
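
Outside the REPL, a structural type is most useful as a parameter type. The sketch below assumes Scala 2; the object DuckDemo and the function makeItQuack are names introduced here for illustration. The import of scala.language.reflectiveCalls acknowledges the reflective call and silences the corresponding feature warning.

```scala
import scala.language.reflectiveCalls

object DuckDemo {

  type Duck = { def quack(): String }

  class Bird { def quack(): String = "quack" }
  class Bro { def quack(): String = "YOOOO QUAAAACKKK" }

  // Accepts any value whose type has a matching quack() method,
  // regardless of its class hierarchy. The call is resolved via reflection.
  def makeItQuack(duck: Duck): String = duck.quack()
}
```

Both DuckDemo.makeItQuack(new DuckDemo.Bird) and DuckDemo.makeItQuack(new DuckDemo.Bro) compile, while passing an instance of a class without a quack() method is rejected at compile time.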

Summary

In this article we have described how Scala implements Static Duck Typing and how it can be used to validate instances on their structure rather than type.

Performance Comparison between immutable Seq, List, Vector

I have recently attended the Advanced Scala Training Course by Typesafe during the Scala Days Conference in Amsterdam. During the course we discussed at length how to write cleaner and more performant Scala code: one of the factors that can greatly influence performance is the type of collection used. Which type of collection should we use? In this article we try to answer this question by comparing the runtime performance of three immutable Scala collections: Seq, List, Vector.

The Experiment

Our experiment consists of creating increasingly bigger collections of random integers and measuring the average execution time of a specific operation. The (quick and dirty!) script used and the generated data can be found here. The script was run on a standard MacBook Pro with a 2.8 GHz Intel Core i5 processor. After each iteration, the number of allocated elements was doubled, so that the collection size grows exponentially with base 2. Each operation was performed 10 times and the average execution time was used for the purposes of this experiment.

My poor Mac managed to analyse collections of up to 2^27 elements before it started screaming in pain, and that is when I decided to stop!

Although this test cannot claim any statistical significance, due to the limited number of repetitions and the fact that our collections only contain elements of type Int, I believe that the results of our experiment are interesting enough to provide some guidance on which type of immutable Scala collection to use according to the needs of our system.
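
The measurement procedure described above can be sketched as follows. This is not the script used for the experiment: the object CollectionTiming and its function names are assumptions, and since there is no JVM warm-up the numbers it produces are only indicative.

```scala
import scala.util.Random

object CollectionTiming {

  // Runs `op` `runs` times and returns the average elapsed time in nanoseconds.
  def averageNanos[A](runs: Int)(op: => A): Double = {
    val elapsed = (1 to runs).map { _ =>
      val start = System.nanoTime()
      op
      System.nanoTime() - start
    }
    elapsed.sum.toDouble / runs
  }

  // Times random access on a List and on a Vector for sizes 2^0 .. 2^maxExponent.
  // Each result is (size, average nanos for List, average nanos for Vector).
  def applyTimes(maxExponent: Int, runs: Int = 10): Seq[(Int, Double, Double)] =
    (0 to maxExponent).map { exponent =>
      val size = 1 << exponent
      val list = List.fill(size)(Random.nextInt())
      val vector = list.toVector
      val index = Random.nextInt(size)
      (size, averageNanos(runs)(list(index)), averageNanos(runs)(vector(index)))
    }
}
```

Calling CollectionTiming.applyTimes(20) would cover sizes from 1 to 2^20; larger exponents need a correspondingly larger heap.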

Apply

The analysed operation is the apply operation used to access a random element. By looking at the graphs, we can see that List didn’t perform well. Lists don’t support random access: every time, we need to walk through the list until we reach the element at the index we are looking for. Although both Seq and Vector behaved quite well, our winner for this round is Seq. Note that the default implementation of Seq in Scala 2.11 is based on linked lists, while Vectors are implemented as a bit-mapped trie with a branching factor of 32 (a weird structure that can be seen as a tree with up to 32 children per node).

[Figures: Seq-apply, List-apply, Vector-apply, All-apply]

Append

For the append operation, the clear winner is Vector: its tree structure makes appending elements really efficient. On the other hand, List and Seq have a linked structure that makes this operation quite expensive.

[Figures: Seq-append, List-append, Vector-append, All-append]

Prepend

List is unbeatable when prepending an element to the collection: all it has to do is add a new pointer and connect the new element with the head of the existing list…easy! A Vector has quite a good performance, very similar to the append case, thanks to its tree-ish structure. On the other hand, Seq has disastrous results due to the fact that all the indexes need to be updated when a new element is added at the beginning of the collection.

[Figures: Seq-prepend, List-prepend, Vector-prepend, All-prepend]
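
For reference, these are the three operations compared in this experiment, written with the standard collection operators. The object and value names are only illustrative.

```scala
object CollectionOps {

  val list: List[Int] = List(1, 2, 3)
  val vector: Vector[Int] = Vector(1, 2, 3)

  // apply: access the element at a given index
  val accessed: Int = vector(1)

  // append: returns a new collection with the element added at the end
  val appended: Vector[Int] = vector :+ 4

  // prepend: :: is specific to List, +: works on any Seq
  val prependedList: List[Int] = 0 :: list
  val prependedVector: Vector[Int] = 0 +: vector
}
```

All of these return new collections: the original list and vector are left untouched.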

Who’s the winner?

The results of our test suggest that, unless our system requires an intensive use of specific operations like append or prepend, we should avoid lists and sequences in favour of vectors as they have an overall better performance. Note that this is particularly true when performing operations on big collections: when dealing with small ones (i.e. less than 10 elements), there is no significant performance difference between one collection type and the other. This is consistent with the Collection Performance Characteristics described in the Scala Documentation.

[Figure: Seq, List, Vector]

Summary

Choosing the correct type of Scala collection can have a big impact on the performance of our code. This article has analysed the results of an experiment that compared the performance of Seq, List and Vector when accessing, appending and prepending an element. Our results are consistent with the Scala Documentation and suggest that Vector is the collection with the best overall performance.

Pimp My Library

Methods are an efficient way of reducing code duplication and making our code cleaner. What happens if a class that you don’t own (i.e.: any class of the Standard Scala Library) doesn’t have a particular method that could make your life a lot easier, and your code a lot more readable? This article will describe how we can efficiently pimp an existing Scala library and how to seamlessly use it in our code.

How to Extend an Existing Class

Let’s assume that in our application we often need to complete a text with the string “Yo”.
We could write a nice method for it and import it when needed, but that does not make its usage look exactly the same as the other standard methods of the String class. Instead, by “pimping” our class, we will be able to use our method as if it were actually part of the standard implementation of String.

Creating the following class will do the trick:

// in file com/daniela/sfregola/tutorial/package.scala
package com.daniela.sfregola

package object tutorial {

  implicit class ExtendedString(val text: String) extends AnyVal {
    def yofy = s"Yo $text"
  }
}

We can now use our yofy method for any String used in a class in the package com.daniela.sfregola.tutorial:

package com.daniela.sfregola.tutorial

object Main extends App {
	println("bro".yofy)
	//"Yo bro"
}

These are just a few lines of code, but they use some quite interesting and powerful tools of the Scala language.
First of all, ExtendedString is inside a package object, called tutorial: the class will be automatically imported in all the files that belong to package com.daniela.sfregola.tutorial. For more information on package objects, have a look at this article.
Also, the class is implicit: this allows the compiler to seamlessly wrap an instance of a String inside ExtendedString.
Finally, we can see that our class is a subclass of AnyVal: this is a functionality introduced in Scala 2.10, called Custom Value Class. In practical terms, it makes our code a lot faster thanks to some optimisation by the compiler.
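
A package object is not the only place where an implicit class can live. As an alternative sketch, the same extension can be placed in a regular object and imported explicitly where it is needed; the names StringSyntax and StringSyntaxDemo are assumptions made for this example.

```scala
object StringSyntax {

  // Same extension as above, but scoped to an object instead of a package object.
  implicit class ExtendedString(val text: String) extends AnyVal {
    def yofy: String = s"Yo $text"
  }
}

object StringSyntaxDemo {
  // The extension method is only available where the import is in scope.
  import StringSyntax._

  val greeting: String = "bro".yofy
}
```

This makes the origin of yofy visible at the call site, at the cost of one extra import per file.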

Custom Value Classes

If we play a bit with the :javap command in the Scala console, we can see the bytecode that the compiler generates for our code. If we do this for a Custom Value Class (i.e. a class that extends AnyVal), we notice that, instead of allocating an instance of that class, the compiler just uses the underlying java.lang.String: this little trick makes our code a lot more performant as it avoids allocating objects at runtime…magic indeed!

Ok, so why don’t we use extends AnyVal everywhere?

The compiler translates our instance into a java.lang.String, so we could run into trouble when serialising/deserialising it. This approach is usually suggested only when pimping libraries: when extending an existing class, usually we are not actually creating new classes but just adding methods by wrapping an instance of the original class.

Summary

Pimping libraries is a powerful tool to enrich existing libraries. In this article we have described how to efficiently add methods to existing classes. Also, we have briefly described the principle used by the compiler to perform runtime optimisation using Custom Value Classes.

Loading Configurations in Scala

The separation of configuration from code is a good practice that makes our system customisable as we can load different configurations according to the environment we are running it in. In this article we will describe different approaches to load configurations in Scala and how they can be combined together: loading configurations from a file, from command line parameters or from environment variables.

Configurations from a file

Let’s start with the basic case scenario: given a file, we want to read it and parse its values to use them in our code.

First, we need to define our configuration file, let’s call it application.conf.

// application.conf
my {
	secret {
		value = "super-secret"	
	}
}

We can now parse the file and use the obtained configuration in our script:

// config-tutorial.scala
import com.typesafe.config.ConfigFactory

val value = ConfigFactory.load().getString("my.secret.value")
println(s"My secret value is $value")

>> scala config-tutorial.scala 
My secret value is super-secret

By default, ConfigFactory looks for a configuration file called application.conf. If we want to use a different configuration file (e.g.: another.conf), we just need to indicate a different file name and path to load (e.g.: ConfigFactory.load("another")). The Typesafe Config library provides several methods to make sure that the parsed value is compatible with the expected type: have a look at the Typesafe Config documentation for methods to parse integers, longs, floats, etc.

Configurations from command line parameters

Another approach is to allow our users to redefine settings through command line parameters rather than changing the configuration file directly. All we have to do is change our configuration file as follows:

// application.conf
my {
	secret {
		value = "super-secret"
		value = ${?VALUE}	
	}
}

The output of our script will now change according to the command line parameters provided.

>> scala config-tutorial.scala 
My secret value is super-secret

>> scala config-tutorial.scala -Dmy.secret.value=another-secret
My secret value is another-secret

Configurations from environment variables

Redefining configurations as part of the command line parameters works in most of the cases, but it can be tedious when we have a lot of parameters to change. Also putting sensitive information, such as passwords or tokens, in clear text in a configuration file or a run script may not be safe enough. Another option to load configurations is to inject our parameters from predefined environment variables.

In order to achieve this, we can just write a simple method that looks for a specific environment variable before falling back to the configuration loaded with the previously described approach.

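import com.typesafe.config.ConfigFactory
import scala.util.Properties

val config = ConfigFactory.load()

// Looks up the environment variable first, then falls back to the configuration.
def envOrElseConfig(name: String): String = {
    Properties.envOrElse(
      name.toUpperCase.replaceAll("""\.""", "_"),
      config.getString(name)
    )
}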

Before loading our my.secret.value configuration, this simple method will first check if an environment variable called MY_SECRET_VALUE exists.
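To make the naming rule explicit, here is a small sketch that isolates the conversion from a configuration path to an environment variable name. The helper toEnvName and the value "fallback-value" are hypothetical: the latter stands in for the config.getString(name) call used in the method above, so that the snippet needs only the Scala standard library.

```scala
// env-name-sketch.scala
import scala.util.Properties

// Hypothetical helper: maps a configuration path to an environment variable name.
def toEnvName(path: String): String =
  path.toUpperCase.replaceAll("""\.""", "_")

println(toEnvName("my.secret.value")) // MY_SECRET_VALUE

// Uses the environment variable when it is set, otherwise the given default.
val secretValue = Properties.envOrElse(toEnvName("my.secret.value"), "fallback-value")
println(s"My secret value is $secretValue")
```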

We can now put it all together and create a script (gist available here) that will inject configurations in the following order:
1) From properly named environment variables
2) From command line parameters
3) From a configuration file

// application.conf
my {
	secret {
		value = "super-secret"
		value = ${?VALUE}	
	}
}

import com.typesafe.config.ConfigFactory
import scala.util.Properties

class MyConfig(fileNameOption: Option[String] = None) {
    
  val config = fileNameOption.fold(
                  ifEmpty = ConfigFactory.load() )(
                  file => ConfigFactory.load(file) )

  def envOrElseConfig(name: String): String = {
    Properties.envOrElse(
      name.toUpperCase.replaceAll("""\.""", "_"),
      config.getString(name)
    )
  }
}

The script can be used as follows:

val myConfig = new MyConfig()
val value = myConfig.envOrElseConfig("my.secret.value")
println(s"My secret value is $value")


Summary

Having a clear separation between configuration and code allows us to customise the execution of an application for the environment it runs in. In this article we have described different approaches to defining specific settings. In particular, we have presented a simple script that combines all these approaches in one: the script loads configurations first from environment variables, then from command line parameters and finally from a configuration file.

How to Compose Futures

Futures are a powerful tool that was developed by the Akka team and then adopted as part of the standard Scala library from version 2.10.
A Future is a placeholder for a value that will be available at some later point: thanks to it, we can run operations in parallel and decide what to do with the value only once it is available, making our applications more scalable and performant. A lot can be achieved with it: have a look at the official Scala documentation on Futures and Promises. Each future can be seen as an isolated parallel operation, so combining them can be challenging. In this article we will describe how Futures can be composed together.
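Before composing futures, it may help to see a single one in action. The sketch below is not part of the original examples: it creates a Future, describes what to do with its value through map, and blocks only at the very end to print the result. The names futureAnswer and futureMessage are chosen for illustration.

```scala
// future-basics-sketch.scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// The computation is scheduled on the implicit execution context right away.
val futureAnswer: Future[Int] = Future { 21 * 2 }

// Describe what to do with the value once it is available, without blocking.
val futureMessage: Future[String] = futureAnswer.map(n => s"The answer is $n")

// Blocking is done here only so that the script can print the result.
println(Await.result(futureMessage, 1.second)) // The answer is 42
```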

How to Select the Fastest Future

Let’s assume that our application can call several services that perform the same operation, and that these services have different response times depending on their traffic load. Because our application doesn’t have any information on the load of each service, or we simply don’t want to rely on it, we want to call all the services and use the first reply we get back: let’s see how this can be achieved using futures.

First of all, let’s simplify our life a bit: for the purposes of this tutorial, we will simulate the behaviour of our services with a method that will wait a period of time before returning a String wrapped in a Future:

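import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.language.postfixOps
def reply(timeout: Duration, msg: String): Future[String] = Future {
  Thread.sleep(timeout.toMillis)
  msg
}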

Future.firstCompletedOf is the function that we are looking for: it takes a sequence of futures and returns a future that completes with the result of the first one to finish:

val futureSlowReply = reply(1 second, "Hello from a slow fella")
val futureFastReply = reply(100 milliseconds, "I am a super fast fella!")

val futureReplies = Seq(futureSlowReply, futureFastReply)
val futureFastestReply = Future.firstCompletedOf(futureReplies)

Await.result(futureFastestReply, 100 milliseconds)
// res0: String = I am a super fast fella!

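Note that waiting 100 milliseconds is enough here: all the futures run in parallel and were started before the call to Await.result, so the fastest one is expected to complete within that time. The margin is tight though, so production code would normally allow a more generous timeout.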
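One detail worth keeping in mind is that "first completed" also covers failures: if a service fails before the others reply, the resulting future fails too. The sketch below illustrates this with an already failed future standing in for a broken service; the exception message and timings are made up for the example.

```scala
// first-completed-failure-sketch.scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Try

def reply(timeout: Duration, msg: String): Future[String] = Future {
  Thread.sleep(timeout.toMillis)
  msg
}

// A service that fails immediately, next to one that answers after 200 ms.
val futureFailure: Future[String] = Future.failed(new RuntimeException("service down"))
val futureLateReply = reply(200.millis, "Hello from a slow fella")

val futureFirst = Future.firstCompletedOf(Seq(futureFailure, futureLateReply))

// The failure completes first, so it wins even though a success follows later.
val firstResult = Try(Await.result(futureFirst, 1.second))
println(firstResult.isFailure) // true
```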

How to Combine Futures in Parallel

What if we have different services that process the same information differently? For example, given a customer id, we have one service to retrieve the account information, another to retrieve the payment details and another to retrieve product suggestions based on previous selections. We could do it the old Java style way and retrieve all the information sequentially…or we could retrieve all the information in parallel and be really efficient! 😀

Let’s see how this can be achieved using the zip method of the Future class:

val futureSlowReply = reply(1 second, "Hello from a slow fella")
val futureFastReply = reply(100 milliseconds, "I am a super fast fella!")

val futureAllParallelReplies = futureSlowReply.zip(futureFastReply)
Await.result(futureAllParallelReplies, 1 second)
// res1: (String, String) = (Hello from a slow fella,I am a super fast fella!)

Note that waiting on the combined future, called futureAllParallelReplies, for less than 1 second would generate a java.util.concurrent.TimeoutException: the zip function needs both futures to complete before it can return the pair of their results!
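The zip method pairs two futures. For the three-service scenario described earlier, one possible alternative (not used in the original examples) is Future.sequence, which collects any number of futures of the same type. In this sketch the messages and timings are placeholders for the account, payment and suggestion services.

```scala
// future-sequence-sketch.scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

def reply(timeout: Duration, msg: String): Future[String] = Future {
  Thread.sleep(timeout.toMillis)
  msg
}

// The three futures are started as soon as they are created.
val futureServiceReplies: Seq[Future[String]] = Seq(
  reply(300.millis, "account"),
  reply(100.millis, "payments"),
  reply(200.millis, "suggestions")
)

// Future.sequence turns a Seq[Future[String]] into a Future[Seq[String]].
val futureAllReplies: Future[Seq[String]] = Future.sequence(futureServiceReplies)

// Results keep the order of the input sequence, not the completion order.
println(Await.result(futureAllReplies, 1.second).mkString(", "))
// account, payments, suggestions
```

As with zip, the combined future fails if any of the underlying futures fails.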

How to Concatenate Futures

In order to combine futures in parallel they need to be independent from each other. What if this is not possible and we need to run them sequentially?

All we need to do is use a for-comprehension to force the futures to run sequentially:

def futureAllSequentialReplies(msg: String) = for {
  firstReply <- reply(100 milliseconds, msg)
  nextMsg = if (msg.length < 3) msg.reverse else msg.toUpperCase 
  secondReply <- reply(200 milliseconds, nextMsg)
} yield (firstReply, secondReply)

Await.result(futureAllSequentialReplies("Hi"), 400 milliseconds)
// res2: (String, String) = (Hi,iH)
Await.result(futureAllSequentialReplies("Hello"), 400 milliseconds)
// res3: (String, String) = (Hello,HELLO)

Note that waiting for 300 milliseconds is not enough: not only are the futures run sequentially, but we also spend some time computing the nextMsg String.
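To see why the for-comprehension forces sequential execution, it can help to write the same logic with flatMap and map, which is roughly what the compiler generates. This is a sketch rather than the exact desugared code: the second call to reply sits inside the callback of the first, so it cannot start until the first reply is available.

```scala
// flatmap-sketch.scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

def reply(timeout: Duration, msg: String): Future[String] = Future {
  Thread.sleep(timeout.toMillis)
  msg
}

// Same logic as the for-comprehension, written with flatMap and map.
def futureAllSequentialReplies(msg: String): Future[(String, String)] =
  reply(100.millis, msg).flatMap { firstReply =>
    val nextMsg = if (msg.length < 3) msg.reverse else msg.toUpperCase
    // The second call is only made once the first reply is available.
    reply(200.millis, nextMsg).map(secondReply => (firstReply, secondReply))
  }

println(Await.result(futureAllSequentialReplies("Hi"), 1.second)) // (Hi,iH)
```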

Summary

Future is a powerful tool to perform operations in parallel. However, combining several parallel operations can be challenging. This article has described how easily we can compose Scala Futures: how to select the fastest one, how to combine them in parallel and, when needed, how to force them to run sequentially.