Migrating a service handling 200K QPS from Jackson to Circe

Alexandre Careil
Nov 26 · 7 min read

When you work with Finagle, handling JSON often means using Jackson, right? We decided to try a different, more functional approach on one of our production applications, and guess what? It works like a charm.

Client-side metrics (CSM) is a Top Level Application owned by the Creator team at Criteo. It is in charge of recording the data we collect from banners.

Banners emit tracking pixels that carry data, and CSM acts as the interface with the Criteo internal world: it emits metrics to Graphite and Hive, and notifies other services that need to be updated live. It is not a complex application, but it receives about 200k QPS worldwide and does one thing intensively: JSON deserialization. The application itself is written in Scala 2.11, and uses Finagle 17.12 (as of November 13, 2019) with Jackson.

This article describes a way to use Circe, a functional JSON library, instead of Jackson, to decode all JSON payloads sent from banners and received by CSM.

The Author & Team

I work as a software engineer in the Creator Atom team.

At Creator, we build the banners served by Criteo. In detail, we use machine learning to produce the right banner, with the right layout and colors, for the right person.

Creator/Atom is the subteam looking after the primitive chunks of banners. We embed JavaScript in banners to collect all kinds of insights. For example, one of those signals is called “Viewability”, and tells us whether the banner was in the visible part of the web page or not. We also collect some key interaction events (clicks, mouse interactions…) that help us check that our banners are not broken, and analyze their UX.

General information

The example below is the endpoint in charge of receiving the “key interaction events” (TouchStart, TouchEnd, ScrollStart, PrivacyClick, etc.) from banners. The endpoint is called PointerEvent, and receives HTTP POST calls whose body contains a JSON string (sent with the sendBeacon method). The Scala case class used to deserialize the JSON payload is called RawPointerEvent.

We will also talk about protobuf, a serialization format introduced by Google (https://developers.google.com/protocol-buffers). Protobuf comes with a language for describing data structures, from which encoders and decoders are generated. This lets us encode and store data structures efficiently, and has the huge perk of generating APIs in many languages (C# and Scala included). It also has enums which, as usual, map names to a restricted set of integers.


Jackson: pros and cons

Jackson can be used on its own, but it happens to be well integrated with Finagle. In fact, you can make use of annotations on case classes to deserialize structures properly, and have explicit error messages.

A basic deserializer with Jackson/Finagle
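As a rough illustration of annotation-driven deserialization, here is a minimal sketch that uses plain jackson-module-scala rather than Finatra's own mapper (an assumption made to keep the example small). The `RawEvent` class and its fields are hypothetical and only mirror the kind of payload described in this article.

```scala
import com.fasterxml.jackson.annotation.JsonProperty
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Hypothetical payload shape: field names and types are illustrative only.
case class RawEvent(
  @JsonProperty("ts") timestamp: Long, // JSON key "ts" feeds the member "timestamp"
  x: Int,
  y: Int
)

object JacksonBasics {
  // A plain Jackson mapper with Scala case class support registered.
  private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

  // Throws a Jackson exception when the payload cannot be mapped.
  def parse(json: String): RawEvent = mapper.readValue(json, classOf[RawEvent])
}
```

Calling `JacksonBasics.parse` on `{"ts": 123, "x": 1, "y": 2}` should give back `RawEvent(123, 1, 2)`.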

Things were pretty cool, until we wanted to deserialize custom values like protobuf enums. Until then we had simply deserialized them as integers, but it would be more consistent to have a single layer of deserialization that contains all the validation.

One attempt consisted in using a wrapper that would contain both the integer and its real enum value. A protobuf enum is a class with a forNumber function that converts a number to its matching enum member. If there is no match, null is returned.

A wrapper to deserialize protobuf enums with Jackson

This solution works for a single field, but it breaks as soon as you use several wrappers, that is, as soon as more than one field needs to be deserialized this way. The Finatra Jackson documentation states explicitly that the @JsonCreator annotation is not supported: https://twitter.github.io/finatra/user-guide/json/index.html

Another strange thing we noticed is that, in the example, x and y were sometimes floats (these values come from browsers, and some of them apparently consider that x and y positions can be floating-point numbers...). The case class that was supposed to deserialize them contained Integer fields, and this did not break the Jackson deserialization at all! These floating-point numbers were silently cast into integers!
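This silent cast is consistent with Jackson's `ACCEPT_FLOAT_AS_INT` deserialization feature, which is enabled by default and truncates floating-point input when an integer is expected. The sketch below uses a plain `ObjectMapper` (not the Finatra mapper we had in production, which is an assumption of this example) to show the default behaviour and the stricter alternative.

```scala
import com.fasterxml.jackson.databind.{DeserializationFeature, ObjectMapper}
import scala.util.Try

object SilentTruncation {
  // Default mapper: ACCEPT_FLOAT_AS_INT is on, so 1.7 is accepted as an integer.
  private val lenient = new ObjectMapper()
  val truncated: Int = lenient.readValue("1.7", classOf[java.lang.Integer]).intValue

  // With the feature disabled, the same input is rejected with an exception.
  private val strict =
    new ObjectMapper().disable(DeserializationFeature.ACCEPT_FLOAT_AS_INT)
  val rejected: Boolean =
    Try(strict.readValue("1.7", classOf[java.lang.Integer])).isFailure
}
```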

So, because of this “wrapper bug”, we could not use this trick to deserialize other enum fields, and for this reason, we wanted to give Circe a shot.


Circe, an interesting surprise

How does it work?

Circe is a functional library (part of the Typelevel project, https://github.com/circe/circe) that provides you with tools to deserialize JSON. It uses macros to generate deserializers at compile time (which requires a Scala compiler plugin) instead of using reflection at runtime as Jackson does.

The syntax is very similar, but Circe provides a better way to define custom deserializers.

Circe uses implicit decoders to decode JSON. A decoder is a mapping from the JSON structure (by the way, you can use Circe with another JSON parser) to the case class you provide. Circe provides decoders out of the box for primitive types, and you can add your own in different ways. We will use the semi-automatic method (as opposed to the automatic method).

The automatic method is a “magic” method that adds an “asJson” method to case classes, and derives decoders and encoders for these classes at compile time. The problem is that if you have a structure containing two fields, a and b, of the same type T, and T needs its own decoder, this decoder is generated twice instead of being reused. This causes high compilation times.
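For reference, here is a minimal sketch of what the automatic method looks like in practice, assuming circe-generic and circe-parser are on the classpath. The `Point` class is a hypothetical example; note that no decoder or encoder is declared anywhere.

```scala
import io.circe.generic.auto._ // brings automatic derivation into scope
import io.circe.parser.decode
import io.circe.syntax._       // provides the asJson syntax

// Hypothetical structure used only for this illustration.
case class Point(x: Int, y: Int)

object AutoDerivationExample {
  // Both instances are derived on the spot by the compiler.
  val decoded = decode[Point]("""{"x": 1, "y": 2}""")
  val encoded: String = Point(1, 2).asJson.noSpaces
}
```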

The semi-automatic method forces you to define decoders for your custom structures one by one. Usually, creating a decoder takes a single declaration:

Declaring a decoder for semi-automatic derivation
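A minimal sketch of such a declaration, assuming circe-generic is available. The `Position` class is hypothetical; placing the implicit in the companion object is one common choice, because the compiler then finds the decoder without any import.

```scala
import io.circe.Decoder
import io.circe.generic.semiauto.deriveDecoder
import io.circe.parser.decode

// Hypothetical structure used only for this illustration.
case class Position(x: Int, y: Int)

object Position {
  // Derived once, explicitly, and reused wherever a Decoder[Position] is needed.
  implicit val decoder: Decoder[Position] = deriveDecoder[Position]
}

object SemiAutoExample {
  val decoded = decode[Position]("""{"x": 1, "y": 2}""")
}
```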

Writing a custom decoder for protobuf enums

Let’s focus on our case, decoding our protobuf enums.

First, we need to define a decoder generator that takes an Int => T lambda, where T is a protobuf enum type. Each decoder it produces can then be declared as an implicit value, so that Circe picks it up during implicit resolution.

In the example we also use the manifest to get the name of the type argument, so we get more expressive error messages. Since we are using Scala 2.11, the Either type is unbiased (meaning you have no flatMap available on it) and we need to use the right projection to handle it.

Circe handles errors with an Either[DecodingFailure, T] (where T is the type you want to decode) rather than with exceptions. That is why you just need to return a Left to signal an error, or a Right to return the decoded result.

Generating decoders for protobuf enums

We can simply define a decoder now:
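The generator and the resulting declaration can be sketched as follows. `PointerKind` is a hypothetical Scala stand-in for a protobuf-generated enum (its `forNumber` returns null when nothing matches, like the generated code), and `protoEnumDecoder` is an illustrative name, not necessarily the one used in our code base.

```scala
import io.circe.{Decoder, DecodingFailure, HCursor}
import io.circe.parser.decode

// Hypothetical stand-in for a protobuf-generated enum.
sealed trait PointerKind
object PointerKind {
  case object TouchStart extends PointerKind
  case object TouchEnd extends PointerKind

  // Mimics the generated forNumber: null when no member matches.
  def forNumber(n: Int): PointerKind = n match {
    case 0 => TouchStart
    case 1 => TouchEnd
    case _ => null
  }
}

object ProtoEnumDecoders {
  // Builds a decoder from any Int => T lookup that may return null.
  def protoEnumDecoder[T <: AnyRef](forNumber: Int => T)(implicit m: Manifest[T]): Decoder[T] =
    Decoder.instance { (c: HCursor) =>
      // Right projection: Either has no flatMap of its own in Scala 2.11.
      c.as[Int].right.flatMap { n =>
        val result: Decoder.Result[T] = Option(forNumber(n)) match {
          case Some(member) => Right(member)
          case None =>
            Left(DecodingFailure(s"$n is not a valid ${m.runtimeClass.getName}", c.history))
        }
        result
      }
    }

  // The one-line declaration for a given enum.
  implicit val pointerKindDecoder: Decoder[PointerKind] =
    protoEnumDecoder[PointerKind](PointerKind.forNumber)
}
```

With `ProtoEnumDecoders._` imported, `decode[PointerKind]("1")` yields a `Right`, while an unknown number such as 42 yields a `Left` carrying the error message.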

And that’s it!

Customizing the decoder’s mapping

Well, almost!

What if we also want to define a mapping between the JSON keys of the raw structure and the members of the case class? We can define a decoder for the class SingleEvent with the @ConfiguredJsonCodec annotation, which generates a decoder and an encoder (we can specify that we only want a decoder, which reduces compilation time by not creating encoders we won't use), combined with the @JsonKey annotation. This way we specify the JSON key names that feed the matching members of the case class.

This trick requires the macro paradise compiler plugin if you use Scala < 2.13.

Circe decoder with custom mappings
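A minimal sketch of such a decoder, assuming circe-generic-extras with macro annotations enabled. The member types are assumptions, and the enum is a hypothetical stand-in with a condensed `emap`-based decoder so that the example stays self-contained.

```scala
import io.circe.Decoder
import io.circe.generic.extras.{Configuration, ConfiguredJsonCodec, JsonKey}
import io.circe.parser.decode

object EventModel {
  // @ConfiguredJsonCodec needs an implicit Configuration in scope.
  implicit val circeConfig: Configuration = Configuration.default

  // Hypothetical stand-in for a protobuf enum and its decoder.
  sealed trait PointerKind
  case object TouchStart extends PointerKind
  case object TouchEnd extends PointerKind

  implicit val pointerKindDecoder: Decoder[PointerKind] =
    Decoder[Int].emap[PointerKind] {
      case 0 => Right(TouchStart)
      case 1 => Right(TouchEnd)
      case n => Left(s"$n is not a valid PointerKind")
    }

  // decodeOnly = true skips the generation of an encoder we would not use.
  @ConfiguredJsonCodec(decodeOnly = true)
  case class SingleEvent(
    @JsonKey("ts") timestamp: Long,          // read from the "ts" key
    x: Int,
    y: Int,
    @JsonKey("pe") pointerEvent: PointerKind // read from the "pe" key
  )
}
```

Members without a `@JsonKey` annotation (here `x` and `y`) keep their own name as the JSON key.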

There we are! Our decoder is now ready to deserialize structures such as {"ts": 123, "x": 1, "y": 2, "pe": 1} into SingleEvent(123, 1, 2, PointerEvent.EnumMember1).


A few observations

Some unexpected niceties appeared during this trip, among which:

A better validation

Out of the box, Jackson/Finatra did not seem to validate a few types correctly. In our example SingleEvent, the timestamp field was in fact sent as an int64 but deserialized by Jackson into an int32... Circe complained about that, and we fixed this bug.

As suspected earlier, x and y were being truncated into integers. Circe complained about that as well.
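The kind of complaint we got can be reproduced with Circe's built-in numeric decoders. This is a small sketch with made-up values, assuming circe-parser is on the classpath.

```scala
import io.circe.parser.decode

object StrictNumbers {
  // A fractional number is rejected when an Int is expected.
  val fractionalAsInt = decode[Int]("1.5")
  // A value that does not fit in 32 bits is rejected as well.
  val tooLargeForInt = decode[Int]("3000000000")
  // The same inputs decode fine once the target type is wide enough.
  val asLong = decode[Long]("3000000000")
  val asDouble = decode[Double]("1.5")
}
```

The first two values are `Left`s describing the failure, which is what pushed us to fix the types of our case class members.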

Good performance

Let’s keep in mind that these comparisons are not rock solid (the measurements were taken at the application level, not on the specific part that deserializes payloads).

We did not expect anything in terms of performance: this migration was only experimental, to study whether we could do a better job with validation, genericity for enums… But during the release in production, we observed:

Initial CPU bump observed during the release

CPU gain

We divided the CPU consumption by the number of requests handled by the server, which gave us the following graph:

CPU usage during 1 week, before and after the changes were released

On average, we use 16% less CPU with Circe.

Increased memory usage

On average, we use 20% more memory with Circe.


Conclusion

Circe works in production! Because it was not the point of this experiment, we did not do a thorough performance study (we could have measured the time spent deserializing each request), but we still have a rough estimate of how it performs, at least compared with the well-established Jackson.

It has a neat syntax to define custom decoders, and a stricter validation step out of the box.

We now know this library can handle this task pretty well, and we will use it more widely in this application.

And you, do you use functional libraries like Circe in production? If not, what is keeping you from doing it? This article aims to share what we learned along the way; feel free to share your input in the comments.

Criteo R&D Blog

Tech stories from the R&D

Alexandre Careil

Written by

SoftwareEngineer@Criteo working on Scala Top Level Applications, and promoting Functional Programming.
