To parse or not to parse?

Working with JSON data in Scala

Ashley Nguyen
Published Aug 25, 2021 (10 min read)

I’m Ashley Nguyen, a software engineer on the Customer Support and Insights Engineering (CSI) team at Disney Streaming. I work full-stack on a Customer Support Tool that thousands of agents worldwide use to resolve customer issues related to Disney+ and ESPN+.

JSON, JSON, JSON

JSON (JavaScript Object Notation) is the bedrock of the data that powers our customer support tool and is prevalent across many parts of Disney Streaming. Many services across the platform are constantly ingesting, receiving, and returning JSON objects. When it comes to parsing JSON in our Scala services, the multitude of available libraries can make it hard to pick the right one! This article is a deep dive into one library in particular, Circe, together with Circe Optics, which we found to be our best option because of its performance when handling JSON in a RESTful service.

What is Circe?

Besides being the daughter of Helios (the sun god) who transformed humans into animals (quite fitting!), Circe offers support for JSON parsing, traversing, and transforming. It’s forked from an existing Scala JSON library, Argonaut, and builds on other notable packages: Jawn for parsing and Shapeless for generic derivation.

There are a few main components of Circe to understand:

  1. Parser
  2. Cursor
  3. Decoding and Encoding

The Parser

The circe-parser package includes a parse(...) method which returns Either[ParsingFailure, Json]. parse checks whether the input string is valid JSON: if it is, it returns Right(json), and if not, it returns Left(parsingFailure). Parsing is an important first step before Circe can be used to traverse JSON.
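
As a minimal sketch of that behaviour (assuming circe-core and circe-parser are on the classpath; the object name ParseExample, the describe helper, and the sample strings are illustrative and not from our service):

```scala
import io.circe.{Json, ParsingFailure}
import io.circe.parser.parse

object ParseExample {
  // A well-formed document parses to Right(json).
  val valid: Either[ParsingFailure, Json] =
    parse("""{"email": "ashley@example.com"}""")

  // A truncated document parses to Left(parsingFailure).
  val invalid: Either[ParsingFailure, Json] =
    parse("""{"email": """)

  // Hypothetical helper: pattern match on the Either to handle both cases.
  def describe(result: Either[ParsingFailure, Json]): String = result match {
    case Right(json)   => s"valid JSON: ${json.noSpaces}"
    case Left(failure) => s"invalid JSON: ${failure.message}"
  }
}
```

Because the failure is a value rather than a thrown exception, the caller decides how to report it.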

Creating RESTful services for customer support tools involves hitting external APIs that return JSON with the data needed for display on the UI. Oftentimes, the JSON is in a wonky, overly nested structure that would be too complicated for the frontend to handle. Our service’s job is to transform the external API’s JSON into the shape we want and return it from our service. To do that we will need to understand…

The Cursor

The Cursor is an object from the circe-core package that comes in a few forms, chiefly HCursor and ACursor (older Circe versions also had a plain Cursor). It is used to traverse down the JSON to extract or modify data. Think of it as a literal cursor that starts at the top of the JSON and works its way down the object. For example:

{
  "id": "00112233",
  "email": "ashley@example.com",
  "location": {
    "current": "JP",
    "registered": "US"
  },
  "subscriptions": [
    {
      "id": "445566",
      "name": "Disney+ Monthly",
      "price": 7.99
    },
    {
      "id": "778899",
      "name": "ESPN+ Monthly",
      "price": 6.00
    }
  ]
}

Let’s say we want the “email”, “current location” and “subscription name” fields. Extracting those fields using the cursor would involve:
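
A minimal sketch of such a decoder follows. It assumes circe-core and circe-parser; the Subscriber case class shape, the object name CursorExample, and the subscriptionName helper are illustrative assumptions. Note one swapped-in technique: instead of stepping through the array with downArray, this sketch decodes the whole array with a small per-element decoder.

```scala
import io.circe.Decoder
import io.circe.parser.parse

object CursorExample {
  // Hypothetical case class holding only the three fields we want.
  final case class Subscriber(
    email: String,
    currentLocation: String,
    subscriptionNames: List[String]
  )

  // Reads the "name" field of one element of the subscriptions array.
  private val subscriptionName: Decoder[String] =
    Decoder.instance[String](_.get[String]("name"))

  implicit val decoder: Decoder[Subscriber] = Decoder.instance { c =>
    for {
      email    <- c.downField("email").as[String]
      location <- c.downField("location").downField("current").as[String]
      names    <- c.downField("subscriptions").as(Decoder.decodeList(subscriptionName))
    } yield Subscriber(email, location, names)
  }

  val raw: String =
    """{"id":"00112233","email":"ashley@example.com",
      |"location":{"current":"JP","registered":"US"},
      |"subscriptions":[
      |  {"id":"445566","name":"Disney+ Monthly","price":7.99},
      |  {"id":"778899","name":"ESPN+ Monthly","price":6.00}]}""".stripMargin

  // Parse first, then let the implicit decoder walk the cursor.
  val result: Either[io.circe.Error, Subscriber] =
    parse(raw).flatMap(_.as[Subscriber])
}
```

For comparison, c.downField("subscriptions").downArray.downField("name").as[String] moves the cursor to the first array element only, so it would read just the first subscription name.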

Code Breakdown

  • The cursor starts at the top of the JSON that is passed in, drills down to each field by name using downField, and then types the extracted data with as[String] (or any other type that has a Decoder).
  • Arrays of JSON objects can be traversed with downArray to comb over the array of subscriptions, collect the name in each subscription, and build a List of Strings holding the subscription names.

The data that has been extracted in the example would look something like this:

"ashley@example.com"
"JP"
["Disney+ Monthly", "ESPN+ Monthly"]

But now, what do we do with it? And what was the implicit val decoder: Decoder[Subscriber]?

Decoding and Encoding

Decoding involves extracting JSON data that is returned by another service/database. What is decoded is usually stored in a case class. Encoding involves creating a new JSON object from that case class to model our own service’s response. A basic flow could be:

  1. Use an HTTP client to call GET data from https://some-external-service.com
  2. parse(...) the JSON string that is in the response body of https://some-external-service.com
  3. If the JSON is valid, decode it with as[SomeCaseClass], which uses the implicit decoder to grab the fields we care about and shove them into a case class object
  4. When our service is ready to return a JSON response we can call the asJson method (from the io.circe.syntax package) on SomeCaseClass
  5. When asJson is invoked, the implicit encoder in scope determines how the JSON will be presented in our service’s response

The Decoder[T] and Encoder[T] are defined as implicit values in a trait or object so that the JSON can be converted automatically upon invoking as[SomeCaseClass] (implicit Decoder) or asJson (implicit Encoder). IMPORTANT: the specified Decoder or Encoder may have to be imported where it is used. More on this later.*

There are two ways to decode and encode. The first way is the example above with a custom decoder — manually extracting the fields that were needed. The second way is using derivation.

Derivation

The deriveDecoder[SomeCaseClass] and deriveEncoder[SomeCaseClass] methods should be used only if the case class matches the structure of the JSON object being decoded or encoded. We typically use semi-automatic derivation, from the io.circe.generic.semiauto package, to call out explicitly which case classes are derived. Automatic derivation (io.circe.generic.auto) eliminates the need to call deriveDecoder/deriveEncoder at all.

Using the same JSON example:

{
  "id": "00112233",
  "email": "ashley@example.com",
  "location": {
    "current": "JP",
    "registered": "US"
  },
  "subscriptions": [
    {
      "id": "445566",
      "name": "Disney+ Monthly",
      "price": 7.99
    },
    {
      "id": "778899",
      "name": "ESPN+ Monthly",
      "price": 6.00
    }
  ]
}

Semi-automatic derivation would look like this:
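
A minimal sketch, assuming circe-core, circe-parser, and circe-generic on Scala 2; the case class names and field types (for example price as a Double) are assumptions chosen to mirror the sample JSON:

```scala
import io.circe.{Decoder, Encoder, Json}
import io.circe.generic.semiauto.{deriveDecoder, deriveEncoder}
import io.circe.parser.decode
import io.circe.syntax._

object DerivationExample {
  // Case classes whose field names mirror the JSON keys exactly.
  final case class Location(current: String, registered: String)
  final case class Subscription(id: String, name: String, price: Double)
  final case class Subscriber(
    id: String,
    email: String,
    location: Location,
    subscriptions: List[Subscription]
  )

  // Nested types first, so the outer derivation can find their instances.
  implicit val locationDecoder: Decoder[Location] = deriveDecoder[Location]
  implicit val locationEncoder: Encoder[Location] = deriveEncoder[Location]
  implicit val subscriptionDecoder: Decoder[Subscription] = deriveDecoder[Subscription]
  implicit val subscriptionEncoder: Encoder[Subscription] = deriveEncoder[Subscription]
  implicit val subscriberDecoder: Decoder[Subscriber] = deriveDecoder[Subscriber]
  implicit val subscriberEncoder: Encoder[Subscriber] = deriveEncoder[Subscriber]

  val raw: String =
    """{"id":"00112233","email":"ashley@example.com",
      |"location":{"current":"JP","registered":"US"},
      |"subscriptions":[
      |  {"id":"445566","name":"Disney+ Monthly","price":7.99},
      |  {"id":"778899","name":"ESPN+ Monthly","price":6.00}]}""".stripMargin

  val decoded: Either[io.circe.Error, Subscriber] = decode[Subscriber](raw)

  // Encoding the decoded value yields JSON with the same structure.
  val reEncoded: Either[io.circe.Error, Json] = decoded.map(_.asJson)
  val roundTrip: Either[io.circe.Error, Subscriber] = reEncoded.flatMap(_.as[Subscriber])
}
```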

Code Breakdown

  • Since the case classes are modeled in a way that matches the JSON returned from some-external-service we can use deriveDecoder
  • If we want to return the same JSON structure in our service’s response, deriveEncoder would do that for us. Another way to use derivation is by adding the @JsonCodec annotation to the case class

For small objects in the response, derivation is great if we don’t plan to restructure them much. But for a large object like this Subscriber, we might want to pare down to the fields we need and restructure the JSON response so it’s relevant to the client. To do that we can re-define our case class:
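
A minimal sketch of such a custom decoder and encoder, assuming circe-core and circe-parser. The countryName lookup, the helper names, and the object name CustomCodecExample are illustrative assumptions, and the array is decoded with a per-element decoder rather than downArray:

```scala
import io.circe.{Decoder, Encoder, Json}
import io.circe.parser.parse
import io.circe.syntax._

object CustomCodecExample {
  // Only the three fields the client needs.
  final case class Subscriber(
    email: String,
    currentLocation: String,
    subscriptionNames: List[String]
  )

  private val subscriptionName: Decoder[String] =
    Decoder.instance[String](_.get[String]("name"))

  // Custom decoder: walk the HCursor in a for/yield and build the case class.
  implicit val decoder: Decoder[Subscriber] = Decoder.instance { c =>
    for {
      email    <- c.downField("email").as[String]
      location <- c.downField("location").downField("current").as[String]
      names    <- c.downField("subscriptions").as(Decoder.decodeList(subscriptionName))
    } yield Subscriber(email, location, names)
  }

  // Illustrative lookup only; unknown codes are passed through unchanged.
  private def countryName(code: String): String = code match {
    case "JP"  => "Japan"
    case "US"  => "United States"
    case other => other
  }

  // Custom encoder: relabel fields and shape the response for the client.
  implicit val encoder: Encoder[Subscriber] = Encoder.instance { s =>
    Json.obj(
      "email"         -> s.email.asJson,
      "location"      -> countryName(s.currentLocation).asJson,
      "subscriptions" -> s.subscriptionNames.asJson
    )
  }

  val raw: String =
    """{"id":"00112233","email":"ashley@example.com",
      |"location":{"current":"JP","registered":"US"},
      |"subscriptions":[
      |  {"id":"445566","name":"Disney+ Monthly","price":7.99},
      |  {"id":"778899","name":"ESPN+ Monthly","price":6.00}]}""".stripMargin

  val decoded: Either[io.circe.Error, Subscriber] = parse(raw).flatMap(_.as[Subscriber])
  val response: Either[io.circe.Error, Json] = decoded.map(_.asJson)
}
```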

A few things have happened here:

  • Subscriber case class only contains 3 parameters, the fields we care to extract — email, current location, and subscription names
  • The custom decoder for the Subscriber case class uses a for/yield to traverse down the JSON with an HCursor and yield a case class object out of it
  • The custom encoder returns a Json.obj that 1) defines the service response independently of the case class definition; and 2) relabels the case class parameter names into field names that make sense to the client: “email” → “email”, “currentLocation” → “location”, “subscriptionNames” → “subscriptions”
  • The custom encoder also converts the country codes in the original response into readable country names (there are libraries that can do this; the simple match is for example’s sake)
  • asJson is used on the primitive types (i.e. String) to “JSONify” each field in the response

Custom Decoding and Encoding is extremely helpful to filter down JSON responses to what is needed and to modify the JSON in the response so that it is as custom to the client as possible. In this example, our service’s response is:

{
  "email": "ashley@example.com",
  "location": "Japan",
  "subscriptions": ["Disney+ Monthly", "ESPN+ Monthly"]
}

Putting it all together

The parser, cursor, and decoder/encoder can all come together seamlessly with implicit parameters and a clean application flow and structure.

This snippet essentially follows the 5 steps (see code comments) mentioned at the beginning of the section:

  1. Use an HTTP client to call GET data from https://some-external-service.com
  2. parse(...) the JSON string that is in the response body of https://some-external-service.com
  3. If the JSON is valid, decode it with as[SomeCaseClass], which uses the implicit decoder to grab the fields we care about and shove them into a case class object
  4. When our service is ready to return a JSON response we can call the asJson method (from the io.circe.syntax package) on SomeCaseClass
  5. When asJson is invoked, the implicit encoder in scope determines how the JSON will be presented in our service’s response
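
The five steps can be sketched end to end as follows. This assumes circe-core and circe-parser; the HTTP call is replaced by a hypothetical stub (fetchSubscriberBody) that returns a canned body, and all object and method names are illustrative rather than taken from our service:

```scala
import io.circe.{Decoder, Encoder, Json}
import io.circe.parser.parse
import io.circe.syntax._

object PuttingItTogether {
  final case class Subscriber(email: String, currentLocation: String, subscriptionNames: List[String])

  // Codecs live in their own object, so call sites import them explicitly.
  object SubscriberCodecs {
    private val subscriptionName: Decoder[String] =
      Decoder.instance[String](_.get[String]("name"))
    private val countryNames = Map("JP" -> "Japan", "US" -> "United States")

    implicit val decoder: Decoder[Subscriber] = Decoder.instance { c =>
      for {
        email    <- c.downField("email").as[String]
        location <- c.downField("location").downField("current").as[String]
        names    <- c.downField("subscriptions").as(Decoder.decodeList(subscriptionName))
      } yield Subscriber(email, location, names)
    }

    implicit val encoder: Encoder[Subscriber] = Encoder.instance { s =>
      Json.obj(
        "email"         -> s.email.asJson,
        "location"      -> countryNames.getOrElse(s.currentLocation, s.currentLocation).asJson,
        "subscriptions" -> s.subscriptionNames.asJson
      )
    }
  }

  // 1. Stand-in for an HTTP GET against the external service (hypothetical stub).
  def fetchSubscriberBody(): String =
    """{"id":"00112233","email":"ashley@example.com",
      |"location":{"current":"JP","registered":"US"},
      |"subscriptions":[{"id":"445566","name":"Disney+ Monthly","price":7.99},
      |{"id":"778899","name":"ESPN+ Monthly","price":6.00}]}""".stripMargin

  // 3. Decode with the custom decoder, imported right where as[Subscriber] is called.
  def decodeSubscriber(json: Json): Decoder.Result[Subscriber] = {
    import SubscriberCodecs.decoder
    json.as[Subscriber]
  }

  // 4 and 5. asJson picks up the imported encoder, which shapes the response.
  def encodeSubscriber(subscriber: Subscriber): Json = {
    import SubscriberCodecs.encoder
    subscriber.asJson
  }

  def handle(): Either[io.circe.Error, Json] =
    for {
      json       <- parse(fetchSubscriberBody()) // 2. parse the response body
      subscriber <- decodeSubscriber(json)
    } yield encodeSubscriber(subscriber)
}
```

Keeping the codecs out of the case class companion is a design choice of this sketch: it makes the compiler reject as[Subscriber] or asJson unless the intended instance has been imported.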

*Note how the decoder gets imported at the point where as[Subscriber] is called and how the encoder gets imported at the point where subscriber.asJson is called. This ensures that the intended custom Encoder and Decoder instances are the ones used implicitly, rather than Circe’s derived defaults (such as deriveCodec or deriveDecoder), which are sometimes imported just to resolve the compiler issue “No implicits found for parameter decode: Decoder[T]”.

Leveraging Circe Optics

There may be cases where JSON is deeply nested and we want to set the cursor somewhere other than the top of the object. Circe Optics is a great supplementary library that reduces decoding boilerplate by eliminating the chain of downField calls. Circe Optics is built on Monocle, an optics library that in turn builds on cats.

Let’s take our example from above and say the JSON is now wrapped in a data field.

{
  "data": {
    "id": "00112233",
    "email": "ashley@example.com",
    "location": {
      "current": "JP",
      "registered": "US"
    },
    "subscriptions": [
      {
        "id": "445566",
        "name": "Disney+ Monthly",
        "price": 7.99
      },
      {
        "id": "778899",
        "name": "ESPN+ Monthly",
        "price": 6.00
      }
    ]
  }
}

Using the regular cursor approach, decoding a field like the “current location” now looks like this:
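
A minimal sketch of that chain, assuming circe-core and circe-parser (the object name and the trimmed sample document are illustrative):

```scala
import io.circe.parser.parse

object NestedCursorExample {
  val raw: String =
    """{"data":{"email":"ashley@example.com","location":{"current":"JP","registered":"US"}}}"""

  // Every lookup has to start with downField("data") before reaching the field.
  val current: Either[io.circe.Error, String] =
    parse(raw).flatMap { json =>
      json.hcursor
        .downField("data")
        .downField("location")
        .downField("current")
        .as[String]
    }
}
```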

For all the fields that we want to decode, we’ll always have to prefix them with downField("data") since cursors are assumed to start at the top. With Circe Optics we can define where we want the cursor to start with root and then begin traversing the JSON from there. From the Optics documentation:

“In other words, optics provide a way to separate the description of a JSON traversal from its execution. Consequently we can reuse the same traversal against many different documents, compose traversals together, and so on.”

With the sample JSON above, we can define root before we reach the implicit Decoder and have the cursor start at data rather than at the top of the object.
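
A minimal sketch of this setup, assuming circe-core, circe-parser, and circe-optics. The two-field Subscriber, the failure message, and the object name are illustrative assumptions. Note that as[...] on a path requires an Encoder as well as a Decoder to be in scope:

```scala
import io.circe.{Decoder, DecodingFailure, Encoder, Json}
import io.circe.optics.JsonPath.root
import io.circe.parser.parse
import io.circe.syntax._

object OpticsExample {
  // Hypothetical, trimmed-down case class for this sketch.
  final case class Subscriber(email: String, currentLocation: String)

  implicit val decoder: Decoder[Subscriber] = Decoder.instance { c =>
    // c already points at the "data" object, so paths are relative to it.
    val email    = root.email.as[String].getOption(c.value)
    val location = root.location.current.as[String].getOption(c.value)
    val subscriber = for {
      e <- email
      l <- location
    } yield Subscriber(e, l)
    // A Decoder must return an Either, so turn the Option into one.
    subscriber.toRight(DecodingFailure("missing email or current location", c.history))
  }

  implicit val encoder: Encoder[Subscriber] = Encoder.instance { s =>
    Json.obj("email" -> s.email.asJson, "location" -> s.currentLocation.asJson)
  }

  val raw: String =
    """{"data":{"id":"00112233","email":"ashley@example.com",
      |"location":{"current":"JP","registered":"US"}}}""".stripMargin

  // Start at "data" right after parsing, then hand over to the implicit decoder.
  val subscriber: Option[Subscriber] =
    parse(raw).toOption.flatMap(json => root.data.as[Subscriber].getOption(json))
}
```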

Code Breakdown

  • To avoid redundantly doing root.data in the Decoder, we tell the cursor to start at that point right after parsing the response JSON string
  • Then as[Subscriber] is the cue for the Decoder[Subscriber] to step in and take things from there. The c cursor now starts at data, so to grab the current location we can do root.location.current.as[String], which returns a monocle.Optional[Json, String]
  • To actually grab the string, we have to finish with getOption(c.value), which returns an Option[String]. We’re passing in the cursor’s value (which is a Json) to get that specific field for that JSON document

The reason Optics only returns an Option[String] for the field instead of the String itself is:

  1. Optics uses a Scala feature called Dynamic, which lets us write arbitrary field names after root (such as root.location.current) without declaring them first. Because those fields may not exist in a given document, accessing a missing field via root does not produce an error; it simply resolves to None
  2. Since fields accessed this way are not guaranteed to exist, the Monocle optic can only be resolved to an optional value when it is applied to the JSON in c.value

Optics and Arrays

Circe Optics also has powerful support for traversing arrays. Going back to the example with an array of subscriptions, assume that after parsing, the cursor now starts at data thanks to root.data.as[Subscriber].
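
A minimal sketch of the array case, assuming circe-core, circe-parser, and circe-optics; the single-field Subscriber and the encoder are illustrative assumptions (the encoder exists only because as[...] on a path requires one):

```scala
import io.circe.{Decoder, Encoder}
import io.circe.optics.JsonPath.root
import io.circe.parser.parse
import io.circe.syntax._

object OpticsArrayExample {
  // Hypothetical case class holding only the subscription names.
  final case class Subscriber(subscriptionNames: List[String])

  implicit val decoder: Decoder[Subscriber] = Decoder.instance { c =>
    // each turns the path into a traversal over every array element;
    // getAll resolves it against this document into a List[String].
    val names: List[String] =
      root.subscriptions.each.name.as[String].getAll(c.value)
    // Optics yields a plain value, so wrap it in Right for the Decoder.
    Right(Subscriber(names))
  }

  implicit val encoder: Encoder[Subscriber] =
    Encoder.instance(s => s.subscriptionNames.asJson)

  val raw: String =
    """{"data":{"subscriptions":[
      |  {"id":"445566","name":"Disney+ Monthly","price":7.99},
      |  {"id":"778899","name":"ESPN+ Monthly","price":6.00}]}}""".stripMargin

  val subscriber: Option[Subscriber] =
    parse(raw).toOption.flatMap(json => root.data.as[Subscriber].getOption(json))
}
```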

Code Breakdown

  • Optics enables traversing the subscriptions array with the each method and can grab fields within the array with as[T], which returns a monocle.Traversal[Json, T]
  • The main difference from getOption is that a traversal built with each must be resolved with getAll(c.value), which returns a List[T]
  • The name field is collected across the array with getAll to be resolved into a List of Strings (similar to our initial cursor example). List is used in Circe mostly for optimization reasons but it can be turned into any collection with toVector, toArray, toSet, etc.

If multiple fields within the array need to be extracted, it is also possible to .map over each subscription as Json, start the cursor at the subscriptions and grab the fields as Option similar to unnested fields.
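
One way this could look, as a sketch assuming circe-core, circe-parser, and circe-optics; SubscriptionSummary and the summaries helper are hypothetical names:

```scala
import io.circe.Json
import io.circe.optics.JsonPath.root
import io.circe.parser.parse

object OpticsMultiFieldExample {
  // Hypothetical case class for two fields of each subscription.
  final case class SubscriptionSummary(id: String, name: String)

  // data is the JSON the cursor currently points at (the "data" object).
  def summaries(data: Json): List[SubscriptionSummary] =
    root.subscriptions.each.json.getAll(data).flatMap { subscription =>
      // Within each element, root restarts at that subscription object.
      for {
        id   <- root.id.string.getOption(subscription)
        name <- root.name.string.getOption(subscription)
      } yield SubscriptionSummary(id, name)
    }

  val data: Json = parse(
    """{"subscriptions":[
      |  {"id":"445566","name":"Disney+ Monthly","price":7.99},
      |  {"id":"778899","name":"ESPN+ Monthly","price":6.00}]}""".stripMargin
  ).getOrElse(Json.Null)

  val result: List[SubscriptionSummary] = summaries(data)
}
```

In this sketch, a subscription that is missing either field is silently dropped from the result, which may or may not be the behaviour a real service wants.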

Optics Gotchas

Keep in mind that a decoder body written with Optics naturally produces a plain SomeCaseClass (effectively a function HCursor => SomeCaseClass), whereas a Decoder[SomeCaseClass] must return an Either. When using Optics inside a Decoder[SomeCaseClass], the decoder must therefore return a Right(someDecodedCaseClass) to resolve the function into a Decoder object, unless the Optics logic is moved into a private method like the snippet above.

If it’s desired to derive the decoder and encoder but only after the cursor has been moved to a certain field, that is absolutely possible with Optics. Simply use root.someField.as[SomeCaseClass], and define the implicit decoder/encoder for SomeCaseClass with deriveDecoder[SomeCaseClass] and deriveEncoder[SomeCaseClass].
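
A minimal sketch of that combination, assuming circe-core, circe-parser, circe-generic, and circe-optics; the Location case class and the trimmed sample document are illustrative:

```scala
import io.circe.{Decoder, Encoder}
import io.circe.generic.semiauto.{deriveDecoder, deriveEncoder}
import io.circe.optics.JsonPath.root
import io.circe.parser.parse

object DerivedWithOpticsExample {
  // The case class mirrors the JSON under data.location, so it can be derived.
  final case class Location(current: String, registered: String)

  implicit val locationDecoder: Decoder[Location] = deriveDecoder[Location]
  implicit val locationEncoder: Encoder[Location] = deriveEncoder[Location]

  val raw: String =
    """{"data":{"location":{"current":"JP","registered":"US"}}}"""

  // Move to data.location with Optics, then let the derived decoder take over.
  val location: Option[Location] =
    parse(raw).toOption.flatMap(json => root.data.location.as[Location].getOption(json))
}
```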

Optics is flexible and can be used in combination with the downField method as well — enabling developers to combine both techniques for optimal JSON parsing where appropriate in the data.

The main difference between Optics and Cursor/downField is how a missing field is handled. If the cursor goes downField on a field that doesn’t exist, decoding a required value at that position fails with a Left(DecodingFailure) that has to be handled. As mentioned before, using root.someField with Optics does not produce an error for a missing field, because it resolves either to an Option (None) or to an empty List at the end of the operation.
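
The contrast can be sketched as follows, assuming circe-core, circe-parser, and circe-optics; the nickname field is a made-up example of a field that is absent from the document:

```scala
import io.circe.{Decoder, Json}
import io.circe.optics.JsonPath.root
import io.circe.parser.parse

object MissingFieldExample {
  val json: Json =
    parse("""{"email":"ashley@example.com"}""").getOrElse(Json.Null)

  // Cursor: decoding a String at a missing field yields Left(DecodingFailure).
  val viaCursor: Decoder.Result[String] =
    json.hcursor.downField("nickname").as[String]

  // Optics: the same missing field simply resolves to None.
  val viaOptics: Option[String] =
    root.nickname.string.getOption(json)

  // Optics over a missing array resolves to an empty List.
  val viaEach: List[String] =
    root.subscriptions.each.name.string.getAll(json)
}
```

Neither approach throws; the difference is whether the absence is reported as a failure value or quietly collapses to an empty result.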

Why Circe?

Many Stack Overflow and Reddit threads on JSON parsing libraries in Scala consist of many opinions based on experience, preference, and use case. Some criticize Circe’s “counterintuitive” API in comparison to the familiar Java-based libraries like Jackson due to Circe’s implicit handling, dependency conflicts with other libraries, and the presence of case classes as a conduit for JSON interaction.

However, the reason our customer support tool thrives with Circe is its performance. With the engineering team being new to Scala, the ramp-up time for Circe was probably greater than it would have been for other libraries that work directly with JSON, but the time invested has paid off in amazing response times in our services that involve heavy JSON decoding and encoding. Provided network calls are optimal, our Circe-based service, which retrieves ~6–10 KB of data from an external service, custom decodes it, transforms it, custom encodes it, and returns restructured JSON, has a p99 response time of 259 ms (0.259 seconds).

When compared to other libraries, Circe is generally faster in many benchmarks of parsing, decoding, encoding, and printing various complex case classes in common use cases. The performance tests are actively maintained, and the Circe library itself receives active contributions from open-source contributors and dedicated maintainers. It is also built with cutting-edge functional programming libraries such as cats. See more documentation on design guidelines around Circe.

Conclusion

Circe is a fast, effective library for parsing and transforming JSON documents into their desired structure. With some understanding of the basic components of Circe and of when to use its fancy features such as implicit custom decoders/encoders, derivation, and Optics, it can be a really powerful way to optimize service response time. The API is readable to fellow engineers and offers flexibility when combining traversals between cursors and Optics. By providing safe access with Optics and returning results as an Either[Failure, T], it empowers any Scala service to gracefully handle errors when dealing with JSON parsing and ensures a higher probability of happy paths in execution.
