To parse or not to parse?
Working with JSON data in Scala
I’m Ashley Nguyen, a software engineer on the Customer Support and Insights Engineering (CSI) team at Disney Streaming. I work full-stack on a Customer Support Tool that thousands of agents worldwide use to resolve customer issues related to Disney+ and ESPN+.
JSON, JSON, JSON
JSON (JavaScript Object Notation) is the bedrock of data that powers our customer support tool and is prevalent across many parts of Disney Streaming. Many services across the platform are constantly ingesting, receiving, and returning JSON objects. When it comes to parsing JSON in our Scala services, the multitude of available libraries can make it hard to pick the right one! This article is a deep dive into one library in particular, Circe, together with Circe Optics, which we found to be our best option because of its performance in handling JSON in a RESTful service.
What is Circe?
Besides being the daughter of Helios (the Sun God) who transformed humans into animals (quite fitting!), Circe offers support for JSON parsing, traversing, and transforming. It’s forked from an existing Scala JSON library, Argonaut, and builds on other notable packages like Jawn (for parsing) and Shapeless (for generic derivation).
There are a few main components of Circe to understand:
- Parser
- Cursor
- Decoding and Encoding
The Parser
The circe-parser package includes a parse(...) method which returns Either[ParsingFailure, Json]. The parse method attempts to determine whether the string is valid JSON: if it is, it returns Right(json); if not, it returns Left(_). Parsing is an important first step to be able to use Circe to traverse JSON.
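As a minimal sketch of the parser in isolation, the two outcomes of parse can be handled with a pattern match. The object name ParseExample and the describe helper are hypothetical, introduced only for illustration.

```scala
import io.circe.{Json, ParsingFailure}
import io.circe.parser.parse

object ParseExample {
  // A well-formed document parses to Right(json).
  val valid: Either[ParsingFailure, Json] =
    parse("""{"email": "ashley@example.com"}""")

  // A truncated document parses to Left(parsingFailure).
  val invalid: Either[ParsingFailure, Json] =
    parse("""{"email": """)

  // Hypothetical helper: turn either outcome into a readable message.
  def describe(result: Either[ParsingFailure, Json]): String =
    result match {
      case Right(json)   => s"valid JSON: ${json.noSpaces}"
      case Left(failure) => s"invalid JSON: ${failure.message}"
    }
}
```

Because the result is an Either, nothing is thrown for malformed input; the caller decides what to do with the Left.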
Creating RESTful services for customer support tools involves hitting other external APIs that return JSON with the data needed to display on the UI. Oftentimes, the JSON is in a wonky, overly nested structure that would be too complicated for the frontend to handle. Our service’s job is to transform the external API’s JSON into the shape we want and return it from our service. To do that we will need to understand…
The Cursor
The Cursor is an object that comes in three forms, Cursor, HCursor, and ACursor, which belong to the circe-core package. It is used to traverse down the JSON to extract or modify data. Think of it as a literal cursor that starts at the top of the JSON and works its way down the object. For example:
{
  "id": "00112233",
  "email": "ashley@example.com",
  "location": {
    "current": "JP",
    "registered": "US"
  },
  "subscriptions": [
    {
      "id": "445566",
      "name": "Disney+ Monthly",
      "price": 7.99
    },
    {
      "id": "778899",
      "name": "ESPN+ Monthly",
      "price": 6.00
    }
  ]
}
Let’s say we want the “email”, “current location” and “subscription name” fields. Extracting those fields using the cursor would involve:
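A minimal sketch of such an extraction, assuming a hypothetical Subscriber case class with three fields and a hypothetical collectNames helper that steps through the array with downArray and right:

```scala
import io.circe.{ACursor, Decoder, HCursor}
import io.circe.parser.parse
import scala.annotation.tailrec

object CursorExample {
  // Hypothetical case class holding only the three fields we want.
  final case class Subscriber(
    email: String,
    currentLocation: String,
    subscriptionNames: List[String]
  )

  object Subscriber {
    // Steps through array elements with `right`, collecting each "name".
    // A missing or empty array yields Nil.
    @tailrec
    private def collectNames(element: ACursor, acc: List[String]): Decoder.Result[List[String]] =
      if (element.failed) Right(acc.reverse) // moved past the last element
      else
        element.downField("name").as[String] match {
          case Right(name) => collectNames(element.right, name :: acc)
          case Left(error) => Left(error)
        }

    implicit val decoder: Decoder[Subscriber] = Decoder.instance { (c: HCursor) =>
      for {
        email    <- c.downField("email").as[String]
        location <- c.downField("location").downField("current").as[String]
        names    <- collectNames(c.downField("subscriptions").downArray, Nil)
      } yield Subscriber(email, location, names)
    }
  }

  val raw: String =
    """{
      "id": "00112233",
      "email": "ashley@example.com",
      "location": { "current": "JP", "registered": "US" },
      "subscriptions": [
        { "id": "445566", "name": "Disney+ Monthly", "price": 7.99 },
        { "id": "778899", "name": "ESPN+ Monthly", "price": 6.00 }
      ]
    }"""

  // Parse first, then let the implicit Decoder[Subscriber] do the extraction.
  val decoded: Either[io.circe.Error, Subscriber] =
    parse(raw).flatMap(_.as[Subscriber])
}
```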
Code Breakdown
- The JSON being passed starts the cursor at the top of the JSON and drills down each field by name using downField and then types the extracted data with as[String] or any unified type.
- Arrays of JSON objects can be handled with downArray to comb over the array of subscriptions, “collect” the names in each subscription, then create a List of Strings that will hold the subscription names.
The data that has been extracted in the example would look something like this:
"ashley@example.com"
"JP"
["Disney+ Monthly", "ESPN+ Monthly"]
But now, what do we do with it? And what was the implicit val decoder: Decoder[Subscriber]?
Decoding and Encoding
Decoding involves extracting JSON data that is returned by another service/database. What is decoded is usually stored in a case class. Encoding involves creating a new JSON object from that case class to model our own service’s response. A basic flow could be:
- Use an HTTP client to call GET data from https://some-external-service.com
- parse(...) the JSON string that is in the response body of https://some-external-service.com
- If JSON is valid, decode it implicitly as[SomeCaseClass] to grab the fields we care about and shove them into a case class object
- When our service is ready to return a JSON response we can call the asJson method from the circe-syntax package on SomeCaseClass
- When asJson is invoked, the implicitly defined encoder determines how the JSON will be presented in our service
Decoder[T] and Encoder[T] instances are defined as implicit values in a trait or object so that the JSON can be converted automatically upon invoking as[SomeCaseClass] (implicit Decoder) or asJson (implicit Encoder). IMPORTANT: the specified Decoder or Encoder may have to be imported where it is used. More on this later.*
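To illustrate the point about imports, here is a small sketch in which the codecs live in a separate object rather than in the case class’s companion. Plan and PlanCodecs are hypothetical names chosen for this example only.

```scala
import io.circe.{Decoder, Encoder, Json}
import io.circe.parser.parse
import io.circe.syntax._

object ImplicitCodecsExample {
  // Hypothetical case class and codec holder, for illustration only.
  final case class Plan(id: String, name: String)

  object PlanCodecs {
    implicit val planDecoder: Decoder[Plan] =
      Decoder.forProduct2("id", "name")(Plan.apply)
    implicit val planEncoder: Encoder[Plan] =
      Encoder.forProduct2("id", "name")(p => (p.id, p.name))
  }

  // The codecs are not in Plan's companion object, so they must be imported
  // before as[Plan] or asJson can find them.
  import PlanCodecs._

  val decoded: Either[io.circe.Error, Plan] =
    parse("""{"id": "445566", "name": "Disney+ Monthly"}""").flatMap(_.as[Plan])

  val encoded: Option[Json] = decoded.toOption.map(_.asJson)
}
```

Placing the implicits in the companion object of the case class instead would make the import unnecessary, at the cost of tying one fixed codec to the type.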
There are two ways to decode and encode. The first way is the example above with a custom decoder — manually extracting the fields that were needed. The second way is using derivation.
Derivation
The deriveDecoder[SomeCaseClass] and deriveEncoder[SomeCaseClass] methods should be used only if the case class matches the JSON object that is decoded or encoded. We typically use semi-automatic derivation from the semiauto._ package to call out explicitly which case classes are derived. Automatic derivation eliminates the need for calling the deriveDecoder/deriveEncoder methods completely.
Using the same JSON example:
{
  "id": "00112233",
  "email": "ashley@example.com",
  "location": {
    "current": "JP",
    "registered": "US"
  },
  "subscriptions": [
    {
      "id": "445566",
      "name": "Disney+ Monthly",
      "price": 7.99
    },
    {
      "id": "778899",
      "name": "ESPN+ Monthly",
      "price": 6.00
    }
  ]
}
Semi-automatic derivation would look like this:
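A minimal sketch, assuming the circe-generic module is on the classpath and that the case classes below (hypothetical names, with price modeled as a Double for simplicity) mirror the JSON exactly:

```scala
import io.circe.{Decoder, Encoder}
import io.circe.generic.semiauto.{deriveDecoder, deriveEncoder}
import io.circe.parser.parse
import io.circe.syntax._

object DerivationExample {
  // Field names and nesting match the JSON one-to-one.
  final case class Location(current: String, registered: String)
  object Location {
    implicit val decoder: Decoder[Location] = deriveDecoder[Location]
    implicit val encoder: Encoder[Location] = deriveEncoder[Location]
  }

  final case class Subscription(id: String, name: String, price: Double)
  object Subscription {
    implicit val decoder: Decoder[Subscription] = deriveDecoder[Subscription]
    implicit val encoder: Encoder[Subscription] = deriveEncoder[Subscription]
  }

  final case class Subscriber(
    id: String,
    email: String,
    location: Location,
    subscriptions: List[Subscription]
  )
  object Subscriber {
    // Derivation picks up the Location and Subscription codecs implicitly.
    implicit val decoder: Decoder[Subscriber] = deriveDecoder[Subscriber]
    implicit val encoder: Encoder[Subscriber] = deriveEncoder[Subscriber]
  }

  val raw: String =
    """{
      "id": "00112233",
      "email": "ashley@example.com",
      "location": { "current": "JP", "registered": "US" },
      "subscriptions": [
        { "id": "445566", "name": "Disney+ Monthly", "price": 7.99 },
        { "id": "778899", "name": "ESPN+ Monthly", "price": 6.00 }
      ]
    }"""

  val decoded: Either[io.circe.Error, Subscriber] =
    parse(raw).flatMap(_.as[Subscriber])

  // Encoding the decoded value reproduces the same structure.
  val roundTripped: Either[io.circe.Error, Subscriber] =
    decoded.flatMap(_.asJson.as[Subscriber])
}
```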
Code Breakdown
- Since the case classes are modeled in a way that matches the JSON returned from some-external-service we can use deriveDecoder
- If we want to return the same JSON structure in our service’s response, deriveEncoder would do that for us.
- Another way to use derivation is by adding the @JsonCodec annotation to the case class
For small objects that are in the response, derive is great if we don’t plan to restructure it much. But for a large object like this Subscriber, we might want to pare down to the fields we need and restructure the JSON response so it’s relevant to the client. To do that we can re-define our case class:
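A minimal sketch of such a redefinition, with a custom decoder and encoder. The countryName lookup is a hypothetical stand-in with only two entries, and the subscription names are read here with a small element decoder passed to Decoder.decodeList, which is one of several possible approaches.

```scala
import io.circe.{Decoder, Encoder, HCursor, Json}
import io.circe.parser.parse
import io.circe.syntax._

object CustomCodecExample {
  final case class Subscriber(
    email: String,
    currentLocation: String,
    subscriptionNames: List[String]
  )

  object Subscriber {
    // Element decoder: keep only the "name" of each subscription.
    private val subscriptionName: Decoder[String] =
      Decoder.instance(_.downField("name").as[String])

    implicit val decoder: Decoder[Subscriber] = Decoder.instance { (c: HCursor) =>
      for {
        email    <- c.downField("email").as[String]
        location <- c.downField("location").downField("current").as[String]
        names    <- c.downField("subscriptions").as(Decoder.decodeList(subscriptionName))
      } yield Subscriber(email, location, names)
    }

    // Illustrative lookup only; a real service would use a complete mapping.
    private def countryName(code: String): String = code match {
      case "JP"  => "Japan"
      case "US"  => "United States"
      case other => other
    }

    // Relabels the fields and replaces the country code with a readable name.
    implicit val encoder: Encoder[Subscriber] = Encoder.instance { (s: Subscriber) =>
      Json.obj(
        "email"         -> s.email.asJson,
        "location"      -> countryName(s.currentLocation).asJson,
        "subscriptions" -> s.subscriptionNames.asJson
      )
    }
  }

  val raw: String =
    """{
      "id": "00112233",
      "email": "ashley@example.com",
      "location": { "current": "JP", "registered": "US" },
      "subscriptions": [
        { "id": "445566", "name": "Disney+ Monthly", "price": 7.99 },
        { "id": "778899", "name": "ESPN+ Monthly", "price": 6.00 }
      ]
    }"""

  val response: Either[io.circe.Error, Json] =
    parse(raw).flatMap(_.as[Subscriber]).map(_.asJson)
}
```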
A few things have happened here:
- The Subscriber case class only contains 3 parameters, the fields we care to extract: email, current location, and subscription names
- The custom decoder for a Subscriber case class happens in a for yield to traverse down the JSON with an HCursor and yield a case class object out of it
- The custom encoder returns a Json.obj that 1) redefines the service response by abstracting out the case class definition; and 2) relabels the field names from the case class parameter names so that it makes sense to the client: “email” → “email”, “currentLocation” → “location”, “subscriptionNames” → “subscriptions”
- The custom encoder maps the country codes in the original response to readable country names (there are libraries that can do this; the simple match is for example's sake)
- asJson is used on the primitive types (e.g. String) to “JSONify” each field in the response
Custom Decoding and Encoding is extremely helpful to filter down JSON responses to what is needed and to modify the JSON in the response so that it is as custom to the client as possible. In this example, our service’s response is:
{
  "email": "ashley@example.com",
  "location": "Japan",
  "subscriptions": ["Disney+ Monthly", "ESPN+ Monthly"]
}
Putting it all together
The parser, cursor, and decoder/encoder can all come together seamlessly with implicit parameters and a clean application flow and structure.
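As a hedged sketch of that flow, the following uses a trimmed-down, hypothetical Subscriber model and a stubbed fetchBody function in place of a real HTTP client, so that the example stays self-contained. The SubscriberCodecs object name is also an assumption.

```scala
import io.circe.{Decoder, Encoder, Json}
import io.circe.parser.parse
import io.circe.syntax._

object PuttingItTogether {
  // Trimmed-down, hypothetical model to keep the flow readable.
  final case class Subscriber(email: String, currentLocation: String)

  object SubscriberCodecs {
    implicit val decoder: Decoder[Subscriber] = Decoder.instance { c =>
      for {
        email    <- c.downField("email").as[String]
        location <- c.downField("location").downField("current").as[String]
      } yield Subscriber(email, location)
    }

    implicit val encoder: Encoder[Subscriber] = Encoder.instance { (s: Subscriber) =>
      Json.obj("email" -> s.email.asJson, "location" -> s.currentLocation.asJson)
    }
  }

  // Step 1 (stubbed): stands in for an HTTP GET to the external service.
  def fetchBody(): String =
    """{"email": "ashley@example.com", "location": {"current": "JP", "registered": "US"}}"""

  def respond(body: String): Either[io.circe.Error, Json] = {
    // Bring the intended Decoder and Encoder into implicit scope.
    import SubscriberCodecs._
    for {
      json       <- parse(body)         // step 2: parse the response body
      subscriber <- json.as[Subscriber] // step 3: decode into the case class
    } yield subscriber.asJson           // steps 4 and 5: encode with our Encoder
  }
}
```

Both parsing and decoding failures surface as a Left, so the caller can map them to an error response in one place.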
This snippet essentially follows the 5 steps (see code comments) mentioned at the beginning of the section:
- Use an HTTP client to call GET data from https://some-external-service.com
- parse(...) the JSON string that is in the response body of https://some-external-service.com
- If JSON is valid, decode it implicitly as[SomeCaseClass] to grab the fields we care about and shove them into a case class object
- When our service is ready to return a JSON response we can call the asJson method from the circe-syntax package on SomeCaseClass
- When asJson is invoked, the implicitly defined encoder determines how the JSON will be presented in our service
*Note how the decoder gets imported when as[Subscriber] is called (line 14) and how the encoder gets imported when subscriber.asJson is called (line 68). This is to ensure that the right Encoder and Decoder objects are implicitly used as opposed to Circe’s deriveCodec or deriveDecoder defaults that can be imported to resolve the compiler issue “No implicits found for parameter decode: Decoder[T]”.
Leveraging Circe Optics
There may be cases where JSON is deeply nested and we want to set the cursor somewhere other than the top of the object. Circe Optics is a great library supplement to optimize decoding and reduce boilerplate by eliminating the chain of downField calls. Circe Optics is built on Monocle, an optics library for Scala that itself builds on cats.
Let’s take our example from above and say the JSON is now wrapped in a data field.
{
  "data": {
    "id": "00112233",
    "email": "ashley@example.com",
    "location": {
      "current": "JP",
      "registered": "US"
    },
    "subscriptions": [
      {
        "id": "445566",
        "name": "Disney+ Monthly",
        "price": 7.99
      },
      {
        "id": "778899",
        "name": "ESPN+ Monthly",
        "price": 6.00
      }
    ]
  }
}
Using the regular cursor approach, decoding a field like the “current location” now looks like this:
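A minimal sketch of that lookup, using a shortened copy of the document (the object name is hypothetical):

```scala
import io.circe.parser.parse

object NestedCursorExample {
  val raw: String =
    """{"data": {"location": {"current": "JP", "registered": "US"}}}"""

  // Every lookup has to repeat the downField("data") prefix.
  val currentLocation: Either[io.circe.Error, String] =
    parse(raw).flatMap(
      _.hcursor
        .downField("data")
        .downField("location")
        .downField("current")
        .as[String]
    )
}
```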
For all the fields that we want to decode, we’ll always have to prefix the path with downField("data") since cursors are assumed to start at the top. With Circe Optics we can define where we want the cursor to start with root, then begin traversing the JSON from there. From the Optics documentation:
In other words, optics provide a way to separate the description of a JSON traversal from its execution. Consequently we can reuse the same traversal against many different documents, compose traversals together, and so on.
With the sample JSON above, we can define root before we reach the implicit Decoder and have the cursor start at data rather than at the top of the object.
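A minimal sketch of this idea, assuming the circe-optics module is available. The Subscriber model here is a hypothetical two-field version with Option fields, and an Encoder is defined as well because the as[...] optic in circe-optics asks for both a Decoder and an Encoder.

```scala
import io.circe.{Decoder, Encoder, HCursor, Json}
import io.circe.optics.JsonPath.root
import io.circe.parser.parse
import io.circe.syntax._

object OpticsExample {
  // Fields are Options because an optic yields None for a missing path.
  final case class Subscriber(email: Option[String], currentLocation: Option[String])

  object Subscriber {
    // Paths are relative to wherever the decoder's cursor starts.
    private val emailPath   = root.email.string
    private val currentPath = root.location.current.string

    implicit val decoder: Decoder[Subscriber] = Decoder.instance { (c: HCursor) =>
      // The optics return Options rather than Eithers, so the result is
      // wrapped in Right by hand to satisfy the Decoder signature.
      Right(Subscriber(emailPath.getOption(c.value), currentPath.getOption(c.value)))
    }

    implicit val encoder: Encoder[Subscriber] = Encoder.instance { (s: Subscriber) =>
      Json.obj("email" -> s.email.asJson, "currentLocation" -> s.currentLocation.asJson)
    }
  }

  val raw: String =
    """{"data": {"email": "ashley@example.com", "location": {"current": "JP", "registered": "US"}}}"""

  // Start at "data" right after parsing, then hand over to Decoder[Subscriber].
  val subscriber: Option[Subscriber] =
    parse(raw).toOption.flatMap(json => root.data.as[Subscriber].getOption(json))
}
```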
Code Breakdown
- To avoid redundantly doing root.data in the Decoder, we tell the cursor to start at that point right after parsing the response JSON string (line 11)
- Then as[Subscriber] is the cue for the Decoder[Subscriber] to step in and take things from there. The c cursor is now starting at data, so to grab the current location we can do root.location.current.as[String] on line 22 which returns a monocle.Optional[Json, String]
- To actually grab the string, we have to finish with getOption(c.value) which will return an Option[String]. We’re passing in the cursor’s value (which is Json) to get that specific field for that JSON document
The reason Optics only grabs an Option[String] of the field instead of the String itself is because:
- Optics uses a Scala feature called Dynamic, which lets field names be written directly after root and allows safe access to fields that may not exist. We don’t have to worry about errors when accessing fields via root that don’t exist; they will resolve to None
- Since fields are accessed dynamically and are not guaranteed to exist, the monocle optic can only resolve to an optional value when it is applied to the JSON in c.value
Optics and Arrays
Circe Optics also has powerful support for traversing arrays. Back to the example with an array of subscriptions, given that after parsing, the cursor now starts at data with root.data.as[Subscriber]:
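A minimal sketch of an array traversal, assuming circe-optics is available and that the Json passed in is already the object under data. The object and helper names are hypothetical.

```scala
import io.circe.Json
import io.circe.optics.JsonPath.root
import io.circe.parser.parse

object OpticsArrayExample {
  // Traversal over the "name" of every element of "subscriptions".
  private val names = root.subscriptions.each.name.string

  // Resolves the traversal against one document; a missing array gives Nil.
  def subscriptionNames(data: Json): List[String] = names.getAll(data)

  val data: Json = parse(
    """{
      "subscriptions": [
        { "id": "445566", "name": "Disney+ Monthly", "price": 7.99 },
        { "id": "778899", "name": "ESPN+ Monthly", "price": 6.00 }
      ]
    }"""
  ).getOrElse(Json.Null)

  val result: List[String] = subscriptionNames(data)
}
```

Inside a decoder the same traversal would be applied to c.value instead of a standalone Json value.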
Code Breakdown
- Optics enables traversing the subscriptions array with the each method and can grab the fields within the array with as[T], which returns a monocle.Traversal[Json, T]
- The main difference from before with getOption: when using each, it must be resolved with getAll(c.value), which will return a List[T]
- The name field is collected across the array with getAll to be resolved into a List of Strings (similar to our initial cursor example). List is used in Circe mostly for optimization reasons but it can be turned into any collection with toVector, toArray, toSet, etc.
If multiple fields within the array need to be extracted, it is also possible to .map over each subscription as Json, start the cursor at the subscriptions and grab the fields as Option similar to unnested fields.
Optics Gotchas
Keep in mind that with Optics, the return type is a Function HCursor => SomeCaseClass rather than a Decoder[SomeCaseClass]. If using Optics and expecting it to return a Decoder[SomeCaseClass], the decoder must return a Right(SomeDecodedCaseClass) to resolve the function into a Decoder object unless it's moved into a private method like the snippet above.
If it’s desired to derive the decoder and encoder but only after the cursor has been moved to a certain field, that is absolutely possible with Optics. Simply use root.someField.as[SomeCaseClass], and in the implicit decoder/encoder for SomeCaseClass, deriveDecoder[SomeCaseClass] can be defined.
Optics is flexible and can be used in combination with the downField method as well, enabling developers to combine both techniques for optimal JSON parsing where appropriate in the data.
The main difference between Optics and Cursor/downField is the level of safety needed to access each field. If the cursor attempts to go downField on a field that doesn’t exist, decoding that field fails with a DecodingFailure. As mentioned before with Optics, using root.someField is safe and will not produce an error because it resolves either to an Option or an empty List at the end of the operation.
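The difference can be seen side by side in a small sketch, assuming circe-optics is available. The field name nickname is hypothetical and deliberately absent from the document.

```scala
import io.circe.{DecodingFailure, Json}
import io.circe.optics.JsonPath.root
import io.circe.parser.parse

object MissingFieldExample {
  val json: Json =
    parse("""{"email": "ashley@example.com"}""").getOrElse(Json.Null)

  // Cursor: decoding an absent field as a String yields a Left.
  val viaCursor: Either[DecodingFailure, String] =
    json.hcursor.downField("nickname").as[String]

  // Optics: the same lookup resolves to None.
  val viaOptics: Option[String] =
    root.nickname.string.getOption(json)

  // Optics over a missing array resolves to an empty List.
  val viaTraversal: List[String] =
    root.subscriptions.each.name.string.getAll(json)
}
```

Neither approach throws; the cursor reports the problem as a value, while the optic silently treats the field as absent.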
Why Circe?
Many Stack Overflow and Reddit threads on JSON parsing libraries in Scala consist of many opinions based on experience, preference, and use case. Some criticize Circe’s “counterintuitive” API in comparison to the familiar Java-based libraries like Jackson due to Circe’s implicit handling, dependency conflicts with other libraries, and the presence of case classes as a conduit for JSON interaction.
However, the reason our customer support tool thrives with Circe is its performance. With the engineering team being new to Scala, the ramp-up time for Circe was probably greater than it would have been with other libraries that work directly with JSON, but the time invested has paid off in amazing response times in our services that involve heavy JSON decoding and encoding. Provided network calls are optimal, our Circe-based service, which retrieves ~6–10 KB of data from an external service, custom decodes it, transforms it, custom encodes it, and returns restructured JSON, has a p99 response time of 259 ms (0.259 seconds).
When compared to other libraries, Circe is generally faster in many benchmarks of parsing, decoding, encoding, and printing various complex case classes in common use cases. While performance testing is maintained actively, the circe library itself receives active contributions via open source and dedicated maintainers. It is also built with cutting-edge functional programming libraries such as cats. See more documentation on design guidelines around Circe.
Conclusion
Circe is a fast, effective library for parsing and transforming JSON documents into their desired structure. With some understanding of the basic components of Circe and when to use its fancy features such as implicit custom decoders/encoders, derivation, and Optics, it can be a really powerful way to optimize service response time. The API is readable to fellow engineers and offers flexibility when combining traversals between cursors and Optics. By providing safe access with Optics and returning results as an Either[Failure,T], it empowers any Scala service to gracefully handle errors when dealing with JSON parsing and ensure a higher probability of happy paths in execution.