Implementing custom parsers

Wilker Lucio
11 min read · Apr 3, 2017

Om.next is blowing my mind (and if you are reading this, probably yours too!) with its new approach to writing UIs, but it is still all very new and we are all learning how to deal with these new ideas. Today I want to talk about my experience writing a custom parser to read queries on the server; I hope I can give you some insights to create your own. So let's start!

Update: I've written a library, Pathom, that abstracts the operations mentioned in this article. This article is still a good reference for how to make things this way, but for real apps I encourage you to try Pathom.

Before we begin

This tutorial assumes you are comfortable with the Om.next query syntax and some basic parsing operations; if you are not, here is some recommended reading:

In this tutorial we are focusing on how to write a parser in a more sophisticated way, making it composable and extensible. We are going to revisit some basics along the way, but the focus here is on the larger picture.

Let’s build Youtube!

For this tutorial we are going to build a subset of Youtube's features, exposing a subset of their API in graph style.

Given any Youtube video page, how can we describe its data needs with a query?

Here is one way how:
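
The original query gist is not reproduced in this copy of the article, so here is a hedged reconstruction; the exact attribute names are assumptions based on the keys that appear later in the article:

```clojure
;; Hedged reconstruction of the page query; attribute names are
;; assumptions based on keys used later in this article.
(def page-query
  [{[:video/by-id "XHGKIzCcVa0"]
    [{:video/snippet
      [:snippet/title
       :snippet/published-at
       {:video/channel [{:channel/snippet [:snippet/title]}]}]}
     {:video/statistics [:statistics/view-count :statistics/like-count]}
     {:video/comments [{:comment/snippet [:snippet/text-display]}]}]}])
```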

We start our query right away with an ident query making a join. The ident [:video/by-id "XHGKIzCcVa0"] makes the initial reference for the video of the page; from there we have its attributes and joins.

The layout of the query internals is designed to reflect the Youtube API v3, and this will come in handy later when we generalize the access; you can check more details about it in the Youtube API documentation.

By the end of this tutorial, we are going to read the contents from the Youtube API using our previous query.

Starting a new Parser

We will start with a very basic parser, a “hello world” parser you might say:
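
The snippet is missing from this copy of the article. Here is a self-contained sketch of such a hello-world parser; the original used Om.next's parser constructor, which behaves the same way from the caller's point of view:

```clojure
;; Minimal hand-rolled parser sketch: call read once per query key.
(defn read [env key params]
  {:value "Hello World"})

(defn parser [env query]
  (reduce (fn [result key]
            (assoc result key (:value (read env key {}))))
          {}
          query))
```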

This parser will always return the "Hello World" value, no matter what query we send. Try it: (parser {} [:one :two]), which returns:

{:one "Hello World"
:two "Hello World"}

The way the parser works is by calling your read function for each item in the query (only the roots, not automatically recursive); it's good to keep that in mind. Since your read function is going to be called once for each requested attribute, think of it as a function that processes a single key; I found this helps the thought process.

The problem that I see with hard-coding the read function is that you can't change it during parsing; we can fix that by making the read function part of the environment instead of hard-coding it:
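
The code for this step is missing here; below is a sketch of what an environment-driven parser could look like (Om.next's own parser gives you this behavior out of the box, so treat this as an illustration):

```clojure
;; Sketch: the reader fn lives in the env and can be swapped mid-parse.
(defn ->ast [item]
  (if (map? item)                                 ; join entry, e.g. {:k [subquery]}
    (let [[k subquery] (first item)]
      {:key k
       :dispatch-key (if (vector? k) (first k) k) ; idents dispatch on the keyword
       :query subquery})
    {:key item :dispatch-key item}))

(defn parser [{:keys [reader] :as env} query]
  (reduce (fn [result item]
            (let [ast (->ast item)
                  env (assoc env :parser parser :ast ast :query (:query ast))]
              (assoc result (:key ast) (:value (reader env)))))
          {}
          query))

(defn hello-reader [env] {:value "Hello World"})
```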

The difference is that now we can write small read functions outside and swap them in at any point of our parsing. This will become clearer as we make use of it down the road; stick with me. (I chose to send only the env to the reader function to make it easy to pass down, since we can extract the key and params from the env.)

Let's talk about the environment; it contains not only the things you send to the parser but also other valuable information you are going to need when processing the query. Here is a list of some of them that we will be using:

  • :parser this contains the parser itself, making it easy to call the parser again recursively; a typical case is when dealing with a join field (one that includes children items to be parsed)
  • :query only present when a sub-query exists for the current item; same as (get-in env [:ast :query])
  • :ast is the AST for the current entry. I recommend you play around with query->ast to get familiar with how the AST is structured. I'll post some examples here so you get exposure to it, but please try some for yourself until you are comfortable with the format:

For each call to our read function, the AST will be a child item from the root.
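
The embedded examples are missing from this copy; as a stand-in, here is roughly the shape query->ast produces for a small query (a simplified sketch; do check the real output yourself):

```clojure
;; Approximate AST for [:video/title {:video/snippet [:snippet/title]}]
;; (simplified sketch of om.next's query->ast output)
(def example-ast
  {:type :root
   :children
   [{:type :prop, :key :video/title, :dispatch-key :video/title}
    {:type :join, :key :video/snippet, :dispatch-key :video/snippet,
     :query [:snippet/title]
     :children [{:type :prop, :key :snippet/title,
                 :dispatch-key :snippet/title}]}]})
```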

Project Setup

Before we go on let’s create a new project and add the dependencies that we are going to use:
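
The dependency listing is missing from this copy; here is a hedged sketch of the project file (library choices and versions are assumptions appropriate for early 2017, adjust as needed):

```clojure
;; Hedged sketch of project.clj; versions are assumptions from early 2017.
(defproject youtube-parser "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.9.0-alpha15"]
                 [org.omcljs/om "1.0.0-alpha48"]
                 [clj-http "3.4.1"]
                 [cheshire "5.7.0"]
                 [camel-snake-kebab "0.4.0"]])
```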

Thinking on a composable graph

In Om.next our data is represented as a graph, and it can't be different for our parser on the server. But while the client reads from a map in memory, our server must read in parts, lazily, as it navigates through the query. So it's about implementing the nodes of the graph and how each node navigates; ideally we want those nodes to be composable pieces, parts that we can move around and reuse as we wish. From now on, we will call these parts Readers.

Starting the Root Reader

Now we are going to start our real parser, implementing our root node. The only thing we need to handle at the root for now is the ident [:video/by-id "SOME-ID"]:
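
The snippet is missing here; below is a sketch of this first version of the root reader, with a small ident? helper (the helper name is my choice):

```clojure
(defn ident? [x]
  (and (vector? x)
       (= 2 (count x))
       (keyword? (first x))))

(defn root-reader [{:keys [ast] :as env}]
  (let [key (:key ast)]
    (if (ident? key)
      (let [[type id] key]
        (if (= :video/by-id type)
          ;; not fetching yet; just prove we can see the id
          {:value {:video/id id}}
          {:value nil}))
      {:value nil})))
```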

Not the video yet, but this illustrates how we can extract the ID value so we can decide which video we need to load.

First, we retrieve the key from the AST so we know what to handle in this iteration; then we check whether the key is an ident.

An ident is a vector with exactly two items, where the first is a keyword.

If it is an ident, we break it down into its left and right sides, separating the type from the id. When it's a video request, we fetch and expose it.

Before we continue, we need to implement some requests to the Youtube API.

Youtube API

To make the requests, we are going to need a Youtube key; you can find/create one in the Google API Console. Go to "Credentials" and create an API Key.

With the key in hand, the implementation to load a video is straightforward:
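
The implementation gist is missing from this copy; here is a sketch using clj-http with a pure request-building helper. The youtube-api-key var and the helper names are my assumptions, not the article's exact code:

```clojure
(require '[clj-http.client :as http]
         '[clojure.string :as str])

(def youtube-api-key "REPLACE-WITH-YOUR-KEY")

;; Pure helper: build the request for the videos endpoint.
(defn video-request [{:keys [video/id video/parts]}]
  {:url "https://www.googleapis.com/youtube/v3/videos"
   :query-params {:id   id
                  :part (str/join "," (sort parts))
                  :key  youtube-api-key}})

(defn video-by-id [input]
  (let [{:keys [url query-params]} (video-request input)]
    (-> (http/get url {:query-params query-params :as :json})
        :body :items first)))
```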

Remember to use your key.

To request a video, we make the API call querying for the ID and for the part; the part parameter specifies which resources of the video should be fetched. There is a list of the available parts and their quota costs in the Youtube API documentation.

Run the following to try out the video fetch function:

(video-by-id #:video{:id "oyLBGkS5ICk" :parts #{"snippet"}})


Time to go back to the parser.

Smart Reading

It's time to figure out how to get from our query to the Youtube request; that is, we have to determine the arguments to video-by-id from our parser environment.

The id is easy; we already extracted it before. For the parts, take another look at how we described the query for our full page and notice the keys :video/snippet and :video/statistics. Taking just the name from the keywords, we can get the part. This way we only fetch from Youtube the parts that we are going to use.

We use the query->ast helper so we can extract just the keys from the properties; this makes it easy to handle "join cases" (which are common here). Then for each key we take the keyword name and convert it to camel case to match how Youtube has it.
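
The helper's code is missing from this copy. Here is a self-contained sketch; the article uses om's query->ast and csk/->camelCase, but both are approximated by hand below so the example runs on its own:

```clojure
(require '[clojure.string :as str])

(defn query-keys [query]
  ;; take the key of each entry; join entries are maps like {:k [subquery]}
  (map #(if (map? %) (ffirst %) %) query))

(defn ->camel [s]
  ;; minimal kebab->camelCase, standing in for csk/->camelCase
  (let [[head & tail] (str/split s #"-")]
    (apply str head (map str/capitalize tail))))

(defn query->parts [query]
  (->> (query-keys query)
       (map (comp ->camel name))
       (into #{})))
```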

The csk namespace comes from [camel-snake-kebab.core :as csk].

Putting it together with the updated reader:
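
The snippet is missing here; below is a sketch of the assembled root reader. It leans on the video-by-id and query->parts functions discussed above and hands the fetched entity to youtube-key-reader through the env (::entity is the key the article mentions):

```clojure
(declare video-by-id query->parts youtube-key-reader) ; defined elsewhere

(defn root-reader [{:keys [ast parser query] :as env}]
  (let [key (:key ast)]
    (if (and (vector? key) (= 2 (count key)) (keyword? (first key)))
      (let [[type id] key]
        (if (= :video/by-id type)
          (let [video (video-by-id {:video/id    id
                                    :video/parts (query->parts query)})]
            ;; recursive parser call with a different reader
            {:value (parser (assoc env ::entity video
                                       :reader youtube-key-reader)
                            query)})
          {:value nil}))
      {:value nil})))
```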

The interesting part here is the recursive call to the parser: we extract the parser from the environment and call it again, adding ::entity to the environment and changing the reader. Now we must implement the youtube-key-reader, which will use the ::entity from the environment and map the requested keys.
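
The snippet is missing; here is a self-contained sketch of youtube-key-reader, again with a hand-rolled stand-in for csk/->camelCase. The keyword call assumes the Youtube response was keywordized (clj-http's :as :json does this):

```clojure
(require '[clojure.string :as str])

(defn ->camel* [s] ; stand-in for csk/->camelCase
  (let [[head & tail] (str/split s #"-")]
    (apply str head (map str/capitalize tail))))

(defn youtube-key-reader [{:keys [ast parser query] :as env}]
  (let [k     (:dispatch-key ast)
        value (get (::entity env) (keyword (->camel* (name k))))]
    (if (and (map? value) (seq query))
      ;; join on a nested map: recurse, keeping this same reader
      {:value (parser (assoc env ::entity value) query)}
      {:value value})))
```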

Here we do something similar to what we did to get the part: we use the camel-cased version of the keyword name to look up the key in the Youtube response. Then, if the value is a map, we recursively repeat the process; note that this time we didn't have to change the reader, since we are already in the key reader. Try this query now:

I find this to be cool: now our client just specifies its data needs, we fetch only the things required by the user, and then extract precisely what was asked, all in a single format. DDA for the win! :)

Coercing results with spec

You might have caught one problem in our return: the numbers in the statistics are strings instead of numbers, and the published-at is a string instead of a date. Let's solve that with spec!

With the specs in hand, let's create some helpers to do the coercion:
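
The snippet is missing from this copy; below is a sketch of the specs plus the coercion helpers. Note the clojure.spec.alpha namespace: the article predates the .alpha rename, so adjust for your Clojure version.

```clojure
(require '[clojure.spec.alpha :as s])

;; Register specs for the attributes we want coerced.
(s/def :statistics/view-count int?)
(s/def :snippet/published-at inst?)

;; Spec form symbol -> coercion fn; extend as needed.
(def built-in-coercions
  {`int?  #(Long/parseLong %)
   `inst? #(java.util.Date/from (java.time.Instant/parse %))})

(defn spec->coerce-sym [spec-key]
  ;; the "form" of a simple spec is just the predicate symbol
  (when (s/get-spec spec-key)
    (s/form spec-key)))

(defn coerce [spec-key value]
  (if-let [f (built-in-coercions (spec->coerce-sym spec-key))]
    (f value)
    value))
```

Hooking this into the key reader is then a one-line change: at the leaf branch, return (coerce key value) instead of the raw value.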

The idea here is to get the form from the spec and dispatch a coercion from it. This is far from fully featured; it will only support simple specs (without s/and or any other composition; you can improve this with a smarter implementation of spec->coerce-sym). If you want to make a more full-featured version of built-in-coercions, check this reference for the symbols to implement. For our purposes, this implementation will work just fine.

Time to update youtube-key-reader to support coercing:

Try again, and you will get coerced results.

Update: I launched a new lib that handles coercion in the same way as described here, but with many more inferences implemented, working in both Clojure and Clojurescript; check it out.

Fetching associated data

So far we handled all the client needs with a single request to Youtube; now it's time to load more data on the fly. How much does that change for us? Not much, actually.

First, we need a new function in our Youtube namespace to fetch a channel, very much like we did for videos:
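
The snippet is missing; a sketch mirroring video-by-id against the channels endpoint (clj-http, the youtube-api-key var, and the helper names are the same assumptions as before):

```clojure
(require '[clj-http.client :as http]
         '[clojure.string :as str])

(def youtube-api-key "REPLACE-WITH-YOUR-KEY") ; assumed var

(defn channel-request [{:keys [channel/id channel/parts]}]
  {:url "https://www.googleapis.com/youtube/v3/channels"
   :query-params {:id   id
                  :part (str/join "," (sort parts))
                  :key  youtube-api-key}})

(defn channel-by-id [input]
  (let [{:keys [url query-params]} (channel-request input)]
    (-> (http/get url {:query-params query-params :as :json})
        :body :items first)))
```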

To handle special keys, let's turn youtube-key-reader into a multimethod that dispatches on the dispatch-key of the AST. We will have a special handler for :video/channel, and our previous implementation is going to be the default:
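
A sketch of the multimethod split; the default method delegates to the map-lookup reader we already have, renamed here to youtube-default-reader (a name of my choosing):

```clojure
(declare channel-by-id query->parts youtube-default-reader) ; defined elsewhere

(defmulti youtube-key-reader
  (fn [env] (get-in env [:ast :dispatch-key])))

(defmethod youtube-key-reader :video/channel
  [{:keys [parser query] :as env}]
  ;; the channel id sits right on the currently parsed snippet entity
  (let [channel-id (get (::entity env) :channelId)
        channel    (channel-by-id {:channel/id    channel-id
                                   :channel/parts (query->parts query)})]
    {:value (parser (assoc env ::entity channel) query)}))

(defmethod youtube-key-reader :default
  [env]
  (youtube-default-reader env))
```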

The trick here is to find the associated information in the currently parsed entity; this is why our :video/channel is inside the :video/snippet, which is where the required information is available. And in the same way we did for the videos, we use the query to figure out the parts we need to load.

And like magic, we can now do:

Next it's time for the comments. Youtube uses two resources for them: CommentThreads and Comments. When you want the comments of a video or channel, you must ask for the CommentThreads, which contain the comments and possibly their replies; in this tutorial we will handle only the top-level ones.

Implementing the comments fetch in our Youtube namespace:
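
The snippet is missing; here is a sketch against the commentThreads endpoint, keeping only each thread's top-level comment. The endpoint and field names come from Youtube API v3; the surrounding helpers are the assumptions used earlier:

```clojure
(require '[clj-http.client :as http]
         '[clojure.string :as str])

(declare youtube-api-key) ; defined alongside video-by-id

(defn thread->top-comment [thread]
  ;; each commentThread carries its top-level comment in the snippet
  (get-in thread [:snippet :topLevelComment]))

(defn comments-by-video-id [{:keys [video/id video/parts]}]
  (->> (http/get "https://www.googleapis.com/youtube/v3/commentThreads"
                 {:query-params {:videoId id
                                 :part    (str/join "," parts)
                                 :key     youtube-api-key}
                  :as :json})
       :body :items
       (mapv thread->top-comment)))
```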

Adding the :video/comments to the key reader:
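
A sketch of the :video/comments method; the defmulti line repeats the dispatch from before (defmulti is a no-op when the multimethod already exists). Where the video id comes from is an assumption here: I read it from the entity's :id field.

```clojure
(declare comments-by-video-id query->parts)

(defmulti youtube-key-reader (fn [env] (get-in env [:ast :dispatch-key])))

(defmethod youtube-key-reader :video/comments
  [{:keys [parser query] :as env}]
  (let [video-id (get (::entity env) :id)  ; assumption: entity carries its id
        comments (comments-by-video-id {:video/id    video-id
                                        :video/parts (query->parts query)})]
    ;; collection result: map the parser over each comment entity
    {:value (mapv #(parser (assoc env ::entity %) query) comments)}))
```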

The only novelty here is that we map the results with the parser, since we are dealing with a collection return now.

We also introduced a problem: during query->parts it will try to include comments as one of the parts, and the Youtube API will reject it with a 400 response. We can handle that by filtering the parts in the Youtube API function, keeping only the valid ones:
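
The snippet is missing; a sketch of the filter (the set below is a subset of the valid video parts, see the Youtube API docs for the full list):

```clojure
;; Keep only parts the videos endpoint accepts; :video/comments would
;; otherwise leak in as an invalid "comments" part.
(def valid-video-parts
  #{"snippet" "statistics" "contentDetails" "status" "player" "topicDetails"})

(defn filter-parts [parts]
  (into #{} (filter valid-video-parts) parts))
```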

Let's kick off a full query with everything we have so far:

The query can get very complex; your code doesn't have to.

Placeholder nodes

There is one issue that some people stumble upon while using Om.next; the problem happens when you need to display two or more different views of the same item as siblings (in terms of query arrangement, not necessarily DOM siblings). How do you write this query?

For an example using our structure, let's say you want to have two different views for a video category inside the video, given these components:

You might be tempted to concat the queries, and if you don't have nesting like we do here, that may even look like it's working; but let me break this illusion for you, because it's not. When you use om/get-query, it's not just the query that's returned; it also carries metadata telling which component the query came from.

This information is important: Om.next uses it to index your structure and to enable incremental updates. When you concat the queries you lose it, and as a consequence, when you later run a mutation that touches those items, you will have a "No queries exist at the intersection of component path" error thrown in your face.

This problem is still under discussion on the Om repository. So far the best way I know to handle it is to use placeholder nodes, so let's learn how to manage those cases properly.

What we need is a way to branch out the different queries. This is my suggestion on how to write the VideoComponent query:
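
The query gist is missing from this copy; here is a sketch of the shape, with hypothetical VideoThumbnail/VideoDetail subqueries inlined. In real code each branch would come from om/get-query so the component metadata is preserved:

```clojure
;; Sketch: branch each view under a :ph/* placeholder join
;; (attribute names are assumptions).
(def video-component-query
  [{:ph/thumbnail [{:video/snippet [:snippet/thumbnails]}]}
   {:ph/detail    [{:video/snippet [:snippet/title :snippet/description]}]}])
```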

The trick is to create a convention for placeholder nodes; in this case we choose the namespace ph to represent them. When the query asks for :ph/something, we just do a recursive call while staying at the same logical position in terms of parsing, as if we had stayed on the same node.

Here is one way to implement this on the server:
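
The snippet is missing; below is a self-contained sketch of read-sequence plus a placeholder reader built on the ::continue convention described next:

```clojure
;; A little "or" engine: try each reader until one returns something
;; other than ::continue.
(defn read-sequence [readers]
  (fn [env]
    (loop [[reader & more] readers]
      (if reader
        (let [value (reader env)]
          (if (= ::continue value)
            (recur more)
            value))
        ::continue))))

;; Placeholder reader: for :ph/* keys, recurse without moving in the graph.
(defn placeholder-reader [{:keys [ast parser query] :as env}]
  (let [k (:dispatch-key ast)]
    (if (and (keyword? k) (= "ph" (namespace k)))
      {:value (parser env query)}
      ::continue)))
```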

This is a little "or" engine: read-sequence takes a list of readers and tries them in turn until it finds a value that isn't ::continue. ::continue is just a unique value we picked so we can recognize when a reader is not able to handle the currently requested key and move on to the next one in the list.

We also updated our dynamic-parser to always try the placeholder reader, making it globally available.

The code so far

Here is the cumulative code we wrote in this article (with specs added for the used attributes):


I hope by now you have an idea of how to write your own parser. In this article we took advantage of Youtube's conventions to write a tiny amount of code and get a lot of coverage. We wrapped an external API, but you can use the same techniques to wrap a SQL database, a Datomic database, or your microservices architecture; pretty much whatever you want.

The Clojure syntax is very expressive, which means we don't need a new query language like the GraphQL folks do; we can stay in our data structures, with no extra syntax parser.

A tip that I would like to leave: use qualified keywords; in conjunction with specs, they are a powerful tool to convey meaning to your values just by knowing the key name.

And I lied to you: the query expressed at the beginning is not entirely implemented; we missed the related videos, and I'll leave that for you to implement. At this point you have enough information for it, so go and practice :).

Here you can find some suggestions to practice and improve this API for Youtube:

  • Implement related videos; the Youtube API documentation explains how to fetch them.
  • Make the Youtube Key a parameter on the calls; since we are using maps, you can add an extra key, then fetch it from the env in the parser, so users can inject this information when calling the parser.
  • Error handling
  • Add specs for all used keys to ensure correct coercion.
  • Implement comment replies.
  • Use spec keys to figure the possible children on each node, so you can alert users when they try to fetch an invalid key.
  • Implement pagination on comments using query params.