AppsFlyer’s Open Source Libraries for Clojure and Protocol-Buffers

Ronen Cohen
AppsFlyer Engineering
8 min read · Dec 6, 2021

Introduction

In the past year, AppsFlyer — a large Clojure shop — has widened its adoption of protocol buffers as a serialization format and schema definition language. We recently open sourced two new projects that aim to improve the usability of protobuf for Clojure engineers: pronto and lein-protodeps.

By “usability,” we mean two things:

  • developer tooling: a simple, consistent and automated way for Leiningen projects to declare, consume, and compile schema dependencies.
  • idiomatic interop: use protobuf POJOs as plain data, but in a safe manner that ensures that we cannot break the schema.

In this post, we will introduce these projects, explain why we wrote them and how we use them.

Quick Protobuf Primer

Protobuf provides a language for schema definition. Schemas reside in .proto files and look similar to this:
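For example, a minimal schema might look like this (the field names and package are illustrative; only the Person and Address message types come from the discussion below):

```protobuf
syntax = "proto3";

package people;

option java_package = "com.example.people";
option java_multiple_files = true;

// A nested message type used as a field of Person.
message Address {
  string city = 1;
  string street = 2;
  int32 house_num = 3;
}

message Person {
  string name = 1;
  int32 age = 2;
  Address address = 3;
}
```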

Here we’re defining a simple schema with Person and Address message types (which can be thought of as classes, structs, or records in other languages).

Apart from primitives and struct types, protobuf’s type system also allows for enums and one-of’s (union types).

To use a schema in your language of choice, it first needs to be compiled into stubs. To do that, we use the protoc CLI compiler:
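Assuming the schema above lives at protos/people.proto, a stub-generation invocation might look like this (paths are illustrative):

```shell
protoc --java_out=src/java -I protos protos/people.proto
```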

In Java, each message type will be translated to its own POJO that contains getter methods per schema field, as well as an accompanying Builder class for setter methods.

These POJOs can be instantiated, serialized to bytes, and deserialized back in any language with protobuf support — with support for both backwards and forwards compatibility.

lein-protodeps

We decided early on to follow Google’s example of creating a single monorepo to store all of AppsFlyer’s schemas.

Each schema has a single owner (a development team) responsible for serving it: whether by RPC to a service, or by producing it to some message broker to be consumed later by other services. The repo is divided into logical “products” (an API for a service or a schema for a particular logical domain, for example) that reside in different directories. The directory structure resembles this:

products
├── productA
│   ├── v1
│   │   └── .proto files
│   └── v2
│       └── .proto files
└── productB
    └── v1
        └── .proto files

A single cross-company monorepo allows anyone at AppsFlyer to easily discover schemas. However, in order to actually use these schemas in a Clojure project (hosted in a different repo), we also needed a way to fetch and compile them into Java stubs.

At first, we did the only thing that was available to us: clone the schema monorepo locally, run protoc, and generate Java stubs in our project. We encountered several challenges with this, all stemming from this being a manual procedure:

  • It’s toilsome — we wanted automation around this process.
  • It’s inconsistent — since each developer was doing it locally, they were using whatever protoc binary happened to be installed locally. This could differ between machines — and might not even be compatible with the Java protobuf lib dependency in the project itself. Even if all developers working on a project synced on their protoc version, our CI/CD environment might be using something else while building the project.
  • It’s unversioned — similarly to regular library dependencies in a project, we’d like to pin the version of the schemas our projects needed to use rather than blindly use the latest version that happened to exist when we cloned the repo.
  • It’s unclear how to package the generated stubs — should they be checked in to every project’s repo? This is unnecessary (we already have a source of truth in the form of our schema monorepo), clutters up pull requests, and is made worse by the process being inconsistent between developers (as described above).

To solve these problems, we wrote lein-protodeps, a leiningen plugin for automating protobuf and gRPC schema compilation. Using the plugin, we can declaratively list our schema dependencies in our project.clj.

Each dependency is defined by the repo hosting the schemas, the schema’s path within the repo, and its version. Once declared, the plugin automates the entire process of cloning repos, compiling the relevant files, and placing them in your project’s source tree.

The process is predictable and repeatable, since protoc’s version is itself declared in the project. This means that it compiles to the same code whether it runs on a developer’s machine or in our CI pipelines.

Below is an example configuration of the plugin in a sample project.clj file:
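A sketch of such a configuration is shown below. The key names are based on our reading of the plugin’s documentation and may differ between plugin versions, and the plugin version itself is illustrative:

```clojure
(defproject my-cool-service "0.1.0"
  :plugins [[com.appsflyer/lein-protodeps "1.0.0"]] ;; plugin version is illustrative
  :lein-protodeps
  {:output-path   "src/java"     ;; where generated Java stubs are placed
   :proto-version "3.11.3"      ;; pin the protoc version for repeatable builds
   :grpc-version  "1.30.2"      ;; pin the gRPC codegen plugin version
   :compile-grpc? true
   :repos {:schemas {:repo-type   :git
                     :config      {:clone-url "git@repo-url.com:MyOrganization/schemas.git"
                                   :rev       "v0.0.6"} ;; git tag to check out
                     :proto-paths ["products"]
                     :dependencies [[products/my_cool_api]]}}})
```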

Here, we’ve stated that we’d like to use protoc version 3.11.3. We’ve also declared a dependency on the schemas residing under the products/my_cool_api directory of the schemas git repo, hosted at git@repo-url.com:MyOrganization/schemas.git and pinned to the v0.0.6 git tag. As you can see, lein-protodeps will also compile gRPC schemas if compile-grpc? is set to true.

Next, when we run lein protodeps generate, the following happens:

  • The plugin downloads version 3.11.3 of protoc and version 1.30.2 of the gRPC plugin (unless they have been previously downloaded by the plugin).
  • It shallow-clones the repo locally and checks out the commit at schemas@v0.0.6. Either SSH or HTTP authentication can be used.
  • It compiles the schema files in the products/my_cool_api directory into Java stubs and places them in the src/java/ directory of our project. All of their dependencies are automatically compiled as well.

At this point, you’re ready to spin up a REPL! The Java classes will be automatically compiled and usable.

Since the plugin automates the entire process, developers never check in the compiled stubs to their repos. They simply run the plugin locally and in their CI pipelines when building artifacts.

pronto

Now that we’ve figured out the compilation process, we can start writing Clojure programs that use the stubs. Put on your Java interop hats:

 💡 PLEASE NOTE 💡 The examples have been taken from an example lein project that shows how to set up a minimal project with lein-protodeps and pronto. You can find it here, and take a quick look at the schema we’ll be using.
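Building a message through raw interop means chaining builder calls by hand. A sketch, assuming a Person schema compiled into a com.example.people package (the package and field names are illustrative):

```clojure
(import '(com.example.people Person Address))

;; Every message is built via its generated Builder class:
(def person
  (-> (Person/newBuilder)
      (.setName "Rich")
      (.setAge 42)
      (.setAddress (-> (Address/newBuilder)
                       (.setCity "London")
                       (.build)))
      (.build)))

;; Reads go through generated getters:
(.getName person)                ;; "Rich"
(.getCity (.getAddress person))  ;; "London"
```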

… et cetera. We didn’t want to sacrifice our ability to write data-oriented Clojure programs, but we also wanted to ensure that the properties for which we chose to use protobuf in the first place are upheld — namely, that we cannot break the schema. Essentially, we’d like to turn the above into this:
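That is, express the same value as plain Clojure data:

```clojure
{:name    "Rich"
 :age     42
 :address {:city "London"}}
```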

There are several existing Clojure libraries that let us interact with protobuf as Clojure maps. However, we could not find one that met our entire list of criteria, including:

  • Uses the official Google Java implementation of protobuf.
  • Preserves unknown fields — this is a must for us in order to build asynchronous pipelines of services that might use different schema versions.
  • Performant — map operations (instantiation, assoc, update, get, etc.) should have performance characteristics close to their equivalents when using the Java POJOs, and certainly no worse than those of native Clojure maps.
  • Low memory overhead.
  • Safe — any operation that breaks the schema (i.e., assoc-ing a key not present in the schema, or assoc-ing a value of the wrong type) should be rejected and fail fast.
  • Extensible — allow users fine-grained control over the shape of their maps and how different types in the schema are represented at runtime.

We therefore wrote pronto, which aims to do all of the above.

The main abstraction in pronto is the proto-map. Proto-maps look and feel like Clojure maps, but each is in fact an immutable wrapper around a Java protobuf POJO.

To start using pronto, first create a mapper for the protobuf Java classes you’d like to use by calling the defmapper macro:
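A minimal sketch, assuming the Person schema was compiled into a com.example.people package (the class and var names are illustrative):

```clojure
(require '[pronto.core :as p])
(import 'com.example.people.Person) ;; generated Java class

;; Generates wrapper classes for Person and its dependency graph:
(p/defmapper my-mapper [Person])
```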

Here we created a new my-mapper var, which holds a mapper for the Person class (and its entire dependency graph). You can pass as many classes as you’d like, but you’d usually pass just a few root classes. During macroexpansion, the macro generates a wrapper class for each protobuf Java class. These wrappers implement all the necessary Clojure interfaces in order to behave as regular maps.

Now that we have a mapper, we can use it to interact with the library. For example, instantiating proto-maps or deserializing data into proto-maps:
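For example (continuing the illustrative Person class from above; read-person-bytes! is a hypothetical source of serialized bytes):

```clojure
;; Create an empty proto-map and assoc fields onto it like a regular map:
(-> (p/proto-map my-mapper Person)
    (assoc :name "Rich" :age 42))

;; Or deserialize raw protobuf bytes directly into a proto-map:
(p/bytes->proto-map my-mapper Person (read-person-bytes!))
```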

If we check the type of a particular proto-map, we’ll see that it’s actually a bespoke wrapper type for a particular Java class:
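For example:

```clojure
(type (p/proto-map my-mapper Person))
;; returns a generated wrapper class for Person,
;; not clojure.lang.PersistentArrayMap
```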

Wrapper classes are generated once, during the macro expansion of the defmapper call. Every instance holds an underlying instance of the Java generated class to which the proto-map delegates all reads and writes via the appropriate setters and getters. No reflective API is used.

It is important to realize that while proto-maps look and feel like Clojure maps for the most part, their semantics are not always identical. Clojure maps are dynamic and open; protocol buffers are static and closed. This leads to several design decisions, where we usually prefer to stick to protocol buffers’ semantics rather than Clojure’s.

This is done in order to remove ambiguity, and because we assume that protocol buffers users would like to ensure the properties for which they decided to use it in the first place are maintained.

One such conflict surrounds the issue of nullability. Unlike Clojure, every scalar field in a protobuf message is always present and defaults to its type’s “zero-value”. This detail is not hidden by proto-maps:
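A sketch, assuming a Person message with string name and int32 age fields:

```clojure
(def p (p/proto-map my-mapper Person))

(:name p) ;; => ""  -- not nil: strings default to the empty string
(:age p)  ;; => 0   -- not nil: ints default to zero
```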

We have documented where such mismatches between maps and proto-maps occur here.

Performance-wise, we are happy with the results. Map read operations (get, get-in, keyword lookups, etc.) have latencies very close to their Clojure equivalents (i.e., compared to a regular map with the same structure and number of keys). You can read a more detailed report here.

Write operations like assoc can actually be slower than in native Clojure maps; this is due to the way that Clojure and protobuf approach immutability. Clojure provides a very efficient implementation using persistent data structures and structural sharing. In protobuf, however, immutability is achieved by copying the POJO to a mutable builder, performing the mutation on the builder, and copying the builder back to an immutable POJO.

That means that the cost of a single assoc is high because it needs to perform this entire round-trip. To alleviate this problem, proto-maps can be made transient, in which case the backing instance is the mutable protobuf Builder instance rather than the immutable POJO instance. We can use transients to pipeline several mutations at the cost of a single round-trip:
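A sketch using Clojure’s standard transient machinery, which proto-maps support:

```clojure
;; One builder round-trip for the whole batch of mutations,
;; instead of one per assoc:
(-> (p/proto-map my-mapper Person)
    transient
    (assoc! :name "Rich")
    (assoc! :age 42)
    persistent!)
```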

Instead of writing the above, you can also use the library’s p-> macro, a threading macro that performs the transient transformation for you:
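The same batch of mutations, with the transient/persistent! round-trip handled by the macro:

```clojure
(p/p-> (p/proto-map my-mapper Person)
       (assoc :name "Rich")
       (assoc :age 42))
```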

Lastly, to make interactive development at the REPL a little nicer when working with protobuf, pronto also provides a set of utility functions for inspecting protobuf schemas. For example, the schema function returns a Clojurified representation of a protobuf schema for a particular class:
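A sketch (the exact namespace exposing schema, and the shape of the returned data, may differ; consult the library’s docs):

```clojure
(p/schema Person)
;; returns a Clojurified description of Person's fields and their types
```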

Conclusion

Pronto and lein-protodeps have been under active development for the past year, and we now feel that they are stable enough to be open-sourced. At AppsFlyer, they are used in production by many services and their adoption is steadily increasing as we widen our usage of protobuf and gRPC. We feel that these projects have made the adoption of these technologies much simpler. We welcome any suggestions and contributions, and we invite you to give them a try yourselves.
