Lacking Schemas Will Eventually Destroy You: Start Structuring your Data

Rotem Shaul
AppsFlyer Engineering
6 min read · Nov 21, 2021

A beginner’s guide to ditching schema-less JSONs in favor of Protobuf schemas. It’s time…

8h.

On a typical Monday morning around a year ago, a pesky bug was opened by another team that consumes data from my team’s services. The bug? A new field added in a recent feature did not make it to the correct location in the JSON being transferred.

8 hours logged on the fix and deployment.

Does that story sound familiar to you?

This kind of bug is extremely common, and it is a direct result of working in a schema-less JSON world, a.k.a. the wild wild west.

At AppsFlyer, I’m a backend engineer on a team that processes 100 billion events every day, applying attribution logic in real time. Most of our services are written in Clojure, and while developing with dynamic functional programming principles has many benefits, the real world comes knocking when schema-less JSONs fall apart.

Consequently, we decided to switch our older services to protocol buffers, a.k.a. Protobufs (“a language-neutral, platform-neutral, extensible mechanism for serializing structured data”), and our newer services are all designed with Protobufs at their base.

Although it is a sizable time investment, and there is risk involved in changing a big part of the infrastructure in a functioning flow that is under massive scale (if it works, why risk breaking it?!), the many long term benefits are worth it.

Whether you’re thinking of refactoring an older service, or are about to start developing a new one, I believe I have some insight. In this post, I’ll outline some things I’ve learned about starting to work with Protobufs and migrating a service. I’ll offer some of my best tips and considerations made during this process.

Now, why should you work with Protobuf schemas?

First, back to that recurring bug: bugs of this kind are completely eliminated. Once a clear schema is in place, these mistakes become impossible to make. There is one explicit schema that serves as a clear API between your services, clients, and customers. This makes your life easier as a developer as well: you know exactly what your data looks like, and having a schema gets rid of ambiguity altogether.

Take a look at this example of a ‘click’ event msg:
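
A minimal, hypothetical sketch of what such a schema could look like (the field names here are illustrative, not AppsFlyer’s actual event schema):

```protobuf
syntax = "proto3";

package events;

// Illustrative 'click' event message.
message Click {
  string click_id     = 1;  // unique identifier of the click
  int64  timestamp_ms = 2;  // epoch millis at which the click occurred
  string app_id       = 3;  // the app the click should be attributed to
  Device device       = 4;  // nested message with device details
}

message Device {
  string os         = 1;  // e.g. "android" or "ios"
  string os_version = 2;
}
```

Every field has an explicit name, type, and tag number, so there is exactly one “correct location” for a new field, enforced by the schema itself.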

Working with schemas makes your code more maintainable and organized. It enables type enforcement and clear communication across different platforms and languages, and it provides flexibility through out-of-the-box forward and backward compatibility. Protobuf’s compact binary encoding also performs very well, which makes your code more efficient and translates into substantial cost reduction. $$$$
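
To make the compatibility point concrete, here is a schematic illustration (the two versions are shown as separate messages purely for side-by-side comparison; in practice you evolve a single message in place):

```protobuf
// v1 of a message, already deployed everywhere:
message EventV1 {
  string event_id = 1;
}

// v2 adds a field under a fresh tag number. Old readers simply skip the
// unknown field 2 (forward compatibility), while new readers decoding old
// v1 payloads see proto3's default value, "" here (backward compatibility).
message EventV2 {
  string event_id = 1;
  string app_id   = 2;
}
```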

In turn, Protobuf schemas can result in happy coworkers (who know exactly what they’re working with) as well as saving you time and money!

Sounds good, but how do you accomplish all of this?

Lay Down Some Groundwork

First, you need to lay the groundwork — starting with understanding the Protobuf ecosystem and defining the steps of your plan in a design doc.

Start thinking from a data perspective early: see the bigger picture, rather than thinking only in terms of specific services. When you approach your service’s logic, keep a clear focus on the data-driven schemas you’ve defined.

Make sure you:

  • Keep your downstream consumers in mind and sync with all stakeholders involved, to find common ground
  • Understand what kind of tooling/libraries you’ll need as dependencies and what your workflows should look like
  • Focus on your external dependencies and how they will work with Protobufs (databases, queues, caches, other services, etc.)
  • Don’t forget about performance testing, metrics, and testing methods, and allow time for all of these considerations

At each step of your plan, ask yourself — when is what I’ve done so far good enough to be ready for the next step? The answer is crucial in defining how you progress in your development while minimizing the risks.

Define your Protobuf Schema

Next, the most important thing to continue with is defining your Protobuf schema. This schema will be the base for updating your service, and will probably take a significant amount of time to decide and agree on.

WARNING: This step should not be rushed. The less you have to modify your schema throughout the migration, the more time you’ll save, so try to reach a near-final version before you start integrating your schema.
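
That said, Protobuf does give you guardrails for the changes you cannot avoid. A sketch of the schema hygiene that keeps later changes safe (proto3 syntax, illustrative fields):

```protobuf
message Click {
  // Tag numbers are the wire-format contract: never renumber or reuse them.
  string click_id     = 1;
  int64  timestamp_ms = 2;

  // If a field is ever removed, reserve its number (and name) so it cannot
  // be reused later with a different meaning:
  reserved 3;
  reserved "legacy_referrer";

  // New fields always get fresh tag numbers:
  string app_id = 4;
}
```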

Refactor your Older Services

If you’re updating an older service, start with refactoring that service — keeping in mind that a Protobuf schema message will flow through. Older services usually have code debt to account for, and cleaning up your code prior to introducing Protobufs will help you out later — providing a cleaner base to work with that will make the integration easier.

Looking at the clicks example from before, here is the migration outline:

The basic plan defines the task breakdown and the migration phases.

The internal service plan dives into the actual service code:

  • Create two separate flows, a proto-message pipeline and a JSON-message pipeline, with comparisons along the way (a minimal sketch of this dual-flow setup follows after this list)
  • Gradually roll out individual messages to work only with the proto flow (rather than the JSON flow), one at a time according to the release plan, using a feature-control tool to set traffic exposure
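
Here is a minimal Clojure sketch of that dual-flow setup. The pipeline functions, the logging, and the feature-flag check are all illustrative placeholders rather than our actual code:

```clojure
(ns example.dual-flow
  (:require [clojure.tools.logging :as log]))

;; Placeholder implementations; in the real service these would be the
;; existing JSON pipeline, the new proto pipeline, and the client of your
;; feature-control tool (e.g. a percentage-based rollout check).
(defn json-pipeline  [raw-msg] {:result :processed})
(defn proto-pipeline [raw-msg] {:result :processed}) ; identical output when correct
(defn feature-enabled? [flag raw-msg] false)

(defn handle-event
  "Run both flows, compare their outputs, and let the flag pick the winner."
  [raw-msg]
  (let [json-result  (json-pipeline raw-msg)
        proto-result (proto-pipeline raw-msg)]
    (when (not= json-result proto-result)
      ;; Mismatches are the signal to investigate before widening rollout.
      (log/warn "proto/json pipeline mismatch"
                {:json json-result :proto proto-result}))
    (if (feature-enabled? :proto-flow raw-msg)
      proto-result
      json-result)))
```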

Navigating this uncharted territory was made possible by strong tooling and a well-thought-out plan. And lots of metrics. And testing. And trial and error. And patience. Oh my!

Lastly, here are some tips that may help when you’re starting out:

Working in Clojure

We used the lein-protodeps plugin for Protobuf and gRPC stub generation, and the pronto library to ease the integration of Protobufs. *Wink* we’ve got you covered: both were developed and tested in-house, open-sourced, and are used successfully in our production today. Whatever your stack, invest in good tooling to help you.
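
For a taste of the developer experience, here is roughly what working with pronto can look like. This sketch assumes the hypothetical Click schema from earlier was compiled with protoc and its generated Java class is on the classpath; the exact class name and key naming depend on your protoc options and mapper configuration, so check the pronto README for the precise API:

```clojure
(ns example.click
  (:require [pronto.core :as p])
  ;; Assumed name of the generated Java class; the real name depends on
  ;; your java_package / outer-classname options.
  (:import (events Events$Click)))

;; defmapper generates efficient bridging code between the proto class and
;; a Clojure-map-like wrapper (a "proto-map"):
(p/defmapper click-mapper [Events$Click])

;; Proto-maps support ordinary map operations, but they are backed by the
;; schema: an unknown key or a wrongly typed value fails loudly instead of
;; silently producing a malformed message.
(def click
  (-> (p/proto-map click-mapper Events$Click)
      (assoc :click_id "abc-123"
             :timestamp_ms 1637000000000
             :app_id "com.example.app")))

;; Serialize to the wire format and read it back:
(def wire-bytes (p/proto-map->bytes click))
(def round-trip (p/bytes->proto-map click-mapper Events$Click wire-bytes))
```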

Saving Time

Spend an adequate amount of time finalizing your schemas and refactoring your code right at the beginning. Every small change along the way is super time-consuming!

Detaching Testing Env from Prod

Wait to deploy to production until you’re absolutely ready. We started out with the two separate flows running in the same production service (after many non-production tests), but even when the new flow only handled test data for short periods, performance suffered due to the large scale and real-time nature of our flows.

We ended up splitting the new flow into an entirely separate service, one that consumes from Kafka on a different consumer group id and is completely invisible to prod, so that it has no effect whatsoever on production.
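
Concretely, giving the shadow service its own consumer group id lets it read the very same topic as production without joining, or triggering rebalances in, the production consumer group. A minimal sketch using the plain Java Kafka client from Clojure (topic name, group id, and broker address are made up):

```clojure
(ns example.shadow-consumer
  (:import (org.apache.kafka.clients.consumer KafkaConsumer)
           (java.time Duration)))

;; Hypothetical entry point into the new proto pipeline:
(defn process-proto-record [^bytes payload]
  (comment "decode with pronto and run the proto flow on it"))

(def consumer-props
  {"bootstrap.servers"  "kafka:9092"           ; assumed broker address
   "group.id"           "clicks-proto-shadow"  ; distinct from the prod group id
   "key.deserializer"   "org.apache.kafka.common.serialization.ByteArrayDeserializer"
   "value.deserializer" "org.apache.kafka.common.serialization.ByteArrayDeserializer"})

(defn run-shadow-consumer []
  (with-open [consumer (KafkaConsumer. consumer-props)]
    (.subscribe consumer ["clicks"])  ; the same topic production consumes
    (while true
      (doseq [record (.poll consumer (Duration/ofMillis 100))]
        (process-proto-record (.value record))))))
```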

Starting with the First Service in your Flow

It’s best to start with the first services in your pipeline; once Protobuf is integrated there, all the services and teams downstream will want it too. They’ll also benefit from having a basic place to start. FOMO, anyone?

8h.

Once you have your Protobufs in place, those 8 hours spent fixing that tedious bug can go toward migrating your next service to Protobufs instead.
They can be used for much more interesting things that will dramatically improve your software.
They can even be used for writing a blog post to share what you’ve learned about going from schema-less JSONs to Protobuf schemas!
