Bigblue’s multi-model database on FoundationDB

Published in Bigblue Engineering, Feb 1, 2022

At Bigblue, we use FoundationDB as our primary data store for many data models, such as inventory items, fulfillment details, warehouse settings, and carrier delivery areas.

This blog post is for you if:

🔥 You wonder how to create a robust data model layer.
💡 You’re interested in Protocol Buffers and protoc.
🧐 You’re curious about our FoundationDB storage layer.
💙 Like us at Bigblue, you believe we can learn a lot from others.

Bigblue is the post-purchase delivery solution that allows brands of all sizes to compete with the service offered by retail giants. Since joining the team more than a year ago, I have marveled at how easy it has been to work with FoundationDB while rapidly evolving our product. In this post, I share the key points that make managing our data model on FoundationDB easy.

FoundationDB and me

As a software engineer, I have a passion for distributed systems that I can’t satisfy enough, and I have had the chance to work with data stores such as PostgreSQL, MySQL, MongoDB, Redis, and Elasticsearch. FoundationDB was new to me when I joined Bigblue, so I couldn’t help but dive into the documentation to understand all its nuances. Put simply, FoundationDB is a distributed key-value store that provides ACID transactions while also offering strong performance and scalability. But with great power comes great responsibility 🕷 😉. As some of you may already know, misusing a database can significantly hurt performance and maintainability 😱. This is where wrapping FoundationDB in an abstraction layer helps in many ways 💪.

Defining a data model

On Linux, “everything is a file”. When engineering software, everything is data!

At Bigblue, we believe that the data model is a vital architecture component! A unified representation of the data makes it very accessible to engineers, BI analysts, and newcomers. In addition, with code generation, we ensure that the data store always reflects our data model. As a result, we avoid unexpected insertions into the data store.

Stepping back, here is how the databases I used to work with handle this:

In PostgreSQL, the data model is defined through SQL schemas.
MongoDB stores denormalized data; a data model can still be enforced by external libraries such as the well-known Mongoose.

Now, what about FoundationDB?

As for FoundationDB, its documentation tells us:

In FoundationDB, both keys and values are simple byte strings. Apart from storage and retrieval, the database does not interpret or depend on the content of values.

In other words, it is up to developers to provide their own data layer to marshal and unmarshal the data model. At Bigblue, we chose Protocol Buffers to build a robust, reliable data model that is easy to change.
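To make this concrete, here is a minimal sketch of what such a layer boils down to: a typed wrapper that marshals a model into the opaque bytes the store keeps, and unmarshals it on the way back. The `KV`, `Codec`, and `Layer` names and the `Item` model are hypothetical; a Go map stands in for FoundationDB, and `encoding/json` stands in for Protocol Buffers only so that the sketch runs with the standard library alone.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KV stands in for FoundationDB: keys and values are opaque byte strings.
// A map is used only to keep the sketch self-contained.
type KV map[string][]byte

// Codec converts a model to bytes and back. Bigblue uses Protocol Buffers
// for this; encoding/json is a standard-library stand-in.
type Codec[T any] struct {
	Marshal   func(*T) ([]byte, error)
	Unmarshal func([]byte, *T) error
}

// Layer is a hypothetical typed data layer on top of the untyped store.
type Layer[T any] struct {
	kv    KV
	codec Codec[T]
}

// Put marshals v and stores the resulting bytes under key.
func (l Layer[T]) Put(key string, v *T) error {
	b, err := l.codec.Marshal(v)
	if err != nil {
		return err
	}
	l.kv[key] = b
	return nil
}

// Get loads the bytes under key and unmarshals them; the bool is false if
// the key is absent.
func (l Layer[T]) Get(key string) (*T, bool, error) {
	b, found := l.kv[key]
	if !found {
		return nil, false, nil
	}
	v := new(T)
	if err := l.codec.Unmarshal(b, v); err != nil {
		return nil, false, err
	}
	return v, true, nil
}

// Item is a hypothetical model used for the demo.
type Item struct {
	ProductID string
	Quantity  int
}

var jsonCodec = Codec[Item]{
	Marshal:   func(v *Item) ([]byte, error) { return json.Marshal(v) },
	Unmarshal: func(b []byte, v *Item) error { return json.Unmarshal(b, v) },
}

func main() {
	layer := Layer[Item]{kv: KV{}, codec: jsonCodec}
	if err := layer.Put("item/P1", &Item{ProductID: "P1", Quantity: 3}); err != nil {
		panic(err)
	}
	got, ok, err := layer.Get("item/P1")
	if err != nil {
		panic(err)
	}
	fmt.Println(ok, got.ProductID, got.Quantity) // true P1 3
	fmt.Println(string(layer.kv["item/P1"]))     // {"ProductID":"P1","Quantity":3}
}
```

The store never interprets the value bytes; only the codec knows how to read them, which is why choosing a well-specified format such as Protocol Buffers matters.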

👇 Here is our magic soup 👇

1/ A GitHub repository called proto contains all our Protocol Buffers messages. Each message represents a specific data model to store in FoundationDB.

2/ The repository is split into functional folders that group data models sharing the same concern. Each folder contains .proto files with the definitions of the Protocol Buffers messages:

proto/
  bigblue/
    inventory/
      inventory.proto
    warehouse/
      carrier.proto
      warehouse.proto

3/ In .proto files, a Protocol Buffers message defines the list of typed fields that make up the data model. Here is an example of what anyone would expect for inventory items:

4/ Once a PR containing data model changes is merged, our CI runs a job that generates Go code from the Protocol Buffers messages, pushes the result to dedicated GitHub repositories, and tags them with a new version. Anyone can then update the dependency in their project.

Code generation from data models

The Protocol Buffers compiler, protoc, compiles .proto files. With the protoc-gen-go plugin, protoc generates Go types for the messages, which the Go protobuf runtime can marshal and unmarshal: exactly what we need to save data to, and fetch it from, FoundationDB. Here is a sample of the Go code generated by protoc:
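As a rough, hand-written approximation of that output (not the actual generated file), assuming a message with `warehouse_id`, `product_id`, and a hypothetical `quantity` field, protoc-gen-go emits a struct with one exported field per proto field, struct tags describing the wire format, and nil-safe getters. The internal bookkeeping fields that real generated code embeds (`state`, `sizeCache`, `unknownFields`) and the descriptor registration are omitted so that the snippet compiles with the standard library alone.

```go
package main

import "fmt"

// InventoryItem approximates protoc-gen-go output for a message with
// warehouse_id, product_id, and a hypothetical quantity field. Real
// generated code also embeds protoimpl state and registers a descriptor.
type InventoryItem struct {
	WarehouseId string `protobuf:"bytes,1,opt,name=warehouse_id,json=warehouseId,proto3" json:"warehouse_id,omitempty"`
	ProductId   string `protobuf:"bytes,2,opt,name=product_id,json=productId,proto3" json:"product_id,omitempty"`
	Quantity    int32  `protobuf:"varint,3,opt,name=quantity,proto3" json:"quantity,omitempty"`
}

// Generated getters are nil-safe: on a nil message they return the
// field's zero value instead of panicking.
func (x *InventoryItem) GetWarehouseId() string {
	if x != nil {
		return x.WarehouseId
	}
	return ""
}

func (x *InventoryItem) GetProductId() string {
	if x != nil {
		return x.ProductId
	}
	return ""
}

func (x *InventoryItem) GetQuantity() int32 {
	if x != nil {
		return x.Quantity
	}
	return 0
}

func main() {
	var missing *InventoryItem
	fmt.Printf("%q %d\n", missing.GetProductId(), missing.GetQuantity()) // "" 0

	item := &InventoryItem{WarehouseId: "W1", ProductId: "P1", Quantity: 5}
	fmt.Println(item.GetWarehouseId(), item.GetProductId(), item.GetQuantity()) // W1 P1 5
}
```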

The InventoryItem Go struct can now be marshaled and unmarshaled with the Go protobuf library (for example, proto.Marshal and proto.Unmarshal).

Building a FoundationDB storage layer

With code generation, we can generate methods that interact with the database too!

To hide the complexity of FoundationDB queries and keep saving and fetching consistent, we developed our own protoc plugin. Run by protoc, it generates the specific methods that interact with FoundationDB.

Sample of the methods generated by the Bigblue protoc plugin

How do we declare the methods we need?

A store configuration is declared on each proto message so that the Bigblue plugin generates exactly the methods needed to interact with FoundationDB. By refining this configuration, engineers control how the data is saved and fetched.

👆 In this example, the store configuration declares a primary_key so that inventory items are saved and fetched by product ID and warehouse ID. From this simple configuration, our protoc plugin generates CRUD methods:

A method to create or update an inventory item.
A method to read an inventory item, taking the declared primary key’s fields as arguments.
A method to delete an inventory item. Like the read method, it takes a warehouse ID and a product ID.

These methods define how the data is stored, query FoundationDB, and marshal and unmarshal the protobuf values.

Inside our FoundationDB data store

Data is stored in FoundationDB in directories (the storage-layer equivalent of a PostgreSQL table or a MongoDB collection), with one directory per data model. Within a directory, data is stored as ordered key ⇒ value pairs:

The key is derived from the configured primary_key: it is a concatenation of the key’s fields, each serialized so that the byte order of the keys preserves the natural ordering of the values they encode.
The value is the message’s data, marshaled into protobuf bytes.

Keys are sorted in lexicographic (byte) order and are always read in that order. This makes reading multiple adjacent values in one transaction very efficient.

Anatomy of a message’s directory in the storage layer

The methods that our protoc plugin generates make use of our data storage layer to connect with FoundationDB:

Sample of the generated code that fetches and saves inventory items

👆 Once this code is generated, engineers can very easily use these methods in their application to save or get their data models 🤩:

In the end:

Only a few lines are required to store a data model in FoundationDB.
The code semantics are consistent, which keeps our code stack easy to maintain and, in the end, speeds up our productivity.
The developer experience (DX) is great and simple, making it easy to onboard new Bigbluers.
The abstraction layer lets us focus on product outcomes rather than technical issues, which helps our products grow 🚀.

For the sake of clarity, I didn’t explain how our custom protoc plugin works, how we handle data versioning, or how we leverage all of FoundationDB’s capabilities. These would make perfect topics for future blog posts, so feel free to share in the comments the topics you’d like to see covered. In the meantime, if you want an overview of FoundationDB’s capabilities, this paper is for you!

Thank you for reading! We’ll blog again, so subscribe to our blog to get notified when we do!

Like what you’ve read? We’re hiring software engineers to build the ultimate delivery experience for all brands. Check our open positions!