WTF Dial: Data storage with BoltDB

So far we’ve implemented a domain model and then subsequently rearranged it after getting some feedback. Now it’s time to get down to brass tacks — let’s build the storage layer of WTF Dial.

You can follow along with the code for this post by looking at this pull request. As always, feedback and questions are welcome!

Most people are familiar with a wide variety of database servers — everything from SQL databases to Elasticsearch to Redis. However, I’m taking a different approach for WTF Dial. I’m using an embedded key/value store called BoltDB. There are many key/value stores but I’m partial to Bolt because, well, I’m the original author of it.

Why use an embedded key/value store?

There is a long running belief in the software community that you should use a relational database as your primary store. I was an Oracle DBA for many years and I shared this same belief. However, once I began writing databases I realized that there is a lot of cruft attached to database servers that you don’t need for many applications.

BoltDB’s sweet logo

We can implement our own schema on our key/value store by using one of the many serialization libraries available. I personally like to use Protocol Buffers (aka protobufs). Protobufs use simple definition files to declare how objects are encoded, and from those files you can generate encoders & decoders that are really fast.

UPDATE: Bryce Reitano let me know that I didn’t do a very good job explaining why I prefer embedded databases over relational databases. I agree! Here’s an extended explanation of why I’m using BoltDB instead of a more traditional relational database or even a different remote key/value store.

Further explanation of why you should use a key/value store…

Relational databases have a lot of overhead and a lot of moving parts. This overhead means that it takes more hardware to manage the same amount of data. Additional moving parts means that there’s a lot more that can go wrong.

Lifecycle of a query

When you execute a query, it may seem like a single line of code to you but that SELECT statement goes through potentially tens or even hundreds of thousands of lines of code to deliver your result.

Let’s look at your average query:

  1. Invoke a sql.DB.Query() call with your query.
  2. The client opens a network connection to the remote server.
  3. Your query is serialized into binary and sent over the connection.
  4. The server deserializes the query.
  5. The server looks up your query in the query cache to see if it’s been run recently. If so, it can avoid some query parsing & query planning.
  6. If the query isn’t in the cache, it has to be parsed into an abstract syntax tree.
  7. The query engine then creates an execution plan based on index statistics and sizes of the various tables involved in the join.
  8. A transaction is started and multiple locks may be taken on the dataset to ensure consistency.
  9. The execution plan is executed and data from multiple disk locations or in-memory caches is combined at query-time to form your result set.
  10. The transaction is closed and the locks are released.
  11. The result set is then serialized back over the connection.
  12. Your client then deserializes the result set from the connection.
  13. The result set is returned to your Go caller.

It’s crazy how much goes into a single query fetch. And these steps don’t even get into mutating the data through UPDATE/INSERT/DELETE or the complex operational side of running a database server.

An embedded key/value store, such as BoltDB, has a simpler query path:

  1. Start a transaction. This involves acquiring a single sync.Mutex lock which takes around 50 nanoseconds. After the transaction starts, the mutex is released and no additional locks are required during execution.
  2. Traverse through a B+tree to find your key/value pair. Many times your branch data is cached in-memory, so only the leaf values need to be fetched from disk. This operation can take 1µs if all pages are cached or a couple hundred microseconds if pages need to be fetched from an SSD.
  3. Deserialize your data into an object. This operation can take less than a microsecond depending on the complexity of your object.
  4. Return the object to the caller.

Because BoltDB has very little read contention, these operations can scale well with the number of CPU cores on your machine which makes it efficient to scale vertically. You can handle a massive number of read requests even on commodity hardware by using an embedded database.

These steps also have a very limited number of ways they can fail. Read transactions realistically only fail if your process runs out of memory. Write transactions only fail if you have a disk error. Remote databases, on the other hand, also have networks that can fail or client driver incompatibilities.

Scaling with an embedded store

One common issue with local data stores is that there’s usually not a way to automatically scale them horizontally by adding more machines. That’s true. If you get to the point with your application where you can no longer handle your load on a single machine then you may want to move to a database server that can handle scaling better such as Riak.

However, adding sharding to your application may not be as complicated as it seems. If there’s interest, I’ll do a post on distributing our WTF Dial server by using consistent hashing.

The operations side

Other questions that typically come up are about how to manage an embedded database. What do backups & restore look like? Do I need to do regular maintenance of my database? Can I do replication?

We’ll take a look at these when we deploy our WTF server and I’ll cover them in more detail then. BoltDB is built to be simple and it intentionally lacks tunable knobs. It’s intended to be run without having to worry about a lot of complexity.

Should you use an embedded database?

The last point I’d like to touch on is whether an embedded database is a fit for your own project. Should you rip out your existing SQL database? Of course not. As much as I would love the industry to move to doing application development using a key/value store, it is still bleeding edge technology. It will take time to mature as a paradigm.

The WTF Dial is a fun, kind of silly application but it gives us a playground to explore different technologies than we might typically use. Once you get comfortable with using a key/value store then it might make sense to start integrating it into your application development environment.

Managing authentication

Previously we created a DialService that would allow us to manage our Dial entities:
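In rough form, assuming the integer-backed DialID type from our domain package:

// DialService manages our Dial entities.
type DialService interface {
    Dial(id DialID) (*Dial, error)
    CreateDial(dial *Dial) error
    SetLevel(id DialID, level float64) error
}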

One piece that was missing was authentication. A user should not be able to set the level of a Dial that she did not create.

One suggestion was to add the current user to each method:
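That sketch would thread a User argument through every method:

// Rejected approach: pass the current user into every call.
type DialService interface {
    Dial(user *User, id DialID) (*Dial, error)
    CreateDial(user *User, dial *Dial) error
    SetLevel(user *User, id DialID, level float64) error
}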

There are two issues with this approach:

  1. You need to pass around the User through every function that uses DialService. This is tedious and clutters your code.
  2. The caller may not have a reference to the user. For example, if our service is implemented as an HTTP API then we’d need to authenticate using a token instead of passing the user object.

We need to separate the authentication piece from the service itself.

Clients, sessions, & services

We can remove auth from the service API by introducing two concepts:

  • Client — This represents a reference to the provider of the service. If we were implementing the services using a relational database then the client would be the database handle (aka *sql.DB). A client’s only job is to create sessions.
  • Session — This represents a single connection to the provider. A session can be authenticated using whatever mechanisms you want your application to support (e.g. passwords, tokens, JWT). The session provides references to the services.

In our BoltDB implementation, the Client will be a reference to the *bolt.DB instance and the Session will provide a method for authenticating users using a token.

The interfaces look like this:
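Sketched with just the pieces we need so far:

// Client creates a connection to the services.
type Client interface {
    Connect() Session
}

// Session represents an authenticable connection to the services.
type Session interface {
    SetAuthToken(token string)
    DialService() DialService
}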

Changes to UserService

Previously there was a UserService for authenticating users which only had a single Authenticate() method. Authentication is such a broadly used piece of functionality that I decided to rename this service to Authenticator to narrow its focus. I’ll use UserService in the future when we add user management tasks:
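The narrowed interface is tiny; the token is a plain string here:

// Authenticator authenticates a user by token.
type Authenticator interface {
    Authenticate(token string) (*User, error)
}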

Narrowly defined interfaces are usually the best interfaces. They have a very specific purpose. Additionally, well-named interfaces tend to be ones where the interface name is simply the method name plus “er”. For example, the Authenticator interface has an Authenticate() method.

The bolt package

We’re isolating our BoltDB dependency to a package named bolt. This may seem odd since the package it depends on shares the same name. However, our local bolt package is meant to wrap the BoltDB package and translate it to our domain model. We should never have an instance where we are importing both our local package and the underlying implementation package.

Internalizing Protocol Buffers

As I mentioned above, we’re going to use protobufs for our “schema”. It lets us efficiently encode our domain objects into byte slices. This is done by generating Go code from definition files. I like to move my Protocol Buffers definitions to the internal package. I do this for a couple reasons:

  1. Generated code is ugly. Really ugly. By moving it to its own package, it avoids cluttering my exported API. The internal package is not visible in godoc.
  2. These encodings are specific to my storage in BoltDB. I may want to share these in the future if I supported multiple key/value store backends but for right now it’s isolated to just my bolt package.
  3. The naming makes it easy to distinguish whether I’m referencing the domain type (wtf.Dial) or the encoding type (internal.Dial).

Technically this is a bastardization of the internal package but I find that it works well in practice.

Protobuf Definition

Writing definition files for Protocol Buffers is easy. They essentially look like Go struct definitions. Here’s the one we’ll use:
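A minimal sketch, persisting only the fields we’ve defined on wtf.Dial so far and storing the modification time as Unix nanoseconds:

package internal;

message Dial {
    optional int64  ID      = 1;
    optional int64  UserID  = 2;
    optional string Name    = 3;
    optional double Level   = 4;
    optional int64  ModTime = 5;
}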

There are a few key differences:

  1. All numeric types need a size on them. We’re using int64 instead of int.
  2. Fields can be specified as optional or required (or repeated for arrays). I always use optional since the definitions will change over time and I like the flexibility.
  3. Field numbers are required at the end.

The most interesting part of Protocol Buffers is item #3. Field numbers provide a versioning system for your data. Need to add a field? Just add a new field to the definition with a new number. Need to rename? Just change the name. Each field is stored by number, not by name.

This versioning system largely removes the need to perform migrations on your database. Even when you need to change the type on a field, you can provide two fields — one with the new type and one with the deprecated type — and convert the deprecated type at read-time and save to the new type when the object is updated. This allows you to slowly update your data over time.

Automatic generation

Go 1.4 introduced an awesome command called go generate. Its premise is simple: find any comment lines starting with “go:generate” and execute the command that follows. We can add our Protocol Buffers generation to our Go file using this generate directive:

//go:generate protoc --gogo_out=. internal.proto

Now we can generate our encoders & decoders using this command:

$ go generate ./...

Marshaling boilerplate

Unfortunately there is a small bit of boilerplate involved in converting domain types to and from encoding types:
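A sketch of the marshal side, assuming the gogoprotobuf helpers and generated pointer fields:

// MarshalDial encodes a domain dial into protobuf bytes.
func MarshalDial(d *wtf.Dial) ([]byte, error) {
    return proto.Marshal(&Dial{
        ID:      proto.Int64(int64(d.ID)),
        UserID:  proto.Int64(int64(d.UserID)),
        Name:    proto.String(d.Name),
        Level:   proto.Float64(d.Level),
        ModTime: proto.Int64(d.ModTime.UnixNano()),
    })
}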

This simply copies over the fields from wtf.Dial to internal.Dial and marshals it to protobufs to produce a byte slice.

Unmarshaling data looks very similar but simply in reverse:
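Here the getter names assume the generator preserves the field casing:

// UnmarshalDial decodes protobuf bytes into a domain dial.
func UnmarshalDial(data []byte, d *wtf.Dial) error {
    var pb Dial
    if err := proto.Unmarshal(data, &pb); err != nil {
        return err
    }
    d.ID = wtf.DialID(pb.GetID())
    d.UserID = wtf.UserID(pb.GetUserID())
    d.Name = pb.GetName()
    d.Level = pb.GetLevel()
    d.ModTime = time.Unix(0, pb.GetModTime()).UTC()
    return nil
}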

Now we have functions for marshaling and unmarshaling our data: internal.MarshalDial() & internal.UnmarshalDial().

BoltDB Client

The job of the client is to keep a reference to the *bolt.DB instance and to create Sessions. It also holds a reference to the Authenticator for sessions to use so it can be passed off when sessions are created.

The client looks like this:
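Give or take exact field names (the bolt identifier below refers to the imported github.com/boltdb/bolt package):

// Client represents a client to the underlying BoltDB data store.
type Client struct {
    // Filename of the BoltDB database.
    Path string

    // Returns the current time; swapped out during tests.
    Now func() time.Time

    // Used by sessions to authenticate tokens.
    Authenticator wtf.Authenticator

    db *bolt.DB
}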

One surprising feature of bolt.Client is that it has a reference to a Now() function. This is added so we can mock out the current time during tests. Another option is to make this function global but then we can’t run our tests in parallel.

Another surprise for some people is that a lot of fields are exported. I find that a lot of people try to hide fields and only allow them to be set during initialization through some kind of Options object passed into the constructor. That seems like overkill. These fields can all be safely set before the client is initialized and then never touched again.

Opening a BoltDB database

Creating a database with BoltDB is really simple. It’s just a file.

Once we open the file we’ll initialize buckets inside of it to store our data. Buckets are simply distinct key/value maps inside our database. We’re going to treat them like tables would be used in a relational database. Right now we’re just storing Dial data so we’ll only create a “Dials” bucket.
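Here’s a sketch of the Open() method that does both steps, assuming a 0666 file mode and default options:

// Open opens and initializes the BoltDB database.
func (c *Client) Open() error {
    // Open the database file. It is created if it doesn't exist.
    db, err := bolt.Open(c.Path, 0666, nil)
    if err != nil {
        return err
    }
    c.db = db

    // Start a writable transaction to initialize buckets.
    tx, err := c.db.Begin(true)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Create the bucket that holds our dial data.
    if _, err := tx.CreateBucketIfNotExists([]byte("Dials")); err != nil {
        return err
    }

    return tx.Commit()
}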

One important thing to note here is the db.Begin(true) call. BoltDB provides ACID transactions with serializable isolation. All commands must go through a transaction.

The true argument in our Begin() call specifies a writable transaction. BoltDB allows one writer at a time but allows as many read transactions as you want. Because of this design, BoltDB excels at read-heavy workloads.

We call a “defer tx.Rollback()” after our transaction starts to ensure that it will always close the transaction in case of an early error return or a panic. If Commit() at the end is reached successfully then the deferred Rollback() becomes a no-op.

Finally, the Tx.CreateBucketIfNotExists() creates our “Dials” bucket. It’s called every time we open the database to ensure it’s there so we don’t have to check for it after opening. If the bucket already exists then this becomes a no-op.

Connecting to a session

BoltDB is an in-process database so there’s no network connection for our sessions. Our session simply represents a way to authenticate a user. The client’s Connect() method creates the session and copies the database reference and authenticator:
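Roughly:

// Connect returns a new session to the database.
func (c *Client) Connect() wtf.Session {
    s := newSession(c.db)
    s.authenticator = c.Authenticator
    s.now = c.Now()
    return s
}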

We also push the current time to the Session so that all objects created with the session can have the same timestamp for consistency.

BoltDB Session

Our bolt.Session provides an interface for authenticating via a token as well as references to the services associated with the session. Our session looks like this:
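Approximately:

// Session represents an authenticable connection to the database.
type Session struct {
    db  *bolt.DB
    now time.Time

    // Authentication
    authenticator wtf.Authenticator
    authToken     string
    user          *wtf.User

    // Services
    dialService DialService
}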

All the fields are unexported since this can only be created by the Client. We save the database handle and current time so the services can reference them. The authToken is saved for authentication and the user is added as a field so authentication can be cached. Finally, the dialService is an instance of our bolt.DialService which we’ll associate with the session in the constructor:
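A minimal constructor sketch:

// newSession returns a new session attached to the database handle.
func newSession(db *bolt.DB) *Session {
    s := &Session{db: db}
    s.dialService.session = s
    return s
}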

By making the dialService a non-pointer field we reduce the allocations required when creating a new session.

Authenticating sessions

At first it might seem like we should only have an Authenticate() method directly on the Session that’s invoked by the user. However, that would mean that every time we initiate a session we’d need to make an authentication call. Some of the service APIs don’t require authentication (such as DialService.Dial()) so this would result in a lot more authentication calls than needed.

Instead we’ll let the owner of the session set the credentials and allow the service itself to request authentication as needed.

Setting the credentials is as simple as setting the auth token:
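It’s a one-field setter:

// SetAuthToken sets the token used to authenticate the session's user.
func (s *Session) SetAuthToken(token string) {
    s.authToken = token
}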

Then the service can call Session.Authenticate() when it needs the authenticated user:
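A sketch, with the caching described below:

// Authenticate returns the session's user, authenticating the token
// on first use and caching the result.
func (s *Session) Authenticate() (*wtf.User, error) {
    // Return the cached user, if any.
    if s.user != nil {
        return s.user, nil
    }

    // Delegate to the authenticator provided by the client.
    u, err := s.authenticator.Authenticate(s.authToken)
    if err != nil {
        return nil, err
    }

    // Cache the user for subsequent service calls.
    s.user = u
    return u, nil
}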

This implementation caches the user so that it can be used by additional service calls without re-authenticating each time.

Session services

Finally, we’ll provide a reference to our services:
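For the dial service that’s just:

// DialService returns the dial service associated with this session.
func (s *Session) DialService() wtf.DialService {
    return &s.dialService
}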

Our dialService field is a non-pointer, but *bolt.DialService is what actually implements wtf.DialService, so we return a pointer to our field.

Generally you should return the concrete type that you’re using (e.g. *bolt.DialService) instead of the interface type (wtf.DialService). However, in this case we’re implementing the wtf.Session so we have to use the interface type.

Dial service implementation

Now that we’ve handled authentication, we can move on to the actual storage of the Dial itself. These methods have some code that could be refactored to shorten them but I’ve left it in so that it’s easier to read.

Creating a Dial

Let’s look at our CreateDial() implementation and then we’ll break it down:
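A sketch, including the bolt.DialService type itself (the itob() helper is explained below):

// DialService represents the BoltDB implementation of wtf.DialService.
type DialService struct {
    session *Session
}

// CreateDial creates a new dial owned by the session's current user.
func (s *DialService) CreateDial(d *wtf.Dial) error {
    // 1. Authenticate to find out who is creating the dial.
    u, err := s.session.Authenticate()
    if err != nil {
        return err
    }

    // 2. Start a writable transaction.
    tx, err := s.session.db.Begin(true)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // 3. Retrieve our bucket.
    b := tx.Bucket([]byte("Dials"))

    // 4. Claim the next sequence number as the dial's ID.
    seq, err := b.NextSequence()
    if err != nil {
        return err
    }
    d.ID = wtf.DialID(seq)

    // 5. Assign ownership & the modified time.
    d.UserID = u.ID
    d.ModTime = s.session.now

    // 6. Marshal the dial into a binary blob.
    buf, err := internal.MarshalDial(d)
    if err != nil {
        return err
    }

    // 7. Insert the blob into the bucket, keyed by ID.
    if err := b.Put(itob(int(d.ID)), buf); err != nil {
        return err
    }

    // 8. Commit.
    return tx.Commit()
}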

Let’s see how this works step-by-step:

  1. We first authenticate since we need to know who is creating the Dial.
  2. We’ll be mutating our database so we’ll start a writable transaction using bolt.DB.Begin(true). Make sure to defer rollback!
  3. Next we’ll grab our “Dials” bucket we created during initialization.
  4. All BoltDB buckets have an autoincrementing sequence number similar to relational database AUTOINCREMENT fields. We’ll use this to grab the next DialID.
  5. We need to assign the current user to the dial and set its modified time.
  6. Next we’ll marshal our Dial to bytes using internal.MarshalDial().
  7. We’ll insert our binary blob into our bucket using bolt.Bucket.Put() and use the Dial.ID as the key.
  8. Finally we’ll commit the transaction.

I use one helper function called itob() when writing BoltDB applications that simply converts an int to an 8-byte big endian encoded byte slice. It’s 3 lines of code that I don’t want to repeat everywhere:
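It only needs encoding/binary:

// itob returns an 8-byte big endian representation of v.
func itob(v int) []byte {
    b := make([]byte, 8)
    binary.BigEndian.PutUint64(b, uint64(v))
    return b
}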

In general, use big endian encoding for integers because it provides lexicographical sorting which is important when we want to iterate over our data.

Retrieving a Dial

Once we have our dial in our database, reading it out is easy:
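Sketched out:

// Dial returns a dial by ID, or a nil dial if it doesn't exist.
func (s *DialService) Dial(id wtf.DialID) (*wtf.Dial, error) {
    // 1. Start a read-only transaction.
    tx, err := s.session.db.Begin(false)
    if err != nil {
        return nil, err
    }
    defer tx.Rollback()

    // 2. Look up the binary blob by ID.
    v := tx.Bucket([]byte("Dials")).Get(itob(int(id)))

    // 3. A missing key means the dial doesn't exist; not an error.
    if v == nil {
        return nil, nil
    }

    // 4. Unmarshal the blob into a domain dial.
    var d wtf.Dial
    if err := internal.UnmarshalDial(v, &d); err != nil {
        return nil, err
    }

    // 5. Return it; the deferred rollback closes the transaction.
    return &d, nil
}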

Let’s walk through this function:

  1. Since we are not mutating the database we can use a read-only transaction. We do this by passing false to bolt.DB.Begin(). Again, don’t forget your defer rollback!
  2. Next we’ll retrieve the “Dials” bucket and lookup the binary blob associated with the given id using the bolt.Bucket.Get() method.
  3. If there is no value then the dial doesn’t exist and we’ll return a nil Dial. Some people return a “dial not found” error but I don’t think looking up a non-existent object is an error. It also tends to complicate error handling by the caller.
  4. Next we’ll unmarshal our blob into our wtf.Dial variable using our internal.UnmarshalDial() function.
  5. Finally we’ll return a reference to our dial variable. By exiting the function, our transaction will automatically be closed by the deferred rollback.

One reason I love using key/value stores for application development is that your data doesn’t have to move far. It’s all right there on one machine. BoltDB uses a read-only mmap to perform zero copy access of the data and, assuming your data is hot and in-memory, this fetch will only take a couple microseconds.

Read-only transactions in BoltDB also have very little contention so it’s reasonable to support thousands of read-only transactions per second.

Setting the level

Finally, we need to be able to update the levels of existing Dials. We also need validation so that only the user that created a dial can update it.

Updating a dial looks a lot like the previous two actions combined. We fetch the dial, update it, and then save it. Let’s walk through the code:
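A sketch, using the error values defined in the Error handling section below:

// SetLevel updates the level of an existing dial. Only the dial's
// creator may update it.
func (s *DialService) SetLevel(id wtf.DialID, level float64) error {
    // 1. Authenticate the current user.
    u, err := s.session.Authenticate()
    if err != nil {
        return err
    }

    // 2. Start a writable transaction.
    tx, err := s.session.db.Begin(true)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // 3. Grab the bucket once since we'll use it twice.
    b := tx.Bucket([]byte("Dials"))

    // 4. Look up the dial; here a missing dial is an error.
    v := b.Get(itob(int(id)))
    if v == nil {
        return wtf.ErrDialNotFound
    }
    var d wtf.Dial
    if err := internal.UnmarshalDial(v, &d); err != nil {
        return err
    }

    // 5. Verify the dial is owned by the authenticated user.
    if d.UserID != u.ID {
        return wtf.ErrUnauthorized
    }

    // 6. Update the level & the modified time.
    d.Level = level
    d.ModTime = s.session.now

    // 7. Marshal & overwrite the previous entry.
    buf, err := internal.MarshalDial(&d)
    if err != nil {
        return err
    }
    if err := b.Put(itob(int(id)), buf); err != nil {
        return err
    }

    // 8. Commit.
    return tx.Commit()
}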

  1. We need to make sure the current user is the owner so we’ll authenticate first.
  2. We’re mutating the database so we need to create a writable transaction.
  3. We’ll grab a reference to the bucket since we’ll be using it twice in this function. No point in looking it up twice.
  4. First we’ll look up the binary blob by ID and unmarshal the data. In this case, since we’re expecting the dial to be there, we’ll return a wtf.ErrDialNotFound error if it doesn’t exist.
  5. Next we’ll verify that the dial’s creator is the same as the currently authenticated user. If it is not then we’ll return wtf.ErrUnauthorized.
  6. Now that we know we can change the dial, we’ll update the level and the modified time.
  7. Next we’ll marshal our modified dial and overwrite the previous entry in our database for our dial.
  8. Finally we’ll commit the transaction.

Error handling

There’s a lot of debate in the Go community about how to do errors. My default way to do errors is using constant string-based errors. Despite Dave Cheney specifically saying this is the wrong way to do errors, I actually learned this from his blog post.

The idea is simple. A wtf.Error represents an error that occurs within the scope of the domain model. For example, not finding a Dial is a wtf.Error but a disk failure is not.

We can declare our type like this:
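It’s two lines plus the error interface method:

// Error represents a domain-level error within the wtf package.
type Error string

// Error returns the error message. Satisfies the error interface.
func (e Error) Error() string { return string(e) }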

You’ll notice that Error is actually just a string which is an immutable value. We can create our errors like this:
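Because the underlying type is a string, they can be declared as constants:

const (
    ErrDialNotFound = Error("dial not found")
    ErrUnauthorized = Error("unauthorized")
)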

It also has the awesome property that it can be checked for equivalence by using the == operator. For example, we can do this:
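For example, with a dialService and id already in scope:

if err := dialService.SetLevel(id, 42); err == wtf.ErrUnauthorized {
    // The session's user doesn't own this dial.
}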

This becomes important as we get into the HTTP handler. We’ll cover errors more extensively in that post.

Ensuring interface compliance

With those 3 methods, our *bolt.DialService now implements wtf.DialService. To ensure that we get compile-time error checking of this implementation, we can use a nifty little trick:

var _ wtf.DialService = &DialService{}

By assigning our implementation to a global variable, the compiler will verify that it implements the type of the variable. We use underscore as the name to indicate that we’re not actually going to use the variable.

Testing our implementation

If you’ve ever done testing against a database server, you probably know how painful it can be to set up your environment and how slow it is to run tests. BoltDB, however, has no setup (besides “go get”) and it’s incredibly fast. Since it just uses a single file, you can easily parallelize BoltDB tests.

For these unit tests, I only use the built-in testing package for testing but I use a handful of tricks to make it easier. We’ll take a look at these one by one.

Underscore test package

One feature of “go test” is that you can put your test files in a separate package from your non-test files even though they’re in the same directory. The name of your test package is always your package name + “_test”. For example, our bolt package’s test package is bolt_test.

There is debate in the community as to whether this is a good idea or not but I personally love splitting my packages. By using a separate test package, it ensures that I am testing my code only through its exported API. It makes me an end user of my own code and I find it makes me design my APIs better.

Test wrappers

One side benefit to using a test package is that you can make test wrappers for your types without having naming conflicts. These wrappers embed my types as anonymous fields so I can use the wrappers just as I would use the regular types themselves.

I use test wrappers in two cases:

  1. The type has interface fields that I want to mock.
  2. The type needs setup/teardown for the test.

Let’s look at the code itself to see how this looks in practice.

Client test wrapper

The bolt.Client is our wrapper around the BoltDB database itself. It has two pieces I want to mock: authentication & time. It also requires setup & teardown since we need to create a temporary database file and then remove it when our test is complete.

Our wrapper looks like this:
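A sketch, with a hand-rolled function-based mock for the Authenticator (the mock type here is illustrative):

// Authenticator is a mock that delegates to an injectable function.
type Authenticator struct {
    AuthenticateFn func(token string) (*wtf.User, error)
}

func (a *Authenticator) Authenticate(token string) (*wtf.User, error) {
    return a.AuthenticateFn(token)
}

// Client is a test wrapper around bolt.Client.
type Client struct {
    *bolt.Client

    // Mocked services.
    Authenticator Authenticator
}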

First, note that our *bolt.Client is an anonymous field inside *bolt_test.Client. The compiler is smart enough to treat calls to this test wrapper like calls to the anonymous field.

Next, we embed a mock implementation of Authenticator inside our wrapper. We’ll attach this to our anonymous field’s Authenticator interface in the constructor.

Client wrapper construction

Our constructor wires everything together:
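Sketched out, with the steps below marked inline:

// NewClient returns a new client wrapper pointed at a temporary
// path, with a fixed clock and a mocked authenticator.
func NewClient() *Client {
    // 1. Generate a temporary file & close its handle.
    f, err := ioutil.TempFile("", "wtf-bolt-")
    if err != nil {
        panic(err)
    }
    f.Close()

    // 2. Create the wrapper with the real client inside.
    c := &Client{Client: &bolt.Client{}}

    // 3. Point the client at the temporary path.
    c.Path = f.Name()

    // 4. Fix the clock at midnight on 2000-01-01 UTC.
    now := time.Date(2000, time.January, 1, 0, 0, 0, 0, time.UTC)
    c.Now = func() time.Time { return now }

    // 5. Attach the mock to the underlying client's interface field.
    c.Client.Authenticator = &c.Authenticator

    return c
}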

Let’s take this step by step:

  1. First we generate a temporary file using ioutil.TempFile(). If this fails then just panic. TempFile() returns a file handle so we need to close it before handing off the underlying path to the Client.
  2. Next we’ll create the Client wrapper with the implementation inside. Anonymous fields have an implicit name based on their type name, so we reference the field using “Client”.
  3. Next we assign our temporary path to the client.
  4. Then we set the mocked Now() function to return a local variable we declare as midnight on 2000-01-01 UTC.
  5. Finally we need to assign our mock Authenticator to our underlying client’s Authenticator interface field. This will let us set mock functions during testing.

Setup & teardown

The last part of our Client test wrapper is the setup & teardown. We don’t want to waste a bunch of space in our tests setting up and removing our database so we can move it to our test wrapper.

First, we’ll have a MustOpenClient() which returns a new, opened Client test wrapper:
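It panics rather than returning an error:

// MustOpenClient returns a new, opened client. Panics on error.
func MustOpenClient() *Client {
    c := NewClient()
    if err := c.Open(); err != nil {
        panic(err)
    }
    return c
}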

I simply panic if the Open() fails because this action will be performed on every test. If it fails on one test then it’ll fail on all of them so the test suite should just stop immediately.

Next, we’ll wrap our Close() function so that it cleans up automatically:
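This assumes the underlying bolt.Client has its own Close() that closes the *bolt.DB handle:

// Close closes the underlying client and removes the database file.
func (c *Client) Close() error {
    defer os.Remove(c.Path)
    return c.Client.Close()
}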

This performs a little trick with the defer. The underlying Client implementation will close first and set the returned error. Since the Remove() is deferred, it will run afterward and clean up the database file.

Testing the dial service

Dial creation

The first thing we want to test is that we can create and retrieve a Dial.
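Here’s a sketch of the whole test:

// Ensure a dial can be created and retrieved.
func TestDialService_CreateDial(t *testing.T) {
    // 1. Set up the client & grab a session's dial service.
    c := MustOpenClient()
    defer c.Close()
    s := c.Connect().DialService()

    // 2. Mock authentication to return a user with ID 123.
    c.Authenticator.AuthenticateFn = func(token string) (*wtf.User, error) {
        return &wtf.User{ID: 123}, nil
    }

    // 3. A dial with some basic data.
    d := wtf.Dial{Name: "MY DIAL", Level: 50}

    // 4. Create the dial & verify the assigned IDs.
    if err := s.CreateDial(&d); err != nil {
        t.Fatal(err)
    } else if d.ID != 1 {
        t.Fatalf("unexpected id: %d", d.ID)
    } else if d.UserID != 123 {
        t.Fatalf("unexpected user id: %d", d.UserID)
    }

    // 5. Verify the dial was persisted.
    if other, err := s.Dial(d.ID); err != nil {
        t.Fatal(err)
    } else if !reflect.DeepEqual(&d, other) {
        t.Fatalf("unexpected dial: %#v", other)
    }
}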

Let’s walk through the code:

  1. First we’ll setup our client and defer its teardown. We’ll also create a session and grab a reference to its DialService.
  2. Next we’ll mock our authentication so that it returns a user with the ID of 123. Note that we could also mock the case where authentication fails by returning an error here instead.
  3. We’ll create a dial variable with some basic data.
  4. Next we’ll create the dial through the service’s CreateDial() method and verify any changes we expect to the dial. In this case we expect a new dial ID to be set and for the authenticated user’s ID to be set.
  5. We want to make sure our dial was persisted so we’ll call DialService.Dial() to retrieve it from the database. We can use reflect.DeepEqual() to compare the retrieved dial with our original instance.

One convention that I use that differs from the recommended standard is that I don’t use “actual/expected” error messages. Instead, I write that there is an unexpected value and show the actual value but I don’t print the expected value. When I debug test cases, I always end up going to the line of the test failure which contains the expected value. I don’t see a point in adding it to the message. It simply clutters the code, in my opinion.

Another convention that seems debated is the use of “if/else if”. I find that grouping related conditionals helps readability tremendously. Adding additional vertical spacing by separating different if blocks is harder to digest.

Setting levels

Next we’ll test updating the levels of existing dials:
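Something like this:

// Ensure dial levels can be updated independently.
func TestDialService_SetLevel(t *testing.T) {
    c := MustOpenClient()
    defer c.Close()
    s := c.Connect().DialService()

    // All calls authenticate as user 123.
    c.Authenticator.AuthenticateFn = func(token string) (*wtf.User, error) {
        return &wtf.User{ID: 123}, nil
    }

    // Create two dials.
    d1, d2 := wtf.Dial{Name: "A"}, wtf.Dial{Name: "B"}
    if err := s.CreateDial(&d1); err != nil {
        t.Fatal(err)
    } else if err := s.CreateDial(&d2); err != nil {
        t.Fatal(err)
    }

    // Update each dial separately.
    if err := s.SetLevel(d1.ID, 80); err != nil {
        t.Fatal(err)
    } else if err := s.SetLevel(d2.ID, 25); err != nil {
        t.Fatal(err)
    }

    // Verify each level was persisted.
    if d, err := s.Dial(d1.ID); err != nil {
        t.Fatal(err)
    } else if d.Level != 80 {
        t.Fatalf("unexpected level: %f", d.Level)
    }
    if d, err := s.Dial(d2.ID); err != nil {
        t.Fatal(err)
    } else if d.Level != 25 {
        t.Fatalf("unexpected level: %f", d.Level)
    }
}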

The comments should be self-explanatory. We create two dials, update each one separately, and then finally verify that each one was set appropriately.

Testing error conditions

The great thing about mocks is that we can isolate our tests to only our package. We can inject failures in the mocks and test error conditions easily.

Here we’ll look at what happens when a user tries to update the level of a dial that she did not create:
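A sketch with two sessions:

// Ensure a user cannot set the level of another user's dial.
func TestDialService_SetLevel_ErrUnauthorized(t *testing.T) {
    c := MustOpenClient()
    defer c.Close()

    // First session: authenticate as user 123 & create a dial.
    c.Authenticator.AuthenticateFn = func(token string) (*wtf.User, error) {
        return &wtf.User{ID: 123}, nil
    }
    d := wtf.Dial{Name: "MY DIAL"}
    if err := c.Connect().DialService().CreateDial(&d); err != nil {
        t.Fatal(err)
    }

    // Second session: authenticate as a different user.
    c.Authenticator.AuthenticateFn = func(token string) (*wtf.User, error) {
        return &wtf.User{ID: 456}, nil
    }

    // Setting the level should fail with ErrUnauthorized.
    if err := c.Connect().DialService().SetLevel(d.ID, 100); err != wtf.ErrUnauthorized {
        t.Fatalf("unexpected error: %v", err)
    }
}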

We create one session that creates the dial and then in the second session we mock the authentication with a different user. Our service returns a wtf.ErrUnauthorized which we can check for equality because we’re using constant, string-based errors.

Nuimo integration

Peter Bourgon had a great suggestion to use the Nuimo device as an input for the WTF Dial. I ordered mine and just received it the other day (woohoo!). It looks and feels awesome.

The fine folks at Senic (who make the Nuimo) graciously offered a discount code of “WTFDIAL” for the first few orders in case anyone is interested in one as well.

We’ll be looking at building other clients for WTF Dial so you certainly don’t have to purchase this device.

Conclusion

We’ve looked at building a simple CRUD interface using BoltDB to manage our Dial data. We broke it out into a Client to represent the database handle, a Session to provide authentication, and services to provide the interface implementation.

We also dove into how to efficiently test this setup using test wrappers and mocks. We saw how we can use anonymous fields to wrap our types and provide additional test-only functionality such as setup & teardown.

This was a lot to take in about key/value stores and testing mechanics but I hope it provided some insight into how to construct real world applications using embedded data storage.

Questions or feedback? Find me at @benbjohnson on Twitter.
