All about Proto DataStore

Simona Milanović
Android Developers
9 min read · Jan 31, 2022


In this post, we will learn about Proto DataStore, one of two DataStore implementations. We will discuss how to create it, read and write data and how to handle exceptions, to better understand the scenarios that make Proto a great choice.

Proto DataStore uses typed objects backed by Protocol Buffers to store smaller datasets while providing type safety. It removes the need for key-value pairs, which makes it structurally different from its predecessor, SharedPreferences, and from its sibling implementation, Preferences DataStore. That’s not all, however: DataStore brings many other improvements over SharedPreferences. Feel free to jump back to our first post in the series for the detailed comparison we made there. Going forward, we will refer to Proto DataStore as just Proto, unless specified otherwise.

To sum up, DataStore:

  • Provides a fully asynchronous API for retrieving and saving data, using the power of Kotlin coroutines
  • Does not offer ready-to-use synchronous support — it directly avoids doing any work that blocks the UI thread
  • Relies on Flow’s inner error signalling mechanism, allowing you to safely catch and handle exceptions when reading or writing data
  • Handles data updates safely in an atomic read-modify-write operation, providing strong ACID guarantees
  • Allows easy and simple data migrations
  • With Proto, offers full type safety and support for more complex types, such as enums or lists, which isn’t possible with Preferences; choose Proto when you need them

Intro to Protocol Buffers

To use Proto DataStore, you need to get familiar with Protocol Buffers — a language-neutral, platform-neutral mechanism for serializing structured data. It is faster, smaller, simpler and less ambiguous than XML and easier to read than other similar data formats.

You define a schema of how you want your data to be structured and specify options such as which language to use for code generation. The compiler then generates classes according to your specifications. This lets you easily write and read the structured data to and from a variety of data streams, and share it across platforms and languages, including Kotlin.

Example schema of some data in a .proto file:
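As a rough illustration only (this is not the schema used later in the post), a small message might look like the sketch below. The Person message and its fields are hypothetical names chosen for this example:

```protobuf
syntax = "proto3";

// Hypothetical message, used only to illustrate the schema format.
message Person {
  string name = 1;   // each field has a type, a name and a unique tag number
  int32 id = 2;
  string email = 3;
}
```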

How to use the generated Kotlin code for constructing your data model:
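A minimal sketch, assuming the compiler has generated a Person class from a hypothetical Person message with name, id and email fields. The generated class exposes a builder that can be called from Kotlin:

```kotlin
// Assumption: `Person` is a class generated by the protobuf compiler from a
// hypothetical message with `name`, `id` and `email` fields.
val john: Person = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build()
```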

Or you can try out the newly announced Kotlin DSL support for protocol buffers for a more idiomatic way of building your data model:
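With Kotlin code generation enabled, the same hypothetical Person message could be built roughly like this. The `person { }` function is the builder the Kotlin DSL generates for a message named Person; the field names are assumptions made for this example:

```kotlin
// Assumption: Kotlin DSL generation is enabled for a hypothetical `Person`
// message, which provides the `person { }` builder function.
val john: Person = person {
    id = 1234
    name = "John Doe"
    email = "jdoe@example.com"
}
```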

Investing a bit more time into learning this new serialization mechanism is definitely worth it as it brings type safety, improved readability and overall code simplicity.

Proto DataStore dependency setup

Now let’s look at some code and learn how Proto works.

We will use the Proto DataStore codelab sample. If you’re interested in a more hands-on approach with implementation, we really encourage you to go through the Working with Proto DataStore codelab on your own.

This sample app displays a list of tasks and the user can choose to filter them by their completed status or sort by priority and deadline. We want to store their selection — a boolean for displaying completed tasks and a sort order enum in Proto.

First, we will add the Proto dependencies and some basic protobuf settings to our module’s build.gradle. If you’re interested in more advanced customisation of the protobuf compilation, check out the Protobuf Plugin for Gradle notes:
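A setup along these lines is one possibility. Treat it as a sketch: the version numbers are assumptions and should be replaced with the current releases, and the plugin entry sits alongside your existing Android plugins:

```groovy
plugins {
    // Protobuf Gradle plugin (version is an assumption)
    id "com.google.protobuf" version "0.8.17"
}

dependencies {
    // Proto DataStore and the lite protobuf runtime (versions are assumptions)
    implementation "androidx.datastore:datastore:1.0.0"
    implementation "com.google.protobuf:protobuf-javalite:3.18.0"
}

protobuf {
    protoc {
        // The protobuf compiler used to generate code from .proto files
        artifact = "com.google.protobuf:protoc:3.18.0"
    }
    // Generate the Java "lite" classes for every .proto file in this module
    generateProtoTasks {
        all().each { task ->
            task.builtins {
                java {
                    option 'lite'
                }
            }
        }
    }
}
```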

💡 Quick tip — if you want to minify your build, make sure to add an additional rule to your proguard-rules.pro file to prevent your fields from being deleted:
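A rule of the following shape keeps the fields of generated lite messages. It assumes the protobuf-javalite runtime, whose generated classes extend com.google.protobuf.GeneratedMessageLite:

```text
# Keep the fields of all generated protobuf lite messages
-keepclassmembers class * extends com.google.protobuf.GeneratedMessageLite {
    <fields>;
}
```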

Protobuf setup for Proto DataStore

Our journey with Proto starts by defining the structure of your persisted data in a .proto file. Think of it as a readable schema for you and a blueprint for the compiler. We will name ours user_prefs.proto and add it to the app/src/main/proto directory.

Following the Protobuf language guide, in this file we will add a message for each data structure we want to serialize, then specify a name and a type for each field in the message. To help visualize this, let’s look at both a Kotlin data class and a corresponding protobuf schema.

UserPreferences — Kotlin data class:
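A plain Kotlin model of the same data could be sketched as follows. The enum constants other than NONE are assumptions based on the sample app's sorting options:

```kotlin
// Plain Kotlin model, shown only for comparison with the protobuf schema.
// Constant names other than NONE are assumptions made for this sketch.
enum class SortOrder {
    NONE,
    BY_DEADLINE,
    BY_PRIORITY,
    BY_DEADLINE_AND_PRIORITY
}

data class UserPreferences(
    val showCompleted: Boolean, // whether completed tasks are displayed
    val sortOrder: SortOrder    // the sort order selected by the user
)
```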

UserPreferences .proto schema:
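A schema matching that description might look like the sketch below. The java_package value and the enum constants other than NONE are assumptions; proto3 requires the first enum constant to be zero, which is why an UNSPECIFIED entry is included:

```protobuf
syntax = "proto3";

// Assumption: adjust the package to match your own project.
option java_package = "com.codelab.android.datastore";
option java_multiple_files = true;

message UserPreferences {
  // Filter for showing or hiding completed tasks
  bool show_completed = 1;

  // Available task sort orders (names other than NONE are assumptions)
  enum SortOrder {
    UNSPECIFIED = 0;
    NONE = 1;
    BY_DEADLINE = 2;
    BY_PRIORITY = 3;
    BY_DEADLINE_AND_PRIORITY = 4;
  }

  // Sort order selected by the user
  SortOrder sort_order = 2;
}
```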

If you haven’t used protobufs before, you might also be curious about the first few lines in the schema. Let’s break them down:

  • syntax — specifies that you’re using proto3 syntax
  • java_package — file option that specifies package declaration for your generated classes, which helps prevent naming conflicts between different projects
  • java_multiple_files — file option that specifies whether only a single file with nested subclasses will be generated for this .proto (when set to false) or if separate files will be generated for each top-level message type (when set to true); it is false by default

Next is our message definition. A message is an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types or enums as field types, like we did with the SortOrder enum.

The = 1, = 2 markers on each element identify the unique “tag” that the field uses in the binary encoding, like an ID of sorts. Once your message type is in use, these numbers should not be changed.

When you run the protocol buffer compiler on a .proto, the compiler generates the code in your chosen language. In our specific case, running the compiler generates the UserPreferences class, found in your app’s build/generated/source/proto… directory.

💡 Quick tip — You can also try out the newly announced Kotlin DSL support for protocol buffers to use a more idiomatic way of building your data model.

Now that we have UserPreferences, we need to specify the guidelines for how Proto should read and write them. We do this via the DataStore Serializer that determines the final format of your data when stored and how to properly access it. This requires overriding:

  • defaultValue — what to return if no data is found
  • writeTo — how to transform the memory representation of our data object into a format fit for storage
  • readFrom — inverse from the above, how to transform from a storage format into a corresponding memory representation

To keep your code as safe as possible, handle the CorruptionException to avoid unpleasant surprises when a file cannot be de-serialized due to format corruption.
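A serializer along these lines is one way to do it. This is a sketch that assumes UserPreferences is the class generated from the .proto schema; the object name UserPreferencesSerializer is chosen for this example:

```kotlin
import androidx.datastore.core.CorruptionException
import androidx.datastore.core.Serializer
import com.google.protobuf.InvalidProtocolBufferException
import java.io.InputStream
import java.io.OutputStream

// Assumption: `UserPreferences` is the class generated from the .proto schema.
object UserPreferencesSerializer : Serializer<UserPreferences> {

    // Returned when no data has been stored yet.
    override val defaultValue: UserPreferences = UserPreferences.getDefaultInstance()

    // Parses the stored bytes; a parsing failure is reported as a CorruptionException.
    override suspend fun readFrom(input: InputStream): UserPreferences {
        try {
            return UserPreferences.parseFrom(input)
        } catch (exception: InvalidProtocolBufferException) {
            throw CorruptionException("Cannot read proto.", exception)
        }
    }

    // Writes the in-memory object to the output stream in protobuf binary format.
    override suspend fun writeTo(t: UserPreferences, output: OutputStream) = t.writeTo(output)
}
```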

💡 Quick tip — If at any point Android Studio is unable to find anything UserPreferences related, clean and rebuild your project to initiate the generation of the protobuf classes.

Creating a Proto DataStore

You interact with Proto through an instance of DataStore<UserPreferences>. DataStore is an interface that grants access to the persisted information, in our case in the form of the generated UserPreferences.

To create this instance, it is recommended to use the delegate dataStore and pass mandatory fileName and serializer arguments:
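For example, the delegate can be declared as sketched here. The file name and the UserPreferencesSerializer object (any Serializer<UserPreferences> implementation) are assumptions made for this example:

```kotlin
import android.content.Context
import androidx.datastore.core.DataStore
import androidx.datastore.dataStore

// Assumption: the file name is illustrative.
private const val DATA_STORE_FILE_NAME = "user_prefs.pb"

// Declared once, at the top level of a Kotlin file.
// Assumption: `UserPreferencesSerializer` is a Serializer<UserPreferences>.
private val Context.userPreferencesStore: DataStore<UserPreferences> by dataStore(
    fileName = DATA_STORE_FILE_NAME,
    serializer = UserPreferencesSerializer
)
```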

fileName is used to create a File used to store the data. This is why the dataStore delegate is a Kotlin extension property whose receiver type must be an instance of Context, as this is needed for the File creation via applicationContext.filesDir. Avoid using this file in any way outside of Proto, as it would break the consistency of your data.

In the dataStore delegate, you can pass one more optional argument — corruptionHandler. This handler is invoked if a CorruptionException is thrown by the serializer when the data cannot be de-serialized. corruptionHandler would then instruct Proto how to replace the corrupted data:
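A sketch of a delegate that falls back to the default instance when the file is corrupted, using ReplaceFileCorruptionHandler. The file name and UserPreferencesSerializer are assumptions made for this example:

```kotlin
import android.content.Context
import androidx.datastore.core.DataStore
import androidx.datastore.core.handlers.ReplaceFileCorruptionHandler
import androidx.datastore.dataStore

// Assumptions: the file name is illustrative and `UserPreferencesSerializer`
// is a Serializer<UserPreferences>.
private val Context.userPreferencesStore: DataStore<UserPreferences> by dataStore(
    fileName = "user_prefs.pb",
    serializer = UserPreferencesSerializer,
    // Invoked when the serializer throws a CorruptionException;
    // the value produced here replaces the corrupted data.
    corruptionHandler = ReplaceFileCorruptionHandler(
        produceNewData = { UserPreferences.getDefaultInstance() }
    )
)
```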

You shouldn’t create more than one instance of DataStore for a given file, as doing so can break all DataStore functionality. Therefore, add the delegate construction once at the top level of your Kotlin file and use it throughout your application, so that it effectively behaves as a singleton. In later posts, we will see how to do this with dependency injection.

Reading data

To read the stored data, in UserPreferencesRepository we expose a Flow<UserPreferences> from userPreferencesStore.data. This provides efficient access to the latest saved state and emits with every change. This is one of the biggest strengths of Proto — your Flow’s values already come in the shape of the generated UserPreferences. This means you don’t have to do any additional transformations from the saved data into a Kotlin data class model, like you would with SharedPreferences or Preferences DataStore:
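In its simplest form, the repository could expose the Flow as sketched below. The constructor parameter and the property name userPreferencesFlow are assumptions about how the repository is wired up:

```kotlin
import androidx.datastore.core.DataStore
import kotlinx.coroutines.flow.Flow

// Assumption: `UserPreferences` is the class generated from the .proto schema.
class UserPreferencesRepository(
    private val userPreferencesStore: DataStore<UserPreferences>
) {
    // Emits the latest stored UserPreferences and every subsequent change.
    val userPreferencesFlow: Flow<UserPreferences> = userPreferencesStore.data
}
```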

The Flow will always either emit a value or throw an exception when attempting to read from disk. We will look at exception handling in later sections. By default, DataStore also performs its work on Dispatchers.IO, so your UI thread isn’t blocked.

🚨 Do not create any cache repositories to mirror the current state of your Proto data. Doing so would invalidate DataStore’s guarantee of data consistency. If you require a single snapshot of your data without subscribing to further Flow emissions, prefer using userPreferencesStore.data.first():
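A one-off read could be sketched like this. The function name fetchInitialPreferences is an assumption made for this example:

```kotlin
import androidx.datastore.core.DataStore
import kotlinx.coroutines.flow.first

// Assumption: `UserPreferences` is the class generated from the .proto schema.
class UserPreferencesRepository(
    private val userPreferencesStore: DataStore<UserPreferences>
) {
    // Reads the current value once, without collecting further emissions.
    suspend fun fetchInitialPreferences(): UserPreferences =
        userPreferencesStore.data.first()
}
```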

Writing data

For writing data, we will use a suspend DataStore<UserPreferences>.updateData(transform: suspend (t: T) -> T) function.

Let’s break that down:

  • DataStore<UserPreferences> interface — we’re currently using userPreferencesStore as the concrete Proto implementation
  • transform: suspend (t: T) -> T — a suspend block used to apply the specified changes to our persisted data of type T

Again, you might notice a difference from Preferences DataStore, which relies on Preferences and MutablePreferences, similar to Map and MutableMap, as the default data representation.

We can now use this to change our showCompleted boolean. Protocol buffers simplify this as well, removing the need for any manual transformation from and to data classes:
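A sketch of such an update, assuming the generated UserPreferences class has a show_completed field. The function name updateShowCompleted is chosen for this example:

```kotlin
import androidx.datastore.core.DataStore

// Assumption: `UserPreferences` is generated from a schema with a
// `show_completed` boolean field.
class UserPreferencesRepository(
    private val userPreferencesStore: DataStore<UserPreferences>
) {
    suspend fun updateShowCompleted(completed: Boolean) {
        userPreferencesStore.updateData { currentPreferences ->
            // Build a modified copy; the returned value is what gets persisted.
            currentPreferences.toBuilder()
                .setShowCompleted(completed)
                .build()
        }
    }
}
```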

There are a few steps to analyze:

  • toBuilder() — gets the Builder version of our currentPreferences which “unlocks” it for changes
  • .setShowCompleted(completed) — sets the new value
  • .build() — finishes the update process by converting it back to UserPreferences

Updating data is done transactionally in an atomic read-modify-write operation. This means that the specific order of data processing operations, during which the data is locked for other threads, guarantees consistency and prevents race conditions. Only after the transform and updateData coroutines complete successfully is the data durably persisted to disk and the update reflected in the userPreferencesStore.data Flow.

🚨 Keep in mind that this is the only way of making changes to the DataStore state. Keeping a UserPreferences reference and mutating it manually after transform completes will not change the persisted data in Proto, so you shouldn’t attempt to modify UserPreferences outside of the transform block.

If the writing operation fails for any reason, the transaction is aborted and an exception is thrown.

Migrate from SharedPreferences

If you’ve previously used SharedPreferences in your app and would like to safely transfer its data to Proto, you can use SharedPreferencesMigration. It requires a context, the SharedPreferences name and an instruction on how to transform your SharedPreferences key-value pairs to UserPreferences within the migrate parameter. Pass this via the produceMigrations parameter of the dataStore delegate to migrate easily:
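A migration could be sketched as follows. The SharedPreferences name, the key, the file name, UserPreferencesSerializer and the UNSPECIFIED and NONE enum constants are assumptions made for this example:

```kotlin
import android.content.Context
import androidx.datastore.core.DataStore
import androidx.datastore.dataStore
import androidx.datastore.migrations.SharedPreferencesMigration
import androidx.datastore.migrations.SharedPreferencesView

// Assumptions: these names are illustrative and must match your existing SharedPreferences.
private const val USER_PREFERENCES_NAME = "user_preferences"
private const val SORT_ORDER_KEY = "sort_order"

private val Context.userPreferencesStore: DataStore<UserPreferences> by dataStore(
    fileName = "user_prefs.pb",
    serializer = UserPreferencesSerializer,
    produceMigrations = { context ->
        listOf(
            SharedPreferencesMigration(
                context,
                USER_PREFERENCES_NAME
            ) { sharedPrefs: SharedPreferencesView, currentData: UserPreferences ->
                // Only migrate when no sort order has been stored in Proto yet.
                if (currentData.sortOrder == UserPreferences.SortOrder.UNSPECIFIED) {
                    val storedName = sharedPrefs.getString(SORT_ORDER_KEY)
                        ?: UserPreferences.SortOrder.NONE.name
                    currentData.toBuilder()
                        .setSortOrder(UserPreferences.SortOrder.valueOf(storedName))
                        .build()
                } else {
                    currentData
                }
            }
        )
    }
)
```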

In this example, we go through the process of building the UserPreferences and setting its sortOrder to what was previously stored in the corresponding SharedPreferences key-value pair, or simply defaulting to NONE.

produceMigrations ensures that migrate() is run before any potential data access to DataStore. This means your migration must have succeeded before DataStore emits any further values and before it begins making any new changes to the data. Once you’ve successfully migrated, it’s safe to stop using SharedPreferences, as the keys are migrated only once and then removed from SharedPreferences.

produceMigrations accepts a list of DataMigration. We will see in later posts how we can use this for other types of data migrations. If you don’t need to migrate, you can ignore this parameter, as it defaults to an empty listOf().

Exception handling

One of the main advantages of DataStore over SharedPreferences is its neat mechanism for catching and handling exceptions. While SharedPreferences throws parsing errors as runtime exceptions, leaving room for unexpected, uncaught crashes, DataStore throws an IOException when an error occurs with reading/writing data.

We can safely handle this by using the catch() Flow operator and emitting getDefaultInstance():
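A sketch of this pattern on the read side, which falls back to the default instance on an IOException and re-throws anything else. The repository shape and property name are assumptions made for this example:

```kotlin
import androidx.datastore.core.DataStore
import java.io.IOException
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.catch

// Assumption: `UserPreferences` is the class generated from the .proto schema.
class UserPreferencesRepository(
    private val userPreferencesStore: DataStore<UserPreferences>
) {
    val userPreferencesFlow: Flow<UserPreferences> = userPreferencesStore.data
        .catch { exception ->
            if (exception is IOException) {
                // Reading failed: fall back to the default instance.
                emit(UserPreferences.getDefaultInstance())
            } else {
                // Any other exception is unexpected, so re-throw it.
                throw exception
            }
        }
}
```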

Or with a simple try-catch block on writing:
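On the write side, this could look like the sketch below. How the failure is handled (here, logging it) is an assumption; choose whatever suits your app:

```kotlin
import android.util.Log
import androidx.datastore.core.DataStore
import java.io.IOException

// Assumption: `UserPreferences` is generated from a schema with a
// `show_completed` boolean field.
class UserPreferencesRepository(
    private val userPreferencesStore: DataStore<UserPreferences>
) {
    suspend fun updateShowCompleted(completed: Boolean) {
        try {
            userPreferencesStore.updateData { currentPreferences ->
                currentPreferences.toBuilder()
                    .setShowCompleted(completed)
                    .build()
            }
        } catch (exception: IOException) {
            // The write failed and nothing was persisted; only IOException is
            // caught here, so other exceptions still propagate to the caller.
            Log.e("UserPreferencesRepo", "Failed to update user preferences", exception)
        }
    }
}
```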

If a different type of exception is thrown, prefer re-throwing it.

To be continued

We’ve covered Protocol Buffers and DataStore’s Proto implementation — when and how to use it for reading and writing data, how to handle errors and how to migrate from SharedPreferences. In the next and final post, we will go a step further and look at how DataStore fits in your app’s architecture, how to inject it with Hilt and of course, how to test it. See you soon!

You can find all posts from our Jetpack DataStore series here:
Introduction to Jetpack DataStore
All about Preferences DataStore
All about Proto DataStore
DataStore and dependency injection
DataStore and Kotlin serialization
DataStore and synchronous work
DataStore and data migration
DataStore and testing


Simona Milanović
Android Developers

Android Developer Relations Engineer @Google, working on Jetpack Compose