Cloud Firestore: On data constraints and evolvability

Hiranya Jayathilaka
Firebase Developers
12 min read · Mar 2, 2020

Cloud Firestore is Google’s highly scalable, NoSQL database for mobile and web applications. On top of a document-oriented data model, Firestore offers an array of powerful features including realtime event listeners, atomic transactions, and offline support.

Because Firestore is a NoSQL database, many developers assume that you cannot enforce any integrity constraints on its data. But that is not entirely correct. In this article we look at Firestore’s data model, the constraints we can apply to Firestore data, and the programming semantics that result from those constraints. We also explore how to change Firestore data constraints over time without disrupting the applications that depend on the data.

Schemaless? Doesn’t have to be.

Firestore and other similar NoSQL databases are typically dubbed schemaless databases. But as Martin Kleppmann points out in the book Designing Data-Intensive Applications, this term is a bit of a misnomer. Any application that queries data usually assumes some kind of structure. Without one, it’s impossible to implement any meaningful queries or application logic that interacts with the data. Consider the code fragment in listing 1 as an example.

Listing 1: Implicit structure assumed while reading data
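A sketch of this kind of code (Android/Java), assuming a cities collection and a minimal City bean; the exact code may differ:

import com.google.firebase.firestore.DocumentSnapshot;
import com.google.firebase.firestore.FirebaseFirestore;

public class CityQuery {

    // A hypothetical POJO mirroring the structure we expect in each document.
    public static class City {
        public String name;
        public long population;
        public City() {} // no-arg constructor required by toObject()
    }

    public static void findLargeCities() {
        FirebaseFirestore db = FirebaseFirestore.getInstance();
        db.collection("cities")
                .whereGreaterThan("population", 1000000) // assumes a numeric field
                .get()
                .addOnSuccessListener(snapshot -> {
                    for (DocumentSnapshot doc : snapshot.getDocuments()) {
                        // Unmarshalling assumes the document matches City.
                        City city = doc.toObject(City.class);
                    }
                });
    }
}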

Listing 1 queries Firestore for all cities with a population over one million. The results are then unmarshalled into the custom City class. The City class captures the implicit structure the developer expects to be present in the documents read from Firestore, and the query itself assumes a numeric population field in each document. This demonstrates that there is always some notion of a schema in play when writing code that interacts with data. The term schemaless is often just an indication that the underlying database does not enforce that structure.

Kleppmann uses the terms schema-on-write and schema-on-read to disambiguate this idea. Traditional relational databases are schema-on-write systems, in which every write operation is verified against a predefined schema. Firestore is usually a schema-on-read system where there’s an implicit structure present in the data, which is only interpreted when the data is queried by an application.

Schema-on-read systems are generally more flexible and adaptable to the variety in real-world data than schema-on-write systems. Suppose you defined your city name column to be VARCHAR(32) in a relational database, only to come across a city name 58 characters long! Your only options are to either store a truncated city name or perform a potentially expensive schema alteration. Schema-on-read systems handle such edge cases much more gracefully, often without any effort on the developer’s part. However, app development is usually about guarantees, and schema-on-write systems undeniably give stronger guarantees about the structure, correctness and completeness of the data.

Firestore is typically a schema-on-read database. Security rules allow adding schema-on-write semantics to Firestore.

Thankfully, Firestore doesn’t lock you into the realm of schema-on-read. Security rules enable you to implement schema-on-write semantics in your Firestore databases. This prevents applications from writing malformed or incomplete data to Firestore, which in turn gives applications stronger guarantees about the data when querying. Listing 2 shows an example rules configuration that we can use to validate the city documents. This configuration requires every city document being written to have a population field with a positive numeric value, and a name field with a non-empty string value.

Listing 2: Firestore data validation rules
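A sketch of such a rules configuration; note the 32-character cap on name, which we will revisit later:

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    match /cities/{city} {
      allow read: if true;
      // Schema-on-write: reject malformed or incomplete city documents.
      allow write: if request.resource.data.name is string
                   && request.resource.data.name.size() > 0
                   && request.resource.data.name.size() <= 32
                   && request.resource.data.population is number
                   && request.resource.data.population > 0;
    }
  }
}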

It is highly advisable to implement data validation rules similar to listing 2 in every production Firestore database instance. You do sacrifice some flexibility in the process, but allowing applications to write arbitrarily structured data is almost never a good idea. More variety across your documents makes querying your database that much harder. Validation rules can also help you detect subtle data mutation bugs in the application code, and prevent malicious clients from corrupting the data.

Flexible typing

Firestore allows you to specify validation rules that are flexible regarding the types of your document fields. Take listing 3 for example, which effectively assigns the Integer type to the population field.

Listing 3: Validation rule with flexible field typing
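A sketch of the rule in question; int() attempts a conversion instead of checking the stored type:

match /cities/{city} {
  // Accepts 1000, 1000.0 and '1000' alike: int() coerces the value
  // before the comparison, and the rule fails only if conversion fails.
  allow write: if int(request.resource.data.population) > 0;
}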

Since our rules make use of the int() function, clients may also write float or string values to the population field without violating the schema constraint. That means all of the following values are accepted in the population field:

  • Integer: 1000
  • Float: 1000.0
  • String: '1000'

This is in stark contrast to schema-on-write systems like SQL, where an INT column is strictly constrained to integer values. This flexibility around typing can be quite useful when your data is written by a wide range of clients across different programming languages and platforms. Firestore provides similar type conversion functions for the other primitive types: string(), float(), and bool().

However, note that whenever you allow the type of a field to be flexible, any application logic that queries the documents should be prepared to cope with the resulting variety of values. Listing 4 shows some JavaScript code that reads and displays a city document on a web page.

Listing 4: Managing the type variability in application code
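A sketch of such code (v8-style web SDK; the document ID and page markup are assumptions):

// Type-agnostic: interpolates whatever value is stored, number or string.
function display(city) {
  document.body.innerHTML +=
    `<h2>${city.name}</h2><p>Population: ${city.population}</p>`;
}

// Performs arithmetic, so the value must be normalized first. The schema
// allows '1000' (string) as well as 1000 (int) and 1000.0 (float).
function closeToMillion(city) {
  const population = Number(city.population);
  return population + 100000 >= 1000000;
}

firebase.firestore().collection('cities').doc('tokyo').get().then((snapshot) => {
  const city = snapshot.data();
  display(city);
  console.log(`Close to a million: ${closeToMillion(city)}`);
});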

The display() function is type-agnostic, and simply adds the values to the HTML output. In contrast, the closeToMillion() function performs some arithmetic on the population values. Since our type-flexible schema allows string values like '1000' in the database, we need to perform an explicit type conversion before applying the + operator. This is a good example of the trade-off between flexibility and guarantees. Increased flexibility leads to weaker guarantees regarding the form of the data, which in turn requires us to manage that variability in the application code.

Data constraints

Schema-on-write systems typically enforce four main types of integrity constraints on data. The following list outlines these constraint types along with how they are supported in SQL.

  1. Domain integrity: types (INT, VARCHAR, etc.), NOT NULL constraints, CHECK constraints
  2. Entity integrity: PRIMARY KEY constraints
  3. Referential integrity: FOREIGN KEY constraints
  4. Key integrity: UNIQUE constraints

The following SQL statement demonstrates how these different integrity constraints are used in practice.

CREATE TABLE PERSON (
  UID VARCHAR(128) NOT NULL PRIMARY KEY,
  AGE INTEGER CHECK(AGE >= 18),
  EMAIL VARCHAR(256) CHECK(EMAIL LIKE '%___@___%'),
  SCREEN_NAME VARCHAR(64) NOT NULL UNIQUE,
  COMPANY_ID INT,
  FOREIGN KEY(COMPANY_ID) REFERENCES COMPANY(COMPANY_ID)
);

Firestore security rules naturally lend themselves to expressing domain integrity constraints. The document ID is the closest thing to an entity integrity constraint in Firestore: it is guaranteed to be unique within a collection, and acts as a natural collection-wide primary key. Referential and key integrity constraints are not as obvious to implement in Firestore, but they are certainly not impossible. Listing 5 shows an attempt at implementing all the semantics of the above SQL statement with Firestore security rules.

Listing 5: Domain, entity, referential and key integrity constraints with Firestore
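One possible shape for such rules, assuming users and screenNames collections (the exact listing may differ):

rules_version = '2';
service cloud.firestore {
  match /databases/{database}/documents {
    // Entity integrity: the document ID acts as the primary key (UID).
    match /users/{uid} {
      // Domain integrity: type checks and value constraints.
      allow write: if request.resource.data.age is int
                   && request.resource.data.age >= 18
                   && request.resource.data.email is string
                   && request.resource.data.email.matches('.+@.+')
                   && request.resource.data.screenName is string
                   && request.resource.data.screenName.size() > 0
                   // Referential integrity: the referenced company must exist.
                   && exists(request.resource.data.company)
                   // Key integrity: the screen name must be claimed by this user.
                   && get(/databases/$(database)/documents/screenNames/$(request.resource.data.screenName)).data.uid == uid;
    }
  }
}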

Notice that we are using the DocumentReference data type for the company field, which manifests as a document path during rules evaluation. We can simply look up this reference using the exists() function to enforce the referential integrity constraint. The unique key constraint is enforced by mapping each screen name to a document in a separate screenNames collection. Each of these documents contains a uid field, which can be cross-referenced against the document IDs in the users collection.

You should consider the cost of implementing security rules of this form. Due to our use of the get() and exists() functions, each write requires two additional reads for rules enforcement. These reads count towards your Firestore API quota, and are billed accordingly. Depending on how often the users collection is updated, you may or may not be willing to bear the extra cost. But the technology does support it.

Also keep in mind that what’s good in the relational+SQL world may not be idiomatic in the Firestore world. Therefore consider your application requirements carefully, and map them to a document-oriented data model that is both natural and cost-effective for your workload. Your requirements may cover a broad range of topics including the volume of reads and writes, read latency, data access patterns, realtime listeners vs one-off queries, battery life of clients and more. Think about the flexibility and guarantee trade-offs as well. You need enough constraints to prevent writes that can render your data meaningless, but not so many that they become a hindrance.

There are many helpful references on the subject of Firestore data modeling from the Firebase team. Here’s a talk from Todd Kerpelman to get you started.

Schema evolvability without data migration

As your application changes over time, your schema will also have to adapt and evolve. An interesting feature of Firestore security rules is their ability to change without forcing an expensive data migration. To demonstrate this, notice that listing 2 limits the city names to 32 characters (similar to VARCHAR(32) in a relational database). If you later decide to increase this limit to 64 characters, you can do so by simply updating your rules configuration. Your existing data will remain intact, and none of the application code needs to be updated.
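Concretely, the change is a one-line edit to the rule sketched under listing 2; only the length bound moves:

// Before: request.resource.data.name.size() <= 32
// After:
allow write: if request.resource.data.name is string
             && request.resource.data.name.size() > 0
             && request.resource.data.name.size() <= 64
             && request.resource.data.population is number
             && request.resource.data.population > 0;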

You can always update your Firestore data validation rules without touching the data itself. However, note that if your schema change is not backward compatible, it can leave your old data in an inconsistent state. For example, suppose the city documents have a description field limited to 128 characters. If we later lower this limit to 100 characters, existing documents with longer descriptions fall out of compliance with our latest rules. Applications can continue to read these documents as long as they make no assumptions about the field length. But any attempt to update the field will fail if the updated value is longer than 100 characters.

In general you should keep schema changes to a minimum, regardless of their potential impact. Carefully evaluate your schema choices before you implement them, and once implemented, plan to stick with them. But if you ever have to implement a schema change, make sure it is at least backward compatible. That is, any newly written code should not encounter problems while reading or writing old documents. However, backward compatibility is only a necessary condition. Depending on the nature of the schema change, it may not be sufficient to ensure the continued operation of your apps.

Backward compatibility may not be enough

Making sure that a planned schema change is backward compatible is usually easy enough. You can always write new application code that gracefully handles any documents written before the last schema change. What is harder is planning for forward compatibility: making sure that old application code can read and write new documents that were validated against a newer schema.

This is a particularly pressing concern for mobile apps, since the app developer has no control over when users upgrade their apps. For instance, you may have just released v2 of your app along with a related set of schema changes. But if you were not careful, users who are still on v1 may fail to read or write the data generated by the new version of the app. Hence, if you are a mobile app developer, you need to make sure that each schema change you roll out is both backward and forward compatible.

Adding and removing fields

In time you may wish to add new fields to your documents, or remove existing fields that are no longer of use. As long as the fields being added or removed are optional, such changes are both backward and forward compatible. Optional here means that both the Firestore security rules and the application code treat the fields as optional. Provided this condition is satisfied, old versions of the app will simply ignore any new fields, and will not break when a familiar field is absent from newly written documents. Similarly, new versions of the app will not break when they cannot find a new field in old documents. Once you can gracefully handle the addition and removal of individual fields, you can also easily deal with renaming a field, which is just a removal plus an addition.
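In security rules, optional means a field is validated only when it is present. A sketch with a hypothetical nickname field:

match /cities/{city} {
  // nickname is optional: writes may omit it entirely, but if present
  // it must be a non-empty string.
  allow write: if !('nickname' in request.resource.data)
               || (request.resource.data.nickname is string
                   && request.resource.data.nickname.size() > 0);
}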

It is interesting to consider what happens when a new required field is added, or an existing required field is removed. These changes disrupt applications in slightly different ways. Adding a new required field can prevent old versions of the app from writing documents: since the old code is not aware of the new field, it fails to compose documents that satisfy the latest data validation rules. On the other hand, removing a required field can prevent old versions of the app from reading documents written by new versions of the app. In both cases we can maintain backward compatibility with a bit of clever programming; it is forward compatibility that usually suffers in these situations.

As an example, suppose we renamed the required field surname to familyName in a collection. This is the same as removing the surname field and adding the familyName field. Listing 6 shows how a new, backward compatible version of the app can gracefully handle this change.

Listing 6: Programming for backward compatibility
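A sketch of the backward compatible read path (Android/Java; the helper name is an assumption):

import com.google.firebase.firestore.DocumentSnapshot;

public class PersonReader {

    // Prefer the new field, and fall back to the old one for documents
    // written before the rename.
    static String familyNameOf(DocumentSnapshot snapshot) {
        String familyName = snapshot.getString("familyName");
        if (familyName == null) {
            familyName = snapshot.getString("surname");
        }
        return familyName;
    }
}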

Now consider listing 7, which is from an old version of the app. This code fails to create new documents, as it is not aware of the now-required familyName field. Furthermore, it cannot read the documents created by the new code (listing 6), as they do not contain the surname field.

Listing 7: Application that breaks due to the lack of forward compatibility
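A sketch of the old code and its two failure modes (class and method names are assumptions):

import java.util.HashMap;
import java.util.Map;
import com.google.firebase.firestore.DocumentSnapshot;
import com.google.firebase.firestore.FirebaseFirestore;

public class OldPersonCode {

    static void writePerson(FirebaseFirestore db) {
        Map<String, Object> person = new HashMap<>();
        person.put("firstName", "Carol");
        person.put("surname", "Denvers");
        // Rejected: the new rules require a familyName field that this
        // version of the code knows nothing about.
        db.collection("people").document("person2").set(person);
    }

    static String readSurname(DocumentSnapshot snapshot) {
        // Returns null for documents written by the new code, which
        // carry familyName instead of surname.
        return snapshot.getString("surname");
    }
}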

We can mitigate the write failures by relaxing our validation rules to accept either a familyName or a surname field in the documents. This is shown in listing 8.

Listing 8: Relaxed data validation rule for forward compatibility
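A sketch of the relaxed rule:

match /people/{person} {
  // Accept either the new field or the old one, so that old app
  // versions can keep writing.
  allow write: if (request.resource.data.familyName is string
                   && request.resource.data.familyName.size() > 0)
               || (request.resource.data.surname is string
                   && request.resource.data.surname.size() > 0);
}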

But resolving the failing reads is tricky if we haven’t planned for such changes from the beginning. In the case of listing 7, we should have at least programmed our app more defensively, using sensible default values for each field wherever possible (data classes with default field values are a good way to implement this). This of course doesn’t solve all of our forward compatibility woes, but it at least prevents total failures in old versions of the app as the new rules and the corresponding code roll out.

To retain full forward compatibility through our field rename, we can keep writing both familyName and surname fields to each document created by new versions of the app. We can do this in listing 6 itself, and we will have to keep up this duplication until we are confident that all users have migrated off the old version of the app.

Retaining forward compatibility with Remote Config

In this final section we look at a more elaborate solution for implementing forward compatible schema changes. This is not a retroactive fix; rather, it must be in place from the beginning, in anticipation of future schema changes. Most developers won’t need something like this. Nevertheless it is an interesting technique, and a powerful application of Firestore and Firebase Remote Config.

To make this work, we add a schema version field to all of our documents. In the following example this field is simply called v. Suppose that each version of our app knows how to read documents of a specific schema version. We can also assume that the application code is backward compatible, i.e. it can also read documents of older schema versions. If the application encounters a document with a newer schema version, it consults Remote Config for instructions on how to interpret the document. Consider the following two documents for example.

people
|
+---- person1
| +-- v: 1
| +-- firstName: Peter
| `-- surname: Parker
|
`---- person2
+-- v: 2
+-- firstName: Carol
`-- familyName: Denvers

Notice that the two documents have schema versions 1 and 2, respectively. The surname field has been renamed to familyName in the new version. We specify this schema change as a Remote Config parameter named v1_to_v2 with the JSON value {"surname":"familyName"}. Now whenever our V1 application code encounters a V2 document, it can consult Remote Config to figure out how to interpret the data. Listing 9 illustrates this implementation.

Listing 9: Forward compatible schema changes with Remote Config
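A sketch of the getField() mechanism (Android/Java; the class name and version constant are assumptions):

import android.util.Log;
import com.google.firebase.firestore.DocumentSnapshot;
import com.google.firebase.remoteconfig.FirebaseRemoteConfig;
import org.json.JSONException;
import org.json.JSONObject;

public class SchemaAwareReader {

    private static final String TAG = "MainActivity";
    // The newest schema version this build of the app understands.
    private static final long KNOWN_SCHEMA_VERSION = 1;

    String getField(DocumentSnapshot snapshot, String field) {
        Long docVersion = snapshot.getLong("v");
        if (docVersion != null && docVersion > KNOWN_SCHEMA_VERSION) {
            // Parameter name like "v1_to_v2", holding {"surname":"familyName"}.
            String param = "v" + KNOWN_SCHEMA_VERSION + "_to_v" + docVersion;
            String json = FirebaseRemoteConfig.getInstance().getString(param);
            try {
                JSONObject mappings = new JSONObject(json);
                if (mappings.has(field)) {
                    String resolved = mappings.getString(field);
                    Log.d(TAG, "Field resolved: " + field + " --> " + resolved);
                    field = resolved;
                }
            } catch (JSONException e) {
                // No usable mapping; fall back to the original field name.
            }
        }
        return snapshot.getString(field);
    }
}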

This correctly displays the values from both documents, even though the surname field is not present in one of them. The getField() method correctly resolves the surname field to familyName when reading the V2 document:

D/MainActivity: Field resolved: surname --> familyName

This solution can transparently deal with any renamed field. It is also quite efficient, since the schema mappings can be fetched from Remote Config once and cached in memory. If necessary, this solution can easily be extended to support other, more complex schema changes by incorporating additional mapping instructions into the JSON parameter value stored in Remote Config. For example, here’s a hypothetical schema change where a fullName field has been split into separate firstName and lastName fields:

{
  "fullName": {
    "changeType": "split",
    "fields": ["firstName", "lastName"],
    "separator": " "
  }
}

This tells the reader to fetch the firstName and lastName fields from the new schema, and concatenate them to obtain the fullName value of the old schema.

Conclusion

Cloud Firestore is a schema-on-read NoSQL database, but security rules enable us to implement powerful schema-on-write semantics on top of it. With the right combination of rules, Firestore supports all the major types of data integrity constraints, resulting in stronger guarantees for application developers. These strengthened guarantees usually come at the price of reduced flexibility and increased data validation costs, and should therefore be implemented only to meet specific application goals. An overly rigid structure can make testing and long-term maintenance harder for developers.

With integrity constraints also comes the need to revise those constraints to meet changing application needs, i.e. evolvability. Developers must keep backward compatibility in mind when adding, removing or renaming document fields. But if you are a mobile app developer who maintains several versions of the same app, you might also have to think about forward compatibility. That is, old versions of your app should be able to read and write documents that adhere to a newer schema. We discussed a few simple guidelines that help ensure this property, as well as a somewhat advanced solution based on Remote Config that allows developers to program forward compatibility into their applications.
