Efficiently manage large lists in Cloud Firestore

Dana Hartweg
Dec 23, 2019 · 11 min read

We previously discussed methods and techniques for testing Cloud Firestore functions and security rules, and we’ll be expanding on that knowledge in later sections. Give that article a read first, then join us back here.

Overview

Given the above, our criteria are as follows:

  • Provide the minimum amount of data to the client that can be displayed, filtered, and searched through
  • Use as little Cloud Firestore quota as possible
  • Don’t generate documents that exceed storage capabilities
  • Require as little manual intervention as possible

We could consider using a third-party service such as Algolia, but that wouldn’t check all of our boxes and would come with additional fee structures. Plus it does nothing to help our Cloud Firestore quotas.

For this exercise we’re going to explore a technique that uses resources already available to us: client-side logic to manage the retrieved data and Cloud Firestore functions to generate the data.

Data structure

  • There is a root collection of plant varieties available to all homesteads
  • Each homestead has a private collection of plant varieties
  • It should be easy to copy a plant variety between homesteads
  • Plant varieties should track their origin company or homestead

Multiple collection approach

Single collection approach

The chosen approach

Here’s what our root plantVarieties collection looks like:

plantVarieties/{plantVarietyId} {
  homesteadId: string,
  commonName: string,
  variety: string,
  originType: 'company' | 'homestead',
  originId: string,
  originName: string,
  /* other fields that need to be stored */
}

Index structure

  • Root and homestead plant varieties need to be queryable together
  • Any fields that inform sorting, filtering, or searching need to be available
  • Indices can’t be individual documents, as that would give only slight benefit to querying the underlying collection
  • Any data we store should be as lightweight as possible, so the index can contain more information

With all of that in mind, we end up with something like this:

indices/{indexId} {
  indexName: string,
  parentId: 'root' | string,
  isFull: boolean,
  [plantVarietyId: string]: {
    n: string, // the name of the plant variety
    o?: string, // the name of the origin
    oid?: string, // the id of the origin
    l?: boolean, // is the plant local to a homestead
  },
}

  • indexName is non-unique, and used internally to link all related index documents together when querying
  • parentId associates an index with a homestead or the globally accessible root collection
  • isFull tracks whether or not more data can be added into this specific index (more on that in the Cloud Functions section below)
  • All of the other fields in the document are dedicated to storing the actual index data, keyed to the id of the document being indexed
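
To make the shape above a little more concrete, here’s a minimal TypeScript sketch of how an index document could be typed and pulled apart into its meta fields and its entries. The names IndexMeta, IndexEntry, META_KEYS, and splitIndexDocument are my own for illustration and aren’t part of the sample project.

```typescript
// Meta fields every index document carries (names taken from the structure above).
interface IndexMeta {
  indexName: string;
  parentId: string; // 'root' or a homestead id
  isFull: boolean;
}

// One compact entry, stored under the id of the document being indexed.
interface IndexEntry {
  n: string; // the name of the plant variety
  o?: string; // the name of the origin
  oid?: string; // the id of the origin
  l?: boolean; // is the plant local to a homestead
}

// Hypothetical constant: keys that hold meta information rather than entries.
const META_KEYS = ['indexName', 'parentId', 'isFull'];

// Split raw document data into its meta fields and its keyed entries.
function splitIndexDocument(data: { [key: string]: unknown }): {
  meta: IndexMeta;
  entries: { [id: string]: IndexEntry };
} {
  const entries: { [id: string]: IndexEntry } = {};
  for (const key of Object.keys(data)) {
    if (META_KEYS.indexOf(key) === -1) {
      // Assumption: every non-meta key holds an index entry.
      entries[key] = data[key] as IndexEntry;
    }
  }
  return {
    meta: {
      indexName: data['indexName'] as string,
      parentId: data['parentId'] as string,
      isFull: data['isFull'] as boolean,
    },
    entries,
  };
}
```

Keeping the entry keys down to one or two characters is what lets a single document hold more entries before it hits the size limit.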

In fact, the above collection doesn’t have to just contain data for plant varieties. It can contain data for any index we want to establish throughout the application.

Security rules

That leaves us with the following rule. Note that this requires version 2 of Cloud Firestore security rules, since the recursive wildcard {path=**} is not the last segment of the match path.

match /{path=**}/indices/{index} {
  allow get: if isAuthenticated() && (resource.data.parentId == 'root' || hasHomesteadAccess(resource.data.parentId));
  allow list, write: if false;
}

Fun note: this syntax would also support storing the indices as a subcollection and accessing them with a collection group query.

Cloud functions

Creating the root index

Creating the homestead indices

Walking through the above cloud function we are:

  • Watching for any new documents in the homestead collection
  • Creating a batch in which to commit all of the indices
  • Adding every required index into the batch with a new document id
  • Committing the batch and exiting the cloud function
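
The steps above could be sketched roughly like this. It’s not the actual cloud function: BatchLike is a hypothetical stand-in for a Firestore write batch (a real batch takes document references rather than plain ids), newId stands in for generating a document id, and the list of index names is an assumption based on the single index the tests further down expect per homestead.

```typescript
// The meta information a freshly created index document starts with.
interface NewIndexDocument {
  indexName: string;
  parentId: string;
  isFull: boolean;
}

// Hypothetical stand-in for the part of a write batch this sketch needs.
interface BatchLike {
  set(documentId: string, data: NewIndexDocument): void;
}

// Assumption: plant varieties are the only index a homestead needs for now.
const HOMESTEAD_INDEX_NAMES: string[] = ['plant-varieties'];

// Queue one empty index document per index name and report how many were queued.
function queueHomesteadIndices(
  batch: BatchLike,
  homesteadId: string,
  newId: () => string,
  indexNames: string[] = HOMESTEAD_INDEX_NAMES,
): number {
  for (const indexName of indexNames) {
    // Each index gets a brand new document id and starts out able to accept entries.
    batch.set(newId(), { indexName, parentId: homesteadId, isFull: false });
  }
  return indexNames.length;
}
```

In a real function the trigger would supply the new homestead’s id, and the batch would be committed once every index has been queued.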

Keeping the indices up-to-date

  • Creating a plant variety should add a new index entry
  • Modifying a plant variety should update the appropriate index entry only if the data we’re tracking has actually updated
  • Removing a plant variety should remove the appropriate index entry and free up space in the index document if it happened to be full

There’s a lot happening above, so let’s break everything down.

  • We watch for any writes on a plant variety document, as we should respond to new documents, updated documents, and deleted documents
  • To establish an accurate homesteadId we first check the after data (in case a new document is being created) and then the before data (in case the document is being deleted)… if there is no homesteadId after that check, we can safely assume this is a root plant variety
  • If this is a new document that means there has never been an index entry for it… so we need to find the first index document that matches the correct indexName, the correct parentId, and can still accept new entries
  • If this is an existing document, we need to find the index document that already contains it. We can take advantage of the fact that Cloud Firestore automatically indexes the first level of all maps and look for the first index document that has an n key under the incoming plant variety id. Note: I originally tried to keep the other query parameters in place to make sure we got back the correct index document. However, a compound query that mixes equality and range filters needs a composite index, and that’s impossible to create here because the indexed field name is dynamic. So we have to trust that the document ids Cloud Firestore generates are unique, which means a given id will only ever appear in one index document.
  • If, for whatever reason, we can’t find an index document reference, we’re going to stop executing the cloud function. This could alternatively be adjusted to create a brand new index to contain the entry.
  • If the plant variety has been deleted, we need to remove the index entry and mark the index document as being able to once again accept more entries
  • The rest of the cloud function is dedicated to ensuring we only update the index entry if the data we need has actually changed… per the cloud function documentation we don’t want to create any infinite update loops
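
The decision-making part of that walkthrough can be separated from the Firestore calls. The sketch below is an illustration under assumptions, not the original function: planIndexChange, buildEntry, sameEntry, and the IndexChange type are hypothetical names, the n value follows the 'commonName, variety' format the tests expect, and setting l to true for homestead varieties is my guess at how that flag is used.

```typescript
// The plant variety fields this sketch cares about.
interface PlantVarietyData {
  homesteadId?: string;
  commonName: string;
  variety: string;
  originId?: string;
  originName?: string;
}

interface IndexEntry {
  n: string;
  o?: string;
  oid?: string;
  l?: boolean;
}

type IndexChange =
  | { kind: 'set'; parentId: string; entry: IndexEntry }
  | { kind: 'remove'; parentId: string }
  | { kind: 'none'; parentId: string };

// Build the compact entry, leaving out optional keys that have no value.
function buildEntry(data: PlantVarietyData): IndexEntry {
  const entry: IndexEntry = { n: `${data.commonName}, ${data.variety}` };
  if (data.originName !== undefined) entry.o = data.originName;
  if (data.originId !== undefined) entry.oid = data.originId;
  if (data.homesteadId !== undefined) entry.l = true; // assumption, see above
  return entry;
}

function sameEntry(a: IndexEntry, b: IndexEntry): boolean {
  return a.n === b.n && a.o === b.o && a.oid === b.oid && a.l === b.l;
}

// Decide what should happen to the index for a single write.
function planIndexChange(
  before: PlantVarietyData | undefined,
  after: PlantVarietyData | undefined,
): IndexChange {
  // Check the after data first, then the before data; no homesteadId means root.
  const parentId = after?.homesteadId ?? before?.homesteadId ?? 'root';
  if (!after) {
    return before ? { kind: 'remove', parentId } : { kind: 'none', parentId };
  }
  const entry = buildEntry(after);
  // Skip the write when nothing we track has changed, which also avoids update loops.
  if (before && sameEntry(buildEntry(before), entry)) {
    return { kind: 'none', parentId };
  }
  return { kind: 'set', parentId, entry };
}
```

The before and after arguments correspond to the document data on either side of the write, with undefined standing in for a document that doesn’t exist.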

Updating index meta information

  • We watch for any updates on the indices collection
  • If the index is full, we don’t need to do anything else… this also prevents the cloud function from running circularly
  • The maximum size for a Cloud Firestore document is 1 MiB, so we’re going to use the firestore-size package to see if we’re still below that threshold. There is also a best practice for increasing realtime update snapshot listener performance that mentions limiting to no more than 100 fields per document… if that’s important for your use, this check could easily be changed to one that counts the number of fields on the document.
  • If the index can still accept more entries, there’s nothing more to do
  • If the index should now be considered full we need to create a brand new index with the exact same meta information, and then block the current index from accepting more entries
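
As a rough illustration of that flow, the sketch below takes the index meta information and an already measured size and decides whether a replacement index is needed. planIndexMetaUpdate, MetaPlan, and the limitBytes parameter are hypothetical names of mine. The size itself would come from whatever measures the document (firestore-size in our case), and I’ve left that call out rather than guess at its signature.

```typescript
// 1 MiB, the maximum size of a Cloud Firestore document.
const MAX_DOCUMENT_BYTES = 1024 * 1024;

interface IndexMeta {
  indexName: string;
  parentId: string;
  isFull: boolean;
}

type MetaPlan =
  | { markFull: false }
  | { markFull: true; newIndex: IndexMeta };

// Decide whether the current index should be closed and a fresh one created.
function planIndexMetaUpdate(
  meta: IndexMeta,
  sizeInBytes: number,
  limitBytes: number = MAX_DOCUMENT_BYTES,
): MetaPlan {
  // An index that is already full needs nothing more; this also stops circular runs.
  if (meta.isFull) return { markFull: false };
  // Still below the threshold, so the index can keep accepting entries.
  if (sizeInBytes < limitBytes) return { markFull: false };
  // Otherwise close this index and describe a new one with the same meta information.
  return {
    markFull: true,
    newIndex: { indexName: meta.indexName, parentId: meta.parentId, isFull: false },
  };
}
```

Since a write that would push a document past the limit is rejected, passing a limitBytes somewhat below 1 MiB may be the safer choice. That margin is my own suggestion and would need tuning to the size of your entries.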

Testing

In our teardown helper, we’re going to add the following line to the very beginning to ensure we have a clean database between runs. I missed this easy method the first time around, and had only found means that would trigger cloud functions to run… which doesn’t help in the slightest.

await firebase.clearFirestoreData({ projectId: generateProjectId() });

Additionally, we want to ensure we’re running our tests serially since they’re all backed by the same underlying database. All we need to do there is add the --runInBand option to our jest commands.

createIndicesForHomestead

beforeAll(async () => {
  setUseRealProjectId();
  await setup(USER_ID, {
    [documentPath(COLLECTIONS.USERS, USER_ID)]: {
      displayName: 'user',
    },
    [documentPath(COLLECTIONS.HOMESTEADS, HOMESTEAD_ID)]: {
      name: 'homestead',
      owner: USER_ID,
    },
  });
  db = getAdminApp();
  return waitForCloudFunctionExecution();
});

The first test case ensures we’re only creating the number of indices we actually set out to create.

test('only creates the needed indices', async () => {
  const indexQuery = await db
    .collection(COLLECTIONS.INDICES)
    .where('parentId', '==', HOMESTEAD_ID)
    .get();
  expect(indexQuery.size).toEqual(1);
});

The second test case ensures the index meta information is what we expect it to be. I’ve decided to lean on the actual database query to do the heavy lifting, as that’s how client-side code will gather the data.

test('creates a plant variety index', async () => {
  const plantVarietyIndexQuery = await db
    .collection(COLLECTIONS.INDICES)
    .where('parentId', '==', HOMESTEAD_ID)
    .where('indexName', '==', 'plant-varieties')
    .where('isFull', '==', false)
    .get();
  expect(plantVarietyIndexQuery.size).toEqual(1);
});

updatePlantVarietyIndex

beforeAll(async () => {
  setUseRealProjectId();
  await setup(USER_ID, {
    [documentPath(COLLECTIONS.INDICES, INDEX_ID_ROOT)]: {
      indexName: 'plant-varieties',
      parentId: 'root',
      isFull: false,
    },
    [documentPath(COLLECTIONS.INDICES, INDEX_ID_HOMESTEAD)]: {
      indexName: 'plant-varieties',
      parentId: HOMESTEAD_ID,
      isFull: false,
    },
  });
  db = getAdminApp();
});

The first test case makes sure an index entry is created properly.

test('creates a new index', async () => {
  await db
    .collection(COLLECTIONS.PLANT_VARIETIES)
    .doc(PLANT_VARIETY_ID_1)
    .set({
      commonName: 'Name',
      variety: 'Variety',
    });
  await waitForCloudFunctionExecution();
  const plantVarietyIndexDocument = await db
    .collection(COLLECTIONS.INDICES)
    .doc(INDEX_ID_ROOT)
    .get();
  expect(plantVarietyIndexDocument.get(PLANT_VARIETY_ID_1))
    .toMatchInlineSnapshot(`
    Object {
      "n": "Name, Variety",
    }
  `);
});

The second test case makes sure an index entry is updated properly.

test('updates an index', async () => {
  await db
    .collection(COLLECTIONS.PLANT_VARIETIES)
    .doc(PLANT_VARIETY_ID_1)
    .update({
      commonName: 'New Name',
      variety: 'New Variety',
    });
  await waitForCloudFunctionExecution();
  const plantVarietyIndexDocument = await db
    .collection(COLLECTIONS.INDICES)
    .doc(INDEX_ID_ROOT)
    .get();
  expect(plantVarietyIndexDocument.get(PLANT_VARIETY_ID_1))
    .toMatchInlineSnapshot(`
    Object {
      "n": "New Name, New Variety",
    }
  `);
});

The third test case makes sure an index entry is removed properly.

test('removes an index', async () => {
  await db
    .collection(COLLECTIONS.PLANT_VARIETIES)
    .doc(PLANT_VARIETY_ID_1)
    .delete();
  await waitForCloudFunctionExecution();
  const plantVarietyIndexDocument = await db
    .collection(COLLECTIONS.INDICES)
    .doc(INDEX_ID_ROOT)
    .get();
  expect(plantVarietyIndexDocument.get(PLANT_VARIETY_ID_1)).toBeUndefined();
});

There are an additional three tests that use near identical logic to test updates to homestead indices. I won’t include them here, but they will be included in the sample repository.

updateIndexMeta

beforeAll(async () => {
  setUseRealProjectId();
  await setup(USER_ID, {
    [documentPath(COLLECTIONS.INDICES, INDEX)]: {
      indexName: 'index',
      isFull: false,
      parentId: 'root',
    },
  });
  db = getAdminApp();
});

The first test case ensures the index isn’t marked as full, and no new index is created, while there is still room to accept more entries.

test('does nothing if the index has space', async () => {
  const indexRef = db.collection(COLLECTIONS.INDICES).doc(INDEX);
  await indexRef.update({ [generateId()]: { data: 'any' } });
  await waitForCloudFunctionExecution();
  const index = await indexRef.get();
  expect(index.get('isFull')).toBe(false);
  const indexQuery = await db
    .collection(COLLECTIONS.INDICES)
    .where('parentId', '==', 'root')
    .where('indexName', '==', 'index')
    .get();
  expect(indexQuery.size).toEqual(1);
});

The second test is more of a placeholder for now, as I found no way to actually test the document becoming too large. Any attempt to throw a lot of data at it choked the emulator. The firestore-size module can’t be mocked as it would be in a traditional test, because the code running in the emulator has been pre-compiled and runs in a separate process. If the emulator could handle the data, this is likely how you could test that behavior.

// unable to test this at the moment, as the large dataset kills the function emulator
test.skip('creates a new index when full', async () => {
  const indexRef = db.collection(COLLECTIONS.INDICES).doc(INDEX);
  await indexRef.update({
    [generateId()]: { data: 'x'.repeat(1 * 1024 * 1024) },
  });
  await waitForCloudFunctionExecution();
  const index = await indexRef.get();
  expect(index.get('isFull')).toBe(true);
  const newIndexQuery = await db
    .collection(COLLECTIONS.INDICES)
    .where('parentId', '==', 'root')
    .where('indexName', '==', 'index')
    .where('isFull', '==', false)
    .get();
  expect(newIndexQuery.size).toEqual(1);
});

Initial client-side implementation

There are a few gotchas right now (the index meta fields are still visible, for one), but in the next article we’ll dive into fleshing out a UI that can query, search, and filter the data we’ve prepared here.
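
In the meantime, here’s a rough idea of how the client could turn the retrieved index documents into something a list can display and search. This is only a sketch under assumptions: ListItem, flattenIndexDocuments, and searchByName are hypothetical names, the documents are assumed to have been fetched already, and the search is a plain case-insensitive substring match.

```typescript
// What a list row needs: the id of the indexed document plus display data.
interface ListItem {
  id: string;
  name: string;
  isLocal: boolean;
}

// Meta keys that must not be mistaken for entries (the gotcha mentioned above).
const INDEX_META_KEYS = ['indexName', 'parentId', 'isFull'];

// Merge root and homestead index documents into one list sorted by name.
function flattenIndexDocuments(
  documents: Array<{ [key: string]: unknown }>,
): ListItem[] {
  const items: ListItem[] = [];
  for (const data of documents) {
    for (const key of Object.keys(data)) {
      if (INDEX_META_KEYS.indexOf(key) !== -1) continue;
      const entry = data[key] as { n: string; l?: boolean };
      items.push({ id: key, name: entry.n, isLocal: entry.l === true });
    }
  }
  return items.sort((a, b) => a.name.localeCompare(b.name));
}

// Keep only the items whose name contains the search term.
function searchByName(items: ListItem[], term: string): ListItem[] {
  const needle = term.trim().toLowerCase();
  if (needle === '') return items;
  return items.filter((item) => item.name.toLowerCase().indexOf(needle) !== -1);
}
```

Because all of this runs against data the client already holds, searching and filtering cost no additional reads.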

Recap

As always, here’s a link to the entire project. Enjoy your experimentation with Cloud Firestore and large lists of data!

The Startup

Written by Dana Hartweg, Senior Front End Software Engineer, InVision Studio
