How to count documents in Firestore

Louis Coulet
Firebase Tips & Tricks
7 min read · Feb 15, 2021

What is the problem?

When using a SQL database, one can run this simple query to get the number of articles written by a given author:

```sql
SELECT COUNT(*) FROM articles WHERE author_id = '123';
```

On the other hand, Firestore, being a NoSQL database, offered no direct way at the time of writing to count all documents of a collection, or the documents matching a certain query. Most people are surprised to find out that such basic functionality is missing.

But lament no more! Developers have come up with multiple approaches to this problem, each with different performance characteristics. In this article, we list, categorize, and evaluate several of them.

Preliminary: the data layout

Depending on how the data is laid out, we want to count the total number of documents in a collection, or only those matching a set of conditions (a query). For example:

  • if claps is a collection whose documents store a reference to the article they are intended for, then we want to count the documents of claps whose article_id equals a certain value.
  • if claps is a sub-collection of an article document, then we want to count the total number of documents in this sub-collection, no conditions needed.

All solutions listed below perform equally well on whole collections or on queries.

Overview

The solutions fall into the following categories:

  1. Aggregation queries: run a query that goes through all documents to count them
  2. Store a counter: maintain a piece of data containing the total count
  3. External: use an external service that can provide the count
  4. Ad-hoc: use application-specific knowledge of the data to infer the count

1. Aggregation queries

Aggregation queries go through the documents of a collection or query and count them as they pass, much like a steward counting passengers on a plane. While this works, it is expensive (in money, time, and memory) because each count reads every document.

1.1 Retrieve all documents at once

This solution uses get to retrieve all documents matching a query. The count is the size of the resulting snapshot:

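```javascript
// `firestore` is assumed to be an initialized instance, e.g. admin.firestore()
async function countFruits(firestore) {
  const query = firestore.collection("fruits");
  const snapshot = await query.get(); // loads every matching document at once
  return snapshot.size; // the count is the size of the snapshot
}
```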

Beware: this will run out of memory if the collection contains more documents than fit in your memory! Reserve this method for small collections.

1.2 Go through documents progressively

To solve the memory problem of the previous solution, one can retrieve documents progressively in chunks of reasonable size:

  • by chunks, with pagination using startAfter and limit
  • one by one with onSnapshot (client-side only, real-time)
  • one by one with stream (server-side only)
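Here is a minimal sketch of the first and third options, assuming the Node.js Admin SDK, where `query` is any Firestore query such as `firestore.collection("fruits")`, possibly with `where` clauses. The function names and the default page size of 500 are illustrative choices, not part of any API. Pagination passes the last document snapshot of each page to `startAfter`, so only one page is held in memory at a time; the streaming variant uses `select()` with no fields, which in the Admin SDK returns document references only (each document is still billed as one read).

```javascript
// Count documents page by page so that at most `pageSize` snapshots are in memory.
// `query` is assumed to be a Firestore Query from the Node.js Admin SDK.
async function countByPages(query, pageSize = 500) {
  let count = 0;
  let lastDoc = null;
  for (;;) {
    // Each page starts right after the last document of the previous page.
    let page = query.limit(pageSize);
    if (lastDoc) page = page.startAfter(lastDoc);
    const snapshot = await page.get();
    count += snapshot.size;
    // A page shorter than pageSize means we reached the end of the results.
    if (snapshot.size < pageSize) return count;
    lastDoc = snapshot.docs[snapshot.docs.length - 1];
  }
}

// Server-side alternative: stream the documents one by one.
// select() with no arguments fetches document references only, not field data.
async function countByStream(query) {
  let count = 0;
  for await (const _doc of query.select().stream()) count += 1;
  return count;
}
```

Either way, a count still costs N reads; these variants only fix the memory problem.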

This approach works great; one could even say that any other solution is premature optimization! More seriously, let’s evaluate it:

  • it always returns up-to-date and exact results
  • it can be expensive: if the collection contains N documents, it does N reads each time a count is requested, so it costs time, money, and data transfer
  • it can handle queries easily: simply add one or more where clauses to the query
  • it can display real-time count on the client-side
  • it is straightforward to use on any collection, even one that already contains documents

2. Store a counter

Because the previous approach can be expensive, developers have proposed storing and maintaining a counter. This counter can then be read when needed, which has the potential to reduce the cost of a count to a single read! But not so fast…

The count can be stored in a Firestore document, but a single document can only sustain about one update per second. If a higher throughput is needed, one can use distributed counters, or store the counter in the Realtime Database.

Reading the counter costs 1 read if it is a simple Firestore or Realtime Database document, and S reads if the counter is distributed over S shards. Typically, S is orders of magnitude smaller than the number of documents. We will refer to the number of shards as S in the rest of the article.
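To make the S-reads cost concrete, here is a minimal sketch of a distributed counter, assuming the Node.js Admin SDK. `counterRef` is a document reference such as `firestore.doc("counters/claps")`, and `FieldValue` is the Admin SDK class (`admin.firestore.FieldValue`), passed in only to keep the sketch free of setup code; the `shards` sub-collection and `count` field are hypothetical names. Each write lands on a random shard, so write throughput grows with S, while reading the total fetches every shard.

```javascript
// Increment one randomly chosen shard out of `numShards` (S in the text).
// set(..., { merge: true }) creates the shard document on first use.
async function incrementShardedCounter(counterRef, FieldValue, numShards, delta = 1) {
  const shardId = String(Math.floor(Math.random() * numShards));
  await counterRef
    .collection("shards")
    .doc(shardId)
    .set({ count: FieldValue.increment(delta) }, { merge: true });
}

// Read the total: one document read per existing shard, i.e. at most S reads.
async function readShardedCounter(counterRef) {
  const snapshot = await counterRef.collection("shards").get();
  let total = 0;
  snapshot.forEach((shard) => {
    total += shard.get("count") || 0;
  });
  return total;
}
```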

As pointed out by Renaud Tarnec, an advantage of storing the counter in Firestore is that access to it can be controlled with Firestore Security Rules. In the approaches that update the counter from Cloud Functions, we can forbid clients from writing the counter, and even from reading it when it is intended for internal use only.

2.0 Simultaneous collection and counter update

In this approach, whenever a party (client or server) creates or deletes a document, it is also responsible for updating the counter. The document creation or deletion and the counter update can be wrapped in a transaction, so that a failure cannot leave them out of sync.
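A sketch of what each party would have to do, assuming the Node.js Admin SDK: `db` is a Firestore instance and `FieldValue` is passed in as above, while the `claps` collection and `counters/claps` document are hypothetical. Both writes go through `runTransaction`, so they are committed together or not at all; the deletion first checks that the document exists, so deleting twice does not decrement twice.

```javascript
// Create a clap and increment the counter in the same transaction.
async function createClapWithCount(db, FieldValue, data) {
  const clapRef = db.collection("claps").doc(); // new document with an auto-generated id
  const counterRef = db.doc("counters/claps");
  await db.runTransaction(async (tx) => {
    tx.create(clapRef, data);
    tx.set(counterRef, { count: FieldValue.increment(1) }, { merge: true });
  });
  return clapRef.id;
}

// Delete a clap and decrement the counter, only if the clap actually exists.
async function deleteClapWithCount(db, FieldValue, clapId) {
  const clapRef = db.collection("claps").doc(clapId);
  const counterRef = db.doc("counters/claps");
  await db.runTransaction(async (tx) => {
    const clap = await tx.get(clapRef); // reads must happen before writes
    if (!clap.exists) return; // nothing to delete: leave the counter alone
    tx.delete(clapRef);
    tx.set(counterRef, { count: FieldValue.increment(-1) }, { merge: true });
  });
}
```

Every code path that writes claps must go through helpers like these, which is exactly the coupling criticized below.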

This approach is detailed in the guides. Let’s evaluate it:

  • the count can be obtained in a single read (or S reads)
  • the counter is up to date and exact
  • it has a major drawback: it disrupts the way parties interact with Firestore by forcing them to deal with an implementation detail
  • it has another drawback: it relies on every party behaving consistently, and bugs lurk in a dark corner, waiting for a developer to forget to update the counter at some point as the application grows

2.1 Scheduled refresh

In order to avoid paying N reads each time the count is needed, one can schedule a Cloud Function to update a counter document at specific time intervals.

With this approach, the counter is only an approximation, since the actual count goes up or down between refreshes. The update interval is a trade-off between accuracy and cost, as each update computes the actual count at the cost of N reads. For example, it is OK if the count of claps on an article is slightly off for a few minutes, or even hours. On the other hand, critical data or data important to the user may need more accuracy: for example, the count of notifications, or the number of passengers on a plane.
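A minimal sketch of the refresh step, assuming the Node.js Admin SDK; `db` is a Firestore instance and the `counters` collection and field names are hypothetical. The scheduling itself is shown only as a commented, hypothetical wiring using the `firebase-functions` v1 `pubsub.schedule(...).onRun(...)` API, whose interval is the accuracy/cost knob described above. For collections too large to fetch at once, the progressive approaches of section 1.2 apply here as well.

```javascript
// Recompute the count of `collectionName` and store it in counters/{collectionName}.
// Each run costs one read per document (N reads); clients then read the counter only.
async function refreshCounter(db, collectionName) {
  // select() without fields fetches document references only, keeping the
  // response small; it does not reduce the number of billed reads.
  const snapshot = await db.collection(collectionName).select().get();
  await db
    .collection("counters")
    .doc(collectionName)
    .set({ count: snapshot.size, refreshedAt: new Date() });
  return snapshot.size;
}

// Hypothetical wiring (firebase-functions v1 API):
// exports.refreshClapCount = functions.pubsub
//   .schedule("every 60 minutes")
//   .onRun(() => refreshCounter(admin.firestore(), "claps"));
```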

Let’s evaluate it:

  • the count is obtained in a single read
  • the counter is not up to date and only eventually exact
  • it is simple and robust
  • it can be added to an existing collection easily

2.2 Incremental naive

In order to keep the counter closer to the actual value than with the previous solution, one can go another route: update the counter when documents are created or deleted using Cloud Firestore triggers.

It looks straightforward: the counter stores the actual count, as it is incremented on document creation and decremented on document deletion (thanks to the increment sentinel, which performs an atomic update of the counter value).
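A sketch of the naive handlers, assuming the Admin SDK and `firebase-functions` v1 Firestore triggers. To keep the counter logic testable, it is written as a small factory taking `db` and `FieldValue`; the `claps/{clapId}` path and the `counters/claps` document are hypothetical, and the trigger wiring is only shown as comments. Nothing here guards against a repeated invocation, which is the weakness described next.

```javascript
// Build naive trigger handlers that adjust the counter by +1 / -1.
// FieldValue.increment() makes each update atomic, but a duplicated
// trigger invocation is still applied twice.
function makeNaiveCounterHandlers(db, FieldValue, counterPath = "counters/claps") {
  const bump = (delta) =>
    db.doc(counterPath).set({ count: FieldValue.increment(delta) }, { merge: true });
  return {
    onCreate: () => bump(1),
    onDelete: () => bump(-1),
  };
}

// Hypothetical wiring (firebase-functions v1 API):
// const h = makeNaiveCounterHandlers(admin.firestore(), admin.firestore.FieldValue);
// exports.clapCreated = functions.firestore.document("claps/{clapId}").onCreate(h.onCreate);
// exports.clapDeleted = functions.firestore.document("claps/{clapId}").onDelete(h.onDelete);
```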

There is a catch though, due to the triggers’ limitations:

  • a function may be invoked up to 10 seconds after the collection update, so the counter lags behind
  • a function may be invoked multiple times for the same event, making the counter incorrect and drifting further from the truth over time

Depending on the use case, the 10-second trigger delay may be acceptable, because the counter eventually catches up. However, duplicate invocations are more annoying because they skew the counter permanently.

The evaluation:

  • the count is obtained in a single read
  • it may be skewed and a few seconds late
  • it deviates from the truth as time passes

2.3 Incremental naive readjusted

The “incremental naive” and “scheduled refresh” approaches can be combined to readjust the counter periodically and limit its drift over time. It evaluates the same way as incremental naive, except that the drift is kept under control.

2.4 Incremental debounced

In his article, Ihor Malaniuk observes that one can make sure each event is processed only once by using the event’s context, which holds a unique id (eventId). I call this approach “debouncing”, borrowing the electronics and user-interface term for detecting and ignoring spurious repeats of an initial trigger.

Ihor Malaniuk’s solution consists of storing events indexed by eventId upon document creation or deletion, then periodically updating the counter from stored events that are old enough (safe from duplicate calls), and finally deleting the processed events. To get the total count, parties need to add the values of the pending events (+1 for a creation, -1 for a deletion) to the counter.

Here is a simpler version. It relies on the fact that create throws an exception if a document already exists at the specified path.

Upon document creation or deletion, we create a document whose id is the eventId. If create throws, the event was already processed and we stop there. Otherwise, the new document prevents other invocations from processing the same event, and we can safely update the counter. Finally, we periodically clean up old events, every ~300 updates (this could also be done in a scheduled function). To get the count, one only needs to read the counter.
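A minimal sketch of the deduplication step, assuming the Node.js Admin SDK and `firebase-functions` v1 triggers, where `context.eventId` identifies the event. The `counter_events` collection, the `counters/claps` path, and the function name are hypothetical, and the sketch assumes the SDK reports an existing document with gRPC status code 6 (ALREADY_EXISTS); other errors are rethrown. The periodic clean-up of old event documents is left out. As in the description above, the event marker and the counter update are two separate writes, so a crash between them would drop that event rather than double-count it.

```javascript
const ALREADY_EXISTS = 6; // assumed gRPC status code when create() hits an existing document

// Apply `delta` (+1 on creation, -1 on deletion) at most once per trigger event.
async function applyEventOnce(db, FieldValue, eventId, delta, counterPath = "counters/claps") {
  try {
    // create() fails if counter_events/{eventId} already exists, i.e. if another
    // invocation of the same event got here first.
    await db.collection("counter_events").doc(eventId).create({ delta, createdAt: new Date() });
  } catch (err) {
    if (err.code === ALREADY_EXISTS) return false; // duplicate invocation: skip
    throw err;
  }
  await db.doc(counterPath).set({ count: FieldValue.increment(delta) }, { merge: true });
  return true;
}

// Hypothetical wiring (firebase-functions v1 API):
// exports.clapCreated = functions.firestore.document("claps/{clapId}")
//   .onCreate((snap, context) =>
//     applyEventOnce(admin.firestore(), admin.firestore.FieldValue, context.eventId, 1));
```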

Here is how it fares:

  • the count is obtained in a single read
  • it is exact but a few seconds late
  • it does not deviate in time

3. External

Depending on the use case, an external database can be used for the purpose of counting elements in a collection.

Here is a sample where I connect my Cloud Functions to a MySQL database running on Cloud SQL. There, I store my documents with a unique index on the document id. After each insertion or deletion in the database, I perform a COUNT query and store the result in a counter in Firestore for easy retrieval.
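A rough sketch of the database side of such a setup, not the author's sample: it assumes the `mysql2/promise` client, whose `pool.query(sql, params)` resolves to `[rows, fields]`, and a hypothetical `articles` table with a unique index on `doc_id`; `db` is a Firestore Admin SDK instance and `counters/articles` is a hypothetical path. `INSERT IGNORE` relies on that unique index to turn a duplicated trigger invocation into a no-op, and deleting an already deleted row is harmless too.

```javascript
// Mirror a Firestore document in MySQL, then publish the exact count back to Firestore.
async function onArticleCreated(pool, db, docId) {
  // The unique index on doc_id makes a repeated insert for the same document a no-op.
  await pool.query("INSERT IGNORE INTO articles (doc_id) VALUES (?)", [docId]);
  return publishCount(pool, db);
}

async function onArticleDeleted(pool, db, docId) {
  await pool.query("DELETE FROM articles WHERE doc_id = ?", [docId]);
  return publishCount(pool, db);
}

// Run the COUNT query in MySQL and store the result where clients can read it cheaply.
async function publishCount(pool, db) {
  const [rows] = await pool.query("SELECT COUNT(*) AS n FROM articles");
  const count = Number(rows[0].n);
  await db.doc("counters/articles").set({ count });
  return count;
}
```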

This approach is more complex than the others that stay within the realm of Firestore, and there are many possible variations.

How does it evaluate?

  • the count is obtained in a single read
  • the value is exact but a few seconds late
  • the complexity and cost of running an additional database is high

4. Ad-hoc

Ad-hoc solutions rely on application-specific knowledge about the data to derive the number of documents, or to pick a specific method from above. For example (a silly one, just to illustrate), if documents are only created at a slow pace, one could use 1, 2, 3… as ids, so that the count is the id of the last document.

By definition each solution will be different, so it is not possible to evaluate them all. But this approach should be considered as it may yield awesome results and performance!

Conclusion

I hope this overview was helpful, or at least interesting to read. Let me know if I have overlooked an approach, or if you come up with a new one!
