Should I query my Firebase database directly, or use Cloud Functions?

Just about every app you use has to query a database and show the results on screen. Firebase makes this easy by providing SDKs that let apps read and write the platform’s databases (Realtime Database and Cloud Firestore) directly. In some situations, though, you may want to route requests through a server-side component, such as Cloud Functions, that manages the query. How do you make that decision? When is it better to query directly from the client, and when is it better to route the request through a backend? There’s no universally right answer, so let’s weigh each option by the properties that matter the most.

In this article, I’ll use the word “direct” to talk about database access using the Firebase SDKs that query a database without going through a backend. And I’ll use the word “indirect” for access going through Cloud Functions, or some other backend you control.

Here’s an example of direct access to Cloud Firestore from a web client using JavaScript. It’s simply requesting all documents in a collection, sorted by a field with a timestamp, limited to the first 100 results. The returned snapshot object contains all the query results, ready to use:

const firestore = firebase.firestore()
const snapshot = await firestore
  .collection("posts")
  .orderBy("lastModified")
  .limit(100)
  .get()

And here’s an example of indirect access, via an HTTP type Cloud Function also written in JavaScript. It’s almost exactly the same, except now the client has to invoke it via an HTTP request, and process all the results returned in JSON format:

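const functions = require("firebase-functions")
const admin = require("firebase-admin")

// The Admin SDK must be initialized before any of its services are used.
admin.initializeApp()
const firestore = admin.firestore()

exports.getLatestPosts =
  functions.https.onRequest(async (req, res) => {
    const snapshot = await firestore
      .collection("posts")
      .orderBy("lastModified", "desc")
      .limit(100)
      .get()
    // Send the document contents back to the client as a JSON array.
    res.send(snapshot.docs.map(doc => doc.data()))
  })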

I’ll compare these options by some important characteristics.

Performance

Of course, everyone wants their database access to be fast. This might be the first factor to consider when choosing between direct or indirect access. It’s intuitive that direct access is usually going to be faster than indirect access. Here’s why.

Local caching

The Firebase SDKs provide a local persistence layer (cache) that stores query results for future use. When a client app issues a query, if the SDK determines that the cache contains up-to-date results for that query, the results can come directly from the cache. The obvious benefit is that network bandwidth and latency are reduced. The results appear faster, even while offline, and the end user pays lower data costs to see those results.

It’s worth noting that the local cache is not always available on the mobile client, and sometimes it might have to be explicitly enabled. Be sure to check the documentation for your database (Realtime Database or Cloud Firestore) for your specific mobile platform to understand the requirements and limitations.
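On the web, for instance, Cloud Firestore’s persistence has to be switched on before any other Firestore call is made. Here’s a minimal sketch in the same namespaced web SDK style as the query above. The helper name `enableOfflineCache` and its return values are my own, and the Firestore instance is passed in so the snippet doesn’t assume how you initialize Firebase:

```javascript
// Hypothetical helper: tries to turn on Firestore's local cache for a web client.
// `firestore` is expected to be the object returned by firebase.firestore().
async function enableOfflineCache(firestore) {
  try {
    await firestore.enablePersistence()
    return "enabled"
  } catch (err) {
    if (err.code === "failed-precondition") {
      // Typically means persistence is already active in another open tab.
      return "unavailable-multiple-tabs"
    }
    if (err.code === "unimplemented") {
      // The browser lacks the storage features that persistence needs.
      return "unsupported-browser"
    }
    throw err
  }
}
```

If the call fails, the app still works. It just won’t have a cache that survives a page reload.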

If you make the request via Cloud Functions, there is absolutely no client-side caching done by default. If you want to cache the results, you’ll have to do that on the client, using some mechanism you choose. For example, Android apps might choose to use Room to persist query results. You’ll have to write and test all the code to make sure it works. You’ll also have to figure out if and when cached query results become stale.
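To give a rough idea of what that involves, here’s a minimal in-memory sketch for a JavaScript client. The names `createCachedFetcher`, `fetchJson`, and `maxAgeMs` are my own for illustration, and a fixed maximum age is only one possible way to decide that a result is stale:

```javascript
// Hypothetical client-side cache for responses from an HTTP function.
// `fetchJson(url)` must return a promise of parsed JSON.
// `now` is injectable so the staleness logic can be tested without waiting.
function createCachedFetcher(fetchJson, maxAgeMs, now = Date.now) {
  const cache = new Map() // url -> { value, fetchedAt }
  return async function get(url) {
    const hit = cache.get(url)
    if (hit && now() - hit.fetchedAt < maxAgeMs) {
      return hit.value // still considered fresh: no network request
    }
    const value = await fetchJson(url)
    cache.set(url, { value, fetchedAt: now() })
    return value
  }
}
```

In a browser you might create it with something like `createCachedFetcher(url => fetch(url).then(r => r.json()), 60000)`. Note that this cache disappears when the page reloads, and it has no idea whether the data changed on the server. Closing those gaps is more code for you to write and test.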

The case is similar for database writes. If you write a document using the SDK while the client is offline, the write will be persisted locally, then automatically synchronized later when connectivity returns. However, if you write via a call to Cloud Functions, the HTTP request will fail right away while offline, and the client will have to retry it as needed.
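A retry for the indirect case could look something like the following sketch. The helper `sendWithRetry` and its options are hypothetical, and exponential backoff is just one reasonable policy:

```javascript
// Hypothetical retry wrapper for a write sent through an HTTP function.
// `send()` performs one attempt and returns a promise.
// `sleep(ms)` is injectable so the backoff can be tested without real delays.
async function sendWithRetry(send, { attempts = 5, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  let lastError
  for (let i = 0; i < attempts; i++) {
    try {
      return await send()
    } catch (err) {
      lastError = err
      if (i < attempts - 1) {
        await sleep(baseDelayMs * 2 ** i) // wait twice as long after each failure
      }
    }
  }
  throw lastError
}

function defaultSleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms))
}
```

Two caveats: retrying is only safe if the function can handle receiving the same write more than once, and a pending write held this way is lost if the user closes the app before it succeeds.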

For performance (and convenience), the win clearly goes to the Firebase client SDKs. But there is one case where the local cache can perform poorly. If the SDK’s local cache becomes very large, and a complex query has to sort through thousands of records to produce its results, doing that work on the client might cost more than running the same query on the server. I would expect that most apps won’t run into this situation, but you should be aware that it can occur. It’s good to measure the performance of your queries, and you can do that in production, on your users’ devices, with Firebase Performance Monitoring.

Payload size

One important behavior of Firebase databases, when accessed from the client SDKs, is that reading any node or document always transfers the entire contents of that entity to the client. The client SDKs don’t support limiting the child nodes or document fields in the response, which is sometimes called a “projection” in SQL. To work around this constraint, developers sometimes structure their database to minimize the data transferred for a query. Sometimes this involves duplicating data in various places, which is common for NoSQL databases.

Imagine you have a collection of blog posts called “posts”, where each document has a field for text that could be very long.

If you queried this collection on the client to display a list of posts that match some criteria, the client would necessarily download the entire text of every matching post just to satisfy that query. So, in order to speed things up, you could move the large text field into a separate collection, “posts-text”.

Now, queries against “posts” will execute faster on the client, and the document with the text of the post can be fetched only as needed.
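Fetching the text on demand might look like this sketch. It assumes each document in “posts-text” shares its ID with the matching document in “posts” and keeps the body in a field named `text`. Both are assumptions of mine, as is the helper name `loadPostText`:

```javascript
// Hypothetical helper: loads the long body of one post only when it's needed.
// `firestore` is expected to be the object returned by firebase.firestore().
// Assumes "posts-text" documents reuse the ID of the matching "posts" document.
async function loadPostText(firestore, postId) {
  const snapshot = await firestore.collection("posts-text").doc(postId).get()
  // Return null when there is no text document for this post.
  return snapshot.exists ? snapshot.data().text : null
}
```

The list screen would keep querying “posts” as before, and only the detail screen would call this helper for the one post the user opens.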

However, if you’ve already committed to a schema whose documents cause performance problems when queried, and you can’t change it, using a Cloud Function might be the best choice. The function could perform the query using its fast connection to the database, extract only the fields needed for display, and send the minimal results to the client.
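Here’s a sketch of that idea, written as a plain handler so the database is passed in. The field names in `SUMMARY_FIELDS` and the names `pickFields` and `handleGetPostSummaries` are hypothetical. Adjust them to whatever your list view needs:

```javascript
// Hypothetical list of the fields the client needs for a list view.
const SUMMARY_FIELDS = ["title", "author", "lastModified"]

// Copies only the wanted fields out of a document's data.
function pickFields(data, fields) {
  const result = {}
  for (const field of fields) {
    if (field in data) result[field] = data[field]
  }
  return result
}

// Handler body for an HTTP function; `firestore` is the Admin SDK instance.
async function handleGetPostSummaries(firestore, req, res) {
  const snapshot = await firestore
    .collection("posts")
    .orderBy("lastModified", "desc")
    .limit(100)
    .get()
  // The full documents are read on the server, but only the summaries are sent.
  res.send(snapshot.docs.map(doc => ({
    id: doc.id,
    ...pickFields(doc.data(), SUMMARY_FIELDS)
  })))
}

// Wiring, in the style of the earlier function:
// exports.getPostSummaries = functions.https.onRequest(
//   (req, res) => handleGetPostSummaries(admin.firestore(), req, res))
```

The function still reads every field from the database, so this saves client bandwidth rather than server work. The Node.js server SDK also has a `select()` method on queries that may be worth a look if you want to trim the read itself.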

There is no single correct way to choose between direct and indirect access for performance reasons. You’ll need to weigh (and hopefully benchmark!) your options to figure out what’s best.

Price

The overall pricing for both Firebase databases (Cloud Firestore, Realtime Database) is tied (partly) to how much data your app reads. As mentioned previously, the Firebase SDK’s local cache can prevent many data reads from happening on the server. When data comes from the cache and is unchanged on the server, you avoid both the cost of the query and the cost of the bandwidth to bring that data to the app.

If you query indirectly through a Cloud Function, you will pay for the cost of the query in addition to the cost of the execution of the function. The server SDKs you use in Cloud Functions do not cache data, so each execution pays the full cost of the query, and its data transfer. Some developers may opt to implement a custom caching layer in memory or in another Google Cloud product (such as Cloud Memorystore) in order to reduce costs.

Security and permissions

Both Firebase databases provide a way for you to control access to data coming from apps using security rules (Cloud Firestore, Realtime Database). Implementing these rules correctly and comprehensively for your app is crucial to its security. But this only works for traffic originating from the provided client SDKs (and the REST APIs, when provided with a Firebase Authentication token or no authentication mechanism).

However, when querying indirectly through Cloud Functions, the client SDKs can’t be used. You’re required to use the Firebase Admin SDK, or one of the other server SDKs. These SDKs are initialized using a service account instead of an end user Firebase Authentication account. Queries from the server SDKs are considered “privileged” and completely bypass all security rules. So, if you need to control the data coming in and out of your code deployed to Cloud Functions, you’ll have to write that logic separately from your security rules. (Note that Realtime Database has a provision for initializing the Admin SDK with a given Firebase Authentication UID, which then limits its access using the security rules that apply to that UID. No equivalent feature is currently provided for Cloud Firestore.)

Firebase security rules are great for limiting direct client access, and you should definitely start there for security. You’ll discover that security rules are backed by a special expression language, which is not a full programming language. In order to promote speed and security, there are limitations to what you can do. If you run into one of these limitations, you might have to route client access through Cloud Functions in order to perform whatever checks are necessary to allow a particular read or write operation. For example, if you want to implement strict rate limits for queries, you would have to use Cloud Functions for that, and force clients to call the function instead of using direct access. Or, if clients should never be able to read certain fields in a document, a function could filter out that data before it reaches the caller, similar to the earlier example.
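As a rough illustration of the rate limit case, here’s a small fixed-window limiter that a function could consult before running a query. Everything here is a sketch under assumptions: the name `createRateLimiter` is mine, the caller key would come from something you trust (such as a verified Firebase Authentication UID), and the counts live in memory, so each function instance counts on its own. A truly strict limit would need shared storage instead.

```javascript
// Hypothetical fixed-window rate limiter, keyed by caller.
// State is in memory, so it is per function instance and resets on cold start.
// `now` is injectable so the window logic can be tested without waiting.
function createRateLimiter(maxCalls, windowMs, now = Date.now) {
  const windows = new Map() // key -> { startedAt, count }
  return function isAllowed(key) {
    const t = now()
    const current = windows.get(key)
    if (!current || t - current.startedAt >= windowMs) {
      // First call from this key, or the previous window has expired.
      windows.set(key, { startedAt: t, count: 1 })
      return true
    }
    if (current.count >= maxCalls) return false
    current.count++
    return true
  }
}
```

Inside an HTTP function, you would call `isAllowed(uid)` first and respond with an error status (429 is the conventional one) when it returns false.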

I’ll also mention that if security rules get you most of the way to your security requirements for a database write, you could also use a Cloud Function to implement a database trigger (Cloud Firestore, Realtime Database) to run after a database write completes, and perform further checks there as needed. If the data found isn’t acceptable, you could then simply delete it or move it somewhere out of the way for auditing. Just bear in mind that the data will still exist in its original written location for a brief period of time.
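Here’s a sketch of such a trigger for Cloud Firestore, written as a plain handler with the wiring shown in a comment. The check in `isAcceptablePost`, the “quarantine” collection, and the function names are all made up for illustration:

```javascript
// Hypothetical check; replace it with whatever your security rules can't express.
function isAcceptablePost(data) {
  return typeof data.title === "string" &&
    data.title.length > 0 &&
    data.title.length <= 200
}

// Handler for a Firestore onCreate trigger.
// `snapshot` is the newly written document; `firestore` is the Admin SDK instance.
async function reviewNewPost(snapshot, firestore) {
  const data = snapshot.data()
  if (isAcceptablePost(data)) return "kept"
  // Copy the rejected document aside for auditing, then remove the original.
  await firestore.collection("quarantine").doc(snapshot.id).set(data)
  await snapshot.ref.delete()
  return "quarantined"
}

// Wiring, in the style of the earlier function:
// exports.reviewPost = functions.firestore
//   .document("posts/{postId}")
//   .onCreate(snapshot => reviewNewPost(snapshot, admin.firestore()))
```

Because the handler uses the Admin SDK, the copy and the delete are not subject to security rules.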

Realtime data

Firebase databases have a very special feature that lets you listen for realtime updates to data (Cloud Firestore, Realtime Database). So, if a client is interested in a particular location in the database, it can attach a listener at that location, and the listener will receive callbacks whenever the underlying data changes. This also works with queries that receive multiple child nodes or documents — if the results of the query would change over time, those deltas are also received by the listener. When no more updates are needed, the client just removes the listener. For example, with Cloud Firestore, you can attach a listener to the query at the start of this post by using onSnapshot() instead of get():

const firestore = firebase.firestore()
const unsubscribe = firestore
  .collection("posts")
  .orderBy("lastModified")
  .limit(100)
  .onSnapshot(querySnapshot => {
    // this gets called whenever the results
    // of the query change over time
  })

This ability to receive realtime updates works great in client app code, but is almost never appropriate for code deployed to Cloud Functions. Functions need to run quickly and get their work done in a finite amount of time. There are very few good use cases for adding a persistent listener to the database inside a function. For HTTP type functions, there is also no streaming of results back to the caller (also known as HTTP chunked transfer encoding). The entire HTTP response is delivered as a unit, and if the response isn’t delivered before the function’s configured timeout, the connection is closed. So, realtime data is not really possible with Cloud Functions.

Exposing a public API

If you need to build a public API for your data in Firebase (such as Hacker News did), you could allow global read access to your data via security rules, and ask developers to use a Firebase client SDK. This makes a lot of sense if you intend to expose very “live” data, as the client APIs make it easy to access that realtime data, as described in the last section.

It gets tricky, however, if there is no client SDK that supports realtime listeners for your client’s platform. You can send people to the REST API, but streaming realtime updates is only supported by Realtime Database. Cloud Firestore has a REST API, but there’s no support for streaming.

It’s possible that the product’s REST APIs won’t work well for your case, or you want to provide something easier for clients to consume. In that case, you’ll definitely want to look into building your own API with Cloud Functions. This is actually quite common. Of course, you’ll be paying for your usage of both products, but there is one huge optimization you can apply here. Cloud Functions integrates with Firebase Hosting, which you can use as an edge-caching proxy. You direct web requests to Firebase Hosting, which checks whether it already has a response in its cache. It will either serve the previously cached content or forward the request to Cloud Functions.
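The function opts in to that caching by setting a `Cache-Control` header on its response. Here’s a minimal sketch. The helper name `sendCacheable` and the default lifetimes are placeholders of mine, and it assumes you’ve already configured a Hosting rewrite that routes a URL to the function:

```javascript
// Hypothetical helper: sends a response that a shared cache is allowed to store.
// `res` is the response object passed to an HTTP function.
function sendCacheable(res, body, { browserSeconds = 300, cdnSeconds = 600 } = {}) {
  // "public" allows shared caches to store the response, max-age is the lifetime
  // in the browser, and s-maxage is the lifetime in shared caches such as a CDN.
  res.set("Cache-Control", `public, max-age=${browserSeconds}, s-maxage=${cdnSeconds}`)
  res.send(body)
}
```

In the earlier `getLatestPosts` function, the final `res.send(...)` call would become `sendCacheable(res, snapshot.docs.map(doc => doc.data()))`. Keep in mind that cached responses are shared by everyone, so this only suits data that is the same for all callers.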

You can read more about the integration between Cloud Functions and Firebase Hosting in the documentation.

So, which should you choose?

It’s impossible to say without knowing your specific use case! It’ll take some benchmarking, cost estimation, and an understanding of your requirements in order to select the best option. It could also be a matter of preference, for example, if you’d rather hide the details of a series of database queries behind a simpler web API. If you’re looking for a conversation to help hash these things out, I recommend posting to the Firebase Google Group firebase-talk or the Firebase subreddit. The Firebase community is very active and helpful!