Model your Cloud Firestore Database The Right Way: A Chat Application Case Study

15 min read · Jan 18, 2024

I’m a frequent user of Firebase, and when it comes to building with Backend as a Service (BaaS), Firebase is always my top choice. For a while, I have wanted to use Firebase services for a chat app, but my biggest problem has always been how to structure the data in Cloud Firestore.

If you have ever used Firestore to build something more complex than a todo app, or any app that goes beyond basic read-write operations, chances are you have struggled to model your data well. Maybe you’re worried that the structure won’t scale.

To be honest, I don’t think there’s one right way to organise your data in Firestore; the way you structure it depends on your use case and requirements.

One thing I have come to know, and ultimately accept, is that there are always going to be trade-offs when using Firestore. Your job is to pick the option that best fits your needs and work with it. Keeping that in mind lets you be at peace with your choice when in doubt.

Some things to consider or take note of…

Before we begin, I assume you know that Cloud Firestore is a NoSQL database and doesn’t support traditional relational data models. Applications that rely heavily on complex relationships between entities might find Firestore less suitable.
Data duplication (a form of denormalization) is the norm in NoSQL databases like Firestore and shouldn’t be frowned upon, but it can increase storage costs and make it harder to keep data in sync, so balance it carefully to avoid unnecessary redundancy.
Designing an effective data model in Firestore requires a deep understanding of your application’s use cases. Choosing the right balance between normalization and denormalization can be challenging and might require iteration as your application evolves.
Read the docs. Just do it and thank me later 🙂

None of this is meant to scare you. These are important things to know before you begin using Firestore, and they are considerations I made for my own use case, so take note of them.

Database Structure

For the database structure of our chat application, we need to consider the main features of our app. For the sake of this article, our chat app supports person-to-person conversations and group conversations, so we have to design our database to accommodate both. We also need to store our user information.

Some of the structures I’m going to talk about are used in a Flutter chat application I built, which you can check out. Please star the repo if you find it helpful.

GitHub - maykhid/min_chat (github.com)

Users Collection

In a typical chat application, we would surely have authentication. I do not know how you prefer to handle authentication (use Firebase Authentication😅), but regardless, we need to have the user in our db. The structure can look like this:

users (collection)
  |_document0
      |_ email
         uid
         imageUrl
         name
  |_document1
      |_ email
         uid
         imageUrl
         name
    ...

We have a top-level collection (users) containing documents; each document holds one user's information. Of course, you can store more user information in there, but in the spirit of keeping things simple, I’m going with that. The user document does not need to be complex. I would advise against storing secondary user data in it, such as their groups, their favourites, or the users they converse with. There are better ways to handle cases like that, as you’ll soon see.
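To make the shape concrete, here is a minimal sketch of a plain Dart model for a users document. The class name ChatUser and its methods are my own illustration (they are not taken from the min_chat repo), and the sketch assumes all four fields are always present as strings.

```dart
/// Hypothetical model mirroring the four fields of a `users` document.
class ChatUser {
  const ChatUser({
    required this.uid,
    required this.name,
    required this.email,
    required this.imageUrl,
  });

  /// Builds a user from the map form of a `users` document.
  factory ChatUser.fromMap(Map<String, dynamic> map) {
    return ChatUser(
      uid: map['uid'] as String,
      name: map['name'] as String,
      email: map['email'] as String,
      imageUrl: map['imageUrl'] as String,
    );
  }

  final String uid;
  final String name;
  final String email;
  final String imageUrl;

  /// Converts the user back into a map with the same four keys.
  Map<String, dynamic> toMap() {
    return {
      'uid': uid,
      'name': name,
      'email': email,
      'imageUrl': imageUrl,
    };
  }
}
```

Keeping the model this small is deliberate: anything that grows without bound (groups, contacts, favourites) stays out of the user document.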

[Screenshot: the users collection on Firestore]

Conversations Collection

This collection contains all our person-to-person (p2p) conversations. How you structure it depends on your use case, but I'm going to take a general approach here that is common to most apps.

Think of an app like WhatsApp, whose list of conversations is similar to ours. [Screenshot: list of conversations]

As you can see in that list of conversations, we need some information about our recipients, like their image and name, and other information like the last message, who sent it, and when it was sent.

With that in mind, our collection could be structured like so:

conversations (collection)
  |_document0
      |_ documentId (id of this current document)
         initiatedAt (time document was created)
         initiatedBy (id of user who created document)
         lastMessage (map containing the last message and info about the message)
         lastUpdatedAt (time document was last updated)
         participants (list of maps containing both participants' info)
         participantIds (list of participant ids)
  |_document1
      |_ documentId (id of this current document)
         initiatedAt (time document was created)
         initiatedBy (id of user who created document)
         lastMessage (map containing the last message and info about the message)
         lastUpdatedAt (time document was last updated)
         participants (list of maps containing both participants' info)
         participantIds (list of participant ids)
    ...

With this structure, we can get the information we need. We have a top-level collection (conversations) containing documents; each document has information about a particular conversation.

Right off, you can notice some duplication in our conversation documents: the participants’ data. An obvious downside is that when a user updates their data, you need to update every copy of it. In my opinion this is an okay trade-off. How often does a typical user update their information? Not all the time, and certainly not every day. Updating all of that data also wouldn’t be much of a problem if you enlist the help of Cloud Functions.

A map is okay to use within a document if you want to fetch some data alongside the parent document.

The advantage of that duplication is that we get the user data instantly when we need it, as opposed to making another Firestore query to fetch it.

The same logic applies to the lastMessage map: if we update the lastMessage field whenever a new message is added, we can fetch that instantly as well.
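As a rough sketch of how those two embedded fields get used, the helpers below work on plain Dart maps, so they run without Firebase. The function names, the keys inside lastMessage (message, senderId, timestamp), and the uid key inside each participant map are assumptions for illustration.

```dart
/// Returns the embedded info of the participant who is NOT [currentUserId],
/// or null if there is none. Assumes each participant map has a `uid` key.
Map<String, dynamic>? otherParticipant(
  List<Map<String, dynamic>> participants,
  String currentUserId,
) {
  for (final participant in participants) {
    if (participant['uid'] != currentUserId) return participant;
  }
  return null;
}

/// Builds the fields to write to the conversation document whenever a new
/// message is sent, so the conversation list never has to read `messages`.
Map<String, dynamic> lastMessageUpdate({
  required String text,
  required String senderId,
  required DateTime sentAt,
}) {
  return {
    'lastMessage': {
      'message': text,
      'senderId': senderId,
      'timestamp': sentAt,
    },
    'lastUpdatedAt': sentAt,
  };
}
```

In the app, the map returned by lastMessageUpdate would be passed to an update call on the conversation document right after the message itself is written.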

[Screenshot: the conversations collection on Firestore]

Querying the collection to get the current user's conversations is as simple as checking the participantIds array and getting all documents where the user ID is included.

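// snapshots (stream) of the current user's conversations
// note: arrayContains combined with orderBy on another field needs a composite index
final conversationsStream = _firebaseFirestore
          .collection('conversations')
          .where(
            'participantIds',
            arrayContains: userId,
          )
          .orderBy('lastUpdatedAt', descending: true)
          .snapshots();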

Notice how we order by the lastUpdatedAt field with descending set to true; this makes sure that the most recent conversation is always at the top.

Conversation — Messages

For messages of a particular conversation, you can add those in the conversation as a subcollection (messages). Each document will be a chat message with all the necessary information about the message.

messages (subcollection of conversations)
  |_document0
      |_ message (string of actual message)
         senderId (sender user id)
         recipientId (recipient user id)
         status (message sent status)
         timestamp (message time sent)
  |_document1
      |_ message (string of actual message)
         senderId (sender user id)
         recipientId (recipient user id)
         status (message sent status)
         timestamp (message time sent)
    ...

The above information would suffice to show the time when the message was sent and arrange the chat bubble depending on who sent the message.

We have a sub-collection (messages) encompassing documents; each document has information about a particular message.
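Here is a small sketch of how a message document maps onto the chat UI. ChatMessage, isSentBy, and sortedByTime are hypothetical names, and the timestamp is modelled as a plain DateTime so the example runs without Firebase.

```dart
/// Hypothetical in-memory form of a document in the `messages` subcollection.
class ChatMessage {
  const ChatMessage({
    required this.message,
    required this.senderId,
    required this.recipientId,
    required this.status,
    required this.timestamp,
  });

  final String message;
  final String senderId;
  final String recipientId;
  final String status;
  final DateTime timestamp;

  /// True when the bubble belongs on [userId]'s side of the screen.
  bool isSentBy(String userId) => senderId == userId;
}

/// Returns a copy of [messages] ordered oldest first, which is the same
/// order an ascending sort on the `timestamp` field gives.
List<ChatMessage> sortedByTime(List<ChatMessage> messages) {
  return [...messages]..sort((a, b) => a.timestamp.compareTo(b.timestamp));
}
```

The senderId check is all the UI needs to decide between a left-aligned and a right-aligned bubble.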

[Screenshot: the messages subcollection of a conversation on Firestore]

Getting all the messages of a particular conversation is pretty straightforward; we order by timestamp here too.


const docId = '<conversation-document-id>'; // placeholder: the id we saved in the documentId field

// snapshots (stream) of messages, oldest first
final messagesStream = _firebaseFirestore
          .collection('conversations')
          .doc(docId)
          .collection('messages')
          .orderBy('timestamp')
          .snapshots();

Group Conversation

Designing your db for a group conversation can be tricky, but having an idea of what’s necessary for your group conversation and what’s not can help you better structure your db.

I built a Flutter app with group conversations. All I wanted was for a user to be able to send messages to multiple users within a group. Some assumptions for the app were:

Users couldn’t change their information; the app just uses the data obtained from authenticating with Google (I used Google auth).
A basic group that can send messages like typical group chats.
Displaying the list of users in the group wasn’t necessary.

With all these in mind, the way I structured the data for that particular app was quite similar to the p2p conversation from earlier with minor differences:

group-conversations (collection)
  |_document0
      |_ groupname (name of group)
         documentId (id of this current document)
         initiatedAt (time document was created)
         initiatedBy (id of user who created document)
         lastMessage (map containing the last message and info about the message)
         lastUpdatedAt (time document was last updated)
         participantIds (list of participant ids)
  |_document1
      |_ groupname (name of group)
         documentId (id of this current document)
         initiatedAt (time document was created)
         initiatedBy (id of user who created document)
         lastMessage (map containing the last message and info about the message)
         lastUpdatedAt (time document was last updated)
         participantIds (list of participant ids)

Notice we introduced a group name in the document and removed the participants list of maps. Now, you might be wondering why I removed the participants. Hang on, I'll explain that soon.

Fetching the group conversation data is similar to our p2p conversation:

// snapshots (stream) of the current user's group conversations
final groupConversationsStream = _firebaseFirestore
          .collection('group-conversations')
          .where(
            'participantIds',
            arrayContains: userId,
          )
          .orderBy('lastUpdatedAt', descending: true)
          .snapshots();

My messages subcollection is still similar to messages from my p2p conversation, with some differences:

messages (subcollection of group-conversations)
  |_document0
      |_ message (string of actual message)
         senderId (sender user id)
         senderInfo (map of sender user details)
         status (message sent status)
         timestamp (message time sent)
  |_document1
      |_ message (string of actual message)
         senderId (sender user id)
         senderInfo (map of sender user details)
         status (message sent status)
         timestamp (message time sent)
    ...

Here I removed the recipient ID because, obviously, we aren’t sending a message to a single person this time around. I also added sender info to each message as a lazy way to get the user (sender) information for that message.
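A minimal sketch of building such a group message with the sender's details copied in. The function name, the keys chosen for senderInfo, and the 'sent' status value are assumptions; the real app may store more or different sender fields.

```dart
/// Builds the map for a new document in a group's `messages` subcollection.
/// [sender] is the map form of the sender's `users` document.
Map<String, dynamic> buildGroupMessage({
  required String text,
  required Map<String, dynamic> sender,
  required DateTime sentAt,
}) {
  return {
    'message': text,
    'senderId': sender['uid'],
    // Denormalized copy of the sender, so a chat bubble needs no extra read.
    'senderInfo': {
      'name': sender['name'],
      'imageUrl': sender['imageUrl'],
    },
    'status': 'sent', // assumed status value
    'timestamp': sentAt,
  };
}
```

Because the copy is made at send time, every message keeps the name and image the sender had when it was written.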

[Screenshot: chat bubble in a group chat showing the sender's name]

You can see the downside to this approach pretty quickly: data duplication at its peak! If a user updates their info, I would need to update that information in every message they ever sent. I mean, it’s doable, but come on. It works for me and my use case because, like I said earlier, a user cannot update their information in this app.

Fetching the messages subcollection from our group conversation collection is similar to how we got our p2p conversation messages subcollection:

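const docId = '<group-conversation-document-id>'; // placeholder: the id we saved in the documentId field

// snapshots (stream) of group messages, oldest first
final groupMessagesStream = _firebaseFirestore
          .collection('group-conversations')
          .doc(docId)
          .collection('messages')
          .orderBy('timestamp')
          .snapshots();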

What if we wanted the real deal from our group conversation?

What if we wanted to view the list of users? What if the users are allowed to update their information? What if the users can be given roles just as regular group chats have e.g. roles like Admin?

How would we structure the data in our db?

Option 1

We could try a similar approach to our regular P2P conversation.

group-conversations (collection)
  |_document0
      |_ groupImage (image url)
         groupname (name of group)
         documentId (id of this current document)
         initiatedAt (time document was created)
         initiatedBy (id of user who created document)
         lastMessage (map containing the last message and info about the message)
         lastUpdatedAt (time document was last updated)
         participants (list of maps containing all participants' info)
         participantIds (list of participant ids)
  |_document1
      |_ groupImage (image url)
         groupname (name of group)
         documentId (id of this current document)
         initiatedAt (time document was created)
         initiatedBy (id of user who created document)
         lastMessage (map containing the last message and info about the message)
         lastUpdatedAt (time document was last updated)
         participants (list of maps containing all participants' info)
         participantIds (list of participant ids)

And also adding a subcollection of messages in the group conversation collection:

messages (subcollection of group-conversations)
  |_document0
      |_ message (string of actual message)
         senderId (sender user id)
         recipientId (recipient user id)
         status (message sent status)
         timestamp (message time sent)
  |_document1
      |_ message (string of actual message)
         senderId (sender user id)
         recipientId (recipient user id)
         status (message sent status)
         timestamp (message time sent)
    ...

This could work but does it scale? Let’s take into consideration a typical chat app with group chat.

A typical group conversation could need a group photo URL, a group name, and some of the regular fields we had in our p2p conversation document, like lastMessage, initiatedBy, initiatedAt, participants, etc. This could be how we structure our group chat, because we have all the information we need. Say we wanted to show the user that sent the last message.

[Screenshot: conversation list showing the last message and its sender's name]

So, what we do is check the ID of the person who sent the last message. We go through our list of participants, and when we find a match with that ID, we grab their name. Then, we display their name as the sender of the last message.

This trick works not only for the last message but also for any message in our group chat. It might sound a bit technical, but it’s doable.
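The lookup described above can be sketched in a few lines of plain Dart. senderNameFor is a hypothetical helper, and it assumes each participant map stores the user's id under uid and their name under name.

```dart
/// Finds the display name of whoever sent [message] by matching its
/// `senderId` against the embedded participants list.
/// Returns null when the sender is not (or no longer) a participant.
String? senderNameFor(
  Map<String, dynamic> message,
  List<Map<String, dynamic>> participants,
) {
  final senderId = message['senderId'];
  for (final participant in participants) {
    if (participant['uid'] == senderId) {
      return participant['name'] as String?;
    }
  }
  return null;
}
```

The same helper works for the lastMessage map and for any document in the messages subcollection, since both carry a senderId.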

Displaying a list of participants in a group conversation would also be easy, because we have a list of participant maps within our group conversation document.

Updating a participant’s imageUrl, for example, inside that list is a bit trickier. You’d have to fetch the list of participants (the list of user maps), find the participant you want to update, change it, and then write the whole list back.

Even assigning roles to the users would be done the same way. But you should know that exposing users' roles on the client side is not advisable.
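Those steps (find the participant, change one field, write the whole list back) can be sketched as a pure function over the list. withUpdatedImage is a hypothetical name, and the uid key is an assumption about what each participant map contains.

```dart
/// Returns a new participants list in which the entry for [userId] has its
/// `imageUrl` replaced. The input list and its maps are left untouched.
List<Map<String, dynamic>> withUpdatedImage(
  List<Map<String, dynamic>> participants,
  String userId,
  String newImageUrl,
) {
  return [
    for (final participant in participants)
      if (participant['uid'] == userId)
        {...participant, 'imageUrl': newImageUrl}
      else
        participant,
  ];
}
```

The returned list is what you would write back to the participants field; the whole array has to be rewritten even though only one entry changed.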

// sample of user data with role
{
  "name": "Steve Jobs",
  "id": "89xctyrpec4ty",
  "email": "stevejobs@apple.com",
  "photo_url": "image.com",
  "role": "Admin"
}

This all works, right? We could structure our db like this and call it a day. Well, not exactly. One thing you should know is that in Firestore, a single document is limited to 1 MiB (1,048,576 bytes) in size and to roughly 20,000 fields.

{
  "name": "Steve Jobs",
  "age": 60,
  "company_name": "Apple",
  "known_for": {
    "computer": "Mac",
    "phone": "iPhone",
    "other": "iPod"
  }
}

Fields nested inside a map also count against that 20,000-field quota, so the document above counts as 7 fields: the 4 top-level fields plus the 3 entries inside known_for.
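To check the arithmetic, here is a small sketch that applies that counting rule to a document held as a Dart map. countFields is a hypothetical helper that only implements the rule as described here (every field counts once, and a map field also contributes each field nested inside it); it is not an official Firestore calculation.

```dart
/// Counts fields the way described above: each field counts once, and a
/// map field additionally contributes every field nested inside it.
int countFields(Map<String, dynamic> data) {
  var count = 0;
  for (final value in data.values) {
    count += 1; // the field itself
    if (value is Map<String, dynamic>) {
      count += countFields(value); // plus the fields inside the map
    }
  }
  return count;
}
```

Running it on the Steve Jobs document gives 7: name, age, company_name, and known_for, plus computer, phone, and other.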

So imagine you have a group with a lot of participants. Think about your document holding all that user info: would it stand the test of time as more users are added? Even if it did, it wouldn’t be okay to pull that much data from a single document in a single read. While this structure might save you some read costs, it does not scale well. It could work if you’re sure your data will never exceed those limits.

Option 2

Another approach we could try is creating a subcollection of participants within our group conversation documents.

The group conversation collection is similar to our previous example, minus the participants field (the list of maps). The messages subcollection remains pretty much the same. Alongside it, we now have a participants subcollection.

group-conversations

group-conversations (collection)
  |_document0
      |_ groupname (name of group)
         documentId (id of this current document)
         initiatedAt (time document was created)
         initiatedBy (id of user who created document)
         lastMessage (map containing the last message and info about the message)
         lastUpdatedAt (time document was last updated)
         participantIds (list of participant ids)
  |_document1
      |_ groupname (name of group)
         documentId (id of this current document)
         initiatedAt (time document was created)
         initiatedBy (id of user who created document)
         lastMessage (map containing the last message and info about the message)
         lastUpdatedAt (time document was last updated)
         participantIds (list of participant ids)

messages subcollection

messages (subcollection of group-conversations)
  |_document0
      |_ message (string of actual message)
         senderId (sender user id)
         recipientId (recipient user id)
         status (message sent status)
         timestamp (message time sent)
  |_document1
      |_ message (string of actual message)
         senderId (sender user id)
         recipientId (recipient user id)
         status (message sent status)
         timestamp (message time sent)
    ...

participants subcollection

participants (subcollection of group-conversations)
  |_document0
      |_ userId
         imageUrl
         email
         name
  |_document1
      |_ userId
         imageUrl
         email
         name

Now, fetching the list of participants is simple: all we need to do is fetch the documents from the ‘participants’ subcollection within a specific document in the ‘group-conversations’ collection.

const docId = '<group-conversation-document-id>'; // placeholder: the id we saved in the documentId field

// one-time fetch of every participant document in the group
final participantsSnapshot = await _firebaseFirestore
          .collection('group-conversations')
          .doc(docId)
          .collection('participants')
          .get();

final participantsDocs = participantsSnapshot.docs; // list of query document snapshots

When a user who is part of a group updates their data, say imageUrl, changing all occurrences can be done by querying the ‘group-conversations’ collection to find documents where the ‘participantIds’ array contains the user's ID. For each matching group conversation, we then query the ‘participants’ subcollection to find documents where ‘userId’ equals that ID and update the ‘imageUrl’ field.

    // Reference to the 'group-conversations' collection
    final groupConversationsRef = _firebaseFirestore.collection('group-conversations');

    // Query to get documents where the participantIds array contains the specified user id
    final querySnapshot = await groupConversationsRef.where('participantIds', arrayContains: userId).get();

    // Iterate through the group-conversation documents where the user is a participant
    for (final documentSnapshot in querySnapshot.docs) {
      // Reference to the 'participants' subcollection within the group-conversation document
      final participantsRef = documentSnapshot.reference.collection('participants');

      // Query to get participant documents whose userId equals the specified user id
      final participantQuerySnapshot = await participantsRef.where('userId', isEqualTo: userId).get();

      // Update the 'imageUrl' field of each matching participant document
      for (final participantDocumentSnapshot in participantQuerySnapshot.docs) {
        await participantDocumentSnapshot.reference.update({'imageUrl': newImageUrl});
      }
    }

Disclaimer: The code snippet above just shows how to perform this task in Dart, but you should really use a Cloud Function to do this.

And you’re done!

But there’s a catch. Notice that I can’t perform a user update with a much simpler query.

Say UserA is a participant in 5 groups. When they update their image, the following will occur:

5 reads for getting the group conversation documents which the user is part of.
5 reads to get participant documents where user Id is equal to the specified id.
5 writes to update the user information (imageUrl) for each of those documents.

You can do some quick math to see how much that would cost you.

Option 3

One other way you can achieve this is by creating a top-level collection of group-participants. In this collection of documents, you store information like the participants' (user) details and group ids.

group-participants (collection)
  |_document0
      |_ userId
         groupId (document id)
         imageUrl
         email
         name
  |_document1
      |_ userId
         groupId (document id)
         imageUrl
         email
         name

This way, when you want to get all users in a group you can perform a simple query like:

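const groupId = '<group-conversation-document-id>'; // placeholder: id of the group

// one-time fetch of every participant document that belongs to the group
final participantsSnapshot = await _firebaseFirestore
          .collection('group-participants')
          .where('groupId', isEqualTo: groupId)
          .get();

final participantsDocs = participantsSnapshot.docs; // list of query document snapshots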

Whenever a user updates information like their imageUrl you can perform simple queries like:

const userId = '<user-id>'; // placeholder: id of the user who changed their image

// fetch every group-participants document that belongs to this user
final participantsSnapshot = await _firebaseFirestore
          .collection('group-participants')
          .where('userId', isEqualTo: userId)
          .get();

final participantsDocs = participantsSnapshot.docs; // list of query document snapshots

for (final participantDocumentSnapshot in participantsDocs) {
    // each snapshot carries a reference to its own document; update imageUrl through it
    await participantDocumentSnapshot.reference.update({'imageUrl': newImageUrl});
}

Disclaimer: The code snippet above just shows you how to perform this task in Dart, but you should really use a Cloud Function to do this.

With the UserA example we used earlier to update the user info, we perform:

5 reads to get the group participants' documents.
5 writes to update the user information (imageUrl) for each of those documents.

Some quick math here tells you that you have saved yourself some extra read cost💰 compared to the previous scenario. Cheers🥂.

Oh, if you’re wondering how much cost you’re going to incur with Option 1, using the UserA example:

5 reads for getting the group conversation documents which the user is a part of.
5 writes to update the user information (imageUrl) since you’ll be updating the participants' field for those 5 documents.

This is similar in cost to Option 3 for updating user info, but it is not as scalable or efficient.

I think the best option here is Option 3.
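The three tallies can be put side by side with a small sketch that generalises the UserA example to any number of groups. UpdateCost and the three functions are hypothetical names; the counts simply restate the reads and writes listed for each option and say nothing about actual pricing.

```dart
/// Number of document reads and writes for one profile-image update.
class UpdateCost {
  const UpdateCost(this.reads, this.writes);

  final int reads;
  final int writes;
}

/// Option 1: read each group document, rewrite its participants list.
UpdateCost option1Cost(int groupCount) => UpdateCost(groupCount, groupCount);

/// Option 2: read each group document, then read and update one document
/// in that group's participants subcollection.
UpdateCost option2Cost(int groupCount) =>
    UpdateCost(groupCount * 2, groupCount);

/// Option 3: read the user's group-participants documents, update each.
UpdateCost option3Cost(int groupCount) => UpdateCost(groupCount, groupCount);
```

For UserA in 5 groups this gives 5 reads and 5 writes for Options 1 and 3, and 10 reads and 5 writes for Option 2.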

Previously, I said that adding user roles to the user information is not okay. If your app requires user roles, say admin or editor, here are two ways to go about them.

You could create a subcollection within the group conversation documents called participants-roles, where each document holds a user id and that user's role. If that sounds like overkill, you can create a subcollection with a single document containing a map of user ids to their roles. Keeping roles out of the user information gives you better control over roles and access permissions, and it helps security, scalability, and maintainability.
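For the single-document variant, reading a role is a simple map lookup. The sketch below assumes the document's data is a map of user id to role name; roleOf, isAdmin, and the 'Admin' role name are illustrative assumptions.

```dart
/// Returns the role stored for [userId], or null when the user has no entry.
/// [roles] is the data of the single roles document: userId -> role name.
String? roleOf(Map<String, dynamic> roles, String userId) {
  return roles[userId] as String?;
}

/// True when [userId] holds the 'Admin' role (assumed role name).
bool isAdmin(Map<String, dynamic> roles, String userId) {
  return roleOf(roles, userId) == 'Admin';
}
```

A check like this is only good for deciding what the UI shows; who is allowed to change the roles document still has to be enforced on the backend.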

Conclusion

By strategically leveraging Firestore’s capabilities, you can create a chat application that scales effortlessly.

If you have any questions, you can comment and I’ll be sure to respond.

Do clap and share if you find this helpful. It encourages me to share more stuff like this.

You can also reach out to me if you have questions.

Happy coding!

If you’re interested in building a chat app in Flutter, you can take a look at the one I built. All chat UI screenshots are from this app.

GitHub - maykhid/min_chat (github.com)

Written by Henry Ifebunandu