Playing with Fire — Working with Firebase in production at Ithaka

Mithilesh Said
Published in Founding Ithaka · 6 min read · Apr 6, 2018

It is 2018. There are only two types of developers in the world right now: those who have been burnt by Firebase, and those who don’t admit it.

There is more content on the internet criticising Firebase than there is documentation from Google. A part of me wants to take that route too, but saner heads prevail. So instead of talking only about the limitations of Firebase, I am going to talk about all the times we hit those limitations and how we hacked our way around them to save development time and cost: the design decisions we took, the mistakes we made, and what we learned from them.

First, a little context

At Ithaka, reliable realtime communication between the travellers and the wizards who help them plan their trips is of paramount importance. As an early-stage startup, we needed to save time and money while building this reliability into the system. Remember, the internet was designed to work on the “pull” model, not the “push” model. This is what makes building reliable realtime applications challenging even to this day. So we started looking for solutions that would deliver that reliability at a low price. Firebase turned out to be a good option. Why, you ask?

The pros of Firebase

  • Availability of SDKs on all platforms. The wizard-side application is a web interface and the traveller-side application is mobile (native Android and iOS). Firebase has SDKs for all three.
  • A consistent API across platforms, which made it easier for developers to communicate implementation details to each other.
  • High reliability and support for offline availability of data, which meant less work on the mobile apps for transmitting and storing data.
  • Easy to get started and try things out from the Firebase console.
  • Free up to a certain limit (which was enough in our early days).
  • Allowed a higher number of simultaneous client connections than other services.
  • Schemaless

Some of these points would turn out to be double-edged swords down the line. But in the beginning, it was all nice and shiny.

Burning the fingers — stage 1

The first iteration of the product that we built had the travellers’ mobile app and the wizards’ web interface connecting directly to Firebase. We were dazzled by the three-way data binding provided by Firebase and Angular.js. The architecture looked like this:

[Diagram: stage 1]

But we soon hit our first set of issues.

  • Limited querying capabilities meant we had to fetch more data and write extra code to process it just to do complex operations.
  • More data and more processing meant the client side would slow down or even hang (more so on the wizard side, since many travellers map to a single wizard).
  • Not having relations meant duplicating data. For example, to show both the groups a traveller was part of and all the travellers in a given group, we’d have to store the same membership data in two places (sketched just after this list).
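
For a concrete picture, here is a hypothetical layout (not our actual one) showing the kind of duplication a schemaless tree pushes you towards: the same traveller-group relation has to live in two places, and application code has to keep the copies in sync.

```js
// Hypothetical denormalised layout: the traveller-group relation is stored twice
const tree = {
  travellers: {
    traveller_42: {
      name: 'Asha',
      // groups this traveller belongs to
      groups: { goa_trip: true, ladakh_2018: true },
    },
  },
  groups: {
    goa_trip: {
      name: 'Goa trip',
      // travellers in this group (duplicates the fact stored above)
      members: { traveller_42: true, traveller_77: true },
    },
  },
};
```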

Our workaround:

  • We added a server-side layer. Using Firebase’s Node.js SDK, we implemented all the realtime listeners (which handled mostly chat back then) on the server side. This allowed us to maintain a copy of the data in a more mature database like MongoDB, which in turn allowed more complex aggregate queries and sorting operations (a rough sketch follows the diagram below).
[Diagram: stage 2]
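
A minimal sketch of that server-side layer, assuming the firebase-admin and mongodb packages with hypothetical paths, URLs and collection names (not our exact code):

```js
const admin = require('firebase-admin');
const { MongoClient } = require('mongodb');

async function main() {
  // Assumes default application credentials and a placeholder database URL
  admin.initializeApp({
    credential: admin.credential.applicationDefault(),
    databaseURL: 'https://<your-project>.firebaseio.com',
  });

  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  const messages = mongo.db('ithaka').collection('messages');

  // Mirror every chat message written to Firebase into MongoDB
  admin.database().ref('/messages').on('child_added', async (snap) => {
    await messages.updateOne(
      { _id: snap.key },
      { $set: snap.val() },
      { upsert: true }
    );
  });

  // Complex reads that Firebase can't express now run against MongoDB instead,
  // e.g. the latest message per traveller:
  // messages.aggregate([
  //   { $sort: { sentAt: -1 } },
  //   { $group: { _id: '$travellerId', lastMessage: { $first: '$$ROOT' } } },
  // ]);
}

main().catch(console.error);
```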

Adding the server layer didn’t fully negate the benefits of Firebase, because the mobile side continued to benefit from Firebase’s reliability and offline capabilities. Our travellers come from a wide range of geographies and network conditions. The wizards, on the other hand, are vetted through a thorough process and generally operate under good network conditions.

Burning the fingers — stage 2

The way we implemented our chat was that, on signup, each traveller gets a ref in Firebase and all their messages go under that ref. The server would then attach two listeners for each traveller: one child_added listener for new messages from the traveller, and one child_changed listener for updates to existing messages (to indicate delivered or read status). A rough sketch of that fan-out follows; it led to the next set of issues.
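
Roughly, the fan-out looked like this (a sketch with assumed paths and hypothetical handler names, not the actual code):

```js
const admin = require('firebase-admin');

// One pair of listeners per traveller. At thousands of travellers, this
// means thousands of live listeners held by a single server process.
function listenToTraveller(travellerId) {
  const ref = admin.database().ref(`/messages/${travellerId}`);

  // New messages from this traveller
  ref.on('child_added', (snap) => saveMessage(travellerId, snap.key, snap.val()));

  // Delivered/read status updates. To detect changes, the SDK keeps this
  // traveller's messages synced in memory; multiplied across travellers,
  // this is where the RAM went.
  ref.on('child_changed', (snap) => updateMessageStatus(travellerId, snap.key, snap.val()));
}

allTravellerIds.forEach(listenToTraveller);
```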

  • At thousands of users, we had thousands of listeners on the server. The child_changed listener was especially problematic. Firebase’s Node.js SDK does not have disk persistence, so in order to figure out which messages changed, it would keep all the messages in memory on our server. This led to memory bloat and the servers would go unresponsive. Things would get worse when Node.js’s garbage collector kicked in.
  • Another example of poor querying capabilities was the child_added event. Every time the server was restarted, it would fetch all the messages again, even though they had already been fetched and saved to MongoDB.

Our workaround:

  • Cloud Functions to the rescue. Luckily, around the same time we were hitting these issues, Firebase launched Cloud Functions, which let us write code that runs on Firebase’s servers. As a result, all the listeners that were taking up too much RAM on our servers moved to Firebase’s infrastructure. Every time a new message is sent, it invokes the corresponding Cloud Function, which relays that message to our servers over an HTTPS webhook (a rough sketch follows the diagram below). This added a small amount of latency to the message sending process, but it reduced the load on our servers, improving uptime and user experience and cutting server cost. I’ll write a follow-up post on how to utilise Cloud Functions to their fullest.
[Diagram: stage 3]
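
A minimal sketch of such a relay function, assuming the current firebase-functions API, axios as the HTTP client, and a hypothetical webhook URL and database path (the beta SDK we started on used a slightly different event-based signature):

```js
const functions = require('firebase-functions');
const axios = require('axios');

// Runs on Firebase's infrastructure: every new message triggers this
// function, which relays the payload to our backend over HTTPS.
exports.relayNewMessage = functions.database
  .ref('/messages/{travellerId}/{messageId}')
  .onCreate((snapshot, context) =>
    axios.post('https://api.example.com/webhooks/new-message', {
      travellerId: context.params.travellerId,
      messageId: context.params.messageId,
      message: snapshot.val(),
    })
  );
```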

Burning the fingers — stage 3

When Firebase Cloud Functions launched, they were in beta (and at the time of this writing, they still are), which meant they weren’t 100% reliable. And Murphy’s law stepped in. We saw two major outages and a few minor service disruptions within the first few months of using Cloud Functions. Our service would be down and there wasn’t much we could do about it until the Firebase folks got Cloud Functions back up.

Our workaround:

  • We developed a switch on the server. Flipping this switch takes the server from relying on Cloud Functions to attaching its own listeners directly to the Firebase database, and vice versa (a rough sketch follows the diagram below). While Cloud Functions were down, we could continue serving our users at the cost of increased RAM consumption, then flip the switch back once Cloud Functions were up again.
[Diagram: stage 4]
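
A minimal sketch of how such a switch can work, with hypothetical helper names and an assumed message layout; not our exact implementation:

```js
const admin = require('firebase-admin');

const attached = []; // refs and callbacks, kept around so we can detach later

function attachDirectListeners(travellerIds) {
  travellerIds.forEach((travellerId) => {
    const ref = admin.database().ref(`/messages/${travellerId}`);
    // ref.on() returns the callback, which we need later for ref.off()
    const onAdded = ref.on('child_added', (snap) =>
      handleNewMessage(travellerId, snap.key, snap.val()));
    const onChanged = ref.on('child_changed', (snap) =>
      handleStatusUpdate(travellerId, snap.key, snap.val()));
    attached.push({ ref, onAdded, onChanged });
  });
}

function detachDirectListeners() {
  attached.forEach(({ ref, onAdded, onChanged }) => {
    ref.off('child_added', onAdded);
    ref.off('child_changed', onChanged);
  });
  attached.length = 0;
}

// The switch: flip between the two modes at runtime.
function setUseCloudFunctions(useCloudFunctions, travellerIds) {
  if (useCloudFunctions) {
    detachDirectListeners(); // webhooks from Cloud Functions take over
  } else {
    attachDirectListeners(travellerIds); // Cloud Functions are down; listen directly
  }
}
```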

So far so good, what next?

Despite all its limitations, Firebase turned out to be a big plus for us. Every time we hit a major roadblock, there was an alternate path available. Thanks to it, we saved thousands of man-hours we would have spent developing reliable realtime infrastructure, and focused instead on releasing features that added real business value. It served us well.

However, we think we have now seriously outgrown all our reasons for using Firebase. Multiple backend microservices needing access to it, an increasingly complex use case, and the addition of multiple interfaces have made it infeasible to continue working with Firebase.

It only makes sense that we start building our own realtime infrastructure now. If you are interested in working with us, or know someone who is good at this kind of work, ask them to drop us an email at kickass@ithaka.travel. Or if you are using Firebase and would like to discuss approaches, DM me on Twitter @MithileshSaid.

Stay strong and hack along.
