Why we churned from Algolia (It’s not for everyone)

A square peg in a round hole

Nicolas Keller
Equify | {tech:’blog’}
8 min read · Feb 6, 2022

--

I should start by saying that I un-ironically really love Algolia; the product is great and the doc is almost perfect. I will still recommend and use it for projects where it makes sense. This article is about our experience with Algolia, and aims to guide anyone considering this solution by laying out the pros and cons based on your use case. Finally, if you are looking for an “Elastic VS Algolia” article, this is not it, sorry. With that out of the way, let’s get to the meat of it!

Equify is a SaaS that lets you manage your cap table and shareholders. It has a lot of filterable and searchable views across 15 indices. Views include full-page tables with facets, simple lists, a global search input across multiple indices, and searchable select / multi-select inputs. We were using Algolia pretty much everywhere a list of entities was involved.

Unfortunately for us, we realized that Algolia is not the right tool for the job. It has a lot of features we don’t need (e.g. AI, analytics, synonyms…), it has no real sorting capabilities (technically possible, but it requires a lot of effort), and its filtering is very limited, which introduced a lot of complexity on our engineering side. Finally, the pricing is way too high when your business does not revolve around search.

Sounds bad? Surprisingly enough, those are not the reasons we decided to churn. The one thing that practically forced us to find another solution is indexation time: indexation is simply too slow for a responsive SaaS. I am not blaming the product; this trade-off can make perfect sense, but not for us.

Numbers don’t lie

This section walks you through how we analyzed our situation objectively. It might give you an idea of what to expect if your situation is comparable to ours, but take it with a grain of salt.

When you create an entity in your database, let’s say an article, you have to index it in Algolia for it to be searchable. Algolia does not hit your database directly; it keeps a copy of only the relevant data and uses that copy to return results. This copy takes the form of indices: indices contain all the data that can be used to search or filter your records, and it is your job to keep them in sync with your database.

Algolia works with “operations”. An operation is a unit of work that alters an index: basically, creating, updating, or deleting a record in an index is an operation. You can send a “create” operation with all the relevant data about the article, Algolia then builds the index, and the freshly created article is now searchable on your app.

Note that after the entity has been saved to your database, and before Algolia is done building the index, the entity is not searchable: it cannot be returned as part of the results. You clearly want to minimize this window if the user expects to see what they just created.

What exactly is too slow? There are two things we can measure: how long a single operation takes to execute, and how long a batch of them takes. What the user sees is the latter: the user does something on the app, everything that should be indexed is batched and sent to Algolia, and once the batch is executed the user sees the results. Of course, batching is very important for performance: the more operations you batch together, the faster it is per operation, since Algolia has to build an index only once even if multiple records were created.
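The batching described above can be sketched in a few lines. This is a minimal illustration, not our actual implementation; the `send` callback stands in for the real call to Algolia’s batch endpoint:

```python
class SerialBatcher:
    """Accumulates index operations and flushes them one batch at a time.

    `send` stands in for the real call to the search engine's batch
    endpoint; it receives the list of pending operations.
    """

    def __init__(self, send):
        self.send = send
        self.pending = []       # operations accumulated while a batch is in flight
        self.in_flight = False

    def add(self, operation):
        self.pending.append(operation)
        self.flush()

    def flush(self):
        # Only one batch runs at a time; new operations pile up meanwhile,
        # so slower batches naturally produce bigger (and fewer) batches.
        while self.pending and not self.in_flight:
            batch, self.pending = self.pending, []
            self.in_flight = True
            try:
                self.send(batch)
            finally:
                self.in_flight = False


# Simulate a batch already in flight: two operations accumulate,
# then go out together as a single batch once the previous one is done.
sizes = []
batcher = SerialBatcher(lambda batch: sizes.append(len(batch)))
batcher.in_flight = True   # pretend a batch is currently running
batcher.add({"action": "updateObject", "objectID": "1"})
batcher.add({"action": "deleteObject", "objectID": "2"})
batcher.in_flight = False  # the previous batch "finished"
batcher.flush()            # both accumulated operations ship as one batch
```

The serial part matters later in this article: because a new batch only goes out when the previous one returns, slow indexation directly inflates batch sizes.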

We logged our data directly from the production environment, over a few days, to have something as close as possible to what our users actually experience. We were interested in the time it took for a batch to execute. Let’s plot a histogram to get a feel for the distribution:

Histogram of indexation time per batch

We get a bell curve with a mean of about 20 seconds. The vertical axis is the number of batches that fell into a given 2-second-wide bucket. We can plot the cumulative histogram to see the percentiles:

Cumulative histogram of the indexation time per batch

That means that half of our users experienced a loading time of more than 22.46 seconds. And this is an optimistic number, since in reality batches are executed in series: we only send a batch when the previous one is finished, which lets us accumulate more operations for the following batch. And yes, we only measured the time between the moment we sent the batch to Algolia and the moment Algolia told us the batch was executed.

This gives us the following percentiles:

This is the closest you can get to what the users see in the end, but to give more context to the reader we decided to include a few graphs about batch size. Here is the number of operations per batch on the same data sample:

Histogram of the number of operations per batch

Nothing extraordinary: the 50th percentile is 4 operations per batch, and the 90th is 32 operations. We can safely assume that we were not overworking the system. Next, we can divide the time it took for a batch to execute by the number of operations in said batch. We get the following results:

Histogram of the indexation time per operation
Cumulative histogram of the indexation time per operation

We end up with a classic long-tailed distribution where, in theory, half of the operations take more than 4 seconds to execute. Remember that this does not directly reflect what users actually experience on the app; batch indexation time is the key metric.
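If you want to run the same analysis on your own logs, the per-batch and per-operation percentiles can be computed in a few lines. The sample timings below are made up for illustration; they are not our production data:

```python
def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at least
    p% of the data is less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * p // 100))  # ceil(n * p / 100)
    return ordered[rank - 1]


# (batch_duration_seconds, operations_in_batch) -- illustrative sample only
batch_timings = [(22.0, 4), (8.5, 2), (31.0, 32), (19.0, 4), (27.5, 8), (12.0, 1)]

batch_durations = [duration for duration, _ in batch_timings]
per_operation = [duration / count for duration, count in batch_timings]

p50_batch = percentile(batch_durations, 50)  # what a user actually waits for
p50_op = percentile(per_operation, 50)       # amortized cost, not user-visible
```

The nearest-rank definition is the simplest one; it matches what you read off a cumulative histogram like the ones above.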

Defining a target

SaaS products are meant to be highly interactive, and loading screens are probably what users hate the most after bugs. For us, indexation time was by far the number one factor affecting loading time, which resulted in a bad user experience. Can you imagine waiting thirty seconds for that “thing” you just created to show up? Or even worse, still seeing something you just deleted because it has not been de-indexed yet?

user: *deletes entity*
app: *still shows entity*
user: “Have I really deleted it?”
user: *clicks delete again*
app: “This entity does not exist”
user: *hits refresh*
app: *shows entity*
user: “wtff??”
app: *finally removes entity*
user: “…”

There is no perfect solution, it is a balance between indexation time and search speed. Depending on your business it might also be a balance between the number of searches and the number of indexations, or it could be “how critical are those two factors to your bottom line?”.

Like everything else, search and indexation speeds obey the law of diminishing returns; it is important to keep that in mind when comparing solutions. Below a given threshold (~150ms for us) it becomes unnecessary to improve search response time. We’d rather lower the indexation time to something more reasonable.

Algolia claims a 99th-percentile search latency of 17ms, way below our acceptable threshold, but at the cost of the slow indexation times we saw. If you are building an e-commerce site, Algolia makes perfect sense: apparently, every 100ms of latency costs Amazon 1% in sales. But if that is not your case, maybe find a solution that fits your needs better.

Finding a better fit

When we started looking for other solutions, Elastic was an obvious candidate: it is open source, engineers can work locally, more complex filtering is possible, sorting is possible, and we could dial in the settings for a better compromise between search and indexation. But we wanted to stay open and consider any solution that matched our 150ms target. A 100% SQL solution was also suggested, with very appealing promises: no indexation at all, no risk of sync issues, and pretty much free. On paper, better than Algolia or Elastic; we had to test it…

It was extremely easy to implement, and we ended up deleting a lot of code dedicated to managing and optimizing indexation. We were suddenly able to sort our results however we wanted (not a thing when you use Algolia), and a few days later we had our first search endpoints ready.

On the downside, search is definitely not as good as Algolia regarding typo tolerance. But in our business, users are not searching through huge lists of objects, and they usually know exactly what they are looking for. Fuzzy search is not as important for us as it is for Amazon or Google, for instance; once again, this largely depends on your business.

Postgres has a few very useful functions for fuzzy matching that worked just fine for us. We also do some extra filtering and faceting in the code for things that we cannot do in SQL.
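For the curious, Postgres’s pg_trgm extension implements fuzzy matching by comparing sets of three-character sequences (trigrams). The sketch below mirrors that mechanism in Python just to show the idea; it is a simplification (the real pg_trgm also strips non-word characters, among other details):

```python
def trigrams(text):
    """Distinct trigrams of a string, roughly the way pg_trgm builds them:
    lowercase each word, pad it with two leading spaces and one trailing space,
    then slide a 3-character window over it."""
    grams = set()
    for word in text.lower().split():
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Share of common trigrams over the union, like pg_trgm's similarity()."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb) if (ta or tb) else 0.0


# A small typo still yields a high score, which is what makes this useful:
score = similarity("Nicolas Keller", "Nicolas Keler")
```

On the Postgres side this corresponds to something like `WHERE name % 'query' ORDER BY similarity(name, 'query') DESC` once `CREATE EXTENSION pg_trgm` has been run (`name` being a hypothetical column here, not one of ours).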

Conclusion

Overall, the SQL solution is the best fit for us. It is definitely an order of magnitude slower than Algolia on search, but still below the threshold we set for ourselves. And indexation time is simply nonexistent, compared to Algolia’s 30 seconds at the 75th percentile.

The simplicity we gain is also an enormous advantage: no code is good code, it never fails, is easy to maintain, and is easy to explain to new hires. We no longer depend on a third party that can change its pricing (the last change was very spicy 🌶), and developer experience is way better when you do not need an internet connection, a third-party account, or to re-index everything every time you seed your database.

User experience has improved drastically due to the much shorter loading screens. We are very happy with the results and would recommend the switch to anyone who is in a similar situation: if search speed is not a top priority it is not worth the investment and the indexation cost.

When should you use Algolia, or Elastic for that matter? When the quality of the search really matters; when your bottom line depends on the users’ ability to find what they are looking for (e-commerce, marketplace); when users search through an unknown/uncontrolled list of entities (they don’t know exactly what to type or what to expect); or when users do not expect their actions to instantly impact the list of results (blog, documentation).
