Working around Google App Engine’s Full text search maximum rate of 250/s

Constant Oduol
Published in Echo Mobile
3 min read · Mar 20, 2019

Introduction

For a while now we have been using App Engine’s search solution (I’ll refer to it as Full Text Search, or FTS, from here on) to enable our users to search through their data. It served us well until recently, when we started hitting its upper limits. We noticed this because some of our search documents fell out of sync with their corresponding entities in our Datastore.

Google imposes a limit of 250 document additions and deletions per second. This seems like a generous limit, but it is easily exceeded if you don’t control access to this resource from a central place. In certain circumstances we could get a spike of requests to FTS that exceeded the 250/s limit, and requests started failing.

Our application was not designed from the ground up to treat FTS as a limited resource.

Currently we call index.put from any point in the application and expect the document put to succeed. This is problematic because we can never really tell whether we are exceeding the upper limit imposed on us.
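The shape of the problem looks roughly like this. This is a minimal sketch, not our actual code: the real calls go through google.appengine.api.search, and the handler names and FakeIndex stub are invented here purely to make the pattern visible.

```python
# Sketch of the uncoordinated pattern described above. In the real app each
# put is search.Index(name=...).put(document); a stub stands in for it here.
class FakeIndex:
    def __init__(self):
        self.puts = 0

    def put(self, document):
        # Every caller writes directly; nothing throttles the aggregate rate.
        self.puts += 1

index = FakeIndex()

def on_message_saved(doc):
    index.put(doc)   # one code path writes straight to the index

def on_contact_saved(doc):
    index.put(doc)   # another code path does the same, independently

# A burst of entity saves translates one-to-one into index writes, so
# 300 saves within a second means 300 puts/s -- over the 250/s limit.
for i in range(300):
    on_message_saved({'id': i})
assert index.puts == 300
```

Because the puts happen at arbitrary call sites, there is no single place to measure or cap the write rate.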

Considering Elasticsearch

We proposed different ways of solving this problem. A radical one was to migrate completely from FTS to Elasticsearch, hoping that we could achieve higher rates there. Since the hosted Elasticsearch provided by Elastic on Google Cloud is only partially managed, scaling Elasticsearch would squarely fall on us, yikes! more devops work.

We performed some experiments on Elasticsearch to determine the maximum put rate we could achieve with one node. The results were promising. We achieved peak rates of 1200 puts/s and an average of 600 puts/s overall. Though Elasticsearch seemed promising it would take us a lot of time to migrate, hence we shelved the migration for another day.

Using Task Queues and their challenges

The other idea we considered was centralizing updates to FTS via one queue that would guarantee we never exceed the rate of 250/s. Initially this design seemed promising and simple enough, until we started prototyping it. The main problem was that making puts to FTS asynchronous would introduce race conditions on puts to the same document. This is a side effect of the fact that tasks added to Google Cloud task queues have no strong ordering guarantees: tasks are processed in roughly First In, First Out (FIFO) order, but experiments quickly show that the ordering is not at all strict.
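A minimal model of the centralized-queue idea, with the 250/s budget from the post. The plain deque here is a hypothetical stand-in for a real task queue, and the function names are mine, not from our codebase.

```python
import collections

RATE_LIMIT = 250  # FTS document writes allowed per second

queue = collections.deque()

def enqueue_put(doc):
    """Every part of the app funnels its FTS writes through here."""
    queue.append(doc)

def drain_one_second(put_fn):
    """Run once per second: forward at most RATE_LIMIT queued writes."""
    sent = 0
    while queue and sent < RATE_LIMIT:
        put_fn(queue.popleft())
        sent += 1
    return sent

# 600 writes arrive in one burst; the drain never exceeds the budget,
# it just spreads the backlog over the following seconds.
for i in range(600):
    enqueue_put({'id': i})
written = []
assert drain_one_second(written.append) == 250
assert drain_one_second(written.append) == 250
assert drain_one_second(written.append) == 100
```

The queue caps the rate cleanly; the race condition appears because consumers of a real task queue do not process entries in this strict order.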

A naive way of working around this would be to schedule tasks that update the same document with ETAs (estimated times of arrival) in the future, to reduce the chances of an earlier put overwriting a more recent one. This, however, performs poorly and introduces unnecessary delays.
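A toy model of the ETA trick, assuming tasks are delivered in ETA order (in App Engine terms, the delay would be the countdown or eta argument to taskqueue.add; the 30-second figure and the lambdas are illustrative, not from our system):

```python
import heapq

def run_queue(tasks):
    """Deliver (eta_seconds, fn) tasks in ETA order; ties keep insert order."""
    heap = [(eta, i, fn) for i, (eta, fn) in enumerate(tasks)]
    heapq.heapify(heap)
    doc = {}
    while heap:
        _, _, fn = heapq.heappop(heap)
        fn(doc)
    return doc

# Pushing the newer put's ETA into the future makes it likely to land last,
# but every write now waits, and "likely" is not a guarantee in a real queue.
doc = run_queue([
    (0,  lambda d: d.update(body='old')),   # first update, immediate
    (30, lambda d: d.update(body='new')),   # second update, delayed 30s
])
assert doc['body'] == 'new'
```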

Working around task queue challenges

We solved the race condition problem by splitting a search document into its constituent fields and creating a key-value store record for each of the search document’s fields. In this manner there is no longer a race on the search document itself, since the fields can be updated or deleted independently.
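The idea above can be sketched like this. The record shape, a (doc_id, field_name) key mapping to the field value, and the example field names are my assumptions for illustration; the post does not specify the exact storage schema.

```python
def explode(doc_id, fields):
    """Store each search-document field as its own key/value record."""
    return {(doc_id, name): value for name, value in fields.items()}

records = {}
records.update(explode('msg-1', {'body': 'hello', 'sender': 'alice'}))

# A later update touches only the field it cares about. There is no race
# on the document as a whole, because each field record is independent.
records[('msg-1', 'body')] = 'hello again'

assert records[('msg-1', 'sender')] == 'alice'
assert records[('msg-1', 'body')] == 'hello again'
```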

For a task to update a search document, it acquires a lock and releases it only after the update succeeds. Other tasks racing to update the same document have to wait until they can acquire the lock.
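A minimal in-process sketch of the per-document lock. The post does not say how the lock is implemented; on App Engine one common approach is memcache.add, which also fails if the key already exists, but the dict-based version here is purely illustrative.

```python
import threading

_locks = {}
_guard = threading.Lock()

def acquire(doc_id):
    """Try to take the per-document lock; returns True on success."""
    with _guard:
        if doc_id in _locks:
            return False
        _locks[doc_id] = True
        return True

def release(doc_id):
    """Release the lock; called only after the FTS update succeeds."""
    with _guard:
        _locks.pop(doc_id, None)

assert acquire('msg-1')       # first task wins the lock
assert not acquire('msg-1')   # a racing task must wait and retry
release('msg-1')              # released after a successful update
assert acquire('msg-1')       # now the next task can proceed
```

A real implementation would also need a timeout on the lock so a crashed task cannot hold a document hostage forever.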

To write to FTS, we consolidate the fields of the search document from the individual key-value records into one document, which is then written to FTS. This approach removes the risk of racing tasks overwriting documents and solves our initial problem of controlling access to the limited FTS resource.
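The consolidation step can be sketched as follows, reusing the hypothetical (doc_id, field_name) record shape from earlier. In the real system the resulting dict would become a search.Document and go through the rate-capped queue to index.put.

```python
def consolidate(doc_id, records):
    """Rebuild the full search document from its per-field records."""
    return {name: value
            for (d, name), value in records.items()
            if d == doc_id}

records = {
    ('msg-1', 'body'):   'hello again',
    ('msg-1', 'sender'): 'alice',
    ('msg-2', 'body'):   'other doc',
}

doc = consolidate('msg-1', records)
assert doc == {'body': 'hello again', 'sender': 'alice'}
```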
