Nomad Socialite was a new geo-location game in which players traveled around the world to rack up points, catch the latest urban events, and meet a whole community along the way. So catching a plane to San Francisco, hitting the hot new poké popup, and grabbing its rare collectible digital item could rack up some huge points.
The challenge, from a producer's perspective, was that it was getting hard to track analytics on all these events to hand back to the proprietors running them. Once an event got large enough, the analytics system (which was powered by Datastore) started to slow significantly during updates.
When you're talking about Datastore, this type of problem comes up all the time, and it's simple to identify: contention.
Review: entity contention
Simply put, Datastore contention occurs when a single entity or entity group is updated too rapidly. Datastore will queue concurrent requests to wait their turn, but if the rate of writes to an entity is too high, requests stack up, things slow down, and eventually time-out errors are thrown.
The most common way this limitation gets encountered is when you update an entity with every request — for example, counting the number of views to a page on your site.
Or, in the case of Nomad Socialite: updating the analytics data for an event meant keeping a running count of the number of users logged in, along with all their related data sets.
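To make the hot-spot pattern concrete, here's a minimal in-memory sketch of it. The `datastore` dict and `record_view` helper are illustrative stand-ins (not the real Datastore client API); the lock models the way writes to a single entity get serialized:

```python
import threading

# Illustrative stand-in for Datastore: one entity, keyed by name.
datastore = {"event-counter": {"views": 0}}
lock = threading.Lock()  # models Datastore serializing writes to one entity

def record_view(key):
    """Every request funnels into the same entity -- the contention hot spot."""
    with lock:  # concurrent writers queue here, just like writes to one entity group
        entity = datastore[key]
        entity["views"] += 1
        datastore[key] = entity

# Simulate 100 concurrent requests all hitting the same counter.
threads = [threading.Thread(target=record_view, args=("event-counter",))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(datastore["event-counter"]["views"])  # 100 -- every write waited its turn
```

The count is correct, but only because every writer waited in line for the same entity; at a high enough request rate, that line is exactly what overflows into time-outs.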
Now, when this happens, there are lots of ways to address it from a code perspective (I'll leave you to this document for some of the more common ones).
From my perspective, though, there's only one good solution: sharding.
Sharding to the rescue
As we've discussed before, Datastore runs on top of Bigtable, and thus is at the mercy of its performance characteristics.
If you need to update an entity at a higher rate than Bigtable permits, then you can use replication to accomplish your task.
Sharding effectively duplicates an entity and uses some manual load balancing to spread writes randomly across the clones instead of sending them all to the same item. The result is that each clone holds only part of the updates (and the data differs between them). When you read later, you fetch all the copies and sum them together on the client. Using this strategy, you store N copies of the same entity, allowing an N-times higher write rate than a single entity supports.
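The strategy above can be sketched in a few lines. This is an illustrative in-memory model (the `shards` dict stands in for N shard entities; shard count and key names are made up for the example), not the real Datastore client code:

```python
import random

NUM_SHARDS = 10  # N copies -> roughly N-times the sustainable write rate

# Illustrative stand-in for N shard entities in Datastore.
shards = {f"event-counter-shard-{i}": 0 for i in range(NUM_SHARDS)}

def increment():
    """Write path: pick one clone at random, so load spreads across shards."""
    shard_key = f"event-counter-shard-{random.randrange(NUM_SHARDS)}"
    shards[shard_key] += 1

def get_count():
    """Read path: fetch all the copies and sum them together on the client."""
    return sum(shards.values())

for _ in range(1000):
    increment()

print(get_count())  # 1000 -- no single shard saw all the writes
```

The trade-off is on the read side: a read now costs N fetches instead of one, which is why this pattern fits write-heavy counters like event analytics rather than read-heavy data.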
To prove this, we wrote a small test and ran a 60-minute load test via Artillery to simulate users. In the Google Cloud console, we can see what the execution time looks like when updating a single-entity counter for users.
We can also see that in our sharded version, performance is drastically better:
Notice that the execution time drops considerably, since there's less contention on the entities themselves.