Analyzing TimeHop’s use of DynamoDB

Asif Ali
Apr 3, 2015 · 5 min read


You surely know that Amazon's cloud services have become the object of fascination for this generation of CTOs and DevOps engineers in SF when you can hear people talking about them on the sidewalk, in restaurants, and in most public bathrooms.

So let's take a step back for a moment and analyze the phenomenon that is Amazon Cloud Services. It is already one of the fastest growing software businesses in history, at $5B+ (estimated) in revenue.

Amazon's revenue ramp-up, from a Bloomberg Businessweek article

DynamoDB and Redshift in particular have suddenly become prominent, with widespread adoption.

The most logical argument is that startups don't have to manage infrastructure and can focus on the core functionality of their app. This is a valid argument. The other benefit is the ability to scale up when necessary and scale back when not.

I buy these arguments. However, I am not convinced that Amazon Cloud Services represents value to startups in the long term. It can be extremely expensive, non-performant (for certain kinds of apps), and can create a vendor lock-in problem that leaves you stuck with Amazon for life without an easy way out.

I am also afraid that Amazon's cloud is making software engineers less competent, with people deciding based on what they are comfortable with rather than what is the best solution for the problem.

Plenty of VC money means no one cares how much they are burning until, of course, they run out of money and realize they were spending $500k a month on Amazon.

Many startups talk about how they stored and processed large amounts of data, but no one talks about the cost of storing and computing on that data. It apparently isn't important anymore because "hardware is cheap." But it is not: Amazon can be very, very expensive as startups start to scale.

To prove the point, I thought I'd analyze Timehop's blog post about their DynamoDB implementation and the costs associated with their adoption of the tech.

One Year of DynamoDB @ Timehop

Kevin Cantwell, lead architect at Timehop, writes at https://medium.com/building-timehop/one-year-of-dynamodb-at-timehop-f761d9fe5fa1:

2,675,812,470

That’s the number of rows in our largest database table. Or it was, one year ago today. Since then, it’s grown to over 60 billion rows. On average, that’s roughly 160mm inserts per day.

When I think about it, that seems like quite a lot. But because the database we use is Amazon’s DynamoDB, I rarely have to think about it at all. In fact, when Timehop’s growth spiked from 5,000 users/day to more than 55,000 users/day in a span of eight weeks, it was DynamoDB that saved our butts.

A couple of points

TimeHop's use case and the decision to use DynamoDB are not clearly explained.

Kevin doesn't clearly state the reasons for choosing DynamoDB or what other options he considered when making the switch. MongoDB wasn't apt for their use case, though that is fine if it was their first version or a proof of concept. DynamoDB, it seems, was chosen because he read somewhere that it was good. When you choose a particular technology, it is important to analyze its fit for your use case.

DynamoDB's cost and throttling mechanisms are not well suited for write-heavy workloads, yet TimeHop's primary use case seems to be a fairly large volume of writes (when users sign up), peaking around 275m writes per day at 55k signups per day. Even at 20k concurrent writes, DynamoDB is expensive when compared to a custom implementation built around an in-memory database.

Just for DynamoDB and the pre-cache Redis infrastructure (not including other servers), my estimate is that they are spending over $10k a month. Add the rest of the infrastructure (Redis cluster, UI, image-serving machines, CDN, etc.) and that number is closer to $20k-$30k.
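Here's the back-of-envelope math behind that estimate, assuming the circa-2015 US East price of roughly $0.0065 per hour per 10 provisioned write capacity units; the 20,000-unit provisioning level is my assumption, not a number from Timehop's post.

```python
# Rough monthly cost of DynamoDB provisioned write capacity.
# Assumes ~2015 US East pricing (~$0.0065/hr per 10 write capacity units);
# 20,000 units is my guess at what ~20k writes/sec of ~1KB items requires.
HOURS_PER_MONTH = 730
WRITE_PRICE_PER_10_UNITS_PER_HOUR = 0.0065

def monthly_write_cost(write_capacity_units):
    return write_capacity_units / 10.0 * WRITE_PRICE_PER_10_UNITS_PER_HOUR * HOURS_PER_MONTH

print(monthly_write_cost(20000))  # ~9,490 USD/month, before Redis and other servers
```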

Scaling out DynamoDB for writes can be very expensive.

There is no information on how many users might sign up concurrently. For 1k concurrent signups, 5-6m items would need to be written into Redis and then subsequently batch written into DynamoDB. Since we don't know how Timehop accesses the data (in parts or in full), these assumptions may be wrong. But 5-6m items means running a larger Redis cluster to support that many writes. We also need enough reader/batch-writer processes feeding DynamoDB, and their compute instances, which I have not included in these estimates.
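To make the pattern concrete, here is a rough sketch of that buffer-then-batch-write flow using redis-py and boto3; the queue key, table name, and item shape are all hypothetical.

```python
# Sketch of the Redis-buffer -> DynamoDB batch-write flow described above.
# The "pending_writes" list and "timehop_items" table are hypothetical names.
import json
import boto3
import redis

r = redis.Redis()
table = boto3.resource("dynamodb").Table("timehop_items")

def buffer_signup_items(user_id, items):
    """On signup, push the user's items onto a Redis list instead of
    hitting DynamoDB directly."""
    pipe = r.pipeline()
    for item in items:
        pipe.lpush("pending_writes", json.dumps(dict(item, user_id=user_id)))
    pipe.execute()

def drain_to_dynamo(max_items=25):
    """Background worker: pop buffered items and batch-write them to DynamoDB.
    batch_writer handles the 25-item batch limit and retries for us."""
    with table.batch_writer() as writer:
        for _ in range(max_items):
            raw = r.rpop("pending_writes")
            if raw is None:
                break
            writer.put_item(Item=json.loads(raw))
```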

Scaling out DynamoDB for reads can be very expensive too.

At 200k concurrent reads (which is not much at all for a high-traffic consumer application), the cost would be $18,000 per month for DynamoDB alone. 200k reads are possible with just a few servers in a custom implementation using any in-memory database. And considering the 60 billion items (roughly 60TB of data at 1KB per item), I would think all of the read/write infrastructure could be spread across 10-20 large nodes that share reads, writes, compute, and storage. That accounts for about 180TB of data on SSDs with 3x replication. 10-20 nodes is not much, considering such an infrastructure can easily be scaled to provide 10X more than the 200k concurrent reads that DynamoDB would provide at $18k.
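The rough math again (the per-hour read price is my recollection of circa-2015 US East pricing, and the 1KB item size and 15-node cluster are my own assumptions):

```python
# Read-capacity cost at ~2015 US East pricing (~$0.0065/hr per 50 read units),
# plus the storage arithmetic behind the 10-20 node estimate.
HOURS_PER_MONTH = 730
READ_PRICE_PER_50_UNITS_PER_HOUR = 0.0065

def monthly_read_cost(read_capacity_units):
    return read_capacity_units / 50.0 * READ_PRICE_PER_50_UNITS_PER_HOUR * HOURS_PER_MONTH

print(monthly_read_cost(200000))   # ~19,000 USD/month, in line with the $18k figure

items = 60 * 10**9                 # 60 billion rows
raw_tb = items * 1024 / 1e12       # ~61 TB at ~1KB per item
replicated_tb = raw_tb * 3         # ~184 TB with 3x replication
print(replicated_tb / 15)          # ~12 TB of SSD per node on a 15-node cluster
```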

TimeHop's architecture seems to have been designed around DynamoDB rather than around their use case.

The use of Redis (which is itself a key-value store, like DynamoDB) was forced onto TimeHop by the limitations DynamoDB imposes on them. Those limitations are not technical, but merely related to cost.

TimeHop has two primary use cases:

  1. Highly concurrent writes when items come in on user signups.
  2. Highly concurrent reads when items are read back to users.

For use case (1), a message queuing/streaming layer plus async writes into an in-memory database would be ideal. Bursts of traffic can be absorbed by writing messages into a resilient queue like Kafka. Messages can then be asynchronously processed and written into any in-memory cluster or database for fast lookups.
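A minimal sketch of that alternative, using kafka-python with Redis standing in as the in-memory store (the topic name, consumer group, and key layout are all hypothetical choices on my part):

```python
# Queue-then-async-write pattern: signups append to Kafka, and a worker drains
# the topic into an in-memory store at whatever rate the store can sustain.
import json
import redis
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(value_serializer=lambda v: json.dumps(v).encode("utf-8"))

def on_signup(user_id, items):
    """Absorb the signup burst: appending to Kafka is cheap and returns quickly."""
    for item in items:
        producer.send("signup-items", {"user_id": user_id, "item": item})

def writer_loop():
    """Async worker: read from the topic and write into Redis for fast lookups."""
    store = redis.Redis()
    consumer = KafkaConsumer("signup-items", group_id="item-writers",
                             value_deserializer=lambda v: json.loads(v.decode("utf-8")))
    for msg in consumer:
        record = msg.value
        store.rpush("items:%s" % record["user_id"], json.dumps(record["item"]))
```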

For the persisted data writes in (1) and for use case (2), RocksDB (http://rocksdb.org/) on SSDs might just be perfect.
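Here's roughly what that could look like with the python-rocksdb bindings (the user_id:date key layout and the SSD mount path are assumptions on my part):

```python
# RocksDB as an SSD-backed store: batched writes for the signup burst,
# point lookups for serving a user's daily items.
import rocksdb

opts = rocksdb.Options(create_if_missing=True)
db = rocksdb.DB("/mnt/ssd/timehop-items.db", opts)   # hypothetical SSD-backed path

# Use case (1): batch the writes that arrive when a user signs up
batch = rocksdb.WriteBatch()
batch.put(b"42:2014-04-03", b'{"type": "photo", "url": "..."}')
batch.put(b"42:2013-04-03", b'{"type": "status", "text": "..."}')
db.write(batch)

# Use case (2): fast point lookups when items are read back
print(db.get(b"42:2014-04-03"))
```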

Conclusion: By using DynamoDB, TimeHop has added needless cost, complexity, and a bottleneck should they need to scale out. It would be a lot cheaper to implement a shared-nothing, fault-tolerant, fast-lookup system that uses SSDs for storage and lookups, and get the same level of functionality with a lot more throughput at a fraction of the cost.
