Why Realtime uses Amazon DynamoDB

João Parreira
6 min read · Nov 29, 2014


From a static web to a live web

At Realtime we believe the static web is a thing of the past, and we want to contribute to the necessary change. We decided to create a set of easy-to-use cloud services to enable that change, from a static web to a live web: a web where interaction and engagement with users happens in real time, as in life.

Messaging as the basic building block

We knew that message-passing would be the underlying building block of this new live web, so we started there. Leveraging the new WebSocket standard while keeping compatibility with older browsers, we built a global cloud message broker with SDKs for desktop and mobile, giving developers a layer of pub/sub messaging magic.

A message published by a user in Singapore can trigger, with a latency of only a few milliseconds, a reaction for another user in New York using the same application. And vice versa. The goal was achieved: live, engaging web applications can now be easily developed with the Realtime Messaging Framework. And because it runs in the cloud, start-ups and big corporations just register, use it and pay as they go, never needing to buy or manage any messaging servers whatsoever. The revolution of the web is underway.
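The pub/sub pattern behind this can be sketched with a minimal in-memory broker. This is a toy stand-in to show the flow, not the Realtime SDK (whose actual channel and method names are not shown here):

```javascript
// Minimal in-memory pub/sub broker: a toy sketch of the pattern the
// Realtime cloud broker provides globally (this is NOT the Realtime SDK).
function createBroker() {
  var channels = {}; // channel name -> list of subscriber callbacks
  return {
    subscribe: function (channel, callback) {
      (channels[channel] = channels[channel] || []).push(callback);
    },
    publish: function (channel, message) {
      (channels[channel] || []).forEach(function (cb) { cb(message); });
    }
  };
}

// A subscriber in "New York" reacts to a message published from "Singapore":
var broker = createBroker();
broker.subscribe("chat:room1", function (msg) {
  console.log("received:", msg.text);
});
broker.publish("chat:room1", { from: "singapore-user", text: "hello!" });
// → received: hello!
```

In the real service the broker is a globally distributed cluster and delivery crosses the network over WebSockets, but the subscribe/publish contract a developer sees is this simple.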

The live web needs a live database

Over the last two years we have watched developers find creative, innovative ways of using our cloud message broker to power their real-time applications. From interactive digital marketing platforms to real-time advertising networks to engaging multi-player games, our cloud message broker has powered them all.

But one thing was becoming evident: in most cases those applications would save data in some database for persistence and then broadcast the same data through the message broker so that interested subscribers could update their user interfaces in real time. These applications could benefit greatly from a database with real-time triggers.

A database where applications could subscribe to data-change events and receive messages with the changed data as soon as the change occurred.

This new paradigm, in which updated data finds its way to the users interested in the change, would improve the way collaborative real-time applications are developed. In a nutshell, it would be the holy grail of collaborative applications: simple real-time data sync among users.

We had the message-passing building block covered, but we still needed a scalable database, one able to handle the hundreds of thousands of operations per second our message broker was handling. The database we needed should not only be scalable but elastically scalable, scaling in and out according to the different workloads it would be subjected to. Naturally, it should also be fast, predictable, easy to manage, fault-tolerant, secure and really cost-effective.

Enter Amazon’s DynamoDB

We pitched the real-time database sync to one of our largest clients, a global Ad Exchange Network, to validate the opportunity, and when you hear a client say "That would make our day, any day!" you know you're on to something. Time was of the essence and we didn't have much of it, so we added a new requirement to our list: it must be a ready-to-use cloud service with a nice SDK so we could wrap our real-time features around it.

We were inclined to choose Cassandra, mainly because it had been in production in many demanding environments for years and had the NoSQL data model we had been in love with since we started using Redis in our messaging backend.

However, our time to market was short and we needed a managed, hosted service that would reduce the need for infrastructure operations, one that would let us focus mainly on the real-time data-sync features. We didn't want to manage Cassandra clusters, and we certainly didn't want to manage the required servers.

At some point in the brainstorming, someone mentioned Amazon's DynamoDB. Its feature list closely resembled our requirements list, and it was a cloud service, a managed cloud service. On top of that, it used SSDs, so it would probably be as fast and predictable as our friend Redis.

We decided to take DynamoDB for a spin and that proved to be a very good decision.

The API was simple and well documented, and we came to the conclusion that DynamoDB gave us pretty much the same features we liked in Cassandra: a key + columns data model, composite key support, distributed counters and so on.
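Distributed counters, for instance, map onto DynamoDB's atomic ADD action in UpdateItem. Here's a sketch of the request parameters (the table and attribute names are made up for illustration) that you would pass to the AWS SDK's updateItem call:

```javascript
// Request parameters for an atomic counter increment using DynamoDB's
// UpdateItem ADD action (table and attribute names are illustrative).
// With the AWS SDK for JavaScript you would pass this object to
// dynamodb.updateItem(params, callback).
function incrementCounterParams(tableName, itemId, delta) {
  return {
    TableName: tableName,
    Key: { id: { S: itemId } },
    // ADD is atomic on the server: concurrent increments never lose updates.
    UpdateExpression: "ADD hits :inc",
    ExpressionAttributeValues: { ":inc": { N: String(delta) } },
    ReturnValues: "UPDATED_NEW" // ask for the new counter value in the reply
  };
}

console.log(incrementCounterParams("page-views", "home", 1).UpdateExpression);
// → ADD hits :inc
```

Because the increment happens server-side, many clients can bump the same counter concurrently without a read-modify-write race.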

We would face a few limitations, though: the largest supported value is substantially smaller (400KB in DynamoDB versus 2GB in Cassandra) and the deployment options are more restrictive (DynamoDB can only run inside AWS). But we could engineer around those limitations, at least for the cloud service we wanted to provide.

Our major concern was the scalability of the service. We were afraid we might run into problems when scale-ins occurred. We had heard of issues some companies with highly seasonal applications were having with Cassandra clusters when they needed to decommission nodes: slow, manual and error-prone operations. We couldn't afford that type of problem, so we were afraid DynamoDB might not be up to the task in this domain. But you know what? It was.

The provisioned-capacity pricing model might not be the easiest to explain to the CEO, but technically it's almost perfect for our current workloads (heavy during the day and evening, lighter late at night): increase capacity at peak and decrease it off-peak. DynamoDB was flawless in all the crazy scale-out and scale-in tests we threw at it. No data loss requiring restores from backups. And even more important, the application suffered absolutely no downtime during capacity transitions.

This was a major win: not only would we be able to provide our clients with awesome real-time data sync, we would also give them the means to adjust their database capacity to demand without any downtime, all through a simple API call. Perfect in these times when budgets are shrinking and you have to do more with less.
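That simple API call is DynamoDB's UpdateTable operation. A sketch of the request parameters (table name and numbers are illustrative) that you would hand to the AWS SDK:

```javascript
// Request parameters for DynamoDB's UpdateTable operation, used to raise
// capacity at peak and lower it off-peak (the table name and the numbers
// are illustrative). With the AWS SDK for JavaScript you would pass this
// to dynamodb.updateTable(params, callback); the table stays online while
// the capacity transition happens.
function setCapacityParams(tableName, readUnits, writeUnits) {
  return {
    TableName: tableName,
    ProvisionedThroughput: {
      ReadCapacityUnits: readUnits,
      WriteCapacityUnits: writeUnits
    }
  };
}

var peak = setCapacityParams("messages", 2000, 1000);  // daytime workload
var offPeak = setCapacityParams("messages", 200, 100); // late-night workload
console.log(peak.ProvisionedThroughput.ReadCapacityUnits); // → 2000
```

A cron job or a small scheduler calling this twice a day is all it takes to track a predictable daily load curve.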

It can’t be perfect

“Perfect is the enemy of good,” so they say, and I find that to be very much true. When we started using DynamoDB, the indexing options were limited to the hash and range attributes. During the development of Realtime Cloud Storage, AWS launched Local Secondary Indexes, with support for up to five per table, enabling more efficient retrieval of data. Nice!
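A local secondary index shares the table's hash key but offers an alternate range key. Here's a sketch of CreateTable request parameters declaring one (all names are illustrative), which would let us fetch a room's messages by author instead of by timestamp:

```javascript
// Sketch of CreateTable request parameters declaring a local secondary
// index (table, attribute and index names are illustrative). With the
// AWS SDK for JavaScript you would pass this to dynamodb.createTable.
var createTableParams = {
  TableName: "messages",
  AttributeDefinitions: [
    { AttributeName: "roomId", AttributeType: "S" },
    { AttributeName: "sentAt", AttributeType: "N" },
    { AttributeName: "author", AttributeType: "S" }
  ],
  KeySchema: [
    { AttributeName: "roomId", KeyType: "HASH" },
    { AttributeName: "sentAt", KeyType: "RANGE" }
  ],
  LocalSecondaryIndexes: [{
    IndexName: "byAuthor",
    KeySchema: [
      { AttributeName: "roomId", KeyType: "HASH" },  // same hash key as the table
      { AttributeName: "author", KeyType: "RANGE" }  // alternate range key
    ],
    Projection: { ProjectionType: "KEYS_ONLY" } // index stores keys only
  }],
  ProvisionedThroughput: { ReadCapacityUnits: 10, WriteCapacityUnits: 5 }
};

console.log(createTableParams.LocalSecondaryIndexes[0].IndexName); // → byAuthor
```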

A few things are still missing, though. At the time of this writing there's no built-in way to back up a table periodically to S3 (you have to code it yourself and use some sort of cron), but something tells me AWS is working hard to fix that. And that's another thing we love about AWS: like us, they constantly improve their services, and that makes them a great partner.
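The do-it-yourself backup amounts to paging through the table with Scan (following LastEvaluatedKey) and writing each page out, e.g. as JSON to S3. In this sketch the DynamoDB client and the page writer are injected so the snippet runs on its own; a real cron job would pass the AWS SDK's DynamoDB client and an S3-backed writePage:

```javascript
// Sketch of a scan-based table backup. Pages through a DynamoDB table by
// following LastEvaluatedKey and hands each page of items to writePage
// (e.g. an s3.putObject call with a JSON body). The client is injected so
// this sketch runs without AWS credentials.
function backupTable(client, tableName, writePage, done) {
  var pages = 0;
  function scanFrom(startKey) {
    var params = { TableName: tableName };
    if (startKey) params.ExclusiveStartKey = startKey; // resume where we left off
    client.scan(params, function (err, data) {
      if (err) return done(err);
      writePage(pages++, data.Items);                  // persist this page
      if (data.LastEvaluatedKey) scanFrom(data.LastEvaluatedKey); // more pages
      else done(null, pages);                          // scan exhausted
    });
  }
  scanFrom(null);
}

// Fake client returning two pages, standing in for the real DynamoDB client:
var fakePages = [
  { Items: [{ id: 1 }, { id: 2 }], LastEvaluatedKey: { id: 2 } },
  { Items: [{ id: 3 }] }
];
var fakeClient = { scan: function (params, cb) { cb(null, fakePages.shift()); } };

backupTable(fakeClient, "messages", function (page, items) {
  console.log("page", page, "->", items.length, "items");
}, function (err, total) {
  console.log("backed up", total, "pages"); // → backed up 2 pages
});
```

Scheduling this from cron, and throttling the scan so it doesn't eat the table's read capacity, is the part you still have to own yourself.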

You’ll never reach perfection but you can get closer and closer.

The final result

At the end of the day, by letting our engineering team focus on the real-time data-sync features, DynamoDB has proven to be a spot-on choice, enabling our clients to write elegant, easy-to-read code like this (a JavaScript client reacting to an item update performed by another user):
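A sketch of what that client code might look like: the table-reference shape and the on("update", …) subscription are assumptions rather than the exact Realtime Cloud Storage SDK names, and a tiny in-memory stand-in is included so the flow is visible end to end:

```javascript
// Hypothetical sketch of a client reacting to another user's update. The
// exact Realtime Cloud Storage SDK names are assumptions; the idea is that
// a table reference pushes "update" events to subscribers in real time.
function watchLeaderboard(tableRef, render) {
  tableRef.on("update", function (item) {
    render(item); // refresh the UI with the changed item as soon as it arrives
  });
}

// Minimal in-memory stand-in for a table reference, used here only to
// demonstrate the flow without the real SDK:
function fakeTableRef() {
  var handlers = {};
  return {
    on: function (evt, fn) { (handlers[evt] = handlers[evt] || []).push(fn); },
    emit: function (evt, item) { (handlers[evt] || []).forEach(function (fn) { fn(item); }); }
  };
}

var seen = [];
var table = fakeTableRef();
watchLeaderboard(table, function (item) { seen.push(item); });
table.emit("update", { player: "ana", score: 42 }); // another user's write arrives
console.log(seen[0].score); // → 42
```

The point is what's absent: no polling loop, no re-query after writes; the changed item simply shows up in the callback.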

If you’re looking for a managed, hosted NoSQL cloud database that scales, Amazon’s DynamoDB might be the best option for you at the moment. If you also need real-time data-sync features to keep your users synchronized, DynamoDB can’t do much for you there, but Realtime Cloud Storage can. You’ll get all the benefits of DynamoDB along with super-sleek real-time database triggers that will make your app almost write itself.

But don’t take my word for it: go to http://framework.realtime.co/storage, get your free license and take Realtime Cloud Storage for a spin.

Let me know your findings.

--

João Parreira

Software Development Manager at Amazon. Husband, father, Christian, passionate about highly distributed systems and Fender Stratocasters.