How to work with AWS DynamoDB

Jun 17, 2016

Background

A few months ago at Maple Inside, we agreed to take over an existing project in order to improve and finish it. The backend API was written in Python with PostgreSQL as the database, and the frontend was built on top of the PHP framework Laravel.

As the code was a real mess with a lot of security holes, we found it easier to do a complete rewrite using our own stack (NodeJS for the API and React for the application).

As the data was only weakly relational, and because I always want to try new stuff, I decided to use DynamoDB as the main database.

What is DynamoDB

According to the doc, Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.

DynamoDB is similar to Cassandra or Redis: it’s a key-value database which stores JSON-like objects. It’s pretty different from MongoDB because the query language is a lot simpler. You can’t aggregate data while querying, so there is no SUM, AVG...

Here is a brief overview of the main DynamoDB terms.

Hash key & Range key

The hash key (also called partition key) is mandatory. Each object has a hash key; it’s the primary key of your table, and when the table has no range key it must be unique. There is no auto-increment feature, so you can use whatever you want as a key as long as it is unique.

The range key (also called sort key) is optional. If you choose to use it, the combination hash + range key must be unique, so in that case several objects can share the same hash key. This key allows you to sort your objects on it.
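
To make the uniqueness rule concrete, here is a toy in-memory model of a table with both keys. It is only a sketch and not how DynamoDB is implemented; the attribute names `userId` and `createdAt` are made up for the illustration.

```python
from typing import Any

# Toy model: items are stored under their (hash key, range key) pair.
table: dict[tuple[str, int], dict[str, Any]] = {}

def put_item(hash_key: str, range_key: int, attributes: dict[str, Any]) -> None:
    """Store an item; an item with the same (hash, range) pair is replaced."""
    table[(hash_key, range_key)] = {
        "userId": hash_key,      # hypothetical hash key attribute
        "createdAt": range_key,  # hypothetical range key attribute
        **attributes,
    }

def query(hash_key: str) -> list[dict[str, Any]]:
    """Return every item sharing the hash key, ordered by range key."""
    matches = [item for (h, _), item in table.items() if h == hash_key]
    return sorted(matches, key=lambda item: item["createdAt"])

put_item("user-1", 200, {"text": "second"})
put_item("user-1", 100, {"text": "first"})
put_item("user-1", 100, {"text": "first, edited"})  # same pair: overwrites
put_item("user-2", 150, {"text": "other user"})
```

Two items share the hash key `user-1` because their range keys differ, while the third write replaces the item that has the same pair.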

Global Secondary Index

You can create 2 types of index to query data efficiently. Think of an index as an automatic table that is kept in sync with your table. DynamoDB stores objects on several servers, but all objects with the same hash key are always on the same server of the cluster.

A global secondary index (GSI) is an index across all servers.

Ex: the user table contains an email attribute which is used for the login action, so we need to create an index on that attribute. As we need all emails across all servers, we create a GSI. Now when a user logs into the website, we look for their email in the index instead of the user table.
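
As a rough sketch, the login lookup could be expressed with the parameters of the DynamoDB Query operation. The table name `users` and the index name `email-index` are assumptions for the example, not the names used in the project.

```python
def login_query_params(email: str) -> dict:
    """Build Query parameters that look a user up through a GSI on email."""
    return {
        "TableName": "users",        # hypothetical table name
        "IndexName": "email-index",  # hypothetical GSI name
        "KeyConditionExpression": "email = :email",
        # Low-level API format: the value is tagged with its type (S = string).
        "ExpressionAttributeValues": {":email": {"S": email}},
    }

params = login_query_params("jane@example.com")
```

With boto3, for instance, these parameters could be passed as `boto3.client("dynamodb").query(**params)`; the other SDKs accept the same parameter names.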

Local Secondary Index

The local secondary index (LSI) is an index restricted to a single server: it keeps the table’s hash key and uses another attribute as range key. I haven’t used an LSI yet on that project, but according to the doc, an example is: suppose you have a forumName (hash key) which contains topics. You can add an LSI on the topic attribute. It will allow you to sort by topic inside a specific forum.
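
Here is what a table definition with an LSI might look like, written as CreateTable parameters. Everything except `forumName` is an assumption made for the sketch: the table name, the `postedAt` range key, the `topic` attribute and the index name.

```python
create_table_params = {
    "TableName": "forum-threads",  # hypothetical table name
    "AttributeDefinitions": [
        {"AttributeName": "forumName", "AttributeType": "S"},
        {"AttributeName": "postedAt", "AttributeType": "N"},  # assumed range key
        {"AttributeName": "topic", "AttributeType": "S"},     # assumed attribute
    ],
    "KeySchema": [
        {"AttributeName": "forumName", "KeyType": "HASH"},
        {"AttributeName": "postedAt", "KeyType": "RANGE"},
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "topic-index",  # hypothetical index name
            # Same hash key as the table, different range key.
            "KeySchema": [
                {"AttributeName": "forumName", "KeyType": "HASH"},
                {"AttributeName": "topic", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 1, "WriteCapacityUnits": 1},
}

def hash_key_of(key_schema: list[dict]) -> str:
    """Return the name of the HASH attribute of a key schema."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")
```

Note that an LSI can only be declared when the table is created, on a table that has both a hash key and a range key.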

The pros

API

DynamoDB is a web service accessible via an API, and the documentation is clear enough to use it easily.

Easy, Fast & Cheap

It’s really easy to start with: just create a table via the AWS console and you’re done, you can start adding data into it. I was surprised by the speed of the queries. Even with hundreds of thousands of objects, a query takes less than 100 ms including network latency.

The pricing depends on the provisioned throughput you set for read and write capacity. I will not detail here how it works, but you can refer to the doc.

NoSQL

As a NoSQL database, DynamoDB is schemaless, so you can put whatever you want in it. Apart from the key attributes, there are no constraints on the data you want to store. Generally you will use an ORM to access your data, which will force you to follow a certain schema, but that’s not mandatory.

Events

You can enable streams to trigger actions according to the events you receive. For each deletion, for example, you can send the event to a Lambda function. That’s also the solution to replicate your tables.
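
A minimal sketch of such a function, assuming it receives the standard DynamoDB Streams event, where each record carries an `eventName` of `INSERT`, `MODIFY` or `REMOVE`. What to do with the deleted keys is left out; the handler below only collects them.

```python
def handler(event: dict, context=None) -> list[dict]:
    """Collect the keys of the items deleted in a DynamoDB Streams batch."""
    deleted_keys = []
    for record in event.get("Records", []):
        # Deletions show up in the stream as REMOVE events.
        if record.get("eventName") == "REMOVE":
            deleted_keys.append(record["dynamodb"]["Keys"])
    return deleted_keys

# Hand-written sample event for the illustration.
sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "1"}}}},
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "2"}}}},
    ]
}
```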

DynamoDB local & dynalite (github)

Of course, for development you will need to have Dynamo running locally. I personally work with Docker, so I tested 2 different images of Dynamo. The 1st image is built from the official DynamoDB Local (docker), but it has some small differences with the real service, so I found dynalite (docker). It was a bit better, but I still saw some different behaviors, so I ended up working directly on DynamoDB and accepting the latency.

The cons and how to bypass them

Of course, after using it for a while we also ran into some issues.

No database

It seems incredible, but there is no database with DynamoDB; you only declare tables under your account. It means that your table names must be unique per region. It is also harder to fully separate your different environments. We had 2 solutions for having a staging and a prod env. First, we could use 2 different regions (us-west-1 and us-east-1). Secondly, and that’s what we did, we could use the environment as a prefix for table names.
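
The prefix approach fits in a small helper. This is a sketch under assumptions: the `APP_ENV` environment variable, the `staging` default and the `-` separator are invented for the example.

```python
import os
from typing import Optional

def table_name(base_name: str, environment: Optional[str] = None) -> str:
    """Prefix a table name with the environment, e.g. prod-users."""
    # APP_ENV is a hypothetical variable; default to staging when it is unset.
    env = environment or os.environ.get("APP_ENV", "staging")
    return f"{env}-{base_name}"
```

Every call to the database then goes through `table_name("users")` instead of a hard-coded name, so the same code runs against either environment.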

Backup

When AWS says it’s fully managed… I would say, almost. There is no backup included natively. However, you can easily set one up with Data Pipeline to back up your data to S3.

Case sensitive

This behavior makes search very hard. Unlike with MongoDB, you can’t search using a regexp (which is not efficient anyway). We have 2 solutions for this.

First, we use a normalized attribute which contains the lowercased and deburred (lodash) version of the attribute we offer a search feature on.

Secondly, for more powerful searches, we are using Elasticsearch.
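
The normalization can be approximated as follows. This is a Python sketch of the idea, not the lodash implementation: it strips combining accents after Unicode decomposition, so it will not convert every character that `deburr` handles (ligatures, for example).

```python
import unicodedata

def normalize_for_search(value: str) -> str:
    """Lowercase a string and strip its accents for case-insensitive search."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.lower()
```

The result is stored next to the original value when the item is written, and the search term goes through the same function before querying.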

Pagination

There is no limit/offset pair as we know it from MySQL or MongoDB: you cannot jump straight to an arbitrary page. DynamoDB sends you a response of 1 MB max, along with the last key it evaluated when the result is truncated. It means you have to loop until the result set isn’t truncated. The problem is that you can’t get the total number of objects without running across your whole table. No total, no page numbers. However, you can change your pagination to next and previous buttons only.

You can find such a pagination on reddit: https://www.reddit.com/r/nosql/?count=25&after=t3_41d9sd . We see that t3_41d9sd refers to the last item on the page.

The result set also contains a count value which tells how many items have been found.

Our solution… we are still working on it. For now we are counting on our side, but we know that’s not efficient.
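
The loop can be sketched like this. `LastEvaluatedKey` and `Count` are the field names of the DynamoDB response; `fetch_page` is a hypothetical callable standing in for the real query, which would pass the key back as `ExclusiveStartKey`. The fake implementation below only exists so the sketch runs without a database.

```python
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]

def iterate_pages(fetch_page: Callable[[Optional[dict]], Page]) -> Iterator[Page]:
    """Yield pages until the response no longer has a LastEvaluatedKey."""
    start_key = None
    while True:
        page = fetch_page(start_key)
        yield page
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break

def count_all(fetch_page: Callable[[Optional[dict]], Page]) -> int:
    """Total number of items, obtained by walking through every page."""
    return sum(page["Count"] for page in iterate_pages(fetch_page))

def fake_fetch_page(start_key: Optional[dict]) -> Page:
    """Stand-in for a real query: 5 items served 2 at a time."""
    items = [{"id": n} for n in range(5)]
    offset = 0 if start_key is None else start_key["id"] + 1
    chunk = items[offset:offset + 2]
    page: Page = {"Items": chunk, "Count": len(chunk)}
    if offset + 2 < len(items):
        page["LastEvaluatedKey"] = {"id": chunk[-1]["id"]}
    return page
```

For a next button, the application only needs to keep the last key of the current page, exactly like the `after` parameter in the reddit URL.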

Sorting data

To be able to sort on an attribute, it has to be the range key of the table or of an index. It means it’s pretty hard to have a sort feature on every attribute. You can use the attribute you want as range key or add a local index, but once again, it’s hard to sort on everything.

Our solution: focus on the client’s needs, see how they use the administration of their application, and don’t create features they will never use. Even if it’s cool to sort emails in alphabetical order, you will probably never need it, because searching for the email you want is a lot more efficient.

Pagination and sorting only cause trouble in the administration of the application, where you want to list all your data. But with DynamoDB you have to think in a different way. Always think big: suppose your database is hosted on 50 servers and you have billions of records, would you need to list all that data in a table? Probably not, but you want to be able to find the right data in a few milliseconds. That’s what DynamoDB is designed for.
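
When the range key is in place, the sort direction is chosen with the `ScanIndexForward` parameter of the Query operation. A small sketch, where the table name `forum-threads` and the helper itself are assumptions for the example:

```python
def sorted_query_params(forum_name: str, descending: bool = False) -> dict:
    """Build Query parameters returning a forum's items ordered by range key."""
    return {
        "TableName": "forum-threads",  # hypothetical table name
        "KeyConditionExpression": "forumName = :forum",
        "ExpressionAttributeValues": {":forum": {"S": forum_name}},
        # True (the default) sorts ascending on the range key, False descending.
        "ScanIndexForward": not descending,
    }
```

Adding an `IndexName` entry to the parameters would sort on the range key of that index instead of the table’s own.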

Dynamo Migrate

I didn’t find a tool for running migrations as I usually do with MySQL, so I quickly coded one and called it dynamo-migrate (github).

Final thoughts

Even with some cons, I personally find DynamoDB very efficient and useful. I started using it on personal projects, just because it is very easy to use and I love the schemaless model.

I strongly recommend giving DynamoDB a chance to store and fetch data, but don’t use it if you want to manipulate data.

Like with any NoSQL database, the feature drives the data. With a relational database, it’s very easy to store data even if you don’t know how you will use it later. With NoSQL, you have to know how the data will be used in order to store it in the most efficient way. And if you can’t know how it will be used, then you’d probably better keep a relational database.