Comparing an elephant to a washing machine: Redis and SimpleDB

Kyle
8 min read · Jul 1, 2015

You might have noticed that my writing about Redis has slowed a bit — I usually write about what my projects use. Lately, I haven’t been using Node.js / Redis quite as much, although I almost always find a way to add Redis to my projects.

I’m juggling three projects right now that use AWS SimpleDB at varying levels of intensity. I’ve used SimpleDB for years; it was my first foray into non-relational databases. For the uninitiated, SimpleDB is one of the early parts of Amazon Web Services, introduced in 2007. It’s a distributed, schema-less, managed, cloud datastore. It’s also an odd duck: you query it with an SQL-like language, but you can also access it via row keys (ItemNames, in SimpleDB parlance). So we can’t quite call it NoSQL, but maybe KindaSQL, as the only SQL-style command it supports is SELECT. The odd-duckness theme will persist, so read on.

First things first — I know SimpleDB and Redis don’t serve the same function. Totally different. Like Consumer Reports evaluating an elephant vs a washing machine, this will yield some interesting comparisons. Let’s get started.

HSET until it’s very much not HSET

In SimpleDB, each row has a key (aka ItemName) and any number of attribute/value pairs. Sound familiar? It’s roughly the same as a Redis hash type. But wait, there is more, and it seems unhinged if you’re used to Redis. You can assign a value to an attribute multiple times and store them as a set of values. In fact, if you update an attribute, the default behaviour is to just add a new value. Think of it this way: it’s analogous to a Redis hash that stores sets at each field instead of strings. Here is how it looks in JSON:
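A sketch of that shape, written as a JavaScript object and printed as JSON. The item name, attribute names, and values are made up for illustration:

```javascript
// Hypothetical item: every attribute maps to a set of values, not one string.
const item = {
  "the-item-name": {
    name: ["kyle"],
    language: ["javascript", "lua"],
  },
};

console.log(JSON.stringify(item, null, 2));
```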

It can be quite a shock when you are working with SimpleDB for the first time. You add values to your row by calling putAttributes. Here is an example using the AWS SDK in Node.js:
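Something along these lines, as a minimal sketch rather than a definitive listing. It assumes `client` behaves like `new AWS.SimpleDB()` from the `aws-sdk` (v2) package; the `users` domain and the `putName` helper are hypothetical:

```javascript
// `client` is passed in so the sketch does not depend on credentials or SDK
// setup; with the real SDK it would be something like `new AWS.SimpleDB()`.
function putName(client, itemName, name, callback) {
  const params = {
    DomainName: "users", // hypothetical domain
    ItemName: itemName,
    Attributes: [{ Name: "name", Value: name }],
  };
  client.putAttributes(params, callback);
}

// Usage: putName(client, "the-item-name", "kyle", (err) => { if (err) throw err; });
```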

While I’m not crazy about the AWS SDK’s method naming (damned uppercase-ness) and would much rather run HSET the-item-name name kyle, it’s pretty apparent what is going on here. Now, if we run this function over and over, the data will be roughly this when represented as JSON:
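As a rough in-memory model of that behaviour (not the service itself), each put adds the value to the attribute's set. SimpleDB stores an identical name/value pair only once, so repeating the exact same call changes nothing. The `put` and `toJSON` helpers are hypothetical:

```javascript
// Model an item as a Map from attribute name to a Set of values.
function put(item, name, value) {
  if (!item.has(name)) item.set(name, new Set());
  item.get(name).add(value); // adding a value that is already present is a no-op
  return item;
}

// Convert the model to a plain object so it can be printed as JSON.
function toJSON(item) {
  const out = {};
  for (const [name, values] of item) out[name] = [...values];
  return out;
}

const item = new Map();
put(item, "name", "kyle");
put(item, "name", "kyle"); // the same pair again
console.log(JSON.stringify({ "the-item-name": toJSON(item) }));
// prints {"the-item-name":{"name":["kyle"]}}
```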

So, let’s say later your user wants to change their name to say, KyleD:
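A sketch of the parameters for that update, assuming a hypothetical `users` domain. Note that nothing here asks SimpleDB to replace the old value:

```javascript
// Parameters for putAttributes; no Replace flag is set on the attribute.
const renameParams = {
  DomainName: "users", // hypothetical domain
  ItemName: "the-item-name",
  Attributes: [{ Name: "name", Value: "KyleD" }],
};

// With the AWS SDK this would be sent as client.putAttributes(renameParams, callback).
console.log(renameParams.Attributes[0]);
```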

You might expect that the attribute name would now equal KyleD. You would be wrong. The new data would be:
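Assuming the item previously held only `kyle`, it would now look roughly like this:

```javascript
// Both values are kept; the old one is not overwritten.
const itemAfterRename = {
  "the-item-name": {
    name: ["kyle", "KyleD"],
  },
};

console.log(JSON.stringify(itemAfterRename, null, 2));
```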

If you aren’t expecting this, it can be unpleasant, because it might not come up in a simple testing routine. You can control for this using the Replace flag on your attribute:
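A minimal sketch of the same update with the flag set. `Replace` is the attribute-level parameter that putAttributes accepts; the `users` domain and the `replaceNameParams` helper are hypothetical:

```javascript
// Replace: true asks SimpleDB to overwrite any existing values of this
// attribute instead of adding another value alongside them.
function replaceNameParams(itemName, newName) {
  return {
    DomainName: "users", // hypothetical domain
    ItemName: itemName,
    Attributes: [{ Name: "name", Value: newName, Replace: true }],
  };
}

console.log(replaceNameParams("the-item-name", "KyleD").Attributes[0]);
```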

This will get you back to a more familiar HSET-like paradigm.

Single Thread vs Multiple Machines

If you’re like me, you spend a lot of time optimizing how you arrange and run your requests to Redis: MULTI/EXEC, HGETALL, SADD member member member, carefully crafting Lua to avoid clogging up your Redis server. As you should, as Redis is a performance champ. You do this because when you talk to a Redis server, you are talking to a single thread.

SimpleDB is not a performance champ. It is an order of magnitude or two slower than Redis in getting a single item/key. But it is designed to be highly parallel and since you aren’t running your own infrastructure, you can abuse this part of SimpleDB’s nature to achieve the performance you would need. For example, let’s say you want to get 100 hashes from Redis with keys like item0…item99. In Redis, you might do something like this:
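A minimal sketch, assuming a callback-style client such as the `redis` (node_redis) package of that era. `client` and the `getHashes` helper are assumptions, not the original listing:

```javascript
// Queue one HGETALL per key inside a MULTI block, then send them with EXEC.
function getHashes(client, count, callback) {
  const multi = client.multi();
  for (let i = 0; i < count; i++) {
    multi.hgetall("item" + i);
  }
  // The callback receives the replies as an array, one per queued command.
  multi.exec(callback);
}

// Usage: getHashes(redisClient, 100, (err, replies) => { /* replies[0] is item0 */ });
```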

Using multi and exec, you are pipelining the commands down to what is effectively a single request, and this will likely be the quickest way to get these values, as you connect to a single Redis server that is single-threaded. Now, let’s say you want to do something similar in SimpleDB. SimpleDB is based on a REST API, so you’re making an HTTP request for every single item (well, you can do a query, but for argument’s sake, let’s just use getAttributes):
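A sketch of that fan-out in callback style. It assumes `client` behaves like `new AWS.SimpleDB()` from the `aws-sdk` (v2) package; the `getItems` helper and the domain name passed to it are hypothetical:

```javascript
// Fire one getAttributes call per item without waiting for earlier ones.
function getItems(client, domain, count, callback) {
  if (count === 0) return callback(null, []);
  const results = new Array(count);
  let pending = count;
  let failed = false;
  for (let i = 0; i < count; i++) {
    const params = { DomainName: domain, ItemName: "item" + i };
    client.getAttributes(params, (err, data) => {
      if (failed) return; // an earlier error was already reported
      if (err) { failed = true; return callback(err); }
      results[i] = data; // keep results in request order
      if (--pending === 0) callback(null, results);
    });
  }
}

// Usage: getItems(client, "users", 100, (err, items) => { /* items[0] is item0 */ });
```

All the requests are in flight at once, so the total time is governed by the slowest response rather than the sum of them.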

I’ll assume you’re working with something like Node, where this is all asynchronous and opening 100 HTTP connections is speedy and not an issue. So, let’s think about these two examples. Let’s say Redis takes an average of 5ms per hgetall. Since the single thread works through the commands one after another, your server will take roughly 100 × 5ms = 500ms to return the values. Now, let’s say SimpleDB takes 500ms to return a single item. Since you are requesting all the values at practically the same time, you’ll get your results back for all the items in about 500ms. And since you are communicating with AWS as a whole and not a single server, you’re not running the risk of saturating a server or thread. Two caveats here: it will cost you to do this, and it has a bit of an inverse scaling problem. If you make two requests instead of a hundred, it’ll still take you 500ms, which is roughly the same performance profile as cold molasses.
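The back-of-the-envelope numbers above can be written out as a tiny model. It is deliberately simplistic and uses the illustrative latencies from the text, not measurements:

```javascript
// Commands handled one after another: total time grows with the request count.
function sequentialMs(requests, perRequestMs) {
  return requests * perRequestMs;
}

// Requests issued all at once: total time is roughly one request's latency.
function parallelMs(requests, perRequestMs) {
  return requests > 0 ? perRequestMs : 0;
}

console.log(sequentialMs(100, 5), parallelMs(100, 500)); // 500 500
console.log(sequentialMs(2, 5), parallelMs(2, 500));     // 10 500
```

The second line is the inverse scaling problem: with only two requests, the sequential path is done in about 10ms while the parallel one still waits the full 500ms.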

Dirty word: Consistency

SimpleDB is always distributed among multiple locations in the AWS network. With that, we should talk about consistency: the nodes in a distributed system are not always all up-to-date. So, if Client A updates a value and then immediately reads it, the read may not return the most up-to-date value. This is because you may have written to SimpleDB Server 1 and read from SimpleDB Server 2. Eventually they will be the same, which is termed Eventual Consistency. In Redis, you may be familiar with the concept if you use master/slave or cluster configurations.

So, SimpleDB = eventual consistency. Except when it isn’t. SimpleDB allows for consistent reads, which take longer (and cost more; remember this is pay-as-you-go AWS). From my understanding, it waits for consensus among all the internal SimpleDB replicas before sending you the response. In my experience, the added latency is non-trivial. When you turn off consistent read, no guarantees are made, but the official developer’s guide indicates that everything should be up-to-date about 1 second after writing.
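In the API, this is a per-request switch. A minimal sketch of the getAttributes parameters, where `ConsistentRead` is the real parameter name and the `users` domain and `readParams` helper are hypothetical:

```javascript
// Build getAttributes parameters; ConsistentRead defaults to false in SimpleDB,
// which means an eventually consistent read.
function readParams(itemName, consistent) {
  return {
    DomainName: "users", // hypothetical domain
    ItemName: itemName,
    ConsistentRead: Boolean(consistent),
  };
}

console.log(readParams("the-item-name", true));
// With the AWS SDK: client.getAttributes(readParams("the-item-name", true), callback)
```

One reasonable pattern is to ask for a consistent read only right after a write that the same user needs to see, and leave it off everywhere else.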

Dirty word: SQL

While Redis is firmly in the NoSQL tribe, SimpleDB sits in a strange SQL purgatory. It’s schema-less, has keys (ahem “itemName()”) and data can be accessed without using a query, yet it still has a SQL pidgin for select only. Best of both worlds? Maybe, maybe not.

I don’t claim to be a superstar when it comes to SQL, but I know my way around and I can get things done in several dialects. I moved to primarily using Redis a few years ago after having a moment of thinking “This is just too much complication for what it is accomplishing.” Thankfully, with SimpleDB you’re using mostly the good parts of SQL.

Pros
On the pro side, you only have to deal with one type of query and you can’t access other domains, so no complicated join nonsense. Aside from the itemName(), everything else is bog standard:

SELECT [attributes or star] FROM [table/"Domain"] WHERE [predicate] ORDER BY [...] LIMIT [count]

With a little background in SQL, you should be comfortable in a day or so.
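As a sketch of what running such a query might look like from Node, assuming `client` behaves like `new AWS.SimpleDB()` from the `aws-sdk` (v2) package. The `selectAll` helper and the `users` domain are hypothetical. SimpleDB's LIMIT takes a single count, and larger result sets are paged with the `NextToken` returned by each response:

```javascript
// Run a select and keep following NextToken until every page has been read.
function selectAll(client, expression, callback, nextToken, collected = []) {
  const params = { SelectExpression: expression };
  if (nextToken) params.NextToken = nextToken;
  client.select(params, (err, data) => {
    if (err) return callback(err);
    const items = collected.concat(data.Items || []);
    if (data.NextToken) {
      return selectAll(client, expression, callback, data.NextToken, items);
    }
    callback(null, items);
  });
}

// Usage:
// selectAll(client, "select * from `users` where name = 'kyle' limit 10",
//   (err, items) => { /* each item has a Name and an Attributes array */ });
```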

One other pro that you might not have thought about — comfort. If you’re doing client work, you may have had clients look at you funny when you mention Redis — it isn’t something that everyone knows about yet and there is a whole swath of professionals that will look at anything but SQL with suspicion. Clients are generally comforted when you say you’re working with a managed service that uses SQL as opposed to an in-memory key-value store. I’m not saying this comfort is logical, but that people are creatures of habit and are resistant to change (even if it is positive change).

Cons
SQL brings back a feeling deep in my stomach: injection anxiety. As you probably already know, Redis has the most beautiful line ever written in a security document:

The Redis protocol has no concept of string escaping, so injection is impossible under normal circumstances using a normal client library.

With SimpleDB, since you are still concatenating strings to make queries, you’re still in danger of letting a baddy have fun getting a peek at your tables. If you aren’t careful, someone can place a quote in the right place and get more than you wished, but generally careful coding and a little regex will make your code safe(r). Now, the danger is more limited than in something like MySQL, because there are no commands like DROP, INSERT, GRANT, etc. in SimpleDB.
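A minimal sketch of the "little regex" idea. In SimpleDB's select syntax, a quote inside a quoted value is escaped by doubling it, and the same goes for a backtick inside a backtick-quoted name. The helper names are hypothetical, and this is a starting point rather than a complete defence:

```javascript
// Wrap a value in single quotes, doubling any single quotes inside it.
function quoteValue(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Wrap a domain or attribute name in backticks, doubling any backticks inside it.
function quoteName(name) {
  return "`" + String(name).replace(/`/g, "``") + "`";
}

function selectByName(domain, name) {
  return "select * from " + quoteName(domain) + " where name = " + quoteValue(name);
}

console.log(selectByName("users", "x' or '1'='1"));
// prints: select * from `users` where name = 'x'' or ''1''=''1'
```

The hostile input stays inside one quoted string instead of ending the value early and adding its own predicate.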

SQL has an effect of masking terrible query behaviors where the Redis protocol seems to draw attention to bad behaviors. A simple query can yield really terrible performance and it might be difficult to know exactly why it is a pig. Redis is more transparent (in my mind) as you’re often rolling your own way to get to the data you want.

So, why SimpleDB?

While I can’t support it with a quote, I get the feeling that SimpleDB was built as an internal tool for Amazon. The other day I was shopping for a tea kettle and I started sorting and filtering items, and I could completely understand how easy that type of query would be in SimpleDB. It has some really elegant and simple (har har) ways to accomplish normally very complex tasks that would be difficult with Redis or a more traditional SQL database. Something like adding multiple tags to a product based on arbitrary aspects of the product’s design is trivial with SimpleDB. It would be messy with Redis and a nightmare in a schema-based DB.
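To make the tagging case concrete, here is a sketch that stores tags as multiple values of one attribute and filters on several of them. The `products` domain, the `tag` attribute, and the helper names are all hypothetical. It uses SimpleDB's `intersection` operator, because a plain `and` of two equality tests on the same attribute would need a single value to equal both:

```javascript
// Wrap a value in single quotes, doubling any single quotes inside it.
function quoteValue(value) {
  return "'" + String(value).replace(/'/g, "''") + "'";
}

// Tagging a product is one multi-valued attribute: one entry per tag.
function tagAttributes(tags) {
  return tags.map((tag) => ({ Name: "tag", Value: tag }));
}

// Find items that carry every one of the given tags.
function selectByTags(domain, tags) {
  if (tags.length === 0) throw new Error("at least one tag is required");
  const predicates = tags.map((tag) => "tag = " + quoteValue(tag));
  return "select * from `" + domain + "` where " + predicates.join(" intersection ");
}

console.log(tagAttributes(["kettle", "stainless"]));
console.log(selectByTags("products", ["kettle", "stainless"]));
// prints: select * from `products` where tag = 'kettle' intersection tag = 'stainless'
```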

Another thing about it is that it is cheap and managed. AWS is pay-as-you-go and they offer generous free tiers. This bothers some people because it feels like a bait-and-switch: they get you dependent on a free service, then start charging money once you get used to it. I don’t worry about that with SimpleDB as it is so cheap. The free tier is 25 “machine hours” of activity, and many simple operations are in the millionths of machine hours. After that, each machine hour is $0.14. For me, even production scale apps cost less than flushing the office toilet once per day. The other plus is that it is managed: just create and go. No backups, no upgrades, no monitoring, no worries.
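The pricing above works out as simple arithmetic. This sketch uses only the two figures quoted in the text (25 free machine hours, then $0.14 per hour), ignores any other charges, and assumes the usage figure covers one billing period:

```javascript
const FREE_MACHINE_HOURS = 25;
const DOLLARS_PER_MACHINE_HOUR = 0.14;

// Only the hours beyond the free allowance are billed.
function billedDollars(machineHours) {
  const billable = Math.max(0, machineHours - FREE_MACHINE_HOURS);
  return billable * DOLLARS_PER_MACHINE_HOUR;
}

console.log(billedDollars(10).toFixed(2)); // 0.00
console.log(billedDollars(35).toFixed(2)); // 1.40
```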

So here are my use cases for SimpleDB over Redis:

  • Iffy consistency is okay
  • Data that lends itself to parallel requests
  • Larger amounts of data accessed less frequently
  • Client wants extra-API access to app data
  • Little or no ongoing maintenance budget
  • Latency and speed are not priority #1

That being said, I’d say that this only applies to a fraction of projects I work on and Redis fits the bill more precisely the lion’s share of the time.

Kyle

Developer of things. Node.js + all the frontend jazz. Also, not from Stockholm, don’t do UX. Long story.