Zip’em ALL!

Optimizing Redis memory usage for strings, integers, jsons and anything you like

Whats inside:

  • How much +1 symbol to every redis key costs, and does key names optimization worth it?
  • Key/value store abstraction over redis hashes. Getting 10 times less memory for stored integers or getting 5% memory discount for medium previews.
  • To GZip or not to gzip — comparision of gzipping VS other types of memory optimizations on a various range of data.
  • Introducing me-redis gem — all types of redis memory optimization in one set of tools. Tiny code changes for a max result!
  • Memory savings ( starting at 2.5 less up to 12 times less memory ) and performance tradeoffs in numbers.

Universal memory optimization ideas

Redis is one of the most popular key-value in-memory storage's, it delivers blazing fast O(1) speed for most of its operations. To keep it that way redis refuse to do any additional manipulation to data, like: value zipping, key shredding, auto zip of small objects and so. I see two major reasons for a such decision: a single-threading nature of a Redis and a data structure unawareness on a Redis side. You can’t change Redis nature to multi threading, but you can optimize the way you store your data.

Here are three different relatively universal ideas on how to become memory efficient:

  1. Shorten your keys
  2. Abstract plain key/value store over hash for small objects
  3. Compress values on application side

They all have their limitations, and must be used properly. I’ve tried them all, different kinds of their combinations and summarize them to me-redis gem as a ready to go set of tools.

Zipping keys

Lets start with a key length investigation. Usual point of view is how a key length affect performance: Does name length impact performance in Redis?, Redis Performance — Does key length matter? and so. But how much is one additional symbol to all keys will impact a memory used?

Simple assumption for a size of a memory allocated just for storing keys would be a key.length * Keys.count, so +1 symbol to each key may bring +Keys.count bytes to memory used.

After all it’s quite accurate!

Only one thing left to mention: a memory growth may not be spotted on each +1 symbol. It’s more like a long stair, where you can do couple of steps on a single stair, but after all there will be another stair and the trend will go linear.

In numbers: an integer value of 1 stored in 10K key/value pairs, consumes memory around 0.6Mb for 3 symbols per key, 0.9Mb for 20 symbols per key and 2.2Mb for 128 symbols per key. Inspecting memory consumption from 3 symbols to 128 will reveal the trend — 1 byte per symbol. See code section at the bottom for additional details and key zipping examples.

Resume for key zipping: if a value size is comparable to a key size, than zipping keys may bring noticeable results. And vice versa: if your value is 1K size, than optimizing 10 bytes on each key will never be more than a 1% memory win.

Plain key-value store abstraction over hash

In official memory optimization topic there is a paragraph: “Using hashes to abstract a very memory efficient plain key-value store on top of Redis”. You can read it in full from a given link, here is the essence:

Let’s assume the objects we want to cache are numbered, like:
object:102393
object:1234
object:5
This is what we can do. Every time there is to perform a SET operation to set a new value, we actually split the key into two parts, one used as a key, and used as field name for the hash. For instance the object named “object:1234” is actually split into:
a Key named object:12
a Field named 34
So we use all the characters but the latest two for the key, and the final two characters for the hash field name. To set our key we use the following command:
HSET object:12 34 somevalue
As you can see every hash will end containing 100 fields, that is an optimal compromise between CPU and memory saved.

A hash abstractions pitfalls

Had to mention that if integer tail of keys is rarely distributed, than such sharding leads to hashes with a very few keys, making this approach is wore than useless.

Example:

# code similar to redis guide 
# without hash-max-ziplist-entries awareness
def custom_incr(key)
split = key.scan(/(.*)(\d{2,2})$/).flatten
split.length == 0 ? key.scan(/(.*)(\d{1,1})$/).flatten : split
hincrby(*split, 1)
end
redis.pipelined { 
10000.times {|i| redis.custom_incr("user:#{i}") }
}

Allocates only 70kb for 10K records! This because of i is consequent. Lets make HOLES!

redis.pipelined { 
10000.times {|i| redis.custom_incr("user:#{rand(1000000)}") }
}

Holey-molly, it’s a 685Kb, practically same size as incr would do. Of course it would do, since 1000000 / 10000 equals 100 — the exact shard size. This means roughly one element per hash, “quite an optimization”!

hash-max-ziplist-value and hash-max-ziplist-entries

A redis topics code example (not the topic itself) actually ignores them. But you need either to set hash-max-ziplist-entries to 100 or to use the real hash-max-ziplist-entriesvalue as a sharding size to be productive.

For example if a default value for hash-max-ziplist-entries is 64, than using 100 as a sharding size is a bad idea, because:

If a specially encoded value will overflow the configured max size, Redis will automatically convert it into normal encoding.

Same with hash-max-ziplist-value you need to be aware of your hash-max-ziplist-value size, otherwise you’ll break the optimization and go back to usual key/value pairs.

If hash-max-ziplist-entries value is greater than shard size this is not a disaster just resource waster. For a given case of “holey-molly” real number of hash-max-ziplist-entries was actually 512, and 412 slots were “wasted”. Using sharding size of 512 would give us a 5 keys per hash roughly, and 260Kb of allocated memory, compared to 700kb for direct key/values.

Hash abstractions resume:

This kind of optimization must be:

  • aware of a size of a stored data,
  • aware of a keys structure ( you need to shard properly so you must understand the nature of a keys structure ),
  • aware of hash-max-ziplist-value and hash-max-ziplist-entries values, or directly set them, to keep this memory optimization working.

A memory discount is limited: you can’t get more than 90 bytes from each key/value pair! It may be a huge win for storing integers, and only a 10% for 1K strings, for example short previews on main page of medium.com is 5K size, so it’s 5% less memory then.

Values compression

This is not the Redis specific topic, so I’ll get straight to the resume.

The bad parts:

  • its never performance free, at application level it will slowdown your code.
  • general string compression algorithms may not compress well all kinds of strings, may not even compress :). For example, short strings compressed with general purpose compression algorithm usually getting even bigger.

The good part: it may be the only way to optimize memory consumption for large values.

Zip’em ALL, introducing me-redis gem

Disclaimer: me — states for Memory Efficient.

Features:

  • seamless integration with code already in use, apart from me_* methods it’s all in MeRedis configuration, not your current code.
  • hash key/value optimization with seamless code changes, you can replace set(‘object:id’, value) with me_set( ‘object:id’, value) and free 90 byte for each [‘object:id’, value] pair.
  • zips user-friendly key crumbs, i.e. converts for example user:id to u:id
  • zip integer parts of a keys with base62 encoding. Since all keys in redis are always strings, than we don’t care for integers parts base, and by using base62 encoding we can ~1.8 times shorten integer crumbs of keys
  • respects pipelined and multi, properly works with futures.
  • different compressors for different keys namespaces. You can compress objects, short strings, large strings, primitives with a best suitable way for each other.
  • hot migration module with fallbacks.

Memory discount in numbers

Memory consumption comparison are done for a 100K pairs for a keys from stackoverflow question: set-allBooksBelongToUser:id, hash-max-ziplist-value is set big enough to contain any piece of a tested data.

Values sets are: integers, plain English strings variable length, html pieces variable length, ActiveRecord objects json serializations of two different models. The values inside cells are Mb of memory used, cleared from clear Redis base memory use (~500Kb).

It’s all in the tables but I’ll make some short over all resume:

  • best result is 12 times less memory for compacting integers to hash abstraction with key zipping.
  • other datatypes: short plain English strings smaz compressor show best result — 3 times less memory (it’s generally useless compressor but still for an example purpose), 2.5–3 times less memory for large strings or html pieces with gz compression, 3 times less memory for object serialization with a custom compressor and full set of optimization.
  • idea of optimized compressors for namespaces may work well since there isn’t any universal compressor for all datatypes. You can tune performance and compression independently.
  • using a hash optimization works better with small types, and not so well with large, because it brings nearly fixed result by freeing ~100-bytes per pair.

Perfomance tradeoffs

At redis level we speaking only about comparison between hash abstraction performance against classic methods. So we are comparing get/set/incrby to hget/hset/hincrby and also advanced commands like getset, mget, mset to transaction wrapped multiple calls of hget/hset.

  • get/hget VS incrby/hincrby — same-ish,
  • hset VS set — hset is slightly faster.
  • mget/mset VS multi + mget/hset — mget/mset are faster, dependent on how much values we try to set at once.

You can recheck all with redis-benchmark tool by yourself.

At ruby level, of course doing nothing is instant fast against pipelining multiple calls, compressing values, decomposing and recomposing keys.

I will measure only clear me_method performance for single namespace, leaving out compression impact. Single calls goes from same-ish for me_set/me_get/me_incr to 1.7 times slowdown for me_mset/me_mget, for 10 pipelined operations in a row me_get/me_set/me_incr methods show 30–50% slowdown, me_mset/me_mget are 2–3 times slower than mset/mget , for 100 pipelined operations in a row me_get/me_set/me_incr show similar results — 30–50% slowdown, me_mset/me_mget’s slowdown shows logarofmic growth — to 3.5–4 times less performant. All details at the bottom section.

Compression tradeoff — depends on the algorithm and compression level, it’s up to you to decide what is the best for you.

Links:

  1. Redis official guide in memory optimization.
  2. How we cut down memory usage by 82%
  3. Does name length impact performance in Redis?
  4. Redis Performance — Does key length matter
  5. Redis memory optimization practical advices
  6. Caching through hashing
  7. Redis benchmark tool, benchmarking hset/hget
  8. Strings VS Hashes stackoverflow question

Code samples, benchmarking, CLI outputs e.t.c.

Memory consumption VS key size:

Key zipping example:

Me methods raw performance to original

Custom compressors example

MeRedis config example