More Optimized Redis

Mahdi Tavasoli
Aug 20, 2022

Redis is nice, and it is fast: a single instance can process around 80k SET or 110k GET requests per second. That is enough for many scenarios, but there is always a way to improve, and these improvements sometimes matter a lot! In this post we will go through various tips to increase performance in Redis.

Use the most suitable data type

There are many data types in Redis for various use cases: string, list, set, sorted set, hash, bitmap, HyperLogLog, and streams.

Here are some tips for using these data types (also read all the commands of each data type in the Redis documentation):

  • Important commands for strings are MSET and MGET, which set or get multiple keys in a single round trip.
  • Use a hash instead of N string keys to store an object: create one hash key and set the object's fields as its elements. This uses much less memory, and it is also better for performance, because with string keys you must first fetch the object's keys by pattern and then fetch each key's value.
  • Bitmaps are great for analytics data. A bitmap is not actually a separate data type; it is a set of bit-oriented commands on strings. An example: we want to record the user IDs that visited a page. We could create a hash key and add user IDs as its elements; with a few hundred users everything is fine, but what if there are more? A key with 100k or 1m elements means a lot of memory usage! With a bitmap, imagine 1 million bits in a sequence: each bit stands for one user ID, and its value is 0 (not visited) or 1 (visited). That is only 1 million bits, not 32 * N bits (where N is the number of users that visited the page). But be careful: if you expect only a few users to visit the page, a hash key is better (and if the user IDs are very large numbers, it is better to map them to smaller numbers first). A real-world example: out of 5m users, 2m visit a page; a set uses about 8 MB of memory, while a bitmap uses about 600 KB. See the sketch after this list.
  • Sets are collections of related strings that are unique. You can take unions (and intersections and differences) of them. This data type is very good for fast reads, and there are commands for membership checks and more. Each key can hold up to about 4 billion elements!
  • Sorted sets have the features of sets; in addition, each element has a score, so we can sort elements by score, count the elements within a score range, and so on. This data type can be used to handle tasks and prioritize them.
  • HyperLogLog is a good data type when a key would hold very many elements (millions or billions!). We can add elements, get an approximate count of distinct elements, or merge keys, all very fast and in a small, fixed amount of memory.
  • Streams are an append-only data type that can turn Redis into a message broker (producer/consumer) or store large streams of data. The main Redis documentation and chapter 8 of the Redis in Action book discuss it.
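
As a small sketch of the bitmap idea above (assuming the redis-py client; the key name and user IDs are made up):

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

page_key = 'visits:page:42'  # hypothetical key, one bitmap per page

# one bit per user ID: 1 = visited, 0 = not visited
for user_id in (7, 1000, 999999):
    r.setbit(page_key, user_id, 1)

print(r.getbit(page_key, 7))  # 1 -> user 7 visited
print(r.getbit(page_key, 8))  # 0 -> user 8 did not
print(r.bitcount(page_key))   # 3 -> distinct visitors

Note that setting bit 999999 makes Redis allocate the string up to that offset (about 122 KB here), which is exactly why a sparse audience is better served by a hash.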

Use data types more optimally

  • Use the best tool for serializing data before sending it to Redis. For example, in Python, use pickle (cPickle on Python 2) or orjson instead of the standard json module for faster object serialization and lower memory usage (see their benchmarks).
  • String: for a string key, the internal encoding and memory usage change a lot depending on whether the value is only a number, a short text, or a long text. Try to keep values short or numeric, and translate the value in the application layer.
  • List: if the number of elements and the size of each element are small enough (controlled by the list-max-ziplist-size config), Redis switches to a compact encoding and the data uses much less memory.
  • Set: if all elements are integers and there are fewer of them than the set-max-intset-entries config, Redis switches to a compact encoding and the data uses much less memory.
  • Sorted set: if the number of elements is below zset-max-ziplist-entries and each element is smaller than zset-max-ziplist-value, Redis switches to a compact encoding and the data uses much less memory.
  • Hash: if the number of elements is below hash-max-ziplist-entries and each element is smaller than hash-max-ziplist-value, Redis switches to a compact encoding and the data uses much less memory. You can verify the encoding with the OBJECT ENCODING command, as shown below.
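
A quick way to see these encoding switches in action (a sketch with redis-py; the key names are made up):

import redis

r = redis.Redis(host='localhost', port=6379, db=0)

r.delete('user:1')
r.hset('user:1', mapping={'name': 'x', 'age': '30'})
print(r.object('encoding', 'user:1'))  # compact encoding: ziplist (listpack on Redis 7+)

# exceed hash-max-ziplist-value (64 bytes by default) and the encoding changes
r.hset('user:1', 'bio', 'y' * 200)
print(r.object('encoding', 'user:1'))  # hashtable -> more memory per field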

Use Pipeline

  • Concept: Redis writes data very fast, but commands execute in one thread (I/O can be multi-threaded, while command execution stays single-threaded). Every command involves a network round trip between the client and the Redis server, so there is latency in addition to the execution of the command itself. With a pipeline we send a batch of commands to Redis; Redis queues them, executes them one by one, and returns all the replies at once.
  • Implementation: very easy :) just read the official Redis documentation or the docs of the Redis client for your programming language!
  • Python example:

import redis

pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
conn = redis.Redis(connection_pool=pool)

pipe = conn.pipeline()
for key in ['key1', 'key2', 'key3']:
    pipe.delete(key)  # queued client-side; nothing is sent yet

pipe.execute()  # one round trip; returns one reply per command, e.g. [1, 1, 1]

Use RESP (Redis Serialization Protocol)

  • If there are just a few write commands, it is fine to execute them one by one. If there are a few dozen or a few hundred, we can use a pipeline. But what if there are a few million? We could still use pipelines and many Redis server instances, but what if millions of commands arrive from many clients, many times a day? Don't worry, Redis has a feature for this: mass insertion over RESP. It is very simple: you prepare a text file containing commands in the Redis protocol format.
  • The main Redis documentation covers this clearly and in enough depth. Short description: we build a file of RESP-encoded commands and run cat redis-data.txt | redis-cli --pipe to execute them.

Example for SET samplekey testvalue:

*3<cr><lf> # *3 means the command has 3 parts

$3<cr><lf> # $3 means the next part is 3 bytes

SET<cr><lf>

$9<cr><lf> # "samplekey" is 9 bytes

samplekey<cr><lf>

$9<cr><lf> # "testvalue" is 9 bytes

testvalue<cr><lf>
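
A minimal generator for such a file (a sketch in Python; the file name and key pattern are made up):

def to_resp(*parts):
    # *<n> = number of parts, then $<len> and the raw bytes of each part
    out = b'*%d\r\n' % len(parts)
    for p in parts:
        data = str(p).encode()
        out += b'$%d\r\n%s\r\n' % (len(data), data)
    return out

with open('redis-data.txt', 'wb') as f:  # binary mode keeps <cr><lf> intact
    for i in range(1000000):
        f.write(to_resp('SET', 'key:%d' % i, i))

Then feed it to the server with cat redis-data.txt | redis-cli --pipe.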

Correct data persistence strategy

Read about the AOF (append-only file: every write operation is logged to disk) and RDB (periodic point-in-time snapshots) methods for persisting data. Which one to use, or whether to combine them, depends on your durability requirements, and it is not a complicated decision.
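
For example, a starting point in redis.conf might look like this (these thresholds are the classic defaults; tune them for your workload):

# RDB: snapshot if 1 key changed in 900s, 10 in 300s, or 10000 in 60s
save 900 1
save 300 10
save 60 10000

# AOF: log every write, fsync once per second (a common durability/speed balance)
appendonly yes
appendfsync everysec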

Administration tips

avoid memory leaks
We can set a maximum memory limit with the maxmemory 100mb directive (0 means unlimited).
When that limit is reached and there is no free space left, the maxmemory-policy config decides how Redis handles new write requests. Note that in the worst case, during a background save, Redis can temporarily double its memory usage.
Policies:
noeviction -> return an error to new write commands.
allkeys-lru -> evict the least recently used keys.
volatile-lru -> among keys that have an expiration, evict the least recently used.
allkeys-random -> evict random keys.
volatile-random -> among keys that have an expiration, evict random keys.
volatile-ttl -> among keys that have an expiration, evict those with the shortest TTL first.
allkeys-lfu (since version 4) -> evict the least frequently used keys overall.
volatile-lfu (since version 4) -> among keys that have an expiration, evict the least frequently used.
Note: with the volatile-* policies, if no key has an expiration, Redis behaves like noeviction.
Note: if some keys are accessed much more often than others, -lru is better; if access is evenly spread, -random is better; -ttl is useful when your TTLs reflect how disposable each key is. A minimal configuration is sketched below.
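
For example, in redis.conf (the 100mb limit is illustrative):

maxmemory 100mb
maxmemory-policy allkeys-lru

The same settings can be applied at runtime with CONFIG SET maxmemory 100mb and CONFIG SET maxmemory-policy allkeys-lru.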

sharding
why? Imagine there is one Redis instance for your application. If it breaks, all of the data is lost; it is always better to have distributed resources. With 2 Redis instances we can store important data on one of them and persist it (persisting only the important data means less disk usage and write frequency), while the other instance holds cached data and has some replicas for lots of read requests.
how? Redis Cluster can shard for you, but sharding can also be handled in the application layer: keys with a particular key prefix go to a Redis server with a particular host and port. So simple :) a sketch follows.
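
A minimal sketch of application-layer sharding (hosts and key names are placeholders; assuming redis-py):

import zlib
import redis

SHARDS = [
    redis.Redis(host='redis-1.internal', port=6379),
    redis.Redis(host='redis-2.internal', port=6379),
]

def shard_for(key):
    # CRC32 of the key picks a shard deterministically,
    # so the same key always lands on the same instance
    return SHARDS[zlib.crc32(key.encode()) % len(SHARDS)]

shard_for('cache:user:42').set('cache:user:42', 'profile-json')

Keep in mind that with this simple modulo scheme, adding or removing a shard remaps most keys; consistent hashing reduces that churn.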

monitoring
There are two simple commands for monitoring memory usage:
MEMORY USAGE <key> (shows the size of the key in bytes)
INFO memory (gives a clean summary of memory usage)
Note: the INFO command also reports the state of various Redis sections, like CPU and memory usage, replicas, and more.
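
Both are exposed by client libraries as well; with redis-py (reusing the conn client from the pipeline example):

print(conn.memory_usage('samplekey'))  # size of one key in bytes
info = conn.info('memory')             # dict with the INFO memory fields
print(info['used_memory_human'])       # e.g. '1.02M'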

Redis Stack

Yes, Redis Stack! It bundles modules for full-text search, graph, JSON documents with query support, time series, and more; see the main Redis documentation.

Resources

  • the main Redis documentation
  • the Redis discussion group
  • books:
    O'Reilly's Redis Cookbook (good tricks for analytics/time series/administration/...)
    Manning's Redis in Action (many tricks and best practices; tips on performance and administration)
    Mastering Redis (Jeremy Nelson) (performance and administration tips)
    Redis Essentials (Maxwell Dayvson da Silva) (tricks; tips about clustering/replication/partitioning/sharding/scaling/...)

If you are interested in caching and buffering, you can also read about other key/value databases, cache strategies (Cache-Aside, Read-Through, Write-Through, Write-Back, Write-Around, Reverse Cache), and real-world experiences and best practices in the application layer.

There are also some good GitHub repositories worth exploring.
