Rajneesh Mitharwal
The work we do at hike messenger
3 min read · Jun 25, 2015


To start off with, I am Rajneesh, an engineer at Hike Messenger. Today, hike is the fastest growing messaging app in India.

In this blog post I will talk about how important it is to choose the right Redis data structure (Redis is an in-memory key-value store) for the best performance and minimal memory usage.

We use Redis extensively at Hike, and it is the core of our caching layer, backed by persistent databases like MySQL, MongoDB, HBase, etc. We also use Redis as persistent storage for some of our use cases.

For one of our use cases we tried several different approaches and Redis data structures to store the data with minimum memory consumption without compromising performance, and the results motivated me to share my findings with the community. So let's first understand our use case.

Hike is a mobile messaging application where a hike user can send unlimited instant messages to other hike users, as well as to non-hike users via SMS. At the same time, there are TRAI rules and regulations about sending SMS to DND (Do Not Disturb) users in India. So we need to query, with minimum latency, whether a given user (an MSISDN, digits only) is on DND or not, and we decided to use Redis for it. There are approximately 250 million users registered for DND in India. That's a huge amount of data to deal with, isn't it?

A very straightforward approach would have been to simply use the key-value store in the following manner:

> redis-cli set 9199717880XX 1
OK
> redis-cli exists 9199717880XX
(integer) 1

But Instagram's blog post on Redis and Redis's own documentation on the same topic saved us time, and we tried using hashes to store this whole data set.

Hashes in Redis are dictionaries that can be encoded in memory very efficiently; the Redis setting 'hash-max-ziplist-entries' (named 'hash-max-zipmap-entries' before Redis 2.6) configures the maximum number of entries a hash can have while still being encoded efficiently. We didn't modify this value (which is 64 by default for Redis server >= 2.6), as setting a higher value would have caused HSET commands to use more CPU.
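
To see which encoding a hash actually gets, you can ask Redis directly. Here is a quick sketch using the redis-py Python client; the key, field and value are just for illustration:

import redis

r = redis.StrictRedis(host='localhost', port=6379)

r.hset('91997178', '8011', 1)
# The encoding threshold configured on the server
print(r.config_get('hash-max-ziplist-entries'))
# Reports 'ziplist' while the hash is small enough (newer servers report
# 'listpack'), and 'hashtable' once it grows past the threshold
print(r.object('encoding', '91997178'))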

We decided to make buckets of at most 10,000 fields per hash: the hash key is the MSISDN minus its last four digits, and the field is those last four digits. The assumption was that not all users in a 0-9999 range sharing a common prefix would be DND users, so the average size of a bucket would be nearly 500.

Data in Redis for the same MSISDN example looks like the following:

> redis-cli hset 91997178 80XX 1
(integer) 1
> redis-cli hexists 91997178 80XX
(integer) 1
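
In application code, the bucketing boils down to splitting the MSISDN into a prefix (the hash key) and its last four digits (the field). Here is a rough sketch in Python using the redis-py client; the helper names are just for illustration:

import redis

r = redis.StrictRedis(host='localhost', port=6379)

def dnd_bucket(msisdn):
    # Hash key is the MSISDN minus its last 4 digits; the field is those
    # 4 digits, so one hash holds at most 10,000 fields (0000-9999).
    return msisdn[:-4], msisdn[-4:]

def mark_dnd(msisdn):
    key, field = dnd_bucket(msisdn)
    r.hset(key, field, 1)

def is_dnd(msisdn):
    key, field = dnd_bucket(msisdn)
    return r.hexists(key, field)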

Using this approach, we were able to store the whole data set in 14 GB of memory on an Amazon EC2 m3.xlarge instance. But storing a dummy value of 1 with each entry wastes a lot of costly memory. So we looked at Redis bitmaps, which let you manipulate individual bits at a given offset. We found that this perfectly suits our use case, where we simply need to check the presence of a member.

So, taking the same example again, our data looks like the following:

> redis-cli setbit 91997178 80XX 1
(integer) 0
> redis-cli getbit 91997178 80XX
(integer) 1
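
The application-side sketch for the bitmap approach is almost identical, except that the last four digits now become a bit offset instead of a hash field (again, the helper names are just for illustration):

import redis

r = redis.StrictRedis(host='localhost', port=6379)

def dnd_bit(msisdn):
    # Same bucketing as before, but the last 4 digits are used as a bit
    # offset (0-9999) inside the bucket's bitmap.
    return msisdn[:-4], int(msisdn[-4:])

def mark_dnd(msisdn):
    key, offset = dnd_bit(msisdn)
    r.setbit(key, offset, 1)

def is_dnd(msisdn):
    key, offset = dnd_bit(msisdn)
    return r.getbit(key, offset) == 1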

The results of using this approach were amazing. We are now able to store the same data in just 282 MB. Sounds great, right? And memory consumption will not increase linearly as the number of DND users increases. Currently there are 189,692 buckets, so we are able to store ~1.8 billion users' information using the same memory :-).
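
As a rough back-of-the-envelope check: each bucket's bitmap covers at most 10,000 bit positions, i.e. at most 1,250 bytes, so 189,692 buckets amount to roughly 237 MB of raw bitmap data; the rest of the observed 282 MB goes to key names and per-key overhead.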

The time complexity of a member presence check (GETBIT) and of adding/deleting a member (SETBIT) is still O(1).

One may suggest using Redis Sets, but our requirement didn't take advantage of any set operations, and don't you think it is also a waste of memory when the stored members share common prefixes?
