bitmapist: Analytics and cohorts for Redis
In 2012 I released bitmapist, a powerful real-time analytics library that can help you answer the following questions, even for millions of users and events:
- Has user 123 been online today? This week? This month?
- Has user 123 performed action “X”?
- How many users have been active this month?
- How many unique users have performed action “X” this week?
- What percentage of users who were active last week are still active?
- What percentage of users who were active last month are still active this month?
Additionally, bitmapist can generate cohort graphs that show the following:
- User retention cohorts
- What percentage of users who were active in the last [days, weeks, months] are still active?
- What percentage of users who performed action X also performed action Y (and how this changes over time)?
- And a lot of other things!
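A cohort table like this boils down to intersecting bitmaps and counting bits. The sketch below is a plain-Python illustration under stated assumptions, not bitmapist's own cohort code: Python integers stand in for Redis bitmaps (bit n = user id n), and `bitmap_of`, `popcount` and `cohort_table` are hypothetical helpers. Row i takes the users in `first[i]` as the cohort and reports, for that period and each later one, what percentage of the cohort also appears in `second`.

```python
def bitmap_of(user_ids):
    """Build an int bitmap with one bit set per user id (hypothetical helper)."""
    bitmap = 0
    for user_id in user_ids:
        bitmap |= 1 << user_id
    return bitmap


def popcount(bitmap):
    """Number of set bits, i.e. number of distinct users."""
    return bin(bitmap).count("1")


def cohort_table(first, second):
    """Row i: the cohort is first[i]; each entry is the percentage of that
    cohort also present in second[j], for j = i, i + 1, ..."""
    table = []
    for i, cohort in enumerate(first):
        size = popcount(cohort)
        row = []
        for later in second[i:]:
            kept = popcount(cohort & later)
            # An empty cohort has no meaningful percentage.
            row.append(round(100.0 * kept / size, 1) if size else None)
        table.append(row)
    return table


# Hypothetical weekly "active" bitmaps for three consecutive weeks.
weeks = [
    bitmap_of([1, 2, 3, 4]),
    bitmap_of([1, 2, 5]),
    bitmap_of([1, 5, 6]),
]

for row in cohort_table(weeks, weeks):
    print(row)
# [100.0, 50.0, 25.0]
# [100.0, 66.7]
# [100.0]
```

Passing the same list twice gives plain retention; passing bitmaps for action X as `first` and action Y as `second` gives the "performed X, later performed Y" view.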
At Doist we store hundreds of millions of events in bitmapist, and over the years we have saved more than $100,000 USD on our setup. Here’s a guide to why it’s useful and how to get started.
Please note that bitmapist is written for Python, but there is also a PHP port.
Why I implemented this
I looked at Mixpanel’s retention feature, which looks amazing. The problem for us is that we would need to track tens of millions of events per month, and Mixpanel is very expensive (it would cost us over $2,000/month to get this feature!)
So I did what any sensible hacker would do: I coded my own version and open-sourced it so others can use it and contribute to it.
What are bitmaps?
Bitmaps are the foundation of bitmapist. They make it possible to store events for millions of users in very little memory.
In general, a bitmap is an array of bits (zeros and ones), where each bit can be set to either 0 or 1. You can then perform simple bitwise operations on bitmaps, such as AND, OR and XOR. Redis supports bitmaps natively, as bit operations on its string type.
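To make the idea concrete, here is a minimal sketch in plain Python that assumes nothing beyond the standard library: an arbitrary-precision `int` plays the role of a Redis bitmap, with bit n standing for user id n. The helper names (`set_bit`, `get_bit`, `count`) are hypothetical; in Redis the equivalent work is done server-side by commands such as SETBIT, GETBIT, BITCOUNT and BITOP.

```python
# Toy bitmaps: an int's bit n stands for user id n.
def set_bit(bitmap, user_id):
    """Return a copy of `bitmap` with the bit for `user_id` set to 1."""
    return bitmap | (1 << user_id)


def get_bit(bitmap, user_id):
    """Return 1 if the bit for `user_id` is set, else 0."""
    return (bitmap >> user_id) & 1


def count(bitmap):
    """Count set bits, i.e. distinct users in the bitmap."""
    return bin(bitmap).count("1")


active_last_month = 0
for user_id in (1, 3, 5, 123):
    active_last_month = set_bit(active_last_month, user_id)

active_this_month = 0
for user_id in (3, 5, 8):
    active_this_month = set_bit(active_this_month, user_id)

both = active_last_month & active_this_month     # active in both months
either = active_last_month | active_this_month   # active in at least one month
changed = active_last_month ^ active_this_month  # active in exactly one month

print(count(both), count(either), count(changed))  # 2 5 3
print(get_bit(active_last_month, 123))              # 1

# Memory: one bit per possible user id, rounded up to whole bytes.
num_user_ids = 10_000_000  # ids 0 .. 9,999,999
bytes_needed = (num_user_ids + 7) // 8
print(bytes_needed)  # 1250000
```

The last lines show why bitmaps are cheap: assuming dense, sequential user ids, memory grows with the highest id set rather than with the number of active users, so roughly 1.2 MB covers 10 million ids for one event in one time period.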
If you want to read more about bitmaps, please read the following:
Now let’s look at how to install and use the simple bitmapist API.
Installation
Getting bitmapist is quite simple:
$ pip install bitmapist
Example usage
For more complete documentation, please see the GitHub page.
Setting things up:
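from datetime import datetime, timedelta, timezone
from bitmapist import mark_event, MonthEvents, WeekEvents, BitOpAnd, BitOpOr

# Reference points used by the queries below (UTC)
now = datetime.now(timezone.utc)
# The last day of the previous month, i.e. a date inside "last month"
last_month = now.replace(day=1) - timedelta(days=1)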
Mark user 123 as active and as having played a song:
mark_event('active', 123)
mark_event('song:played', 123)
Check whether user 123 has been active (and has played a song) this month:
assert 123 in MonthEvents('active', now.year, now.month)
assert 123 in MonthEvents('song:played', now.year, now.month)
assert MonthEvents('active', now.year, now.month).has_events_marked()
How many users have been active this week?
print(len(WeekEvents('active', now.year, now.isocalendar()[1])))
Perform bit operations! How many users who were active last month are still active this month?
active_2_months = BitOpAnd(
    MonthEvents('active', last_month.year, last_month.month),
    MonthEvents('active', now.year, now.month)
)
print(len(active_2_months))
# Has user 123 been active in both months?
assert 123 in active_2_months
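Lookups like MonthEvents and WeekEvents suggest that marking one event updates one bitmap per time period. The toy model below assumes exactly that and keeps everything in a Python dict instead of Redis; the key format and the helpers `period_keys`, `mark`, `was_marked` and `unique_count` are hypothetical illustrations, not bitmapist's actual key naming or internals. Weeks are keyed by ISO year and week number, as returned by `date.isocalendar()`.

```python
from datetime import date

# Toy stand-in for Redis: key -> int used as a bitmap (bit n = user id n).
# The key format is hypothetical, not bitmapist's real key naming.
bitmaps = {}


def period_keys(event, day):
    """Return the day, ISO-week and month keys that `day` falls into."""
    iso_year, iso_week, _ = day.isocalendar()
    return [
        f"{event}:day:{day.isoformat()}",
        f"{event}:week:{iso_year}-W{iso_week:02d}",
        f"{event}:month:{day.year}-{day.month:02d}",
    ]


def mark(event, user_id, day):
    """Set the user's bit in every period bitmap covering `day`."""
    for key in period_keys(event, day):
        bitmaps[key] = bitmaps.get(key, 0) | (1 << user_id)


def was_marked(key, user_id):
    """True if the user's bit is set in the bitmap stored at `key`."""
    return ((bitmaps.get(key, 0) >> user_id) & 1) == 1


def unique_count(key):
    """Distinct users for a period: the number of set bits."""
    return bin(bitmaps.get(key, 0)).count("1")


mark("active", 123, date(2012, 10, 15))
mark("active", 7, date(2012, 10, 16))

print(was_marked("active:day:2012-10-15", 123))  # True
print(was_marked("active:day:2012-10-16", 123))  # False
print(was_marked("active:month:2012-10", 123))   # True
print(unique_count("active:week:2012-W42"))      # 2
print(unique_count("active:month:2012-10"))      # 2
```

Because every period has its own bitmap in this model, "active today?", "this week?" and "this month?" are each a single bit lookup, and "how many unique users?" is a single bit count.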
bitmapist cohort
With bitmapist cohort you can render an HTML form and a table of the data you keep in bitmapist. If this sounds confusing, please look at Mixpanel’s retention feature.
Here’s a screenshot of what this looks like:
Generating the HTML form for querying bitmapist
The following code:
Will render this part:
Generating the HTML data
The following code:
Will render this part:
That’s about it!
Please give bitmapist a spin. I am sure you will love it :-)
Happy hacking! :)