bitmapist: Analytics and cohorts for Redis

In 2012 I released bitmapist, a powerful realtime analytics library that can help you answer the following questions (for millions of users and events):

  • Has user 123 been online today? This week? This month?
  • Has user 123 performed action “X”?
  • How many users have been active this month?
  • How many unique users have performed action “X” this week?
  • What percentage of users who were active last week are still active?
  • What percentage of users who were active last month are still active this month?

Additionally, bitmapist can generate cohort graphs that cover the following:

  • User retention by cohort
  • What percentage of users who were active in previous [days, weeks, months] are still active?
  • What percentage of users who performed action X also performed action Y (tracked over time)?
  • And a lot of other things!

At Doist we store hundreds of millions of events inside bitmapist and we have saved $100,000+ USD on our setup over the years. Here’s a guide to why it’s useful and how to get started.

Please note that bitmapist is for Python, but there is a PHP port here.

Why I implemented this

I looked at Mixpanel’s retention feature, which looks amazing. The problem for us is that we would need to track tens of millions of events per month, and Mixpanel is very expensive (it would cost us over $2000/month to get this feature!).

So I did what any sensible hacker would do: I coded my own version and open-sourced it so others can use it and contribute to it.

What are bitmaps?

Bitmaps are the foundation of bitmapist. They make it possible to store events for millions of users in very little memory.
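
To make the memory claim concrete, here is a back-of-the-envelope sketch. It assumes one bit per user ID for each event and time period, with integer user IDs used directly as bit offsets. The helper name bitmap_size_bytes is made up for this illustration and is not part of bitmapist.

```python
def bitmap_size_bytes(max_user_id):
    """Bytes needed for a bitmap whose highest bit offset is max_user_id."""
    # One bit per ID in 0..max_user_id, rounded up to whole bytes.
    return (max_user_id + 1 + 7) // 8

# One event, one time period, user IDs up to 10 million:
size = bitmap_size_bytes(10_000_000)
print(size / 1024 / 1024, "MiB")
```

Under these assumptions, a single event for a single period costs roughly 1.2 MiB even if every one of the 10 million users is marked. The size depends on the highest ID set, not on how many users are marked.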

In general, a bitmap is an array of bits, each of which is either 0 or 1. You can perform simple operations on bitmaps, such as AND, OR and XOR. Redis supports bitmaps natively through bit-level commands on its string type.
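
As a rough illustration of these operations, the sketch below uses plain Python integers as toy bitmaps, where bit N stands for user N. It does not touch Redis or bitmapist; the helper names set_bit, has_bit and count_bits are invented for this example.

```python
def set_bit(bitmap, user_id):
    """Return a copy of the bitmap with the bit for user_id set to 1."""
    return bitmap | (1 << user_id)

def has_bit(bitmap, user_id):
    """True if the bit for user_id is 1."""
    return ((bitmap >> user_id) & 1) == 1

def count_bits(bitmap):
    """Number of bits set to 1, i.e. number of users marked."""
    return bin(bitmap).count("1")

active_last_month = 0
active_this_month = 0
for uid in (1, 2, 3):
    active_last_month = set_bit(active_last_month, uid)
for uid in (2, 3, 4):
    active_this_month = set_bit(active_this_month, uid)

both = active_last_month & active_this_month     # AND: active in both months
either = active_last_month | active_this_month   # OR: active in at least one month
changed = active_last_month ^ active_this_month  # XOR: active in exactly one month

print(count_bits(both), count_bits(either), count_bits(changed))  # prints: 2 4 2
print(has_bit(both, 2), has_bit(both, 1))                         # prints: True False
```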

If you want to read more about bitmaps, please read the following:

Now let’s look at how to install and use the simple bitmapist API.

Installation

Getting bitmapist is quite simple:

$ pip install bitmapist

Example usage

For more complete documentation please see the GitHub page.

Setting things up:

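from datetime import datetime, timedelta
from bitmapist import mark_event, MonthEvents, WeekEvents, BitOpAnd, BitOpOr
now = datetime.utcnow()  # reference time used by the examples below
last_month = now - timedelta(days=30)  # approximately one month ago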

Mark user_id 123 as active and as having played a song:

mark_event('active', 123)
mark_event('song:played', 123)

Answer if user 123 has been active this month:

assert 123 in MonthEvents('active', now.year, now.month)
assert 123 in MonthEvents('song:played', now.year, now.month)
assert MonthEvents('active', now.year, now.month).has_events_marked()

How many users have been active this week?

print(len(WeekEvents('active', now.year, now.isocalendar()[1])))
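
A note on the week argument: isocalendar() returns an ISO year alongside the ISO week number, and around New Year that ISO year can differ from the calendar year. The sketch below only demonstrates the standard-library behavior. Whether bitmapist expects the calendar year or the ISO year for WeekEvents is not covered here, so it is worth checking before relying on weekly counts near January 1.

```python
from datetime import datetime

d = datetime(2018, 12, 31)  # a Monday
iso_year, iso_week, iso_weekday = d.isocalendar()

# The calendar year is 2018, but the date falls in ISO week 1 of 2019.
print(d.year, iso_year, iso_week)  # prints: 2018 2019 1
```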

Perform bit operations! How many users who were active last month are still active this month?

active_2_months = BitOpAnd(
    MonthEvents('active', last_month.year, last_month.month),
    MonthEvents('active', now.year, now.month)
)
print(len(active_2_months))

# Is 123 active for 2 months?
assert 123 in active_2_months
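
The percentage questions from the introduction follow from two counts: the size of the base group and the size of the group that is still active. Here is a minimal sketch of that arithmetic; retention_percent is a hypothetical helper written for this example, not a bitmapist function, and the numbers are made up.

```python
def retention_percent(base_count, retained_count):
    """Share of the base group that is still active, as a percentage."""
    if base_count == 0:
        return 0.0  # empty base group: avoid dividing by zero
    return 100.0 * retained_count / base_count

# With bitmapist, the two counts would come from len(), for example:
#   base_count = len(MonthEvents('active', last_month.year, last_month.month))
#   retained_count = len(active_2_months)
print(retention_percent(base_count=200, retained_count=50))  # prints: 25.0
```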

bitmapist cohort

With bitmapist cohort you can get a form and a table rendering of the data you keep in bitmapist. If this sounds confusing please look at Mixpanel.

Here’s a screenshot of what this looks like:

Generating the HTML form for querying bitmapist

The following code:

Will render this part:

Generating the HTML data

The following code:

Will render this part:

Happy hacking! :)

That’s about it!

Please give bitmapist a spin. I am sure you will love it :-)