Minimum Viral Product: Activity Stream and Message Queues
REWQ.co: LinkedIn for learning
Starting Question: How do I efficiently implement activity streams on REWQ.co?
Activity streams are database-intensive operations, and they can get quite complex. I found some good videos from Twitter engineering that were a good introduction to what this beast means.
Activity streams can be of two types:
- What's on my timeline?
- What's happening in my network?
The first is the easier problem to solve, and that's what we build in an MVP.
Thought: Building an activity stream can turn your Minimum Viable Product into a Minimum Viral Product. Can you make your users look cool with the activity streams they have created?
Background & challenge: We use MongoDB as our database on REWQ. Assuming the writes are slow (we haven't actually experienced this yet), we should do the heavy lifting on a different server. This is where message queues come in.
Message queues are a reliable way to instruct "helper" servers to carry out an intensive task in the background. In our case, we have two main tasks running in the background: (1) a link scraper (network-intensive) and (2) an activity page builder (database-write-intensive).
The introductory videos on message queues have been quite boring. I got two cans of beer.
Some of the message queue options available are:
- RabbitMQ — around since 2007; implements AMQP, a protocol that originated at JPMorgan
- Kafka — open-sourced by LinkedIn
- ZeroMQ — for simple messaging needs
- Redis Pub/Sub — Cloud9 mentioned it as a mechanism to decouple code in a presentation about scaling Node.js apps
What would make one of them the right fit for my use case?
- Lightweight — no extra CPU or memory overhead
- Something that lasts me until my first 100K users
- A stable, well-written client library for Node.js (my platform), with enough flexibility to let me shift from one messaging system to another
- Reliability requirements — should the broker persist messages or not?
What amount of data will I be producing?
Not much. 100K users reading 3 links each day leads to 300K messages every day. Let's say 60% of the traffic is in the USA, spread over some 4 hours; that's probably our busiest window for now.
Messages/sec at peak = (60% of 300K) / 4 hours = 180,000 messages / 14,400 seconds = 12.5 messages/second
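Just to double-check that estimate, here is the same back-of-the-envelope arithmetic as a few lines of Node.js (the figures are the estimate's own):

```javascript
// Back-of-the-envelope peak message rate, using the figures estimated above.
const dailyMessages = 100000 * 3;      // 100K users reading 3 links a day
const peakShare = 0.6;                 // 60% of traffic in the US window
const peakWindowSeconds = 4 * 60 * 60; // spread over ~4 hours

const peakRate = (dailyMessages * peakShare) / peakWindowSeconds;
console.log(peakRate); // 12.5
```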
Okay, just for comparison: RabbitMQ handles 20K+ messages/second and Kafka handles 100K+ messages/second.
The Plan
Plan A) Use Redis Pub/Sub to send activity messages to the consumer that does the heavy lifting. This is a lot like using Redis to collect log/analytics data.
Plan B) Use ZeroMQ as the messaging layer between the user-facing server and the server that does the heavy lifting on activity streams, with Redis as a cache in between.
I am leaning towards Plan A because it looks simpler.
Other Conclusions of the Study
One of the other conclusions is that some of the technologies for scaling websites still evade me. For example, the posts on High Scalability about scaling Twitter or making Disqus realtime are not easy to fully absorb. But I don't expect to need 10+ servers anytime soon.
References:
(1) Links on Heroku.
Stardate: Recent … Nov, 2014
Describes a scenario where a Ruby app writes data safely into MongoDB. Separately, a consumer continuously streams data from MongoDB and pushes it to the client (a browser). This use case seems too simple for my scenario.
Next, I would like to see something that talks about a user-facing server communicating with a backend server. The backend server should store the particular messages. Redis could serve as a cache.
(2) Comparing RabbitMQ, Kafka, ZeroMQ and Redis
Leaving out AWS SQS (Simple Queue Service) as I am looking to run one on my own.
It seems like Kafka is better suited for social and consumer applications, while RabbitMQ looks more widely used in the enterprise. An MVP does not really need Kafka, which leaves ZeroMQ and Redis Pub/Sub to explore in search of a messaging and job queue system for REWQ.
The ZeroMQ evangelists create a feeling of something out of the underground. Still, I like it because it looks simple. I can use it from a Node.js client library without changing my code. There's one producer and one consumer: I can use it to talk between the frontend server and the server storing activity streams, and in addition, I can use Redis to cache data.
Some links on ZeroMQ and Redis.
This is my bookmark for future learning — ZeroMQ Playlist — stardate 2012. It might be interesting to know more about ZeroMQ.
The last one left to write about is Redis Pub/Sub. Redis Pub/Sub can be useful for use cases such as broadcasting one user's action to multiple users.
Here is the first link that compares Redis Pub/Sub and ZeroMQ.