Minimum Viral Product: Activity Stream and Message Queues
REWQ.co: LinkedIn for learning
Starting Question: How do I efficiently implement activity streams on REWQ.co?
Activity streams are database-intensive operations, and they can get quite complex. I found some good videos from Twitter engineering that were a good introduction to what this beast means.
Activity streams can be of two types:
- What's on my timeline?
- What's happening in my network?
The first is the easier problem to solve, and that's what we build in an MVP.
Thought: Building an activity stream can turn your Minimum Viable Product into a Minimum Viral Product. Can you make your users look cool with the activity streams they have created?
Background & challenge: We use MongoDB as our database on REWQ. Assuming the writes are slow (we haven't actually experienced this yet), we should do the heavy lifting on a different server. This is where message queues come in.
Message queues are a reliable way to instruct "helper" servers to carry out an intensive task in the background. In our case, we have two main tasks running in the background: (1) a link scraper (network-intensive) and (2) an activity page builder (database-write-intensive).
The introductory videos on message queues have been quite boring. I got two cans of beer.
Some of the message queue options available are:
- RabbitMQ — around since 2007; implements AMQP, a protocol that originated at JPMorgan
- Kafka — open-sourced by LinkedIn
- ZeroMQ — for simple messaging needs
- Redis Pub/Sub — Cloud9 mentioned it as a mechanism to decouple code in a presentation about scaling Node.js apps
What would make one of them the right fit for my use case?
- Lightweight — no extra CPU or memory overhead
- Something that lasts me until my first 100K users
- A stable, well-written client library for Node.js (my platform), with enough flexibility to let me shift from one messaging system to another
- Reliability requirements — should the broker persist messages or not?
What amount of data will I be producing?
Not much. 100K users reading 3 links each day leads to 300K messages every day. Let's say 60% of the traffic is in the USA, spread over some 4 hours; that's probably our busiest window for now.
Messages/sec at peak = (60% of 300K) / 4 hours = 180,000 messages / 14,400 seconds = 12.5 messages/second
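Just to double-check that estimate, here is the same back-of-the-envelope arithmetic as a few lines of Node.js (the figures are the estimate's own):

```javascript
// Back-of-the-envelope peak message rate, using the figures estimated above.
const dailyMessages = 100000 * 3;      // 100K users reading 3 links a day
const peakShare = 0.6;                 // 60% of traffic in the US window
const peakWindowSeconds = 4 * 60 * 60; // spread over ~4 hours

const peakRate = (dailyMessages * peakShare) / peakWindowSeconds;
console.log(peakRate); // 12.5
```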
Okay, just for comparison: RabbitMQ handles 20K+ messages/second and Kafka handles 100K+ messages/second.
The Plan
Plan A) Use Redis Pub/Sub to send activity messages to the consumer that does the heavy lifting. This is a lot like using Redis to collect log/analytics data.
Plan B) Use ZeroMQ as the messaging layer between the user-facing server and the server that does the heavy lifting on activity streams, with Redis as a cache in between.
I am leaning towards Plan A because it looks simpler.
Other Conclusions of the Study
One of the other conclusions is that some of the technologies for scaling websites still evade me. For example, the posts on High Scalability about scaling Twitter or making Disqus realtime are not easy to fully absorb. But I don't expect to need 10+ servers anytime soon.
References:
(1) Links on Heroku.
Stardate: Recent … Nov, 2014
Describes a scenario where a Ruby app writes data safely into MongoDB. Separately, a consumer continuously streams data from MongoDB and pushes it to the client (a browser). This use case seems too simple for my scenario.
Next, I would like to see something that talks about a user-facing server communicating with a backend server. The backend server should store the particular messages. Redis could serve as a cache.
(2) Comparing RabbitMQ, Kafka, ZeroMQ and Redis
Leaving out AWS SQS (Simple Queue Service) as I am looking to run one on my own.
It seems like Kafka is better suited for social and consumer applications, while RabbitMQ looks more widely used in the enterprise. An MVP does not really need Kafka, which leaves ZeroMQ and Redis Pub/Sub to explore in search of a messaging and job queue system for REWQ.
The ZeroMQ evangelists create a feeling of something out of the underground. Still, I like it because it looks simple. I can use it from a Node.js client library without changing my code. There's one producer and one consumer: I can use it to talk between the frontend server and the server storing activity streams, and in addition, I can use Redis to cache data.
Some links on ZeroMQ and Redis.
This is my bookmark for future learning — ZeroMQ Playlist — stardate 2012. It might be interesting to know more about ZeroMQ.
The last one left to write about is Redis Pub/Sub. Redis Pub/Sub can be useful for use cases such as broadcasting one user's action to multiple users.
Here is the first link that compares Redis Pub/Sub and ZeroMQ.