You can’t stop Real-time

Pankaj Gupta
Oct 3, 2014

Consumer products are often about the processing and delivery of information, services or goods. The biggest “aha” for me in my last couple of years at Twitter was the disproportionate value that being real-time provides to such a product. This sounds really odd, even to me now, given that I was inside Twitter, the real-time information network! Others around me and I understood it intellectually, but seeing it is another thing. After conversations with many people, I would claim that very few in the industry truly understand real-time’s intrinsic, unfair advantage.

First, what does it mean to be real-time? Let’s look at some popular examples. In terms of products: Twitter is real-time and CNN is not, Uber is and Yellow Cab is not, Hipmunk is and that neighborhood travel agent is not. In terms of technologies: Storm is real-time and Hadoop is not; a website’s serving layer is, but its offline analytics layer is not; and so on.

But this is too broad a brush. I claim that Twitter, Uber, Hipmunk and the on-demand, instant-gratification startups in vogue today are not really real-time! Wait, what? Yes, but I am getting ahead of myself. I first need to tell my own story: the story of a feature I came up with at Twitter called MagicRecs.

A story

In my early years at Twitter, I had worked on a recommendation product feature called Who-to-follow (then and still lovingly called WTF). It was based on an offline pipeline that recomputed recommendations for a subset of users every day, and thus for all users every few days. About a year after its launch, we started noticing that it had been doing really well for two types of users: (1) new users, and (2) users whose recommendations had been recomputed the previous day. The lift in our metrics for these cohorts was dramatic. On digging deeper, it turned out that fresher recommendations based on a user’s recent actions are simply a lot more effective. But why new users? Well, it turned out that we had special code for them. Because we had no prior data for new users, we had to recompute their recommendations in the online serving layer itself, which ensured that the freshest underlying signals were used.

With that data-driven insight, my team quickly changed the pipeline to recompute recommendations as aggressively as possible, limited only by the capacity of our Hadoop cluster. Looking around the company, it was not difficult to notice a pattern emerging in the backend infrastructure powering many Twitter product features. Team after team was finding that real-time computation was much more effective at delivering product goals, and was simultaneously growing disenchanted with the latency and best-effort nature of Hadoop.

I had become a full convert by this time. I knew that we had to find a way to compute recommendations not just quickly, but in real-time. This was not only a hard technical problem; it seemed almost impossible, as the computation involved fairly sophisticated graph traversals and machine learning that seemed far too complicated to run in real-time. But I was on a mission by now, and pressed my (excellent) team. They had big concerns. Here is my recollection of how the dialogue went between me and some of the (really strong) engineers on the team.

Me: We’ve seen how well fresh recs work. We’ve got to make them even faster, perhaps real-time…

Team: Let’s try to somehow run the Hadoop job every hour instead of every day. (Also mumbling: but that will take up a lot of Hadoop capacity, and we are already one of the biggest users inside Twitter.)

Me: I think we need to do this in real-time. Let’s move to a Storm-based pipeline?

Team: Umm, what use is that? These are just recommendations, after all. Why do they have to be computed in real-time?

Me: Well, we have seen that the fresher the recs are, the better they perform.

Team: Sure, but maybe we can first try doing it every hour. Besides, what is the point of computing a rec for someone who is not even going to see it until she next visits Twitter, say the next morning? We will be wasting so much compute.

Me: You have a point there… Oh wait! Aha! We not only need to compute these in real-time, we need to push them in real-time too ☺ They can be like notifications!

Team: Wait, what? That doesn’t make sense at all. I mean, even if we compute them, we could enqueue them in our email queue and send them by email. What’s the hurry in a user knowing right away that a rec has been generated for him?

More arguments and discussions went back and forth. The team had really good points. It was indeed too expensive a problem to solve without a clear product win. But I was convinced it would be a win, and to the team’s immense credit, they were eventually game to try. We found a way to test it by creating a test Twitter account (initially protected, to keep it out of public view). Whoever followed the account would get these recs via Twitter direct messages (DMs), which made it an opt-in feature during this testing phase. The team came up with a somewhat hacky, brute-force solution, which worked fine because scalability would not be a concern until we knew the product feature worked. I started recruiting colleagues, and one by one, purely by word of mouth, it grew popular.

Somehow, Dick, Chris and the rest of the exec staff got wind of it. They started following the test account and liked it (to my unpleasant surprise, not all of them wanted to launch it, but let’s leave that aside), and we were ready to make it a real product. We first un-protected the Twitter account and then worked on delivering these recs as mobile push notifications (instead of as DMs from an account you had to explicitly follow). The launch in September 2013 opened it up to tens of millions of users. (For the technically curious: before we could do a full launch, we had to solve scalability, which the team, now armed with much more confidence in the product’s success, did manage to crack, significantly advancing the state of the art in real-time graph processing at scale. This work was recently published in a paper at VLDB.)

I can only say that MagicRecs has been quite successful for Twitter. The strong team behind it has taken it from strength to strength, and users have received it enthusiastically. The idea inspired other product features inside Twitter and seems to have at least partially inspired others in the industry as well (Nuzzel, ThinkUp, etc.). Even the phrase “MagicRecs for X” is sometimes heard. It is not hard to imagine how Twitter (if it wants to; I don’t know its current plans) could take it further into a notifications-centric mobile experience of the kind that both iOS 8 and Android now seem to enable.

Back to being real-time

MagicRecs worked as well as it did precisely because it was real-time. The median latency between the triggering event and the delivery of the recommendation is a matter of seconds (more technical details are in the VLDB paper mentioned above). At the beginning of the project, I was adamant that the notification copy include exactly how many seconds it took, to create a sense of timeliness. The content being recommended was not new; it was simply summarized and pushed to the user in real-time.

On the face of it, this is not new. Remember when you had to run a spell-checker after finishing a document, rather than being told about mistakes in real-time as you type? From Google to Uber, people have realized the benefits of sheer speed. Google’s findings on the value of speeding up its search results page are now folklore. And that is my point: despite the need for speed being folklore, I still find big resistance to making products real-time. Perhaps this is because many of us carry a bias from past knowledge, a kind of curse of knowledge. Those on the engineering side presume that real-time processing must be hard or expensive (it generally has been), and those on the product side like to question whether something really needs to be real-time. What I have found is that it pays to give real-time the benefit of the doubt.

The flip side of anything that is not widely understood is opportunity. We in the Internet community have clearly made great strides in improving speed, but it is by no means a done deal. Some examples follow (I hope new startups will keep working on the technical and business challenges to make them a reality):

  • Uber and Lyft give instant gratification as they try to service a request in real-time. However, I currently use them about twice a week, and I think I would use them twice a day if the average latency from requesting a car to sitting in one were under a minute rather than the current ~10 minutes (averaged over many locations and times in San Francisco). A small difference, you might say, but I find myself preferring driving to taking Uber (when other factors like parking are less of an issue) simply because I want a predictably low time to reach a certain place. The lack of predictability forces me to add a further 5–10 minutes to my estimate.
  • When Amazon introduced free two-day delivery for Prime members, it seemed like magic at the time. I don’t know Amazon Prime’s deactivation rate, but I have been hooked ever since. I think free same-day delivery is in our near future and will have a similar effect on users in the long term. In fact, I hope for a world in which delivery happens within an hour (besides being free).
  • Speaking of delivery in hours, the other day I tried to use Postmates to order my favorite walnut pie from a shop 15 minutes away in SF. I was quoted a delivery time of 1 hour. That is not real-time. I did not order the pie; the urge for pie would have long gone by then. Likewise, I still cannot cheaply get a cup of hot coffee delivered to my home from my favorite nearby cafe.
  • I use Hipmunk for my travel bookings, and it is of course much better than working with a travel agent. But it is still too slow (and painful) to plan even the simplest flight/car/hotel itineraries. The time I spend waiting while the site contacts various airlines and assembles options might not sound like much, but if you have to tweak your search multiple times, it gets annoying fast.
  • On the technical side, one example is Hadoop/MapReduce, which now seems destined to be relegated to important but somewhat specialized tasks such as background model building and daily analytics and charts, and even those need to be complemented by real-time versions. Another example is A/B testing frameworks, which need to be real-time: the ones prevalent in industry today take days to gather useful data, and that is simply too slow. As a simpler example, programmers prefer interpreted languages, which skip compilation partially or completely, for small tasks. Personally, I am hoping for much higher abstractions than today’s languages provide, so that we can cut prototyping time by an order of magnitude. Alas, too few people (if any?) seem to be working on that.

Real-time’s unfair advantage

There are many more important aspects of real-time (noise, rawness versus relevance, the expense of computation, combination with non-real-time processing), but this blog post is already too long. There is something about real-time that appeals to us humans at a fundamental, psychological level. Indeed, economists and psychologists have systematically studied people’s time preferences. In particular, they have identified a cognitive bias called hyperbolic discounting: “Given two similar rewards, humans show a preference for one that arrives sooner rather than later.” The discount in perceived utility as delay increases does not proceed at a constant rate; perceived value falls much more rapidly at the beginning.
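To make “falls much more rapidly in the beginning” concrete, here is a minimal sketch assuming the common one-parameter hyperbolic form V = A / (1 + kD), where A is the reward, D the delay and k an impatience parameter, contrasted with constant-rate (exponential) discounting V = A · e^(−rD). The function names and the values of k and r are hypothetical, chosen only for illustration; the post itself does not specify any model or parameters.

```python
import math


def hyperbolic_value(amount: float, delay: float, k: float) -> float:
    """Perceived value of a reward of size `amount` after `delay` time units,
    using the one-parameter hyperbolic form V = A / (1 + k * D).
    Illustrative model only; `k` is a hypothetical impatience parameter."""
    return amount / (1.0 + k * delay)


def exponential_value(amount: float, delay: float, r: float) -> float:
    """Constant-rate (exponential) discounting, V = A * exp(-r * D), for contrast."""
    return amount * math.exp(-r * delay)


def step_retention(value_fn, amount: float, param: float, delay: float) -> float:
    """Fraction of perceived value that survives one extra unit of delay,
    i.e. V(D + 1) / V(D). A constant result means a constant discount rate."""
    return value_fn(amount, delay + 1, param) / value_fn(amount, delay, param)


if __name__ == "__main__":
    k = 1.0  # hypothetical hyperbolic parameter, for illustration only
    r = 0.5  # hypothetical exponential rate, for illustration only
    for d in (0, 1, 5, 10, 30):
        h = step_retention(hyperbolic_value, 100.0, k, d)
        e = step_retention(exponential_value, 100.0, r, d)
        print(f"delay {d:>2}: hyperbolic keeps {h:.2f}, exponential keeps {e:.2f}")
```

With k = 1 the hyperbolic retention per step works out to (1 + D) / (2 + D): half of the perceived value is lost in the very first unit of delay, while going from D = 9 to D = 10 costs less than a tenth. The exponential model keeps the same fraction e^(−r) at every step. Under this model, shaving the first few seconds or minutes off a delay buys the most perceived value, which is the intuition behind giving real-time the benefit of the doubt.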

Chris Anderson wrote in his 2009 book ‘Free’ that “If it’s digital, sooner or later it’s going to be Free” and that “You can’t stop Free”. I think we can say something similar: “If it’s going to be delivered, sooner or later it’s going to be delivered in real-time” and “You can’t stop real-time”.

(Image credit: Wikipedia)

    Written by Pankaj Gupta: @Google Next Billion Users. Past: Founder/CEO Halli Labs (acquired by Google), Stayzilla, Twitter. Cofounder: Halli, @specializedtype, Phulki, Sahasra Networks.
