Business Intelligence Now Streaming Live!
Manish Sharma, Spandan Singh
When I started out in the business of technology in the late 90s, my first stop was business intelligence, so even before we had any data in India I was preaching Business Intelligence. Everyone was intrigued and nobody was interested. The market for Business Intelligence simply did not exist, or nobody cared about it, save for one project that we sold to Tata Steel: a data mining project, something even they did not believe in. But it was hugely successful and they even wrote a case study. Apart from that, we had almost no success in convincing customers of the importance of Business Intelligence.
Cut to 2017, and Business Intelligence, or BI, is a hot, hot thing which is driving people out of jobs and companies out of business. Of course, BI now has a new name, Analytics, and Data Mining is now called Machine Learning. Even though the basics have not changed, BI or Analytics is now a completely new animal. It is no longer that smelly pond of data that you have to exhume to find meaning; it's now a running stream of data that needs to be analyzed and acted upon in real time to make your business better.
It's also the thing which stands between your organization and success.
At 99roomz we built our website and Android app and started advertising on Google search and Facebook. We spent quite a bit of money without much success (ok, no success, literally no success; we got no customers: zero, zilch). We had Google Analytics, MixPanel and KissMetrics embedded on our site and we tried to find meaning. Alas, there was no meaning to be found; every single analysis meant sitting in a library and cross-referencing what meant what. The Aha moment of our company had arrived: we needed analytics more than we needed the website or the app. We needed to know what people were doing, when they were doing it and, hopefully, why they were doing it. Or we could just go on spending money on Google without knowing what was happening to that money.
So even a startup needed analytics from the word go to be successful. This was a new thing for me. This was also a moment of truth: analytics was now the core of every business, new and old, small and big. For a startup it was even more important to invest in analytics if it had to beat or compete with more established players.
Aha moments mean nothing; the idea needs execution.
I started pottering around to see what technical advancements had been made in the field of analytics. I began with Big Data, the most common and pervasive term in analytics that you can find. It permeates the internet like sand pervades the pores of those who have sex on the beach. I promptly ended up against a big wall. Big Data actually meant nothing. Zilch. Zero. Everything was big data and nothing was big data.
I figured it must be one of those words invented by speaking consultants who are forever looking to make money by spouting one-word solutions to your problems. “See, what you need is Big Data. It will cure your company of your bad management, bad managers and stupid CEO Gini.” Everybody would be happy and then, after six months, discover that Big Data did not mean anything. It was not a cure; heck, it was not even quackery.
Big Data was both the solution and not the solution. What we needed was technology that would live stream the happenings on the website straight to us and allow us to manipulate the behavior of the player in real time. Think of it like watching cricket live and manipulating the batsman's shots by manipulating the bowling, without being on the field. Hmm, complex, complicated? Yes, very much, but not as much as the array of technologies available to make that happen. So we looked at Hadoop, MySQL, MongoDB, Apache Spark, Apache Storm, Apache Kafka, Amazon Kinesis, RabbitMQ and Druid, among the other jumbled soup of “big data” products.
So now we had a problem and an array of products to solve it. Goody good. But how do we choose among the many products to build the solution? It took more than a month to even figure out what each product does and whether it could be useful for our purpose.
The problem was now designing the architecture for live streaming analytics.
After much deliberation and many trials we finally zeroed in on the probable solution to our problem: NodeJS + Kafka + Druid + Spark + Dashboard. NodeJS served as the point where we did all our processing, including pulling data and manipulating it if required. Kafka served, quite literally, as the real-time stream of data. Druid was our real-time analytics server. Spark served as our data lake, from which we could run machine learning algorithms. The Dashboard was the visual representation, the so-called TV of our live streaming app.
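To make the hand-offs between these pieces concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not the 99roomz code: a deque stands in for the Kafka stream, a Counter for Druid's real-time aggregates, and a plain list for the store that Spark would read. The function names collect, drain and dashboard are hypothetical, and none of this uses the real Kafka, Druid or Spark APIs.

```python
from collections import Counter, deque

# Hypothetical in-memory stand-ins; not the Kafka, Druid or Spark APIs.
stream = deque()    # plays the role of the Kafka stream
rollup = Counter()  # plays the role of Druid's real-time aggregates
archive = []        # plays the role of the store read for batch machine learning


def collect(event):
    """Collector step (NodeJS in the article): tidy the event and publish it."""
    cleaned = {"page": event["page"].strip().lower(), "ts": event["ts"]}
    stream.append(cleaned)


def drain():
    """Consumer step: feed every queued event to both the rollup and the archive."""
    while stream:
        event = stream.popleft()
        rollup[event["page"]] += 1
        archive.append(event)


def dashboard(top_n=3):
    """Dashboard step: the most viewed pages so far, highest count first."""
    return rollup.most_common(top_n)


for raw_event in [
    {"page": " /Dehradun ", "ts": 1},
    {"page": "/delhi", "ts": 2},
    {"page": "/dehradun", "ts": 3},
]:
    collect(raw_event)

drain()
print(dashboard())
```

The point of the split is that the collector never talks to the analytics store directly: it only publishes to the stream, so the real-time view and the long-term archive can each consume the same events at their own pace.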
This is just the beginning of our analytics solution. We keep discovering more ways to enrich our incoming data, so that our information is no longer just “a visitor is viewing page xx on our website” but “a male visitor from Delhi is viewing our property in Dehradun, which has three bedrooms and two baths and is rated 5 by guests.” This kind of enrichment has made our analytics much better and available in near real time, enabling us to make better and faster decisions.
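The enrichment step can be sketched in a few lines of Python. This is a minimal illustration, assuming the visitor and property details are available as simple lookups. The tables VISITORS and PROPERTIES, the field names, the ids v42 and /property/p7, and the function enrich are all hypothetical. In a real pipeline the lookups would more likely hit a user database and a property catalogue.

```python
# Hypothetical lookup tables standing in for a user database and a property catalogue.
VISITORS = {"v42": {"gender": "male", "city": "Delhi"}}
PROPERTIES = {
    "/property/p7": {"location": "Dehradun", "bedrooms": 3, "baths": 2, "rating": 5}
}


def enrich(event):
    """Return a copy of the raw event with visitor and property attributes merged in."""
    enriched = dict(event)  # copy, so the raw event stays untouched
    enriched.update(VISITORS.get(event["visitor_id"], {}))
    enriched.update(PROPERTIES.get(event["page"], {}))
    return enriched


raw = {"visitor_id": "v42", "page": "/property/p7"}
print(enrich(raw))
```

An unknown visitor or page simply passes through unchanged, so the stream keeps flowing even when a lookup misses.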
Only Big Data could have made this possible!