Why fast-data is suddenly everywhere.

Making a new class of applications possible for the first time in computing history

Pritam Roy
The Startup
3 min read · Jan 19, 2018

A nicer problem to have...

A recruiter at Facebook told me a few years ago that Facebook builds its own data and infrastructure solutions because standard solutions just don’t fit Facebook’s scale…

That statement may have been true some years ago. The Cassandra storage engine certainly grew out of Facebook’s need to store high volumes of messaging data.

The challenge that only the likes of Amazon, Google and Facebook faced even a few years ago is now increasingly faced by everyday organizations and start-ups alike: high volumes of transactional data that actually make sense.

The practical implications of having access to fast-data are huge for any organization: the ability to run fast-data analytics unlocks the potential to improve processes and, directly, the bottom line.

Big-Data and Smart-Data

The fast-data revolution itself has been made possible by two underlying revolutions: Big Data and Smart Data.

Big data systems like Hadoop and HBase made it possible to store huge volumes of unstructured data on commodity hardware.
Micro-services and the standardization of communication protocols meant that the data suddenly started to make sense in the grander scheme of things.

Many awesome open-source projects were born out of the need to solve the unique challenge of fast-data.

Two pieces to the puzzle - Storage and Messaging

The Apache Kafka project, started at LinkedIn, and the Apache Storm project, open-sourced by Twitter, set out to tackle the streaming and messaging part of the problem: Kafka as a distributed message log, Storm as an engine for processing streams of events.
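
To make the streaming side a little more concrete, here is a minimal sketch of the kind of work a stream consumer does: counting events per fixed time window as they arrive. It is plain Python with no Kafka or Storm involved; the function name `tumbling_window_counts`, the `(timestamp, key)` event shape and the sample data are all assumptions made for illustration, and it assumes events arrive in timestamp order.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count keys per fixed-size time window.

    `events` is an iterable of (timestamp_seconds, key) pairs, assumed to be
    ordered by timestamp (as they would be when read from a single ordered log).
    Returns a list of (window_start, {key: count}) tuples.
    """
    results = []
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # The previous window is complete: emit it and start a new one.
            results.append((current_start, dict(counts)))
            counts = Counter()
            current_start = window_start
        counts[key] += 1
    if current_start is not None:
        # Emit the last, still-open window.
        results.append((current_start, dict(counts)))
    return results


# Hypothetical sample events: (seconds since start, event type).
events = [(0, "click"), (3, "view"), (9, "click"),
          (10, "click"), (14, "view"), (25, "click")]
windows = tumbling_window_counts(events, window_seconds=10)
```

A real deployment would read the events from a message queue and emit each window as soon as it closes rather than collecting them in a list, but the windowing logic would look broadly similar.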

Projects like Apache Kudu began to address the need for a storage engine that can run fast-data analytics on commodity hardware.

Redis + Cassandra based systems have also been proposed as storage engines for fast-data analytics. Combining the speed of Redis with strongly consistent, transactional Cassandra-style implementations can make building fast-data applications far less painful. The YugaByte project is a polyglot Redis + Cassandra solution to the fast-data challenge.
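
The general idea of pairing a fast in-memory layer with a durable store can be sketched in a few lines. This is only an illustration of the pattern, not how YugaByte or any particular product works: both layers are ordinary Python dictionaries standing in for something like Redis and Cassandra, and the class name `CachedStore` and the sample key are hypothetical.

```python
class CachedStore:
    """Read-through, write-through cache in front of a durable store.

    Both layers are plain dicts here. In practice the cache might be an
    in-memory system and the durable layer a distributed database; that
    mapping is an assumption made for illustration.
    """

    def __init__(self):
        self.cache = {}         # stand-in for the fast in-memory layer
        self.durable = {}       # stand-in for the durable storage layer
        self.durable_reads = 0  # counts how often we fall back to slow storage

    def put(self, key, value):
        # Write-through: persist first, then update the cache.
        self.durable[key] = value
        self.cache[key] = value

    def get(self, key):
        # Serve from the cache when possible.
        if key in self.cache:
            return self.cache[key]
        # Cache miss: read from the durable layer and populate the cache.
        self.durable_reads += 1
        value = self.durable.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = CachedStore()
store.put("user:42:last_seen", "2018-01-19T10:00:00Z")
store.cache.clear()                      # simulate the cache layer restarting
first = store.get("user:42:last_seen")   # miss: falls back to durable storage
second = store.get("user:42:last_seen")  # hit: served from the cache
```

After the simulated restart only the first read touches the durable layer; the second is answered from memory, which is where the speed of the in-memory half of such a pairing comes from.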

Ushering in changes for consumers and enterprises alike

On August 25, 2008, when the Cassandra project was first announced, the release stated:

The system currently stores TB’s of indexes across a cluster of 600+ cores and 120+ TB of disk space. Performance of the system has been well within our SLA requirements and more applications are in the pipeline to use the Cassandra system as their storage engine.

These volumes, which only Facebook could boast of in 2008, don’t look that daunting anymore as firms improve their IT processes and storage capabilities to handle the massive surge in fast-data.
Fast-data has also opened the road for consumer applications that once seemed possible only in the realm of science fiction, and it gives enterprises the potential to build smarter decision-making systems and improve their processes like never before.

This story is published in The Startup, Medium’s largest entrepreneurship publication followed by 286,184+ people.

Pritam Roy
Activist-developer, CMU Alum, Founder/Developer at Kashmere Labs