The Quick and the Dead

The digital arena is becoming deadly — you need to pick the right data weapons if you want to survive.

Matthew G. Johnson
DataSeries
5 min read · Apr 20, 2019


Competition among companies on the digital battlefield is becoming fierce. Survival depends on choosing the right weapons to manage your data. Once the abandoned love-child of business and technology, data has grown up to become the smartest and richest kid on the block. But, like many precocious kids, it is nearly impossible to manage.

© 2015 Thom Ross, Gunfight at the O.K. Corral

But before we take up the question of how to manage data, let us pause to imagine a moment in 1881 on Fremont Street in Tombstone, Arizona. Six doors west of the O.K. Corral, we see Marshal Virgil Earp, his two brothers and Doc Holliday confronting five cowboys led by Billy Clanton. The feud between the men has been building for years and the situation is desperate. Whether each man decides to run or to shoot it out, his survival depends on two skills:

  • a quick draw — shooting or moving faster than his adversary;
  • a straight shot — landing a bullet on his adversary or choosing the right path to escape.

While a digital enterprise in Santa Clara, Silicon Valley, is far removed from a gunslinger in Tombstone, Arizona, the strategy for survival has many similarities. The digital enterprise needs to interact with customers in real time as they make transactions, request bookings and seek guidance. Moreover, the enterprise needs to understand and accurately address customer needs. A slow or poor response will likely prompt customers to defect. Whether for the gunslinger or the digital enterprise, responding both quickly and optimally is essential for survival.

In Tombstone, this requires lightning reflexes and hours spent shooting practice targets; in Santa Clara, real-time data streams and machine learning supported by big data take their place. In either town, survival depends on an effective nervous system, one which can put the right data in the right place at the right time.

The requirement for a nervous system in computers arose, as in animals, during the evolution from single-celled to multi-celled systems. Between the 1960s and the 1980s, many large enterprises moved from having a single mainframe computer to a large network of computers. As this happened, the centralised data store was quickly superseded by a patchwork of disconnected data islands. The wide range of technologies and data standards in each island made it extremely challenging to connect them effectively. For the next three decades, generation after generation of messiahs offered to lead enterprises to the promised land of data harmony and integration. Some provided critical life-support for ailing data architectures, but none delivered the integration nirvana they promised. However, these numerous and successive failures provided essential lessons to guide the development of the digital nervous system. They can be summarised in two key narratives:

(1) Data is more important than code
Many of the early integration platforms extended the native programming paradigm, whether procedural or object-oriented, from an individual program to the network. Examples such as RPC, CORBA and SOAP provided an elegant solution to the network programming problem, but did not solve the long-term data challenge. If your problem required a data event reaching more than one recipient, having more than one use, or having differing usage schedules, then you were out of luck. What was needed was a more versatile data-centric approach, which emphasised the value of knowledge over the imperative of immediate action.
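The contrast can be sketched in a few lines of Python. This is a toy illustration, not any real RPC or messaging API: the `rpc_call` function and `EventChannel` class are hypothetical names invented here to show why a published event, unlike a remote call, can serve more than one recipient and more than one use.

```python
def rpc_call(handler, payload):
    # Program-centric: exactly one recipient, one use, executed immediately.
    return handler(payload)

class EventChannel:
    """Data-centric: an event is published once and reaches every subscriber."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def publish(self, event):
        # One data event, any number of recipients and uses.
        for callback in self.subscribers:
            callback(event)

# A single "trade" event feeds two independent uses: billing and analytics.
received = []
channel = EventChannel()
channel.subscribe(lambda e: received.append(("billing", e)))
channel.subscribe(lambda e: received.append(("analytics", e)))
channel.publish({"trade_id": 42})
```

With `rpc_call`, the same trade would have to be sent once per recipient, and the caller would need to know every recipient in advance.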

(2) No single computer is ever big enough
As the data-centric approach gained prominence, solutions like MQ, JMS and AMQP emerged. These resolved most of the limitations of the program-centric approach, supporting unlimited recipients, unlimited purposes and more flexible usage schedules. However, they all shared one specific flaw: they were limited by the size of the computer on which they were hosted. As a result, many organisations deployed several separate data integration platforms, some for small time-sensitive data and others for large, slow batch data. This approach worked, but not well.

Historically, manufacturers solved the problem of limited capacity by building faster and larger servers: an approach which is now called vertical scaling. However, hyperscaled digital companies like Google, Facebook and LinkedIn simply could not find any computers big enough to address their needs. Their solution, which we now call horizontal scaling, was to have clusters of smaller computers work together and act as one. Horizontal scaling was an enormous technical challenge, but solving it allowed them to reduce cost, increase resilience and most importantly provide near limitless scalability.
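The core idea behind horizontal scaling can be shown with a toy sketch, which is an assumption for illustration only and not Kafka's or any vendor's actual placement algorithm: each data key is hashed to pick one of several small nodes, so the cluster behaves as a single larger system, and capacity grows by adding nodes rather than buying a bigger machine.

```python
import hashlib

# A hypothetical three-node cluster.
NODES = ["node-a", "node-b", "node-c"]

def node_for(key: str) -> str:
    """Deterministically map a key to one node in the cluster."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Work is spread across the cluster; the same key always lands on the same node.
placement = {k: node_for(k) for k in ["order-1", "order-2", "order-3"]}
```

Real systems refine this with techniques such as consistent hashing and replication, so that adding or losing a node does not reshuffle every key, but the principle of many small computers acting as one is the same.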

The last piece of the puzzle had finally fallen into place, and in January 2011 LinkedIn released Kafka, the first streaming data platform, to the open source community. The platform included four key elements:

  • A data-centric model delivering series of real-time data events between producers and consumers
  • Data events transmitted between an indefinite number of producers and consumers
  • Extended data history allowing consumers to run independently with different speeds and schedules
  • Horizontal scaling providing resilience and extreme scalability
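The four elements above can be sketched as a toy, in-memory append-only log. This is an illustration of the streaming-log model only, not Kafka's API: a real platform partitions, replicates and persists this log, but the essential idea, producers appending events while each consumer reads from its own offset at its own pace, is the same.

```python
class StreamLog:
    """An append-only log of data events; consumers track their own offsets."""
    def __init__(self):
        self.events = []            # extended data history is retained

    def produce(self, event):
        self.events.append(event)   # any number of producers can append

    def consume(self, offset, max_events=10):
        # Each consumer reads from its own offset, at its own speed.
        batch = self.events[offset:offset + max_events]
        return batch, offset + len(batch)

log = StreamLog()
for i in range(5):
    log.produce({"event": i})

# A fast consumer and a slow consumer read the same history independently.
fast_batch, fast_offset = log.consume(0, max_events=5)
slow_batch, slow_offset = log.consume(0, max_events=2)
```

Because the log retains history and offsets belong to consumers rather than to the log, the slow consumer loses nothing by falling behind; it simply resumes from its own offset on its next read.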

Kafka's ongoing development was led by Confluent, and the platform was quickly adopted by the big data platforms Cloudera and Hortonworks, while cloud providers created their own versions, such as Amazon Kinesis, Google Pub/Sub and Azure Event Hubs. Adoption continues to grow rapidly.

Digital enterprises could finally deliver both real-time and large volume data anywhere in their organisation with a single platform. For the first time, digital enterprises had access to an effective digital nervous system which could put the right data, in the right place, at the right time.

Now it is time to return our attention to the fierce competition developing on the digital battlefield. We know that the leading digital enterprises are implementing both streaming data and machine learning platforms. We know that these will provide the digital nervous system and digital intelligence necessary to give them a quick draw and a straight shot. If you are facing one of these formidable digital gunslingers, you had better come prepared. You might implement a streaming data platform, or you might prepare to go the same way as poor Billy Clanton, who ended up in a wooden box on the front page of the San Francisco Exchange. You’ve got to ask yourself one question, “Do I feel lucky?”. Well, do ya?

© 1971, Warner Bros., Dirty Harry

Acknowledgements

This story was inspired by a conversation with Unni and Fred and guided by boundless inspiration and thoughtful critiques from Ravi: many thanks to you all.


I am an informatician, fine arts photographer and writer who is fascinated by AI, dance and all things creative. https://photo.mgj.org