This is a healthcare and technology story, but it begins, of all places, at LinkedIn. By the late 2000s, LinkedIn had become the country’s leading online professional network. It had more than 50 million users and was sitting on a treasure trove of data, including the educational backgrounds, career histories, and interpersonal relationships of many highly respected and sought-after industry leaders. Like most online companies at the time, LinkedIn wrote all this data into a traditional database. When a user ran a search on LinkedIn, the application would query the database and display the results. That would have been good enough for a directory. But LinkedIn aspired to be something far more than a directory.
The Big Realization
LinkedIn realized that being the place where professionals congregated empowered it to pursue a range of value-added services beyond mere online advertising, such as career building, recruiting, and content. The key to pursuing these services was to build an intelligence layer atop all the data that LinkedIn was collecting. It turned out that profiles and resumes — the stuff we think of when we think of LinkedIn — were actually a very small portion of the data LinkedIn was collecting. Most of the data had to do with user activity on the platform — the clicks, the searches, the comments, the messages, and the views. And this data was not only massive but continually growing. After all, more than fifty million people were continuously using the site, and millions more were signing up each year. The company’s systems were drowning in a flood of data.
Why did LinkedIn need to record all this data in the first place? Why not just stick to profiles and career histories? The simplest way to understand the importance of such data is to consider a real-life example. Imagine two professionals: Aaron and Zoe. Both graduated from Emory University and have spent the last five years working for Coca-Cola’s Atlanta office in the marketing department. Based on their profiles, Aaron and Zoe look the same. However, imagine we also know that Zoe has spent the past month logging onto LinkedIn and looking at job postings in New York. If LinkedIn did not collect information about user activity on the platform, it would not realize that Zoe was looking for a job in another city and it would be unable to offer job search services to Zoe or to New York recruiters looking for professionals like Zoe.
Another way to think about it is that while profiles and resumes tell you where someone has been, it is online behavior and activity that tell you where someone is going. In turn, knowing where someone is going is extremely valuable from the standpoint of providing targeted marketing and value-added services. So, the business case for collecting this data was clear. The challenge was that traditional databases had been built for storing state data, meaning data that does not change frequently — data like profiles and resumes. What LinkedIn needed was a way to capture all the activity data being generated by its tens of millions of users, to feed this data to services that needed it, and to do all this in real time.
The Innovative Solution
At first, LinkedIn tried to solve this problem with a traditional service-oriented architecture (SOA) approach. They built APIs for their data stores that allowed other systems in their infrastructure to request data from those data stores. The problem with this approach was that the systems that needed the data had no idea when the data changed. One way to address this was to push out a notification every time the data changed, but with so much data changing so often, this approach would generate a volume of web service calls between the data stores and the consuming systems that was unsustainable from both a reliability and a speed perspective.
Again, a real-world example can help illustrate the problem. Imagine that the LinkedIn data store is a guy named Larry with an excellent memory. He works with Rachel, who handles recruiting; with Mary, who handles marketing; with Sage, who handles messaging; and with Steve, who handles security. When Larry receives a piece of information that is related to recruiting, such as a user looking at a job posting, he sends an IM to Rachel. When he gets a piece of information related to marketing, such as a user clicking on an ad, he sends an IM to Mary. And when Larry gets a piece of information related to messaging, such as a new message being sent, or to security, such as multiple failed login attempts, he sends an IM to Sage or Steve, respectively. In turn, his co-workers reply to his IMs by asking Larry to provide more detail about what happened. Which job posting did the user look at? What ad did the user click on? Who did the user send the message to? What IP did the user try to log in from? As you can see, all this back-and-forth is highly inefficient, and if Rachel goes to lunch or Steve takes a coffee break, any possibility of capturing everything in real time completely breaks down.
The LinkedIn engineering team realized that instead of Larry having separate conversations with Rachel, Mary, Sage, and Steve, what they needed was a way for Larry to simply write down everything that happened in a running log and for his colleagues to look at that log and read the portions that they cared about. They looked at it like this: Larry is a producer of data and his colleagues are consumers of data. We can classify all the data that Larry logs by topic, and then Rachel, Mary, Sage, and Steve could simply subscribe to the topics they care about. This eliminates the back-and-forth between Larry (data producer) and his colleagues (data consumers). It also allows the data consumers to subscribe to overlapping topics. For example, both Rachel (recruiting) and Mary (marketing) would be interested in knowing that a user clicked on a job ad, so they could both subscribe to the “job ad” topic.
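The log-and-subscribe idea can be sketched in a few lines. The snippet below is a toy in-memory model, not Kafka's actual API; the `EventLog` class, the "job-ad" topic, and the consumer names are illustrative inventions. The key property is that the producer writes each event once, and each consumer tracks its own read position (offset) per topic.

```python
from collections import defaultdict

class EventLog:
    """A toy append-only log, partitioned by topic (Kafka-style sketch)."""
    def __init__(self):
        self.topics = defaultdict(list)   # topic -> ordered list of events
        self.offsets = defaultdict(int)   # (consumer, topic) -> read position

    def produce(self, topic, event):
        """Larry writes every event down once, under its topic."""
        self.topics[topic].append(event)

    def consume(self, consumer, topic):
        """Each colleague reads only the entries they have not seen yet."""
        events = self.topics[topic]
        start = self.offsets[(consumer, topic)]
        self.offsets[(consumer, topic)] = len(events)
        return events[start:]

log = EventLog()
log.produce("job-ad", {"user": "zoe", "posting": "NYC marketing lead"})

# Both recruiting and marketing subscribe to the same topic, independently.
print(log.consume("rachel", "job-ad"))  # Rachel sees the click
print(log.consume("mary", "job-ad"))    # Mary sees it too, at her own pace
print(log.consume("rachel", "job-ad"))  # nothing new for Rachel -> []
```

Because each consumer keeps its own offset, Rachel going to lunch does not block Mary, and nothing is lost while anyone is away; the events simply wait in the log.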
Using this approach, LinkedIn successfully built a system that enabled it to process massive amounts of data in real time and deliver a personalized experience to its now more than 300 million users. LinkedIn called the system Kafka and, in 2011, made it available to the open-source engineering community via the non-profit Apache Software Foundation. Kafka represented a quantum leap for massive data stream management and was soon used across a range of industries. Massive e-commerce sites, like Walmart.com, were using Kafka to manage millions of pieces of inventory in real time. Gig economy juggernauts, like Uber and Lyft, were using Kafka to track millions of cars on the roads. Digital media powerhouses, like Netflix and Pinterest, were using Kafka to track and instantly react to user actions.
The Conceptual Shift
While Kafka was a transformative piece of technology, perhaps the most lasting contribution made by LinkedIn was conceptual. Its approach to building Kafka showed that, for many technology-enabled industries, the most valuable information is based not on state data, but rather on event data. LinkedIn learned that the most actionable information about Zoe was not that she was working at Coca-Cola in Atlanta, but that she was actively looking for a new job in New York. Similarly, Walmart learned that the most actionable information about its inventory was not that it had a certain number of widgets, but that it could determine the location and destination of each widget in real time. Uber and Lyft realized that what mattered was not just how many cars they had on the road, but where each of those cars was at any given moment and where it was headed. And Pinterest and Netflix learned that what a user was doing on their platforms at any given moment was often more actionable than their previous history of streaming and posting.
Thus, event data is critical for real-time decision making. In any business case where we must respond to how something changed rather than what it is, we must become proficient at processing event data. Kafka and other event-processing systems have powered a revolution in managing large-scale data streams, and we have felt this revolution. The sites where we shop, ship, stream, study, and socialize have started to feel like they are responding to our every action and click, sometimes even anticipating our intentions. Yet while everything about our lives is seemingly turned into actionable intelligence to speed us to our destination, there is one enormous piece that is missing, and it is arguably the one piece that has the most impact on our happiness.
This is where the story turns because the missing piece, as you have likely guessed, is our health. The human body is a highly complex piece of machinery that is constantly putting out event data that can provide actionable insights. If processed early enough, this data can be used to address issues before they turn into problems that deposit us at the doctor’s office or the emergency room. And when we do see a doctor, what happens? The doctor tries to emulate a data stream processor, reviewing your vital signs, symptoms, and systems. The only other data the doctor can fall back on is the stale state data from your last visit, which may have been years in the past. The event stream of how your body changed in the intervening months, weeks, and days is a black box.
The doctor only seeing you when you feel sick is like LinkedIn only seeing when a user has already changed jobs, or Uber only learning where its cars are located after they have missed a pick-up, or Walmart only discovering that it is running short on widgets after thousands of customer orders have gone unfulfilled. Of course, even if we could record all our vitals, symptoms, and systems performance by wearing sensors every day, that would be way too much data for a doctor to review. On the other hand, there is no technical reason why we could not couple sensors with an event-stream processing system that looks for patterns and creates alerts the moment we start tacking toward a health problem.
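To make that concrete, here is a minimal sketch of such a pattern detector, assuming a daily resting-heart-rate feed from a wearable. The function name, window size, baseline, and threshold are all invented for illustration, not clinical guidance; the point is that a simple rolling computation over the event stream can raise an alert long before a patient would think to book an appointment.

```python
from collections import deque
from statistics import mean

def resting_hr_monitor(readings, window=7, baseline=60.0, threshold=1.15):
    """Toy stream processor: flag days where the rolling average of
    resting heart rate drifts 15% above a personal baseline.
    `readings` is an iterable of (day, bpm) pairs. All parameters
    are illustrative assumptions, not medical thresholds."""
    recent = deque(maxlen=window)  # sliding window over the stream
    alerts = []
    for day, bpm in readings:
        recent.append(bpm)
        if len(recent) == window and mean(recent) > baseline * threshold:
            alerts.append((day, round(mean(recent), 1)))
    return alerts

# A week at baseline, then a sustained elevation the wearable picks up.
stream = [(d, 60) for d in range(7)] + [(d, 75) for d in range(7, 14)]
print(resting_hr_monitor(stream))  # alerts once the rolling mean drifts high
```

A real deployment would run many such rules over many signals at once, which is exactly the workload event-stream processors like Kafka were built to feed.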
The Impact for Insurers
Our health is not only a problem for each of us individually, but also a macro problem for the health insurers and managed care organizations that have a vested financial interest in keeping us healthy. While you and I are primarily occupied with our own career trajectories, LinkedIn is equally interested in the career trajectories of you, me, and every one of its three hundred million plus users. Similarly, large healthcare payers must find a way to effectively manage the health trajectories of each of their millions of enrollees.
Advances in consumer electronics have made an unprecedented level of data streams available. For example, the latest Apple Watch supports not only activity tracking for walking, sleep, and exercise, but also heart rate monitoring and fall detection. The challenge for healthcare payers is to acquire or build systems that can take all this data, combine it with provider-based claims and contact data, and drive actionable, real-time insights.
Historically, health insurers have built their business around state data rather than event data. The traditional insurance business is based on actuarial tables, which provide statistics related to the incidence of covered events within a population. In turn, actuarial tables are compiled based on decades of historical data, which does not change day to day, week to week, or even month to month. On the spectrum of state data versus event data, actuarial tables are about as state as state can get. Using actuarial data, health insurers set coverages and premiums at a level that is more than sufficient to cover the expected costs of healthcare services for their enrolled population. Given this approach, why would insurers be interested in event data?
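The traditional model is, at bottom, a simple expected-value calculation. The sketch below uses made-up incidence rates, cost figures, and a hypothetical safety margin; real actuarial pricing is far more granular, but the shape is the same: multiply historical incidence by average cost, sum, and add a cushion.

```python
def required_premium(incidence_rates, avg_costs, margin=0.15):
    """Toy actuarial pricing: expected annual cost per enrollee from
    historical incidence rates, plus a safety margin. All figures
    below are invented for illustration."""
    expected = sum(rate * cost for rate, cost in zip(incidence_rates, avg_costs))
    return expected * (1 + margin)

# Hypothetical covered events: ER visit, hospitalization, surgery.
premium = required_premium(
    incidence_rates=[0.20, 0.05, 0.02],  # annual probability per enrollee
    avg_costs=[1_500, 12_000, 25_000],   # average cost per event, USD
)
print(round(premium, 2))
```

Notice that nothing in this calculation changes from one day to the next; it is driven entirely by slow-moving historical state data, which is why event streams never mattered to it.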
The answer has to do with the rise of value-based care (VBC). In value-based models, managed care organizations, many of which include health insurers, are given set premiums and service levels for each enrollee from an upstream payer, such as a Medicare or Medicaid program. In VBC contracts, insurers cannot simply increase premiums or reduce services when healthcare costs rise, so they are left with only one option: keep enrollees healthier. And how do you keep millions of enrollees healthier? By figuring out what is happening to their health between all those doctor and hospital visits and identifying opportunities to catch health problems before those visits become necessary in the first place. And as we have already seen, event data is the key to making this happen.
The Holy Grail
When I was in grade school, I remember seeing a movie called “The Indian in the Cupboard”. It was about a young boy who was gifted a wooden cupboard with a very special quality: any toy placed in the cupboard was magically brought to life. The boy discovers this property of the cupboard when he accidentally brings to life the figurine of a Native American warrior named Little Bear. When Little Bear is accidentally hurt, the boy brings to life another figurine — a First World War British Army medic — to treat Little Bear.
During my years working with healthcare companies, I have often had occasion to reflect on that movie, because many healthcare initiatives are really about finding a way to place a tiny doctor in every person’s pocket, one that could proactively monitor, educate, and guide the person before an issue became so serious that it required hospitalization or surgical intervention. Today, we find ourselves in a world where a different kind of intelligence lives in every person’s pocket: the smartphone.
The smartphone provides a hub for sensors, such as smartwatches and wearables, that are becoming progressively more sophisticated at tracking our biometrics. We are already recording our steps, our sleep, our exercise, and our heart rate. Before long, we will be able to record the equivalent of the clicks, the searches, the messages, and the views being performed by our organ systems. This will result in a data stream even more massive than what LinkedIn, Walmart, or Netflix have to contend with, because it will be happening continuously, and not just when we are using a particular website or app.
This is precisely why breakthroughs around large-scale event processing are so critical. We are on the verge of a quantum leap that will take us from tracking health data once every few months or years to tracking it every second. Taking this information, processing it in real time, and collating it with other data streams will provide actionable insights at an unprecedented scale and empower us to catch health issues when they are still germinating seeds rather than full-grown redwoods.
Managed care organizations that have adopted value-based care models are optimally positioned to build this end-to-end technological ecosystem. It will require the innovative combination of a stack of technologies, including event stream processing, analytics, and machine learning at the population level, coupled with care management and engagement technologies to drive change at the level of the individual patient. The exciting part is that all these technologies already exist, along with the financial, regulatory, and, most importantly, social incentives to make it a reality. The race is on for the holy grail of healthcare: an AI doctor in the pocket of every patient.