The data flow value chain
In my last post, I highlighted the basic architecture of data flow, describing the “50 thousand foot view” of what data flow is from a technology perspective. I started with this, in part, because the core model (known by many as “stream processing”) is increasingly well understood. The work to define what the “user need” is has already been done by those who had that need and were forced to address it on their own.
In this post, I want to take that basic model (illustrated below) and begin to explore a value chain that satisfies the need that is implicitly expressed by the streaming architecture.
Basic needs of stream processing
So what are the components that deliver on that user need? Well, I’ll argue that the immediate needs are as follows:
Let’s break this down:
- Real-time data. Of course, real-time business automation depends first and foremost on the existence of real-time data representing the ever-changing state of the business as a whole. Not only transactions but also inventory, contracts, employee skills, cash flow, etc. are key elements of representing the business digitally. Each business will have a somewhat different data inventory, but most already hold a significant amount of digital state data.
- Fast processing. To handle the incredibly vast amount of data required to represent all of these elements of the business in a timely manner, there needs to be a sufficiently fast and scalable processing model. This processing would apply to a wide variety of needs, such as data transformation, security (e.g. encryption/decryption), anomaly detection, and much, much more.
- Data distribution. For this model to work, data must be easily (and quickly) distributed to every system that may rely on it. Interestingly, this means the distribution mechanism must also be the source of record for the history of these events, since those systems must be able to determine the order in which events occurred. So, you need a way to capture all incoming events (most of which contain data) and allow other systems to retrieve the data they need in a way that lets the sequence of events be determined and maintained.
- Applications and Services. Beyond the data itself, data flow systems wouldn’t exist without actors to create and/or consume that data. These are the applications and services that rely on timely data distribution to complete their appointed tasks. In some cases, services might be nearly indistinguishable from the basic processing tasks described above, but often these software components will vary widely in technology, architecture and granularity. The primary difference between “applications” and “services”, in my mind, is the level of direct interaction with human users.
- Streaming Integration. It is perhaps stretching things a little to separate integration functionality from basic “fast processing”, but I do so because many existing systems treat major “built-in” integrations as something different from functional processing. Streaming integration is how the streaming mechanism and applications/services quickly and safely exchange data and events as required. Often, this involves activities such as protocol mapping, authentication/authorization handshakes, and error handling.
- Digital Operations Capability. Wherever software executes, there is a need to operate that software. This model of stream processing creates a new distribution of operations responsibility, and with it a new challenge to the tools, practices and scale of existing operations. As you will see, we are talking about an increasingly complex systems model, and we need capabilities that take that into account.
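The data distribution need described above is, in effect, an append-only log: every event is captured once and given a fixed position in a sequence, and each consumer reads at its own pace. Here is a minimal, in-memory sketch built on that assumption. The class names, the `key` field, and the default batch size are hypothetical and are not drawn from any particular streaming product.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class Event:
    offset: int   # position in the log; this is what fixes the event's sequence
    key: str      # hypothetical business key, e.g. "inventory:sku-42"
    payload: Any  # the data the event carries


@dataclass
class EventLog:
    """Hypothetical append-only log acting as the source of record for events."""
    events: List[Event] = field(default_factory=list)

    def append(self, key: str, payload: Any) -> int:
        # Capture every incoming event; its offset records where it sits in the sequence.
        offset = len(self.events)
        self.events.append(Event(offset, key, payload))
        return offset

    def read(self, from_offset: int, max_events: int = 100) -> List[Event]:
        # Nothing is deleted, so any consumer can re-read history from any offset.
        return self.events[from_offset:from_offset + max_events]


class Consumer:
    """Each consumer tracks its own position, independently of all others."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.position = 0

    def poll(self) -> List[Event]:
        batch = self.log.read(self.position)
        self.position += len(batch)
        return batch


log = EventLog()
log.append("order:1", {"qty": 3})
log.append("inventory:sku-42", {"on_hand": 17})

billing, analytics = Consumer(log), Consumer(log)
print([e.offset for e in billing.poll()])    # [0, 1]
log.append("order:2", {"qty": 1})
print([e.offset for e in billing.poll()])    # [2]
print([e.offset for e in analytics.poll()])  # [0, 1, 2]
```

Production log-based systems typically add persistence, partitioning and retention on top of this. The essential property stays the same, though: the log, not the consumers, holds the ordered history.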
This is just my rough stab at the needs that are implicitly defined in the model depicted above. You may disagree with parts or all of it. If so, I’d love your feedback. But, for now and for me, it remains a consistent and workable model to take the next step: breaking down the user need into component parts.
Beginning a value chain for data flow
What is the user need? What is the “Job To Be Done”? In my opinion, it is something I like to call “real time business automation”, a term that comes from my belief that the entire reason for the existence of most enterprise software is the automation of our economy. Or at least a part of it. (Market systems, software supporting treasury functions, etc., also play a key role.) We build software to increase the productivity of the enterprise in key business operations.
Taking advantage of new technology models, such as cloud computing and log streams, to make as much of this automation as possible “real time” is the “Job To Be Done”.
So, let’s use the needs model we listed above as a starting point:
Pretty straightforward, right? OK, now let’s break down each piece.
Data Distribution and Real Time Data
I’m going to start by analyzing real-time data and data distribution as one sub-component. This is because real-time data is a basic element of the system, a unit of work, and is the basis of what is being distributed.
Here’s the next step in our value chain, as I see it:
Data distribution encapsulates the capability to distribute data, but it is worthless without data and a way to capture that data. And none of that can happen without a compute utility on which to execute it all.
Distributed Operations Capability
Another quick and dirty value chain revolves around how we operate the entire system. As in all operations systems, there are some basic components:
- Monitoring to capture data about what is happening in the system
- Visualization to allow humans to interpret that data
- Policies that define what an acceptable operational state is
- Alerting to raise the alarm when that state is violated
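The four operational components listed above fit together as a simple loop: capture measurements, present them, compare them against policy, and alert on violations. Here is a toy Python sketch of that loop. The metric names (`consumer_lag`, `p99_latency_ms`), the thresholds, and the plain-text "dashboard" are hypothetical stand-ins, not recommendations.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Monitoring: a snapshot of measurements captured from the running system.
# The metric names used below are hypothetical examples.
Metrics = Dict[str, float]


@dataclass
class Policy:
    """Policy: a named rule defining an acceptable operational state."""
    name: str
    is_acceptable: Callable[[Metrics], bool]


def visualize(metrics: Metrics) -> str:
    # Visualization: a plain-text table standing in for a real dashboard.
    return "\n".join(f"{name:<20}{value:>10.1f}" for name, value in sorted(metrics.items()))


def alerts(metrics: Metrics, policies: List[Policy]) -> List[str]:
    # Alerting: raise an alarm for each policy the current state violates.
    return [f"ALERT: {p.name}" for p in policies if not p.is_acceptable(metrics)]


policies = [
    Policy("consumer lag under 1000 events", lambda m: m["consumer_lag"] < 1000),
    Policy("p99 latency under 250 ms", lambda m: m["p99_latency_ms"] < 250),
]

snapshot = {"consumer_lag": 4200.0, "p99_latency_ms": 180.0}
print(visualize(snapshot))
print(alerts(snapshot, policies))  # ['ALERT: consumer lag under 1000 events']
```

Keeping policies as data, separate from the alerting code, is one way to let the definition of "acceptable" change without touching the monitoring or alerting machinery.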
Applications and Services
The detailed needs chain for all forms of applications and services is probably immense, so I’m just going to use an extremely simplified representation here — hopefully what is important to this analysis and no more.
This is a great example, however, of how the “map is not the territory”. As swardley and others will quickly tell you, the map you build should be applicable to the need that drives the map. In another context, I’d probably draw this portion of a needs map much differently:
Here, I’m probably cheating a bit by assuming functions are the way to go, but I think it could be argued that most of this form of processing comes down to defining functions that do specific tasks (transformation and the like).
The really interesting thing to me here is the dependencies. For example, I’ve added a new need to data distribution: it needs functions to get the data into a form that can be intelligently distributed (at minimum, a message in a topic). Also, a functional platform doesn’t get rid of any of the compute and storage components we use in other models. It just hides them (very effectively, I might add).
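To make the dependency between functions and data distribution a bit more concrete, here is a small sketch. Single-purpose functions run in sequence, and their output is appended as a message to a topic. Everything in it is hypothetical: the function names, the routing rule that derives a topic from the event's `type`, and the in-memory `topics` dictionary standing in for a real distribution mechanism.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List

Record = dict
Step = Callable[[Record], Record]

# Hypothetical stand-in for the distribution mechanism: topic name -> ordered messages.
topics: Dict[str, List[str]] = defaultdict(list)


def normalize(record: Record) -> Record:
    # Transformation: make field names consistent (lower-case keys).
    return {k.lower(): v for k, v in record.items()}


def tag_topic(record: Record) -> Record:
    # Hypothetical routing rule: the event's type decides which topic it belongs on.
    return {**record, "topic": f"{record['type']}-events"}


def publish(record: Record, steps: List[Step]) -> str:
    """Run each single-purpose function in order, then append the result to its topic."""
    for step in steps:
        record = step(record)
    topic = record.pop("topic")
    # The message is serialized JSON; sort_keys keeps the output deterministic.
    topics[topic].append(json.dumps(record, sort_keys=True))
    return topic


topic = publish({"Type": "order", "SKU": "sku-42", "Qty": 3}, [normalize, tag_topic])
print(topic)          # order-events
print(topics[topic])  # ['{"qty": 3, "sku": "sku-42", "type": "order"}']
```

The point is less the code than the dependency it makes visible: the distribution step cannot do its job until some function has shaped the data into a message and chosen a topic for it.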
The final element to add to the value chain, streaming integration, is one of the easiest to diagram, but also one that I struggled with quite a bit. Is it identical to the functional platform need? How does it differ?
In the end, I sort of punted and defined it as a functional need that also requires protocols with which to integrate. The latter is what I’m really interested in. How will integration protocols change (if at all) as data moves constantly and in real time? I’m betting we’ll see some interesting work here. The Open Service Broker is one example, though it’s more of a handshake protocol than a data protocol.
Where to go from here
So, there you have it. A value chain for the key components that deliver on real time business automation at scale. At least as suggested by the architecture that is quickly gaining acceptance today. Will this model stand the test of time? Most definitely not. Is it a good starting point to explore evolutionary forces that will create opportunity in this space? Good enough.
That’s what we’ll do in the next post: begin to map this to the technologies that are coming into play to meet these needs, identify where those technologies are in the evolutionary process, and look at where gameplay suggests opportunities to take advantage of the current state of the market.
In the meantime, feel free to build on this, tear it apart, ponder new ways to address the problem, etc. All I ask is that you consider providing me with feedback, preferably through a response below (because this will probably take more than 140 characters at a time). However, as always, I can be found on Twitter as @jamesurquhart.