Five Facets of Flow Strategy
So, after a long hiatus while I did a cool project at AWS, I’m now back to continue my thoughts on flow architectures (defined as event-driven, asynchronous, highly adaptable and extendible, and (shiver) “serverless”). In the last year, I’ve had my eyes opened to some of the long-term problems and opportunities we will face as these approaches become mainstream. While I continue to believe that flow architectures will be mainstream in the coming decade, I also realize there are significant hurdles to be overcome.
Generally, these hurdles can be classified in one or more of five categories (that I’ve identified so far): scale, agility, quality, optimization, and security. What I thought I’d do in my first post back is describe these categories for you, and provide some examples.
When most people read “scale” they immediately think of the amount of computing, storage, and so on required by a single service, application, or type of infrastructure. This is where “horizontal” and “vertical” scale are the common terms for the ways you might scale. For example, as little as a decade ago, relational databases generally handled additional query traffic by being moved to a larger server (vertical scaling), while nowadays most computing services, including relational databases, have some way of scaling by adding servers (horizontal scaling). Both are valid approaches to scaling components to meet increased demand.
However, while I include those types of scaling in this category, I am just as interested in how we manage the growing number of services, functions, applications, queues, and other infrastructure that are interconnected in flow architectures. Much like HTTP links created a web of interconnected content that spans the entire globe (and Internet), event passing across organizational boundaries in a flow application promises to create a giant, interconnected graph of system dependencies that will produce system behavior we cannot anticipate. How do we scale our operations practices to handle this problem as it emerges?
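To make the "graph of system dependencies" idea a little more concrete, here is a minimal sketch of walking event subscriptions to find everything downstream of a single producer. The service names and the `SUBSCRIPTIONS` map are hypothetical, invented purely for illustration; a real flow system would have to discover this graph rather than declare it.

```python
from collections import deque

# Hypothetical subscriptions: each key emits events consumed by the listed services.
SUBSCRIPTIONS = {
    "orders": ["billing", "inventory"],
    "billing": ["ledger", "partner-tax-service"],
    "inventory": ["partner-warehouse"],
    "partner-warehouse": ["orders"],  # an external party feeding events back to us
}

def downstream(source, subscriptions):
    """Return every service reachable from `source` by following subscriptions."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for consumer in subscriptions.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen

def in_feedback_loop(source, subscriptions):
    """A service is in a loop if its own events can eventually reach it again."""
    return source in downstream(source, subscriptions)
```

In this toy graph, `downstream("orders", SUBSCRIPTIONS)` includes `"orders"` itself, because the hypothetical partner warehouse publishes events that the order service consumes. That is the kind of loop that is easy to miss when each team only sees its own subscriptions.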
While serverless technology providers are quick to tout the agility that function-level deployments, publish/subscribe queues, and asynchronous processing provide, there is little discussion about systemic agility, i.e., how these technologies enable the market to adapt software to economic opportunities and realities much faster (especially when combined with compute service utilities [aka cloud] and highly iterative software lifecycles [such as Agile and DevOps practices]). To me, identifying ways to quickly modify or reconfigure software to address economic opportunity, including lower costs or new revenue streams, is one of the great opportunities that will come from flow architectures.
Certainly there are existing demonstrations of this, such as continuously updating machine learning models, high speed trading algorithms, and the ability to rapidly update software on most edge devices. But I expect there will be other examples that will emerge in the next few years, such as “open” services that allow customers to modify the way the service works for them (similar to open source software, but in a cloud service), new learning models that continuously adapt to the learner’s interests and needs, and low cost financial management services that just about anyone can use. (In fact, for the latter example, see robo-advisor services, such as Schwab’s Intelligent Portfolios.)
While being able to move quickly while serving a large online audience is great — in fact a game changer for business — it’s useless if things don’t operate predictably in a timely fashion. The problems of availability, performance, and accuracy are all increasingly going to become problems subject to the “rules” of complex systems. You won’t be able to architect your way out of the hassles you will encounter…you are going to have to build systems for resilience, accountability, and adaptability.
There is so much to say about this subject, but let me start with two interesting observations. The first is one I learned from John Allspaw, who is one of the incredibly smart people in the DevOps movement. What John observed early, and what I firmly believe is one of the key lessons all IT and software organizations will have to take into the future, is that distributed systems at scale require practices that closely resemble safety practices in places like hospitals, airports, and construction projects. There is a whole science built around treating safety problems as complex systems problems (see Sidney Dekker’s work, for instance), and some of the lessons learned there are counterintuitive.
The second observation is something that I’ve been preaching for a long time, but I am now seeing products and services explicitly address. In order to manage a complex system, you must be able to visualize the system, but you can only make changes through agents. We monitor at the agent (component) level well enough, but we have traditionally lacked tools for monitoring the system as a whole. (Solutions to this typically looked like Network Operations Centers or the like.) Enter the concept of “observability” and “big data” tools that work to gather as much data as possible from the system as a whole, and then find the patterns, anomalous behaviors, and feedback loops that require conscious tending by people (or automated tools).
There are other concepts, like “design for refactoring”, statistical quality controls, etc, that come into play with flow systems. Clearly, we have a tremendous amount to learn as we move into making this architecture mainstream.
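As one small illustration of what "statistical quality controls" could look like for an event stream, here is a sketch of a classic control-chart check: derive limits from a baseline period, then flag recent samples that fall outside them. The function name, the three-sigma default, and the idea of feeding it per-minute event counts are all assumptions made for this example, not a description of any particular observability product.

```python
from statistics import mean, stdev

def out_of_control(baseline, recent, sigmas=3.0):
    """Flag recent samples outside control limits derived from the baseline.

    Returns a list of (index, value) pairs for samples in `recent` that fall
    outside mean(baseline) +/- sigmas * stdev(baseline).
    """
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation; needs >= 2 points
    lower = center - sigmas * spread
    upper = center + sigmas * spread
    return [(i, x) for i, x in enumerate(recent) if not lower <= x <= upper]

# Hypothetical events-per-minute counts for one queue.
baseline_rates = [100, 102, 98, 101, 99, 100, 103, 97]
recent_rates = [101, 95, 120, 99]
flagged = out_of_control(baseline_rates, recent_rates)
```

A check like this only says that something is unusual, not why. In a complex system the interesting work starts after the flag, when a person or an automated agent has to decide whether the change is a fault or just the system adapting.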
One of my first jobs in a product company was in the late 1990s as a field consultant for Forté Software, a 4GL distributed systems development and operations toolset. Back then, networks were excruciatingly slow, and the cost of a network hop was something that was carefully managed. The rule at Forté was a) to reduce the number of network messages as much as possible, and b) to reduce the size of each message as much as possible. Most distributed systems developers would tell you that the same rules hold true today, though today’s networks allow you to increase both the number of required messages and what “minimal size” means for each message.
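Those two rules (fewer messages, smaller messages) translate fairly directly into batching and compression. The sketch below is one way that might look using only the Python standard library; the function names, the batch size of 25, and the sensor-style events are hypothetical choices for illustration.

```python
import json
import zlib

def batch_events(events, max_batch=25):
    """Pack events into as few compressed messages as the batch size allows."""
    messages = []
    for start in range(0, len(events), max_batch):
        chunk = events[start:start + max_batch]
        # Compact separators drop the whitespace json.dumps adds by default.
        raw = json.dumps(chunk, separators=(",", ":")).encode("utf-8")
        messages.append(zlib.compress(raw))
    return messages

def unbatch(messages):
    """Reverse batch_events: decompress each message and flatten the events."""
    events = []
    for message in messages:
        events.extend(json.loads(zlib.decompress(message).decode("utf-8")))
    return events
```

The trade-off is latency: an event may wait for its batch to fill before it is sent, which is exactly the kind of decision that faster networks let you relax.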
By working in terms of functions, queues, and microservices, flow architectures will greatly increase the number of software dependencies requiring interaction. The big question facing developers in this space is how to decide when two components should be deployed in the same process, on separate processes on the same device, or on separate systems requiring interaction over a network. Two things complicate this decision.
First, many of the serverless tools assume network traffic between individual components. You have limited ways to choose one of the two other options. You can cram common functions into a shared library, for instance. Or, you could abandon those tools, and explicitly build the components in a more traditional architecture (such as a Java service with an API running in a container or VM).
The second complication, however, may mean that the way you approach deployment today will be invalidated tomorrow. As our shared event-driven systems increasingly become a very large complex adaptive system, we will have unexpected dependencies on parties we know little or nothing about. The system will be constantly changing, and dependencies between components will strengthen and/or weaken over time, whether we want them to or not. How we handle refactoring our systems to adapt to dependency changes will be an interesting challenge in years to come. We are already seeing this in shops that focus on serverless development, as several have indicated to me that it is not uncommon to refactor a service from a number of Lambda functions to a Java service running in a container.
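One hedge against this kind of refactoring is to keep the business logic ignorant of how it is deployed, so that moving from a function to a long-running service means swapping a thin adapter rather than rewriting the logic. The sketch below assumes a hypothetical order-pricing function; the `(event, context)` signature follows the shape AWS Lambda uses for Python handlers, and the assumption that the order arrives as JSON in `event["body"]` is just one common convention, not a requirement.

```python
import json

def price_order(order):
    """Pure business logic: no knowledge of how or where it is deployed."""
    total = sum(item["unit_price"] * item["quantity"] for item in order["items"])
    return {"order_id": order["order_id"], "total": total}

def lambda_handler(event, context):
    """Adapter for a function-as-a-service deployment.

    Assumes the order is a JSON string in event["body"] (hypothetical shape).
    """
    order = json.loads(event["body"])
    return {"statusCode": 200, "body": json.dumps(price_order(order))}

class OrderService:
    """Adapter for calling the same logic in-process inside a larger service."""

    def handle(self, order):
        return price_order(order)
```

Both adapters call the same `price_order`, so the decision about process boundaries can change later without touching the part that actually matters to the business.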
One other huge challenge as we face a growing number of interconnected components will be the ways in which we maintain control over information, and manage the security of our dependencies. How do we authenticate and authorize external components using our functions and queues? How do we secure event traffic running across multiple cloud and infrastructure providers without unnecessarily restricting usage of those events? How do we discover new attack profiles introduced by complex systems dynamics?
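None of these questions has a single answer, but as a small example of the authentication piece, here is a sketch of signing events with a shared secret so that a consumer can check that an event came from a known producer and was not altered in transit. It uses HMAC-SHA256 from the Python standard library; the envelope format and function names are hypothetical, and a shared secret is only one option (it does nothing for authorization, key distribution, or confidentiality).

```python
import hashlib
import hmac
import json

def sign_event(event, secret):
    """Wrap an event in an envelope carrying an HMAC-SHA256 signature."""
    # Canonical serialization so producer and consumer hash the same bytes.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    signature = hmac.new(secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}

def verify_event(envelope, secret):
    """Return True only if the signature matches the payload under this secret."""
    expected = hmac.new(
        secret, envelope["payload"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(expected, envelope["signature"])
```

A consumer would call `verify_event` before acting on anything in the payload and drop (or quarantine) events that fail the check.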
The market for security tools and practices has been in the spotlight for a few years now, but I think the science of security is going to evolve rapidly in the world of complex software systems, such as flow architectures. Not that we ever necessarily could, but we certainly can’t secure systems “top down” or “outside in” with events. This is going to take a systems approach to security that may include new forms of observability, analysis, and responsive agents. There is no silver bullet here. It’s a huge problem…and a huge opportunity.
We have time on our side
The good news here is that I doubt flow architectures will dominate the landscape for a number of years. The market, quite frankly, for tools that handle serverless at great scale is still fairly small, as far as I can tell. So we have time. But serverless adoption, and the introduction and adoption of other flow approaches, will happen rapidly thanks to the agility, economies of scale, and interconnectedness they offer.
As we build event-driven, asynchronous systems, we should pay attention to these five facets of flow architectures that need solutions. Somewhere out there are a number of entrepreneurs, pioneering engineers, and even (un)lucky engineers who will discover solutions we can’t possibly anticipate today. My hope is that anticipating needs will lead to faster identification of quality solutions.
Agree? Disagree? I write to learn, so please provide feedback in the comments below, or on Twitter, where I am @jamesurquhart.