Software Architecture for the Cloud

How to Make Implementing Cloud-Native Applications Easier

Dick Dowdell
Nerd For Tech
Feb 11, 2021 · 13 min read


From the first programming languages, to advanced programming paradigms, to the development of virtualization and cloud infrastructure, the history of computer science is the history of the evolution of abstractions that hide complexity and empower the creation of ever more sophisticated applications.

The development of reliable, scalable applications for the cloud seems to be far harder than it ought to be.

In recent years, containers and container orchestrators like Kubernetes have proved to be an important abstraction that dramatically simplifies the development of reliable, scalable distributed systems. Though containers and container orchestration are just now entering the mainstream, they are already enabling developers to build and deploy applications with speed, agility, and reliability that would have been unimaginable only a few years ago — but these technologies are still new and software architects and engineers are still working to understand how to best apply them.

How do we make the real power of cloud computing available, manageable, and affordable for companies wishing to succeed at digital transformation? How do we facilitate that transformation while both managing and mitigating risk? Cloud-native applications can exploit the automated deployment, scaling, reliability, and fail-over capabilities available with the cloud. However, the old patterns of application architecture, development, and deployment — and the old methods of data organization and access — do not deliver the levels of reliability, performance, scalability, and fault tolerance we really want.

The great hockey player Wayne Gretzky once said, “I skate to where the puck is going to be, not where it has been.” Exploiting the real potential of the cloud requires us to skate to where the puck is going to be.

This article focuses on the distributed data management challenges of cloud-deployed applications, and not the UI/UX aspects — because the technologies and techniques for delivering UI/UX functionality to browsers and mobile devices are well understood, have an experienced developer base, and differ only little between traditional client-server and cloud-deployed applications.

To start, let’s clearly define what we mean by cloud-native applications and the requirements we expect them to satisfy. In this context, a cloud-native application is a collection of small, independent, and loosely coupled services and resources implemented to execute or enable some business function and be deployable in a cloud environment. For this article, services are executable components of an application and resources are things that reside in non-volatile storage like files, key-value stores, and databases.

Cloud-Native Software

If an application is “cloud-native”, it is specifically implemented to provide a consistent development, deployment, automated management, and communications model across private, public, and hybrid clouds. It is designed to exploit the automated deployment, scaling, reliability, and failover capabilities available through containers and container orchestration. For maximum deployment flexibility, it avoids dependence upon technologies that are proprietary to specific commercial cloud service vendors — except when such technologies are crucial to meeting the application’s functional objectives. The primary requirements that need to be met for most successful cloud-native application implementations and deployments are detailed in Section A, below.

Section B describes the five most common architectural patterns in use today when designing software applications. They each embody real-world experiences in the implementation of software — and, as with all software architectures, they each represent a mix of compromises. Each has its own strengths and weaknesses when applied to the design and implementation of cloud-native systems. Our challenge is to select the best attributes of each with which to synthesize an effective cloud-native architectural pattern for using and managing distributed data resources — and to interact with the modern UI/UX technologies necessary for complete applications.

Where Do We Find Our New Architectural Patterns?

The elements and patterns presented in this article are tried and proven and are the result of experience in the design and implementation of commercial software systems — for distributing data and processing across networks — beginning in the 1980s and continuing through the present time. Much of the technical inspiration for these solutions is owed to Alan Kay, Michael Stonebraker, and Werner Vogels, whose contributions have literally changed the way we implement and use software today — Alan Kay for understanding the power of messaging in software organization, Michael Stonebraker for his comprehensive vision of data management, and Werner Vogels for identifying and describing the underlying principles of practical cloud computing. There are a number of concepts that are extremely useful when creating an architectural pattern for cloud-native applications.

The Actor Model

In computer science, the actor model is a model of concurrent computation that treats the actor as its universal primitive. Actors communicate by passing messages. In response to a message, an actor does its job. An actor may modify its own private state, but can affect other actors only indirectly, through messaging. Actors are reentrant and thread-safe, which makes them an appropriate model for microservices design.

An actor has a simple job:

  • Execute logic and read/write persistent data using persistence services.
  • Receive messages from other actors.
  • Send messages to other actors.

Actors can be many things, including microservices, microservice clients, event publishers, event handlers, message orchestrators, message loggers, event loggers, and error handlers.

Using actors, communicating through message passing, as the basic building blocks of cloud-native applications simplifies the execution of networked concurrent processing and:

  • Enforces encapsulation without resorting to locks.
  • Uses a model of cooperative actors reacting to messages, changing state, and sending messages to each other to implement application functionality.
  • Can support both synchronous (request-response) and asynchronous (event) messaging (in common practice, actors are usually asynchronous, but they can easily send messages that are responses and so implement synchronous behavior).
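The actor described above can be sketched in a few lines of Python. This is a minimal illustration, not a production framework; the class and message names are illustrative, and real actor runtimes (Akka, Erlang/OTP, etc.) add supervision, addressing, and distribution.

```python
import queue
import threading

class Actor:
    """A minimal actor: private state, a mailbox, and message-driven behavior."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message):
        # The only way to affect an actor: drop a message in its mailbox.
        self._mailbox.put(message)

    def stop(self):
        self._mailbox.put(None)      # poison pill; queued messages drain first
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError

class Counter(Actor):
    """Example actor: mutates only its own private state in response to messages."""

    def __init__(self):
        self.count = 0               # private state, touched by one thread only
        super().__init__()

    def receive(self, message):
        if message == "increment":
            self.count += 1

counter = Counter()
counter.send("increment")
counter.send("increment")
counter.stop()
print(counter.count)  # 2
```

Because all mutation happens on the actor's own thread in mailbox order, no locks are needed around its state, which is exactly the encapsulation property listed above.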

Message Passing

Patterns like the Actor Model communicate through message passing. Message passing is a technique for invoking behavior in another actor: the invoking program sends a message and relies on the receiving process and its supporting infrastructure to select and execute the appropriate logic. Both asynchronous (event) messaging and synchronous (request-response) messaging can be implemented — giving application developers leverage to optimize communications for specific use cases and performance objectives — all within a common unifying framework. As a basic rule, the far more efficient asynchronous messaging should be the default choice, with synchronous messaging used only when a sender must wait for a response before proceeding. Message passing also makes it easier to exploit modern stream-handling technologies.
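The dual nature of messaging described above can be made concrete: request-response is just one-way messaging plus a reply-to mailbox attached to the request. The sketch below assumes simple in-process queues; the `service` function and message shape are illustrative.

```python
import queue

def service(mailbox):
    """Drain the service mailbox, replying only when a reply-to is supplied."""
    while not mailbox.empty():
        message = mailbox.get()
        result = message["payload"] * 2          # stand-in for real work
        if message.get("reply_to") is not None:  # a synchronous caller is waiting
            message["reply_to"].put(result)

service_mailbox = queue.Queue()

# Asynchronous (event) send: fire and forget, no reply expected.
service_mailbox.put({"payload": 10, "reply_to": None})

# Synchronous (request-response) send: block until the reply arrives.
reply_box = queue.Queue()
service_mailbox.put({"payload": 21, "reply_to": reply_box})

service(service_mailbox)
print(reply_box.get())  # 42
```

Note that the service code is identical for both cases; synchrony is a property of the sender waiting on `reply_box`, not of the messaging substrate.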

Message passing implements loose coupling, but also can implement dynamic coupling. Dynamic coupling, using orchestrators, provides a very powerful mechanism for implementing load balancing, failover, and dynamic scaling. Orchestrators can also be an important mechanism for implementing self-organizing systems.
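A toy orchestrator shows what dynamic coupling buys us: senders address the orchestrator, never a specific worker, so workers can be added, removed, or replaced at runtime. The round-robin policy and class names below are illustrative assumptions, not a prescribed design.

```python
import itertools

class Orchestrator:
    """Routes messages to a pool of like actors. Senders hold no direct
    reference to any worker, so the pool can change at runtime
    (dynamic coupling, load balancing, failover)."""

    def __init__(self):
        self._workers = []
        self._cycle = None

    def register(self, worker):
        self._workers.append(worker)
        self._cycle = itertools.cycle(self._workers)  # simple round-robin policy

    def route(self, message):
        next(self._cycle)(message)   # each registered worker takes a turn

handled = []
orchestrator = Orchestrator()
orchestrator.register(lambda m: handled.append(("worker-1", m)))
orchestrator.register(lambda m: handled.append(("worker-2", m)))

for n in range(4):
    orchestrator.route(n)
# handled now alternates between worker-1 and worker-2
```

A real orchestrator would also track worker health and re-route around failures; the indirection shown here is what makes that possible without touching the senders.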

Self Organization

Complexity is the primary limiting factor in the successful implementation of large distributed systems. It is the Achilles heel of large microservices and API management implementations. As the number of things (APIs, services, resources) grows, the number of potential connections between them grows non-linearly: c = n(n-1)/2. Top-down hierarchical controls, as implemented in most systems, are ill-suited to cope with this complexity. A better solution is needed.
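To see how quickly that formula bites, the count of potential point-to-point connections can be tabulated for a few system sizes:

```python
def potential_connections(n):
    """c = n(n-1)/2: potential point-to-point connections among n components."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, potential_connections(n))
# 10 components -> 45 connections; 100 -> 4,950; 1,000 -> 499,500
```

A tenfold growth in components yields roughly a hundredfold growth in potential connections, which is why hand-managed, point-to-point integration stops scaling.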

The cloud gives us the power to create increasingly large and complex applications, integrating and operating on data spread across countries and even continents — if we can only manage them. Today, most working examples of systems at that level of complexity occur in the natural world. We need to look at self-organizing systems, the way nature copes with complexity. Self-organizing systems emerge from bottom-up interactions, unlike top-down hierarchical systems, which are not self-organizing. Ant colonies are a useful example of emergence, where the whole is greater than the sum of its parts.

Ants, governed by very simple rules and only local interactions, can through their own activities build colonies that exhibit complex structures and behaviors far exceeding the intelligence or capabilities of individual ants. Ant colonies also illustrate the decentralized nature of self-organizing systems. The queen does not tell individual ants what to do — rather, each ant reacts to stimuli from chemical messages (pheromones) exchanged with other ants.

In this way control is distributed over the whole system and all parts contribute to the resulting functionality — as opposed to centralized structures that are often dependent upon a single coordinating entity — and this decentralized structure, inherent to self-organizing systems, gives them resiliency and robustness. When any element fails, it can easily be replaced by a like element. A successful cloud-native architecture mimics the decentralized structure of organic living systems, where complex capabilities can emerge from the interaction of relatively simple parts — while at the same time minimizing the complexities of configuration and deployment.

Intelligent Adapters

Much of the work of data management is cleaning, validating, filtering, combining, and transforming data. The passing of a message or stream of messages provides a perfect opportunity to execute declarative rules for validating and manipulating the data payloads of those messages through the use of Intelligent Adapters. Intelligent Adapters can be chained together to enforce rules and even implement branching into one or more additional streams. Implementing this kind of repetitive rule-based processing within individual actors is wasteful and difficult to modify and manage — especially, when it can easily be handled by attaching intelligent adapter processing to the input and output message streams connecting actors.
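A chain of intelligent adapters can be sketched as composed functions over a message payload. The adapters below (a required-field check and a normalizer) and the `chain` helper are illustrative; a real implementation would attach such pipelines declaratively to message streams rather than call them inline.

```python
from functools import reduce

def require_field(field):
    """Adapter: reject payloads missing a required field."""
    def adapter(payload):
        if field not in payload:
            raise ValueError(f"missing required field: {field}")
        return payload
    return adapter

def normalize(field):
    """Adapter: trim and lowercase a text field, leaving the rest untouched."""
    def adapter(payload):
        return {**payload, field: payload[field].strip().lower()}
    return adapter

def chain(*adapters):
    """Compose adapters left to right into a single pipeline function."""
    return lambda payload: reduce(lambda p, a: a(p), adapters, payload)

pipeline = chain(require_field("email"), normalize("email"))
print(pipeline({"email": "  User@Example.COM "}))
# {'email': 'user@example.com'}
```

Because each rule lives in the pipeline rather than inside an actor, changing validation or transformation logic never requires touching or redeploying the actors the stream connects.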

Distributed Data Store Management

Replication services manage sets of identical physical datastores distributed across multi-cloud clusters in order to facilitate horizontal scalability and failover. These services should be transparent to application actors and include:

  • Multi-leader Datastore Mirroring maintains identical state across a set of physical datastores. Updates can be processed by any member of the set and are then propagated across the remainder of the set. This supports full horizontal scalability and is used primarily with relational and document databases.
  • Single-leader Datastore Mirroring maintains identical state across a set of physical datastores. Updates can be processed only by the leader of the set. Updates directed to non-leader datastores are routed to the leader and propagated to all the non-leaders. This supports leader failover by electing a new leader and is used primarily with relational and document databases.
  • Distributed Key-Value Stores provide a reliable way to store non-DBMS data that needs to be accessed by a distributed system of multi-cloud clusters. This gracefully handles leader elections during network partitions and tolerates machine failure, even of the leader node.
  • Near-Real Time Transaction Consistency ensures that distributed updates successfully complete and will try to resolve any inconsistencies that occur. When inconsistencies cannot be automatically resolved, it guarantees that those inconsistencies are logged and reported for remedial action. This is not conventional distributed transaction management. For performance reasons, it does not use 2-phase DBMS commits. It operates at a message level, not at DBMS level. It does not replace local DBMS transactions (which remain in effect). It is intended to ensure that all messages involved in a logical transaction are ultimately processed by all their targets across the network.
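The single-leader mirroring described above can be sketched as write forwarding plus propagation. This toy model uses in-memory dicts as "datastores" and a trivial stand-in for leader election; all names are illustrative, and real systems (e.g., Raft-based stores) handle propagation asynchronously and elections by consensus.

```python
class MirrorSet:
    """Sketch of single-leader mirroring: a client may send a write to any
    replica, but the write is applied at the leader and then propagated to
    the followers, keeping every replica's state identical."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.leader = replica_names[0]

    def write(self, via_replica, key, value):
        # Writes received by any replica (via_replica) are forwarded to
        # the leader, which applies them first...
        self.replicas[self.leader][key] = value
        # ...and then propagates them to every non-leader replica.
        for name, store in self.replicas.items():
            if name != self.leader:
                store[key] = value

    def fail_over(self):
        """Elect a new leader when the current one is lost."""
        survivors = [n for n in self.replicas if n != self.leader]
        del self.replicas[self.leader]
        self.leader = survivors[0]   # trivial stand-in for a real election

mirrors = MirrorSet(["leader", "follower-1", "follower-2"])
mirrors.write("follower-1", "order-42", "shipped")
mirrors.fail_over()
print(mirrors.replicas[mirrors.leader]["order-42"])  # shipped
```

The point of the sketch is that failover costs nothing at the application level: because every replica holds identical state, any survivor can be promoted and reads continue uninterrupted.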

In summary, these underlying concepts can help software implementers meet the requirements identified in Section A and enable them to both mitigate the weaknesses in the Microservices and Space-Based Patterns and incorporate many of the desirable features in the other common architectural patterns. Multi-Cloud Apps: Part 1 digs a little deeper into designing and building successful cloud applications.

Section A — Cloud-Native Application Requirements

This article considers the requirements common for most cloud-native data management applications to be:

  1. Minimize complexity of application implementation and maintenance, configuration, deployment, management, and operation. Complexity adds risk and cost to any software implementation effort — especially those that involve an organization’s foray into new and unfamiliar territory. Failure to meet this requirement is the rock upon which cloud implementations most often founder. Most of the other requirements in this list exist to support it.
  2. Ensure that no runtime service or resource can be accessed or modified without proper authentication and authorization. Guarantee that all communications (messages) between services are digitally-signed and encrypted. Without meeting this requirement, any breach of any network or service involved in a distributed application can potentially breach all parts of that application.
  3. Ensure that multiple, identical services and resources can be deployed and that automated failover from services and resources to like services and resources can be implemented. Without meeting this requirement, implementing any effective failover strategy becomes complicated and expensive.
  4. Ensure that desired performance parameters for services and resources can be specified and monitored. Provide automated runtime facilities to modify the numbers and/or locations of executable services and resources to meet those specified parameters. Without meeting this requirement, implementing any effective scaling strategy becomes complicated and expensive.
  5. Guarantee the physical location transparency of a service or resource to other services and resources. Without meeting this requirement, implementing any effective scaling or failover strategy becomes complicated and expensive.
  6. Access and manage complex distributed data and present an integrated logical model of that data to applications. Without meeting this requirement, implementing application functionality and mixing legacy and new data becomes more complicated and expensive.
  7. Optimize both synchronous (request-response) and asynchronous (event) messaging. Without meeting this requirement, satisfying desired performance goals can be complicated and expensive. Both synchronous and asynchronous communications are necessary to meet the full set of distributed data use cases.
  8. Manage network limitations and tradeoffs regarding distributed data consistency, availability, partition tolerance, and latency. Without meeting this requirement, implementing any effective strategy for managing reliability and performance becomes complicated and expensive.
  9. Present a consistent, graphical, declarative, and low-code development and management environment, while allowing the use of modern programming languages when required by the developer. Without meeting this requirement when implementing application functionality, developer productivity will be negatively impacted by the learning curve necessary to absorb and apply new technologies.
  10. Expose high-functioning, consistent, and useful APIs to application developers. Without meeting this requirement, when implementing application functionality, developer productivity will be negatively impacted by the learning curve necessary to absorb and apply new technologies.

Section B — Common Architectural Patterns

Architecture refers to the fundamental structures of a system and the discipline of creating such structures and systems. Each structure is made of elements, relations among elements, and the properties of both elements and relations.

The following five architectural patterns are the most common in today’s application design, but for the most part predate cloud computing. They each have strengths and weaknesses — they each have some aspects of applicability to cloud-native design — and they all have flaws when applied to cloud-native applications. Our challenge is to select the positive attributes and mitigate the negative when defining a cloud-native architectural pattern.

The images and links below, for the full description and analysis of each pattern, are from:

Software Architecture Patterns by Mark Richards

Copyright © 2015–2021 O’Reilly Media, Inc. All rights reserved.

O’Reilly provides an introductory free view.

Layered Pattern


The layered architecture pattern is the most commonly used architecture pattern, otherwise known as the n-tier architecture pattern. In discussions of cloud architecture and microservices, layered is often mistermed monolithic architecture. This pattern is the de facto standard for most Java EE applications and therefore is widely known by most architects, designers, and developers. It is a clear confirmation of Conway’s Law: “Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization’s communication structure.” It is the pattern that most of us were trained to try first.

Characteristics as a cloud-native solution:

  • Responsiveness to Change: Low
  • Ease of Deployment: Low
  • Testability: High
  • Performance: Low
  • Scalability: Low
  • Ease of Development: High

Event-Driven Pattern


The event-driven architecture pattern is a popular distributed asynchronous architecture pattern used to produce highly scalable applications. It is also highly adaptable and can be used for small applications as well as large, complex ones. The event-driven architecture is made up of highly decoupled, single-purpose event processing components that asynchronously receive and process events.

Characteristics as a cloud-native solution:

  • Responsiveness to Change: High
  • Ease of Deployment: High
  • Testability: Low
  • Performance: High
  • Scalability: High
  • Ease of Development: Low

Microkernel (Plug-In) Pattern


The microkernel architecture pattern (sometimes referred to as the plug-in architecture pattern) is a natural pattern for implementing product-based applications. A product-based application is one that is packaged and made available for download in versions as a typical third-party product.

Characteristics as a cloud-native solution:

  • Responsiveness to Change: High
  • Ease of Deployment: High
  • Testability: High
  • Performance: High
  • Scalability: Low
  • Ease of Development: Low

Microservices Pattern


The microservices architecture pattern is quickly gaining ground in the industry as a viable alternative to monolithic applications and service-oriented architectures. Because this architecture pattern is still evolving, there’s a lot of confusion in the industry about what this pattern is all about and how it is implemented.

Characteristics as a cloud-native solution:

  • Responsiveness to Change: High
  • Ease of Deployment: High
  • Testability: High
  • Performance: Low
  • Scalability: High
  • Ease of Development: High

Space-Based (Cloud) Pattern


In any high-volume application with an extremely large concurrent user load, the database will usually be the final limiting factor in how many transactions you can process concurrently. While various caching technologies and database scaling products help to address these issues, the fact remains that scaling out a normal application for extreme loads is a very difficult proposition.

The space-based architecture pattern, often called the cloud architecture pattern, is specifically designed to address and solve scalability and concurrency issues. It is also a useful architecture pattern for applications that have variable and unpredictable concurrent user volumes. Solving the extreme and variable scalability issue architecturally is often a better approach than trying to scale out a database or retrofit caching technologies into a non-scalable architecture.

Characteristics as a cloud-native solution:

  • Responsiveness to Change: High
  • Ease of Deployment: High
  • Testability: Low
  • Performance: High
  • Scalability: High
  • Ease of Development: Low



A former US Army officer with a wonderful wife and family, I’m a software architect and engineer who has been building software systems for 50 years.