Multi-Cloud Apps: Part 1
The development of reliable, scalable applications for cloud environments is far harder than it ought to be. Here we describe a proven architecture that creates a multi-cloud application framework to meet the tougher challenges of managing distributed data across multi-cloud clusters — reliability and automatic failover, high performance and auto-scaling, and near real-time data consistency. This is the first of six parts that describes how to build multi-cloud applications with minimal frustration and error. If that is interesting, read on.
In recent years, containers and container orchestrators like Kubernetes have dramatically simplified the development of multi-cloud, reliable, scalable, distributed systems. Applications have to be designed to exploit these technologies, it doesn’t just happen by chance. To take advantage of these technologies to make building multi-cloud applications easier, a good first step is to learn and master the Actor Model.
What is the Actor Model?
The Actor Model was first described as a computing model with the actor as a universal primitive of concurrent computation. What’s more interesting about the Actor Model is that, long before there was cloud computing, it was specifically designed for and has been proven to meet the fundamental requirements of multi-cloud applications — concurrency, failover, and scaling. Back in 1973 the model was originally designed for multicore, in-memory, clustered computing environments with many of the same basic computing objectives as the cloud. Serious multi-cloud application developers are now putting it to work meeting today’s needs.
Actors can be LEGOs® for building multi-cloud applications because they’re simple, make sense, and can easily be snapped together to build bigger things. That simplicity and intuitiveness makes it easier to design and build complex applications with actors as the building blocks. Actors are small, testable, deployable, executable units of code that:
- React to messages by executing business rules and logic.
- Send messages to other actor instances when they need tell them about (event), or ask them to do (request), something.
- Create and supervise other actor instances in some implementations of the model. In the Cloud Actor Model those functions are handled in a different way, using Kubernetes.
Actor instances are reentrant and thread-safe. Everything that an actor instance needs to do its job is either in the message to which it is reacting or in the persistent resources to which it connects. Actors do not use locks so they cannot become deadlocked. Among their many roles, actors can be microservices, microservice clients, event publishers, event handlers, message brokers, distributed loggers, error handlers, repository handlers, resource handlers, and Web servers. We’ll explore how they fulfill those roles in more detail as we move on.
Using actors, communicating through message passing, as the basic building blocks of multi-cloud applications simplifies the execution of cloud networked concurrent processing and:
- Enables concurrency without resorting to locks.
- Uses cooperative actors reacting to messages, changing persistent state, and sending messages to each other to implement application functionality.
- Provides the basis by which application failover and scaling can be implemented.
- Supports both pseudo-synchronous (request-response) and asynchronous (event) messaging (in the cloud, actors are inherently asynchronous, but as we’ll see later they can emulate synchronous behavior).
DISCUSSION: The Actor Model, as described here, is intended to be used for the parts of an application that implement business functionality and read, filter, transform, and write persistent data — effectively the distributed cloud-hosted server-side of application functionality — as opposed to the network attached clients that make up the front end of those applications and connect users to the server-side. Today front ends are most commonly implemented through UI/UX technologies and are deployed as mobile, desktop, or Web browser applications. Typically, these applications support one user at a time and do not have inherent scalability issues. However, they are very much affected by the reliability, latency, and scalability of the servers to which they connect — which is the domain of the Actor Model.
Extending the Actor Model
Because containers and container orchestration simplify many of the challenges of concurrency, failover, and scaling that used to be written into individual actors, and because modern message streaming has capabilities that didn’t exist in the 1970s, the Actor Model can be extended to take advantage of those capabilities. We will use the term, Cloud Actor Model, for that extended Actor Model. In the Cloud Actor Model, actor instances live in containers in pods and are deployed through Kubernetes deployments.
Container technologies do not require the cloud to operate, Kubernetes will work just fine in a traditional on-premises data center.
Though it is still the Actor Model, some specialized actor types can be introduced to create the Cloud Actor Model. These new actor types provide building blocks that facilitate the creation of cloud capable applications.
- Broker actors are the glue that connects individual actors by organizing messaging among them. Brokers are the only stateful actors and manage the failover, scaling, and self-organizing capabilities of the Cloud Actor Model. When a broker is running, it broadcasts its presence to all other reachable brokers. Brokers are federated across cloud clusters and share state information with each other. A small broker proxy lives as a sidecar in every actor pod to facilitate actor registration and message passing using the optimal broker. Brokers take in messages addressed to a specific actor type and route them to the physical address of the optimum instance of that actor type. A mailbox is paired with each individual actor instance to buffer incoming messages for the actor and to send messages and communicate with broker proxies on behalf of the actor instance. This article introduces brokers and mailboxes. The coming Multi-Cloud Apps: Part 2, Using Messaging and Brokers, covers more details of why and how we use them.
- Intelligent Stream actors are specialized for data validation, formatting, filtering, and transformation. A stream actor’s behavior is specified mostly by declarative rules rather than with imperative code. An intelligent stream has 1 to n input channels and 1 to n output channels and can be chained together with other streams for more complicated message processing. Streams act on the messages moving between actor instances. The coming Multi-Cloud Apps: Part 4, Using Intelligent Streams, covers intelligent stream actors and how and why touse them.
- Repository Handler actors are application interfaces for creating, reading, writing and deleting data through logical views. They can work with single resource handlers or multiple resource handlers. Repository handlers present a logical data model to application actor instances and interact through a REST API with resource handlers to physically map, store, and retrieve data in the physical data model. Repository handlers and their associated resource handlers do the heavy lifting of distributed data management (failover, scaling, replication, consistency) for the rest of an application’s actors.
- Resource Handler actors are application adapters to the physical data model. They are used by repository handlers to map resources to and from persistent storage, very much like an Object Relational Mapper (ORM), such as Hibernate, maps objects to and from relational databases. Resource handlers are accessed through a REST API. Resources are things that reside in non-volatile system storage like files, key-value stores, and databases.
- Event Publisher actors publish event messages to distributed streaming queue systems like Kafka. They send event messages by topic to event queues. Publishers have no knowledge of their subscribers and leave the technical details of the streaming queues themselves to the queuing software.
- Event Handler actors subscribe to a message queue of a specified topic. The handler reads each message, in order, from the queue and forwards it to an appropriate application actor instance.
- Distributed Logger actors log error and application messages to the appropriate distributed logs. If an error message type has a defined error handler the error is also forwarded to that error handler. Distributed logs can also be replayed to recover state after network partitions or the failure of persistent resources.
- Error Handler actors organize appropriate remedial action for error message types that require it — for example, a resource handler error that might compromise eventual consistency and requires resolution and/or reporting.
- Cluster Bridge actors move messages between cloud clusters, implement and enforce inter-cluster security, and optimize inter-cluster messaging. This is particularly important for Kafka performance.
- Web Handler actors are Web servers that accept synchronous requests (https://) and route them to an appropriate application actor instance. They wait for the matching asynchronous response message from the processing actor instance(s) and return it to the synchronous requestor, or timeout and return an HTTP error. Web Handlers are internal edge actors that enable access from outside a cloud application. A Web Handler can be used to expose secure synchronous messaging (including REST APIs) to external applications.
- WebSocket Handler actors are WebSocket servers that accept asynchronous messages (wss://) and route them to an appropriate application actor instance. If and when a WebSocket Handler receives a response message, it matches and sends it to the original requestor. Timeouts are handled by the requestor. WebSocket Handlers are internal edge actors that enable access from outside a cloud application. A WebSocket Handler can be used to implement secure full-duplex messaging into and out of external applications.
- Strangler Façade implements a rule-based façade for the Strangler Fig Pattern and can redirect legacy API calls to new cloud capable applications as mature applications are gradually transformed into multi-cloud applications.
DISCUSSION: Multi-cloud applications can be, and often are, implemented without using the Actor Model. Does that matter? It matters, and it can matter a lot. Multi-cloud applications are intended to exploit the automated deployment, scaling, reliability, and failover capabilities available with the hybrid cloud. Ultimately, the idea is that by increasing the isolation between software components, we can deliver discrete parts of a system both rapidly and independently, and by utilizing containers and container orchestration we can deliver a high degree of horizontal scalability and fault tolerance. The old patterns of application composition, deployable units, and methods of resource access do not promote the scalability and fault tolerance we need. This requires us to move from relatively monolithic or layered applications to deployable and stateless containerized components such as with the Cloud Actor Model.
Because the Cloud Actor Model was created to exploit the power of container orchestration available through Kubernetes, Kubernetes terminology can make it easier to describe the model. We understand that many readers are already familiar with Kubernetes and might want to skip this section.
These definitions are provided courtesy of Linode.
- Orchestration is the automated configuration, coordination, and management of computer systems, software, middleware, and services. It takes advantage of automated tasks to execute processes. For Kubernetes, container orchestration automates all the provisioning, deployment, and availability of containers; load balancing; resource allocation between containers; and health monitoring of the cluster.
- Containers are similar to virtual machines. They are lightweight isolated runtimes that share resources of the operating system without having to run a full operating system themselves. Containers consume few resources but contain a complete set of information needed to execute their contained application images such as files, environment variables, and libraries.
- Containerization is a virtualization method to run distributed applications in containers using microservices. Containerizing an application requires a base image that can be used to create an instance of a container. Once an application’s image exists, one can push it to a centralized container registry that Kubernetes can use to deploy container instances in a cluster’s pods.
- Pods are the smallest unit of the Kubernetes architecture, and can be viewed as a kind of wrapper for a container. Each Pod is given its own IP address with which it can interact with other Pods within the cluster. Usually, a Pod contains only one container, but a Pod can contain multiple containers if those containers need to share resources. If there is more than one container in a Pod, these containers can communicate with one another via localhost.
- Services group identical Pods together to provide a consistent means of accessing them. For instance, one might have three Pods that are all serving a website, and all of those Pods need to be accessible on port 80. A Service can ensure that all of the Pods are accessible at that port, and can load balance traffic between those Pods. Additionally, a Service can allow an application to be accessible from the internet. Each Service is given an IP address and a corresponding local DNS entry. Additionally, Services exist across Nodes. If you have two replica Pods on one Node and an additional replica Pod on another Node, the service can include all three Pods.
- Deployments have the ability to keep a defined number of replica Pods up and running. A Deployment can also update those Pods to resemble the desired state by means of rolling updates. For example, if one wanted to update a container image to a newer version, one would create a Deployment, and the controller would update the container images one by one until the desired state is achieved. This ensures that there is no downtime when updating or altering Pods.
The CAP Theorem
DISCUSSION: When we build distributed applications that work with persistent data, we must always be aware of the CAP Theorem. It is not quite so absolute a limiter as the speed-of-light is in the physical world, but we ignore it at our peril. Workable multi-cloud architectures always need to take it into account when choosing compromises.
Actor Instances and Addresses
In the Cloud Actor Model, each actor instance lives in a container, in a Kubernetes pod. The pods are addressable through Kubernetes services and are deployed through Kubernetes deployments.
Each actor type has a name and version that is its unique logical address. An actor type has as many individual instances as are needed to meet its desired throughput and latency requirements. Each of those instances has a physical address (network IP address and port) where it can be reached.
Actor instances pass messages to other actor instances through message brokers. It is a broker’s job to map logical actor addresses to the physical address of the instance that has the best chance of maximizing throughput and minimizing latency. In this way the location transparency of actor instances is maintained, enabling brokers to manage latency, failover and scaling. Messages passed to actor instances within the same Kubernetes pod do not pass through brokers but only through the actor instances’ attached mailboxes.
DISCUSSION: A message passed between actor instances, through brokers, must meet two mandatory criteria: 1) it must be digitally signed, and 2) it must be encrypted (usually TLS). Any message that does not meet these criteria is logged as an error and discarded by message brokers. Otherwise, a Cloud Actor Model application would have no way to determine the legitimacy of messages in a potential globally networked environment and would be open to malicious interference.
The Actor Lifecycle
When Kubernetes activates a pod, the pod’s broker proxy executes the following actions:
- It broadcasts its existence to any reachable brokers and registers with the first one to respond. Brokers are federated and share actor instance registration data with each other.
- It then registers all of its actor types (those that are able to react to external messages) with that broker.
- Finally, it starts a periodic loop that, every n milliseconds, sends a heartbeat message to the broker with which it has registered. If that broker fails to respond to n consecutive heartbeat messages, the broker proxy repeats steps 1 thru 3.
If the broker with whom the broker proxy has registered fails to receive a heartbeat message from the proxy for n milliseconds, the proxy’s registration data are purged from the broker’s registry. If the broker proxy is still alive, the next time it sends a heartbeat message it will receive an error response and will repeat steps 1 thru 3.
DISCUSSION: The relationship among actor types and message brokers is self-organizing and requires no prior configuration. As Kubernetes pods come online, the actor types in them are registered with the nearest broker. If a pod ceases to be reachable its actors are automatically removed from its broker’s registry. If a broker ceases to be reachable, its actor types will be registered with a new broker. The federated brokers share state changes with each other and always route messages to actor instances with the lowest latency (from their perspective).
There are a few points about an actor’s lifecycle that are worth emphasizing
- Actor instances only execute rules and logic when they are reacting to a message.
- Actor instances are absolutely reentrant and stateless. They react to one input message at a time and have no memory of previous messages processed. All data needed to react to an input message must be in the message itself or in a persistent datastore.
- An actor instance may react to different message types with different sets of rules and logic (channels).
- Actor instances have 1 to n input message channels. One channel for each message type to which it will react. The channel encapsulates the set of rules and logic the actor instance will use to process a message. Channels are also used to react to pseudo-synchronous responses from other actors.
- Actor instances have 1 to n output message channels. One channel for each actor type to which they can send a message. This is useful for passing messages to multiple and/or different target actor types based upon rules and logic.
- Actor instances of the same type are completely interchangeable. This is necessary to implement failover and scaling. A pseudo-synchronous response to a message may not even be processed by the same actor instance that sent the original pseudo-synchronous request. In fact, in the middle of a larger application activity, Kubernetes may have taken pods in or out of service. The only exception to this interchangeability rule applies to Repository and Resource Handlers whose underlying datastores may need to be be replicated for redundancy or responsiveness and may be synchronous in their own processing. This is discussed in more depth in the coming Multi-Cloud Apps: Part 5 — Working with Distributed Data.
Asynchronous and Pseudo-Synchronous Messaging
We’ve discussed the fact that actor instances communicate through message passing, rather than by function calls, method calls, or remote procedure calls (RPCs). For many reasons — some relating to the impact that messaging might have on system performance — in the Cloud Actor Model messaging is primarily asynchronous. An actor instance does not wait for responses so it does not waste time. That’s great for performance, but in practical terms much of what we do with computers involves requests and responses. How does the Cloud Actor Model handle that?
The Cloud Actor Model handles it by using pseudo-synchronous messaging — where the responding actor instance sends a response type message to a requesting actor type instance which will use a secondary channel to process that message type. Because actor instances are stateless, any instance of the requesting actor type can do the job and no one has to wait for a response. It is called pseudo-synchronous because it looks exactly like a normal request-response from the user’s perspective but its mechanics are purely asynchronous.
A very few specialized actor types, such as Repository and Resource Handlers are necessarily synchronous in their messaging to each other.
Actors and Intelligent Streams
In the Cloud Actor Model, a stream is a sequence of messages made available over time. A stream can be thought of as items on a conveyor belt being processed one at a time rather than in batches. An Intelligent Stream is an actor that acts as conduit, with filters, that connects a stream of messages from one actor instance to another. The filters apply local transformations to each message passing through the conduit.
Much of the work of data management is cleaning, validating, filtering, combining, and transforming data. The passing of a message or stream of messages provides a perfect opportunity to execute declarative rules for validating and manipulating the data payloads of those messages through the use of Intelligent Streams. Intelligent Streams can be chained together to implement rules and even implement branching into one or more additional streams.
Intelligent Streams are highly reusable components. For example, a CustomerRecordValidator stream with rule specifications to validate and transform the contents of a Customer record can be used by any actor that needs to ensure that a Customer record is valid and correctly formatted. If the format or rules of a Customer record change, typically only the CustomerRecordValidator stream actor needs to be updated and redeployed in order to propagate those changes to wherever a Customer record is used.
Using Intelligent Streams has the following benefits:
- It is easy to understand the overall input/output behavior of a system as a simple composition of behaviors of the individual streams.
- They support reuse, since Intelligent Streams can be applied anywhere in an application where the same message type and transforms are needed.
- Systems can be easily maintained and enhanced, since new streams can be added to existing flows and old streams can be replaced by improved ones.
- They naturally support concurrent execution.
Declarative (Low-Code) Actor Development
The simplicity of actors and their focus on message processing makes them prime candidates for low-code development tools. Tools should begin with a GUI application to guide developers through the collection of specifications for each specialized actor type. The specifications can be saved in JSON format so that a Velocity template engine can combine them with the appropriate program templates to generate compile-able source code modules. It’s easy to support any source language supported by the JVM, such as Java or Kotlin, but that is an implementation choice.
Low-code techniques have a number of intrinsic benefits:
- Low-code GUI tools make it easier to apply common design patterns and techniques across development teams and standardize the collection of specifications for specific use cases, reducing the opportunity to introduce errors. New and improved design patterns can be introduced economically and with minimal training.
- Template programs augmented by validated specifications reduce the opportunity to introduce bugs and when errors are detected they can frequently be corrected at the template level and the related source modules regenerated.
- Statistics indicate that errors are created as a function of lines of code written. The fewer lines of code we have to write the fewer bugs we create.
Software testing can be a complex, difficult, time-consuming, exercise under the best of circumstances. How does one test hundreds (or even thousands) of executable components (actors) that interact across multiple cloud clusters? The Cloud Actor Model has an answer — preconditions and postconditions enforced by Intelligent Streams.
Intelligent Streams validate, filter, and transform the data in the messages that actors receive and send. They can assert preconditions for incoming messages and postconditions for outgoing messages. When an invalid state is detected, the Intelligent Stream posts an error message — which is picked up by a Distributed Logger, which in turn invokes an Error Handler if so specified for the error message type — rather than deliver the message to the actor. If invoked, the specified Error Handler orchestrates any cleanup or remedial actions required.
This approach guarantees that the state of application messages and data match the rules declared in the Intelligent Streams and that actors can always assume clean and valid input data — while effectively and continuously executing unit tests on all components wherever they are used. The Distributed Loggers and Error Handlers ensure that exception conditions occurring anywhere in the multi-cloud can be reported for analysis and remediation. As Kubernetes rolls out changes across deployments, problems can be quickly identified and rolled back before they can do harm and compromise an overall application.
A Multi-Cloud Actor Example
The following diagram shows how cloud actors and resources can be combined to deliver application functionality. When using the Cloud Actor Model, any actor can interact with any other actor residing in any connected cloud cluster. Obviously, distance and network bandwidth can have performance implications.
Cloud Actor Web applications look and feel like any other well-implemented Web application. A Web Handler actor has a highly efficient, lightweight, pre-configured open source Web server embedded in it. Though a Web Handler can serve static content, in this example we are focused upon the processing of dynamic data. Here’s how it works:
- An HTTP or API request to create a new Sales Order is received by a Web Handler actor.
- The Web Handler builds a CreateSalesOrder message and sends it to the SalesOrderApplication actor.
- The SalesOrderApplication actor validates the message and adds any necessary instructions to the CreateSalesOrder message and sends it to the SalesOrderRepository actor.
- The SalesOrderRepository actor formats an UpdateInventory message and sends it to the InventoryResource actor which then updates the Inventory database and returns the results.
- Upon receipt of a positive response from the InventoryResource actor, the SalesOrderRepository actor sends a CreateSalesOrder message to the SalesOrderResource actor — which adds the order to the SalesOrder database and returns the results.
- Upon successfully adding the SalesOrder to its databases, the SalesOrderRepository actor sends a message to the SalesOrderEventPublisher which publishes the SalesOrderEvent to the “sales-order” topic.
- The SalesOrderRepository actor sends the results of of the processing to the Web Handler which then responds to the original requestor — completing the Cloud 1 activity.
- In this example, the SalesOrder and Inventory databases are mirrored in both Clouds 2 and 3 which both have instances of the SalesOrderEventHandler. Their SalesOrderEventHandlers subscribe to the “sales-order” topic and when an event arrives, they send it as a message to the actor specified to process it.
- The SalesOrderEventHandlers actor validates the message and adds any necessary instructions to the CreateSalesOrder message and sends it to the SalesOrderRepository actor.
- The SalesOrderRepository actor formats an UpdateInventory message and sends it to the InventoryResource actor which updates the Inventory database and returns the results.
- Upon receipt of a positive response from the InventoryResource actor, the SalesOrderRepository actor sends a CreateSalesOrder message to the SalesOrderResource actor — which adds the order to the SalesOrder database and returns the results.
At the completion of these steps, the SalesOrder and Inventory databases on Clouds 1, 2, and 3 share a consistent state and can be used for failover and/or load balancing. As multi-leader mirrors, each each of these databases can be updated by local applications and the changes will be propagated to the other copies.
DISCUSSION: The Cloud Actor Model can not only implement application functionality in the cloud. It can also provide the building blocks necessary to manage the distributed, location transparent, mirrored datastores required to fully meet the promise of the cloud to deliver automated scaling and failover.
Transitioning to the Cloud Actor Model
Building new applications with the Cloud Actor Model can be a surprisingly painless exercise — once one gets a feel for the new clay with which one is sculpting. However, one of the most common and difficult software engineering and management challenges we all face is migrating critical applications from an old to a new architectural pattern — for example, going from a Monolithic or Layered Model to the more cloud-capable Cloud Actor Model. Because of its fine-grained, decoupled component model, and its ability to map between logical and physical data models, the Cloud Actor Model can ease the transition via the Strangler Fig Pattern as originally described by Martin Fowler. Fowler wrote:
“Strangler fig seeds germinate in the upper branches of a host tree and gradually work their way down the tree until they root in the soil. Over many years they grow into fantastic and beautiful shapes, meanwhile strangling and killing the tree that was their host.”
“This metaphor struck me as a way of describing a way of doing a rewrite of an important system. Much of my career has involved rewrites of critical systems. You would think such a thing as easy — just make the new one do what the old one did. Yet they are always much more complex than they seem, and overflowing with risk. The big cut-over date looms, the pressure is on. While new features (there are always new features) are liked, old stuff has to remain. Even old bugs often need to be added to the rewritten system.
An alternative route is to gradually create a new system around the edges of the old, letting it grow slowly over several years until the old system is strangled. Doing this sounds hard, but increasingly I think it’s one of those things that isn’t tried enough.” — Martin Fowler
So, if an old application meets the criteria for the Strangler Fig Pattern, the transition to the Cloud Actor Model maybe easier than one might think.
We hope that this article has started you thinking about how the Cloud Actor Model can help to deliver on the promise of multi-cloud computing as you build serious, game-changing, applications — with a lot less frustration and drama.
The Cloud Actor Model is a good starting point on the path to designing, building, and deploying multi-cloud, reliable, scalable, distributed applications — but, there are other things we need to have in the toolbox. This article is the first in a 6-part series, Building Multi-Cloud Apps. The others are:
- Multi-Cloud Apps: Part 2, Messaging and Brokers - Message passing implements loose coupling, but also can implement dynamic coupling. Dynamic coupling, using brokers, provides a very powerful mechanism for implementing load balancing, failover, and dynamic scaling. Brokers can also be an important mechanism for implementing self-organizing systems. This is where we discuss how and why this architecture uses them. We also discuss how modern distributed messaging technologies can process millions of messages per second.
- Multi-Cloud Apps: Part 3, Managing Complexity with Self-Organization - Complexity is the primary limiting factor in the successful implementation of large distributed systems. It is the Achilles heel of large microservices and API management implementations. As the number of things (APIs, services, resources) and connections between them grows, complexity increases exponentially. Top-down hierarchical controls, as implemented in most systems, are ill-suited to cope with this complexity. This is where we discuss how and why this architecture implements self-organization.
- Multi-Cloud Apps: Part 4, Using Intelligent Streams - Much of the work of data processing is cleaning, validating, filtering, combining, and transforming data. The passing of a message or stream of messages provides a perfect opportunity to execute declarative rules for validating and transforming the data payloads of those messages through the use of Intelligent Streams. This is where we discuss how and why this architecture uses them.
- Multi-Cloud Apps: Part 5, Working with Distributed Data - Cloud computing provides enormous power to distribute, access, and manage data. But coping with latency and network partitions while still delivering reliability, scalability, concurrency, and consistency is not a trivial challenge. It requires techniques and skills not all that commonly understood. This is where we discuss how this architecture manages distributing data in the cloud.
- Multi-Cloud Apps: Part 6, Putting It All Together -This is where we put all the concepts together to design a very simple sales order system as one would be likely to see on any E-Commerce Website. We’ll use the Cloud Actor Model and execute the following steps: 1) define application requirements, 2) design the database, 3) define the external API, 4) define the required messages and intelligent streams, and, 5) design the needed actors. We’ll then show how the component pieces can be deployed to a multi-cloud environment.
As we publish each new part, we’ll update its bullet to be a link to the article.
Appendix A: Multi-Cloud Architecture Requirements
The 10 basic objectives that any multi-cloud architecture should meet:
- Minimize complexity of application implementation and maintenance, configuration, deployment, management, and operation. Complexity adds risk and cost to any software implementation effort — especially those that involve an organization’s foray into new and unfamiliar territory. Failure to meet this requirement is the rock upon which cloud implementations most often founder. Most of the other requirements in this list exist to support it.
- Ensure that no runtime service or resource can be accessed or modified without proper authentication and authorization. Guarantee that all communications (messages) between services are digitally-signed and encrypted. Without meeting this requirement, any breach of any network or service involved in a distributed application can potentially breach all parts of that application.
- Ensure that multiple, identical services and resources can be deployed and that automated failover from services and resources to like services and resources can be implemented. Without meeting this requirement, implementing any effective failover strategy becomes complicated and expensive.
- Ensure that desired performance parameters for services and resources can be specified and monitored. Provide automated runtime facilities to modify the numbers and/or locations of executable services and resources to meet those specified parameters. Without meeting this requirement, implementing any effective scaling strategy becomes complicated and expensive.
- Guarantee the physical location transparency of a service or resource to other services and resources. Without meeting this requirement, implementing any effective scaling or failover strategy becomes complicated and expensive.
- Access and manage complex distributed data and present an integrated logical model of that data to applications. Without meeting this requirement, implementing application functionality and mixing legacy and new data becomes more complicated and expensive.
- Optimize both synchronous (request-response) and asynchronous (event) messaging. Without meeting this requirement, satisfying desired performance goals can be complicated and expensive. Both synchronous and asynchronous communications are necessary to meet the full set of distributed data use cases.
- Manage network limitations and tradeoffs regarding distributed data consistency, availability, partition tolerance, and latency. Without meeting this requirement, implementing any effective strategy for managing reliability and performance becomes complicated and expensive.
- Present a consistent, graphical, declarative, and low-code development and management environment, while allowing the use of modern programming languages when required by the developer. Without meeting this requirement when implementing application functionality, developer productivity will be negatively impacted by the learning curve necessary to absorb and apply new technologies.
- Expose high-functioning, consistent, and useful APIs to application developers. Without meeting this requirement, when implementing application functionality, developer productivity will be negatively impacted by the learning curve necessary to absorb and apply new technologies.
Appendix B: Principles of Distributed System Design
Amazon used the following principles of distributed system design to meet Amazon S3 requirements:
- Decentralization. Use fully decentralized techniques to remove scaling bottlenecks and single points of failure.
- Asynchrony. The system makes progress under all circumstances.
- Autonomy. The system is designed such that individual components can make decisions based on local information.
- Local responsibility. Each individual component is responsible for achieving its consistency; this is never the burden of its peers.
- Controlled concurrency. Operations are designed such that no or limited concurrency control is required.
- Failure tolerant. The system considers the failure of components to be a normal mode of operation and continues operation with no or minimal interruption.
- Controlled parallelism. Abstractions used in the system are of such granularity that parallelism can be used to improve performance and robustness of recovery or the introduction of new nodes.
- Decompose into small, well-understood building blocks. Do not try to provide a single service that does everything for everyone, but instead build small components that can be used as building blocks for other services.
- Symmetry. Nodes in the system are identical in terms of functionality, and require no or minimal node-specific configuration to function.
- Simplicity. The system should be made as simple as possible, but no simpler.
Appendix C: Architecture is the Key
In September 1984, Alan Kay wrote: “Atoms seem quite innocent. Yet biology demonstrates that simple materials can be formed into exceedingly complex organizations that can interpret themselves and change themselves dynamically. Some of them even appear to think! It is therefore hard to deny certain mental possibilities to computer material, since software’s strong suit is similarly the kinetic structuring of simple components. Computers “can only do what they are programmed to do”, but the same is true of a fertilized egg trying to become a baby. Still, the difficulty of discovering an architecture that generates mentality cannot be overstated. The study of biology had been underway some hundreds of years before the properties of DNA and the mechanisms of its expression were elucidated, revealing the living cell to be an architecture in process. Moreover, molecular biology has the advantage of studying a system already put together and working; for the composer of software the computer is like a bottle of atoms waiting to be shaped by an architecture he must invent and then impress from the outside.”