Building Cloud Apps with Civo and Docker — Part 3: Cloud Theory
Having now written a couple of practical articles on getting a Cloud cluster online with Civo and Docker, I wanted to take a step back and write about some of the theory of writing apps in the Cloud.
Anyone can follow a tutorial and build out a solution for scaling a stateless application, but doing so with state is a whole different ball game. When writing applications, we all benefit from a wealth of available knowledge outlining tried and tested patterns for solving common problems. Many of us have been using these patterns for so long that they have become second nature. Many are even embedded in the programming languages and virtual machines we use, so implementing them is simple. However, with Cloud orchestration, things aren’t always so clear cut.
One of the problems with Cloud orchestration is simply how relatively new it is. Companies have been scaling applications for a long time, so it’s not really new in itself, but it’s only in recent years that scaling applications has become a necessity for smaller organisations. Quite simply, the number of potential users willing to use our apps is increasing all the time, so it’s often better to prepare for success than to face embarrassment when our apps can’t meet user demand. As with anything new, knowledge of such implementations becomes coveted, with developers often believing they can demand more money and acquire job security by holding crucial information that others don’t.
The truth of the matter is both simple and complex. Much like working with analogue electronics, the answer involves very simple concepts, but when combined together can often become overwhelming and confusing. In reality, there are no fast solutions for scaling state, but there are a number of obvious truths that, when considered early on, can help realise a practical solution.
The first obvious truth concerns data consistency. If a service doesn’t require consistency, such as a REST service that handles no state or changeable resources, then it can potentially be scaled out without consequence. Such situations should be preferred and realised wherever possible. When dealing with stateful services and resources, however, careful planning is required.
The first step when building out your data-handling architecture is to identify its state. State is, essentially, any data that differentiates one person’s running application from another: the data specific to that user or to that user’s experience. If another person booted up the app, their experience would differ because the state for that user’s connection is different or, perhaps, the steps they took once it loaded were different. If making the same request to the server produces different results, it is likely due to state changes.
State and data are really interchangeable terms, though we can choose to see them differently depending on their transience, uniqueness to user interaction and its place within the application. We commonly expect state to disappear after a period of disconnection or a hard refresh, while we normally aim to retain data for future reference.
A typical kind of state is session data. This is data specific to a single user. With monolithic applications, where all or most of the stateful components exist on a single server, a user’s state is not normally a problem. When users leave your application, their state data eventually expires and the resources it occupied are released and made available to new users. However, as more and more people use the application simultaneously, resources are depleted and the need to scale becomes apparent. This can be vertical, by increasing the amount of CPU and memory on the server, or horizontal, by adding more servers. Increasing a server’s resources usually only proves sufficient for a short period of time. Eventually, any application with continually growing adoption will need to scale horizontally.
Identifying the Problem
One of the great benefits of modern computing is the economy of hardware. It is now far cheaper to buy more hardware than it is to buy more software development. As such, increasing server resources is normally the first line of attack when tackling user load. As more users utilise your application, you scale the hardware. The difficulties only set in when you move your application onto multiple servers and its associated data must be distributed with it, such as when state is present. That’s because, in order to balance server usage, you ideally don’t want to tie a unit of data to a specific machine.
Let’s look at this in a diagram:
Now, when users make requests to the load-balancer, that request will be passed to one of the available servers. If the server application maintains state for that user, then the two servers will no longer mirror each other and problems arise.
For example, Bob the user makes a login request which is routed to server one and a session is created. If Bob then sends a request which is routed to server two, any data that is held in server one will not be available, which may cause problems for the application and for Bob’s experience.
To combat this issue, we can take one of two approaches:
- route each user to a specific server based on some parameter of that user, or
- share all data between the servers.
The first option is rather simple, as it typically means the application doesn’t need updating. We simply provide the load-balancer with some logic that routes each user to a specific server. This might be, for example, to send all users whose IP address ends in an odd octet to server one and all other users to server two. This can work perfectly well in some circumstances and is a legitimate solution, but it can occasionally be unpredictable. What if all users accessing the service happen to have an odd IP component? Then server one will be over-utilised and server two will have no traffic. It’s unlikely, but it is possible.
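The routing rule described above can be sketched in a few lines. This is a minimal illustration, not a real load-balancer configuration, and the backend names are hypothetical:

```python
def route(ip: str) -> str:
    """Pick a backend based on the last octet of the client's IPv4 address:
    odd octets go to server one, even octets to server two."""
    last_octet = int(ip.rsplit(".", 1)[1])
    return "server-1" if last_octet % 2 == 1 else "server-2"

print(route("203.0.113.7"))   # odd final octet -> server-1
print(route("203.0.113.8"))   # even final octet -> server-2
```

In practice, load-balancers usually hash the whole client IP (or a session cookie) rather than inspecting a single octet, which spreads users more evenly, but the skew problem described above remains whenever the routing key is unevenly distributed.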
Another problem with this scenario is if user data needs to be shared among other users, such as with chat servers. One could implement a third party data store for these messages and have the servers poll the store for updates, but that would add an unwanted level of latency and complexity.
The alternative option is to share the state. As data comes into one server, it is immediately (or within an acceptable time frame) passed to the other server, and vice versa. This ensures that, regardless of which server a user connects to, they will get the same experience. Thus, requests to the servers can be round-robined, whereby every subsequent request hits a different server in a cycle that ensures even load throughout.
So, we have a solution. Pretty simple, but what about the side effects? Passing data between two servers isn’t so bad, but what if your cluster consists of fifty servers? Fifty servers means that for every user that connects to a single server, the session data created needs to be dispatched forty-nine additional times. Thus, if one user connects to each of the fifty servers at the same time, all fifty servers will receive forty-nine additional messages at once for every single message sent! This can quickly get out of hand and cause congestion in your application which is hard to resolve.
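The arithmetic behind that explosion is simple: with full mirroring, every write to one of N servers must be copied to the other N − 1. A tiny sketch:

```python
def replication_messages(servers: int, writes_per_server: int = 1) -> int:
    """Messages needed to mirror every write to all other servers in a
    fully-replicated cluster: each write fans out to (servers - 1) peers."""
    return servers * writes_per_server * (servers - 1)

print(replication_messages(2))    # 2 servers: 2 extra messages
print(replication_messages(50))   # 50 servers: 2450 extra messages
```

The cost grows quadratically with cluster size, which is why naive state sharing stops scaling long before the servers themselves do.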
When writing monolithic apps, we typically put everything in the same bucket. All the assets, messages and connections originate in the same place and we typically don’t consider optimisation as being too important.
An obvious optimisation is to simply serve smaller or fewer packets. Perhaps your data is being served from many endpoints that could be condensed into one? A typical HTTP request carries headers of around 1 to 2 kilobytes, regardless of body size. Therefore, condensing ten requests into one eliminates nine sets of headers, saving roughly 9 to 18 kilobytes per interaction, which adds up to a big saving over many millions of users.
Reducing data verbosity is also valuable. Perhaps a large packet of configuration data can be reduced to a simpler value?
A JSON packet of boolean configuration settings, for example, may be condensed to a single numerical value, where the bits of the integer’s binary representation correspond to the truthiness of each configuration setting:

5 == 0101 == [false, true, false, true]
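A minimal sketch of that bit-packing, with the first setting stored in the highest bit so that 5 (binary 0101) decodes back to [false, true, false, true]:

```python
def pack(flags):
    """Pack an ordered list of booleans into an integer, first flag
    occupying the highest bit."""
    value = 0
    for flag in flags:
        value = (value << 1) | int(flag)
    return value

def unpack(value, count):
    """Recover the ordered list of booleans from the packed integer."""
    return [bool((value >> (count - 1 - i)) & 1) for i in range(count)]

print(pack([False, True, False, True]))   # 5
print(unpack(5, 4))                       # [False, True, False, True]
```

Both sides must agree on the order and number of flags, so this trick suits stable configuration schemas rather than frequently changing ones.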
Divide and Conquer
Another optimisation is to identify the different types of data in your application and to serve each appropriately. In recent years, microservices have become very popular partly for this reason. Unlike with monolithic apps, there is no reason why your image assets should be served from the same location as your REST or Websocket endpoints. Each can be served independently and can be cached to decrease processing costs.
- Your images, video and audio files can be served over a CDN, which both caches and load balances your asset files.
- REST endpoints can make use of eTags, to reduce needless CPU cycles and database trips.
- Websocket servers can be placed on a separate server to maximise available ports and reduce the number of nodes in the cluster.
- CPU-intensive processes, such as video composition and image generation, can make use of serverless technologies, which reduce cost by utilising CPU resources only when necessary.
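The eTag point deserves a sketch. The idea is that the server fingerprints a response; if the client presents that fingerprint on its next request, the server can answer 304 Not Modified with an empty body and skip the expensive work. This is a framework-free illustration, not a complete HTTP handler:

```python
import hashlib

def respond(body: bytes, if_none_match=None):
    """Return (status, body, etag). When the client's cached ETag still
    matches, reply 304 with no body, skipping serialisation and transfer."""
    etag = hashlib.sha256(body).hexdigest()[:16]
    if if_none_match == etag:
        return 304, b"", etag
    return 200, body, etag

# First request: full 200 response plus an ETag for the client to cache.
status, body, etag = respond(b'{"user": "bob"}')
# Revalidation: the client echoes the ETag back via If-None-Match.
status2, body2, _ = respond(b'{"user": "bob"}', if_none_match=etag)
print(status, status2)   # 200 304
```

In a real service you would compute the eTag from a cheap version marker (an updated-at timestamp or revision number) rather than hashing the full body, otherwise you still pay the generation cost on every request.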
Cache and Queue
Shuttling data around your application can be expensive. Even sending and retrieving data to and from a database in a monolithic application can be expensive. Therefore, reducing these trips can speed up your application and reduce strain on the server.
Caching read-heavy resources keeps a readily available copy of the data against its associated request paths. This isn’t solely beneficial for database data, but for assets and generated data, too. At Xirsys, we implemented a distributed cache on top of a read-heavy database and reduced calls to it by over 90%. That’s a huge saving!
On the flip side, implementing a queue for write-heavy resources reduces calls to a given store and, as a consequence, reduces bandwidth and CPU cycles.
Depending on the use case, it is quite plausible to combine both a cache and a queue, re-serving data not yet committed to its data store.
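A minimal sketch of that combination, assuming a plain dict standing in for the real data store: reads go through a cache, writes land in a queue, and queued values are served immediately even before they are flushed to the store.

```python
from collections import deque

class CachedStore:
    """Read-through cache plus a write queue: reads see queued (uncommitted)
    writes straight away, while the backing store is updated in batches."""

    def __init__(self, backing):
        self.backing = backing        # stands in for the real data store
        self.cache = {}
        self.queue = deque()

    def write(self, key, value):
        self.cache[key] = value       # serve the new value immediately
        self.queue.append((key, value))

    def read(self, key):
        if key not in self.cache:
            self.cache[key] = self.backing[key]   # read-through on a miss
        return self.cache[key]

    def flush(self):
        while self.queue:             # one batched trip to the store
            key, value = self.queue.popleft()
            self.backing[key] = value

store = CachedStore({"greeting": "hello"})
store.write("greeting", "hi")
print(store.read("greeting"))        # "hi", even before flush()
store.flush()
print(store.backing["greeting"])     # "hi"
```

The trade-off is durability: anything still in the queue when a node dies is lost, so in production the queue itself usually needs to be persistent or replicated.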
Distributed application and microservice topologies typically form tree-like structures. These can be both top-down and bottom-up. For instance, a tree may fan out from a single user to multiple load-balancers, or fan in from multiple service nodes to a single data store.
When contemplating state distribution, you typically want to position it at the narrowest node level. Storing state within a cluster of microservice nodes presents the consistency issues noted above, but keeping that state in the client or in a data store greatly simplifies data management.
Passing data to and from its resident node only when required is preferable to implementing a consistency strategy at scale.
When working with complex microservice architectures, the overall topology may consist of numerous clusters of microservices, each with its own requirement on shared state. In such circumstances, implementing a purpose-built store for each cluster may be practical.
Eric Brewer, a computer scientist at UC Berkeley, is well known for what is alternatively known as the CAP theorem. That is, a distributed data service can realistically provide at most two of the following three guarantees:
- Consistency — The guarantee that retrieving data from a cluster will result in the most recent value or an error response.
- Availability — The guarantee that data will always be available, even if not up-to-date.
- Partition Tolerance — The guarantee that a service will continue to operate even if the network partitions, that is, if messages between nodes are dropped or delayed.
When considering your architecture, you should decide which two of these requirements are necessary for the associated services.
Numerous users connected to a service that distributes messages between them, such as a websocket server, would typically favour Availability and Partition Tolerance. In this circumstance, we forgo consistency, as the real-time element is less important than ensuring data delivery and ensuring service availability.
A database, such as Postgres or MySQL, may typically favour Consistency and Partition Tolerance. This is a little tongue-in-cheek and may be disputed. However, considering the circumstance of a single database node, it would be preferable to ensure the data is always current and that the node continues to function during a network outage, even if it isn’t reachable by your application 100% of the time.
Consistent and Available
Some databases have been claimed to be CA compliant, though I’ve never really understood how. According to the CAP theorem, it’s possible for a service to be consistent and available while sacrificing partition tolerance, but I’ve never really understood how a service can legitimately be available, yet not current, and still be consistent at the same time. The proposed answer is that all nodes serve old data until the new data has successfully propagated across every node; but ensuring this requires the same fundamental rules as ensuring consistency in the first place. Get your head around that little tidbit!
Pushing and Pulling Data
In times of old, data passed from server to client, or from database to server, in much the same way: by polling, or pulling, the data. In modern times, however, we have more options available to us.
Reactive programming typically refers to responding to changes in data as they occur. Instead of repeatedly checking the server for updates to a given data structure, the data is pushed when, and only when, it is updated. This reduces needless CPU cycles in the consumer.
Reactive technologies are commonly used in client applications, such as with ReactJS or, more specifically, Rx libraries and are truly reactive when combined with socket technologies such as Websockets.
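The push model at the heart of those libraries can be boiled down to an observable value that notifies subscribers on change. This is a toy sketch of the idea, not an Rx implementation:

```python
class Observable:
    """Minimal push-based value: subscribers are notified when the value
    changes, so nobody has to poll for updates."""

    def __init__(self, value):
        self._value = value
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def set(self, value):
        if value != self._value:          # push only on a real change
            self._value = value
            for callback in self._subscribers:
                callback(value)

seen = []
status = Observable("offline")
status.subscribe(seen.append)
status.set("online")
status.set("online")                      # duplicate: no notification sent
print(seen)                               # ['online']
```

Pairing this with a Websocket means the server can push the change the moment it happens, instead of every client polling on a timer.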
Data Store Feeds
Modern data stores, such as CouchDB, now provide a feed capability which notifies interested parties when data is updated. Thus, when one service sends data to the data store, all other services listening to the feed will automagically be informed. This is a fantastic resource for cache cluster development, ensuring all writes are propagated to all nodes in a cache in the shortest possible time, while providing adequate protection and optimisation for data reads.
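CouchDB’s continuous `_changes` feed delivers one JSON object per line, each carrying the document id and, with `include_docs=true`, the updated document itself. A sketch of a cache node consuming such a feed, using hand-written sample lines rather than a live connection (in a real service you would stream them from `GET /{db}/_changes?feed=continuous&include_docs=true`):

```python
import json

def handle_change(line, cache):
    """Apply one line of a CouchDB-style _changes feed to a local cache."""
    if not line.strip():
        return                        # continuous feeds send blank keep-alives
    change = json.loads(line)
    if change.get("deleted"):
        cache.pop(change["id"], None)
    elif "doc" in change:
        cache[change["id"]] = change["doc"]

cache = {}
feed = [
    '{"id": "user:bob", "seq": "1-a", "doc": {"_id": "user:bob", "name": "Bob"}}',
    '',
    '{"id": "user:bob", "seq": "2-b", "doc": {"_id": "user:bob", "name": "Bobby"}}',
]
for line in feed:
    handle_change(line, cache)
print(cache["user:bob"]["name"])      # the cache holds the latest revision
```

Every cache node listening to the same feed converges on the same data without the services ever messaging each other directly, which is exactly the propagation problem from earlier in this article.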
As you may have gathered by now, there are no hard and fast rules for distributed application development. The trick is to plan and to refactor as much as possible. As your application grows, consider ways to refine and optimise your data-passing needs and reduce, reduce, reduce. What’s more, do not be afraid to utilise third-party tools. Keep it simple and do NOT reinvent the wheel. Chances are, the problems you are facing are the same problems faced and solved by many others in the past.