Highly available services: The non-techies introduction
You have landed on a series of articles explaining what high availability of software services is and how to achieve it.
In them, I explain how to build and deploy such services so that they keep working most of the time.
This entry is the necessary, non-technical introduction. Expect the next ones to be written for (more) technical audiences.
A computer services world
We cannot conceive of today’s computing without so-called services.
Services are a mix of software (computer applications) and hardware, permanently running somewhere and connected to a network, whose mission is to transform and transmit information.
If any element in the mix stops working, the service may become useless, because it can no longer receive or transform the information it manages.
A service can be simple, such as a single program on one server (or home PC), or very complex, made up of many computer programs.
Services can provide countless different functions. Some display web pages like this one; others send an SMS to a cell phone after something happens; others take an input, like an amount in US dollars, and convert it to its equivalent in euros. The possibilities are endless. Services can even use other services.
Heavily used sites like Google or Facebook run many applications on thousands of servers and other gear to deliver search, maps, photos, social networking, and the related services they offer.
When we talk about a service as a whole, we refer to everything it is made of: software, hardware, networks, and so on.
What is availability?
Before we can answer what high availability is, we must know what availability is.
This term means that a service is accessible and functional. A typical service is a web server, which stores pages that are accessed from web browsers.
When users try to visit an existing web site through a web browser and it does not load, they experience an “unavailable service”. Such a lack of availability tends to lead users to stop using that service.
A service may use other services: a web page may use a database to fetch and display some content, and that database must be available for the page to work.
Therefore, “availability” is what allows an existing service to be accessed and used without inconvenience.
A service that works as expected most of the time is considered reliable. Reliability is something we are going to talk about in other posts.
A standard metric for availability is the share of time the service has been available over a specific period, usually expressed as a percentage.
This percentage is called “availability time” or “(service) uptime”. The time the service has not been available is called “downtime”.
To calculate this percentage, we first need to determine the length of the period we are going to measure, commonly expressed in minutes.
A day has 1440 minutes, a week 10080 minutes, a 30-day month 43200 minutes, and a 365-day year 525600 minutes. The figures for months and years vary with the number of days they have.
Then a simple formula is applied to calculate such availability:
(total period minutes - unavailable minutes) / total period minutes * 100
The result is the availability percentage in the chosen period. If we measure 60 minutes of downtime over a day (1440 minutes), we have:
(1440 - 60) / 1440 * 100 = 95.83 % of availability that day
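For readers who prefer to see the arithmetic as code, here is a minimal Python sketch of the same formula. The function name `availability_percent` and the constant `MINUTES_PER_DAY` are made up for this illustration; they are not part of any tool mentioned in this series.

```python
MINUTES_PER_DAY = 24 * 60  # 1440 minutes, as in the text


def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Return the availability over a period as a percentage.

    Implements: (total - unavailable) / total * 100
    """
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return (total_minutes - downtime_minutes) / total_minutes * 100


# The example from the text: 60 minutes of downtime in one day.
print(f"{availability_percent(MINUTES_PER_DAY, 60):.2f} %")  # 95.83 %
```

The same function works for any period, as long as both values use the same unit.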
What is high availability?
We can define high availability as the practice of keeping a service's unavailable time as low as possible, even when some of the underlying components are failing.
The way to achieve high availability is to run services on a carefully designed, entirely redundant IT infrastructure.
It is not only hardware, like network gear and servers and their components. The following is a non-exhaustive list of elements:
- Network gear: Duplicated. That includes routers, switches, cables and others
- Networking: Multiple providers
- Servers and their components: Duplicated
- Power supplies: Redundant, connected to independent power sources
- Software: Multiple instances of each service and its dependencies running
There are many more behind the scenes. In addition, top services are replicated across data centres in different regions of the world to be more available, among other benefits.
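As a rough illustration of why duplicating components helps, the following Python sketch estimates the availability of a group of identical copies. It rests on two simplifying assumptions that real infrastructures only approximate: the copies fail independently of each other, and one working copy is enough to keep the service up. The function name `redundant_availability` is hypothetical.

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability (between 0 and 1) of `copies` redundant components.

    Assumes independent failures and that a single working copy
    is enough: the group is down only when every copy is down.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - single) ** copies


# One, two and three copies of a component that is up 95% of the time.
for copies in (1, 2, 3):
    print(copies, f"{redundant_availability(0.95, copies) * 100:.4f} %")
```

Under these assumptions, a component that is available 95% of the time reaches about 99.75% with two copies and about 99.99% with three.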
Not all services require the same level of high availability: a critical service, like user login on a popular web site, needs to be much more available than the login of a small personal page.
Note that a service cannot be more available than the components it depends on. If that login service relies on a database that is available 50% of the time, the service cannot be available more than 50% of the time either.
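The limit imposed by dependencies can be sketched in a few lines of Python. The upper bound (the weakest dependency) follows directly from the paragraph above; the combined estimate additionally assumes that the parts fail independently of each other, which is a simplification. The function names and the 99% figure for the web server are invented for the example.

```python
import math


def availability_upper_bound(parts: list[float]) -> float:
    """A service needing all parts is at most as available as the weakest one."""
    if not parts:
        raise ValueError("parts must not be empty")
    return min(parts)


def chained_availability(parts: list[float]) -> float:
    """Estimate for a service that needs every part to be up.

    Assumes independent failures, so the availabilities multiply.
    """
    if not parts:
        raise ValueError("parts must not be empty")
    return math.prod(parts)


# Hypothetical web server (99%) relying on the 50% database from the text.
parts = [0.99, 0.50]
print(f"upper bound: {availability_upper_bound(parts) * 100:.1f} %")
print(f"estimate:    {chained_availability(parts) * 100:.1f} %")
```

With these numbers the service can never exceed 50%, and under the independence assumption it ends up slightly below that, at about 49.5%.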
Why high availability?
We need high availability to ensure that services keep working most of the time.
From a user's point of view, thanks to high availability, we can use the service at any time. For companies offering services, it makes them trustworthy.
Now that we understand what high availability is, get ready to achieve it. The next articles focus on ways to use and configure “floating” (virtual) IP addresses, load balancers like HAProxy, redundancy for standard services, and technologies like containers and container orchestrators, among others.