The 5 essential tests required to build 100% Uptime Cloud Computing Services

Let’s face it… having your boss breathing down your neck while having to do major incident handling reports (MIH) every hour sucks!

So… you have a significant Cloud Hosting outage! Hopefully, you didn’t lose your job.

Customers are upset (and rightfully so), and you are doing everything you can to recover the systems. All you need is some uninterrupted time to focus and work on the resolution, but the reality is… your boss is standing over your shoulder, and senior management wants to know what caused the problem… and, more importantly, how soon you expect the systems to be back online serving your customers!

You want to tell them “very soon”… but the truth is that you don’t know! Your gut tells you it will be right after everyone has run out of their last thread of patience, and another 6 hours after you are at the point of ‘drop dead’ exhaustion.

To add insult to injury… once you recover the solution, you’ll have to provide a root cause analysis (RCA) detailing when the outage occurred… how long the solution was down… what was done to recover it… what caused the outage… and what you’ll do to prevent that problem from recurring.

DOESN’T SENIOR MANAGEMENT REALIZE HOW MUCH YOU’VE JUST BEEN THROUGH?

‘STRESSED!’ is an understatement!

All cloud services solutions have parts that break… and break often: hard disks, entire server chassis, network controllers, power supplies, etc. The list is endless. That’s a given! What you don’t want is for customers to lose access to their services when components break. The services don’t even have to be ‘life-critical’ either; customers get upset when they lose access to their favorite Netflix programs.

So what can you do to build a highly available, fault tolerant, and robust Cloud technology solution… capable of gaining the respect of industry giants like Amazon or Netflix?

There are several things… and today we’re going to focus on being proactive by highlighting the 5 essential tests needed to build a ‘rock-solid’ Cloud services infrastructure platform.

Testing is an aspect of my job that takes time and can be quite boring… oh so boring. To combat this my team and I find ways to make this chore a more creative process and thus more interesting.

In future articles I’ll break down these 5 tests in more detail and discuss why they’re so valuable for your career. For now I want to focus on what they are… and where they originated.

Many of our cloud infrastructure tests, whether you are building Infrastructure as a Service (IaaS) or Software as a Service (SaaS) solutions, might sound familiar, and that’s because many have been adopted from software testing strategies and methodologies.

The first thing to understand is that once you make the painstaking effort (ok… not really that bad) to build a proper cloud computing testing system and methodology, you and your team don’t have to consciously think about what to do when you start building a new solution… you will just act!

So here are the 5 ‘pro’ tests in order…

1. Proof of Concept (POC)

Purpose: Will your cloud services solution do what it’s supposed to do, at a cost that’s acceptable to the business unit?

2. Hardware Validation

Purpose: Check that each component of your solution will fit within the rack space, cooling, and power constraints of the final location (i.e. your production data center). Some data centers are older, and newer equipment might not fit within the rack space, cooling, or power constraints of those environments.

3. 48 Hour Burn-in Test

Purpose: Test whether any of the cloud services solution equipment has been compromised during handling and transportation to the final location (i.e. your production data center).

4. Operational Readiness Testing (ORT)

Purpose: Once the cloud technology hardware components have been assembled and the cloud software configured in the final location, test whether the basic functions of the whole solution operate as expected.

5. User Acceptance Testing (UAT)

Purpose: Finally, have your customer test whether their application functions and performs as intended on your solution.
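To make test 2 (Hardware Validation) a little more concrete, here is a minimal Python sketch of the arithmetic behind it: add up what the equipment needs and compare it with what the rack location can supply. The `Device` and `RackBudget` names, their fields, and all of the numbers are hypothetical placeholders I made up for illustration. Real values would come from vendor spec sheets and your data center's facilities team.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    """One piece of equipment (hypothetical fields, for illustration only)."""
    name: str
    rack_units: int      # height in rack units (U)
    power_watts: float   # peak power draw
    heat_btu_hr: float   # heat output the cooling must remove


@dataclass
class RackBudget:
    """What the target rack location can supply (hypothetical fields)."""
    rack_units: int
    power_watts: float
    cooling_btu_hr: float


def validate_fit(devices: List[Device], budget: RackBudget) -> List[str]:
    """Return one message per exceeded constraint; an empty list means it fits."""
    checks = [
        ("rack_units", sum(d.rack_units for d in devices), budget.rack_units),
        ("power_watts", sum(d.power_watts for d in devices), budget.power_watts),
        ("heat_btu_hr", sum(d.heat_btu_hr for d in devices), budget.cooling_btu_hr),
    ]
    return [
        f"{label}: need {needed}, have {limit}"
        for label, needed, limit in checks
        if needed > limit
    ]


# Example with made-up numbers: an older rack with a small power feed.
equipment = [
    Device("server-a", rack_units=2, power_watts=400.0, heat_btu_hr=1365.0),
    Device("switch-a", rack_units=1, power_watts=150.0, heat_btu_hr=512.0),
]
old_rack = RackBudget(rack_units=42, power_watts=500.0, cooling_btu_hr=17000.0)

for problem in validate_fit(equipment, old_rack):
    print("Does not fit ->", problem)
```

The sketch only checks totals per rack. A real validation would also cover things this ignores, such as rail depth, airflow direction, and redundant power feeds.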

It is the role and responsibility of cloud hosting technology managers to protect their companies from service failures. If you want to build something worthwhile, that stands up to all types of abuse, and gets you recognized by your boss’s boss, it’s important to understand the requirements of your Cloud Technology’s Service Level Agreement (SLA) policy (e.g. 99.9% service uptime). Building cloud solutions to meet the demands of a comprehensive SLA policy is critical to helping companies avoid monetary penalties and/or a loss of reputation.
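It helps to translate an uptime percentage into actual minutes. The short Python sketch below does that conversion. The function name `allowed_downtime_minutes` and the 30-day measurement period are my own assumptions, since every SLA defines its own measurement window and exclusions.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted in the period at the given uptime target.

    The 30-day default is an assumption; check how your SLA defines the period.
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


# A 99.9% target over 30 days leaves about 43.2 minutes of downtime.
print(round(allowed_downtime_minutes(99.9), 1))

# A 100% target leaves no budget at all.
print(allowed_downtime_minutes(100.0))
```

Seen this way, a single outage of the kind described at the top of this article can use up a 99.9% budget for the whole month.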

To build a highly robust, highly available ‘powerhouse’ solution capable of getting respect from the likes of Netflix and Amazon, you need a set of solid processes, thorough design, stringent testing, and plenty of planning time.
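One way to turn "the 5 tests in order" into a process rather than a habit is to treat each test as a gate that must pass before the next one starts. The Python sketch below shows that idea only. The `run_gates` function is hypothetical, and the lambda checks are stand-ins for whatever your team actually does at each stage.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a check that returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def run_gates(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order and stop at the first failure.

    Returns (names of passed stages, name of the failed stage or None).
    """
    passed: List[str] = []
    for name, check in stages:
        if not check():
            return passed, name
        passed.append(name)
    return passed, None


# Placeholder checks: each lambda stands in for a real test procedure.
stages: List[Stage] = [
    ("Proof of Concept (POC)", lambda: True),
    ("Hardware Validation", lambda: True),
    ("48 Hour Burn-in Test", lambda: True),
    ("ORT", lambda: True),
    ("User Acceptance Testing (UAT)", lambda: True),
]

passed, failed = run_gates(stages)
if failed is None:
    print("All gates passed:", ", ".join(passed))
else:
    print("Stopped at:", failed)
```

Stopping at the first failure is a deliberate choice here: there is little point in asking a customer to run UAT on equipment that has not survived its burn-in.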

If you found this article beneficial, please give it a LIKE and SHARE. Liking and Sharing are the “currency” of the Internet. Thank you!

Heath M. Jones is a Cloud Computing Expert who has supported top-tier Telecom providers for over 8 years!
