From zero to production in sixty minutes: Building a cloud platform for product development

GSK Tech
8 min read · Jun 22, 2020

Across GSK we use several public cloud platforms for a number of reasons: to bolster operations and IT infrastructure, to uncover insights with data and analytics and to drive drug discovery using Artificial Intelligence (AI) and Machine Learning (ML). Adoption of public cloud is an important part of our technology and digital strategy which ultimately enables our company mission to help people to do more, feel better and live longer.

Whilst technology adoption is important, to be effective it must be accompanied by process improvement and cultural change. This is very challenging for enterprise organisations, particularly in a highly regulated industry. At GSK, we’re committed to changing the way we think about and build digital products. Like many companies we have a history (we’re over 200 years old) of running digital products as waterfall projects. In recent years we’ve invested heavily to change this and build a culture of continuous learning and experimentation to transform how we deliver value to our customers. This transition is simple to talk about but complex to achieve. Digital transformations are rife with rhetoric of Agile, DevOps, Design Thinking and so on. Although all of these are important, we understand the value of context and we favour experimentation over cargo culting (using the buzzwords without backing them with action). This ensures that we’re spending as much time improving process as we are complying with it.

Rapid experimentation requires rapid delivery

Modern digital product teams are centred around rapid cycles of discovery and experimentation (delivery). The result of continuous experimentation should feed learnings back into the discovery process. Sometimes this is described as dual-track Agile and can be visualised with the diagram below.

Dual track Agile

Supporting product teams that work in this way requires delivery to be as frictionless as possible, so that there is less waste in the value stream and we are quicker to gain signals that prove or disprove our hypotheses. Engineers must be able to build their software experiments and push them into production at speed. To achieve this they require a high level of automation to reduce manual processes and eliminate human error. This is an area where adopting a public cloud platform can have a huge impact.

When our team was first established at GSK we set out to build a cloud platform to enable our product team (a cross-functional team of product managers, engineers, designers and a user researcher) to work in short cycles. To align on “what good looks like” we set out, as a team, to create a vision for our cloud platform: our 0–60 vision.

What is 0–60?

Can a team provision infrastructure and get a product or service idea into production within 60 minutes with a fully automated pipeline and with security, logging, monitoring, metrics and alerting baked-in?

The 0–60 vision is our north star. It ensures that the platform that we build is focused on enabling our builders to move at speed by quantifying what speed actually means when it comes to provisioning infrastructure. Without a vision we risk:

  • Lack of alignment
  • Building a platform that doesn’t meet our builders’ needs
  • Putting unnecessary friction in the way of rapidly delivering value for our customers

0–60 isn’t about defining the tools that help achieve the vision. Instead it helps define what “good” looks like for us and should act as a conduit to our decision making.

Speed is at the core of the 0–60 ethos but it’s not the complete picture. It must also encompass what we value and expect of a quality “production-ready” software system. Trust is one of our core values at GSK. If our systems don’t meet the required level of quality through automated testing and observability then we risk eroding that trust, or worse. Speed and quality may sound mutually exclusive, but in effective software delivery they are complementary. Having quality built in ensures that when we move at speed, we do so safely. The pairing of speed and quality is essential in the 0–60 vision.

Speed and quality of delivery also have a strong evidence base in helping organisations achieve profitability, productivity, market share, operating efficiency, customer satisfaction and quality, and meet their organisational or mission goals. Accelerate [Forsgren, Humble, Kim, 2018] presents a study of this area and demonstrates that organisations that achieve a high level of speed and quality are 150% more likely to meet or exceed their organisational goals.

How are we achieving this?

The 0–60 strategy is not about tooling; it’s an ethos for how we build our platform and deliver software. That being said, we have adopted certain tools that have helped us achieve this. For example, we use Terraform to automate the deployment of immutable infrastructure, and Google Cloud Platform (GCP) to take on the operational burden and give us access to its vast array of products. This post won’t delve into the technical detail, as that’s not the point of the 0–60 strategy; future posts will detail the implementation of the 0–60 vision.

In order to achieve this vision we take an opinionated approach. This reduces the cognitive load on teams and the time it takes to provision infrastructure and deploy production-ready products and services that are secure and compliant.

Imagine you’re a software engineer and you want to create a new app and deploy it to GCP. How do you get started and what do you need to do? There are currently two steps that the team are required to follow.

0–60: Step 1 — Creating a Google Cloud Platform project and Infrastructure-as-code Github® repository

The first step for a team is to be on-boarded onto our GSK GCP organisation and provision their environments. We have automated this process using a Terraform module that we call a “Project Factory”. The Project Factory automates the creation of consistent, secure and compliant projects within our GCP organisation. The engineer configures a file in this module with their specific requirements (project name, team name, billing account, cost centre etc.) and the module then builds the project with a range of features baked in, including least-privilege permissions, DNS zones, auditing and many more.
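
To make the idea of a per-team configuration entry more concrete, here is a minimal sketch in Python of the kind of validation that could run on such an entry before it reaches Terraform. It is not the Project Factory itself: the `ProjectRequest` class, its field names and the `validate` function are hypothetical, and the billing account pattern is an assumption about how those IDs are written. The project ID rule follows GCP's documented constraints (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter and not ending with a hyphen).

```python
import re
from dataclasses import dataclass

# GCP project IDs: 6-30 characters, lowercase letters, digits and hyphens,
# starting with a letter and not ending with a hyphen.
PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")
# Assumption: billing account IDs are three hyphen-separated groups of six
# uppercase letters or digits.
BILLING_RE = re.compile(r"^[0-9A-Z]{6}-[0-9A-Z]{6}-[0-9A-Z]{6}$")


@dataclass(frozen=True)
class ProjectRequest:
    """Hypothetical shape of one team's entry in the configuration file."""
    project_name: str
    team_name: str
    billing_account: str
    cost_centre: str


def validate(request: ProjectRequest) -> list[str]:
    """Return human-readable problems; an empty list means the entry is valid."""
    problems = []
    if not PROJECT_ID_RE.match(request.project_name):
        problems.append("project_name must be 6-30 lowercase letters, digits or hyphens")
    if not request.team_name.strip():
        problems.append("team_name must not be empty")
    if not BILLING_RE.match(request.billing_account):
        problems.append("billing_account must look like XXXXXX-XXXXXX-XXXXXX")
    if not request.cost_centre.strip():
        problems.append("cost_centre must not be empty")
    return problems
```

Returning a list of problems rather than raising on the first one lets a reviewer see every issue with an entry in a single pass.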

Once the team has submitted the pull request for their project, it is peer reviewed by a member of the cloud infrastructure team. Upon approval the changes are pulled into a Continuous Integration/Continuous Delivery (CI/CD) pipeline, where automated checks are run by Google Cloud Build to verify syntax and compliance with standards before Terraform is executed to provision the GCP Project(s).
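
The post does not say which checks run in the pipeline, so the following is only a sketch of what one automated compliance check might look like. It assumes the plan has been exported as JSON (for example with `terraform show -json`) and inspects its `resource_changes` list. The policy itself, that every storage bucket must enable `uniform_bucket_level_access`, is invented for illustration, as are the function name and the example plan.

```python
import json


def find_violations(plan: dict) -> list[str]:
    """List addresses of buckets created or updated without uniform access."""
    violations = []
    for resource in plan.get("resource_changes", []):
        if resource.get("type") != "google_storage_bucket":
            continue
        change = resource.get("change", {})
        actions = change.get("actions", [])
        if "create" not in actions and "update" not in actions:
            continue  # deletions and no-ops cannot introduce a violation
        after = change.get("after") or {}  # "after" is null for deletions
        if after.get("uniform_bucket_level_access") is not True:
            violations.append(resource["address"])
    return violations


# Hypothetical, heavily trimmed plan used only to exercise the check.
EXAMPLE_PLAN = json.loads("""
{"resource_changes": [
  {"address": "google_storage_bucket.logs", "type": "google_storage_bucket",
   "change": {"actions": ["create"], "after": {"uniform_bucket_level_access": true}}},
  {"address": "google_storage_bucket.scratch", "type": "google_storage_bucket",
   "change": {"actions": ["create"], "after": {"uniform_bucket_level_access": false}}},
  {"address": "google_storage_bucket.old", "type": "google_storage_bucket",
   "change": {"actions": ["delete"], "after": null}}
]}
""")
```

A pipeline step could fail the build whenever `find_violations` returns a non-empty list, so that nothing non-compliant is ever applied.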

With a successful GCP project control plane in place for the team, a process is automatically started to create a Github® repository to house the team’s infrastructure as code for their project. This doesn’t require any action from the team or any human intervention. To achieve this we use a combination of Google Pub/Sub and Google Cloud Functions, but we won’t delve into that detail in this post. Once the Github repository has been created, it is automatically scaffolded and bootstrapped by another Terraform module and template files to create the minimum infrastructure necessary to run Terraform CI/CD pipelines. The team are now in a position to make any required changes to their infrastructure by altering the Terraform code in their newly, and auto-magically, created Github repository. When these changes are committed to Github they will run through a Google Cloud Build CI/CD pipeline to be deployed, giving the team a fully managed infrastructure and deployment pipeline as code.
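
Since the post leaves the Pub/Sub and Cloud Functions detail out, the sketch below only illustrates the general pattern: a function triggered by a Pub/Sub message decodes the payload and calls the GitHub REST API to create a repository in an organisation. The message fields (`project_id`, `team`), the repository naming scheme, the organisation name and the token handling are all assumptions made for this example.

```python
import base64
import json
import urllib.request


def build_repo_request(event: dict, org: str, token: str) -> urllib.request.Request:
    """Turn a Pub/Sub event into a GitHub 'create organisation repository' request."""
    # Pub/Sub delivers the message body base64-encoded under the "data" key.
    message = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    body = {
        "name": f"{message['project_id']}-infrastructure",  # hypothetical naming scheme
        "private": True,
        "description": f"Infrastructure as code for {message['team']}",
    }
    return urllib.request.Request(
        url=f"https://api.github.com/orgs/{org}/repos",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def on_project_created(event: dict, context: object) -> None:
    """Entry point in the (event, context) style of a Pub/Sub-triggered function."""
    # Placeholder values: a real function would read these from configuration
    # and a secret store rather than hard-coding them.
    request = build_repo_request(event, org="example-org", token="<token>")
    with urllib.request.urlopen(request) as response:  # performs the real API call
        print("GitHub responded with status", response.status)
```

Keeping the request construction separate from the network call means the message handling can be tested without touching GitHub.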

The image below highlights these steps, the timings associated with each step and what the team gets baked-in for free.

Flow of creating a GCP project plane and Terraform bootstrapped repository

0–60: Step 2 — Creating and deploying an app or service

Having created the team’s GCP project, along with the infrastructure and environments themselves, the team is left to create the application or service, whether that be an API, a service, a website etc. To expedite that process and at the same time ensure a high level of standardisation (without locking teams down a path) we created another application bootstrapper, using Yeoman®, that creates scaffolding for a project based on the requirements of the team. The team run through the bootstrapper, answering a series of questions, and the scaffolder outputs the basics of the application, including a Dockerfile, Kubernetes manifests and a CI/CD pipeline manifest. This allows our standards to be baked in, reduces the amount of work for the engineer and allows them to concentrate on the value-adding business logic and not the routine tasks (as important as the routine tasks are).
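
The scaffolding idea can be shown in a few lines. The sketch below is written in Python rather than as a Yeoman generator, and it is not the bootstrapper described here: the runtime keys, template contents, file paths and the `scaffold` function are all hypothetical, and the templates are pared down far below what a production-ready service would need. It shows only the core mechanism, where a set of answers is rendered into a Dockerfile, a Kubernetes manifest and a Cloud Build pipeline file.

```python
from string import Template

# Hypothetical per-runtime defaults selected by the team's answers.
RUNTIMES = {
    "node-typescript": {
        "base_image": "node:lts-alpine",
        "start_command": '["node", "dist/index.js"]',
    },
    "node-javascript": {
        "base_image": "node:lts-alpine",
        "start_command": '["node", "index.js"]',
    },
}

# Hypothetical, minimal templates keyed by the path they are written to.
TEMPLATES = {
    "Dockerfile": Template(
        "FROM $base_image\n"
        "WORKDIR /app\n"
        "COPY . .\n"
        "CMD $start_command\n"
    ),
    "k8s/deployment.yaml": Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $app_name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: $app_name
  template:
    metadata:
      labels:
        app: $app_name
    spec:
      containers:
        - name: $app_name
          image: $image
"""),
    "cloudbuild.yaml": Template("""\
steps:
  - name: gcr.io/cloud-builders/docker
    args: ["build", "-t", "$image", "."]
images:
  - $image
"""),
}


def scaffold(answers: dict) -> dict:
    """Render every template for the chosen runtime as {relative_path: contents}."""
    values = {
        **RUNTIMES[answers["runtime"]],
        "app_name": answers["app_name"],
        "image": answers["image"],
    }
    return {path: template.substitute(values) for path, template in TEMPLATES.items()}
```

Because `Template.substitute` raises an error on any placeholder left unfilled, a missing answer fails loudly instead of producing a half-rendered file.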

The bootstrapper currently supports the following languages that it will scaffold:

  • Node.js + TypeScript + Yarn
  • Node.js + JavaScript + Yarn
  • .NET Core 2.0 + C#

Flow of creating a bootstrapped application or service on GCP

The gif below shows me running our app bootstrapper and the options currently available for a team. In time we will expand on the offerings available.

Once the application is committed to Github it will automatically run through a CI/CD pipeline. By default this generated pipeline will go through OWASP ZAP tests and automated compliance testing using InSpec before being deployed to the team’s project environments in GCP (detailed in step 1).
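
The important property of a pipeline like this is that deployment is gated: it only happens if every earlier stage passes. A minimal sketch of that ordering in Python follows. The stage names and the `run_pipeline` function are hypothetical, and the stages are plain callables standing in for the real OWASP ZAP, InSpec and deployment steps, which this sketch does not invoke.

```python
from typing import Callable

# A stage is a name plus a callable that returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> tuple[bool, list[str]]:
    """Run stages in order and stop at the first failure.

    Returns whether the whole pipeline succeeded and the names of the
    stages that were actually executed.
    """
    executed = []
    for name, stage in stages:
        executed.append(name)
        if not stage():
            return False, executed  # later stages, such as deploy, never run
    return True, executed


# Stand-in stages: each lambda would wrap the real tool in practice.
example_stages = [
    ("security-scan", lambda: True),
    ("compliance-tests", lambda: True),
    ("deploy", lambda: True),
]
```

Placing the deploy stage last in the list is what guarantees that a failed scan or compliance test blocks the release.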

So what next?

Our platform offering will never be “done”, but we have made huge strides in automation, infrastructure provisioning and application bootstrapping, which gives us a great starting point. This enables our product engineering teams to significantly decrease their cycle times to deliver value to our customers. Previously this work would have taken many months, with many manual handoffs in the value stream. We’re now in a much stronger position, with a self-serve developer experience for provisioning fully immutable infrastructure as code.

Simplicity of use for product engineers has, understandably, resulted in managed complexity for our platform engineers, and that complexity isn’t without cost. To minimise this we’ve mostly leant towards the adoption of cloud native offerings, sometimes referred to as “no-ops”, and so we’ve shifted the operational burden onto the Google Cloud Platform where possible. An opinionated approach like this may seem restrictive; if in time we have a valid reason to move away from managed services, we will weigh that decision against the operational burden it would place on the team. In time we would also look to support serverless workloads such as Google Cloud Functions or Google Cloud Run.
