SafetyCulture’s Development Environment using Kubernetes
At SafetyCulture, we have constantly had to tackle the problem of providing scalable, reliable, cost-effective, up-to-date, on-demand development environments that increase the pace of development without compromising quality. In this post, part of our “journey with Kubernetes” series, we cover how we now provide such environments to our engineers.
Providing environments that replicate our production environment allows engineers to confidently and rapidly test isolated changes without impacting our customers.
Evolution of SafetyCulture’s Development Environments
Before SafetyCulture’s journey with Kubernetes, our development environments evolved through the following stages:
- Hosting a single environment shared by all engineers for development.
- Creating a virtual machine containing all services and using Vagrant to run everything locally.
- Using Docker Compose to run all services locally.
As the number of services (now 150+) and engineering teams grew, these environments did not scale, and with each iteration we kept facing the same challenges:
- Environments were not kept up to date with the latest production versions.
- Getting new engineers up and running with their environments was slow and tedious. The process was brittle, failed regularly, and was often hard to debug without specialised knowledge of the entire setup.
- Engineers were unable to run the full stack of services locally due to resource constraints.
- Engineers were unable to quickly demo changes to other team members, particularly when working remotely, because all changes were running locally.
- A single shared environment caused constant conflicts between teams, which made it hard to coordinate thorough testing before production releases.
Development Environments using Kubernetes
Our next generation of development environments runs multiple environments in a single Kubernetes cluster, using namespaces as a logical separator for individual environments dedicated to teams or engineers. Each environment (namespace) contains all the services and data stores required to run SafetyCulture’s platform, while the underlying Kubernetes infrastructure is shared across environments, as seen in the diagram below.
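To make the namespace-per-environment idea concrete, here is a minimal sketch of how a Namespace manifest could be generated for each team or engineer. The `dev-` prefix, the `environment` and `owner` labels, and the function name are assumptions made for illustration, not SafetyCulture's actual conventions.

```python
import json
import re

# Kubernetes namespace names must be valid DNS labels: lowercase
# alphanumerics or '-', starting and ending with an alphanumeric,
# and at most 63 characters long.
_DNS_LABEL = re.compile(r"^[a-z0-9]([-a-z0-9]*[a-z0-9])?$")


def namespace_manifest(owner: str, prefix: str = "dev") -> dict:
    """Build a Namespace manifest for one team's or engineer's environment.

    The naming scheme and labels are hypothetical.
    """
    slug = owner.strip().lower().replace(" ", "-").replace("_", "-")
    name = f"{prefix}-{slug}"
    if len(name) > 63 or not _DNS_LABEL.match(name):
        raise ValueError(f"invalid namespace name: {name!r}")
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            # Labels make it easy to select every development environment.
            "labels": {"environment": "development", "owner": slug},
        },
    }


if __name__ == "__main__":
    for owner in ["payments", "Mobile Team"]:
        print(json.dumps(namespace_manifest(owner), indent=2))
```

The JSON output can be piped to `kubectl apply -f -`, since kubectl accepts JSON as well as YAML manifests. In practice the namespace would be only the first step, followed by bootstrapping data stores and deploying services into it.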
To create the cluster and namespaces, we reused the tools and methods discussed in The SafetyCulture journey to Kubernetes — Part 2:
- Terraform to manage the Kubernetes cluster and any additional cloud resources.
- Additional custom scripts to perform bootstrapping of our data stores.
- Helm and Helmsman to deploy services with internal Helm charts.
With this setup, we can easily create new environments and scale out across multiple Kubernetes clusters as our engineering organisation continues to grow.
The Impact of Kubernetes Development Environments at SafetyCulture
- Each team has full control over the state of their namespace with the ability to deploy or update any version of any service without impacting other team namespaces. This gives teams confidence that there are no other changes that could be impacting their tests during early stages of the development process.
- Engineers can make changes to services rapidly and get fast feedback. We have built additional tooling on top of Telepresence that lets engineers run any combination of services locally as if they were running in their namespace, with bi-directional communication with the rest of the services, as illustrated below.
- New engineers can begin developing much sooner, as new namespaces can be provisioned within an hour. Engineers can focus on running a single service locally without having to configure all of its dependencies, which has significantly reduced the time required to onboard new engineers.
- Engineers can be confident that their namespace reflects our production environment, as we have automation in place to ensure that changes deployed to production by other teams are automatically rolled out to all team development environments.
- We can create environments for engineers to experiment with and test new ideas, which we regularly do as part of our company hackathons.
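The automation that keeps namespaces in line with production is not described in detail here. As a rough sketch of the idea, one could compare the versions running in a namespace against production and update only the services that differ. The function name, the version maps, and the notion of "pinned" services (versions a team has deliberately deployed and does not want overwritten) are all assumptions made for illustration.

```python
from __future__ import annotations

from typing import Iterable


def plan_updates(
    production: dict[str, str],
    namespace: dict[str, str],
    pinned: Iterable[str] = (),
) -> dict[str, str]:
    """Return {service: target_version} for services that need updating.

    A service is updated when it is missing from the namespace or runs a
    different version than production, unless the team has pinned it.
    """
    skip = set(pinned)
    return {
        service: version
        for service, version in production.items()
        if service not in skip and namespace.get(service) != version
    }


if __name__ == "__main__":
    # Hypothetical service names and versions.
    prod = {"api": "1.4.0", "auth": "2.0.1", "reports": "0.9.0"}
    team = {"api": "1.3.0", "auth": "2.0.1"}
    for service, version in plan_updates(prod, team, pinned={"api"}).items():
        print(f"update {service} to {version}")
```

Respecting pinned services is one way to reconcile automatic updates with teams keeping full control over their namespace; whether the real automation works this way is not stated in this post.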
Overall, this setup has significantly improved the productivity of our engineers here at SafetyCulture, and it is now a critical resource for the Engineering team. We currently have ~30 active environments running in a single Kubernetes cluster, with over 5000 pods running daily.
Lastly, I’d like to share a few tips for those looking to embark on this journey:
- Create tooling that makes it as easy as possible for engineers to get set up and to interact with their environments during their day-to-day workflows.
- From an infrastructure point of view, use spot instances where possible (if running in AWS) and scale the replica count of deployments to 0 when environments are not being used. This can dramatically reduce the cost of running such environments.
- Ensure adequate training is provided to teams on how they can use and interact with these environments. Onboarding teams to use the environment can be a tough process and it is essential to listen to the feedback provided by developers when they use the environments.
- Finally, run a daily test of the environment creation process to ensure that any issues with the process are identified and resolved as quickly as possible.
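As an illustration of the scale-to-zero tip above, the sketch below builds `kubectl scale` commands for namespaces that have been idle for a while. How "last active" is tracked, the 12-hour threshold, and the namespace names are assumptions; only the `kubectl scale deployment --all --replicas=0` invocation is standard kubectl.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def scale_down_commands(
    last_active: dict[str, datetime],
    now: datetime,
    idle_after: timedelta = timedelta(hours=12),
) -> list[list[str]]:
    """Build one kubectl command per namespace idle for at least idle_after."""
    commands = []
    for namespace, seen in sorted(last_active.items()):
        if now - seen >= idle_after:
            commands.append([
                "kubectl", "scale", "deployment", "--all",
                "--replicas=0", "--namespace", namespace,
            ])
    return commands


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Hypothetical activity data; a real setup would need its own signal.
    activity = {
        "dev-payments": now - timedelta(hours=30),
        "dev-mobile": now - timedelta(hours=1),
    }
    for command in scale_down_commands(activity, now):
        # Printed rather than executed; pass to subprocess.run to apply.
        print(" ".join(command))
```

This only covers Deployments. Data stores running as StatefulSets would need the same treatment, and scaling back up requires remembering the original replica counts or redeploying the charts.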
At SafetyCulture, we are excited to come to work and proud of the value we bring to the business. We are hiring!