The road to better onboarding for tech developers

Published in Boozt Tech · 10 min read · Mar 30, 2022

Written by: Peter Lind, Staff Engineer at Boozt

When I started at Boozt in mid-2018, it took me more than a day to get a local environment set up and see the system I was going to work on in a browser. Becoming productive was not straightforward; there were quite a few stumbling blocks along the way.

First, we had several different environments, some based on Docker and some on Vagrant. Second, the guides for getting everything installed and working were slightly outdated, and for some systems there were no guides at all: you asked your teammates how to get things working and they helped you set things up. Third, while development environments were reasonably close to staging and production, they weren't built from the same pipeline or the same base, so there was room for divergence, and keeping them updated became an issue as well.

Development environments worked, but they were flawed and caused developer unhappiness: your environment is supposed to support your work, not get in the way. Left unchecked, this kind of thing gets expensive, in wasted time, lost motivation, or even developers leaving. The key thing to remember is that the best infrastructure is the one you don't notice (also: make sure to let your SRE team know you appreciate their work; they work very hard to make sure you don't notice it).

A knock-on issue we saw was related to the division of labour: our developers used the tools we had in place for setting up environments but didn't maintain them; that fell to our engineering/SRE team, which in turn didn't use the tools on a day-to-day basis. This created friction between teams: it's annoying to have an itch you can't scratch, and equally annoying to have to support something that isn't a key concern for your own team.

Where we wanted to go

The scenario you’re aiming for in development is a one-button-press CI/CD system: after you’re done coding, you push a button (git push in this case) and automated tests run — and in the same vein, with one push of a button you deploy your software. No fiddling around with scripts, no manual process to get lost in: if your code is bad, the tools tell you. If it’s good, the tools get it live without you having to worry about how.

That’s what we wanted for our development environment: a tool that would automatically set up your environment for you, bootstrapping data and running any prep scripts and commands needed. Getting new devs onboarded with the systems they needed to work on, plus data for it, should take a few hours tops, and should be something that could run in the background without interaction for the majority of that time.

Automated checking and fixing of environment

However, once you get something like this in place, it's not done and dusted. It's vital that you keep updating your tooling, both to fix bugs and improve performance and to keep up with constant technological change. New versions of the tools you rely on are published all the time, and the versions you've been using will get security updates or go EOL. To support this change, you need to do the same for your own tooling that you do for your end product: automated tests checking that your features work properly and that you don't introduce regressions.
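As an illustration of what that can look like, a check such as "is the installed tool new enough?" can be covered by an ordinary Go unit test that runs in CI. The helper below is a hypothetical sketch, not our actual code:

```go
package envcheck

import (
	"fmt"
	"testing"
)

// meetsMinimum reports whether an installed tool version satisfies the
// minimum version we require. Hypothetical helper shown only for this
// sketch; it compares simple "major.minor" strings.
func meetsMinimum(installed, minimum string) bool {
	var iMaj, iMin, mMaj, mMin int
	if _, err := fmt.Sscanf(installed, "%d.%d", &iMaj, &iMin); err != nil {
		return false
	}
	if _, err := fmt.Sscanf(minimum, "%d.%d", &mMaj, &mMin); err != nil {
		return false
	}
	return iMaj > mMaj || (iMaj == mMaj && iMin >= mMin)
}

// TestMeetsMinimum guards the version check against regressions.
func TestMeetsMinimum(t *testing.T) {
	cases := []struct {
		installed, minimum string
		want               bool
	}{
		{"20.10", "19.03", true},
		{"19.03", "19.03", true},
		{"18.09", "19.03", false},
		{"garbage", "19.03", false},
	}
	for _, c := range cases {
		if got := meetsMinimum(c.installed, c.minimum); got != c.want {
			t.Errorf("meetsMinimum(%q, %q) = %v, want %v", c.installed, c.minimum, got, c.want)
		}
	}
}
```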

A point of focus was also that we wanted our developers to be able to take part in both maintenance and improvements. We wanted a culture of “If you’ve got a problem, fix it! Don’t just get annoyed, scratch that itch!”. There were several reasons for this — not wanting to block improvements or bug fixes to our environment, wanting to spread the job of maintaining the project across more people, and generally instilling a feeling of ownership in the teams.

How we’re getting there

A couple of our requirements, specifically cross-platform support and developer ease of use, led us to create a CLI binary, using Go as the technology of choice.

The reasons were fairly simple: 1) we had a very experienced Go developer taking on the first part of the project, and 2) with Go you get cross-platform support and easy distribution out of the box. This has helped immensely with getting the first and most important part installed on developer computers: the tool that runs all the other tools.

In general, using Go as the primary technology for this project has been a boon: there are a lot of solutions in place for common problems, the standard library is very helpful, and there is a big community around Go. That said, Go has not been a silver bullet, for two reasons: 1) Go and the Go philosophy at times have their own shortcomings (like expecting everyone to recreate the same boilerplate helpers in their own code when something doesn't fit neatly into the language), and 2) the nature of our problem means that a lot of the solutions we need lie not within Go but in the developers' operating systems. Making sure that Docker is installed on a machine requires different solutions on different operating systems. Go will help you detect which system it's running on, but you still need to handle the rest.
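To give a rough idea of what handling that looks like in Go, here is a minimal sketch; the function is hypothetical and only hints at what the real tool does (which also offers to fix the problem, not just report it):

```go
package envcheck

import (
	"errors"
	"os/exec"
	"runtime"
)

// ensureDocker checks that the docker CLI is on the PATH and, if it isn't,
// returns a hint appropriate for the developer's operating system.
// Hypothetical sketch only.
func ensureDocker() error {
	if _, err := exec.LookPath("docker"); err == nil {
		return nil // docker binary found, nothing to do
	}

	switch runtime.GOOS {
	case "darwin":
		return errors.New("docker not found: install Docker Desktop for Mac")
	case "linux":
		return errors.New("docker not found: install docker via your distribution's package manager")
	case "windows":
		return errors.New("docker not found: install Docker Desktop for Windows")
	default:
		return errors.New("docker not found and this OS is not covered by the sketch")
	}
}
```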

This is really the key to the project: we're encapsulating the complexity of setting everything up, the combined knowledge of however many wiki pages of troubleshooting, in one binary that handles setup, day-to-day tasks, troubleshooting, and whatever other needs devs may have. It allows us to install the proper tools, in versions that we know work, and to run them in the same way across different machines. When things break or just don't work, we can debug known environments instead of first having to establish what's actually installed on a machine.

The architecture we went for is a CLI app built on a popular CLI library. Major command areas are sectioned into their own packages: database operations (data download, import, post-import jobs), data synchronisation (different strategies depending on OS), service orchestration (starting, stopping, handling project dependencies), environment checks, updates, and so on. Supporting functions and logic for environment checks and fixes have their own packages as well, for better modularity and testing.
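The library we built on isn't named here, so purely as an illustration, this sketch uses github.com/spf13/cobra to show how major command areas can map onto subcommands; the command names and behaviour are made up:

```go
package main

import (
	"fmt"
	"os"

	"github.com/spf13/cobra"
)

func main() {
	root := &cobra.Command{Use: "devenv", Short: "Developer environment tool (illustrative)"}

	// Each major command area lives in its own package in a real layout;
	// they are inlined here to keep the sketch self-contained.
	root.AddCommand(
		&cobra.Command{
			Use:   "db",
			Short: "Database operations: download, import, post-import jobs",
			RunE: func(cmd *cobra.Command, args []string) error {
				fmt.Println("downloading and importing data...")
				return nil
			},
		},
		&cobra.Command{
			Use:   "services",
			Short: "Start, stop and orchestrate project dependencies",
			RunE: func(cmd *cobra.Command, args []string) error {
				fmt.Println("starting services...")
				return nil
			},
		},
		&cobra.Command{
			Use:   "check",
			Short: "Run environment checks and fixes",
			RunE: func(cmd *cobra.Command, args []string) error {
				fmt.Println("checking environment...")
				return nil
			},
		},
	)

	if err := root.Execute(); err != nil {
		os.Exit(1)
	}
}
```

In a layout like this, each RunE would simply delegate into its own package, which keeps the root command small and the command areas independently testable.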

One of the benefits of this modular approach is that it becomes easier to string actions together. To get a one-button-push setup you need to handle a lot of separate processes; how many depends on the project you're orchestrating. One of our projects has nine service dependencies that need to be started up, and about half of them have databases attached. So to get this project running, a developer doing manual setup would have to start all ten projects, download and import data, run post-import actions, and make sure that other dependencies (pubsub, datastore, search engine, etc.) were also in place, as well as making sure that container environments were properly configured. All of this is abstracted away for the developer: the first time around, everything is orchestrated with one command, and after that you start or stop those services with another single command.
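A sketch of how those separate processes can be strung together behind a single command; the step functions are hypothetical placeholders, not our actual implementation:

```go
package setup

import "fmt"

// step pairs a human-readable description with the action to run.
type step struct {
	name string
	run  func() error
}

// firstTimeSetup runs every step needed to go from a clean machine to a
// running project. The steps are illustrative; a real tool would derive
// them from the project being orchestrated.
func firstTimeSetup() error {
	steps := []step{
		{"check and fix environment", checkEnvironment},
		{"pull container images", pullImages},
		{"download data dumps", downloadData},
		{"import data and run post-import jobs", importData},
		{"start services and dependencies", startServices},
	}
	for _, s := range steps {
		fmt.Printf("==> %s\n", s.name)
		if err := s.run(); err != nil {
			return fmt.Errorf("%s failed: %w", s.name, err)
		}
	}
	return nil
}

// Placeholders so the sketch compiles.
func checkEnvironment() error { return nil }
func pullImages() error       { return nil }
func downloadData() error     { return nil }
func importData() error       { return nil }
func startServices() error    { return nil }
```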

As should be obvious, one of the big issues in managing this is the number of parts involved. There are many different pieces of software that need to be installed and configured together; on top of that, container images and composer configs need to be managed as well, and kept in sync with each other. To handle this, both our binary and all the assets for a release are tagged, so even though only the binary is distributed when it updates itself, the correct versions of all resources are pulled down. The way this is implemented also allows developers to easily roll back to an earlier version if there was a problem with a release. Equally important, it allows us to do early testing, as the system supports downloading release candidates through the same mechanism. That lets us add and test features on several machines across OSes, and thus rapidly release upgrades to our environment.
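The core of the idea can be sketched in a few lines: the binary knows its own version tag and fetches the assets published under the same tag. The URL layout and names below are made up for illustration, not our actual release infrastructure:

```go
package update

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

// version is stamped into the binary at build time, e.g. with
// -ldflags "-X <module path>/update.version=v1.4.2".
var version = "dev"

// fetchAssets downloads the asset bundle tagged with the same version as the
// binary, so images, configs and scripts always match the code that uses
// them. The URL is a made-up example.
func fetchAssets(tag string) error {
	url := fmt.Sprintf("https://releases.example.com/devenv/%s/assets.tar.gz", tag)
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s for %s", resp.Status, url)
	}

	out, err := os.Create("assets.tar.gz")
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}
```

Pointing the same function at a release-candidate tag is what makes it cheap to let a single developer try a fix before it rolls out to everyone.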

Having developers test environment upgrades and fixes locally before we deploy them to everyone has proven very helpful in moving us forward. Using the same mechanism for updating to a release candidate and to a new version makes it incredibly easy to fix an issue a single developer is seeing, without troubling anyone else while the work is ongoing. No need to manually download a version or worry about the rest of the software matching up, no need to package up a zip file and send it around. Looking back at the design decisions made at the beginning of the project, that one, along with tagging all resources uniformly, was definitely a good choice.

The project has provided a lot of learnings for us. Some have been tied to architectural choices or the software used. For instance, at first we tied container image building in with building the app itself. This made sense, because it pegged the container image version to that of the app. After using this for a good while, a couple of things became apparent: we were needlessly rebuilding images when the app changed without a change in image configuration, and if we wanted to use the images for more than local development we would need to decouple image generation from app generation. This has spawned a new project to ensure we can update images when we need to, independently of the environment app, with the further aim of streamlining images from development across CI and test to production.

Another area of learning is cultural: while it's been a goal from the start to democratise this project, it's not easy to get developers used to owning their environment like this. Spreading project ownership beyond the core team where it started has been hard, and while some developers have been interested in both improving things and fixing bugs, for almost all developers it remains a piece of software that should just work. At times it can even be hard to get feedback on what isn't working and should be improved. This probably remains the biggest ongoing issue we're facing with the project, and something we'll keep exploring solutions for.

… And beyond

When we started out, we had dreams — a list of features that would solve our problems and remove the obstacles that could make development bothersome. Along the way we’ve realised some of these dreams and made things a lot better and easier for our developers. Some features have yet to be implemented but are still planned, waiting to improve the development experience. And as always happens when you start on a project, you really only become aware of the possibilities as you go.

So in order to make development even nicer we’re planning to look at — among other things — the following features:

Automated end-to-end testing of the tools

Being able to run environment setup through from start to finish, verifying that the development environment comes up and serves requests from the local data dump … what's not to love. It's just another practice we take for granted when developing the main product, but often forget when the time comes to take care of the development environment. Not only does it provide assurance that the next new hire will be able to get up to speed fast, it also provides key insights into performance, so we can optimise the onboarding process.

Coupling with the cloud

In much the same way that the local development environment needed a do-over, testing environments could use a helping hand. What we realised is that there's essentially very little difference between the local environment and the test environment: the big difference is which machine is running it and how you're connected to it. Previously we relied on virtual machines set up with environments ready, but this has always caused issues, as it's a lot of work to keep them updated, not to mention the topic of sharing machines between separate projects. However, if you already happen to be orchestrating a bunch of services, it becomes possible to convert that into a cloud configuration you can run on Kubernetes. And suddenly you've got on-demand test servers with public interfaces that you can share with the rest of your business, at the push of a button.
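As a hypothetical sketch of that conversion, the same minimal service description the tool already uses locally could be rendered into a Kubernetes manifest; the types and template below are illustrative, not our actual implementation:

```go
package cloud

import (
	"os"
	"text/template"
)

// Service is the minimal description the orchestrator already holds for each
// local dependency; reusing it to emit cloud config is the whole point.
type Service struct {
	Name  string
	Image string
	Port  int
}

// deploymentTmpl is a deliberately minimal Kubernetes Deployment template,
// just enough to show the idea.
var deploymentTmpl = template.Must(template.New("deployment").Parse(`apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{.Name}}
spec:
  replicas: 1
  selector:
    matchLabels:
      app: {{.Name}}
  template:
    metadata:
      labels:
        app: {{.Name}}
    spec:
      containers:
        - name: {{.Name}}
          image: {{.Image}}
          ports:
            - containerPort: {{.Port}}
`))

// WriteManifest renders a Deployment for one service to stdout.
func WriteManifest(svc Service) error {
	return deploymentTmpl.Execute(os.Stdout, svc)
}
```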

Better diagnostics support

The counterpart to “It works on my machine” is “It fails on my machine”: when you’re no longer counting your devs on two hands and OSes on one finger, getting proper debug information from installations becomes key. We are lucky enough to not have been bitten by this too much, but it’s just a question of time before you’re really wasting resources doing remote debugging of environments. So, one of the upcoming projects is automated diagnostics gathering from machines running our CLI, so we don’t have to try to guess at which questions to ask.
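A minimal sketch of what such diagnostics gathering might look like; the report fields and probes are hypothetical:

```go
package diagnostics

import (
	"os/exec"
	"runtime"
	"strings"
)

// Report collects basic facts about the machine running the CLI, so that
// support doesn't have to guess which questions to ask.
type Report struct {
	OS            string
	Arch          string
	DockerVersion string
	GoVersion     string
}

// Collect gathers a diagnostics report. Probes that fail simply leave their
// field empty; the point is to capture whatever is available.
func Collect() Report {
	return Report{
		OS:            runtime.GOOS,
		Arch:          runtime.GOARCH,
		DockerVersion: commandOutput("docker", "version", "--format", "{{.Client.Version}}"),
		GoVersion:     runtime.Version(),
	}
}

func commandOutput(name string, args ...string) string {
	out, err := exec.Command(name, args...).Output()
	if err != nil {
		return ""
	}
	return strings.TrimSpace(string(out))
}
```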

We’re not done improving our development environments yet, and I doubt we’ll ever be. That’s as it should be — we won’t ever be done developing our main products either. The key point is instead: make sure tomorrow will be better than today. Keep removing the obstacles that annoy your developers. Keep making the day to day tasks easier, so people can focus on getting work done, instead of working around buggy environments.

That’s the real drive — helping the teams, seeing people happier as things just get better.

If you enjoyed this article, and want to read more great stories from the Boozt platform team, be sure to subscribe to the Boozt Tech publication!

Or perhaps you’re interested in joining our platform team? Then check out our careers page.
