Team Evolution 2020 Roundup

Adam Connelly
ResDiary Product Team
Dec 18, 2020

2020 has been one hell of a wild ride. The year definitely didn’t go to plan, and it’s been full of highs and lows. Despite all the troubles and adversity we’ve had to deal with, I think we’ve done a pretty good job at ResDiary of supporting our customers through this period and doing what we needed to do for the company to survive, while still managing to make improvements to our software and the way that we work.

In this post, I want to provide a quick recap of some of the projects that my team, Team Evolution, has worked on, and to highlight some of our achievements.

Migrating to new Kubernetes Clusters

We were quite early adopters of Azure Kubernetes Service (AKS), and had created our original clusters not long after AKS became available. At the start of the year, we wanted to take advantage of a number of new AKS features, in particular support for RBAC and Node Pools. Unfortunately, Microsoft didn’t provide a straightforward migration process, and to gain Node Pool support in particular, we had to rebuild our clusters from scratch.

Because of this, as well as creating new AKS clusters, we had to come up with a plan for migrating all of our existing applications, taking into account the following:

  • Which applications were stateful vs stateless.
  • How to manage DNS changes to avoid any downtime.
  • How to roll back in the case of problems.
  • Transitioning our CI/CD systems over to the new clusters.

We started with a test migration, setting up new clusters and deploying our applications to them without removing anything from the existing clusters. Once we were happy with the plan, we migrated our development environments first, and then finally production. We migrated each environment over a two-week period, and everything went to plan with no issues.

Another benefit — our AKS clusters are now defined using Terraform, making it much easier for us to create new clusters as we need them!

Reducing Resource Usage at the Start of Lockdown

Almost immediately after the high of our successful AKS migration, the UK went into lockdown in March. We started to see the number of active customers drop as restaurants were forced to close. The entire business worked together to reduce costs, with my team figuring out which pieces of infrastructure we could rapidly scale down. We had to strike a balance between reducing resource usage and avoiding any impact on customers who were still using their diaries.

I’d like to say thank you to everyone involved for all their hard work during this difficult time. Thanks to their efforts, we managed to cut our infrastructure costs almost in half during the worst periods of lockdown over the summer. In particular, I’d like to give a shout out to our intern Nici. Nici was in the middle of a year-long placement with us when we went into lockdown, and this ended up being one of the last tasks he worked on. Nici is finishing his final year at university now, but hopefully we’ll get a chance to work with him again in future!

Azure Data Factory

Over the summer we began to experiment with using Azure Data Factory for a few scenarios where we needed to extract large amounts of data for certain partners, or where we needed to pull data from multiple sources to perform analysis. Our database expert, Paul McHenry, used it to create a number of scheduled exports for one of our partners. The jobs he created have been running happily with no problems ever since, so well done Paul on some great work!

Switching to VMSS Instances

During the summer of 2019 we began experimenting with Virtual Machine Scale Sets (VMSS) as a way to deploy some applications that were running on manually provisioned VMs but would be difficult for us to move to Kubernetes in the short term. After running some of our lower-volume environments on VMSS instances, we decided to migrate our Australian and UK web servers during 2020.

Scale Sets let you supply a VM image along with custom provisioning scripts that run when each VM in the set is created or upgraded. Those scripts are how your application gets deployed to the VMs. This gives you consistent deployments, along with the ability to scale to meet user demand.
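To make the “scale to meet user demand” part a bit more concrete, here’s a minimal sketch of the kind of threshold-based rule an autoscale setting applies to a scale set. This isn’t our actual configuration. The function name, CPU thresholds, step size and instance limits are all hypothetical values picked for illustration.

```python
def next_instance_count(
    current: int,
    avg_cpu_percent: float,
    minimum: int = 2,  # hypothetical lower limit
    maximum: int = 10,  # hypothetical upper limit
    scale_out_above: float = 70.0,  # hypothetical threshold
    scale_in_below: float = 30.0,  # hypothetical threshold
    step: int = 1,
) -> int:
    """Return the instance count a simple threshold-based autoscale rule would choose."""
    if avg_cpu_percent > scale_out_above:
        desired = current + step  # busy: add instances
    elif avg_cpu_percent < scale_in_below:
        desired = current - step  # quiet: remove instances
    else:
        desired = current  # inside the comfortable band: leave it alone
    # Never go outside the configured instance limits.
    return max(minimum, min(maximum, desired))


if __name__ == "__main__":
    print(next_instance_count(4, 85.0))   # busy, so scale out
    print(next_instance_count(2, 10.0))   # quiet, but already at the minimum
    print(next_instance_count(10, 95.0))  # busy, but already at the maximum
```

A real autoscale setting also averages the metric over a time window and waits for a cooldown period between scale actions. This sketch leaves both out.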

The migrations went to plan, apart from an initial false start with the UK migration. Shortly after we started switching traffic over to the new servers, we noticed increased failure rates and latency, so we quickly rolled back. Big shout out to Elliot from Team Evolution for taking the time to understand the issues we were hitting, working out how to get the metrics we needed to diagnose them, and finally coming up with a solution. Thanks to his investigation, we were able to complete the migration a couple of weeks later with no issues.

Converting Notifications to .NET Core and Kubernetes

We were already running our new applications in Kubernetes, but we also wanted to start converting some of our existing ones. We decided to begin with our “Notifications” application, which sends webhooks to API integrators to give them real-time updates about events such as bookings being created or updated.

The credit for most of the work on this goes to Paul Lang, the Team Lead of Team Curo, along with my team member Lewis. Between the two of them they successfully converted the application, and also added structured logging using Serilog along with metrics using Prometheus. Here’s a screenshot of part of the dashboard Lewis created for it:

The result is an application that’s much easier to deploy, scale and monitor. I particularly want to give a shout out to Lewis for all the effort he put into designing the metrics and alerting we need to make sure the application is healthy.
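As a rough illustration of the kind of alerting logic involved (not the actual rules Lewis designed), here’s a minimal sketch. It compares two samples of cumulative counters and alerts when too many webhook deliveries failed between them. It’s similar in spirit to dividing two `rate()` expressions in a Prometheus alerting rule. The counter names, the 5% threshold and the helper functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Cumulative counter values scraped at one point in time (hypothetical names)."""
    webhooks_sent_total: int
    webhooks_failed_total: int


def failure_ratio(earlier: CounterSample, later: CounterSample) -> float:
    """Fraction of webhook deliveries that failed between two samples."""
    sent = later.webhooks_sent_total - earlier.webhooks_sent_total
    failed = later.webhooks_failed_total - earlier.webhooks_failed_total
    if sent <= 0:
        # No traffic in the window, or the counters were reset;
        # this sketch simply treats both cases as "nothing to alert on".
        return 0.0
    return failed / sent


def should_alert(earlier: CounterSample, later: CounterSample, threshold: float = 0.05) -> bool:
    """Fire when more than `threshold` of the deliveries in the window failed."""
    return failure_ratio(earlier, later) > threshold


if __name__ == "__main__":
    before = CounterSample(webhooks_sent_total=1000, webhooks_failed_total=10)
    after = CounterSample(webhooks_sent_total=1200, webhooks_failed_total=30)
    print(failure_ratio(before, after), should_alert(before, after))
```

Alerting on a ratio rather than a raw failure count means the alert behaves the same way whether it’s a quiet Monday afternoon or a busy Saturday night.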

Other Teams Using Metrics

This isn’t really a concrete task that Team Evolution worked on in 2020, but I feel it shows that the work we’ve done on metrics and logging over the past couple of years is paying off and becoming useful to the wider development team. We’ve started to see other teams thinking about metrics and building them into their applications right from the start, which I think is really exciting!

Helm 2 to Helm 3 Migration

We use Helm to deploy applications to our Kubernetes clusters, which lets us template our Kubernetes resources based on the environment and region they’re being deployed to. Helm 2 was due to be deprecated in November this year, so we switched to Helm 3 ahead of that date.
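To illustrate templating per environment and region, here’s a minimal sketch of how layered values can be resolved. It starts from a base set of defaults, applies environment-specific overrides, then applies region-specific ones. Helm itself does this when you pass several `-f`/`--values` files: later files take precedence, and nested maps are merged. The Python below only mimics that merge, and the keys and values are made-up examples rather than our real charts.

```python
from copy import deepcopy
from typing import Any


def merge_values(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Recursively merge `override` into a copy of `base`.

    Nested dicts are merged key by key. Any other value (including a list)
    in `override` replaces the one in `base`.
    """
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def resolve_values(*layers: dict[str, Any]) -> dict[str, Any]:
    """Apply layers left to right, so later layers win (like repeated -f flags)."""
    resolved: dict[str, Any] = {}
    for layer in layers:
        resolved = merge_values(resolved, layer)
    return resolved


# Hypothetical values for illustration only.
defaults = {"replicaCount": 2, "image": {"repository": "notifications", "tag": "latest"}}
production = {"replicaCount": 4, "image": {"tag": "1.2.3"}}
uk_region = {"ingress": {"host": "uk.example.com"}}

values = resolve_values(defaults, production, uk_region)
print(values)
```

Keeping each environment and region file small, containing only what differs from the defaults, makes it easy to see at a glance what’s special about each deployment.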

There isn’t really much to say about this, other than that the migration was completed ahead of the deprecation date with no issues. We’ve noticed a few useful improvements in Helm 3, like better validation of Kubernetes objects. Unfortunately, one of our biggest pain points, installing and upgrading Custom Resource Definitions, still appears to be broken from our perspective. We were really hoping that Helm would handle installing and upgrading CRDs automatically, but sadly we’re still stuck running manual scripts outside Helm to do that.
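For context, Helm 3 installs CRDs from a chart’s `crds/` directory the first time a chart is installed. It won’t upgrade or delete them afterwards, which is why separate scripts are still needed. Here’s a minimal sketch of that kind of wrapper. It assumes the CRD manifests live in a local directory and that `kubectl` and `helm` are on the PATH, and it applies every CRD manifest with `kubectl apply -f` before running `helm upgrade --install`. The directory layout, release name and chart path are hypothetical. By default it only prints the commands (a dry run) rather than running them.

```python
import subprocess
from pathlib import Path


def find_crd_files(crd_dir: Path) -> list[str]:
    """List the CRD manifests in `crd_dir` (assumes one *.yaml file per CRD)."""
    return [str(path) for path in crd_dir.glob("*.yaml")]


def build_commands(crd_files: list[str], release: str, chart: str) -> list[list[str]]:
    """Apply every CRD first, then install or upgrade the release with Helm."""
    # Helm 3 only creates CRDs on a fresh install and never upgrades them,
    # so apply them ourselves; `kubectl apply` creates or updates each one.
    commands = [["kubectl", "apply", "-f", path] for path in sorted(crd_files)]
    commands.append(["helm", "upgrade", "--install", release, chart])
    return commands


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, and only execute it when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failure.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical layout: CRDs in ./crds, chart in ./charts/notifications.
    files = find_crd_files(Path("crds"))
    run(build_commands(files, "notifications", "./charts/notifications"))
```

Sorting the files gives a predictable apply order, which makes the script’s output easier to compare between runs.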

Wargames for On-Call Training

In October we began organising wargames for our on-call team. The aim was to improve our ability to deal with support incidents, and also to give us a way to get together remotely and have a bit of fun. Here’s a screenshot of the inaugural games, showing a nice mixture of concern, confusion and amusement:

During 2021 we’ll need to steadily increase the realism and tackle more complex problems, but I think we’re off to a pretty good start, and we’ve definitely learned something from each game.

Welcoming Tho to the Team

Last but not least, we got to welcome Tho to Team Evolution! Tho joined us from our sister company DishCult, and has been working on integrations with third parties. More recently, he has started work on a project to improve the experience for API developers integrating with ResDiary. He’s already made a great impression, and I’m excited to see what we can achieve together next year!

So that’s it for 2020. Despite the difficulties we’ve all faced, I think we can be really proud of our achievements. I’d like to finish by thanking all my colleagues across the entire company for their help, support and dedication this year.

Originally published at https://adamconnelly.github.io.
