Evolution of Jetty Deployments

Linda Kung
Jetty Product & Engineering
7 min read · Jul 1, 2022

Intro

In 2019, Jetty had only two repositories: a frontend monorepo and a backend monorepo. Each repo contained multiple applications that had to be deployed separately.

When deploying to production, developers had to run four commands, one for each of the four EC2 instances of the backend app, for all code changes to apply:

jetty deploy-prod app1
jetty deploy-prod app2
jetty deploy-prod app3
jetty deploy-prod app4

With only two repos and a team of fewer than 10 engineers, this process seemed fine because we ran into very few issues on deployments. It became the status quo. However, as the team expanded beyond 10 engineers, the process became troublesome: an engineer would unknowingly deploy only a single app to the backend, usually on their first ever deployment. This caused production issues that took an entire village to debug, since logging was still primitive in those early days. In addition, maintaining the deployment tool was not a priority for such a small team with a large backlog.

As Jetty products began to expand in 2020, we revisited the monorepo architecture and decided it was time for a change. This also meant revisiting our deployment process and internal tools.

The history

From the beginning, we used CircleCI for our continuous integration and deployment (CI/CD) pipelines, along with an internal command-line interface tool (Jetty CLI). Jetty CLI was packaged in a bin script that developers had to download and install locally. As the products and the team changed, maintaining Jetty CLI proved to be challenging. It was written at the inception of Jetty and could not be easily updated because every change meant updating the binary file in S3. When the file was updated, each developer also had to update their local bin script to take advantage of the new features. As the team grew and our security and permission requirements evolved, different problems started arising for each developer, and maintaining the tool became unsustainable.

In 2020, we started to expand our services and, to help us scale our architecture, we decided to adopt a combination of microservices built with the Flask framework and serverless applications built on AWS Lambda. We would also continue to maintain our existing monorepos, with the goal of deprecating them in the near future. All of the AWS infrastructure, including the serverless applications, was built with AWS's Cloud Development Kit (CDK), which let us define AWS resources in Python and empowered our developers to create infrastructure with code.

We also made a collective decision to eventually wean ourselves off CircleCI and Jetty CLI. We broke this down into a few phases throughout 2020 and 2021.

Starting over

Phase 1: Repository level scripts

Microservices

For our new microservices, we added deploy scripts in each repository instead of extending Jetty CLI. Deployments to our EC2 instances, however, remained the same: package up the code, push it to S3, and then deploy with CodeDeploy via CircleCI. The script lived under a bin/ directory in each repository and was triggered by the developer:

cd flask_api
bin/circle stage # begin the CircleCI workflow that pushes the app to CodeDeploy

Serverless

We used the AWS CDK Toolkit to create and deploy cloud infrastructure, including Lambdas and API Gateways. As with the microservices, our serverless applications used a deploy script at the repository level instead of Jetty CLI. To deploy a serverless stack, developers had to download all the dependencies to their local machine and run:

bin/deploy stage

The underlying script runs cdk deploy, which creates the CloudFormation template and resources in AWS:

cdk deploy stage-serverless-stack -c env=stage

These new scripts did not change the fact that our developers still had to deploy from their terminal. They still had to log in to AWS, keep branches up to date, ensure that CDK was installed, and confirm that the CDK configuration had the correct build and runtime variables. It was a headache to debug different issues on different computers remotely. However, it served us fine as long as the pool of applications stayed small and the documentation stayed up to date.

Phase 2: Github Actions “one click deploys”

After a few months of repo-level deployments and the delivery of Jetty Deposit, the SRE team decided to move from CircleCI to Github Actions. With Github Enterprise, we were able to make use of features such as environments, which was in beta at the time.

The vision was a “one click deploy” workflow, meaning the team should never have to pull any code or run any commands in the terminal to deploy a project to our three environments: dev, stage, and prod. There are dozens of events that can trigger a Github workflow; our deployment workflows use workflow_dispatch, the manual trigger. An engineer navigates to the Actions page in the repo, selects the workflow, and simply clicks Run Workflow.
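
At its simplest, the trigger section of such a workflow looks roughly like this (the workflow name and the environment input below are illustrative, not our exact configuration):

name: Deploy
on:
  workflow_dispatch:        # exposes the Run Workflow button in the Actions tab
    inputs:
      environment:
        description: "Environment to deploy to"
        required: true
        default: stage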

For our serverless applications, we created our own self-hosted runner, which is essentially a private EC2 instance with an IAM role attached that grants the permissions needed to run workflows and create resources in AWS. We followed the exact self-hosted runner instructions provided by Github.
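
Once the runner registers itself, a workflow job targets it purely through its runs-on labels, for example (using the default labels the actions-runner applies on registration):

jobs:
  deploy:
    runs-on: [self-hosted, linux, x64]   # routes the job to our private EC2 runner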

Microservices

The Github workflow for microservices didn't change the deployment flow extensively; as the diagram (Diagram 1) below shows, the flow ultimately remained the same. We were able to use Github's ubuntu-latest runner and our existing AWS keys. When Run Workflow is clicked in the repo dashboard, the workflow in the production environment performs the following jobs:

  • Job 1: Check that the actor triggering the workflow is on the engineering team.
  • Job 2: Run the build and unit tests.
  • Job 3: Pause for engineering approval; once approved, deploy.
  • Job 4: Send a notification with the deployment completion and job status.

The biggest process change for us was the approval step, something we had always wanted to include in deployments to production. Without it, deployments were unregulated and could go out at any time without much oversight.
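
A trimmed sketch of what this production workflow looks like, assuming a Github environment named production configured with required reviewers (the helper scripts and job names below are illustrative, not our exact files):

jobs:
  check-actor:
    runs-on: ubuntu-latest
    steps:
      - name: Verify the actor is on the engineering team
        run: ./scripts/check_team_membership.sh "${{ github.actor }}"   # hypothetical helper script
  build-and-test:
    needs: check-actor
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: make build test   # placeholder for our build and unit test commands
  deploy:
    needs: build-and-test
    runs-on: ubuntu-latest
    environment: production   # pauses here until a required reviewer approves
    steps:
      - uses: actions/checkout@v3
      - run: bin/deploy prod
  notify:
    needs: deploy
    if: always()   # send the notification whether the deploy succeeded or failed
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/notify_slack.sh "${{ needs.deploy.result }}"   # hypothetical notification script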

Diagram 1: Flask API production deployment process with manual approval

Serverless

Deploying AWS resources required a self-hosted runner, so we set up an EC2 instance with an IAM role attached and installed the actions-runner on the server. Another fun feature we took advantage of in the workflow_dispatch trigger was choice inputs. Some of our serverless repos contain multiple CloudFormation stacks that are deployed individually. With a choice input, we were able to create a dropdown of options the developer can select from on deployment:

Using Choice Input
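
In workflow terms, the dropdown comes from a choice-type input on workflow_dispatch, along these lines (the second stack name is an example, not one of our real stacks):

on:
  workflow_dispatch:
    inputs:
      stack:
        description: "CloudFormation stack to deploy"
        required: true
        type: choice     # renders as a dropdown on the Run Workflow form
        options:
          - stage-serverless-stack
          - stage-events-stack   # example stack name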

This removed the need to type in an input, prevented typos and saved some time. After selecting which stack to deploy, the workflow would be triggered:

  • Job 1: Check that the actor triggering the workflow is on the engineering team.
  • Job 2: Run `cdk diff…` to compare the specified stack against the deployed stack.
  • Job 3: Pause for engineering approval; once approved, deploy.
  • Job 4: Send a notification with the deployment completion and job status.

Diagram 2: Deployment process for serverless apps with manual approval
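
Sketched as a workflow, the serverless pipeline looks something like this, running on the self-hosted runner and feeding the selected stack into the CDK commands (an approval gate is assumed to be configured on the stage environment; details are simplified):

jobs:
  diff:
    runs-on: self-hosted   # the EC2 runner with the IAM role attached
    steps:
      - uses: actions/checkout@v3
      - run: cdk diff ${{ github.event.inputs.stack }} -c env=stage
  deploy:
    needs: diff
    runs-on: self-hosted
    environment: stage   # required reviewers on the environment provide the approval pause
    steps:
      - uses: actions/checkout@v3
      - run: cdk deploy ${{ github.event.inputs.stack }} -c env=stage --require-approval never   # non-interactive deploy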

Going forward

The transition from CircleCI to Github Actions took about four months. It was a fairly seamless transition once Jetty CLI could no longer be used and developers realized they could deploy with a click from the working repository, without having to install anything on their laptops.

This change did not eliminate existing deployment issues such as failed instances or AWS resource permissions, but troubleshooting became much easier since logs exist on both the Github-hosted and self-hosted runners. We are able to forward those logs to Datadog and set up alerts on them. There is also more transparency about which branches have been deployed and to which environments, which lessens confusion on the team.

However, moving workflows to Github Actions is only the beginning. There are many more features and existing actions that we haven't taken full advantage of yet:

Reusable workflows

For the first iteration, we copied and pasted the workflows into each repository or used composite actions. We are currently moving our workflows into an internal repository so they can be reused by every repository in our organization. This lets us standardize workflows across repos and make updates in a single place.
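
The mechanics are the workflow_call trigger in the shared repository, plus a uses: reference from each consuming repository; a rough sketch (the org and file names below are placeholders):

# Shared repo, e.g. .github/workflows/deploy.yml
on:
  workflow_call:
    inputs:
      environment:
        required: true
        type: string

# Consuming repo, referencing the shared workflow
jobs:
  deploy:
    uses: jetty-org/shared-workflows/.github/workflows/deploy.yml@main   # placeholder org/repo
    with:
      environment: prod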

Multi-account deployments

This is already in progress as we scale out our self-hosted runners. This would allow us to deploy to any Jetty AWS account with a single workflow and utilize existing AWS actions.
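
One likely shape for this is a job matrix over accounts combined with the official aws-actions/configure-aws-credentials action, assuming OIDC-based role assumption; the account IDs, role name, and region below are placeholders:

jobs:
  deploy:
    runs-on: self-hosted
    permissions:
      id-token: write   # needed to assume the role via Github's OIDC provider
      contents: read
    strategy:
      matrix:
        account: ["111111111111", "222222222222"]   # placeholder AWS account IDs
    steps:
      - uses: actions/checkout@v3
      - uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::${{ matrix.account }}:role/github-deploy   # placeholder role
          aws-region: us-east-1
      - run: cdk deploy stage-serverless-stack -c env=stage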

Monitoring

We currently only track logs from Github’s hosted runner and the actions-runner package in our self-hosted runner. We would love to introduce better alerting and logging particularly when we start using only self-hosted runners. This would provide more transparency for the entire engineering team.

Scaling

We also want to scale our self-hosted runners to support multiple concurrent deployments. Github provides a guide to auto-scaling self-hosted runners.

Conclusion

Overall, the move from CircleCI to Github Actions was both a big and a small change for our team. Deployments were not interrupted and very little code had to be adjusted, meaning the migration did not block our developers' workflow. Developers have also been vocal about the ease of ‘one click deployments’ and not having to worry about their own environment when releasing. Being able to monitor deployments directly in Github has lessened some of the pain points for tech leads and the SRE team. We look forward to the many things Github Actions can do for us as we move forward.

See you in the next phase!
