Data Center (DC) app reviews — automating away repetitive & time consuming tasks

This is the journey of how we reduced the cycle time for Atlassian DC app reviews from 2 weeks to just 3 days.

Jerome Rodrigo

Published in

ServiceRocket Engineering

6 min readJul 13, 2022

DC App Reviews

Before we begin, what are DC app reviews?

DC app reviews refer to the process of running performance tests on an app, collecting the results, and submitting them to Atlassian in order to publish a DC compatible app on the Atlassian Marketplace. The same process is also a requirement for existing DC apps, albeit on an annual basis. Atlassian provides their overview of the topic here.

Why the need to scale?

We have a vast offering of ServiceRocket apps on the Atlassian Marketplace. This currently includes 9 Confluence DC apps that have already been approved for DC compatibility and performance.

ServiceRocket Apps on Atlassian Marketplace

With this many apps, taking 1–2 weeks for a single DC app review was not going to cut it. We needed to find a better, more efficient way to handle the review process. Fortunately, Atlassian provides a decent solution.

DCAPT — Data Center App Performance Toolkit

In order to facilitate and standardise the testing process for the vendors of apps on the Marketplace, Atlassian introduced the Data Center App Performance Toolkit (DCAPT). This is a framework that provides automation for setting up an infrastructure stack. The stack is based on the Atlassian Standard Infrastructure (ASI) which is designed to be provisioned on AWS. For more information about DCAPT, visit the GitHub repo here.

While DCAPT is an essential starting point for performance testing DC apps, we found a few issues:

Provisioning is very time consuming, sometimes taking up to 9 hours to provision the Enterprise-scale environment.
Manual provisioning is also error prone. This is worsened by the fact that the guideline spans almost ~100 separate steps.
Keeping a DCAPT cluster running is expensive. Atlassian provides some hourly pricing estimates here, however it means the longer we keep the cluster running, the more cost we have to bear.

Despite these issues, this provided us an opportunity to automate the process. Among the advantages of automating this process:

Each app review goes through the same review process.
Automation can free up engineer time.
Low chance of human error as there are no manual steps involved.
Provisioning can be done outside of working hours.

Hence, we decided that automating this process was the best way forward.

How we did it

Before getting into the details, here’s a brief step-by-step overview of DC performance testing along with the average time taken.

Provision the test environment/instances (~ 9 hours)
Execute the performance tests (~ 45 minutes per test, ~ 4 hours in total)
Generate a report based on the test results (< 1 minute)
Submit the results to Atlassian (< 1 minute)

As you can see, the most time consuming activity is the initial provisioning of the test environment. This is largely due to the size of the dataset used for the test environment. The image below depicts the breakdown of the dataset as per Atlassian requirements:

The next most time consuming step is test execution. Although each test only requires 45 minutes of continuous load testing, it needs to be executed at least 3 times in different cluster scaling configurations to test for app scalability (e.g. 1 node, 2 nodes, and 4 nodes). In addition, we also need to run a regression test to measure the impact of the app being installed vs. not installed on the instance.

Hence, we decided to focus on automating the provisioning and test execution steps of the performance testing process.

Tools of the trade

Choosing the right tool to automate the process was quite straightforward. At ServiceRocket, we rely on Jenkins for CI/CD and AWS as our infrastructure platform. Hence, we decided to choose tools we already have, as we have the skills and procedures already in place without the need for large-scale re-engineering.

The Pipeline

At a high level, our Jenkins pipeline file consists of 3 main stages: provision, execute, and deprovision.

pipeline {
    stages {
        stage('Provision') {
            when {
                expression { params.STAGE == 'provision' }
            }
            steps {
                provisionStack()
            }
        }        stage('Execute') {
            when {
                expression { params.STAGE == 'execute' }
            }
            steps {
                executeTest()
            }
        }        stage('Deprovision') {
            when {
                expression { params.STAGE == 'deprovision' }
            }
            steps {
                deleteStack()
            }
        }
    }
    post {
        cleanup {
            cleanWs deleteDirs: true
        }
    }
}

Provision Stage

This stage is all about getting an enterprise-scale environment up and running. Below is some pseudocode of the provisionStack() function.

def provisionStack() {    createStack()    def outputs = getStackOutputs()    def url = getConfluenceURL(outputs)    def bastionPublicIp = getBastionPublicIp(outputs)    def nodePrivateIp = getNodePrivateIp()    buildImage()    sleep(FIVE_MINUTES_WAIT_TIME)    setupConfluence(url)    populateInstance(bastionPublicIp, nodePrivateIp)    sleep(FIVE_MINUTES_WAIT_TIME)    reindexInstance(url, bastionPublicIp, nodePrivateIp)}

Here are a few highlights from the provisioning process:

createStack()
The first step in the process is to create the initial infrastructure stack. We use the Atlassian Quick Start as a reference for launching the ASI into a new VPC. We exposed the quick start parameters as Jenkins parameters so that an engineer may be able to customise the configuration while starting the build.
setupConfluence(url)
By making use of Python, Chromedriver and Selenium, we’re able to simulate the clicks and user input needed during the initial setup wizard in Confluence. This way, no manual login is required — the script executes without human intervention.
populateInstance(bastionPublicIp, nodePrivateIp)
In this step, we log in to the instance via SSH. The bastionPublicIp is needed here as a jump box. We then mimic the exact steps for database setup as suggested by Atlassian here.
reindexInstance(url, bastionPublicIp, nodePrivateIp)
After the database has been set up, we trigger re-indexing in Confluence based on these steps. To avoid a few manual steps, we used the Confluence REST API to create the new page, and to monitor the indexing status periodically. The API requests are executed using curl.

Execute Stage

The goal of the test execution stage is to run the performance tests and save the results for report generation later on.

def executeTest() {    // Provision AMI instance to run test    def instanceId = provisionTesterInstance()    sleep(ONE_MINUTE_WAIT)    // Retrieve tester instance IP    def instancePublicIP = getInstancePublicIP()    // Run test    runTestsViaSsh(instancePublicIP)    // Upload results to S3    downloadTestResults()    runAWSCLIDockerImage("s3 sync . s3://my-test-results-bucket")    // Deprovision the instance    runAWSCLIDockerImage("ec2 terminate-instances --instance-ids $instanceId")}

Deprovision Stage

The deprovision stage is simple as it’s handled by AWS Cloudformation. We just need to trigger AWS to delete the top-level stack, and the rest of the deletion will be cascaded accordingly.

def deleteStack() {    def stackName = params.stackName    def awsCommand = "cloudformation delete-stack --stack-name $stackName"    runAWSCLIDockerImage(awsCommand)}

Conclusion

Since introducing this automation pipeline into our processes, we’re able to complete DC reviews in just a few days, giving us more time to focus on our apps.

In addition, we are able to scale this automation to ease the transition process to bring more apps to DC.

Future Improvements

Although most of the time-consuming steps are now automated, there are certain steps that require an engineer to execute. Among them are:

Installing and configuring apps
Setting up test data for app-specific actions
Upscaling the cluster via the auto-scaling group
Collecting the test results and generating the report

It would be nice to be able to have a fully automated pipeline, however some of these steps are dependent on the app being tested and may change from time to time. We plan to invest in continuous improvement of the automation but not too much as to avoid diminishing returns. This quote from Bill Gates sums it up quite nicely:

The first rule of any technology used in a business is that automation applied to an efficient operation will magnify the efficiency. The second is that automation applied to an inefficient operation will magnify the inefficiency.
— Bill Gates