Performance Testing AWS Deployments

Shreyas Chaudhari
May 2 · 6 min read
Janjira Fort, Murud, Maharashtra, India. Built in the 15th Century.

The overall architecture of our product is like the fort in the photo above.

Goals

  1. Ninety-nine percent of page loads for users accessing the applications should complete in two seconds or less.
  2. Users are based on three continents: North America (New York), Europe (London), and Asia (Singapore).
  3. ‘n’ concurrent users should be able to use the application without any page-load lag.
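
The first goal can be checked mechanically once page-load times have been collected. The following is a minimal sketch, assuming the timings are available as a list of seconds and that a nearest-rank percentile is acceptable; the function names and the sample data are illustrative only.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank; integer multiplication first keeps the division exact for whole percentages
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_goal(load_times_s, pct=99, threshold_s=2.0):
    """True when the pct-th percentile of the page-load times is within the threshold."""
    return percentile(load_times_s, pct) <= threshold_s


# Hypothetical sample: 100 page loads, one slow outlier
samples = [0.8] * 98 + [1.9, 3.5]
print(percentile(samples, 99), meets_goal(samples))
```

A single outlier out of 100 loads does not break the goal here, because only the 99th-ranked value is compared against the two-second threshold.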
Architecture

Components

1. CRM: Researchers enter the data here.

2. Microsoft SQL Server: hosts two databases.

a. CRMDB

b. UserIdentityDB

3. Scooper: picks up data from the CRMDB and puts it in the Event Store.

4. Subscribers: listen to the Event Store, pick up the data from it, and push it into MongoDB and Elastic Search.

5. Middle Tier: services that pull the data from MongoDB and Elastic Search and expose it to the Front End as REST APIs.

6. Front End: displays the data entered by the Researchers.

Architecture

1. Researchers enter the data in the CRM.

2. The data is stored in the CRMDB database on Microsoft SQL Server.

3. A dedicated table in the CRMDB holds the same data, converted from the original CRMDB schema into the format required for Event Sourcing.

4. The Scooper/Batch Scooper picks up the data from the dedicated table in Step 3 and pushes it to the Event Store.

5. Subscribers pick up the data from the Event Store and push it to MongoDB and Elastic Search.

6. On the other end, there is a service-oriented architecture in which each service fetches the data from Elastic Search and provides it to the front end through REST APIs.

7. The front end is a single-page application that consumes the data sent by the REST APIs and renders the page.

8. A dedicated Identity server manages the identity of the user across all the applications the user can access as part of the subscription. The user identity details are held in a database on the Microsoft SQL Server mentioned in Step 2.
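
To make the write path in steps 3 to 5 easier to picture, here is a toy in-memory model. It is only a sketch: the real components run on .NET Core against Event Store, MongoDB, and Elastic Search, whereas every class, field, and function name below (including the `id` and `title` fields) is a hypothetical stand-in.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    sequence: int   # 1-based position in the store
    payload: dict


@dataclass
class EventStore:
    """In-memory stand-in for the Event Store."""
    events: list = field(default_factory=list)

    def append(self, payload):
        event = Event(len(self.events) + 1, payload)
        self.events.append(event)
        return event

    def read_from(self, position):
        """Return all events after the given number of already-processed events."""
        return self.events[position:]


def scoop(dedicated_table_rows, store):
    """Scooper: push every row of the dedicated CRMDB table into the event store."""
    for row in dedicated_table_rows:
        store.append(row)
    return len(dedicated_table_rows)


class Subscriber:
    """Subscriber: projects events into a document store and a search index."""

    def __init__(self, store):
        self.store = store
        self.position = 0        # how many events have been processed so far
        self.documents = {}      # stand-in for MongoDB
        self.search_index = {}   # stand-in for Elastic Search

    def catch_up(self):
        for event in self.store.read_from(self.position):
            doc_id = event.payload["id"]
            self.documents[doc_id] = event.payload
            self.search_index[doc_id] = event.payload["title"].lower()
            self.position = event.sequence


store = EventStore()
scoop([{"id": 1, "title": "Alpha"}, {"id": 2, "title": "Beta"}], store)
subscriber = Subscriber(store)
subscriber.catch_up()
print(subscriber.position, sorted(subscriber.documents))
```

Tracking a position lets the subscriber be re-run safely: a second `catch_up()` call only processes events appended since the first.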

Technology Stack

1. Scooper/Subscriber/Services — .NET Core

2. Event Store

3. MongoDB

4. Elastic Search

5. Amazon Elastic Search Service

6. ReactJS

7. Amazon EC2

8. TeamCity

Test Environment Configuration

The production-like environment that we used for running our performance tests was configured as shown in the table below. Details about each of the instance types can be found here.

AWS EC2 Instances Configuration

Performance Testing Tools

  1. JMeter
  2. BlazeMeter
  3. Selenium Web Driver

Performance Test Types

  1. Load Test: validates the performance characteristics of the system under the workload anticipated in production. Running it before release gives confidence and mitigates the risk of losing business to performance issues.
  2. Spike Test: checks the stability of the system when the load arrives in very short bursts and is released just as quickly.
  3. Endurance Test: checks whether the system can handle the expected load over a long period without any deterioration in response time or throughput. Running it before release gives confidence in the availability and stability of the system.
  4. Page Load Test: measures page load times as experienced by the end user while the desired number of concurrent users generates a specific amount of load.
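
The first three test types differ mainly in the shape of the load over time. The sketch below shows one way to express those shapes as a number of concurrent users per time slot. The exact shapes (a ramp over the first quarter of a load test, a single-slot burst for a spike test) are assumptions made for illustration, not settings taken from the tests described here.

```python
def user_profile(kind, target_users, steps):
    """Return the number of concurrent users in each of `steps` equal time slots."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if kind == "load":
        # Ramp up over the first quarter of the run, then hold the anticipated load.
        ramp = max(steps // 4, 1)
        return [min(target_users, target_users * (i + 1) // ramp) for i in range(steps)]
    if kind == "spike":
        # No load except for one short burst in the middle of the run.
        return [target_users if i == steps // 2 else 0 for i in range(steps)]
    if kind == "endurance":
        # Hold the expected load for the entire run; in practice the run is much longer.
        return [target_users] * steps
    raise ValueError(f"unknown test type: {kind}")


for kind in ("load", "spike", "endurance"):
    print(kind, user_profile(kind, 100, 8))
```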

Strategy

Replicating the load of ‘n’ concurrent users is the first task, and it becomes harder as the number of concurrent users grows. There are two approaches for achieving this:

  1. Replicate end-user behaviour with automated UI tests that perform operations on the application as a real-world user would. A browser automation library such as Selenium Web Driver can be used to write this UI automation, and a Selenium Grid setup can drive the user flows on multiple machines.
  2. Generate the load on the server for ‘n-1’ concurrent users through the REST APIs, and have just one user execute the UI test flows to compute the page load times.

Both approaches have their pros and cons. We opted for the latter.
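
As a small worked example of the second approach, the sketch below splits the ‘n-1’ API-driven users across the three user locations listed under Goals. The even split and the rule that earlier regions absorb any remainder are assumptions for illustration; `plan_api_users` is a hypothetical helper.

```python
def plan_api_users(n, regions=("New York", "London", "Singapore")):
    """Split the n-1 API-driven users as evenly as possible across the regions.

    One user out of n is reserved for the UI test that measures page load times.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    api_users = n - 1
    base, extra = divmod(api_users, len(regions))
    # The first `extra` regions each take one of the leftover users.
    return {region: base + (1 if i < extra else 0) for i, region in enumerate(regions)}


print(plan_api_users(100))
```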

JMeter Test — REST APIs

Let’s take the example of all the REST API calls made when a specific page loads, as in the snapshot below.

The JMeter script for the corresponding page is shown in the snapshot below. It consists of all the REST API calls made when that page loads, chained together.

The JMX file in the snapshot above maps one-to-one to the API calls seen in the browser's network tab for the corresponding page load. We created multiple such JMX files, each chaining the REST API calls made when a specific page loads.
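
The idea of chaining the API calls of one page load can be sketched outside JMeter as well. The following is not a JMX file and not the script used in these tests; it is a single-user illustration that uses only the Python standard library. The base URL and paths in the usage comment are placeholders.

```python
import time
import urllib.request


def http_get(url, timeout=10):
    """Default fetcher: issue a plain GET and return the HTTP status code.

    urllib raises HTTPError for 4xx/5xx responses, so failures surface as exceptions.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
        return response.status


def replay_page_load(base_url, paths, fetch=http_get, clock=time.perf_counter):
    """Call each API in the order given and return (path, status, seconds) rows."""
    results = []
    for path in paths:
        started = clock()
        status = fetch(base_url + path)
        results.append((path, status, clock() - started))
    return results


# Usage (placeholder host and paths, not executed here):
# rows = replay_page_load("https://app.example.test", ["/api/profile", "/api/items"])
```

The `fetch` and `clock` parameters are injectable so the chain can be exercised without a network. A load tool such as JMeter runs the same chain for many virtual users in parallel.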

Selenium Web Driver — Page Load Test

The UI test navigates to a specific page in the application, notes the timestamp before clicking the link, clicks the link, and then notes the timestamp once the page load has completed. The script waits for the entire page to load, left to right and top to bottom.

The overall performance test setup was as in the snapshot below:

The performance test setup consisted of the BlazeMeter REST API tests mimicking the behaviour of ‘n-1’ users. The users were equally distributed among London, New York, and Singapore. At the same time, there were AWS EC2 instances spawned in the same three locations. There was a Selenium Web Driver test executed on each of the EC2 instances, which would help to compute the page load times. The Selenium Web Driver tests would perform the following steps:

  1. Launch the application URL and log in.
  2. Navigate to the page that contains the link to be clicked.
  3. Note the timestamp (T1).
  4. Click on the link.
  5. Wait for the page to load completely, then note the timestamp (T2).

Actual Page Load Time = T2 - T1.
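
The timing steps above can be sketched with Selenium's Python bindings as follows. This is a minimal sketch under stated assumptions: the link text and the CSS selector that signals a fully loaded page are hypothetical parameters, and treating "one element is visible" as "page loaded" is a simplification of waiting for the whole page.

```python
import time


def timed_page_load(click, wait_until_loaded, clock=time.perf_counter):
    """Return T2 - T1, where T1 is taken before the click and T2 after the page has loaded."""
    t1 = clock()
    click()
    wait_until_loaded()
    t2 = clock()
    return t2 - t1


def measure_with_selenium(driver, link_text, loaded_css, timeout=30):
    """Measure one page load with an already logged-in Selenium WebDriver.

    link_text and loaded_css are hypothetical: the text of the link to click and
    a selector for an element that only appears once the page has rendered.
    """
    # Imported here so timed_page_load stays usable without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    link = driver.find_element(By.LINK_TEXT, link_text)
    wait = WebDriverWait(driver, timeout)
    return timed_page_load(
        click=link.click,
        wait_until_loaded=lambda: wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, loaded_css))
        ),
    )
```

Keeping the timing logic separate from the browser calls makes it possible to verify the T2 - T1 arithmetic with a fake clock before running it against a real browser.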

BlazeMeter Endurance Test

In order to track the performance of the AWS infrastructure, the following parameters were noted on the AWS console.

Elastic Search — Cluster Status

Elastic Search — Search Rate

Elastic Search — HTTP_Requests_By_Response_Codes

Elastic Search — Indexing Latency

Elastic Search — Master CPU Utilisation

Elastic Search — Master JVM Memory Pressure

Elastic Search — Data Maximum CPU Utilisation

Elastic Search — Data Maximum JVM Memory Pressure

MongoDB — CPU Utilisation

Public API— CPU Utilisation
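
The Elastic Search parameters above are also published as CloudWatch metrics in the `AWS/ES` namespace, so they could be collected programmatically instead of being read from the console. The sketch below only builds the request parameters. The choice of statistic per metric, the domain name, and the account ID are assumptions, and actually sending the queries would require an AWS SDK such as boto3, which is not used in this article.

```python
from datetime import datetime, timedelta, timezone

# (label from the list above, CloudWatch metric name, statistic chosen for this sketch)
ES_METRICS = [
    ("Cluster Status", "ClusterStatus.green", "Minimum"),
    ("Search Rate", "SearchRate", "Average"),
    ("HTTP 5xx Responses", "5xx", "Sum"),
    ("Indexing Latency", "IndexingLatency", "Average"),
    ("Master CPU Utilisation", "MasterCPUUtilization", "Maximum"),
    ("Master JVM Memory Pressure", "MasterJVMMemoryPressure", "Maximum"),
    ("Data Maximum CPU Utilisation", "CPUUtilization", "Maximum"),
    ("Data Maximum JVM Memory Pressure", "JVMMemoryPressure", "Maximum"),
]


def build_es_queries(domain_name, account_id, end_time, minutes=60, period_s=60):
    """Build one GetMetricStatistics parameter set per Elastic Search metric."""
    start_time = end_time - timedelta(minutes=minutes)
    return [
        {
            "Namespace": "AWS/ES",
            "MetricName": metric_name,
            "Dimensions": [
                {"Name": "DomainName", "Value": domain_name},
                {"Name": "ClientId", "Value": account_id},
            ],
            "StartTime": start_time,
            "EndTime": end_time,
            "Period": period_s,
            "Statistics": [statistic],
        }
        for _label, metric_name, statistic in ES_METRICS
    ]


# Hypothetical domain name and account ID
queries = build_es_queries("perf-test-domain", "123456789012", datetime.now(timezone.utc))
# With boto3 installed, each entry could be sent as:
#   boto3.client("cloudwatch").get_metric_statistics(**query)
print(len(queries), queries[0]["MetricName"])
```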

Using the above setup, we were able to tune the performance of our AWS deployments, uncover performance bottlenecks, and better optimize our infrastructure.

Shreyas Chaudhari

Written by

Lead Consultant @ ThoughtWorks | Twitter : shreyasc_tweets | Instagram : shreyasc_clicks

Better Programming

Advice for programmers.
