Building a JS Performance Monitoring Tool — Part I

Susan Thai
Published in GumGum Tech Blog
5 min read · Aug 19, 2020

Introduction

JavaScript is at the forefront of creating a beautiful user experience or a horrific, latency-filled nightmare once a site loads. It carries great responsibility for implementing complex features, adding interactivity, and the like on a page. And as web technologies have progressed over the years, so has the amount of JavaScript at play. So it's vital that we know what we're releasing to the world is not one of the culprits stalling the browser's main thread. The only way we can do so is by empirically verifying it with meaningful data. After all, if we can measure it, we can improve it.

This article details our journey architecting a JavaScript performance monitoring tool by leveraging some existing third-party products.

Evaluating our needs

The first part in our journey to building a performance monitoring tool is to evaluate our business requirements for the project:

  • Segmentation — Involve other internal teams as little as possible (we don’t want to increase their workloads) and be able to isolate the data we need without third-party dependencies
  • Usability — Gather the data we deem important
  • Reporting — Provide visual representations of the data we gathered for easy digestion
  • Cost Efficiency — The solution must not break the bank in dollars or development hours

To expand on the usability branch of our needs, the data we deem important includes the time elapsed between certain events as well as contextual data. The elapsed-time measurements we care about are the time between our script tag loading and certain HTTP requests being made, and how long it takes from the moment our script receives an HTTP response to when an element is rendered (using the response’s instructions) on the page. We are also interested in whether certain environmental contexts (e.g. running in an AMP context) impact our JavaScript performance.
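To make that concrete, here is a minimal sketch of how such timings might be captured in the browser with the User Timing API. The mark and measure names are hypothetical, not the ones our script actually uses.

```javascript
// Record reference points as the script progresses (hypothetical names).
performance.mark("script-loaded");

// ...just before our script fires its HTTP request:
performance.mark("request-start");
performance.measure("load-to-request", "script-loaded", "request-start");

// ...when the response arrives:
performance.mark("response-received");

// ...and once the element built from the response is in the DOM:
performance.mark("element-rendered");
performance.measure("response-to-render", "response-received", "element-rendered");

// Each measure's duration (in milliseconds) is the raw number we want to report.
const timings = Object.fromEntries(
  performance
    .getEntriesByType("measure")
    .map(({ name, duration }) => [name, duration])
);
// e.g. { "load-to-request": 182.4, "response-to-render": 96.7 }
```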

Once we have this data, we can establish a baseline for our JavaScript performance. When we subsequently build and release new features, we can track whether the latest release improves or degrades that performance.

Why not use an already built service?

Before we actively considered designing our own tool, we had a trial contract with SpeedCurve, an online subscription service that helped monitor and give continuous feedback on our front-end performance. Though it had a beautiful dashboard that displayed our logged metrics, we had trouble customizing it to fit the needs of our existing system.

Another service we trialled was New Relic. We were already using it to monitor our server’s performance, so we were hoping to tap into that existing framework. With New Relic, we were able to log the types of metrics we wanted simply by making HTTPS requests with the data passed as a POST body. Though promising, we ended up not going with them because the visualizations offered at the time were not what we desired.

We continued looking into various other avenues. External solutions often checked off the reporting requirements but missed cost efficiency and/or usability. As a result, we settled on creating our own service that covered the points mentioned above.

Architecture

Since the team was familiar with quite a few AWS product offerings, we decided to continue utilizing them.

As such, our performance monitoring tool comprises the following for data collection and storage:

  • API Gateway for API management
  • Lambda to process and upload the data
  • S3 for data storage
  • Glue as the ETL service to prepare and load data for analytics
  • Athena for querying

For the visualization part, we use an analytics platform called Looker, which hooks into Athena to query the data (the visualization process is explored in more detail in Part II of this series).

Our choice of API Gateway and Lambda is rooted in the cost and segmentation branches of our needs. By using these two products to create a simple API, rather than building one into our existing server, we keep a separation of concerns and avoid adding clutter to already complex server logic. We chose Lambda over a server-based resource because 1) it lets us run code without provisioning or managing servers, and 2) cost is relative to the compute time consumed. Since the workload is light (we do some simple data processing), Lambda is an obvious choice over a virtual server-based resource like EC2 in terms of cost.

The flow starts when our JavaScript logic gathers what we deem vital data. This data is formatted and then sent as a JSON object in a POST body to our API Gateway, which passes the entire JSON object on to our Lambda.
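As a rough sketch, the reporting call can be as simple as a fetch POST. The endpoint URL, the payload shape, and the isAmpContext flag are hypothetical here; the real schema is whatever the Lambda expects.

```javascript
// Best-effort reporting of the timings gathered above (hypothetical payload).
const payload = {
  timings,                          // the measures collected earlier
  context: { amp: isAmpContext },   // isAmpContext: a flag our script already tracks
  userAgent: navigator.userAgent,
};

fetch("https://example.execute-api.us-east-1.amazonaws.com/prod/metrics", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(payload),
  keepalive: true, // lets the request finish even if the page unloads
}).catch(() => {
  // Reporting is best-effort; never let it break the page.
});
```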

In the Lambda, we set up logic to map the data to more human-friendly property names and do a quick check that specific data points exist in the object. If the data passes the check, we call S3’s PutObject REST API to add the file to our S3 storage.
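A minimal sketch of what that Lambda might look like, assuming a Node.js runtime and the AWS SDK for JavaScript v3. The field names, required-field list, and bucket name are illustrative rather than our exact implementation.

```javascript
const { S3Client, PutObjectCommand } = require("@aws-sdk/client-s3");

const s3 = new S3Client({});
const REQUIRED_FIELDS = ["timings", "userAgent"]; // hypothetical required data points

exports.handler = async (event) => {
  const raw = JSON.parse(event.body);

  // Quick existence check: reject payloads missing the data points we rely on.
  const missing = REQUIRED_FIELDS.filter((field) => raw[field] === undefined);
  if (missing.length > 0) {
    return { statusCode: 400, body: `Missing fields: ${missing.join(", ")}` };
  }

  // Map terse client-side keys to more human-friendly column names.
  const record = {
    load_to_request_ms: raw.timings["load-to-request"],
    response_to_render_ms: raw.timings["response-to-render"],
    is_amp: Boolean(raw.context && raw.context.amp),
    user_agent: raw.userAgent,
    received_at: new Date().toISOString(),
  };

  // Write the record as a JSON file into the metrics bucket (hypothetical name).
  await s3.send(
    new PutObjectCommand({
      Bucket: "example-js-perf-metrics",
      Key: `metrics/${Date.now()}-${Math.random().toString(36).slice(2)}.json`,
      Body: JSON.stringify(record),
      ContentType: "application/json",
    })
  );

  return { statusCode: 204, body: "" };
};
```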

An AWS Glue crawler is scheduled to run over the data stored in our specified S3 bucket. The crawler generates a metadata table, registers it in the AWS Glue Data Catalog (a persistent metadata store), and defines our data’s schema. This step is necessary because it enables Athena to run queries on the data. From this point on, we can either run queries manually in Athena with standard SQL or have Looker pull the data through Athena to display graphical representations.
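For illustration, a query can also be kicked off programmatically. The sketch below uses the AWS SDK for JavaScript v3; the database, table, column, and result-bucket names are hypothetical, and the SQL is simply an example of the kind of aggregation we run.

```javascript
const { AthenaClient, StartQueryExecutionCommand } = require("@aws-sdk/client-athena");

const athena = new AthenaClient({});

async function startRenderTimeReport() {
  // Standard SQL against the table the Glue crawler registered (hypothetical names).
  const { QueryExecutionId } = await athena.send(
    new StartQueryExecutionCommand({
      QueryString: `
        SELECT is_amp, AVG(response_to_render_ms) AS avg_render_ms
        FROM js_perf_metrics
        GROUP BY is_amp
      `,
      QueryExecutionContext: { Database: "js_perf" },
      ResultConfiguration: { OutputLocation: "s3://example-athena-results/" },
    })
  );
  // Poll GetQueryExecution until the query succeeds, then read rows with
  // GetQueryResults (omitted here for brevity).
  return QueryExecutionId;
}
```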

Challenges/Difficulties

One of the challenges we faced going this route was Athena and malformed data. In a recent case, a couple of the records we logged were incorrectly typed, which broke the Athena query. For example, we expected a property value to be of type ‘int’ but it was instead found to be of type ‘string’ or ‘bigint.’ To fix this, we added a data-type check in the Lambda function for certain properties. If a data type did not conform to the schema, we placed the file in a different S3 bucket for later review. This let us keep the miscreant data for later investigation while still being able to use Athena for querying.
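A condensed sketch of that guard, with hypothetical field types and bucket names:

```javascript
// Expected types for the properties Athena queries on (hypothetical schema).
const EXPECTED_TYPES = {
  load_to_request_ms: "number",
  response_to_render_ms: "number",
  user_agent: "string",
};

const conforms = Object.entries(EXPECTED_TYPES).every(
  ([field, type]) => typeof record[field] === type
);

// Non-conforming files go to a separate bucket for later review, so they
// never reach the bucket that Athena queries.
const Bucket = conforms
  ? "example-js-perf-metrics"
  : "example-js-perf-metrics-rejected";
```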

Conclusion

By using AWS technologies we are familiar with, we were able to create a JavaScript performance monitoring tool that suits our needs. And since we are both the architect/engineer and the consumer/client, we can continuously tweak and finesse the design to our liking.

Part II in this series discusses visualizing the results in more detail.

We’re always looking for new talent! View jobs.
