Load Testing your Analytics and Dashboards

Marc Polizzi
9 min read · May 10, 2024


This post continues our previous discussion regarding the Analytics Ops project, focusing more specifically on load testing both your analytics and your dashboards.

Load testing allows you to determine how your analytics and dashboards will behave under normal and peak conditions and to confirm your analytical platform meets your intended performance goals, i.e., your service level agreement (SLA). Load testing is essential to determine whether your analytics and dashboards are ready for deployment.

Load Testing

Very often, we hear about performance testing, load testing, and stress testing together. So let’s first clarify these different types of tests.

Performance Testing

Performance testing is the general practice of determining how a system performs in terms of stability, speed, scalability, and responsiveness under a given load.

Therefore, performance testing is used to identify and eliminate bottlenecks while verifying that your analytics and dashboards meet the predefined performance criteria.

Load Testing

Load testing is a subset of performance testing. The goal of load testing is to validate the system’s ability to scale its resources and processes during periods of peak user activity. Load testing focuses on the capacity and scalability of the system, whereas performance testing evaluates the system’s overall efficiency and behavior.

Load testing helps your development teams determine ahead of time whether your analytics and dashboards can maintain reasonable performance standards during high levels of activity. Potential bottlenecks can be removed and inefficient dashboards and queries can be corrected well before they reach production.

Stress Testing

Stress testing is a subset of load testing and therefore a type of performance testing. During stress testing, the system is pushed beyond its normal operating conditions to evaluate its stability and fault tolerance and understand its upper limits.

Subjecting your analytics and dashboards to edge-case scenarios is a great way to ensure they can withstand the most demanding conditions, improving their quality and resilience.

Types of Load Testing

There are a handful of different load tests, each corresponding to a different goal. You can find several examples on this Wikipedia page or on this page from Grafana Labs. The latter nicely shows the load profiles of several load tests at a glance:

Load Test Types (photo from Grafana Labs)

With this picture, it’s easy to understand the goal and structure of each load test:

  • Smoke Test: Verify your analytics and dashboards function with minimal traffic.
  • Average-Load Test: Discover how your analytics and dashboards function with typical traffic.
  • Stress Test: Discover how your analytics and dashboards function with extreme traffic.
  • Spike Test: Discover how your analytics and dashboards function with sudden and massive increases in traffic (sketched below).
  • Breakpoint Test: Progressively ramp traffic to discover the breaking points of your analytics and dashboards.
  • Soak Test: Discover whether or when your analytics and dashboards degrade under typical traffic of longer duration. A soak test is also known as an endurance test.
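
As an illustration, here is what the load profile of a spike test could look like, reusing the load block structure of the ic3-analytics-ops definitions shown later in this post (the values are hypothetical and for illustration only):

load : {
  actors : [
    {
      actor : "MDX Player",
      count : 300,           // many actors at once
      rampUp : "PT30s",      // very fast ramp-up: the traffic "spike"
      steadyState : "PT2m",  // short plateau at peak traffic
      rampDown : "PT30s"     // quick recovery phase
    }
  ]
}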

The interested reader may refer to the Grafana Labs pages for more details about each type of test, or search online for more information.

Overview of Load Testing

To recap, load testing is a form of software testing that puts a simulated load on a system to see how it performs.

In the case of analytics and dashboards, the load can simulate the number of users opening the dashboards, the ad-hoc queries your users are executing, the amount of data being accessed, etc.

While the specific procedures may differ, load testing typically consists of the following steps:

  • Define the Objective: e.g., how dashboard usage reacts to the number of users, data volumes, etc.
  • Define the Performance Goals: e.g., dashboard opening time, query processing time, etc.
  • Define the Test Scenario: A scenario defines a number of actors (and their jobs: e.g., open a dashboard, ingest some data) acting on the system during three periods of time: ramp-up (gradually increasing the number of actors), steady-state (running all the actors), and ramp-down (stopping all the actors).
  • Prepare the Test Data: e.g., the dashboards to open, the non-regression results for typical queries to execute, etc.
  • Create the Test Definition: The actual implementation of the test, which depends on the tool you’re using. Refer to the following section for concrete examples.
  • Run the Test: Run the test as defined in the previous step.
  • Monitor and Analyse the Metrics Collected: Review both the output of the test and the metrics of the analytical platform collected during the run to determine whether you met your performance goals.

To offer practical examples, the following section demonstrates the implementation of various analytics and dashboards load tests using the ic3-analytics-ops tool. For further information on this project, please consult either our previous post or the project’s documentation.
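
At a glance, the steps above map onto the structure of an ic3-analytics-ops test definition. Here is a simplified skeleton (the field names are taken from the full examples below):

{
  name : "...",              // the objective of the test
  restApiURL : "...",        // the icCube server under test
  authenticator : { ... },   // the credentials used by the actors
  actors : [ ... ],          // the scenario: who does what (incl. performance goals)
  load : { ... }             // the load profile: ramp-up, steady-state, ramp-down
}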

Examples of Load Testing

We’ll simulate several users opening a set of icCube dashboards and executing ad-hoc MDX queries, while at the same time ensuring non-regression of the results of all the queries.

The same functional testing logic is used throughout the three load tests to ensure the analytics and dashboards are functionally working as expected.

We’ll adapt the actual load profile to each load test’s objectives as follows:

  • Smoke Test: Use a minimal load to verify the analytics and dashboards are functionally working. We do not expect any performance issues here.
  • Average-Load Test: Use an average production load. We expect the analytics and dashboards to work without any particular hiccup.
  • Stress Test: Use an extreme load. We’d like to ensure the system is properly configured to handle an unexpected load and that performance remains acceptable.

Test Data Generation

Before defining each test scenario, we’ll need to collect the queries and their results (for non-regression testing) as well as identify the list of dashboards to open. How to generate the right set of data is out of scope for this post, but suffice it to say that the data should be representative enough of average production usage. The ic3-analytics-ops tool allows for extracting this kind of data; you can also collect icCube audit information for that purpose, or even use tools like Fiddler to intercept typical queries (and their responses) from the system. Leave us a comment if you’d like to know more about that process.
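
As a purely illustrative example (the file names below are hypothetical; refer to the project’s documentation for the actual layout), the data/Sales folder referenced by the test definitions below could pair each MDX query with its expected result:

data/Sales/
  query-001.mdx    // an MDX query to replay against the "Sales" schema
  query-001.json   // its expected result, used for the non-regression check
  query-002.mdx
  query-002.json
  ...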

Smoke Test

Smoke Test Profile (photo from Grafana Labs)

This test simulates a minimal load and executes for a very short period of time. The objective here is to validate the functional logic that will be reused in the following load tests.

Here is its JSON5 definition:

{
  name : "Smoke Test",

  // The REST API entry point of the icCube server under test.
  restApiURL : "http://localhost:8282/icCube/api",

  // The credentials are resolved from external properties.
  authenticator : {
    user : "${analytics.ops.user}",
    password : "${analytics.ops.password}"
  },

  actors : [
    {
      name : "MDX Player",
      tasks : [
        {
          // Replay a set of MDX queries and assert their expected results.
          action : "MDXes",
          data : "data/Sales",
          schema : "Sales",
          pauses : "PT0s:PT0.5s",  // random pause between two queries
          shuffle : true,          // randomize the query order

          performanceTargets : {
            durationMax : "PT1s"   // fail if any query takes longer than 1 second
          }
        }
      ]
    }
  ],

  load : {
    actors : [
      {
        actor : "MDX Player",
        count : 3,
        rampUp : "PT10s",
        steadyState : "PT50s",
        rampDown : "PT10s"
      }
    ]
  }
}

When run, this test:

  • creates 3 “MDX Player” actors (i.e., users) during the 10-second ramp-up period
  • runs the 3 “MDX Player” actors during the 50-second steady-state period
  • stops the 3 “MDX Player” actors during the 10-second ramp-down period.

Each actor executes a series of MDX queries, asserts their expected results, and ensures their maximum execution time remains below 1 second.

Notice the random pauses between the MDX queries, which simulate a more realistic usage of the system (not all users click at the same time), and the ‘shuffle’ parameter, which randomizes the query order to prevent any repetitive pattern. The ‘PT…’ values are ISO-8601 style durations (e.g., PT0.5s is half a second, PT5m is five minutes).

Once the smoke test has passed successfully, we can run the average-load test that simulates a typical load for a longer period of time.

Average-Load Test

Average-Load Test Profile (photo from Grafana Labs)

An average-load test assesses how the system performs under a typical production load.

This test simulates a number of actors reflecting the average number of users in the production environment, as well as the average number of queries.

Here is its JSON5 definition:

{
  name : "Average-Load Test",

  restApiURL : "http://localhost:8282/icCube/api",

  authenticator : {
    user : "${analytics.ops.user}",
    password : "${analytics.ops.password}"
  },

  // The same "MDX Player" actor (and assertions) as in the smoke test.
  actors : [
    {
      name : "MDX Player",
      tasks : [
        {
          action : "MDXes",
          data : "data/Sales",
          schema : "Sales",
          pauses : "PT0s:PT0.5s",
          shuffle : true,

          performanceTargets : {
            durationMax : "PT1s"
          }
        }
      ]
    }
  ],

  load : {

    // Fail fast if the testing machine itself becomes CPU-bound (see below).
    failAtCpuLoad: 0.8,

    actors : [
      {
        actor : "MDX Player",
        count : 100,           // average production load
        rampUp : "PT5m",
        steadyState : "PT30m",
        rampDown : "PT5m"
      }
    ]
  }
}

As you can see, it is very similar to the smoke test definition. The main differences are the number of actors created (100) and the duration of the test (5+30+5 minutes). Otherwise, all the assertions remain the same, including the performance targets.

Note the new ‘failAtCpuLoad’ configuration: it is used to quickly ensure the testing machine itself is behaving as expected and is not lagging behind the server machine being tested.

Notice that this test may fail during the ramp-up period if the system cannot handle the expected load. Otherwise, the system performance should remain stable during the steady-state period.

Once the average-load test has passed successfully, we can run a stress test that simulates an above-average load to assess how the system performs.

Stress Test

Stress Test Profile (photo from Grafana Labs)

A stress test assesses how the system performs when loads are heavier than usual.

This test reuses the average-load test definition and significantly increases the number of actors (and therefore the number of queries). Apart from that, the definition remains the same.

The performance target has been increased (from 1 to 2 seconds) to reflect the expected slowdown of the system.

Here is its JSON5 definition:

{
  name : "Stress Test",

  restApiURL : "http://localhost:8282/icCube/api",

  authenticator : {
    user : "${analytics.ops.user}",
    password : "${analytics.ops.password}"
  },

  actors : [
    {
      name : "MDX Player",
      tasks : [
        {
          action : "MDXes",
          data : "data/Sales",
          schema : "Sales",
          pauses : "PT0s:PT0.5s",
          shuffle : true,

          performanceTargets : {
            durationMax : "PT2s"   // relaxed target: 2s instead of 1s
          }
        }
      ]
    }
  ],

  load : {

    failAtCpuLoad: 0.8,

    actors : [
      {
        actor : "MDX Player",
        count : 200,           // twice the average production load
        rampUp : "PT10m",
        steadyState : "PT30m",
        rampDown : "PT5m"
      }
    ]
  }
}

Similarly to the average-load test, this test may fail during the ramp-up period if the system cannot handle the excessive load, and it may fail during the steady-state period if the performance degrades too much over time.

Benefits of Load Testing

In addition to all the advantages of testing your analytics and dashboards in general, load testing provides the following benefits:

  • Improved Performance and Reliability: Potential bottlenecks can be removed and inefficient dashboards and queries can be corrected well before they reach production.
  • Improved User Experience: As the performance and reliability of your analytics and dashboards increase, users will encounter fewer issues and will have an overall smoother experience using them. This will in turn increase their confidence and trust in your application.
  • Improved Cost Control: Since your dev teams can simulate the anticipated load, over-provisioning resources becomes unnecessary.
  • Minimized Cost of Failure: Load tests help your team prepare your production environment, preventing unresponsive analytics and dashboards (or outright outages) that can cause significant financial losses.
  • Faster Time to Market: As you catch performance issues early, you’ll spend less time troubleshooting and extinguishing fires and more time developing new features.
  • More Confidence in your Releases: Load tests greatly decrease the likelihood of deployment and production issues, as they allow your dev teams to catch issues early.
  • Meet Service Level Agreements (SLA): Load tests allow your dev teams to determine how your analytics and dashboards will behave during normal and peak conditions and to confirm your analytical platform meets your intended performance goals, i.e., your service level agreement.

Wrapping Up

The success of your analytics and dashboards depends, amongst other factors, on their performance. Indeed, any outage can have a serious impact on your customers’ confidence in your application and eventually affect your company’s revenue.

Load testing is a valuable addition to your testing toolbox that can help your dev teams improve the stability, and therefore the quality, of your analytics and dashboards.

icCube is improving the ic3-analytics-ops project to help its customers adopt an AnalyticsOps approach for their dashboard projects.

If you’d like to know more about this project and/or discuss testing, do not hesitate to leave a comment or contact us.
