How many users can your platform or application handle?

Pandi Ganapathy · Published in Atom Platform · Jun 30, 2020 · 8 min read

“How many users can your platform or application handle?” is a common question you will hear stakeholders asking engineering teams. It is not straightforward to answer because it requires a solid understanding of terms like the following:

Concurrent users

Simultaneous users

Throughput

Requests per second (RPS)

Kaplan Context:

At Kaplan, we have built a multi-tenant platform for learning and assessment called Atom. Atom is designed to serve a variety of user needs in the learning and assessment space. Some of the common user types include Students, Authors, Content Creators, Teachers, and Institution administrators. Scale is one of our architectural goals, and we have a set of principles that guide our engineers in ensuring the platform can scale as needed. Let me share my experience of how we validated that our platform could support the planned scale needs. Let us start by understanding the terminology related to scale.

Concurrent users vs. Simultaneous users

Concurrent users are those who are connected to the platform and are requesting work at regular intervals. They are not all requesting work at the same time or for the same functionality. The word “concurrent” is used for events that occur over a period of time.

Imagine all the users who are using our Atom platform, regardless of what they are doing. It will include the users who just finished logging in, the learners going through their course plan, the students starting tests, the students in the middle of taking tests, watching videos, submitting tests, reviewing test scores, etc.

Let’s say there are 1M users using different parts of Atom: 100,000 students who have just logged in, 200,000 students starting tests, 400,000 students taking tests, 50,000 users creating content such as Questions, Quizzes, Articles, and Videos, and 250,000 students reviewing their test results. That means 1 million concurrent users are using the platform.
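The arithmetic above can be sketched in a few lines of Python. The segment names are hypothetical labels for the activities described in the example, not identifiers from the Atom platform:

```python
# Hypothetical snapshot of user activity, mirroring the example above.
activity_segments = {
    "just_logged_in": 100_000,
    "starting_tests": 200_000,
    "taking_tests": 400_000,
    "creating_content": 50_000,
    "reviewing_results": 250_000,
}

# Concurrent users = everyone connected and active, regardless of activity.
concurrent_users = sum(activity_segments.values())
print(concurrent_users)  # 1000000
```

The key point is that concurrency is a sum across all activities, not a count of users doing the same thing.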

On the other hand, simultaneous users are the ones connected to the platform and requesting work at the same time for the same activity.

In the above example, all 1M concurrent users become simultaneous users when they all become test takers and start their tests at the same time. Hence, the word “simultaneous” is used for events that occur at a single point in time.

Examples of scaling goals for your platform or application may look like the following:

1. The platform or application must successfully handle 1M concurrent users. This implies that, irrespective of their activities, the application must be able to handle 1M concurrent users over a period of time (for example, depending upon the use case, the period can be 30 seconds, 3 minutes, or 60 minutes).

and/or

2. The platform or application must be able to support 1M simultaneous test takers with response times within the SLA (for example, one SLA might require that the average response time not exceed 3 seconds). This implies that the application must be able to handle 1M simultaneous students (at a single point in time) without any performance degradation.
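An SLA check like the one in goal 2 can be sketched as follows. The response-time samples and the 3-second threshold are illustrative assumptions, not real measurements from the platform:

```python
# Hedged sketch of an average-response-time SLA check.
SLA_AVG_RESPONSE_SECONDS = 3.0  # example SLA from the goal above

# Hypothetical sampled response times (seconds) from a load test.
response_times = [1.2, 2.8, 3.1, 0.9, 2.4, 1.7]

avg_response = sum(response_times) / len(response_times)
sla_met = avg_response <= SLA_AVG_RESPONSE_SECONDS
print(f"avg={avg_response:.2f}s, SLA met: {sla_met}")
```

Note that an average-based SLA can hide outliers (the 3.1 s sample above still passes); many teams also track percentile SLAs such as p95 or p99 for that reason.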

Correlation between Concurrent Users and Requests Per Second

Another term, “Requests Per Second,” can also be easily confused with “concurrent users.” As explained above, concurrent users are the number of users engaged with the application at a given time. They are all in the middle of a session, but they are all doing different things. Requests Per Second (RPS), by contrast, is a measure of how many units of work are being processed. This is explained in the example below:

(Image source: https://www.kaptest.com — Medical Center Chicago | Kaplan Test Prep)

Let’s use the example of our Atom platform again. In addition to supporting individual learner use cases, Atom also supports Kaplan’s Nursing schools. A simple example could be a Nursing test where the school has a room full of 250 students. The teacher may have assigned each student certain tasks such as browsing the website, taking tests, watching instructional videos, or reviewing the tests. The teacher may have requested the students to begin their tasks at 9 am. If the teacher walks around the classroom after 9 am, each of the students might be on a different page. Some are reading content, others are clicking around, and still others have stopped browsing and are staring at the wall. If we want a measure of these activities, we look at how many requests are being sent to the server over time. That’s requests per second.

How to calculate the Requests Per Second generated by Concurrent Users?

These students’ actions send a certain number of HTTP requests to the servers, which process them. We may have 250 users on the website at a given point in time, but they are not making 250 concurrent requests to the server. The hits per second those concurrent users generate are based only on their actual interactions with the application: clicking a button or a link, or submitting a form. This total is then calculated in one-second intervals. So, if a server receives 12,000 successful requests from these 250 users over the course of 1 minute, then the throughput is 200 requests per second (12,000 / 60). Throughput counts successful requests per second and is an important metric that indicates the health of a platform.
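The throughput calculation from the classroom example is simple enough to express directly:

```python
# 250 concurrent users send 12,000 successful requests over one minute.
concurrent_users = 250
successful_requests = 12_000
window_seconds = 60

# Throughput = successful requests divided by the measurement window.
throughput_rps = successful_requests / window_seconds
print(throughput_rps)  # 200.0

# On average each user generated less than one request per second,
# which is why 250 concurrent users do not mean 250 requests per second.
avg_rps_per_user = throughput_rps / concurrent_users
print(avg_rps_per_user)  # 0.8
```

The per-user figure makes the distinction concrete: concurrency counts people, throughput counts work.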

Why “Concurrent users”?

One common performance goal is to measure whether our application (not just the web server, but the entire stack) can handle an expected amount of traffic. The easier variable to work with, however, is how many users are expected to visit the site and interact with various pages, or to use an app from tablets or mobile devices. MixPanel, Google Analytics, or logs are the best places to get these numbers. When planning capacity and designing load tests, engineering teams generally find it more useful to think in terms of the number of people engaging with the site rather than the number of machine requests. It is also a clearer metric for stakeholders to correlate with the traffic expected during the upcoming “busy season,” or the volume of new students registering during sales events, than the number of GET or POST requests coming in every second would be.

How to determine the number of users a platform can support?

At Kaplan, building for scale is a crucial part of our Atom platform strategy. As we look for ways to improve the performance of our platform, measuring and focusing on the right performance metrics, such as concurrency, throughput, and requests per second, has immensely helped our platform scale. Below is the methodology we have followed in taking a holistic approach to designing scale validation tests:

  1. Analyze the user behaviors: It is important to take into consideration the existing user patterns, especially if there will be a known period of increased usage. In the Kaplan context, our institutional business has known periods of what we call “busy season”. Tools like Google Analytics, MixPanel, or system logs are vital in finding the average number of users during peak hours.
  2. Granularity: From the web logs, get the number of GET or POST requests made to the server during peak hours.
  3. Requests per Second: From a system perspective, knowing RPS as explained above is critical. Split the total requests per microservice and determine the corresponding requests per second for each.
  4. Design a Load test: Once you have understood the load distribution across all your microservices, and the user trends and behavior during peak hours, your next phase is to design the load test by scripting the user journey using tools like JMeter.
  5. Run Baseline test: Deploy the load test scripts in a load generation server. Using a load generation agent, simulate a load equal to the production peak load. You can start at 20% of your peak concurrent users and slowly ramp up from there to 100%. At every stage of ramp-up, ensure all the key metrics like errors per second, response time, latency, and throughput are completely stable. If the tests succeed at 100%, check for any potential memory leaks, high CPU usage, memory usage, network usage, unusual server behavior, errors, etc.
  6. Project future load and establish a Baseline: This is the planned number of users anticipated on your platform, according to your production user pattern, trend analysis, product requirements on any new adopters/tenants of your platform, etc. In this stage, you need to extend the baseline test and run it against the anticipated load to ensure that your platform can scale seamlessly to cater to any upcoming near term needs.
  7. Identify the breaking point: Now it is time to identify the point of failure. Increase the concurrent user count slowly until there is significant performance degradation in terms of error rate, response time, CPU, memory, IOPS, etc. The point at which performance degrades or the application stops functioning is called the breaking point. The number of users connected to the platform before that breaking point is the number of concurrent users your platform can handle at that capacity.
  8. Investigate and Isolate the issue: If performance degradation happens, the first step toward resolution is to determine the problem, find the root cause, and isolate and resolve it. To understand what’s causing the bottlenecks, start by monitoring web servers, application servers, load balancers, databases, gateways, etc. Watch the performance metrics during the load test to create baselines and trendlines for normal operation. It then becomes easier to isolate the processes or endpoints that use a lot of resources and deviate from the normal trendline. Now you will be able to debug and remediate the issues.
  9. Remediate to eliminate the bottleneck: Some of the most common performance bottlenecks are CPU, network, memory, and I/O. The causes of performance problems vary. Let’s say you have successfully investigated and isolated a problem: the bottleneck is slow-running queries, which can be the result of bad application or schema design, or missing indexes. In this example, the best course of action is to work with developers or DBAs to remediate the problem. Depending on the root cause, you may need to work with developers, architects, or DBAs to change code, queries, or architecture, or to fine-tune the infrastructure.
  10. Continuous Transformation: Last but not least, always remember that performance engineering is an iterative process; scale validation needs to be performed repeatedly. With each iteration, the performance test strategy should be refined and the test design expanded to cover architecture, application, or infrastructure enhancements. Rerun your scale test after each fix has been implemented. This way you can be confident that your application or platform is continuously transforming and prepared to scale for high usage.
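The ramp-up described in step 5 can be sketched as a simple schedule generator. The peak user count and step size below are hypothetical values for illustration; in practice they would come from your production analysis in steps 1–3:

```python
# Sketch of a baseline-test ramp-up: start at 20% of the peak
# concurrent-user count and step up to 100%.
peak_concurrent_users = 250_000  # hypothetical production peak

def ramp_schedule(peak, start_pct=20, step_pct=20):
    """Yield (percentage, user count) stages up to 100% of peak."""
    pct = start_pct
    while pct <= 100:
        yield pct, peak * pct // 100
        pct += step_pct

for pct, users in ramp_schedule(peak_concurrent_users):
    # At each stage, a real test would hold the load and verify that
    # errors per second, response time, latency, and throughput are stable
    # before ramping further.
    print(f"{pct}% -> {users} concurrent users")
```

In a real load test, a tool like JMeter would handle the ramp-up itself; a schedule like this is still useful for planning stage durations and the stability checks to run at each step.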

I hope this article helps you go about answering the question “How many users can your platform or application handle?”.
