A FRAMEWORK FOR
Estimating Peak Web Traffic for Digital Commerce Websites
Keep your site running smoothly by learning exactly how much traffic to test for.
Solution architects should keep business owners informed about the traffic their website can sustain, so that as they plan a new launch or a major marketing push, they can use this information for effective planning. Doing so averts a laggy site that may cause loss of revenue or, even worse, a breach of customers’ trust.
Ascertaining the peak load requires rigorous testing, which in turn requires a good estimate of the traffic expected for the site.
At AAXIS, during the mid-2000s, we performance-tested websites that advertised at the Super Bowl halftime show, monitoring the load before, during, and after the show. Since then, we have implemented and tested several high-profile Business-to-Business (B2B) and Business-to-Consumer (B2C) websites that not only stood the test of traffic but thrived. Based on these observations, we developed a framework for estimating peak load for a website using metrics at various granularities, from annual to hourly.
Let’s now explore this framework in more detail so you can apply it to your own site. Ultimately, we will calculate the peak load in Requests Per Second (RPS). If you are testing a website, this is the load you should actually test at.
Quality of the metrics
Understanding the quality of the metrics available to you is very important. If the number is a guesstimate from an executive, you should add a good margin of safety to it; I usually recommend a 100% margin based on my observations.
Front-end analytics tools such as Google Analytics, Matomo, etc., may not provide the whole picture, as they are only meant to measure marketing efficiency. To ensure that your site’s servers can handle the load gracefully, you need to take into account all traffic, including bots, scrapers, and other unhelpful requests. Because, trust me, if you don’t, your servers won’t be the only ones to feel the load; your customers will as well. So, if you only have data from front-end analytics sources, add a healthy margin to it. Note that applying a percentage factor to your existing load can be misleading, as bot traffic has to be added independently of site traffic. For smaller websites, bad bots alone could account for 50% or more of traffic.
The best source of data, in my opinion, is access logs. Parsing the logs with software such as AWStats, Nagios, GoAccess, HTTP log viewers, etc., provides good-quality metrics for performance-testing purposes. Another source is Application Performance Monitoring (APM) tools such as New Relic, AppDynamics, etc.
Another key consideration is Page Views per Second (PVS) vs. Requests Per Second. These are two different metrics, aimed at different audiences. Marketing may look at PVS as the number of eyeballs on the site. While PVS is useful to think of in that way, the site may actually have to handle multiple requests for each page view.
RPS, on the other hand, might include hits for static content such as CSS, JS, images, etc., which should be coming from a Content Delivery Network (CDN). If you actually serve these assets from your app server, don't! That is a waste of precious resources.
In this article, I will refer to load in Requests Per Second (RPS). If you have the yearly number of requests, how do we compute RPS? Easy, you say: simply divide the yearly number by the number of seconds in a year. Not so fast. What you calculated is the average load, and there is an ocean of difference between average and peak, sometimes a factor of 10. So let’s look at how to find the peak. Since I am not sure what granularity of metrics is available to you, I will start with yearly and work down, section by section, to a second. While you should start with the finest level of detail available to you, I recommend estimating from the top to see if the numbers make sense. You just might discover a trend in your data that could help your website.
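To see just how misleading the naive division is, here is a minimal sketch in Python, using the roughly 4 million yearly requests of the fictional ACME example introduced below:

```python
# Naive average: spread the yearly requests evenly over every second of the year.
SECONDS_PER_YEAR = 365 * 24 * 3600

yearly_requests = 4_000_000                      # ACME's illustrative yearly total
average_rps = yearly_requests / SECONDS_PER_YEAR
print(f"Average load: {average_rps:.2f} RPS")    # ~0.13 RPS

# The peak the server must survive can be many times higher;
# the sections below derive the multiplying factors step by step.
```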
Average Yearly Load to Monthly Load
You should start here if you only have the yearly load available. For startups, this is often the case. You can get this number from competitor websites or derive it from sales and marketing strategies.
Presented here is the yearly load for ACME Foods Inc. While the company and the metrics are fictional, the trends are common for most B2B companies.
Looking at the numbers, the average monthly web traffic is 326,000 requests, with a peak of 423,000 requests. ACME Foods Inc. distributes diet and sports foods and drinks to businesses that sell them to consumers. You can see the peak around January, representing the “Diet Season,” and again in May and June, in preparation for the summer sports season, a market they serve directly.
What we observe here is that, for ACME, if we only knew the yearly number of 4 million requests, a simple average would lead you to conclude that the server only needs to handle 326k requests/month, when in reality it needs to handle 423k requests/month. The peak-to-average ratio for ACME is roughly 1.3. For the companies we have handled at AAXIS, this ratio is typical and in the mid-range for B2B companies, with B2C companies coming in at an even higher range.
The point here is not to take this number as a given but to calculate it for your specific case. We have customers who do 50% of their business in the two months leading up to hurricane season, serving storm chasers who do home repairs. Their peak-to-average ratio is so high that we recommended automatically scaling the hardware up and down to save costs.
In summary, while a 35% margin is a good bet in the absence of any available data, you should really try to study your specific case and ascertain the monthly load.
Peak monthly load = (1.35 to 1.50) * Total yearly load / 12
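As a minimal sketch of this step (the 1.35 default is just the bottom of the range above; substitute a ratio derived from your own data where possible):

```python
def peak_monthly_load(yearly_load: float, peak_to_average: float = 1.35) -> float:
    """Estimate the peak monthly load from the total yearly load.

    peak_to_average: 1.35-1.50 in the absence of better data.
    """
    return peak_to_average * yearly_load / 12

# ACME: 4M yearly requests -> ~450k requests in the peak month
print(f"{peak_monthly_load(4_000_000):,.0f} requests/month")
```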
Monthly Load to Daily Load
Once you have the monthly load, known or calculated, the next step is to compute the daily requests. We’ll need to look at the weekly trend as well as the overall monthly trend when we analyze the traffic patterns in a month.
Here is the graph of daily web requests for our fictional company, ACME Foods Inc., which is typical for many B2Bs.
There are two aspects to observe here: one is the cyclic weekly pattern, and the other is the month-end activity. Mondays are light as people return from the weekend, and Fridays are a little slow as the week winds down. Also consider that Mondays and Fridays may place only a partial load on your servers, depending on the time zone distribution of your clients. This results in the classic Gaussian-like bell curve.
The second pattern is the increase in month-end activity. This is becoming increasingly common as B2B companies provide features such as requested delivery dates and forecasting, where a customer can specify when they need a shipment to arrive. These features allow customers to place orders at the end of the month to reserve inventory and leverage credit from the next month, causing the increased month-end activity seen in the graph.
The customer behavior pattern for B2Cs is quite different. A look at popular sites can be found here on SimilarWeb’s blog. From the post, we can see that Amazon shows fairly flat traffic during the week, with a slight peak on weekends. Web traffic for the average US shopper and for Ikea, from the same post, is shown here. Ikea, unlike Amazon, shows distinct peaks on weekends, with the lowest point toward the middle of the week.
While the peaks occur at different times, both B2B and B2C sites exhibit a peak-to-average ratio of between 1.2 and 2.0. A ratio of 1.5 can be used for a B2B- or B2C-focused retailer with no month-end rush.
Peak daily load = (1.2 to 2.0) * Monthly load / 30.5
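In code, continuing the sketch from the previous section (again, the default factor is illustrative):

```python
def peak_daily_load(monthly_load: float, peak_to_average: float = 1.5) -> float:
    """Estimate the peak daily load from the monthly load.

    peak_to_average: 1.2-2.0; ~1.5 suits a retailer with no month-end rush.
    """
    return peak_to_average * monthly_load / 30.5

# ACME's peak month of ~450k requests -> ~22k requests on the peak day
print(f"{peak_daily_load(450_000):,.0f} requests/day")
```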
Daily Load to Hourly Load
Having calculated the daily load above, it’s now time for the peak hourly load. For B2B companies specifically, the load is low at the beginning of the day, rises toward mid-day, and tapers off as people leave work and the business day ends. The data for our ACME Foods Inc. looks like this.
An important aspect that dictates the shape of this graph is the geographic distribution of your clients. While ACME has clients all over the US, Canada, and Mexico, the majority of its clients are in one time zone. The small dip in the middle of the day, we believe, is the lunch hour plus time zone overlap, based on our analysis of sessions. The small bump in requests around 8 PM is due to close-of-business stock tallies by ACME’s B2C clients, who typically place stock replenishment orders at the end of the day.
This pattern could differ based on the time zone distribution of your clients. One of our clients at AAXIS, who transacts in luxury goods in the US and Asian countries, sees two distinct peaks corresponding to the peak times in those two geographies. American Book Sellers show their daily traffic patterns here, and you can see a similar bell curve.
The curve at the top represents a classic Gaussian-like bell curve with a theoretical peak-to-average of 1.4. Using a 16-hour workday, the ratio works out to 1.39; using a 24-hour workday, it comes to 2.0. We find that using the duration of office hours and a ratio of 1.4 to 1.6 works best for B2B companies, while a 24-hour workday works best for B2C companies. B2C hourly trends are here for reference.
Peak hourly load = (1.4 to 1.6) * Daily load / Average work hours
If you need to estimate the average work hours, we recommend using 12 hours plus the maximum time zone difference between the locations of your client bases.
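Continuing the sketch, with that work-hours estimate built in (the three-hour time zone spread is a made-up example):

```python
def peak_hourly_load(daily_load: float, work_hours: float,
                     peak_to_average: float = 1.4) -> float:
    """Estimate the peak hourly load from the daily load.

    work_hours: office hours for B2B, 24 for B2C.
    peak_to_average: 1.4-1.6 works best for B2B companies.
    """
    return peak_to_average * daily_load / work_hours

# If work hours are unknown: 12 hours plus the widest time zone spread
# of the client base (e.g., 3 hours across the continental US).
work_hours = 12 + 3
print(f"{peak_hourly_load(22_000, work_hours):,.0f} requests/hour")  # ~2k/hour
```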
Hourly Load to Peak Requests Per Second (RPS)
The next step is to calculate the peak Requests Per Second. This number is rarely available directly, and you will likely need to compute it. To understand this part, we’ll need to delve a bit into probability and statistics.
In any given hour, we can expect the average rate of visitors and requests to be more or less constant. However, in any given second within that hour, visitors arrive randomly and the number of requests fluctuates: some seconds are higher and some are lower, even though the rate averaged over a longer period, such as an hour, is constant. To estimate the peak within a 5-second window, we can use the Poisson distribution.
Poisson Distribution expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. (Wikipedia).
In layperson’s terms: as long as there is a constant influx of website visitors, we can use this method to calculate the probability that a given number of visitors are using the website in a given time period (a second). This author takes a similar approach here.
The worst case for a website is that all the users for that hour arrive within the same 5-second window. This case, however, is extremely rare. Practically, then, we can estimate how many users are on your site at a confidence level from 99.9% to 99.999% (five nines) and pick the level that makes the most sense. Let me illustrate the point with an example. The graph below shows the cumulative Poisson distribution for an average of 10 RPS.
The x-axis represents the RPS the server is likely to see, averaged over twice the time taken by your worst-performing page (we used 5 seconds). The y-axis represents the confidence level.
So, if you take 10 RPS on the x-axis, you see that the confidence level is about 50%. That is, there is a 50% chance that the peak load your server experiences at any given moment is the average load or lower. Moving up, we are 92% sure that the peak will be 12 RPS or less. Looking at it another way, there is an 8% chance that, at any given moment, the site will experience a load higher than 12 RPS. What confidence should we aim for? Typically, we recommend 99.999% (five nines) for mission-critical B2B commerce applications. In the case illustrated, that happens at 16.4 RPS, or a factor of 1.64.
You can’t take that factor as universal, though. As the load on your server decreases, the factor required to reach five nines increases, and vice versa. At 5 RPS, five nines occur at 9.80 RPS, a factor of 1.96. At 0.5 RPS (the case with ACME), five nines can only be achieved at 2.20 RPS, a hefty factor of 4.4.
Fortunately for ACME, the minimum standard environment offered by popular B2B platform providers, such as OroCommerce and Magento on their cloud environments, easily covers the required 2.2 RPS. I recommend always tuning your servers to handle at least 2 RPS, even if your load is only a fraction of that amount.
If you didn’t follow all the math above, think of the peak-to-average ratio within the span of a second as your safety factor: the lower your load, the higher your safety factor should be. I present this handy graph for you to pick your factor of safety at a 99.999% confidence level.
As an example, if your average is 5 RPS, you should use a factor of 2. That is, with a 99.999% confidence level, we can say that in any fixed 5-second window, your server will not, on average, experience a load higher than 5 * 2 = 10 RPS.
Peak RPS = (1.5 to 4.4) * Hourly load / 3600 (the lower the load, the higher the factor)
To get a rough estimate of the factor for your load, use the Poisson distribution graph, or compute it directly as sketched below.
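If you would rather compute the factor than read it off a graph, here is a minimal sketch using SciPy (assuming scipy is installed). It models requests in a 5-second window as Poisson-distributed and takes the five-nines quantile, which should land close to the factors quoted above:

```python
from scipy.stats import poisson

def peak_rps(avg_rps: float, confidence: float = 0.99999,
             window_s: float = 5.0) -> float:
    """Peak RPS to size the server for, at the given confidence level.

    Requests in a window_s-second window (twice the load time of the
    worst-performing page) are modeled as Poisson with mean
    avg_rps * window_s; the peak is the `confidence` quantile of that
    distribution, converted back to a per-second rate.
    """
    mean_requests = avg_rps * window_s
    peak_requests = poisson.ppf(confidence, mean_requests)
    return peak_requests / window_s

for avg in (0.5, 5, 10):
    peak = peak_rps(avg)
    print(f"avg {avg:>4} RPS -> peak {peak:4.1f} RPS (factor {peak / avg:.2f})")
```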
Summary
TL;DR: here is the version for the impatient.
Always consider the quality of the analytics and the intended audience. For estimating peak load for the server, choose server-side metrics over metrics designed for marketing purposes.
To calculate peak monthly load given annual load:
Peak monthly load = (1.35 to 1.50) * Total yearly load / 12
To calculate peak daily load given monthly load:
Peak daily load = (1.2 to 2.0) * Total monthly load / 30.5
To calculate peak hourly load given daily load:
Peak hourly load = (1.4 to 1.6) * Total daily load / Average work hours
To calculate peak load on the server given peak hourly load:
Peak RPS = (1.5 to 4.4) * Total hourly load / 3600 (the lower the load, the higher the factor)
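Chaining the four steps together for the fictional ACME Foods Inc. gives a rough end-to-end sketch (mid-range factors chosen for illustration; your own data should drive the choices):

```python
yearly  = 4_000_000              # total yearly requests
monthly = 1.35 * yearly / 12     # ~450k requests in the peak month
daily   = 1.5  * monthly / 30.5  # ~22k requests on the peak day
hourly  = 1.4  * daily / 15      # ~2k requests in the peak hour (15 work hours)
avg_rps = hourly / 3600          # ~0.6 RPS average within that hour
peak    = 4.4 * avg_rps          # low-traffic Poisson factor -> ~2.5 RPS
print(f"Test at roughly {peak:.1f} RPS")
```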
Further considerations before writing a test
Here are some more considerations before you conduct a performance test:
- Estimate the number of visitors you expect for your site. It is important to test with the expected number of users: a memory-bound application that cannot scale will cause a bad user experience or may even take down your server. By testing, you can plan for the load and ensure that your site functions as expected.
- Analyze the composition of the load. Not all pages place the same load on the server; for example, checkout pages use more resources than product detail pages. Typically, checkout traffic is only 2% or less (for B2C) or 20% or less (for B2B) of the total. You can use Google Analytics or Matomo to get an idea of which pages are popular and represent that mix in the test, as in the sketch below.
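As a sketch of how that mix might be expressed in a load test, here is a minimal Locust file; the endpoints and exact weights are hypothetical, with the checkout weight approximating the ~2% B2C guideline above:

```python
# locustfile.py -- run with: locust -f locustfile.py --host https://example.com
from locust import HttpUser, task, between

class Shopper(HttpUser):
    wait_time = between(1, 5)  # think time between requests, in seconds

    @task(60)  # weights are relative: browsing dominates the mix
    def browse_product(self):
        self.client.get("/product/sample-sku")    # hypothetical endpoint

    @task(25)
    def search(self):
        self.client.get("/search?q=protein+bar")  # hypothetical endpoint

    @task(11)
    def view_cart(self):
        self.client.get("/cart")                  # hypothetical endpoint

    @task(2)   # ~2% of requests, per the B2C checkout guideline
    def checkout(self):
        self.client.post("/checkout", json={"payment": "test"})  # hypothetical
```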
Conclusion
Digital commerce companies depend on uptime not only to generate revenue but also to build trust with their customers. It is therefore doubly important for platform owners and system integration partners to performance-test the site using an accurate measure of peak load.
This article describes a systematic way to think about the load on your server. Simply dividing the total load by the time period will lead to erroneous conclusions about the peak load on the servers. The correct approach is to apply the various factors described above to arrive at the peak load. This process ensures that the estimate is representative of the true load and helps you avoid unnecessary trouble during a production launch or a major advertising push. Use this as a framework to estimate your traffic, as actual conditions vary from company to company.
While you shouldn’t count your chickens before they hatch, it helps to have a reliable estimate!