What is a good P99 benchmark?

Rishabh Gupta
5 min read · Aug 15, 2023


Server processing time does not depend on CPU alone; many factors contribute to a healthy processing time, such as caching, database structure, indexing, query optimisation, business logic, multithreading, and so on. Let's get to the fun part.

The statistics

The statistics below were collected using AWS CloudWatch. They represent a medium-sized, fast, modern cloud-based enterprise application with the following key operations for users:

  • Download extensive reports (up to a max of 20,000 records)
  • Data visualization — monthly, weekly and daily dashboards
  • Auto refresh dashboards — at set intervals (1–15 minutes)
  • Perform fetch, update and insert transactions on 100+ entities
  • Perform bulk fetch, update and insert transactions on 20+ entities
  • Third party integrations with 10+ partners
  • App operates 24/7, with reduced load at night time
  • Load balanced environment for off-peak efficiency
  • 30+ batch jobs for automation of tasks

The statistics shown are from a 5 day period in 15 minute windows.
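If you want to pull the same statistics yourself, CloudWatch exposes percentile statistics directly. Below is a minimal sketch using boto3; the namespace, metric name and dimensions are placeholders and will differ for your application.

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

end = datetime.now(timezone.utc)
start = end - timedelta(days=5)  # the 5 day period discussed above

response = cloudwatch.get_metric_statistics(
    Namespace="MyApp/API",       # placeholder namespace
    MetricName="ResponseTime",   # placeholder metric name
    Dimensions=[{"Name": "Environment", "Value": "prod"}],  # placeholder dimension
    StartTime=start,
    EndTime=end,
    Period=900,                  # 15 minute buckets
    ExtendedStatistics=["p99.9", "p99", "p95", "p90", "p50"],
    Unit="Milliseconds",
)

# Print each 15 minute bucket with its percentile values
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    stats = point["ExtendedStatistics"]
    print(point["Timestamp"], {k: round(v, 1) for k, v in stats.items()})
```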

Requests / Traffic

The above shows the count of API requests received in 15 min buckets over a period of 5 days. It also shows how the traffic peaks during the day and falls off during the night.

P99.99

This shows how long the slowest API calls took in each 15 minute window, meaning 99.99% of response times in that 15 minute bucket were faster than the value shown. We can see it hovers around 6–7 seconds, and quite a few windows peaked at 60 seconds, which is not ideal for a customer-facing app; for our enterprise application, these peaks were attributable to some larger report downloads (15,000–20,000 records).
The concentration on the left on day 1 was analysed and addressed by breaking the large data queries into multiple smaller queries run in parallel (multithreaded) to reduce the overall response time.
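A rough sketch of that change is shown below; fetch_records is a hypothetical data-access helper that runs one bounded query, and the chunk size and worker count are illustrative, not the values we actually used.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 2_000  # rows per query; tuned so each individual query stays fast


def download_report(total_records: int, fetch_records) -> list:
    """Split one large report query into smaller queries and run them in parallel.

    fetch_records(offset, limit) is a hypothetical helper that runs a single
    bounded query against the database and returns a list of rows.
    """
    offsets = range(0, total_records, CHUNK_SIZE)
    with ThreadPoolExecutor(max_workers=8) as pool:
        # pool.map yields chunks in submission order, so row ordering is preserved
        chunks = list(pool.map(lambda off: fetch_records(off, CHUNK_SIZE), offsets))
    return [row for chunk in chunks for row in chunk]
```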

P99.9

P99.9 clears out the outliers caused by large reports and shows us that small and medium sized reports (5,000–10,000 records) take roughly 2.5 seconds to download, which is fairly fast.

P99

P99 sitting at almost 1 second is again not ideal for a customer-facing app; it should be closer to 200 ms (milliseconds). But for our enterprise app this is attributable to small report downloads (500–1,000 records) and a bunch of live dashboards that auto refresh at 1 to 15 minute intervals, and a 750 ms response time for such requests is fairly good.
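As a side note, a percentile is just an order statistic over the raw response times, so the same kind of numbers can be reproduced offline from a latency sample. The values below are made up purely for illustration.

```python
import numpy as np

# Made-up response times in milliseconds, purely for illustration
latencies_ms = np.array([8, 9, 12, 13, 15, 22, 35, 90, 110, 160, 700, 950, 2600])

for p in (25, 50, 75, 90, 95, 99):
    print(f"P{p}: {np.percentile(latencies_ms, p):.1f} ms")
```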

P95

The P95 graph is mostly influenced by bulk inserts and updates, and 150 ms for such transactions is not bad. Towards the end of day 2, some scripts ran, which caused the outliers around that time.
If you own a customer-facing app, your P99 should look something like this, minus the outliers caused by the scripts.

P90

P90 for us is mostly influenced by inserts and updates of individual records in our system, or by transactions that fetch/update data in third-party systems via integrations.
100 ms for such transactions is not bad, whether it be an enterprise app or a customer-facing app.

P75

P75 is mostly attributed to reads that involve more than one type of entity in a single API call, mostly daily or weekly graphs and smaller dashboards.
A 30 ms response time for such API calls, which return roughly 10–50 records, is very good.

P50

P50 is mostly attributed to GET calls by entity ID.
Around 12.5 ms for such transactions is good.

P25

P25 is mostly attributed to two types of calls: GET API calls by entity ID that can be served directly from the cache, and GET API calls for summarised data that changes infrequently and is kept ready in the cache.
8 ms is a good response time for such requests, getting us into the coveted single-digit response time range.

P1

P1 is mostly influenced by health checks and API calls that respond based on calculations without having to access the cache or the database. 3 ms for basic health checks is very healthy, ensuring no unnecessary burden on the processors for trivial calls.

Percentile Rank metrics

These metrics tell us what percentage of API calls are served within a specified timeframe. For example, PR(x:y) indicates the percentage of calls that were served within x to y seconds.
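For intuition, the sketch below mirrors the PR(:y) idea on a raw latency sample in milliseconds; it is not CloudWatch's implementation, just the underlying calculation. PR(:0.100), for example, corresponds to an upper bound of 100 ms.

```python
import numpy as np


def percentile_rank(latencies_ms: np.ndarray, upper_ms: float, lower_ms: float = 0.0) -> float:
    """Percentage of calls whose response time falls within [lower_ms, upper_ms]."""
    within = (latencies_ms >= lower_ms) & (latencies_ms <= upper_ms)
    return 100.0 * within.mean()


# PR(:0.100) corresponds to percentile_rank(latencies_ms, upper_ms=100)
```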

PR(:0.100)

PR(:0.100) shows roughly 90% of requests were served within 0–100ms

This correlates with the P90 graph, confirming that the fastest 90% of requests are served within 100 ms.

PR(:0.050)

PR(:0.050) shows roughly 80% of requests were served within 0–50ms

PR(:0.010)

PR(:0.010) shows roughly 40% of requests were served within 0–10ms

Metrics based around means

Means are generally not a good indicator of overall application health, as they tend to average out critical issues that would otherwise show up as peaks. Still, the metrics below can be used to assess the general health of the application.

TM(5%:95%)

TM(x%:y%), the trimmed mean, is the average of the response times that fall between the x-th and y-th percentiles. TM(5%:95%) varies between 20–45 ms.

TM(10%:90%)

TM(10%:90%) varies between 15–30 ms

IQM

IQM varies between 12–16 ms

Interquartile mean (IQM) represents the mean of the response times of the middle 50% of the requests when ordered by response times. This would be the same as TM(25%:75%).
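For intuition, these trimmed means can be reproduced from a raw latency sample; the sketch below uses scipy's trim_mean, which cuts the same proportion from both ends and therefore matches the symmetric TM(x%:y%) metrics used here. The synthetic latencies are illustrative only.

```python
import numpy as np
from scipy.stats import trim_mean

# Synthetic latencies (ms) for illustration; replace with your own measurements
latencies_ms = np.random.default_rng(0).lognormal(mean=2.5, sigma=0.8, size=10_000)

print(f"TM(5%:95%):        {trim_mean(latencies_ms, 0.05):.1f} ms")
print(f"TM(10%:90%):       {trim_mean(latencies_ms, 0.10):.1f} ms")
print(f"IQM = TM(25%:75%): {trim_mean(latencies_ms, 0.25):.1f} ms")
```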

Suggested response times by percentile

The response times discussed in this article do not include network latency; they cover only the time from when the last byte of the request reaches the server to when the server readies the response and sends back the first byte.

This article is an attempt at suggesting percentile values for an application built on a modern architecture. I welcome any suggestions that can help improve this guide to recommended response times by use case.
