Mastering Latency Metrics: P90, P95, P99

Anil Gudigar · Published in Javarevisited · 6 min read · Apr 11, 2024

Latency metrics play a critical role in evaluating the performance of a service or application. By observing P90, P95, and P99 latencies, we can identify potential bottlenecks and optimize the user experience.

We should know how important it is to keep our services and applications within a particular latency budget, and to back that up with an SLA (Service Level Agreement).

Imagine you’ve built a robust backend system that’s been thoroughly tested, meeting all functional requirements.

You’re excited to roll it out to production. However, just before deployment, you are asked for details on the system’s performance metrics:

What’s the average (Mean), middle (Median), and maximum latency?

How do the latencies look at the 90th (p90), 95th (p95), and 99th (p99) percentiles?

Additionally, they want to know at what level of load the system was tested.

What is a latency period?

The latency period for an API refers to the time it takes for the API to respond to a request. It measures the delay between when a request is made to the API and when the response is received.

Latency is typically measured in milliseconds (ms) or seconds (s).

A lower latency indicates that the API responds quickly, while a higher latency indicates slower response times.

Monitoring and optimizing latency is crucial for ensuring good performance and user experience in applications that rely on the API.
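To make this concrete, here is a minimal Java sketch that measures the latency of a single request using the JDK’s built-in HttpClient (Java 11+). The endpoint URL is just a placeholder, not a real API:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class LatencyProbe {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // Placeholder endpoint for illustration only.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://example.com/api"))
                .timeout(Duration.ofSeconds(10))
                .build();

        long start = System.nanoTime(); // capture the clock just before sending
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000; // ns -> ms

        System.out.println("Status: " + response.statusCode()
                + ", latency: " + elapsedMs + " ms");
    }
}
```

In practice you would record a measurement like this for every request and feed the samples into your monitoring system rather than printing them.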

What is an SLA (Service Level Agreement)?

In simple words, an SLA (Service Level Agreement) is like a promise between a service provider and its customers.

It spells out what services the provider will offer and the standards they promise to meet.

For example, if you’re paying for internet service, the SLA might say they guarantee your internet will be working 99% of the time, and if it’s not, they’ll fix it within a certain timeframe. It’s a way to make sure you get what you paid for and hold the provider accountable if they don’t deliver.

[Figure: sample SLA report]
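As a quick sanity check on what a promise like “99% uptime” actually allows, here is a small back-of-the-envelope calculation (assuming, for simplicity, a 30-day month):

```java
public class SlaBudget {
    public static void main(String[] args) {
        double availability = 0.99;     // the 99% uptime promise from the example
        double hoursPerMonth = 30 * 24; // simplifying assumption: a 30-day month
        double allowedDowntimeHours = hoursPerMonth * (1 - availability);
        // Prints roughly 7.2 hours of allowed downtime per month.
        System.out.printf("99%% uptime allows ~%.1f hours of downtime per month%n",
                allowedDowntimeHours);
    }
}
```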

What are P90, P95, and P99?

Let's consider an API service that connects to different databases and upstream APIs / third-party services and performs some feature operations within a particular product.

For every API we need to publish its latency, so that consumers of the API know roughly how long it will take to respond.

In the context of API services, P90, P95, and P99 refer to different levels of performance or response times.

P99 denotes the response time within which 99% of API requests complete.

For example, assume the server receives 100 requests:

99 requests take < 120 ms

1 request takes > 120 ms

So we could say the /api P99 response time is 120 ms.

P99 (99th percentile): This suggests that 99% of the API requests are faster than this value.

Just 1% of the requests are slower than the P99 value.

P95 denotes the response time within which 95% of API requests complete.

For example, assume the server receives 100 requests:

95 requests take < 90 ms

5 requests take > 90 ms

So we could say the /api P95 response time is 90 ms.

P95 (95th percentile): This indicates that 95% of the API requests are faster than this value.

Only 5% of the requests are slower than the P95 value.

P90 denotes the response time within which 90% of API requests complete.

For example, assume the server receives 100 requests:

90 requests take < 80 ms

10 requests take > 80 ms

So we could say the /api P90 response time is 80 ms.

P90 (90th percentile): 90% of the API requests are faster than this value. In other words, only 10% of the requests are slower than the P90 value.

So, when monitoring API performance, these percentiles help in understanding how the majority of requests are performing and how outliers impact overall performance.
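To see how these numbers fall out of raw measurements, here is a small Java sketch that computes P90, P95, and P99 from a batch of latency samples using the simple nearest-rank method. Keep in mind that real monitoring systems may use slightly different percentile definitions (interpolation, streaming estimates), and the sample data below is made up for illustration:

```java
import java.util.Arrays;

public class Percentiles {
    // Nearest-rank percentile: sort the samples, then return the value at the
    // smallest rank that covers p percent of them.
    static long percentile(long[] latenciesMs, double p) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Hypothetical sample: 100 request latencies in milliseconds,
        // mostly 60-119 ms with one slow outlier.
        long[] latencies = new long[100];
        for (int i = 0; i < 99; i++) latencies[i] = 60 + (i % 60);
        latencies[99] = 300;

        System.out.println("P90 = " + percentile(latencies, 90) + " ms");
        System.out.println("P95 = " + percentile(latencies, 95) + " ms");
        System.out.println("P99 = " + percentile(latencies, 99) + " ms");
    }
}
```

Note how the single 300 ms outlier does not even show up at P99 here; you would only see it in the max latency.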

P99 Latency vs. Median Latency

Consider 100 requests with different response times, where 99 of the requests take at most 15 ms and a single request takes 80 ms.

Here P99 = 15 ms. How do we calculate this? Except for the one request that takes 80 ms, all responses are less than or equal to 15 ms, so the value below which 99% of requests fall is 15 ms.

The mean latency, by contrast, is the average of all the response times, which in this example works out to 12.8 ms.

Median latency, put simply, is the middle value of all the response times for a set of requests made to a system or service. Imagine lining up all the response times from fastest to slowest, and then finding the one right in the middle. That’s the median latency.

It’s a way to understand the typical or average time it takes for a system to respond, without being influenced too much by any extremely fast or slow responses.
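A minimal sketch of that calculation, using the common convention of averaging the two middle values when the sample count is even (the latencies below are made up):

```java
import java.util.Arrays;

public class MedianLatency {
    // Median: the middle value of the sorted samples; with an even count,
    // average the two middle values.
    static double median(long[] latenciesMs) {
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        long[] latencies = {10, 12, 15, 11, 80}; // hypothetical samples, in ms
        System.out.println("Median = " + median(latencies) + " ms"); // 12.0 ms
    }
}
```

Notice that the 80 ms outlier leaves the median at 12 ms, even though it would pull the mean of the same samples up to 25.6 ms.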

Mean and Max Latency

Mean latency is like the average response time of a system. It’s calculated by adding up all the response times for requests made to the system and then dividing that total by the number of requests. So, if you imagine all the response times lined up, the mean latency gives you a sense of the typical or average time it takes for the system to respond.

Let’s say you’re measuring the time it takes for a website to load for different users. Here’s how mean latency works:

Imagine you have data on the time it took for 5 users to load the website: 2 seconds, 3 seconds, 4 seconds, 5 seconds, and 6 seconds.

To find the mean latency:

Add up all the response times: 2 + 3 + 4 + 5 + 6 = 20.

Divide the total by the number of users (which is 5 in this case): 20 ÷ 5 = 4.

So, the mean latency for this example is 4 seconds. It means, on average, it takes 4 seconds for the website to load for these users.

Max latency is the longest amount of time it takes for a system to respond to a request.

For example, let’s say you’re measuring the time it takes for a video to start playing after clicking the play button.

If the video starts playing for most users within a few seconds, but there’s one user who experiences a delay of 20 seconds before the video starts, then the max latency in this case would be 20 seconds. It represents the longest wait time among all the users’ experiences.
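Both numbers are straightforward to compute once you have the samples. This sketch reuses the five page-load times from the mean-latency example above:

```java
import java.util.Arrays;

public class MeanMaxLatency {
    public static void main(String[] args) {
        // The five page-load times from the example, in seconds.
        long[] loadTimes = {2, 3, 4, 5, 6};

        double mean = Arrays.stream(loadTimes).average().orElse(0); // (2+3+4+5+6)/5 = 4.0
        long max = Arrays.stream(loadTimes).max().orElse(0);        // slowest sample: 6

        System.out.println("Mean latency: " + mean + " s");
        System.out.println("Max latency:  " + max + " s");
    }
}
```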


In the real world, you will be deploying your service on multiple servers across the globe for scalability and availability, so calculating these percentiles by ourselves is not practical. Fortunately, there are many monitoring and observability tools that can compute them for us.
