This Could Be a Reason for Your Slow Database Performance: Why Burst Credits Matter
It was a typical day at work. I was monitoring one of our services that consumes events from Kafka, fetches data from our database, performs aggregations, and writes the results back to the database. But then something unexpected happened: Kafka was accumulating lag. And it was accumulating fast. Within a short amount of time, we were facing a lag of around 1 crore (10 million) messages.
I quickly ran through my usual checklist: I checked the Kubernetes pod restarts, checked for CPU spikes, made sure the database storage wasn’t overflowing, and checked for memory overflow. But everything seemed to be right. None of the usual suspects was causing this problem.
However, I noticed something strange. Even the simplest queries that were used to find data by ID were showing up as some of the slowest operations. This was extremely fishy, and it was clear that something else was causing the problem.
So, I decided to dig deeper. I searched through the nooks and crannies of our data pipeline and service logs, hoping to find some clue as to what was causing the problem. Finally, after searching through the night, I found it: a problem I had never heard of before.
It turned out that we were running low on burst credits. At first, I had no idea what that meant. However, after some research, I discovered that burst credits are the mechanism AWS uses to govern how far a gp2 Amazon Elastic Block Store (EBS) volume can exceed its baseline IOPS (Input/Output Operations Per Second).
I learned that when you provision a gp2 EBS volume, it is assigned a baseline IOPS level that it can sustain continuously: 3 IOPS per GiB, with a floor of 100 IOPS. If the volume needs to perform a burst of I/O activity beyond that baseline, it spends burst credits to reach up to 3,000 IOPS. Every gp2 volume starts with a full bucket of 5.4 million credits, and the bucket refills at the baseline rate, so how quickly a volume earns credits back depends on its size. Small volumes refill slowly.
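To make the credit arithmetic concrete, here is a minimal sketch of the gp2 bucket model as I understand it from the AWS documentation: a baseline of 3 IOPS per GiB (minimum 100, maximum 16,000), a burst ceiling of 3,000 IOPS, and a bucket of 5.4 million I/O credits. The function names are my own, and the model ignores everything except IOPS (for example, throughput limits).

```python
# Rough model of the gp2 burst bucket. Constants follow the published
# gp2 figures; the function names are hypothetical.
GP2_IOPS_PER_GIB = 3
GP2_MIN_BASELINE = 100
GP2_MAX_BASELINE = 16_000
GP2_BURST_IOPS = 3_000
GP2_BUCKET_CREDITS = 5_400_000


def gp2_baseline_iops(size_gib):
    """Baseline IOPS a gp2 volume of this size can sustain indefinitely."""
    return min(GP2_MAX_BASELINE, max(GP2_MIN_BASELINE, GP2_IOPS_PER_GIB * size_gib))


def seconds_until_empty(size_gib, demand_iops):
    """Seconds a full bucket lasts at a steady demand, or None if it never drains."""
    baseline = gp2_baseline_iops(size_gib)
    # The volume cannot go faster than the burst ceiling, and credits keep
    # accruing at the baseline rate while it bursts.
    drain_per_second = min(demand_iops, GP2_BURST_IOPS) - baseline
    if drain_per_second <= 0:
        return None
    return GP2_BUCKET_CREDITS / drain_per_second


def seconds_to_refill(size_gib):
    """Seconds an idle volume needs to refill an empty bucket."""
    return GP2_BUCKET_CREDITS / gp2_baseline_iops(size_gib)
```

Under this model, a 120 GiB volume has a baseline of 360 IOPS. Driven at the full 3,000 IOPS, a full bucket lasts about 34 minutes (`seconds_until_empty(120, 3000)`), and an empty bucket takes a little over 4 hours of idle time to refill (`seconds_to_refill(120)`). A busy database on a small volume can spend its credits far faster than it earns them back.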
This was a crucial piece of information. I realized that our database performance was suffering because we had run out of burst credits. And because our database was central to our service, the entire pipeline had come to a grinding halt.
IOPS, Burst Credits, and CPU Performance
IOPS (Input/Output Operations Per Second) is a metric used to measure the performance of storage devices, including disks and storage volumes. It is essentially a measure of how many read or write operations a storage device can perform in a second. The higher the IOPS, the better the performance of the storage device.
In the context of a database, IOPS can have a significant impact on the overall performance of the system. When a database performs read or write operations that are not served from memory, it needs to access data on the disk. If the disk has a low IOPS limit, it becomes a bottleneck that slows down the entire system. This can result in slower database response times, poor user experience, and slow application performance.
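A back-of-the-envelope sketch shows how an IOPS ceiling turns into consumer lag. All the numbers here are hypothetical, not measurements from our system, and the model assumes every event costs a fixed number of disk operations with no help from caches.

```python
def max_events_per_second(volume_iops, io_ops_per_event):
    """Upper bound on events/s when each event needs a fixed number of disk I/Os."""
    return volume_iops / io_ops_per_event


def lag_after(seconds, arrival_rate, volume_iops, io_ops_per_event, starting_lag=0):
    """Estimated consumer lag after `seconds` of steady traffic (never below zero)."""
    capacity = max_events_per_second(volume_iops, io_ops_per_event)
    growth_per_second = arrival_rate - capacity
    return max(0, starting_lag + growth_per_second * seconds)
```

For example, assume 1,000 events arrive per second and each needs one read and one write. A volume throttled to 360 IOPS can then serve at most 180 events per second, so lag grows by 820 events every second. At that rate a lag of 10 million builds up in roughly 3.4 hours, which matches the kind of runaway lag described above.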
The relationship between IOPS, CPU performance, and burst credits can be critical for maintaining optimal system performance. Burst credits are a feature of gp2 Amazon Web Services (AWS) Elastic Block Store (EBS) volumes, providing a temporary boost of IOPS beyond the baseline performance level. However, when a volume runs out of burst credits, it is throttled back to its baseline and can no longer perform I/O at its peak level, leading to longer waits for data access. While it waits, the CPU does little useful work: threads sit blocked on I/O (reported as iowait on Linux), so CPU utilization can look healthy even though the system is crawling. Thus, if an EBS volume exhausts its burst credits, it becomes a bottleneck for the whole system, which is why IOPS and burst credits need to be monitored and managed.
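One way to catch this earlier is to watch the `BurstBalance` metric that CloudWatch publishes for gp2 volumes in the `AWS/EBS` namespace, reported as a percentage of a full bucket. The sketch below is a hedged example, not what we ran in production: it takes any client object exposing boto3's `get_metric_statistics` call, and the 20 percent alert threshold is an arbitrary assumption.

```python
from datetime import datetime, timedelta, timezone


def lowest_burst_balance(cloudwatch, volume_id, hours=3):
    """Lowest BurstBalance (percent) seen for a volume over the last `hours`."""
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EBS",
        MetricName="BurstBalance",
        Dimensions=[{"Name": "VolumeId", "Value": volume_id}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=300,  # 5-minute buckets
        Statistics=["Minimum"],
    )
    points = response.get("Datapoints", [])
    if not points:
        return None  # no data, e.g. the volume type does not report this metric
    return min(point["Minimum"] for point in points)


def is_running_low(balance_percent, threshold_percent=20.0):
    """True when a known balance is below the (assumed) alert threshold."""
    return balance_percent is not None and balance_percent < threshold_percent
```

With boto3 installed and credentials configured, the client would be created with `boto3.client("cloudwatch")` and passed in together with the volume ID. If the database runs on RDS rather than on a self-managed EBS volume, the namespace and dimension differ, so treat the values above as a starting point.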
Turning Setbacks into Success: How Burst Credits Led to a Performance Breakthrough
After discovering the impact of burst credits on the database performance, we switched from a gp2 EBS volume to a gp3 EBS volume. gp3 provides a baseline of 3,000 IOPS regardless of volume size and does not rely on burst credits, which significantly increased our sustained IOPS and resolved the issue.
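For a plain EBS volume attached to an EC2 instance, the volume type can be changed in place through the EC2 `ModifyVolume` API. The sketch below shows what that call might look like through boto3's `modify_volume`. It assumes a self-managed volume (a managed database service would be changed through its own API instead), and the helper name and defaults are mine: 3,000 IOPS and 125 MiB/s are the gp3 baseline values.

```python
def switch_to_gp3(ec2, volume_id, iops=3000, throughput_mibps=125):
    """Request an in-place change of an EBS volume to gp3.

    `ec2` is any client exposing boto3's EC2 `modify_volume` call.
    Returns the modification state reported by the API.
    """
    if iops < 3000:
        raise ValueError("gp3 volumes start at a baseline of 3000 IOPS")
    response = ec2.modify_volume(
        VolumeId=volume_id,
        VolumeType="gp3",
        Iops=iops,
        Throughput=throughput_mibps,  # MiB/s
    )
    return response["VolumeModification"]["ModificationState"]
```

The client would come from `boto3.client("ec2")`. The modification runs in the background while the volume stays attached, so it is worth checking that it has completed before judging the effect on performance.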
In addition to this change, I also learned about a valuable technique for optimizing database and application performance: setting up multiple EBS volumes instead of relying on a single, larger volume. For example, if I estimate that I’ll need 120 GB of storage for my database, I have two options: a single 120 GB EBS volume or multiple smaller EBS volumes, such as three 40 GB volumes.
The latter option can be preferable because each volume has its own IOPS limit (and, on gp2, its own burst credit bucket), so spreading data across volumes reduces contention for resources. For instance, if I’m running MongoDB on an EC2 instance, I might store my data directory in one EBS volume, my journal directory in another, and my log directory in a third. By distributing the workload across multiple volumes, I can increase the total IOPS available to my system and reduce the likelihood of any single volume becoming a bottleneck. This can lead to better overall performance and utilization, with less time spent waiting on I/O operations.
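A quick comparison of the two layouts makes the trade-off easier to see. This is a rough sketch using the published per-volume figures (gp2: 3 IOPS per GiB with a floor of 100 and a 3,000 IOPS burst ceiling; gp3: 3,000 IOPS baseline per volume). The function names are my own, and the totals ignore instance-level EBS limits, which cap what a single EC2 instance can actually drive.

```python
GP3_BASELINE_IOPS = 3_000
GP2_BURST_IOPS = 3_000


def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, between 100 and 16,000."""
    return min(16_000, max(100, 3 * size_gib))


def layout_summary(volume_sizes_gib):
    """Aggregate IOPS figures for a list of volume sizes (in GiB)."""
    return {
        "gp2_baseline_total": sum(gp2_baseline_iops(s) for s in volume_sizes_gib),
        # Each gp2 volume can burst on its own, from its own credit bucket.
        "gp2_burst_total": sum(
            max(GP2_BURST_IOPS, gp2_baseline_iops(s)) for s in volume_sizes_gib
        ),
        "gp3_baseline_total": GP3_BASELINE_IOPS * len(volume_sizes_gib),
    }


single = layout_summary([120])
split = layout_summary([40, 40, 40])
```

Under these assumptions, the combined gp2 baseline is the same for both layouts (360 IOPS), so on gp2 the gain from splitting comes from having three independent burst buckets (9,000 versus 3,000 burst IOPS in total). On gp3 the sustained total itself triples, from 3,000 to 9,000 IOPS. Either way, the benefit only materializes if the workload really is spread out: a hot data directory is still limited by the one volume it lives on.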
By making these changes and optimizing my EBS volumes, I was able to significantly improve the system’s performance and avoid the issues caused by exhausted burst credits. I highly recommend this approach to anyone experiencing similar performance issues with their databases or applications.
Conclusion - Small Details, Big Impact
I learned an important lesson that day: even the smallest details, like burst credits, can have a big impact on the performance of our services. It was a frustrating experience, but it also taught me to be more vigilant about monitoring our system and to always be on the lookout for new ways to optimize our performance.
If you found this information helpful, don’t forget to give it a round of applause and share it with your colleagues and friends. If you have any further questions or comments, please feel free to reach out. Thank you for reading!