40% AWS infrastructure cost reduction

Before worrying about “traditional” metrics that don’t reveal the root cause of your performance problem, take a look at how your queries are performing!

This case comes from our customer VHSYS, but it represents a very common situation among IT companies. We often hear questions like “What’s causing frequent CPU peaks?” and “How can I improve performance without resizing my database to a larger instance?” When we try to understand the real pain, we always start with:

“Do you really know how your applications are communicating with the database?”

And the answer is always: “I have no idea.”

Initial Scenario

Customer case: VHSYS was seeing massive month-over-month growth in its user base and, as a consequence, performance issues and infrastructure costs started increasing. They started worrying about the scalability and reliability of their software and, of course, about the rising costs.

Their online Enterprise Resource Planning (ERP) SaaS had these characteristics:

  • 1000 concurrent users;
  • 12GB database;
  • RDS MySQL (db.m3.large / db.m3.xlarge), Multi-AZ;
  • CPU utilization mostly between 50% and 80%, with frequent peaks of 100%.

On the first day of using our monitoring SaaS, this was the scenario:

  • 162,796 executions with an execution time over 100 ms;
  • 557.74 ms average execution time;
  • 25h 13m total time spent;
  • 2m 41s 456ms slowest execution time;
  • 853 distinct commands.

We could also verify that only 10 queries were responsible for more than 70% of the database’s total execution time.
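To illustrate how a “top 10 queries” figure like this can be derived, here is a minimal Python sketch. It assumes you already have a log of (command, elapsed time) pairs from some source, for example a slow query log or MySQL’s `performance_schema.events_statements_summary_by_digest` table. The function name `top_n_share` and the sample data are hypothetical, and this is not how Nazar computes its report.

```python
from collections import defaultdict


def top_n_share(executions, n=10):
    """Rank commands by total time and return (top_n, share_of_total_time).

    `executions` is an iterable of (command_text, elapsed_ms) pairs.
    """
    totals = defaultdict(float)
    for command, elapsed_ms in executions:
        totals[command] += elapsed_ms

    # Heaviest commands first, by accumulated execution time.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())
    top = ranked[:n]
    if grand_total == 0:
        return top, 0.0
    return top, sum(total for _, total in top) / grand_total


# Hypothetical sample data, not taken from the VHSYS report.
sample = [
    ("SELECT * FROM orders WHERE customer_id = ?", 900.0),
    ("SELECT * FROM orders WHERE customer_id = ?", 700.0),
    ("UPDATE stock SET qty = ? WHERE product_id = ?", 300.0),
    ("SELECT name FROM products WHERE id = ?", 100.0),
]

top, share = top_n_share(sample, n=1)
print(f"Top command accounts for {share:.0%} of total time")
```

Ranking by total time (frequency × duration) rather than by the slowest single execution is what surfaces cheap-looking commands that run very often.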

Solution

Based on the report generated by Nazar, the following actions were taken:

  • 7 indexes were created;
  • 5 SQL commands were rewritten;
  • 1 command had its execution frequency reduced in the application.
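The effect of adding an index can be sketched with a small, self-contained Python example. It uses an in-memory SQLite database purely so that it runs anywhere; the customer’s database was MySQL on RDS, where `EXPLAIN` plays a similar role. The `invoices` table, its columns, and the index name are hypothetical and are not taken from the VHSYS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO invoices (customer_id, total) VALUES (?, ?)",
    [(i % 500, i * 1.5) for i in range(10_000)],
)

QUERY = "SELECT id, total FROM invoices WHERE customer_id = ?"


def plan(connection, sql, params):
    """Return the planner's description of how it would run `sql`."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each row.
    return " | ".join(row[3] for row in rows)


# Without an index, the filter on customer_id needs a full table scan.
plan_before = plan(conn, QUERY, (42,))

conn.execute("CREATE INDEX idx_invoices_customer ON invoices (customer_id)")

# With the index, the planner can look up only the matching rows.
plan_after = plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

Comparing the plan before and after is a cheap way to confirm that a new index is actually picked up by the query it was created for.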

The solution seems pretty simple, but you can only come up with simple and effective solutions when you are monitoring what really matters and have identified the root causes of the performance problems.

Results: 40% cost reduction and more!

As you can see in the table below, performance has improved significantly. The daily number of executions slower than 100 milliseconds decreased from more than 160 thousand to around 10 thousand, a 93.85% reduction, and the total time spent dropped from 25 hours to only 47 minutes.
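As a quick sanity check on these figures, the arithmetic can be reproduced in a few lines of Python. The sketch assumes the first-day values reported earlier (162,796 slow executions, 25h 13m total time) as the baseline and treats “47 minutes” as exact, so the derived numbers are approximations rather than reported results.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


# Baseline values from the first day of monitoring.
slow_before = 162_796
total_minutes_before = 25 * 60 + 13  # 25h 13m

# Slow-execution count implied by the reported 93.85% reduction.
slow_after_implied = round(slow_before * (1 - 0.9385))

# Reduction in total time, assuming "47 minutes" is the exact new total.
total_time_reduction = percent_reduction(total_minutes_before, 47)

print(slow_after_implied)
print(f"{total_time_reduction:.2f}%")
```

The implied count lands close to the “around 10 thousand” quoted above, and the drop in total time works out to roughly 97%.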

The RDS instance’s CPU utilization dropped significantly, reflecting the performance optimizations.

The reduction in the RDS instance’s resource consumption allowed our customer to move to a smaller database instance, which resulted in a 40% cost reduction.

“We hired the services of NAZAR aiming to improve the performance of our database. We were pleased with the result, achieving a drastic reduction in load, making it possible to reduce 40% of AWS infrastructure costs.” — Reginaldo Stocco, CEO at VHSYS

Originally published at blog.nazar.io.