MySQL Adventures: Metabase Sync May Eat Your InnoDB Buffer Pool

To continue our adventures on MySQL, today we investigated one more issue about MySQL’s memory consumptions. It's not a very big deal but looks something interesting/warning if you are using Metabase.

You may like these articles:

  1. How max_prepared_stmt_count can bring down production
  2. Reduce MySQL Memory Utilization With ProxySQL Multiplexing


Metabase is a great lightweight tool for analytics or we can use it for query MySQL database(it’s not the primary purpose, but still we can use it as a GUI for run your report queries). It supports many databases including Google BigQuery which I like most in Metabase.

We received an alert from our monitoring system that one of our read replicas consuming more memory than usual. If MySQL uses more memory then absolutely its fine since More In-Memory More Performance. But I didn’t convince with that because from past 6 months it never crossed 60% but suddenly it reached 80%.

The reason for writing this blog is, many times DBAs are SREs wherein the situation to find out the sudden memory spikes in MySQL. Also, we need to run Metabase Sync very carefully. I have done my investigation step by step to find the cause. (If you are a MySQL DBA you might know these steps).

I’m using Percona Monitoring Tool to monitor all of my MySQL servers. These investigations are also done with it.

  1. The below graph shows that the system memory usage was gradually increasing.

2. We need to make sure MySQL is the one who ate this memory.

3. And this utilization happened because of the Buffer pool. It was storing some huge amount the data in it. More than 1 million data pages were pushing into the buffer pool.

4. The reason why suddenly buffer pool start capturing the data is, some queries were running but the data pages for those queries not available in the buffer pool. So those queries actually read the data from the SSD disk and push it to Buffer pool.

The below graph is showing there were some huge transactions happened.

5. To dig deeper, what kind of Operation caused this much transaction is actually READ. The below graph is confirming this.

6. Now we should take a look at Disk IO to verify those data are actually taken from disk and loaded to the Buffer pool.

The below graph confirms that during this period, the InnoDB got the data from DISK, not from memory.

7. Finally, we verified the data read from DISK. DISK monitoring metrics confirmed this. More than 10K read OPS happened on the disk.

8. And those queries actually did the read on the disk and almost 80% IO utilized.

9. Its time to identify what are those queries made this much disk read. The Query Analysis will tell you.

10. If you the query runtimes is actually pretty high. If you expand the queries these all are starts with --metabase

But one good thing about this is, It did limit 5000

Finally, we identified the exact queries while piled up the memory. Let’s see why metabase wants this sync?

From MetaBase’s Doc:

By default, Metabase performs a lightweight hourly sync of your database, and a nightly deeper analysis of the fields in your tables to power some of Metabase’s features, like filter widgets.

Metabase maintains its own information about the various tables and fields in each database that is added to aid in querying. By default, Metabase performs this lightweight sync hourly to look for changes to the database such as new tables or fields. Metabase does not copy any data from your database. It only maintains lists of the tables and columns.

How to Disable metabase sync and run whenever we need?

  • Go to Admin panel → Databases → Click the MySQL database you want to disable sync.
  • Turn on This is a large database, so let me choose when Metabase syncs and scans

I recollect one more case like this, I was using Metabase with SQL Server to generate some report and send it to the Business team at every 12/24hr interval. After setting up Metabase, the Report Server’s performance was dramatically slow. And Metabase ran the queries more than 2 hours.

Caution - MetaBase for BigQuery:

By default, metabase will run the sync at some particular interval. If you configure metabase to point to large BigQuery or Athena datasets, make sure you disable sync — because you will end up paying for scanned data

So if you are using metabase, do sync at non-production hours or fewer traffic hours. In the documentation they mentioned without sync, metabase won’t work. But still, Im using metabase to run Ad-Hoc queries and working fine without the sync on other MySQL servers.