Data compression in Mysql

Data compression in Mysql

Denys Golotiuk
DataDenys
Published in
4 min readOct 10, 2022

--

You can enable data compression in Mysql to save space . But only if your case is relevant. Let’s see how to understand that, monitor and configure compression the right way in Mysql…

Enabling table compression

Table data compression is enabled when table is created using the ROW_FORMAT settings option:

CREATE TABLE `test` (
`id` int DEFAULT NULL,
`msg` text
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED

Now Mysql will store data in compressed form and automatically uncompress it on reading. In order to evaluate compression effect, let’s also create uncompressed table and fill it with the same data:

CREATE TABLE `test_uncompressed` (
`id` int DEFAULT NULL,
`msg` text
) ENGINE=InnoDB

We’re going to fill it with 100k random rows from generated CSV file:

LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE test_uncompressed
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';

And let’s copy that data to test table (where compression is enabled):

INSERT INTO test SELECT * FROM test_uncompressed

In order to understand how much space we’ve saved, let’s compare *.ibd files on disk:

As we can see, compressed table takes 40% less space on disk, so in our case it makes sense to use compression.

There can be cases when compression ratio is going to be low (e.g. data is too random and too unique). In this case you should disable compression, cause it will only lead to additional CPU load.

Tuning compression

When creating table, you can specify compression page size, which will define compression “chunk” page. Long story short and simple, table data will be compressed in those chunks. Compression page (“chunk”) size is configured using KEY_BLOCK_SIZE option while creating table:

CREATE TABLE `test_2` (
`id` int DEFAULT NULL,
`msg` text
) ENGINE=InnoDB KEY_BLOCK_SIZE=2

Let’s compare multiple values for KEY_BLOCK_SIZE to understand how it affects our table compression efficinecy:

So, in our case we should be using KEY_BLOCK_SIZE=4 as it shows the best compression efficiency:

Mind, that each dataset will show different compression results, and the best approach is to test your table with different KEY_BLOCK_SIZE values and then pick the best one.

Monitoring compression performance

If table data is updated frequently, one important thing to monitor is how compress_ops_ok metric is related to compress_ops from INFORMATION_SCHEMA.INNODB_CMP table:

The closer compress_ops_ok to compress_ops the better. If difference grows, that means Mysql has to create new pages and compress them more frequently (which is called compression failure). This usually happens due to changes being made to the table data. In such cases it makes sense to experiment with KEY_BLOCK_SIZE value or disable compression at all if you experience high CPU load.

So after cheking INNODB_CMP table, we actually can tell, that the best option in our case is to have KEY_BLOCK_SIZE=8 because it leads to least compression failures (and takes only 10% more space than with KEY_BLOCK_SIZE=4).

Archive engine

There’s also an ARCHIVE engine, that is intented to store compressed data which is not going to be changed (e.g. when you prebuild values maps from other sources). Create table as follows:

CREATE TABLE `test_arc` (
`id` int ,
`msg` text
) ENGINE=Archive

And we can see that it takes even less data on disk since it can leverage better compression structure (as we can’t change data in this table):

Summary

Enable Mysql table compression when you need to save disk space:

CREATE TABLE `test_2` (
`id` int DEFAULT NULL,
`msg` text
) ENGINE=InnoDB KEY_BLOCK_SIZE=8

Test compressed table sizes with different KEY_BLOCK_SIZE values to pick the best one. Keep compression efficiency monitored using system table:

SELECT * FROM INFORMATION_SCHEMA.INNODB_CMP

Where compress_ops_ok and compress_ops values should be as close to each other as possible.

--

--

DataDenys
DataDenys

Published in DataDenys

Technical articles on creating, scaling, optimizing and securing big data applications

Denys Golotiuk
Denys Golotiuk

Written by Denys Golotiuk

I write about data, machine learning, and its practical implementation. github.com/mrcrypster https://datachild.net golotyuk@gmail.com

No responses yet