Data compression in Mysql
You can enable data compression in Mysql to save space . But only if your case is relevant. Let’s see how to understand that, monitor and configure compression the right way in Mysql…
Enabling table compression
Table data compression is enabled when table is created using the ROW_FORMAT
settings option:
CREATE TABLE `test` (
`id` int DEFAULT NULL,
`msg` text
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED
Now Mysql will store data in compressed form and automatically uncompress it on reading. In order to evaluate compression effect, let’s also create uncompressed table and fill it with the same data:
CREATE TABLE `test_uncompressed` (
`id` int DEFAULT NULL,
`msg` text
) ENGINE=InnoDB
We’re going to fill it with 100k
random rows from generated CSV
file:
LOAD DATA INFILE '/var/lib/mysql-files/data.csv'
INTO TABLE test_uncompressed
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
And let’s copy that data to test
table (where compression is enabled):
INSERT INTO test SELECT * FROM test_uncompressed
In order to understand how much space we’ve saved, let’s compare *.ibd
files on disk:
As we can see, compressed table takes 40%
less space on disk, so in our case it makes sense to use compression.
There can be cases when compression ratio is going to be low (e.g. data is too random and too unique). In this case you should disable compression, cause it will only lead to additional CPU load.
Tuning compression
When creating table, you can specify compression page size, which will define compression “chunk” page. Long story short and simple, table data will be compressed in those chunks. Compression page (“chunk”) size is configured using KEY_BLOCK_SIZE
option while creating table:
CREATE TABLE `test_2` (
`id` int DEFAULT NULL,
`msg` text
) ENGINE=InnoDB KEY_BLOCK_SIZE=2
Let’s compare multiple values for KEY_BLOCK_SIZE
to understand how it affects our table compression efficinecy:
So, in our case we should be using KEY_BLOCK_SIZE=4
as it shows the best compression efficiency:
Mind, that each dataset will show different compression results, and the best approach is to test your table with different KEY_BLOCK_SIZE
values and then pick the best one.
Monitoring compression performance
If table data is updated frequently, one important thing to monitor is how compress_ops_ok
metric is related to compress_ops
from INFORMATION_SCHEMA.INNODB_CMP
table:
The closer compress_ops_ok
to compress_ops
the better. If difference grows, that means Mysql has to create new pages and compress them more frequently (which is called compression failure). This usually happens due to changes being made to the table data. In such cases it makes sense to experiment with KEY_BLOCK_SIZE
value or disable compression at all if you experience high CPU load.
So after cheking INNODB_CMP
table, we actually can tell, that the best option in our case is to have KEY_BLOCK_SIZE=8
because it leads to least compression failures (and takes only 10%
more space than with KEY_BLOCK_SIZE=4
).
Archive engine
There’s also an ARCHIVE
engine, that is intented to store compressed data which is not going to be changed (e.g. when you prebuild values maps from other sources). Create table as follows:
CREATE TABLE `test_arc` (
`id` int ,
`msg` text
) ENGINE=Archive
And we can see that it takes even less data on disk since it can leverage better compression structure (as we can’t change data in this table):
Summary
Enable Mysql table compression when you need to save disk space:
CREATE TABLE `test_2` (
`id` int DEFAULT NULL,
`msg` text
) ENGINE=InnoDB KEY_BLOCK_SIZE=8
Test compressed table sizes with different KEY_BLOCK_SIZE
values to pick the best one. Keep compression efficiency monitored using system table:
SELECT * FROM INFORMATION_SCHEMA.INNODB_CMP
Where compress_ops_ok
and compress_ops
values should be as close to each other as possible.