Wix Engineering
Published in

Wix Engineering

MySQL BLOB Fetch Performance in Java

Cover image by DALL-E: “a manometer with pipe with a dolphin on a background, 3d render”

Benchmark Environment

This benchmark is slightly less micro and slightly more complicated than my usual benchmarks. We have 3 major parameters here: MySQL Server, network between MySQL and Application and compression algorithms.

Datasets and Compression Algorithms

Because this blog post is a continuation of my previous one — Java Compression Performance, I’m going to use the same compression algorithms and datasets as there.

  • One is based on the real user data, contains URLs, UUID etc.
  • Another is purely random, using different alphabet sizes to achieve different compression ratios.

BLOB in MySQL

MySQL has BLOB column type which allows to store binary data (check docs for the maximum length, and also few tips for using BLOB).

  • DYNAMIC row format for BLOB stores 20-byte pointer to where it’s stored. As opposed to REDUNDANT and COMPACT row formats, which store first 768 bytes in a table and the rest in overflow pages. In this benchmark data sizes start from 1028 bytes, so this kind of optimization is irrelevant.
  • COMPRESSED row format enables compression on database level, so 20-byte pointer will point to another storage which will be compressed using deflate with 4-byte length prefix. For the benchmark sake we will use the same algorithm on application level to see the overhead of MySQL over compression. You may find useful another blog post with overview of MySQL compression.
CREATE TABLE `uncompressed_blobs` (
`id` INT NOT NULL PRIMARY KEY,
`data` MEDIUMBLOB NOT NULL
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
CREATE TABLE `compressed_blobs` (
`id` INT NOT NULL PRIMARY KEY,
`data` MEDIUMBLOB NOT NULL
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;

Network and Hardware

For this benchmark I ended up with 4 different configurations:

  • localhost. This is the most naive and “in a vacuum” configuration. Both JMH benchmark and MySQL were run on a same laptop (i7–1165G7, 16GB RAM, Ubuntu 22). This configuration provides the best “network” latencies, and allegedly without too much contention, as 2 threads were used for the JMH benchmark and the rest for MySQL (which was underutilized anyway).
  • AWS. I used the actual production cluster of MySQL Server with master/slaves with proxysql in front of it. JMH benchmark was run on a separate server inside the same region.
  • 1Gbit is so-called configuration that I ran on my home network. MySQL server was installed on an old computer with Atom processor, and a JMH benchmark was run from another laptop (i7-U10610U, 32GB RAM, Ubuntu 20): first run using Wi-Fi; and another one using Ethernet (I don’t have Ethernet port on a laptop from localhost configuration).

Benchmarks

For this benchmark the most important metric is a throughput: how many bytes of uncompressed data we can get in a second. Response time can’t show a meaningful result because it doesn’t include compression ratio.

Benchmarks for AWS configuration

I’d guess that the most interesting results are for AWS configuration as it’s closest to a potential production use case.

A line chart for AWS configuration, large data sizes. Throughput
AWS, real dataset, large BLOBs, throughput bytes per second
A line chart for AWS configuration (highlighted), large data sizes. Throughput.
AWS, real dataset, large BLOBs, highlighted, throughput bytes per second
A line chart, AWS configuration, medium data size. Throughput
AWS, real dataset, medium BLOBs, throughput bytes per second
A bar chart animation: throughput for different data sizes and different compresion rations
AWS, stub dataset, different compression ratios, throughput bytes per second

Benchmarks for localhost

What localhost benchmark really shows is the ideal case in which network is as fast as possible.

localhost, real dataset, large BLOBs, throughput bytes per second
A line chart for throughput, localhost configuration, medium data size
localhost, real dataset, medium BLOBs, throughput bytes per second

Benchmarks for local network and weak server

Even though it is pretty much unlikely to encounter such configuration in a real production environment, it may show something interesting.

A line chart for throughput, 1GBit Ethernet configuration, large data sizes
1GBit Ethernet, real dataset, large BLOBs, throughput bytes per second

Comparisons

I also did a couple interesting (in my view) comparisons.

MySQL Compression Algorithm

The first comparison is between ROW_FORMAT=COMPRESSED and using UNCOMPRESS function of MySQL over compressed data in ROW_FORMAT=DYNAMIC. Basically, we compare how storage is organized versus simple decompression.

A line chart for throughput, MySQL ROW_FORMAT=COMPRESSED vs UNCOMPRESS function
localhost, Compressed Table vs UNCOMPRESS, throughput bytes per second
A line chart for throughput, AWS configuration, Java vs MySQL
AWS, Java vs UNCOMPRESS, throughput bytes per second
A line chart for throughput, localhost configuration, Java vs MySQL
localhost, Java vs UNCOMPRESS, throughput bytes per second

lz4 vs Uncompressed

Let’s take a slightly closer look on lz4 and uncompressed, as they are clear winners in localhost and AWS configurations. Here is the small animation to see lz4 vs uncompressed in all configurations:

An animation: line charts for throughput of lz4 vs uncompressed for different configurations
lz4 vs Uncompressed, stub dataset for all configurations, throughput bytes per second
A line chart for throughput, AWS configuration, lz4 vs uncompressed
AWS, lz4 vs Uncompressed, throughput bytes per second

Conclusion

For our particular use case lz4 looks very promising. When the compression ratio is not good, it loses slightly, but generally it outperforms anything else. The next option is not using compression at all.

  • Data can’t be used within database. So, if you, for example, store JSON in a BLOB, you won’t be able to use JSON functions.
  • It moves CPU computations from MySQL to your application (which could be an upside in some cases as well).
  • It requires more effort from the application side.

--

--

Architecture, scaling, mobile and web development, management and more, this publication aggregates blog posts written by our very own Wix engineers. Visit our official blog here: https://www.wix.engineering/

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Dmitry Komanov

Software developer, moved to Israel from Russia, trying to be aware of things.