Inserting large data volumes into MySQL

Denys Golotiuk
DataDenys
3 min read · Aug 8, 2022

When you insert one or a couple of rows into MySQL, you just use standard INSERT queries and don't give a shit. But when it comes to inserting a lot of data at once (e.g. loading data into a table), we can lose a lot of time waiting for the data to actually load.

First of all, tune your settings so MySQL can insert data in the most efficient way on your hardware. Then let’s take a look at the list of options we have to insert big volumes of data into MySQL.
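
For example, here are a few settings that are often relaxed for a one-off bulk load (a sketch only: the values and the decision to trade durability for speed are assumptions, adjust them to your hardware and requirements):

-- flush the InnoDB redo log once per second instead of on every commit
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- skip foreign key and secondary uniqueness checks for the loading session
SET foreign_key_checks = 0;
SET unique_checks = 0;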

Why a lot of INSERTs is bad

If we load data as a set of single-row INSERTs, we make the worst possible choice and harass our MySQL server: every row costs its own network round trip, its own query parse, and (with autocommit on) its own transaction commit and disk flush.

Let’s measure how long it takes on a small virtual 1-core node to insert 5000 rows with this row-by-row approach (using PHP and the mysqly framework):

<?php
// assumes mysqly is already loaded and database credentials are configured
$start = microtime(true);
for ( $i = 0; $i < 5000; $i++ ) {
  mysqly::insert('bulk', ['col' => 'val', 'col2' => 'val2']); // one INSERT query per row
}
echo microtime(true) - $start;

It will take (in seconds):

3.8

So we’ll load data at a speed of around 1300 rows per second on our node. Loading millions of rows can take hours. Not cool.

Bulk insert

The simplest approach is to join multiple INSERTs into a single query by using the bulk VALUES format:

INSERT INTO bulk(col1, col2) VALUES('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3'), ...
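
Without a framework, the same statement can be built by hand, for example with PDO (a minimal sketch; the connection details are placeholders and the bulk table is assumed to have col1 and col2 columns):

<?php
$pdo = new PDO('mysql:host=127.0.0.1;dbname=test', 'user', 'password');

$rows = [
  ['a1', 'b1'],
  ['a2', 'b2'],
  ['a3', 'b3'],
];

// one "(?, ?)" group per row, all values bound in a single execute()
$placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));
$values = array_merge(...$rows);
$pdo->prepare("INSERT INTO bulk(col1, col2) VALUES $placeholders")->execute($values);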

Let’s use this approach to load data (using the mysqly::multi_insert() method, which does exactly what we need):

<?php
// assumes mysqly is already loaded and database credentials are configured
$start = microtime(true);
$insert = [];
for ( $i = 0; $i < 10000; $i++ ) {
  $insert[] = ['val' => md5(mt_rand(1, time()))]; // prepare all rows in memory first
}
mysqly::multi_insert('bulk', $insert); // single multi-row INSERT query
echo microtime(true) - $start;

This will give us much better performance:

0.25

Which, as we see, is almost 30 times faster (and we’ve inserted twice as many rows), or around 40k inserts per second. So this method should absolutely be used when we insert a lot of rows at once. The size of a bulk INSERT query is limited by max_allowed_packet, so either increase it or split the single query into multiple smaller ones that fit the limit.
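
A quick way to check and raise that limit (the 64 MB below is just an illustrative value; SET GLOBAL requires the appropriate privilege and only applies to new connections):

SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 64 * 1024 * 1024; -- pick a size that fits your batches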

Load data from CSV

Now for a method that’s not so popular, but let’s check whether that’s fair. Assume we have our data already prepared and stored in data.csv. Loading data from a CSV file is simple:

LOAD DATA INFILE '/tmp/data.csv'
INTO TABLE bulk
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
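
If the data isn’t in a file yet, a minimal sketch for producing /tmp/data.csv from PHP could look like this (hypothetical single-column data; also note the server may restrict readable paths via secure_file_priv, in which case LOAD DATA LOCAL INFILE with a client-side file is an alternative):

<?php
// write test rows to /tmp/data.csv; fputcsv() adds quoting where needed
$f = fopen('/tmp/data.csv', 'w');
for ( $i = 0; $i < 905000; $i++ ) {
  fputcsv($f, [md5(mt_rand(1, time()))]);
}
fclose($f);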

This method allowed us to load 905k rows into the table within 5 seconds. That’s a speed of 180k rows per second, which is the best performance among the options we have for loading large amounts of data.

Summary

In order to load large data volumes into MySQL, use either LOAD DATA INFILE (around 180k rows per second in our test) or at least a bulk INSERT...VALUES(),(),(),... statement (around 40k rows per second), and avoid loading data row by row (around 1.3k rows per second).

Denys Golotiuk
DataDenys

Data-intensive apps engineer, tech writer, opensource contributor @ github.com/mrcrypster