Inserting large data volumes into MySQL
When you insert one or a couple of rows into MySQL, you just use standard INSERT queries and don't give a shit. But when it comes to inserting a lot of data at once (e.g. loading data into a table), we can lose a lot of time waiting for the data to actually load.
First of all, tune settings so MySQL can insert data in the most efficient way on your hardware. Then let's take a look at the list of options we have for inserting big volumes of data into MySQL.
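The exact settings depend on your hardware and workload, but (as an assumption on my part, not something the article lists) a typical set of knobs to relax for a one-off bulk load looks like this:

```sql
-- Flush the InnoDB redo log once per second instead of at every commit
-- (trade-off: a crash can lose up to ~1 second of writes)
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- Skip redundant checks while loading data that's already been validated
SET SESSION unique_checks = 0;
SET SESSION foreign_key_checks = 0;
```

Remember to restore these to their defaults once the load is done.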
Why a lot of INSERTs is bad
If we load data as a set of single-row INSERTs, we make the worst possible choice and harass our MySQL server: every row costs its own network round trip, its own query parse, and (with autocommit on) its own transaction commit with a log flush to disk.
Let's measure how long it takes on a small 1-core virtual node to insert 5000 rows with the row-by-row approach (using PHP and the Mysqly framework):
<?php
$start = microtime(true);

// insert 5000 rows one by one
for ($i = 0; $i < 5000; $i++) {
  mysqly::insert('bulk', ['col' => 'val', 'col2' => 'val2']);
}

echo microtime(true) - $start;
It will take (in seconds):
3.8
So we'll load data at a speed of around 1300 rows per second on our node. Loading millions of rows can take hours. Not cool.
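A big part of that per-row cost is the implicit commit (and log flush) after every autocommitted statement. If the rows really must be sent one by one, a middle-ground option worth knowing (not benchmarked in this article) is to wrap the whole loop in a single transaction, so the flush happens once instead of 5000 times:

```sql
START TRANSACTION;
INSERT INTO bulk (col, col2) VALUES ('val', 'val2');
-- ... the remaining single-row INSERTs ...
COMMIT;
```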
Bulk insert
The simplest approach is to join multiple INSERTs into a single query using the bulk format:
INSERT INTO bulk(col1, col2) VALUES('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3'), ...
Let's use this approach to load data (using the mysqly::multi_insert() method, which does exactly what we need):
<?php
$start = microtime(true);

// build 10000 rows in memory, then insert them in a single query
$insert = [];
for ($i = 0; $i < 10000; $i++) {
  $insert[] = ['val' => md5(mt_rand(1, time()))];
}
mysqly::multi_insert('bulk', $insert);

echo microtime(true) - $start;
This will give us much better performance:
0.25
Which, as we see, is almost 30 times faster (and we've inserted twice as many rows): around 40k inserts per second. So this method should absolutely be used when we insert a lot of rows at once. The size of a bulk INSERT query is limited by max_allowed_packet, so either increase it or split a single query into multiple smaller ones to fit the limit.
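To illustrate the splitting idea, here's a rough sketch of packing rows into multi-row INSERT statements under a byte cap. It's in Python since the logic is language-agnostic; the function name, table/column names and the 120-byte budget are made up for illustration, and real code should escape values via the driver, not by hand:

```python
def build_bulk_inserts(table, columns, rows, max_bytes):
    """Pack rows into as few INSERT statements as possible, each under max_bytes."""
    prefix = "INSERT INTO %s(%s) VALUES " % (table, ", ".join(columns))
    statements, current = [], prefix
    for row in rows:
        # naive quoting for illustration only -- use driver escaping in real code
        values = "(" + ", ".join("'" + str(v).replace("'", "''") + "'" for v in row) + ")"
        if current == prefix:                              # first tuple of a statement
            current += values
        elif len(current) + 2 + len(values) <= max_bytes:  # still fits in this packet
            current += ", " + values
        else:                                              # would overflow: flush, start anew
            statements.append(current)
            current = prefix + values
    if current != prefix:
        statements.append(current)
    return statements

rows = [("a%d" % i, "b%d" % i) for i in range(10)]
stmts = build_bulk_inserts("bulk", ["col1", "col2"], rows, max_bytes=120)
```

In practice you'd run each resulting statement (or pass each chunk to mysqly::multi_insert()) in turn.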
Load data from CSV
Now for a method that's not so popular; let's check whether that's fair. Assume we have our data prepared and stored in data.csv. Loading data from CSV is simple:
LOAD DATA INFILE '/tmp/data.csv'
INTO TABLE bulk
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
This method allowed us to load 905k rows into the table within 5 seconds. That's a speed of 180k rows per second, which is the best performance among the options for loading large amounts of data.
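One practical detail is producing a CSV that actually matches the FIELDS and LINES clauses above. A minimal sketch using Python's csv module (the article's stack is PHP, where fputcsv() plays the same role; the path and sample rows are illustrative):

```python
import csv, os, tempfile

rows = [("a1", "b1"), ('has "quotes"', "and, commas")]
path = os.path.join(tempfile.gettempdir(), "data.csv")

with open(path, "w", newline="") as f:
    # QUOTE_ALL matches ENCLOSED BY '"'; '\n' matches LINES TERMINATED BY '\n'
    writer = csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)

print(open(path).read())
```

Note that LOAD DATA INFILE reads the file on the server host and may be blocked by the secure_file_priv setting; LOAD DATA LOCAL INFILE reads it from the client side instead (if enabled on both client and server).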
Summary
In order to load large data volumes into MySQL, use either LOAD DATA INFILE or at least a bulk INSERT ... VALUES (), (), (), ... statement.