Update multiple rows in SQL with different values at once
Did you ever have a problem that you needed to update multiple rows in database, with different values? I did, and it gave me a little headache. I will show you here, how to efficiently solve this problem in NodeJS.
Image that you have a lot of entries in database. Like a million and then some. But you want to change each field to different value (lets say that you want to encrypt first name). How will you do that in NodeJS? Well, one solution is to do it one by one, but that is not efficient and it will take a lot of time. But we can try. Lets use knex for this issue
knex.transaction(function(trx) {
knex('users')
.transactiong(trx)
.select('id')
.select('firstName')
.then(async function(res) {
for (let i = 0; i < res.length; i++) {
const userData = res[i]; //here we have id and firstName
const encryptedFirstName = encrypt(userData.firstName);trx('users')
.where('id', '=', userData.id)
.update({
firstName: encryptedFirstName
})
}
})
.then(function() {
trx.commit();
process.exit(0);
})
.catch(function(err) {
trx.rollback();
process.exit(1);
})
});
As I said, this solution is ok, for low amount of data. But can we do this faster? Well, first, I started googling a little bit and i found this SQL solution:
UPDATE users
SET firstName = (case when id = 1 then 'encryptedFirstName1'
when id = 2 then 'encryptedFirstName2'
when id = 3 then 'encryptedFirstName3'
end)
WHERE id in (1, 2, 3);
Looks nice doesn’t it? But one senior developer at my first job, always asked me, can we do it differently? So I started thinking and thinking and thinking and finally I started building a solution. First thing first, I need id
and firstName
. We can simply say this:
SELECT 1 as id, 'myFirstName1' as firstName;
Ok, we can generate this, no problem, but we need a bunch of data, how to do that? Well, there is command called UNION ALL
that will join all selects together.
SELECT 1 as id, 'myFirstName1' as firstName
UNION ALL
SELECT 2 as id, 'myFirstName2' as firstName
UNION ALL
SELECT 3 as id, 'myFirstName3' as firstName
Ok, so now we generated all this stuff that needs to be inserted / updated in users
table. But how? Well, did you know, that we can use JOIN
in the UPDATE
statement? Yes, we can. So, we will join this data from this select statements and set our firstName as new value and id as our condition like this
UPDATE users u
JOIN (
SELECT 1 as id, 'myFirstName1' as firstName
UNION ALL
SELECT 2 as id, 'myFirstName2' as firstName
UNION ALL
SELECT 3 as id, 'myFirstName3' as firstName
) a
ON u.id = a.id SET u.firstName = a.firstName;
Isn’t this much more beautiful solution than that above with WHEN CASE
statement?
Now, we need to implement this in NodeJS. We will do this in batches, so we need to define our batch size. Lets set it to 3000. We will need to have two loops; one for batch rotation and one for encrypting data inside of a batch.
const BATCH_SIZE = 3000;
knex.transaction(function(trx) {
knex('users')
.transactiong(trx)
.select('id')
.select('firstName')
.then(async function(res) {
for(let i = 0; i < res.length: i+=BATCH_SIZE) {
for(let j = 0; i < BATCH_SIZE; j++) {
const userData = res[i+j];
}
}
})
.then(function() {
trx.commit();
process.exit(0);
})
.catch(function(err) {
trx.rollback();
process.exit(1);
})
});
Now, we need to build our query. Only think what is dynamic is that JOIN
, everything else is fixed, so we can hardcode it
let sql = 'UPDATE users u JOIN (';
for(let i = 0; i < res.length: i+=BATCH_SIZE) {
let nestedSql = '';
for(let j = 0; i < BATCH_SIZE; j++) {
const userData = res[i+j];
} sql += `${nestedSql}) a on u.id = a.id SET u.email = a.email;`;
await trx.raw(sql);
}
Ok, now for the “hard” part — nested sql. We need to add UNION ALL
for every query except for first and last. We need to add SELECT
always. But what about last batch? What if there is less data than our batch size? Then, our userData
variable will be undefined and we need to break out. So, we have now our algorithm, we just need to have the correct order. Well, obviously first check wil be if data is undefined. And we know that we do not need any check for adding select statement. So, there we go…
let sql = 'UPDATE users u JOIN (';
for(let i = 0; i < res.length; i+=BATCH_SIZE) {
let nestedSql = '';
for(let j = 0; i < BATCH_SIZE; j++) {
const userData = res[i+j];
if (!userData) {
break;
}
if (j != 0) {
nestedSql += 'UNION ALL ';
}
nestedSql += `SELECT \"${data.id}\" as id, \"${data.firstName}\" as firstName `;
}
sql += `${nestedSql}) a on u.id = a.id SET u.firstName = a.firstName;`;
await trx.raw(sql);
}
Now we can run this and it will do a bulk update of 3000 rows. Pretty neat, and fast.
Now, go ahead and commit this code, but do not forget on pull request. If you want to create pull request from command like, be sure to check my other story in which I talked about GitHub CLI tool.