I was recently in the middle of importing data from one database into another. A coworker was convinced the job would take 20 hours; he had a few optimizations in mind, but was ultimately running into a wall. When I asked how long each part was taking, I couldn’t quite get a clear answer.
I work remotely, but this was a problem that seemed faster to solve by driving from the coffeeshop I was at to a coffeeshop near my coworker. He immediately started suggesting strategies to cut down the time, but I was really just interested in finding out what was actually taking so long.
The job was importing one specific object that ended up touching some similar objects. Something like this:
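The original code isn’t reproduced here, so the sketch below is a hypothetical reconstruction of that shape: an import loop that creates one record per row and touches a related record as a side effect. The `Post`/`Author` names and the in-memory "database" are stand-ins, not our actual models.

```ruby
# Hypothetical reconstruction -- the original models aren't shown, so
# Author/Post and the in-memory "database" here are stand-ins.
Author = Struct.new(:name, :updated_at) do
  # Mimics ActiveRecord's touch: bump the record's timestamp, which
  # (in the real app) also refreshed some aggregate statistics.
  def touch
    self.updated_at = Time.now
  end
end

Post = Struct.new(:title, :body, :author)

class Importer
  attr_reader :posts, :authors

  def initialize
    @posts = []
    @authors = {} # name => Author, like find_or_create_by
  end

  def import(row)
    author = (@authors[row[:author]] ||= Author.new(row[:author]))
    @posts << Post.new(row[:title], row[:body], author)
    author.touch # runs once per imported row
  end
end
```

The key detail is that `author.touch` runs once for every imported row, even when many rows share an author.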
Some hard-to-test ideas that we had were:
- Indexes were slowing down creation time
- Joining eagerly would help
- Touching the author to update its record was inexpensive
So I proposed measuring everything we could. We ended up turning the code into this:
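The instrumented version isn’t reproduced here either; the sketch below shows the shape of the change, with a minimal stand-in for the statsd client so it runs anywhere (the real code sent timings to Datadog through a statsd gem, and the metric names and step bodies here are made up):

```ruby
require "benchmark"

# Minimal stand-in for a statsd client so this sketch is runnable;
# the real code shipped these timings to Datadog via statsd.
class FakeStatsd
  attr_reader :timings

  def initialize
    @timings = Hash.new { |h, k| h[k] = [] }
  end

  # Time the block, record elapsed milliseconds under `metric`, and
  # return the block's result so the wrapper is transparent.
  def time(metric)
    result = nil
    elapsed = Benchmark.realtime { result = yield }
    @timings[metric] << elapsed * 1000.0
    result
  end
end

STATSD = FakeStatsd.new

# Wrap every step that looks like it could be taking time. The step
# bodies and metric names are illustrative, not the original import.
def import_row(row)
  post   = STATSD.time("import.create_post") { { title: row[:title] } }
  author = STATSD.time("import.find_author") { { name: row[:author] } }
  STATSD.time("import.touch_author") { sleep 0.001 }
  post.merge(author: author)
end

import_row(title: "Hello", author: "Ann")
STATSD.timings.each do |metric, ms|
  puts format("%-22s %.2f ms", metric, ms.sum / ms.size)
end
```

Because every wrapped call reports under its own metric name, each step shows up as its own series in the graphs.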
If the code seems contrived, that’s because it is. We wrapped every possible thing that looked like it was taking time in a timing call to statsd. We use Datadog to collect our statsd metrics, so this was fairly easy.
This code executes thousands of times, which means we can get some nice graphs. I chose a stacked-area graph:
It was easy to spot the culprit here: incidentally, it was the author.touch method. We still needed it eventually, but its side effect only improved some aggregate statistics; it wasn’t by any means a critical need. So we deferred it and took it out of the import.
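Our actual fix isn’t shown in the post, but one common way to defer a per-row side effect like this is to collect the affected ids during the import and handle them in a single pass afterwards. A sketch under that assumption (the `DeferredToucher` name and its API are made up):

```ruby
require "set"

# Collects author ids during the import instead of touching each one
# inline; a single flush afterwards replaces thousands of tiny writes.
class DeferredToucher
  def initialize
    @author_ids = Set.new
  end

  # Called from the hot loop: O(1) bookkeeping, no database write.
  def record(author_id)
    @author_ids << author_id
  end

  # Called once after the import finishes. In a Rails app this step
  # could become Author.where(id: ids).touch_all or a background job.
  def flush
    @author_ids.to_a.sort.each { |id| yield id }
  end
end
```

Each author gets touched once, no matter how many of their rows were imported.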
Later that day we made more and more optimizations, testing all our previous assumptions and just spitballing ideas:
What we did to get here doesn’t really matter. If you are curious: we added some indexes and removed some eager joins. You can see we made some impressive gains in just a few hours. All in all, we cut the task down to about 3 hours.
I’ve been really loving statsd for quite some time, but Datadog makes it much easier to use and gives us such pretty graphs.