Basic Tips for Tuning Code Performance
Some very basic things to consider when running into performance issues
Slow code is annoying, not only for users but also for developers. I have noticed that in projects where performance isn’t the main focus, performance tuning is often done very ineffectively. So I thought I’d write down some of the things I tend to do and think about first when trying to improve the performance of a piece of code that most likely has not been tuned much before.
Note that this is not a guide on how to develop high-performance code. It is just a (hopefully helpful) collection of tips on where to look first when you want to gain some performance during development, without diving too deep into this very broad topic.
The aim of these tips is to
a) Save you time trying to tune your code’s performance ineffectively
b) Save you time running your code during its development
Tip #1 — Measure
My number one tip and the first thing I always do when thinking about improving performance anywhere in my code is to take measurements! Take the time to measure how long each section of your code takes to execute and compare those timings with each other. More often than not, you will find that you are losing performance in sections where you did not expect it. So, before investing time in trying to speed up a particular section of the code, make sure that it is actually the one that is costing you performance! Sounds simple, but this basic evaluation step often gets skipped because people feel sure they already know where exactly they are losing performance.
The easiest way to analyze the performance of various code sections (and we have all been guilty of using this method, trust me!) is to insert print() statements into your code. It’s a quick and dirty solution that works pretty well as long as you can narrow down beforehand which area of the code you want to look at. Do not try this in a large codebase in which you do not know anything about the code and its performance! At the other end of the spectrum, there are tools like the Intel VTune Profiler, which is also used in the High-Performance-Computing world. The good thing is: it used to be a pretty costly tool, but it is now available free of charge. There is no real reason not to give it a go and drop your print() strategy.
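If you do stay with the quick and dirty approach for a first look, it helps to make it at least systematic. Here is a minimal sketch using only the Python standard library. The names timed, load_data and process are made up for illustration and simply stand in for sections of your own code.

```python
import time
from contextlib import contextmanager

# Accumulated wall-clock time per labelled section.
timings = {}


@contextmanager
def timed(label):
    """Measure the wall-clock time of the enclosed block and store it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[label] = timings.get(label, 0.0) + elapsed


# Hypothetical stand-ins for two sections of a real program.
def load_data():
    return list(range(200_000))


def process(data):
    return sorted(data, reverse=True)


with timed("load"):
    data = load_data()

with timed("process"):
    result = process(data)

# Print the slowest sections first so the hot spot is obvious.
for label, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:10s} {seconds:.4f} s")
```

For a per-function breakdown without touching the code at all, Python also ships with a profiler: python -m cProfile -s cumtime your_script.py.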
After your analysis, you should know precisely where your code is spending most of its runtime. From now on, focus on these sections. Write those timings down, because after you have finished improving your code’s performance, you should run the same measurement again to make sure the runtime has actually gone down. Speaking of comparing old vs. new performance, here is a bonus tip: Write tests before tweaking the performance. Your performance improvements aren’t any good if they break the code!
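To illustrate the bonus tip, here is a small sketch of what "test first, then compare timings" could look like. Both functions are invented for this example (they count how many items already appeared earlier in a list) and the fast version assumes the items are hashable. The point is the workflow, not this particular optimization.

```python
import timeit


def count_duplicates_slow(items):
    """Original version: rescans the list prefix for every item."""
    return sum(1 for i, x in enumerate(items) if x in items[:i])


def count_duplicates_fast(items):
    """Tuned version: remembers seen items in a set (items must be hashable)."""
    seen = set()
    duplicates = 0
    for x in items:
        if x in seen:
            duplicates += 1
        else:
            seen.add(x)
    return duplicates


def check_same_results():
    """The safety net: old and new version must agree on every case."""
    cases = [[], [1], [1, 1], [1, 2, 1, 1], list("abracadabra")]
    for case in cases:
        assert count_duplicates_fast(case) == count_duplicates_slow(case)


check_same_results()

# Same input, same number of runs, before and after the change.
sample = [i % 500 for i in range(2_000)]
old = timeit.timeit(lambda: count_duplicates_slow(sample), number=3)
new = timeit.timeit(lambda: count_duplicates_fast(sample), number=3)
print(f"old: {old:.4f} s, new: {new:.4f} s")
```

In a real project the check would live in your test suite and be written against the old implementation before you start tuning.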
Tip #2 — Reduce
Another seemingly simple but very, very valuable tip: The fastest code is the code that is not executed! In other words, the easiest way to improve performance is to drop code that is not needed anymore. This specifically includes dropping whole features that are not used. Especially in Data Science code, but in all other code as well, features are developed and then abandoned again. Or people tested something and did not fully revert their changes afterward (a proper git branch workflow will keep some of this from creeping into your code!). An example I came across recently was a feature that was once very useful for our client but had become obsolete during development: At the beginning of our project, we were writing the final results to disk as Excel files. Later, we developed features that allowed the customer to tweak those results using our GUI. Now, saving the tweaked results took pretty long, because we were still writing them out as Excel files. However, with all the development we had done during the project, those Excel files had become obsolete. Simply removing this “feature” sped up the saving process by a factor of about 100, which made our client happy because it meant their workflow could be much smoother.
If you adhere to tip #1, you know where your code is spending most of its time. Think critically. Talk to the stakeholders. Is the work the code is doing there still worth the effort? Or is it spending time producing results that are not being used anyway? Note that this is often not a black-and-white question. Sometimes the answer is “we don’t need all of it, but parts of it”. In these cases, you can still get some of the benefits by dropping the not-so-relevant parts.
Tip #3 — Parallelize
Now for those of you who have been waiting for a tip on how to improve the performance of a particular piece of code, here’s my last one for now: Parallelization.
No, I am not talking about High-Performance-Computing-like parallelization. That is a very, very sophisticated topic, and diving deep into it will take a lot of time away from your “actual” (i.e. not performance-related) implementation work. There are many occasions, though, where trivial parallelization can help you out a lot. In a recent project, for example, we had to read a lot of files and verify their content. Because we used tip #1, we knew that reading the files was pretty quick, but verifying them wasn’t. Improving the performance of the verification function itself would have been moderately difficult, but it was pretty easy to simply spread the work across multiple processes, each operating on its own set of files.
Note again that I’m not talking about High-Performance-Computing here. I’m talking about the performance of “everyday” code. That’s why I’ll give examples for Python here. If you’re living in the High-Performance-Computing world, you most likely won’t be using Python anyway. Python offers some wonderful modules for a broad range of use cases. Regarding parallelization, there are, amongst others, multiprocessing and joblib, both of which I find to be pretty useful and fairly easy to use. So if you’re just starting out with parallelization, start there.
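To give you an idea of how little code this kind of trivial parallelization needs, here is a rough sketch with multiprocessing. It is not the code from the project mentioned above: verify_file and its "every non-empty line must be an integer" rule are placeholders I made up, and verify_all is just one possible way to wrap the pool.

```python
import multiprocessing
import sys
from pathlib import Path


def verify_file(path):
    """Hypothetical check: every non-empty line must parse as an integer."""
    for line in Path(path).read_text().splitlines():
        if line.strip():
            try:
                int(line)
            except ValueError:
                return path, False
    return path, True


def verify_all(paths, processes=None):
    """Verify files, spreading the work across worker processes.

    processes=None lets multiprocessing pick one worker per CPU core.
    With processes=1 (or fewer than two files) no pool is started,
    which is handy for debugging.
    """
    paths = [str(p) for p in paths]
    if processes == 1 or len(paths) < 2:
        return dict(verify_file(p) for p in paths)
    with multiprocessing.Pool(processes=processes) as pool:
        # pool.map hands each worker its own share of the files.
        return dict(pool.map(verify_file, paths))


if __name__ == "__main__":
    # Usage: python verify.py file1.txt file2.txt ...
    for path, ok in verify_all(sys.argv[1:]).items():
        print(f"{path}: {'ok' if ok else 'INVALID'}")
```

The if __name__ == "__main__" guard matters: on platforms where worker processes are started fresh instead of forked, they import your script again, and without the guard they would try to start pools of their own. Also keep in mind that starting processes is not free, so this only pays off when the work per file clearly outweighs that overhead. Tip #1 tells you whether it does.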
Thanks for reading! Please feel free to post more simple tips in the comments. I’ll also keep an eye out for further things that I notice and then put together a second part of this 101 in the future.
Resources
- Intel VTune Profiler
- Python modules: multiprocessing, joblib