HSOC —Hadrian Optimisation: Update 2

James Foster
3 min readJun 27, 2019

--

Despite a few technical difficulties, this week has been pretty good, so let’s jump right in!

.dependencies rule in Hadrian

Here’s the rule for files which match the pattern “**/.dependencies” in Hadrian:

You can find the full code on GitLab or, if you have the GHC repo cloned already, in hadrian/src/Rules/Dependencies.hs

There’s plenty to unpack here, particularly if you haven’t seen any Shake code before (if you don’t recognise an operator, it’s probably a Shake thing). I’ll probably write something for people new to Shake at some point, but here’s that Shake/Hadrian paper again for those interested. However, essentially this code boils down to reading a .dependencies.mk file (line 2), parsing it (line 10), and doing some data transformations/string manipulations (most everything else).

The concern was that this rule took 10 seconds, which is quite a while to deal with a file that’s less than 1MB, which could mean it was hitting O(n²) complexity. To check, I wrote a somewhat hacky program to see how long it took to execute the rule as the file size increased, inspired by Neil Mitchell’s Quadratic “deriving Generic” Compile Times post.

To get the times, I used Shake profiling and manually entered the data into a spreadsheet.

So, what was the result? Well…

…it’s linear, with big outliers (probably due to all the reading/writing to disk). After rerunning that rule the time was much much lower, so it seems like I just got unlucky with the first profile and got an outlier. The outliers are pretty large so they might be worth looking into, but overall the rule is probably fine.

HsInstances

I had more luck here. Initially I tried commenting out parts of the code to see if any particular part was responsible for the ridiculously slow compile time (~2mins in my build, ~4mins in Neil’s), but it’s just a big file of deriving statements and commenting out pretty much any of them results in the compiler shouting at you, so I gave up pretty quickly there.

Changing compiler flags proved to be more fruitful. Currently it compiles with -O/-O1, taking 1m55s on my machine. If we change to -O0 it takes a much faster 12s, and interestingly -O2 also reduces the compile time compared to -O1 with (an admittedly still slow) 1m27s, even though I was under the impression that -O2 has all the same optimisations as -O1 and a few more.

Next Steps

-O1 and -O2 are really sets of compiler flags, so the next step is to find out which flags are causing the compile time to take so long. I could do that by hand, but that sounds pretty boring and inefficient if I end up wanting to check the flags on another command. Instead I’ve started writing a program that will parse ghc/docs/users_guide/using-optimisation.rst (if there’s a better source for what flags are set by -O1 and -O2 I’d like to know), try to determine which are in -O1 and -O2, and time the compilation of HsInstances with different sets of flags. Then once I find the culprit(s), I’ll remove them from the corresponding rule, measure the effect on the rest of the build process, and hopefully make a merge request if the change is purely positive.

I also encountered some potential bugs in the build system this week which I’d like to reproduce to see if they’re actual bugs, but I want to get this optimisation out of the way first, and ideally reproduce on a more recent commit when the Stack build gets fixed.

On a personal note, I’ll be moving out very soon and going back home for a while, which might make things a little stressful this week, but overall I’m excited that the first major optimisation of this project is on the horizon!

--

--

James Foster

Computer Science student. I worked on Hadrian Optimisation for Google’s Summer of Code.