Haskell Summer of Code: Hadrian Optimisation, Update 1
I’m not much of a blogger, but I’m going to try my hand at writing roughly weekly posts about my Google Summer of Code project: Hadrian Optimisation.
I started Google Summer of Code a couple of weeks late thanks to exams, but I’m under way now and doing my best to catch up. So far I’ve spent most of the time building GHC, reading this paper on Shake/Hadrian, and filling in the gaps in my knowledge needed to understand what’s going on, particularly around ReaderT and monad transformers in general. I found the Hacker Noon and Monday Morning Haskell articles very useful on those respective topics.
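To make sure the ReaderT pattern had sunk in, I wrote a tiny, contrived sketch of it. Nothing below is Hadrian code; Config, App and the helper functions are made up purely for illustration. The idea is that an environment gets threaded implicitly through a computation instead of being passed as an argument everywhere:

import Control.Monad (when)
import Control.Monad.Reader

-- A made-up environment, standing in for the kind of settings a build
-- system might carry around.
data Config = Config { verbosity :: Int, buildDir :: FilePath }

-- Every computation in this little app can read the Config and do IO.
type App a = ReaderT Config IO a

logMsg :: String -> App ()
logMsg msg = do
    v <- asks verbosity               -- pull a field out of the environment
    when (v > 0) $ liftIO (putStrLn msg)

outputPath :: FilePath -> App FilePath
outputPath file = do
    dir <- asks buildDir
    pure (dir ++ "/" ++ file)

main :: IO ()
main = runReaderT app (Config 1 "_build")  -- supply the environment once
  where
    app = do
        path <- outputPath "ghc-stage1"
        logMsg ("Would build " ++ path)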
Building GHC
First things first, I tried to build GHC using Hadrian, with varying results. On my laptop it failed (“Executable named hadrian not found on path”), but on my desktop it was fairly plain sailing. Apart from my hard drive running out of space the first time I tried to build, it worked perfectly. I used the following command, with the parameters taken from Neil Mitchell’s blog (which I’ll be referring back to throughout this post):
./hadrian/build.stack.sh -j --flavour=quickest --integer-simple --configure --profile
This ended up taking just over 28 minutes, 12 minutes faster than Neil’s recorded time but with slightly lower parallelism, so in all likelihood the difference is simply down to better single-threaded performance on my hardware.
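For anyone unfamiliar with where that report comes from: Shake can emit an HTML profile of a build, and as far as I understand it Hadrian’s --profile flag ultimately amounts to turning that on. Here is a rough, hand-rolled sketch of a Shake build that does the same thing; the target and rule are made up and have nothing to do with Hadrian:

import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions
    { shakeReport  = ["report.html"]  -- write the HTML profiling report
    , shakeTimings = True             -- print per-phase timings as well
    } $ do
        want ["_build/hello"]                 -- hypothetical target
        "_build/hello" %> \out ->
            cmd "ghc -o" [out] "Hello.hs"     -- hypothetical rule
        phony "clean" $ removeFilesAfter "_build" ["//*"]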
Looking through the report, I found that almost half of the time was untraced, compared to only 2 minutes for Neil. That was a significant difference, and I wondered whether the videos I was watching to pass the time might have interfered with the profiling somehow. So I cleaned up the build and ran the whole thing again with nothing else open, this time adding the -B flag to ensure that everything was rebuilt.
./hadrian/build.stack.sh clean
./hadrian/build.stack.sh -j -B --flavour=quickest --integer-simple --configure --profile
And the result was almost exactly the same. The build was 1 minute faster, but the amount of untraced time increased, so the mystery remains.
Profiling
I’ll pick out some excerpts from the Shake report in this post, but here is the full report for those interested.
We see largely the same problems Neil pointed out in his post, although the lack of parallelism is less of an issue with 4 cores instead of 8. The dips don’t look quite as precipitous, but the bottlenecks are still clearly evident, as we can see from the command plot:
Untraced time
Digging into the rules a little, we can see that although the untraced time is cumulatively quite significant, no single rule is responsible for much of it on its own, although _build/stage1/compiler/.dependencies and the package database do seem to be the larger offenders.
Rules relating to stage1 seem to be particularly bad, but as yet it’s unclear why.
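As I understand it, “untraced” time in Shake is whatever a rule spends outside of a traced block: time in cmd (which traces itself automatically) or in an explicit traced call shows up on the command plot under a name, and everything else the Action does, such as parsing and Haskell-side file wrangling, is lumped together as untraced. A contrived sketch (not an actual Hadrian rule) of where that split comes from:

import Control.DeepSeq (force)
import Control.Exception (evaluate)
import Data.List (isPrefixOf)
import Development.Shake
import Development.Shake.FilePath

-- Hypothetical dependency-scanning rule, purely for illustration.
depsRule :: Rules ()
depsRule =
    "_build//*.deps" %> \out -> do
        src  <- readFile' (out -<.> "hs")       -- untraced
        deps <- traced "scan-imports" $          -- traced: reported by name
            evaluate (force (scanImports src))
        writeFileChanged out (unlines deps)      -- untraced
  where
    scanImports = map (drop 7) . filter ("import " `isPrefixOf`) . lines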
Wall clock time
Sorting by WTime, we see that the stage0 HsInstances is still the biggest offender for both me and Neil, but some of the others with high WTimes for Neil have dropped off the map somewhat, not necessarily because they’ve changed but more likely because of hardware and OS differences. For example, DynFlags is the second-slowest rule for me after HsInstances, but has just 0.09s of WTime compared to 20.7s for Neil.
Some of these rules are very significantly faster than in Neil’s build, particularly hadrian/cfg/system.config etc., which hogged his entire CPU for an impressive 4m21s compared to my 10.99s. I can only assume this is a Linux vs. Windows difference, but I can’t even imagine what Windows would be doing to be so slow.
Next steps
It seems like it’d be worthwhile investigating what’s going on with HsInstances, and more specifically stage0, to see if it can be sped up.
I’ll also look at _build/stage1/compiler/.dependencies, and I’m sure more things will come up along the way.