HSoC — Hadrian Optimisation: Final Report

James Foster
6 min readAug 26, 2019

--

It’s been fun, but Google’s Summer of Code has finally come to an end. It’s not been especially plain sailing: I started 2 weeks late due to exams, moved house twice, and had my laptop (the only thing I could get Hadrian to build on at the time) die on me. But despite all that, I’ve learnt a lot and hopefully managed to provide some value for the community.

This project was all about trying to optimise Hadrian, the new build system for GHC which uses Shake instead of Make, with the hope being that faster build times helps GHC devs, which in turn helps everyone in the wider community that uses GHC. It has mostly involved a lot of investigation compared to actually writing code. You go down one path, hoping it’ll turn something up, but ultimately if it doesn’t there’s not much you can do except make the most of it and move on to the next thing.

It’s been quite a challenging experience, certainly more challenging than I thought it would be when writing the proposal. My previous optimising experience came from speeding up some (pretty contrived) C code for running on the uni supercomputer, which was really very different from this project, and set some unrealistic expectations for the rate of significant improvements. Switch two loops around? Speedup! Change doubles to floats? Speedup! etc.

Not so for Hadrian. The low hanging fruit sadly wasn’t as plentiful as we were hoping, though it really speaks to the power of Haskell and GHC that, despite not writing for speed, experienced developers ended up writing something pretty performant anyway. That’s not to say there’s no room for improvement, there certainly is, just that it’s all a bit more involved than what I was initially expecting. Cloud builds, for example, will hopefully cut down build times massively, but fixing the associated issues with them is far from trivial.

Regardless, this experience has taught me a lot and I hope to continue learning and contributing to Hadrian/GHC/open source going into the future.

Finally, before we get to the work for the project, I’d like to thank Neil Mitchell and Andrey Mokhov for being fantastic mentors, and Alp Mestanogullari for offering advice and enduring all of my emails despite not officially being one of my mentors (didn’t realise until the end, sorry Alp).

Merge Requests

Some time and energy was put into commenting out parts of these files to see what bits were slow, until I came across discussions on them that made it obvious that the issues run much deeper than that and have been elusive for years, so we moved on to compiler flags.

There was a hope that maybe there was one particular flag, or at least a small set of flags, causing HsInstances and DynFlags to take so long to compile. Finding these offending flags was the focus of compFlags, and much time was spent writing and running it. I tried to automate the process as much as possible, even parsing GHC compiler flags from the documentation using Pandoc. This is discussed more in Update 3 of my blog.

It turned out that for both files, no one flag or set of flags could account for their lengthy compile times, and I learned that -O and -O2 are not just sets of compiler flags, as the documentation suggests, but actually have other (undocumented, as far as I can tell) effects.

In the end, we decided creating a rule that compiles them both with -O0 in Stage 0 was the best option, since it speeds up builds significantly without changing the Stage 1 outputs. This, and the code itself, is discussed more in Update 4.

While parsing the compiler flags in the documentation I found that this one was incorrectly named.

This was the result of profiling Hadrian, and is discussed at length in Update 5. A decent amount of time was spent figuring out how to get a profile, interpreting it, and testing it. It turned out no one thing was taking up that much time, although it initially appeared that ?== was. Since patterns for ?== are going to be changed to make it more efficient in a future Shake release, I went through Hadrian and got ahead of this change. This didn’t result in any speedup, but it’s possible it might in the future and should definitely prevent some headaches later on.

This is discussed at length in Update 6. It involved commenting out imports in files probably on the critical path, to see if they were really needed for a successful build, which was all done by unusedImports. This, like compFlags, took a very long time to run and involved much more manual data entry. It managed to turn up a decent number of seemingly unused imports of different types, which is discussed more in Update 6 and Add check for unused imports to CI. We decided that import foo () statements were the best candidates for removal, so I removed them in this MR, though it’s hard to say if this resulted in any speedup since the build times vary too much and it would only be by seconds at best. At the very least this cleaned up the code a bit.

This documentation builds on my explanation of Predicates in Update 4 and will hopefully make it easier for people new to Hadrian to get to grips with it. You can find it at hadrian/doc/expressions.md.

When this MR is done, Hadrian will use -dynamic-too to build modules in both static and dynamic ways (when appropriate), instead of having separate commands for the different ways. This should significantly cut down build times when building both static and dynamic ways, which is the case for most flavours, including the one used in the CI.

Currently I just need to find a way to prevent the separate dynamic rules from firing when using -dynamic-too, which I’ll hopefully be able to do soon after GSoC is over. The last commit for it before the end of GSoC is 7b6d734e.

Results

These times came from going to the root of the GHC repo and using the following commands (or more accurately, they came from using runTrials to automate the running of these commands):

I tried to get times for all the provided flavours, but 8GB of RAM was not enough to avoid disk thrashing with many of them, so I only have the times for quick, quickest, and prof. These times were taken on a Quad-core i5–3570K @ 4.2GHz, your results may vary.

Build times can vary a decent amount and I only ran these times once, so take them with a grain of salt, but there is still a clear speedup here, mostly from compiling Stage0 HsInstances and DynFlags with -O0.

Other work

Some of these have already been referenced, but just to keep all the things I did easy to find:

Issues

Documentation

Blogs

Investigation Tools

You can find a repo of all my resources on my GitHub. The code I wrote to help me with this project can be found in the projects directory.

The key programs I wrote and used were compFlags, runTrials, and unusedImports. The poorly named shake-demo was what I used to test the .dependencies rule discussed in Update 2. None of these projects are in a state to be used generally, but I’m mentioning them here on the off chance they can serve as inspiration for people in a similar position to me before this project.

--

--

James Foster

Computer Science student. I worked on Hadrian Optimisation for Google’s Summer of Code.