HSoC — Hadrian Optimisation: First GHC Contribution (Update 4)

James Foster
4 min readJul 16, 2019

--

This week I had my first merge request for this project (and GHC in general) accepted and merged! The related issue for the MR, which covers benchmarks and the steps to reproduce, is here.

Everyone should now get a few minutes knocked off their GHC build times (with Hadrian anyway), with maybe a bit more or less depending on your hardware and build flavour. Sadly the devel1 build flavour shouldn’t be sped up at all since the commit just means that HsInstances and DynFlags are now compiled with -O0 in Stage0, which is what devel1 does anyway for all compiler modules. For more information on the predefined build flavours, see the documentation.

Choices

I spent some time trying to go through DynFlags commenting out code, looking to see if I could find any particular part slowing things down, to no avail. Then I came across this lengthy issue about it that’s been open for 6 years and has some big names in it, and realised it’s far more complicated than that and probably best left for people who know more than me.

In the end I decided that the commit should just compile HsInstances and DynFlags with -O0 in Stage0 of the build. I considered making them compile with -O0 for Stage1+too, but it turned out that the quickest flavour was already doing that and I figured that it was probably best to leave the output binaries unchanged. For example, someone compiling using the perf build flavour presumably wants the resulting compiler to be as fast as possible, even if it takes longer to build. It seems as if DynFlags may still have non-linearity issues if not compiled with -O, so it seems best to leave it as is for now. Perhaps a user tweak or command line flag could be introduced to opt in/out of the optimisations.

The Code

You can find the following code in hadrian/src/Settings/Packages.hs

Fig. 1

The commit added lines 5–9 (in the GitHub gist above) to the file. But what’s going on here? The code mostly does what it says on the tin, but for a more in-depth look that tries to explain some Hadrian fundamentals when they arise, let’s start with line 1.

There are two different builder functions used in Hadrian. One is a part of the record syntax for Target c b in hadrian/src/Hadrian/Target.hs, with type builder:: Target c b -> b:

Fig. 2

The one actually being used here is builder :: BuilderPredicate a => a -> Predicate, from hadrian/src/Expression.hs:

Fig. 3

This has changed since the original Shake paper where it was builder :: Builder -> Expr Bool (see Fig. 3 of the paper), which threw me off originally, but the main difference is that it now uses a type class instead of more rigidly requiring a Builder as the comment in the above gist explains, and more type parameters have been introduced to Expr and Target.

Here Predicate is just a type synonym for Expr Context Builder Bool. Here’s the definition for Expr:

Fig .4

If you’re not too sure on ReaderT I recommend Jonathan Fischoff’s article on the Reader monad and Monday Morning Haskell’s article on monad transformers, but essentially for some target that comes along with a particular Context and a particular Builder, a Predicate can return True or False, possibly based on some IO action, since Action implements MonadIO.

In this case, builder (Ghc CompileHs) will be true any time we’re building GHC with CompileHs :: GhcMode, meaning we’re compiling a Haskell source file (instead of a C source file, for example). Ghc CompileHs :: Stage -> Builder since we’ve left off a parameter from the constructor, but thanks to builder being defined for a type class, this means that it matches for every Stage, instead of us having to repeat code for every possible stage with the old implementation.

That ? operator has also got a little more complicated since the paper to become more general, but here’s a hybrid of the old and new definition that should clarify its use in this instance:

Fig. 5

Basically, if the Predicate returns true, we return the Expr on the right, otherwise we return mempty (which is why we need the Monoid type constraint). In this case we’re going to mconcat :: Monoid a => [a] -> a together a list of Expr Context Builder [String], with the [String] being a list of flags.

Phew… and that was just line 1! Don’t worry, now we’ve got past the initial concepts we can get to the end a lot quicker.

Lines 8–9 of Fig 1, the ones I actually added, are relatively simple. inputs :: [FilePattern] -> Predicate c b takes a list of FilePatterns and returns a Predicate that returns true if any of the input files from the Target match any of the FilePatterns. (Confusingly, the are two type synonyms for Predicate, one with type parameters i.e. type Predicate c b = Expr c b Bool, and one without that we’ve already seen, where c and b are Context and Builder respectively.). The pattern “//DynFlags.hs” will match any FilePath that ends with "DynFlags.hs".

stage0 :: Predicate just returns true if the build is currently in Stage0, as you’d expect, and then we pure ["-O0"] :: Expr Context Builder [String] to raise up the -O0 flag to the Expr Context Builder level we need to mconcat it.

Put that all together and we see that if our target input is HsInstances or DynFlags in Stage0, [“-O0”] is returned, otherwise the empty list is returned (which comes from ? and mempty :: [String]).

Next Steps

  • Clear up the things I’ve been putting off like reproducing bugs I encounted, making issues for them, correcting/writing documentation etc.
  • Search for the next big optimisation, starting with profiling Hadrian (if I can get that working)

--

--

James Foster

Computer Science student. I worked on Hadrian Optimisation for Google’s Summer of Code.