HSoC — Hadrian Optimisation: Elusive Unused Imports (Update 6)

Aug 21, 2019

This will be my final update post before the end-of-project summary. Since the last update I’ve moved house (again); had my laptop die, come back to life, then die again; eventually succeeded in getting NixOS installed on my machine; written an absurdly slow unused import tester; done a load of data entry for said tester; left said tester running over the weekend (when I said slow, I meant slow); opened a feature request about the results of said tester; made an MR based on that; and written up some documentation on Hadrian expressions (which will hopefully be merged soon).

Whew, it’s all been quite hectic really. Let’s just say I’m glad my flatmate is so understanding about all the boxes still in the front room.

Unused imports

Motivation

Why do we want to remove unused imports? To:

Clean up our code
Potentially not have to compile that file/package anymore, speeding up building (quite optimistic for GHC but easily possible in many projects)
Increase parallelism by potentially shortening the critical path

Method

For the whole messy thing, you can check out the code on my GitHub, but essentially it removes imports from files I identify as (probably) being on the critical path and checks if the build system still completes without them, flagging that import as possibly being unused if it does.

To identify these files I used a Shake report, ordered by WTime (and then ETime once the WTimes were getting pretty low) and entered each of their details into one of these:

data Target = Target FilePath Int Int

Where the first Int is the line number the imports start at and the second Int is where they finish. I did close to 300 by the end, which you can see on my GitHub. It wasn’t a whole lot of fun but did give me a nice opportunity to enjoy some podcasts.

My program takes a list of these Targets, and for each one, removes the imports line by line (fairly naively) and runs hadrian/build.sh -j --flavour=quickest using system :: String -> IO ExitCode. If the ExitCode is ExitSuccess the import is added to the log, otherwise it’s not.
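A minimal sketch of that loop, for illustration only. It is not the actual tester: the helper names (dropLine, candidates, buildsWithout, possiblyUnused) are made up here, and it assumes line numbers are 1-indexed and inclusive, that every import fits on a single line, and that the Target range lies inside the file.

```haskell
import Control.Exception (evaluate, finally)
import Control.Monad (filterM)
import System.Exit (ExitCode (..))
import System.Process (system)

-- File path, first import line, last import line
-- (assumed 1-indexed and inclusive).
data Target = Target FilePath Int Int

-- | Remove the line with the given 1-indexed line number.
dropLine :: Int -> [String] -> [String]
dropLine n ls = [l | (i, l) <- zip [1 ..] ls, i /= n]

-- | Every line number in a target's import block.
candidates :: Target -> [Int]
candidates (Target _ start end) = [start .. end]

-- | Write the file without line n, run the build command, and always
-- restore the original contents afterwards. True means the build
-- still succeeded without that line.
buildsWithout :: String -> FilePath -> [String] -> Int -> IO Bool
buildsWithout buildCmd path original n = do
  writeFile path (unlines (dropLine n original))
  code <- system buildCmd `finally` writeFile path (unlines original)
  pure (code == ExitSuccess)

-- | Collect (file, line number, line text) for each import line whose
-- removal still lets the build succeed, i.e. possibly unused imports.
possiblyUnused :: String -> Target -> IO [(FilePath, Int, String)]
possiblyUnused buildCmd target@(Target path _ _) = do
  contents <- readFile path
  -- Force the lazy read to finish so the file can be overwritten.
  _ <- evaluate (length contents)
  let original = lines contents
  hits <- filterM (buildsWithout buildCmd path original) (candidates target)
  pure [(path, n, original !! (n - 1)) | n <- hits]
```

With this sketch, something like possiblyUnused "hadrian/build.sh -j --flavour=quickest" (Target "some/Module.hs" 10 25) would try each of lines 10 to 25 in turn; the path and line numbers there are placeholders, not real data entries.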

Now you can probably see why this was so slow. If n is the number of Targets and x is the average number of imports, we run nx rebuild commands, each of which could take 5 seconds if it fails quickly, but equally could take 5 minutes if it compiles and has a lot of dependents. It took quite a while, but thankfully it was much better than doing it all by hand. At least I could leave it to run and do something else.
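To put rough numbers on that, here is a small back-of-the-envelope calculation. The 5 second and 5 minute figures come from the paragraph above; the function names are invented for this sketch, and any value plugged in for x is a guess rather than a measurement.

```haskell
-- | Lower and upper bounds on the total run time in seconds, given the
-- number of targets n, the average number of imports per target x, and
-- the fastest and slowest single-build times in seconds.
runtimeBounds :: Int -> Int -> Int -> Int -> (Int, Int)
runtimeBounds n x fastest slowest = (builds * fastest, builds * slowest)
  where
    builds = n * x -- one rebuild per import per target

-- | Convert a number of seconds to hours.
toHours :: Int -> Double
toHours s = fromIntegral s / 3600
```

For example, with 300 targets and a purely hypothetical average of 10 imports each, runtimeBounds 300 10 5 300 gives (15000, 900000) seconds, so somewhere between roughly 4 hours and 250 hours depending on how many of the builds fail fast.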

Results

After I set it running I thought to myself “oh god, this isn’t going to turn up anything, is it?”, but by the end I was pleasantly surprised to find quite a number of seemingly unused imports. Through this method, and with the help of some experienced GHC contributors (particular thanks to Sebastian Graf) in the discussion section of the related issue I created, we managed to identify three bugs/limitations of -Wunused-imports that meant there were unused imports lingering in GHC’s code base:

Imports of form import foo () can’t be detected when they’re no longer necessary
The order of imports matters for whether they get recognised as unused
Redundant C++ headers can’t be detected

The first of these was believed to be the one that deserved the most attention, since removing those imports has the greatest chance to remove actual dependencies. These kinds of imports are often used for just importing the type class instances from a module. If those instances are no longer used, that import becomes redundant. So I made an MR to remove them.
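As an illustration of what an instance-only import looks like, here is a small standalone example. It uses Text.Show.Functions from base, which I picked purely for demonstration (it is not one of the imports removed from GHC), and the describe function is made up for the example.

```haskell
-- Instance-only import: the empty import list () brings no names into
-- scope, but the module's instances (here, Show for functions) still
-- become visible in this module.
import Text.Show.Functions ()

-- | Needs the Show instance for functions that the import above provides.
describe :: (Int -> Int) -> String
describe f = "got " ++ show f
```

Here describe (+ 1) evaluates to "got <function>". If describe were later deleted, nothing in the file would need that instance any more, yet the import line would stay behind with no warning to flag it.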

The order of imports problem is less likely to remove actual dependencies, because it’s related to re-exporting functions and modules, so there are more likely to be transitive dependencies. E.g. foo is re-exported by bar, so in:

module baz where

import foo
import bar

...some code that uses foo and bar...

…import foo is redundant, but doesn’t get picked up by the compiler. However, removing foo won’t do much for performance because foo is a dependency of bar and baz depends on bar, so either way foo needs to be compiled before baz, meaning it still can’t be done in parallel.

There is at least one example where this is not the case, but even then it’s still a transitive dependency in a different way. It’d be nice to sort these examples out to clear up code a little, but it likely won’t do much for performance. Maybe I’ll get to them some time after the project, we’ll see.

As for the C++ includes, even though I can build Hadrian without them, that may not be the case for all setups, so it’s probably best to leave them for now.

Next Steps

Try to help with Hadrian cloud builds
Collect final times and time the make build for comparison
Write a summary of the project for the concluding blog!