HSoC — Hadrian Optimisation: Elusive Unused Imports (Update 6)
This will be my final update post before the end of project summary. Since the last update I’ve moved houses (again), had my laptop die, come back to life, then die again, eventually succeeded in getting NixOS installed on my machine, written an absurdly slow unused import tester, done a load of data entry for said tester, left said tester running over the weekend (when I said slow, I meant slow), opened a feature request about the results of said tester, made an MR based on that, and written up some documentation on Hadrian expressions (which will hopefully be merged soon).
Whew, it’s all been quite hectic really. Let’s just say I’m glad my flatmate is so understanding about all the boxes still in the front room.
Unused imports
Motivation
Why do we want to remove unused imports? To:
- Clean up our code
- Potentially not have to compile that file/package anymore, speeding up building (quite optimistic for GHC but easily possible in many projects)
- Increase parallelism by potentially shortening the critical path
Method
For the whole messy thing, you can check out the code on my GitHub, but essentially it removes imports from files I identify as (probably) being on the critical path and checks if the build system still completes without them, flagging that import as possibly being unused if it does.
To identify these files I used a Shake report, ordered by WTime (and then ETime once the WTimes were getting pretty low) and entered each of their details into one of these:
data Target = Target FilePath Int Int
Where the first Int is the line number the imports start at and the second Int is where they finish. I did close to 300 by the end, which you can see on my GitHub. It wasn’t a whole lot of fun but did give me a nice opportunity to enjoy some podcasts.
My program takes a list of these Targets, and for each one, removes the imports line by line (fairly naively) and runs hadrian/build.sh -j --flavour=quickest using system :: String -> IO ExitCode. If the ExitCode is ExitSuccess the import is added to the log, otherwise it’s not.
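The loop described above can be sketched roughly like this. It is a minimal reconstruction from the description, not the actual code from my GitHub: the names variants, probeTarget and buildCmd are made up for illustration, and it shares the same naivety (a multi-line import would be broken rather than removed cleanly).

```haskell
import Control.Exception (evaluate, finally)
import Control.Monad (filterM)
import System.Exit (ExitCode (..))
import System.Process (system)

-- Same shape as the type above: file, first import line, last import line (1-indexed).
data Target = Target FilePath Int Int

-- | Every way of deleting exactly one line in the import range,
-- paired with the line that was deleted. (Hypothetical helper.)
variants :: Int -> Int -> [String] -> [(String, [String])]
variants start end ls =
  [ (ls !! (i - 1), take (i - 1) ls ++ drop i ls)
  | i <- [max 1 start .. min end (length ls)]
  ]

-- | Try each variant of one file and return the import lines whose removal
-- still lets the build command succeed. The original file is always restored.
probeTarget :: String -> Target -> IO [String]
probeTarget buildCmd (Target path start end) = do
  original <- readFile path
  _ <- evaluate (length original) -- force the lazy read before overwriting the file
  let try (_, ls') = do
        writeFile path (unlines ls')
        code <- system buildCmd
        pure (code == ExitSuccess)
  kept <- filterM try (variants start end (lines original))
            `finally` writeFile path original
  pure (map fst kept)
```

A call would look something like probeTarget "hadrian/build.sh -j --flavour=quickest" (Target "path/to/Module.hs" 10 25), where the path and line numbers are placeholders. Passing the build command in as a string keeps the file-editing logic separate from the (very slow) build step.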
Now you can probably see why this was so slow. If n is the number of Targets and x is the average number of imports, we run nx rebuild commands, each of which could take 5 seconds if it fails quickly, but equally could take 5 minutes if it compiles and has a lot of dependents. It took quite a while, but thankfully it was much better than doing it all by hand. At least I could leave it to run and do something else.
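To put rough numbers on that, here is a small back-of-the-envelope sketch. The 5 second and 5 minute figures are the ones quoted above; the function name runBounds and the value x = 10 in the usage note are assumptions for illustration only, since I didn’t measure the average number of imports.

```haskell
-- | Lower and upper bounds, in seconds, on the total run time for n targets
-- with an average of x imports each: every one of the n * x builds takes
-- somewhere between 5 seconds (fails fast) and 300 seconds (full rebuild).
runBounds :: Int -> Int -> (Int, Int)
runBounds n x = (builds * 5, builds * 300)
  where
    builds = n * x
```

With n = 300 and a hypothetical x = 10, runBounds 300 10 gives (15000, 900000), i.e. somewhere between roughly 4 hours and roughly 10 days. The real run landed between those extremes because most removals fail quickly.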
Results
After I set it running I thought to myself “oh god, this isn’t going to turn up anything, is it?”, but by the end I was pleasantly surprised to find quite a number of seemingly unused imports. Through this method, and with the help of some experienced GHC contributors (particular thanks to Sebastian Graf) in the discussion section of the related issue I created, we managed to identify three bugs/limitations of -Wunused-imports that meant there were unused imports lingering in GHC’s code base:
- Imports of the form import foo () can’t be detected when they’re no longer necessary
- The order of imports matters for whether they get recognised as unused
- Redundant C++ headers can’t be detected
The first of these was believed to be the one that deserved the most attention, since removing those imports has the greatest chance to remove actual dependencies. These kinds of imports are often used for just importing the type class instances from a module. If those instances are no longer used, that import becomes redundant. So I made an MR to remove them.
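For anyone who hasn’t met this kind of import before, here is a small self-contained illustration using a module from base rather than anything from GHC’s source. Text.Show.Functions exports nothing but a Show instance for functions, so the empty import list is the natural way to pull it in; describe is just a made-up name for the example.

```haskell
-- The empty import list brings in no names, only instances:
-- here, the `Show (a -> b)` instance that Text.Show.Functions defines.
import Text.Show.Functions ()

-- Needs the instance above; without the import this does not compile.
describe :: (Int -> Int) -> String
describe f = "got " ++ show f
```

If describe were later deleted, nothing in the module would need the instance any more, but the import would stay put without a warning, because -Wunused-imports never reports imports of this form. That is exactly how they end up lingering.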
The order of imports problem is less likely to remove actual dependencies, because it’s related to reexporting functions and modules, so there are more likely to be transitive dependencies. E.g. foo is reexported in bar, so in:
module baz where
import foo
import bar
...some code that uses foo and bar...
…import foo is redundant, but doesn’t get picked up by the compiler. However, removing foo won’t do much for performance because foo is a dependency of bar and baz depends on bar, so either way foo needs to be compiled before baz, meaning it still can’t be done in parallel.
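The same argument can be checked on a toy dependency graph. This is only a sketch of the foo/bar/baz example above, with made-up names (Deps, depth) and with every module assumed to take the same time to compile, which is of course not true of real GHC modules.

```haskell
-- Module name paired with the modules it imports directly.
type Deps = [(String, [String])]

-- | Length of the longest chain of imports below a module
-- (a module with no imports has depth 0). Assumes the graph has no cycles.
depth :: Deps -> String -> Int
depth deps m = case maybe [] id (lookup m deps) of
  [] -> 0
  ds -> 1 + maximum (map (depth deps) ds)

-- baz imports both foo and bar; bar imports (and reexports from) foo.
withDirectImport :: Deps
withDirectImport = [("baz", ["foo", "bar"]), ("bar", ["foo"]), ("foo", [])]

-- The redundant `import foo` has been removed from baz.
withoutDirectImport :: Deps
withoutDirectImport = [("baz", ["bar"]), ("bar", ["foo"]), ("foo", [])]
```

Both depth withDirectImport "baz" and depth withoutDirectImport "baz" come out as 2: the chain baz, bar, foo is still there, so the critical path is no shorter.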
There is at least one example where this is not the case, but even then it’s still a transitive dependency in a different way. It’d be nice to sort these examples out to clear up code a little, but it likely won’t do much for performance. Maybe I’ll get to them some time after the project, we’ll see.
As for the C++ includes, even though I can build Hadrian without them, that may not be the case for all setups, so it’s probably best to leave them for now.
Next Steps
- Try to help with Hadrian cloud builds
- Collect final times and time the make build for comparison
- Write a summary of the project for the concluding blog!