Writing a Zip File Importer: The Loader

Part 3

Luckily, a zip file-based loader does not need much heavy lifting. Working on the finder led to the realization that zip files always use / as a path separator while users use whatever the hell they want. With that path-managing code already dealt with, all that is left is implementing the importlib.abc.SourceLoader ABC, which requires the following methods:

  • get_filename(fullname)
  • get_data(path)
  • (optionally) path_stats(path)

Not much, eh? This is the kind of thing that I view as the crowning achievement of importlib (that and making import maintainable in pure Python code). By staring at the interfaces originally defined in PEP 302 for literally years, I have slowly been able to refine the algorithms involved with import (and especially loading) to the bare essentials that are absolutely required at any one point based on what may vary between uses (e.g. storage mechanism, variance of the language, etc.).

In the case of zip files, the three methods to be implemented are rather easy. The get_filename() method just returns what was given to the constructor. The get_data() method needs to translate paths that use OS-specific path separators, and that may start with the zip file's own path as a prefix, into the /-separated names used inside the archive; otherwise it’s just reading from a zip file and returning the bytes. The path_stats() method is actually quite easy thanks to zipfile.ZipInfo objects, which record each member's modification time and size. Implement those three methods and you have source loading along with optional bytecode loading (toss in set_data() and you have bytecode writing as well, but for overhead reasons I didn’t implement it).
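To make the three methods concrete, here is a minimal sketch of what such a loader could look like. It is not the code from the actual importer: the class name ZipSourceLoader, its constructor arguments, and the _member_name() helper are all made up for illustration, and it reopens the archive on every call purely to keep things short.

```python
import importlib.abc
import os
import time
import zipfile


class ZipSourceLoader(importlib.abc.SourceLoader):
    """Hypothetical loader for a single module stored in a zip archive."""

    def __init__(self, archive_path, module_path):
        # archive_path: filesystem path of the zip file itself.
        # module_path: archive path joined with the module's location
        # inside it, using whatever separator the OS prefers.
        self._archive_path = archive_path
        self._module_path = module_path

    def _member_name(self, path):
        # Illustrative helper: drop the archive prefix (if present) and
        # switch to the "/" separator used for names inside zip files.
        prefix = self._archive_path + os.sep
        if path.startswith(prefix):
            path = path[len(prefix):]
        return path.replace(os.sep, "/")

    def get_filename(self, fullname):
        # Just hand back what the constructor was given.
        return self._module_path

    def get_data(self, path):
        try:
            with zipfile.ZipFile(self._archive_path) as archive:
                return archive.read(self._member_name(path))
        except KeyError as exc:
            # SourceLoader expects OSError when a path does not exist,
            # e.g. when it probes for a bytecode file that is not there.
            raise OSError(f"{path!r} not found in archive") from exc

    def path_stats(self, path):
        try:
            with zipfile.ZipFile(self._archive_path) as archive:
                info = archive.getinfo(self._member_name(path))
        except KeyError as exc:
            raise OSError(f"{path!r} not found in archive") from exc
        # ZipInfo.date_time is (year, month, day, hour, minute, second);
        # pad it out to the 9-tuple that time.mktime() wants.
        mtime = time.mktime(info.date_time + (0, 0, -1))
        return {"mtime": mtime, "size": info.file_size}
```

Since set_data() is left alone, the ABC's default of silently not writing bytecode applies, which matches the choice described above.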

And that’s it! If you look back through the collection of posts on this topic you will find posts on the path hook, the finder, and now this one on the loader, which together cover (roughly) everything needed to make a zip file importer in pure Python. Issue #17630 tracks whether this code should end up in the stdlib (it’s a little sticky with zipimport there for backwards-compatibility and bootstrapping reasons, so a pure Python importer is duplicate functionality, albeit with easier maintainability and some niceties).

One thing that I should mention, although it is not critical to understanding how I went about writing a loader for zip files, is a quirk of working with bytecode files in zip files. It turns out that zip files only keep a modification time at a two-second granularity. Yes, you read that right: two seconds. That means only even-numbered seconds are possible. Unfortunately that’s troublesome, as importlib as of this writing does just a straight numeric comparison of the timestamp found in the bytecode file and what path_stats() returns for the source file. To properly support bytecode files and minimize how often they are ignored, I will probably have to extend the loader API with a compare_mtime(source, bytecode) method which can be overridden for this special case to allow a one-second fuzziness factor.
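The granularity quirk is easy to see with the zipfile module alone. In the sketch below, roundtrip_seconds() is a throwaway helper written for this post, and compare_mtime() is only a guess at what such a method might do; nothing like it exists in importlib as of this writing, and the fuzziness parameter is my own invention.

```python
import io
import zipfile


def roundtrip_seconds(second):
    """Write one member with the given seconds value and read it back."""
    buffer = io.BytesIO()
    info = zipfile.ZipInfo("mod.py", date_time=(2013, 4, 1, 12, 0, second))
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(info, "x = 1\n")
    with zipfile.ZipFile(buffer) as archive:
        return archive.getinfo("mod.py").date_time[5]


def compare_mtime(source_mtime, bytecode_mtime, fuzziness=1):
    """Hypothetical check: treat timestamps within `fuzziness` seconds as equal."""
    return abs(int(source_mtime) - int(bytecode_mtime)) <= fuzziness


print(roundtrip_seconds(30))  # 30
print(roundtrip_seconds(31))  # 30: the odd second was rounded down
```

Because the stored time is rounded down to an even second, a bytecode file that recorded the source's original timestamp can be off by at most one second, which is why a one-second allowance would be enough.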