ADVANCED PYTHON PROGRAMMING

International Trade

This story is all about imports, modules and packages: the underlying structure of all things Python.

13 min readJun 20, 2020

Finally, we’ve arrived at modules and packages: the last digivolve of Python. At first, it might surprise you to hear there’s a lot to say about them: after all, in C++, #include is just a preprocessor directive that literally copies over the header, and in Java, import is just a hint on how to link the bytecode. But in Python, like always, a module is an object—and as always, it opens up a new horizon of possibilities.

Importing Goods

Let’s start with the basics: import. Already, there are quite a few different objects it returns:

>>> import sys
>>> sys
<module 'sys' (built-in)>>>> import os
>>> os
<module 'os' from '.../os.py'>>>> import math
>>> math
<module 'math' from '.../lib-dynload/math.cpython-36m-darwin.so'>>>> import django
>>> django
<module 'django' from '.../site-packages/django/__init__.py'>

Some modules, like sys, are built-in—meaning it’s actually part of the interpreter, only exposed as a module (much like the Linux ProcFS, /proc, isn’t really a filesystem, but exposes the kernel’s interface as such).

Others, like os, are what you’re probably used to: Python files that define a bunch of stuff. They must be located in specific directories (the current working directory, the standard library’s root, and a few others), so Python can find them and wrap them up in a module object, through which you can access said stuff.

Yet others, like math, are not really Python files, but rather C extensions with a thin Python wrapper. Like their name suggests, they’re implemented in C, either for performance purposes (you really want your math to be as fast as possible), or because this functionality was already available in C, and there’d be no point rewriting it in Python if we could simply wrap it instead.

Finally, all third-party packages (stuff you install from PyPI with pip, or even packages you download manually and install with setup.py install) go in the special site-packages directory. So, as you can see, there are a lot of different options; and if that’s not enough, you can also do:

>>> from datetime import datetime
>>> datetime
<class 'datetime.datetime'>

Which would only import specific objects from the module; this is especially useful for modules that contain an object with a similar name, like datetime, to avoid writing the tedious datetime.datetime. Alternatively, you could:

>>> import datetime as dt
>>> dt.datetime
<class 'datetime.datetime'>

Which would import the datetime module, but bind it to the dt name, so that on one hand you don’t pollute your global namespace with multiple objects of unknown origins, and don’t have to type so much on the other.

That concept importing stuff from a module is generally discouraged, seeing as it introduces a lot of confusion about what came whence. I once saw code that did this:

join(x, y)

And had no idea what it meant, until I saw from os.path import join at the top of the file. And that’s horrible: my first thought was that it’s a local function, or maybe something that concatenates strings, or joins tables in a database; os.path.join(x, y) would’ve been so much clearer. However, when you have a really long import that you use often, you don’t want to be writing a.b.c.d.e every time, and from a.b.c.d import e is acceptable. Note that you can also use as when importing from, so if that e has an overly generic name like join, you could do from a.b.c.d import join as a_join.

Domestic Modules

But what is a module? And can we make one ourselves? Let’s investigate: we have a file, m.py:

def hello():
    print('Hello, world!')

And we do:

>>> import m
>>> m
<module 'm'>
>>> m.hello()
Hello, world!

In truth, like any object, it has an ID, a type, and a value, encapsulated in a __dict__ (which is effectively its global scope):

>>> id(m)
4455012048
>>> m.__class__
<class 'module'>
>>> m.__dict__
{..., 'hello': <function hello at 0x...>}

Where ... is all the built-in functions you can expect in a global scope, like len and ValueError. It even has a few interesting attributes:

>>> m.__name__
'm'
>>> m.__file__
'/home/user/m.py'

The name is the module’s file name (without the .py), and the file is its full path. Do note that the name stays the same, no matter which variable you bind it to:

>>> import m as x
>>> x.__name__
'm'

Because while x is a mapping in your scope, 'm' is the inherent name of the module; and if you rebind it to y, it’s not going to change. Anyway, this presents an interesting side quest:

if name == ‘main’:

First, some background: in Python, any module doubles as a library to import, and as a script to execute. Scripts often print stuff, or have other side effects—and that’s rather inconvenient when importing a library. After all, an import is pretty equivalent to execution: the module’s code is evaluated line by line into a namespace; and either that’s that, or that namespace is wrapped in a module object and made available to some other code. Of course, when importing stuff, we wouldn’t want them to run immediately—the point is to integrate them into our code and invoke them on demand; so, it’d be useful to have a way to tell the two cases apart.

The way to do it is if __name__ == '__main__'. It seems kinda hacky, and it’s unfortunate that this is the right (and only) way to go. The idea is that, when a module is imported as a file, its global variable __name__ is assigned its file’s name (the same as m.__name__ attribute we saw before—the module object simply reflect its global scope, which binds names to values, remember?). When a module is executed as a script, however, Python assigns it the special name '__main__', to indicate its eminence. Let’s take this m.py:

print(__name__)

Then import it:

>>> import m
m

Then execute it as a script:

$ python m.py
__main__

So all we need to do is check this __name__, and isolate a block of code, so that it only executes if the module is invoked as a script. For example, if m.py would look like this:

def hello():
    print('Hello, world!')if __name__ == '__main__':
    hello()

We could import it, and nothing would happen until we want it to:

>>> import m
>>> m.hello()
Hello, world!

And if we were to execute it, we’d get the desired effect:

$ python m.py
Hello, world!

It’s not recommended to make the if __name__ == '__main__' code block too big, so usually you’d see this at the bottom of the file:

if __name__ == '__main__':
    main()

With the main function encapsulating the script’s logic. If we’d have command-line arguments, we could even do this:

if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))

So that we don’t hardcode the access to sys.argv in our main function, and in the unlikely (but still possible) event that some other module decides to invoke us “as a script”, but programmatically—it can do so with:

>>> import m
>>> m.main([...])

Big Package

If all your code fits in one file, that’s great; if said file starts spanning several thousand lines, it’s less so. Instead, you could split your code into multiple files, and arrange them in such a way that everything would be exposed through a module-like object called a package.

Essentially, where a module is a file with code—a package is a directory with code; and since we can’t put code directly in a directory, we put it in a file with the special name __init__.py. So this:

# m.py
def hello():
    print('Hello, world!')# Later that day...
>>> import m
>>> m.hello()
Hello, world!

Is equivalent to this:

# p/__init__.py
def hello():
    print('Hello, world!')# Later that day...
>>> import p
>>> p.hello()
Hello, world!

Of course, if you’re just going to put all your code in p/__init__.py, you might as well put it in p.py and stop being weird. But if you actually break that code across multiple files, e.g.:

# p/__init__.py
# Empty file, necessary to mark the directory as a package.# p/foo.py
def foo():
    return 'foo'# p/bar.py
def bar():
    return 'bar'

You end up with:

>>> import p
>>> p.foo()
Traceback (most recent call last):
  ...
AttributeError: module 'p' has no attribute 'foo'
>>> p.bar()
Traceback (most recent call last):
  ...
AttributeError: module 'p' has no attribute 'bar'

Well, that’s anti-climactic. Sorry. It turns out you have to import the package’s so-called submodules explicitly:

>>> import p.foo
>>> p.foo.foo()
'foo'
>>> import p.bar
>>> p.bar.bar()
'bar'

And while it does let you organize your code better, it’s a bit tedious. In fact, if that’s all you want to do, you can even omit the __init__.py file, and if that directory is on your import path, Python will figure out what you mean, and treat import p.foo as “import foo from the p directory”.

These __init__.py-less packages are called namespace packages, because they provide nothing more than “namespacing”—organizing your code over different scopes, if you will. But then, what are regular packages for?

Relativity Theory

Submodules grouped under the same package can import each other relatively, pretty much like the “current working directory” lets you refer to other files (usually in its vicinity) using succinct, relative paths. Our syntax is a little different:

# foobar.py
from .foo import foo
from .bar import bardef foobar():
    return foo() + bar()

The . indicates it’s a relative import; and a single one means that it’s a sibling submodule, located right next to this one. If we’d have a more complicated hierarchy, like so:

p/
    __init__.py     # defines x = 1
    a.py            # defines class A
    b.py            # defines class B
    sp/
        __init__.py # defines y = 2
        c.py        # defines class C

(As an aside, sp is called a subpackage, because while it’s nested in p, it’s a proper package with an __init__.py file in and of itself). Anyway:

From a, we’d be able to do from . import b and use it as b.B—or from .b import B and use it as B.
Also from a, we’d be able to do from . import sp.c and use sp.c.C, or from .sp import c and use c.C, or from .sp.c import C and use C.
From c, we’d be able to do from .. import a and use it as a.A, or from ..a import A and use it as A.
Anything defined in __init__.py files is also accessible to the rest of the submodules and subpackages.
From a, we could do from . import x and from .sp import y, while from c we could do from .. import x and from . import y. This looks a little weird at first, but makes sense once you remember that . represents the directory—i.e. the package—and since it can’t contain actual code, it uses the __init__.py file as a surrogate.
Having said that, I personally don’t like putting too much code in __init__.py files, because I find it counterintuitive to look there. Some people use it for “common utilities”, which could simply be put in utils.py with import _ from .utils; and some people put a whole bunch of logic there, in which case I really don’t understand why not use a module instead. So what should you put in __init__.py files, then?

Beautiful Wrapping Paper

One answer is “nothing”: and it’s a good one. __init__.py files are there to mark the directory as a package and make relative imports work, so just drop it there, forget about it, and go about your day.

Another answer is “the public API”. Arranging code in multiple files has the undesirable side-effect that clients have to be familiar with its structure in order to import the components they need; what if we’d abstract it away by “hoisting” any “public” components into the package’s __init__.py, thus exposing everything from its “root”? Going back to our first example…

# p/__init__.py
from .foo import foo
from .bar import bar# p/foo.py
def foo():
    return 'foo'# p/bar.py
def bar():
    return 'bar'# Later that day...
>>> import p
>>> p.foo()
'foo'
>>> p.bar()
'bar'

Nice, no? This can even be applied recursively to subpackages, so that sp/__init__.py has something along the lines of from .c import C, and whenever a.py or b.py need this class they can from .sp import C without caring about the internal structure of this subpackage.

Some people even go as far as reiterating said public interface in the __all__ list of strings, like so:

# p/__init__.py
from .foo import foo
from .bar import bar__all__ = ['foo', 'bar']

This __all__ variable is used when you import * from and by some documentation auto-generation tools, so you can take it or leave it.

-m works in mysterious ways

When people just start working with packages, one of their biggest frustrations is that any module with a relative import seems impossible to run. Take this, for example:

# p/__init__.py
from .foo import foo
from .foobar import foobar# p/foo.py
def foo():
    return 'foo'if __name__ == '__main__':
    print(foo())# p/foobar.py
from .foo import foodef foobar():
    return foo() + 'bar'if __name__ == '__main__':
    print(foobar())

When you try to run foo.py, it works:

$ python p/foo.py
foo

But not so for foobar.py:

$ python p/foobar.py
Traceback (most recent call last):
  ...
ImportError: attempted relative import with no known parent package

It’s a pretty cryptic message, which is a shame, because executing packages and submodules is actually not that hard. The thing you need to understand is that Python, as usual, is immensely dynamic—so when it attempts a relative import, the first thing it does is figure out “what package am I in”, similarly to a relative’s path resolution based on “what is the current working directory”. Python does so based on the __package__ variable, which is defined automatically when something is imported or executed as a package. To wit:

# p/__init__.py
# Empty file# p/a.py
print(__package__)

If I were to import a through its package, Python would wire everything properly, and we’d get:

>>> import p.a
p

But if I’d execute p/a.py as a script, Python wouldn’t differentiate it from a non-package situation, in which a.py just so happens to reside in p/; it’d simply go into that directory and run a.py outside of any package context:

$ python p/a.py
None

To communicate this context to Python, we’d have to invoke the module by its name, using the -m option. You can actually do it with regular modules:

$ python hello.py
Hello, world!
$ python -m hello
Hello, world!

But it makes no difference. With submodules, i.e. modules that reside inside packages, the difference is exactly that package context, which is inferred by Python if only you specify the module’s fully-qualified name.

$ python -m p.a
p

Similarly, if we’d like to execute p/foobar.py, and would also like its relative imports to work, the right way to do it would be:

$ python -m p.foobar
foobar

Executing Packages

What happens if we run python -m p? If you think about it, this question is similar to “what happens if we import p”: it boils down to the question “where is that directory’s code stored”. And just like its “initialization” code, intended to collect and expose its public API, is in __init__.py; so is its “main” logic, intended to expose its functionality via a command-line interface, is ina file with the special name __main__.py. For example:

# p/__main__.pyfrom . import foo, bardef main(argv):
    if len(argv) != 2:
        print(f'USAGE: python -m {__package__} <foo|bar>')
        return 1
    command = argv[1]
    if command == 'foo':
        foo()
    elif command == 'bar':
        bar()
    else:
        print(f'ERROR: invalid command: {command}')
        return 1if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))

And now we can use it from our terminal, like so:

$ python -m p
USAGE: python -m p <foo|bar>
$ python -m p foo
foo
$ python -m p bar
bar
$ python -m p hello
ERROR: invalid command: hello

Note that I did a from . import foo, bar rather than from .foo import foo and from .bar import bar; the reason is that I like to keep my CLI separate from the rest of the package, and work with it only through its public API. This is not a strict requirement, but it helps with keeping the business logic in the package separate from “the scripting stuff” a CLI is for.

Only Siths Deal with Absolutes

If you’ve read Google’s Python style guide, you might be thinking relative imports are bad, and use absolute imports instead, even when writing a package of your own:

# p/__init__.py
# Nothing# p/foo.py
def foo():
    return 'foo'# p/bar.py
def bar():
    return 'bar'# p/foobar.py
from p.foo import foo # absolute import
from p.bar import bar # absolute importdef foobar():
    return foo() + bar()

This started in Python 2.7, when the import mechanism was much messier. Whenever a module a imported a module b, Python would look for it in the current working directory, in the standard directories, but also in a’s directory, in case it’s kind of a “relative” import.

This made everything really confusing, because things would behave differently depending on the working directory they were invoked from, and caused people to develop a strong distaste for this relativism. Instead, they said, it’s better to specify the fully-qualified name; so if p is importable, as it should be if we’re importing it, then p.foo and p.bar are as well, and unequivocally point to the same thing.

However, a lot of water has passed under that bridge, and we wouldn’t have relative imports as a language feature if it weren’t useful to, well, import things relatively in some cases; like when they’re part of the same package, and shouldn’t depend on its name (e.g. changing p to q breaks all absolute imports, while relative imports like from . work fine); or shouldn’t depend on its exact structure (e.g. from .sp import C instead of from .sp.c import C).

Conclusion

This time we covered modules and packages—and it turned out they’re just thin, dictionary-like wrappers around Python files and directories. The latter tends to confuse people with its relative imports and -m invocations, but it’s really all about defining a context in which things are tied relative to each other, as all cohesive contexts should be. In the next chapter, we’ll take a deeper dive into the import machinery, and with that finish our thorough exploration of the language—at least as far as its syntax and semantics are concerned ;)