ADVANCED PYTHON PROGRAMMING
International Trade
This story is all about imports, modules and packages: the underlying structure of all things Python.
Finally, we’ve arrived at modules and packages: the last digivolve of Python. At first, it might surprise you to hear there’s a lot to say about them: after all, in C++, #include
is just a preprocessor directive that literally copies over the header, and in Java, import
is just a hint on how to link the bytecode. But in Python, like always, a module is an object—and as always, it opens up a new horizon of possibilities.
Importing Goods
Let’s start with the basics: import
. Already, there are quite a few different objects it returns:
>>> import sys
>>> sys
<module 'sys' (built-in)>
>>> import os
>>> os
<module 'os' from '.../os.py'>
>>> import math
>>> math
<module 'math' from '.../lib-dynload/math.cpython-36m-darwin.so'>
>>> import django
>>> django
<module 'django' from '.../site-packages/django/__init__.py'>
Some modules, like sys
, are built-in—meaning it’s actually part of the interpreter, only exposed as a module (much like the Linux ProcFS, /proc
, isn’t really a filesystem, but exposes the kernel’s interface as such).
Others, like os
, are what you’re probably used to: Python files that define a bunch of stuff. They must be located in specific directories (the current working directory, the standard library’s root, and a few others), so Python can find them and wrap them up in a module object, through which you can access said stuff.
Yet others, like math
, are not really Python files, but rather C extensions with a thin Python wrapper. Like their name suggests, they’re implemented in C, either for performance purposes (you really want your math to be as fast as possible), or because this functionality was already available in C, and there’d be no point rewriting it in Python if we could simply wrap it instead.
Finally, all third-party packages (stuff you install from PyPI with pip
, or even packages you download manually and install with setup.py install
) go in the special site-packages
directory. So, as you can see, there are a lot of different options; and if that’s not enough, you can also do:
>>> from datetime import datetime
>>> datetime
<class 'datetime.datetime'>
Which would only import specific objects from the module; this is especially useful for modules that contain an object with a similar name, like datetime
, to avoid writing the tedious datetime.datetime
. Alternatively, you could:
>>> import datetime as dt
>>> dt.datetime
<class 'datetime.datetime'>
Which would import the datetime
module, but bind it to the dt
name, so that on one hand you don’t pollute your global namespace with multiple objects of unknown origins, and don’t have to type so much on the other.
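Incidentally, the different kinds of modules we met above can be told apart at runtime. Here's a rough sketch—describe is a made-up helper, and the file-extension checks are a simplification rather than the import system's actual logic:

```python
import sys

def describe(mod):
    """Roughly classify a module by where its code lives."""
    if mod.__name__ in sys.builtin_module_names:
        return 'built-in'                 # compiled into the interpreter
    path = getattr(mod, '__file__', '') or ''
    if path.endswith(('.so', '.pyd')):
        return 'C extension'              # a compiled shared library
    if path.endswith('__init__.py'):
        return 'package'                  # a directory of modules
    return 'pure Python'                  # a regular .py file

import os
import email

print(describe(sys))    # built-in
print(describe(os))     # pure Python
print(describe(email))  # package
```

On CPython, math would typically come out as a C extension on Linux and macOS, but as a built-in on Windows builds—which is exactly why inspecting the module beats guessing.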
That said, importing stuff from
a module is generally discouraged, seeing as it introduces a lot of confusion about what came whence. I once saw code that did this:
join(x, y)
And had no idea what it meant, until I saw from os.path import join
at the top of the file. And that’s horrible: my first thought was that it’s a local function, or maybe something that concatenates strings, or joins tables in a database; os.path.join(x, y)
would’ve been so much clearer. However, when you have a really long import that you use often, you don’t want to be writing a.b.c.d.e
every time, and from a.b.c.d import e
is acceptable. Note that you can also use as
when importing from
, so if that e
has an overly generic name like join
, you could do from a.b.c.d import join as a_join
.
Domestic Modules
But what is a module? And can we make one ourselves? Let’s investigate: we have a file, m.py
:
def hello():
    print('Hello, world!')
And we do:
>>> import m
>>> m
<module 'm'>
>>> m.hello()
Hello, world!
In truth, like any object, it has an ID, a type, and a value, encapsulated in a __dict__
(which is effectively its global scope):
>>> id(m)
4455012048
>>> m.__class__
<class 'module'>
>>> m.__dict__
{..., 'hello': <function hello at 0x...>}
Where ...
stands for housekeeping entries like __builtins__
, through which the familiar global-scope names like len
and ValueError
are resolved. It even has a few interesting attributes:
>>> m.__name__
'm'
>>> m.__file__
'/home/user/m.py'
The name is the module’s file name (without the .py
), and the file is its full path. Do note that the name stays the same, no matter which variable you bind it to:
>>> import m as x
>>> x.__name__
'm'
Because while x
is a mapping in your scope, 'm'
is the inherent name of the module; and if you rebind it to y
, it’s not going to change. Anyway, this presents an interesting side quest:
if __name__ == '__main__':
First, some background: in Python, any module doubles as a library to import, and as a script to execute. Scripts often print stuff, or have other side effects—and that’s rather inconvenient when importing a library. After all, an import is pretty equivalent to execution: the module’s code is evaluated line by line into a namespace; and either that’s that, or that namespace is wrapped in a module object and made available to some other code. Of course, when importing stuff, we wouldn’t want them to run immediately—the point is to integrate them into our code and invoke them on demand; so, it’d be useful to have a way to tell the two cases apart.
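To see this equivalence for yourself, here's a small sketch that writes a side-effectful module (the name noisy is invented) to a temporary directory and imports it—note that the print fires at import time, and only once:

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, 'noisy.py'), 'w') as f:
    f.write("print('side effect!')\nvalue = 42\n")

sys.path.insert(0, tmp)
import noisy          # prints 'side effect!' right here, at import time
print(noisy.value)    # 42
import noisy          # prints nothing: the module is cached in sys.modules
```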
The way to do it is if __name__ == '__main__'
. It seems kinda hacky, and it’s unfortunate that this is the right (and only) way to go. The idea is that, when a module is imported as a file, its global variable __name__
is assigned its file’s name (the same as the m.__name__
attribute we saw before—the module object simply reflects its global scope, which binds names to values, remember?). When a module is executed as a script, however, Python assigns it the special name '__main__'
, to indicate its eminence. Let’s take this m.py
:
print(__name__)
Then import it:
>>> import m
m
Then execute it as a script:
$ python m.py
__main__
So all we need to do is check this __name__
, and isolate a block of code, so that it only executes if the module is invoked as a script. For example, if m.py
would look like this:
def hello():
    print('Hello, world!')

if __name__ == '__main__':
    hello()
We could import it, and nothing would happen until we want it to:
>>> import m
>>> m.hello()
Hello, world!
And if we were to execute it, we’d get the desired effect:
$ python m.py
Hello, world!
It’s not recommended to make the if __name__ == '__main__'
code block too big, so usually you’d see this at the bottom of the file:
if __name__ == '__main__':
    main()
With the main
function encapsulating the script’s logic. If we’d have command-line arguments, we could even do this:
if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))
So that we don’t hardcode the access to sys.argv
in our main
function, and in the unlikely (but still possible) event that some other module decides to invoke us “as a script”, but programmatically—it can do so with:
>>> import m
>>> m.main([...])
Big Package
If all your code fits in one file, that’s great; if said file starts spanning several thousand lines, it’s less so. Instead, you could split your code into multiple files, and arrange them in such a way that everything would be exposed through a module-like object called a package.
Essentially, where a module is a file with code—a package is a directory with code; and since we can’t put code directly in a directory, we put it in a file with the special name __init__.py
. So this:
# m.py
def hello():
    print('Hello, world!')

# Later that day...
>>> import m
>>> m.hello()
Hello, world!
Is equivalent to this:
# p/__init__.py
def hello():
    print('Hello, world!')

# Later that day...
>>> import p
>>> p.hello()
Hello, world!
Of course, if you’re just going to put all your code in p/__init__.py
, you might as well put it in p.py
and stop being weird. But if you actually break that code across multiple files, e.g.:
# p/__init__.py
# Empty file, necessary to mark the directory as a package.

# p/foo.py
def foo():
    return 'foo'

# p/bar.py
def bar():
    return 'bar'
You end up with:
>>> import p
>>> p.foo()
Traceback (most recent call last):
...
AttributeError: module 'p' has no attribute 'foo'
>>> p.bar()
Traceback (most recent call last):
...
AttributeError: module 'p' has no attribute 'bar'
Well, that’s anti-climactic. Sorry. It turns out you have to import the package’s so-called submodules explicitly:
>>> import p.foo
>>> p.foo.foo()
'foo'
>>> import p.bar
>>> p.bar.bar()
'bar'
And while it does let you organize your code better, it’s a bit tedious. In fact, if that’s all you want to do, you can even omit the __init__.py
file, and if that directory is on your import path, Python will figure out what you mean, and treat import p.foo
as “import foo
from the p
directory”.
These __init__.py
-less packages are called namespace packages, because they provide nothing more than “namespacing”—organizing your code over different scopes, if you will. But then, what are regular packages for?
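A minimal sketch of a namespace package, built on the fly in a temporary directory (the name nspkg and its contents are invented for the demo):

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, 'nspkg'))
with open(os.path.join(tmp, 'nspkg', 'foo.py'), 'w') as f:
    f.write("def foo():\n    return 'foo'\n")
# Deliberately, there is no nspkg/__init__.py anywhere.

sys.path.insert(0, tmp)
import nspkg.foo

print(nspkg.foo.foo())  # foo
print(nspkg)            # <module 'nspkg' (namespace)...>
```

Python notices the directory on the import path, sees it has no __init__.py, and synthesizes a namespace package for it—nothing more than a namespace, as promised.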
Relativity Theory
Submodules grouped under the same package can import each other relatively, pretty much like the “current working directory” lets you refer to other files (usually in its vicinity) using succinct, relative paths. Our syntax is a little different:
# foobar.py
from .foo import foo
from .bar import bar

def foobar():
    return foo() + bar()
The .
indicates it’s a relative import; a single dot means the current package, so .foo
and .bar
are sibling submodules, located right next to this one. If we’d have a more complicated hierarchy, like so:
p/
    __init__.py     # defines x = 1
    a.py            # defines class A
    b.py            # defines class B
    sp/
        __init__.py # defines y = 2
        c.py        # defines class C
(As an aside, sp
is called a subpackage, because while it’s nested in p
, it’s a proper package with an __init__.py
file in and of itself). Anyway:
- From a, we’d be able to do from . import b and use it as b.B—or from .b import B and use it as B.
- Also from a, we’d be able to do from .sp import c and use c.C, or from .sp.c import C and use C.
- From c, we’d be able to do from .. import a and use it as a.A, or from ..a import A and use it as A.
- Anything defined in __init__.py files is also accessible to the rest of the submodules and subpackages. From a, we could do from . import x and from .sp import y, while from c we could do from .. import x and from . import y. This looks a little weird at first, but makes sense once you remember that . represents the directory—i.e. the package—and since it can’t contain actual code, it uses the __init__.py file as a surrogate.
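To convince ourselves all of the above actually works, here's a sketch that materializes this exact hierarchy in a temporary directory and exercises a few of the listed imports:

```python
import os
import sys
import tempfile
import textwrap

tmp = tempfile.mkdtemp()
layout = {
    'p/__init__.py': 'x = 1\n',
    'p/a.py': textwrap.dedent('''\
        from . import x          # from p/__init__.py
        from .b import B         # sibling submodule
        from .sp.c import C      # subpackage's submodule

        class A:
            pass
    '''),
    'p/b.py': 'class B:\n    pass\n',
    'p/sp/__init__.py': 'y = 2\n',
    'p/sp/c.py': textwrap.dedent('''\
        from .. import x         # parent package's __init__.py
        from . import y          # own package's __init__.py

        class C:
            pass
    '''),
}
for relpath, source in layout.items():
    path = os.path.join(tmp, relpath)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, 'w') as f:
        f.write(source)

sys.path.insert(0, tmp)
import p.a

print(p.a.x)             # 1, from p/__init__.py
print(p.a.B.__name__)    # B
print(p.a.C.__name__)    # C
```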
Having said that, I personally don’t like putting too much code in __init__.py
files, because I find it counterintuitive to look there. Some people use it for “common utilities”, which could simply be put in utils.py
and imported with from .utils import _
; and some people put a whole bunch of logic there, in which case I really don’t understand why not use a module instead. So what should you put in __init__.py
files, then?
Beautiful Wrapping Paper
One answer is “nothing”: and it’s a good one. __init__.py
files are there to mark the directory as a package and make relative imports work, so just drop it there, forget about it, and go about your day.
Another answer is “the public API”. Arranging code in multiple files has the undesirable side-effect that clients have to be familiar with its structure in order to import the components they need; what if we’d abstract it away by “hoisting” any “public” components into the package’s __init__.py
, thus exposing everything from its “root”? Going back to our first example…
# p/__init__.py
from .foo import foo
from .bar import bar

# p/foo.py
def foo():
    return 'foo'

# p/bar.py
def bar():
    return 'bar'

# Later that day...
>>> import p
>>> p.foo()
'foo'
>>> p.bar()
'bar'
Nice, no? This can even be applied recursively to subpackages, so that sp/__init__.py
has something along the lines of from .c import C
, and whenever a.py
or b.py
need this class they can from .sp import C
without caring about the internal structure of this subpackage.
Some people even go as far as reiterating said public interface in the __all__
list of strings, like so:
# p/__init__.py
from .foo import foo
from .bar import bar

__all__ = ['foo', 'bar']
This __all__
variable is used by from p import *
and by some documentation auto-generation tools, so you can take it or leave it.
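Here's a quick sketch of what __all__ actually controls, using a throwaway module (demo_all is an invented name) written to a temporary directory:

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, 'demo_all.py'), 'w') as f:
    f.write(
        "__all__ = ['foo']\n"
        "def foo():\n    return 'foo'\n"
        "def bar():\n    return 'bar'\n"
    )

sys.path.insert(0, tmp)
ns = {}
exec('from demo_all import *', ns)  # star-import into a scratch namespace
print('foo' in ns)   # True: listed in __all__
print('bar' in ns)   # False: defined, but deliberately left out
```

Without __all__, a star-import would take every name that doesn't start with an underscore—so bar would have slipped through.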
-m works in mysterious ways
When people just start working with packages, one of their biggest frustrations is that any module with a relative import seems impossible to run. Take this, for example:
# p/__init__.py
from .foo import foo
from .foobar import foobar

# p/foo.py
def foo():
    return 'foo'

if __name__ == '__main__':
    print(foo())

# p/foobar.py
from .foo import foo

def foobar():
    return foo() + 'bar'

if __name__ == '__main__':
    print(foobar())
When you try to run foo.py
, it works:
$ python p/foo.py
foo
But not so for foobar.py
:
$ python p/foobar.py
Traceback (most recent call last):
...
ImportError: attempted relative import with no known parent package
It’s a pretty cryptic message, which is a shame, because executing packages and submodules is actually not that hard. The thing you need to understand is that Python, as usual, is immensely dynamic—so when it attempts a relative import, the first thing it does is figure out “what package am I in”, similarly to how a relative path is resolved based on “what is the current working directory”. Python does so based on the __package__
variable, which is defined automatically when something is imported or executed as a package. To wit:
# p/__init__.py
# Empty file

# p/a.py
print(__package__)
If I were to import a
through its package, Python would wire everything properly, and we’d get:
>>> import p.a
p
But if I’d execute p/a.py
as a script, Python wouldn’t differentiate it from a non-package situation, in which a.py
just so happens to reside in p/
; it’d simply go into that directory and run a.py
outside of any package context:
$ python p/a.py
None
To communicate this context to Python, we’d have to invoke the module by its name, using the -m
option. You can actually do it with regular modules:
$ python hello.py
Hello, world!
$ python -m hello
Hello, world!
But it makes no difference. With submodules, i.e. modules that reside inside packages, the difference is exactly that package context, which Python infers as long as you specify the module’s fully-qualified name.
$ python -m p.a
p
Similarly, if we’d like to execute p/foobar.py
, and would also like its relative imports to work, the right way to do it would be:
$ python -m p.foobar
foobar
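Incidentally, you don't have to go through the shell: the standard library's runpy module implements the machinery behind -m, and calling it directly sets up the same package context. A sketch, mirroring the foo/foobar example (the package is renamed demo_p here to avoid clashing with anything already imported):

```python
import os
import runpy
import sys
import tempfile

tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, 'demo_p'))
files = {
    'demo_p/__init__.py': '',
    'demo_p/foo.py': "def foo():\n    return 'foo'\n",
    'demo_p/foobar.py': (
        "from .foo import foo\n"
        "def foobar():\n"
        "    return foo() + 'bar'\n"
        "if __name__ == '__main__':\n"
        "    print(foobar())\n"
    ),
}
for relpath, source in files.items():
    with open(os.path.join(tmp, relpath), 'w') as f:
        f.write(source)

sys.path.insert(0, tmp)
# Equivalent to `python -m demo_p.foobar`: the package context is set up,
# so the relative import works, and the __main__ guard fires.
globs = runpy.run_module('demo_p.foobar', run_name='__main__')
print(globs['foobar']())   # foobar
```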
Executing Packages
What happens if we run python -m p
? If you think about it, this question is similar to “what happens if we import p
”: it boils down to the question “where is that directory’s code stored”. And just like its “initialization” code, intended to collect and expose its public API, lives in __init__.py
; its “main” logic, intended to expose its functionality via a command-line interface, lives in a file with the special name __main__.py
. For example:
# p/__main__.py
from . import foo, bar

def main(argv):
    if len(argv) != 2:
        print(f'USAGE: python -m {__package__} <foo|bar>')
        return 1
    command = argv[1]
    if command == 'foo':
        print(foo())
    elif command == 'bar':
        print(bar())
    else:
        print(f'ERROR: invalid command: {command}')
        return 1

if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))
And now we can use it from our terminal, like so:
$ python -m p
USAGE: python -m p <foo|bar>
$ python -m p foo
foo
$ python -m p bar
bar
$ python -m p hello
ERROR: invalid command: hello
Note that I did a from . import foo, bar
rather than from .foo import foo
and from .bar import bar
; the reason is that I like to keep my CLI separate from the rest of the package, and work with it only through its public API. This is not a strict requirement, but it helps with keeping the business logic in the package separate from “the scripting stuff” a CLI is for.
Only Siths Deal with Absolutes
If you’ve read Google’s Python style guide, you might be thinking relative imports are bad, and use absolute imports instead, even when writing a package of your own:
# p/__init__.py
# Nothing

# p/foo.py
def foo():
    return 'foo'

# p/bar.py
def bar():
    return 'bar'

# p/foobar.py
from p.foo import foo  # absolute import
from p.bar import bar  # absolute import

def foobar():
    return foo() + bar()
This dates back to Python 2, when the import mechanism was much messier. Whenever a module a
imported a module b
, Python would look for it in the current working directory, in the standard directories, but also in a
’s directory, in case it’s kind of a “relative” import.
This made everything really confusing, because things would behave differently depending on the working directory they were invoked from, and caused people to develop a strong distaste for this relativism. Instead, they said, it’s better to specify the fully-qualified name; so if p
is importable, as it should be if we’re importing it, then p.foo
and p.bar
are as well, and unequivocally point to the same thing.
However, a lot of water has passed under that bridge, and we wouldn’t have relative imports as a language feature if it weren’t useful to, well, import things relatively in some cases; like when they’re part of the same package, and shouldn’t depend on its name (e.g. changing p
to q
breaks all absolute imports, while relative imports like from .
work fine); or shouldn’t depend on its exact structure (e.g. from .sp import C
instead of from .sp.c import C
).
Conclusion
This time we covered modules and packages—and it turned out they’re just thin, dictionary-like wrappers around Python files and directories. The latter tends to confuse people with its relative imports and -m
invocations, but it’s really all about defining a context in which things are tied relative to each other, as all cohesive contexts should be. In the next chapter, we’ll take a deeper dive into the import machinery, and with that finish our thorough exploration of the language—at least as far as its syntax and semantics are concerned ;)
The Advanced Python Programming series includes the following articles:
- A Value by Any Other Name
- To Be, or Not to Be
- Loopin’ Around
- Functions at Last
- To Functions, and Beyond!
- Function Internals 1
- Function Internals 2
- Next Generation
- Objects — Objects Everywhere
- Objects Incarnate
- Meddling with Primal Forces
- Descriptors Aplenty
- Death and Taxes
- Metaphysics
- The Ones that Got Away
- International Trade