Trial and Error: Package discovery with setup.py
Preface
This article is an accompanying guide to https://setuptools.pypa.io/en/latest/userguide/package_discovery.html. While the documentation is by no means bad, there are some pitfalls. This article walks through them with examples.
Our Goal
We start with the bare minimum needed to build a package, that is, an almost empty setup(), and slowly add functionality until we understand why project generators like pyscaffold do things in a certain way.
Setup
For our initial folder structure, we will use the src layout.
├── src
│   ├── package1
│   │   └── main.py
│   └── package2
│       └── main.py
└── setup.py
Install script
# clean up install
pip uninstall python_example --yes
rm -rf build dist
# Build package and install, we use wheel builds to keep it clean
python -m build
pip install ./dist/*.whl
# Show built files
unzip -l ./dist/*.whl
Bare-bones setup.py
from setuptools import setup

setup(
    name="python_example",
)
Surprisingly, the nearly empty setup.py already builds the package properly. It is not obvious from the documentation what the default behavior is; in recent setuptools versions this is the automatic discovery, which recognizes the src layout. You will be able to import the packages with import package1.main. Our job would be done here, but you will often find Python packages in the wild with custom behavior, so we need to understand how it works under the hood.
We will use the files inside of the wheel to evaluate how things are imported, because the folder structure is representative of how things are imported on the user side.
Length Date Time Name
--------- ---------- ----- ----
0 2022-12-02 15:12 package1/main.py
0 2022-12-02 15:12 package2/main.py
2182 2022-12-02 15:13 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 15:13 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 15:13 python_example-0.0.0.dist-info/WHEEL
18 2022-12-02 15:13 python_example-0.0.0.dist-info/top_level.txt
559 2022-12-02 15:13 python_example-0.0.0.dist-info/RECORD
--------- -------
2932 7 files
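The same listing can be produced from Python, since a wheel is an ordinary zip archive. The sketch below is only an illustration: make_demo_wheel is a hypothetical helper that writes a stand-in archive so the example runs anywhere, it is not a real build.

```python
import tempfile
import zipfile
from pathlib import Path


def list_wheel_files(wheel_path):
    """Return the archive member names of a wheel, sorted alphabetically."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return sorted(wheel.namelist())


def make_demo_wheel(directory):
    """Write a tiny stand-in archive (hypothetical contents, not a real build)."""
    wheel_path = Path(directory) / "python_example-0.0.0-py3-none-any.whl"
    with zipfile.ZipFile(wheel_path, "w") as wheel:
        wheel.writestr("package1/main.py", "")
        wheel.writestr("package2/main.py", "")
        wheel.writestr(
            "python_example-0.0.0.dist-info/top_level.txt",
            "package1\npackage2\n",
        )
    return wheel_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in list_wheel_files(make_demo_wheel(tmp)):
            print(name)
```

For a real build, pass the path of the file in ./dist to list_wheel_files instead of the demo archive.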
Attention: Do not test the import in the project folder, otherwise you will import the local files and not the installed files.
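One way to check which copy you are importing is to ask Python where it would load a module from. This is a small sketch using only the standard library; module_locations is a name made up for this example. If the printed path points into your project folder instead of site-packages, you are testing the local files.

```python
import importlib.util


def module_locations(name):
    """Return the file or folders a top-level module would be loaded from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return []  # not importable from the current sys.path
    if spec.submodule_search_locations is not None:
        # Packages (including folders without __init__.py) report their folders.
        return list(spec.submodule_search_locations)
    return [spec.origin]


if __name__ == "__main__":
    # "package1" is the example package from this article; "json" is only
    # included so that the script prints something on any machine.
    for name in ("package1", "json"):
        print(name, "->", module_locations(name))
```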
Manually adding a single package
from setuptools import setup

setup(
    name="python_example",
    packages=["src.package1"],
)
We can manually steer the package discovery by passing a list of package names to the packages argument. This way you can hand-select what ends up being delivered to the user and choose to ignore packages. Since we use the src layout, we need to add src to the name.
Length Date Time Name
--------- ---------- ----- ----
0 2022-12-02 15:12 src/package1/main.py
2182 2022-12-02 15:14 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 15:14 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 15:14 python_example-0.0.0.dist-info/WHEEL
13 2022-12-02 15:14 python_example-0.0.0.dist-info/top_level.txt
493 2022-12-02 15:14 python_example-0.0.0.dist-info/RECORD
--------- -------
2861 6 files
The problem: the package can only be imported with import src.package1, because it is only available under the src namespace. What you actually want is import package1. Also, package2 is not included and therefore cannot be imported (remember to run Python outside the project folder, otherwise you can still import the local package2).
Edge case: Passing only src
from setuptools import setup

setup(
    name="python_example",
    packages=["src"],
)
We see that no src folder is created in the wheel. But you can still import src. Why is that? There are no files named src that can be imported.
Length Date Time Name
--------- ---------- ----- ----
2182 2022-12-02 15:17 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 15:17 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 15:17 python_example-0.0.0.dist-info/WHEEL
4 2022-12-02 15:17 python_example-0.0.0.dist-info/top_level.txt
418 2022-12-02 15:17 python_example-0.0.0.dist-info/RECORD
--------- -------
2777 5 files
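The only trace of src is in top_level.txt, shown below. Keep in mind that this file is only metadata that Python's import system does not read, so if import src succeeds, the more likely reason is a src folder somewhere on sys.path, for example in the current working directory.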
# In ./dist/python_example-0.0.0.dist-info/top_level.txt
src
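To inspect this file without unpacking the wheel by hand, it can be read straight from the archive. The following is a sketch under the assumption that the wheel keeps top_level.txt inside its .dist-info folder, as in the listings in this article; read_top_level and make_demo_wheel are hypothetical helpers, and the demo archive is a stand-in, not a real build.

```python
import tempfile
import zipfile
from pathlib import Path


def read_top_level(wheel_path):
    """Return the names listed in a wheel's top_level.txt (empty if absent)."""
    with zipfile.ZipFile(wheel_path) as wheel:
        for member in wheel.namelist():
            if member.endswith(".dist-info/top_level.txt"):
                return wheel.read(member).decode("utf-8").split()
    return []


def make_demo_wheel(directory):
    """Write a stand-in archive that only contains metadata (hypothetical)."""
    wheel_path = Path(directory) / "python_example-0.0.0-py3-none-any.whl"
    with zipfile.ZipFile(wheel_path, "w") as wheel:
        wheel.writestr("python_example-0.0.0.dist-info/top_level.txt", "src\n")
    return wheel_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_top_level(make_demo_wheel(tmp)))
```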
Two subpackages
from setuptools import setup

setup(
    name="python_example",
    packages=["src.package1", "src.package2"],
)
As expected, you will have both packages available. The problem with src.<package> remains.
Length Date Time Name
--------- ---------- ----- ----
25 2022-12-02 15:14 src/package1/main.py
25 2022-12-02 15:14 src/package2/main.py
2182 2022-12-02 15:25 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 15:25 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 15:25 python_example-0.0.0.dist-info/WHEEL
26 2022-12-02 15:25 python_example-0.0.0.dist-info/top_level.txt
569 2022-12-02 15:25 python_example-0.0.0.dist-info/RECORD
--------- -------
3000 7 files
package_dir — Unexpected behavior
The package_dir argument lets you rename your project packages, so they will appear under a new path/name on the user side. Following the examples, you might have tried this (at least I did):
# This is NOT how to do it.
from setuptools import setup

setup(
    name="python_example",
    packages=["src.package1", "src.package2"],
    package_dir={"src": "name"},
)
You will be surprised that name is nowhere to be found and your packages are still available under src. Also, import name actually works, but import name.package1 does not.
Length Date Time Name
--------- ---------- ----- ----
25 2022-12-02 15:14 src/package1/main.py
25 2022-12-02 15:14 src/package2/main.py
2182 2022-12-02 15:38 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 15:38 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 15:38 python_example-0.0.0.dist-info/WHEEL
26 2022-12-02 15:38 python_example-0.0.0.dist-info/top_level.txt
569 2022-12-02 15:38 python_example-0.0.0.dist-info/RECORD
--------- -------
3000 7 files
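The explanation is simple: changes in package_dir are applied first, so name.package1 is available during the execution of setup.py. But in the end, packages dictates what is distributed, and the old src.<package> entries are still shipped. Note also the direction of the mapping: the keys of package_dir are package names as they will be imported, and the values are folders on disk.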
Correct way of using package_dir
from setuptools import setup

setup(
    name="python_example",
    package_dir={
        "name": "src",
        "name/package1": "src/package1",
        "name/package2": "src/package2",
    },
    packages=["name.package1", "name.package2"],
)
Learning from our previous mistake, we have now successfully renamed our packages. So remember: whatever is in packages gets shipped.
Archive: ./dist/python_example-0.0.0-py3-none-any.whl
Length Date Time Name
--------- ---------- ----- ----
25 2022-12-02 15:14 name/package1/main.py
25 2022-12-02 15:14 name/package2/main.py
2182 2022-12-02 16:38 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 16:38 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 16:38 python_example-0.0.0.dist-info/WHEEL
5 2022-12-02 16:38 python_example-0.0.0.dist-info/top_level.txt
570 2022-12-02 16:38 python_example-0.0.0.dist-info/RECORD
--------- -------
2980 7 files
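To build some intuition for how a name in packages is turned into a folder on disk, here is a simplified sketch of the lookup. It is my own approximation and not the code setuptools runs: resolve_package_dir is a hypothetical function, and it assumes that the keys of package_dir are dotted package names (with "" standing for the root) and that the longest matching prefix wins.

```python
import posixpath


def resolve_package_dir(package, package_dir):
    """Map a dotted package name to a source folder (simplified sketch)."""
    parts = package.split(".")
    tail = []
    while parts:
        key = ".".join(parts)
        if key in package_dir:
            # Longest matching prefix found: append the unmatched remainder.
            return posixpath.join(package_dir[key], *tail)
        tail.insert(0, parts.pop())
    # No prefix matched: fall back to the root entry "" if there is one.
    return posixpath.join(package_dir.get("", ""), *tail)


if __name__ == "__main__":
    print(resolve_package_dir("name.package1", {"name": "src"}))
    print(resolve_package_dir("package1", {"": "src"}))
    print(resolve_package_dir("src.package1", {}))
```

Under these assumptions, all three calls resolve to src/package1, which suggests that the single entry "name": "src" already covers both subpackages in the example above.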
Edge case: A convoluted example
from setuptools import setup

setup(
    name="python_example",
    package_dir={
        "name": "src",
        "name/package1": "src/package1",
        "other_name/package2": "src/package2",
    },
    packages=["src.package1", "name.package2"],
)
We can construct a convoluted example. Can you predict what is importable?
Archive: ./dist/python_example-0.0.0-py3-none-any.whl
Length Date Time Name
--------- ---------- ----- ----
25 2022-12-02 15:14 name/package2/main.py
25 2022-12-02 15:14 src/package1/main.py
2182 2022-12-02 16:32 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 16:32 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 16:32 python_example-0.0.0.dist-info/WHEEL
9 2022-12-02 16:32 python_example-0.0.0.dist-info/top_level.txt
569 2022-12-02 16:32 python_example-0.0.0.dist-info/RECORD
--------- -------
2983 7 files
Available imports:
import src, name, src.package1, name.package2
Not available:
import other_name, src.package2, name.package1
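The rule of thumb used throughout this article, that the folder structure of the wheel mirrors what can be imported, can be written down as a small function. This is a sketch, not an official tool: importable_names is a made-up helper, and it assumes that folders without __init__.py are importable as namespace packages.

```python
def importable_names(wheel_files):
    """Derive dotted import names from the file paths inside a wheel."""
    names = set()
    for path in wheel_files:
        parts = path.split("/")
        # Skip metadata and anything that is not a Python source file.
        if not path.endswith(".py") or parts[0].endswith(".dist-info"):
            continue
        parts[-1] = parts[-1][: -len(".py")]
        if parts[-1] == "__init__":
            parts.pop()  # package/__init__.py is imported as "package"
        # Every parent folder is importable as well.
        for depth in range(1, len(parts) + 1):
            names.add(".".join(parts[:depth]))
    return names


if __name__ == "__main__":
    files = [
        "name/package2/main.py",
        "src/package1/main.py",
        "python_example-0.0.0.dist-info/RECORD",
    ]
    print(sorted(importable_names(files)))
```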
package_dir — remove src from import
from setuptools import setup

setup(
    name="python_example",
    package_dir={
        "package1": "src/package1",
        "package2": "src/package2",
    },
    packages=["package1", "package2"],
)
Now that we understand the details of package_dir, we can finally use it in a practical manner. The examples with name.package were not practical. What we actually wanted to do was to remove the src from the import.
Length Date Time Name
--------- ---------- ----- ----
25 2022-12-02 15:14 package1/main.py
25 2022-12-02 15:14 package2/main.py
2182 2022-12-02 16:42 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 16:42 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 16:42 python_example-0.0.0.dist-info/WHEEL
18 2022-12-02 16:42 python_example-0.0.0.dist-info/top_level.txt
561 2022-12-02 16:42 python_example-0.0.0.dist-info/RECORD
--------- -------
2984 7 files
One remaining problem is that we need to do it manually. This is where the helper functions of setuptools come in.
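To see what "manually" means in practice, the two arguments from the example above could be generated with a few lines of standard-library code. This is only a sketch to illustrate the bookkeeping; manual_mapping is a hypothetical helper, and it naively treats every folder directly under src as a package.

```python
import tempfile
from pathlib import Path


def manual_mapping(src_dir):
    """Return (packages, package_dir) for every folder directly under src_dir."""
    src = Path(src_dir)
    packages = sorted(p.name for p in src.iterdir() if p.is_dir())
    package_dir = {name: f"{src.name}/{name}" for name in packages}
    return packages, package_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Recreate the example layout from this article in a scratch folder.
        for package in ("package1", "package2"):
            folder = Path(tmp) / "src" / package
            folder.mkdir(parents=True)
            (folder / "main.py").touch()
        print(manual_mapping(Path(tmp) / "src"))
```

A real project would also need to handle nested packages and skip folders such as __pycache__, which is exactly the kind of work the discovery helpers take over.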
Canonical way
from setuptools import find_packages, setup

setup(
    name="python_example",
    package_dir={"": "src"},
    packages=find_packages(
        where="src",
    ),
)
We can use the special syntax {"": "src"} to remove src from all the names. Additionally, we use find_packages() to automatically find packages.
But it fails …
Length Date Time Name
--------- ---------- ----- ----
2182 2022-12-02 17:00 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 17:00 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 17:00 python_example-0.0.0.dist-info/WHEEL
1 2022-12-02 17:00 python_example-0.0.0.dist-info/top_level.txt
418 2022-12-02 17:00 python_example-0.0.0.dist-info/RECORD
--------- -------
2774 5 files
One detail is that find_packages() only finds packages that contain an __init__.py. There are two options here. One is to add the files:
├── src
│   ├── package1
│   │   ├── __init__.py
│   │   └── main.py
│   └── package2
│       ├── __init__.py
│       └── main.py
Or we use find_namespace_packages(), which accepts packages without an __init__.py. This also explains the confusion about whether to add __init__.py to packages: depending on your package discovery settings, it might or might not matter. In my opinion, the second option is more elegant, and it is also the preferred setting of pyscaffold.
# __init__.py not required
from setuptools import find_namespace_packages, setup

setup(
    name="python_example",
    package_dir={"": "src"},
    packages=find_namespace_packages(where="src"),
)
The result is this wheel structure:
Length Date Time Name
--------- ---------- ----- ----
0 2022-12-02 17:02 package1/__init__.py
25 2022-12-02 15:14 package1/main.py
0 2022-12-02 17:02 package2/__init__.py
25 2022-12-02 15:14 package2/main.py
2182 2022-12-02 17:04 python_example-0.0.0.dist-info/LICENSE
81 2022-12-02 17:04 python_example-0.0.0.dist-info/METADATA
92 2022-12-02 17:04 python_example-0.0.0.dist-info/WHEEL
18 2022-12-02 17:04 python_example-0.0.0.dist-info/top_level.txt
709 2022-12-02 17:04 python_example-0.0.0.dist-info/RECORD
--------- -------
3132 9 files
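The difference between the two discovery helpers can be imitated with the standard library alone. The sketch below is a simplified stand-in, not the setuptools implementation: discover is a hypothetical function, and the require_init flag is my shorthand for "behave like find_packages()" (True) versus "behave like find_namespace_packages()" (False).

```python
import os
import tempfile
from pathlib import Path


def discover(where, require_init):
    """List dotted package names under `where` (simplified imitation)."""
    found = []
    for root, dirs, files in os.walk(where):
        dirs.sort()  # make the traversal order deterministic
        rel = os.path.relpath(root, where)
        if rel == ".":
            continue  # the search root itself is not a package
        if require_init and "__init__.py" not in files:
            dirs[:] = []  # do not look inside folders that are not packages
            continue
        found.append(rel.replace(os.sep, "."))
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        for package in ("package1", "package2"):
            (src / package).mkdir(parents=True)
            (src / package / "main.py").touch()
        print("with __init__.py required:", discover(src, require_init=True))
        print("namespace style:", discover(src, require_init=False))
```

With the layout from this article and no __init__.py files, the strict variant finds nothing, which matches the empty wheel we saw after switching to find_packages().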
setup.cfg from pyscaffold
The Python community is moving away from setup.py and toward setup.cfg and pyproject.toml (for more info, see PEP 517 and PEP 518). We can now check how pyscaffold does packaging, which can be considered an industry standard.
# setup.py
setup(
    name="python_example",
    ...
)
# setup.cfg
[options]
zip_safe = False
packages = find_namespace:
include_package_data = True
package_dir =
    =src
# Function arguments passed like this
[options.packages.find]
where = src
exclude =
    tests
For every option in setup.py there is an equivalent in setup.cfg. While plain arguments are mapped one to one, functions use a special syntax that might not be intuitive at first glance: find_packages() is mapped to find: and find_namespace_packages() is mapped to find_namespace:. Arguments for these functions are passed in a separate [options.packages.find] section.
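Since setup.cfg is an INI-style file, the mapping can be made visible by reading it with configparser from the standard library. This is only a sketch to show how the text breaks down into the function call and its arguments; read_discovery_options is a made-up helper, and setuptools does its own interpretation of these values.

```python
import configparser

SETUP_CFG = """\
[options]
zip_safe = False
packages = find_namespace:
include_package_data = True
package_dir =
    =src

[options.packages.find]
where = src
exclude =
    tests
"""


def read_discovery_options(text):
    """Return (packages directive, package_dir mapping, find arguments)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    options = parser["options"]

    # Each indented "key=value" line under package_dir is one mapping entry.
    package_dir = {}
    for line in options["package_dir"].splitlines():
        line = line.strip()
        if line:
            key, _, value = line.partition("=")
            package_dir[key.strip()] = value.strip()

    find_args = dict(parser["options.packages.find"])
    find_args["exclude"] = find_args["exclude"].split()
    return options["packages"], package_dir, find_args


if __name__ == "__main__":
    print(read_discovery_options(SETUP_CFG))
```

Read this way, the directive find_namespace: plays the role of the function name, the "=src" line corresponds to package_dir={"": "src"}, and the [options.packages.find] section holds the keyword arguments.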
Conclusion
We have examined various examples of package discovery. While it is good to know how to do it yourself, which helps you debug packages in the wild, I would not recommend creating these files by hand.
The best way to package a project is to use a project generator like Cookiecutter, poetry or pyscaffold and to follow a common project structure like the src layout.