Trial and Error: Package discovery with setup.py

Insight in Plain Sight
Dec 3, 2022

Preface

This article is an accompanying guide to https://setuptools.pypa.io/en/latest/userguide/package_discovery.html. While the documentation is by no means bad, package discovery has some pitfalls, and this article illustrates them with examples.

Our Goal

We start with the bare minimum needed to build a package, an empty setup(), and slowly add functionality until we understand why project generators like pyscaffold do things the way they do.

Setup

For our initial folder structure, we will use the src layout.

.
├── src
│   ├── package1
│   │   └── main.py
│   └── package2
│       └── main.py
└── setup.py

Install script

# clean up install
pip uninstall python_example --yes
rm -rf build dist

# Build the sdist and wheel (requires the build package), then install the wheel to keep it clean
python -m build
pip install ./dist/*.whl

# Show built files
unzip -l ./dist/*.whl
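If unzip is not available, the listing can also be produced from Python, because a wheel is an ordinary zip archive. The sketch below is only an illustration: wheel_contents and top_level_names are hypothetical helpers, not part of pip or build. top_level_names merely approximates the importable names by taking the first path component of every file outside the .dist-info and .data directories.

```python
import glob
import zipfile


def wheel_contents(wheel_path):
    """Return the sorted member names of a wheel (a wheel is a plain zip file)."""
    with zipfile.ZipFile(wheel_path) as whl:
        return sorted(whl.namelist())


def top_level_names(wheel_path):
    """Approximate the top-level import names shipped in a wheel.

    Takes the first path component of every member outside the
    *.dist-info and *.data directories and drops a trailing ".py".
    """
    names = set()
    for member in wheel_contents(wheel_path):
        top = member.split("/", 1)[0]
        if top.endswith((".dist-info", ".data")):
            continue
        names.add(top[:-3] if top.endswith(".py") else top)
    return sorted(names)


if __name__ == "__main__":
    # Same job as `unzip -l ./dist/*.whl` in the install script
    for wheel in sorted(glob.glob("dist/*.whl")):
        print(wheel)
        for member in wheel_contents(wheel):
            print("   ", member)
        print("top-level names:", top_level_names(wheel))
```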

Bare-bones setup.py

from setuptools import setup

setup(
    name="python_example",
)

Surprisingly, the empty setup.py already builds the package correctly. The documentation does not make this default behavior obvious: since version 61.0, setuptools performs automatic discovery and recognizes the src layout on its own. You can import the packages with import package1.main. Our job would be done here, but you will often find Python packages in the wild with custom configurations, so we need to understand how discovery works under the hood.

We will use the files inside the wheel to evaluate what can be imported, because the wheel's folder structure is what gets installed into site-packages on the user side.

  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2022-12-02 15:12   package1/main.py
        0  2022-12-02 15:12   package2/main.py
     2182  2022-12-02 15:13   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 15:13   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 15:13   python_example-0.0.0.dist-info/WHEEL
       18  2022-12-02 15:13   python_example-0.0.0.dist-info/top_level.txt
      559  2022-12-02 15:13   python_example-0.0.0.dist-info/RECORD
---------                     -------
     2932                     7 files

Attention: do not test the imports from inside the project folder. When you start python there, the current directory is on sys.path, so you would import the local files instead of the installed ones.
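One way to make sure you are testing the installed copy is to ask Python where a name would be imported from. The following sketch uses importlib.util.find_spec; the helper name import_location is made up for this article. A path inside site-packages means the installed package is used, while a path inside your project folder means local files are shadowing it.

```python
import importlib.util


def import_location(name):
    """Report where `name` would be imported from, without running its code.

    Returns the file path for a module or regular package, the list of
    directories for a namespace package (no __init__.py), or None if the
    name cannot be found. For dotted names, find_spec imports the parent
    package first.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:  # a parent package does not exist
        return None
    if spec is None:
        return None
    if spec.origin is not None and spec.origin != "namespace":
        return spec.origin
    # Namespace packages have no single origin file, only search locations
    return list(spec.submodule_search_locations or [])


if __name__ == "__main__":
    # Run this outside the project folder; a path inside site-packages
    # means the installed copy is used
    for name in ("package1", "package1.main"):
        print(name, "->", import_location(name))
```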

Manually adding a single package

from setuptools import setup

setup(
    name="python_example",
    packages=["src.package1"],
)

We can steer package discovery manually by passing package names to the packages argument. This way you can hand-pick what is delivered to the user and leave other packages out. Since we use the src layout, we need to prefix the name with src.

  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2022-12-02 15:12   src/package1/main.py
     2182  2022-12-02 15:14   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 15:14   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 15:14   python_example-0.0.0.dist-info/WHEEL
       13  2022-12-02 15:14   python_example-0.0.0.dist-info/top_level.txt
      493  2022-12-02 15:14   python_example-0.0.0.dist-info/RECORD
---------                     -------
     2861                     6 files

The problem: the package can only be imported with import src.package1, because it is only available under the src namespace, whereas you actually want to import it as import package1. In addition, package2 is not included and therefore cannot be imported (again, do not run python from the project folder, otherwise the local package2 is importable).

Edge case: Passing only src

from setuptools import setup

setup(
    name="python_example",
    packages=["src"],
)

No src folder is created in the wheel. Still, import src works. Why is that? The wheel contains no files named src that could be imported.

  Length      Date    Time    Name
---------  ---------- -----   ----
     2182  2022-12-02 15:17   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 15:17   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 15:17   python_example-0.0.0.dist-info/WHEEL
        4  2022-12-02 15:17   python_example-0.0.0.dist-info/top_level.txt
      418  2022-12-02 15:17   python_example-0.0.0.dist-info/RECORD
---------                     -------
     2777                     5 files

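It turns out that src is still listed in the wheel's metadata:

# python_example-0.0.0.dist-info/top_level.txt (inside the wheel)
src

Keep in mind, though, that top_level.txt is informational metadata and is not consulted by the import system. A directory named src anywhere on sys.path, for example the project folder itself when Python is started there, is enough for import src to succeed as a namespace package.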
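To inspect top_level.txt of an installed distribution without unpacking anything, the standard importlib.metadata module can read it. A minimal sketch, assuming the distribution was built with setuptools (other build backends may not write this file); read_top_level and top_level_entries are hypothetical helper names:

```python
from importlib.metadata import PackageNotFoundError, distribution


def read_top_level(dist):
    """Return the non-empty lines of a distribution's top_level.txt, or None.

    read_text() returns None when the file does not exist, e.g. when the
    distribution was built by a backend that does not write top_level.txt.
    """
    text = dist.read_text("top_level.txt")
    if text is None:
        return None
    return [line.strip() for line in text.splitlines() if line.strip()]


def top_level_entries(dist_name):
    """Look up an installed distribution by name and read its top_level.txt."""
    try:
        return read_top_level(distribution(dist_name))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(top_level_entries("python_example"))
```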

Two subpackages

from setuptools import setup

setup(
    name="python_example",
    packages=["src.package1", "src.package2"],
)

As expected, both packages are available. The problem with the src.<package> prefix remains.

  Length      Date    Time    Name
---------  ---------- -----   ----
       25  2022-12-02 15:14   src/package1/main.py
       25  2022-12-02 15:14   src/package2/main.py
     2182  2022-12-02 15:25   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 15:25   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 15:25   python_example-0.0.0.dist-info/WHEEL
       26  2022-12-02 15:25   python_example-0.0.0.dist-info/top_level.txt
      569  2022-12-02 15:25   python_example-0.0.0.dist-info/RECORD
---------                     -------
     3000                     7 files

package_dir — Unexpected behavior

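The package_dir argument maps package names (the keys, i.e. the names used in import statements) to the directories containing their sources (the values). This lets you rename your project packages so that they appear under a new name on the user side. Following the examples, you might have tried this (at least I did):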

# This is NOT how to do it.
from setuptools import setup

setup(
    name="python_example",
    packages=["src.package1", "src.package2"],
    package_dir={"src": "name"},
)

You will be surprised: name is nowhere to be found, and your packages are still available under src. Also, import name actually works, but import name.package1 does not.

  Length      Date    Time    Name
---------  ---------- -----   ----
       25  2022-12-02 15:14   src/package1/main.py
       25  2022-12-02 15:14   src/package2/main.py
     2182  2022-12-02 15:38   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 15:38   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 15:38   python_example-0.0.0.dist-info/WHEEL
       26  2022-12-02 15:38   python_example-0.0.0.dist-info/top_level.txt
      569  2022-12-02 15:38   python_example-0.0.0.dist-info/RECORD
---------                     -------
     3000                     7 files

The explanation: package_dir only tells setuptools where to look for the sources, but in the end packages dictates what is distributed, so src.package1 and src.package2 are still shipped under src. Note also that {"src": "name"} reads as "package src lives in directory name", which is the reverse of what we intended.

Correct way of using package_dir

from setuptools import setup

setup(
    name="python_example",
    package_dir={
        # keys are dotted package names, values are source directories
        "name": "src",
        "name.package1": "src/package1",
        "name.package2": "src/package2",
    },
    packages=["name.package1", "name.package2"],
)

Learning from our previous mistake, we have now successfully renamed our packages. So remember: whatever is listed in packages gets shipped.

Archive:  ./dist/python_example-0.0.0-py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
       25  2022-12-02 15:14   name/package1/main.py
       25  2022-12-02 15:14   name/package2/main.py
     2182  2022-12-02 16:38   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 16:38   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 16:38   python_example-0.0.0.dist-info/WHEEL
        5  2022-12-02 16:38   python_example-0.0.0.dist-info/top_level.txt
      570  2022-12-02 16:38   python_example-0.0.0.dist-info/RECORD
---------                     -------
     2980                     7 files
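To see why this mapping works, here is a simplified model of how a dotted package name is resolved to a source directory. It is a sketch written for this article (resolve_package_dir is not a setuptools function) and assumes the documented semantics: keys are dotted package names, values are directories, the longest matching prefix wins, and the empty key "" stands for the root. Whichever directory is found, the files land in the wheel under the package name itself, which is why packages decides the import paths.

```python
import posixpath


def resolve_package_dir(package, package_dir):
    """Return the source directory a dotted package name is read from.

    Simplified model (not setuptools' code): the longest dotted prefix of
    `package` found in `package_dir` wins, the remaining name parts are
    appended as sub-directories, and "" maps the root package.
    """
    parts = package.split(".")
    for i in range(len(parts), 0, -1):
        prefix = ".".join(parts[:i])
        if prefix in package_dir:
            return posixpath.join(package_dir[prefix], *parts[i:])
    # No explicit entry: fall back to the root mapping (default: project root)
    return posixpath.join(package_dir.get("", ""), *parts)


if __name__ == "__main__":
    mapping = {"name": "src", "name.package1": "src/package1"}
    for pkg in ["name.package1", "name.package2", "src.package1"]:
        print(pkg, "->", resolve_package_dir(pkg, mapping))
    print("package1 ->", resolve_package_dir("package1", {"": "src"}))
```

In this model, name.package2 resolves to src/package2 through the "name" entry alone, and a key written with slashes, such as "name/package1", never matches a dotted name.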

Edge case: A convoluted example

from setuptools import setup

setup(
    name="python_example",
    package_dir={
        "name": "src",
        "name.package1": "src/package1",
        "other_name.package2": "src/package2",
    },
    packages=["src.package1", "name.package2"],
)

We can construct a convoluted example. Can you predict what is importable?

Archive:  ./dist/python_example-0.0.0-py3-none-any.whl
  Length      Date    Time    Name
---------  ---------- -----   ----
       25  2022-12-02 15:14   name/package2/main.py
       25  2022-12-02 15:14   src/package1/main.py
     2182  2022-12-02 16:32   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 16:32   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 16:32   python_example-0.0.0.dist-info/WHEEL
        9  2022-12-02 16:32   python_example-0.0.0.dist-info/top_level.txt
      569  2022-12-02 16:32   python_example-0.0.0.dist-info/RECORD
---------                     -------
     2983                     7 files

Available imports:

import src, name, src.package1, name.package2

Not available:

import other_name, src.package2, name.package1
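Rather than trying each import by hand, the check can be scripted. The sketch below (check_imports is a hypothetical helper, not part of any tool) runs every import in a fresh interpreter started inside an empty temporary directory, so, following the earlier warning about the project folder, local files cannot make a name look importable. It assumes the package is installed into the interpreter running the script; a PYTHONPATH setting in the environment would still apply.

```python
import subprocess
import sys
import tempfile


def check_imports(names):
    """Return {name: True/False} depending on whether `import name` succeeds.

    Each import runs in a separate interpreter started inside an empty
    temporary directory, so local project files cannot shadow the
    installed packages.
    """
    results = {}
    with tempfile.TemporaryDirectory() as empty_dir:
        for name in names:
            # Only accept dotted identifiers, since the name is spliced into code
            if not all(part.isidentifier() for part in name.split(".")):
                raise ValueError(f"not a module name: {name!r}")
            proc = subprocess.run(
                [sys.executable, "-c", f"import {name}"],
                cwd=empty_dir,
                capture_output=True,
            )
            results[name] = proc.returncode == 0
    return results


if __name__ == "__main__":
    candidates = ["src", "name", "src.package1", "name.package2",
                  "other_name", "src.package2", "name.package1"]
    for name, ok in check_imports(candidates).items():
        print(f"{name:<15} {'available' if ok else 'not available'}")
```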

package_dir — remove src from import

from setuptools import setup

setup(
    name="python_example",
    package_dir={
        "package1": "src/package1",
        "package2": "src/package2",
    },
    packages=["package1", "package2"],
)

Now that we understand the details of package_dir, we can finally use it in a practical manner. The examples with name.package were not practical; what we actually wanted was to remove src from the import.

  Length      Date    Time    Name
---------  ---------- -----   ----
       25  2022-12-02 15:14   package1/main.py
       25  2022-12-02 15:14   package2/main.py
     2182  2022-12-02 16:42   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 16:42   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 16:42   python_example-0.0.0.dist-info/WHEEL
       18  2022-12-02 16:42   python_example-0.0.0.dist-info/top_level.txt
      561  2022-12-02 16:42   python_example-0.0.0.dist-info/RECORD
---------                     -------
     2984                     7 files

The remaining problem is that every package has to be listed manually. This is where the helper functions of setuptools come in.

Canonical way

from setuptools import find_packages, setup

setup(
    name="python_example",
    package_dir={"": "src"},
    packages=find_packages(where="src"),
)

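We can use the special syntax {"": "src"}: the empty key stands for the root package, so all top-level packages are looked up in src, which removes src from all the names. Additionally, we use find_packages() to find the packages automatically.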

But it fails: the wheel contains no packages at all.

  Length      Date    Time    Name
---------  ---------- -----   ----
     2182  2022-12-02 17:00   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 17:00   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 17:00   python_example-0.0.0.dist-info/WHEEL
        1  2022-12-02 17:00   python_example-0.0.0.dist-info/top_level.txt
      418  2022-12-02 17:00   python_example-0.0.0.dist-info/RECORD
---------                     -------
     2774                     5 files

The reason is that find_packages() only finds packages that contain an __init__.py. There are two options here. One is to add the files:

.
├── src
│   ├── package1
│   │   ├── __init__.py
│   │   └── main.py
│   └── package2
│       ├── __init__.py
│       └── main.py
└── setup.py

The other is to use find_namespace_packages(), which also accepts packages without an __init__.py. This also explains the confusion about whether packages need an __init__.py: depending on your package discovery settings, it may or may not matter. In my opinion, the second option is more elegant, and it is also the setting pyscaffold prefers.

# __init__.py not required
from setuptools import find_namespace_packages, setup

setup(
    name="python_example",
    package_dir={"": "src"},
    packages=find_namespace_packages(where="src"),
)

The result is this wheel structure:

  Length      Date    Time    Name
---------  ---------- -----   ----
        0  2022-12-02 17:02   package1/__init__.py
       25  2022-12-02 15:14   package1/main.py
        0  2022-12-02 17:02   package2/__init__.py
       25  2022-12-02 15:14   package2/main.py
     2182  2022-12-02 17:04   python_example-0.0.0.dist-info/LICENSE
       81  2022-12-02 17:04   python_example-0.0.0.dist-info/METADATA
       92  2022-12-02 17:04   python_example-0.0.0.dist-info/WHEEL
       18  2022-12-02 17:04   python_example-0.0.0.dist-info/top_level.txt
      709  2022-12-02 17:04   python_example-0.0.0.dist-info/RECORD
---------                     -------
     3132                     9 files
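The effect of __init__.py on the two finders can also be checked without building a wheel. The following sketch (discover is a made-up helper, and it assumes setuptools is installed in the current environment) recreates the example tree in a temporary directory, once with and once without __init__.py files, and calls find_packages() and find_namespace_packages() on it.

```python
import tempfile
from pathlib import Path

from setuptools import find_namespace_packages, find_packages


def discover(with_init_files):
    """Recreate the example src tree in a temp dir and run both finders on it.

    Returns (find_packages result, find_namespace_packages result), sorted.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        for pkg in ("package1", "package2"):
            pkg_dir = src / pkg
            pkg_dir.mkdir(parents=True)
            (pkg_dir / "main.py").write_text("")
            if with_init_files:
                (pkg_dir / "__init__.py").write_text("")
        return (
            sorted(find_packages(where=str(src))),
            sorted(find_namespace_packages(where=str(src))),
        )


if __name__ == "__main__":
    print(discover(with_init_files=False))
    print(discover(with_init_files=True))
```

Without __init__.py files only find_namespace_packages() reports package1 and package2; with them, both functions do.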

setup.cfg from pyscaffold

The Python community is moving away from setup.py towards setup.cfg and pyproject.toml (for more information, see PEP 517 and PEP 518). Let us now check how pyscaffold handles packaging, which can be considered an industry standard.

# setup.py
from setuptools import setup

setup(
    name="python_example",
    ...
)

# setup.cfg
[options]
zip_safe = False
packages = find_namespace:
include_package_data = True
package_dir =
    =src

# Function arguments passed like this
[options.packages.find]
where = src
exclude =
    tests

For every option in setup.py there is an equivalent key in setup.cfg. While plain arguments map one to one, function calls use a special syntax that might not be intuitive at first glance.

find_packages() maps to find: and find_namespace_packages() maps to find_namespace:. Arguments for these functions are passed in a separate [options.packages.find] section.
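To make this mapping concrete, the sketch below reads the setup.cfg above with Python's standard configparser module and translates the package-discovery keys back into setup.py terms. It is an illustration written for this article (read_package_options is a hypothetical helper) that only understands the keys used here; setuptools performs this translation itself when it reads setup.cfg.

```python
import configparser

# Which setuptools helper each special value of `packages` stands for
FINDERS = {
    "find:": "find_packages",
    "find_namespace:": "find_namespace_packages",
}

EXAMPLE_CFG = """
[options]
zip_safe = False
packages = find_namespace:
include_package_data = True
package_dir =
    =src

[options.packages.find]
where = src
exclude =
    tests
"""


def read_package_options(cfg_text):
    """Return (finder name, package_dir dict, finder kwargs) from setup.cfg text."""
    cfg = configparser.ConfigParser()
    cfg.read_string(cfg_text)
    options = cfg["options"]
    finder = FINDERS[options["packages"].strip()]

    # Dict-valued option: one "key = value" pair per indented line; an empty
    # key ("=src") is the root package, i.e. package_dir={"": "src"}
    package_dir = {}
    for line in options.get("package_dir", "").strip().splitlines():
        key, _, value = line.partition("=")
        package_dir[key.strip()] = value.strip()

    # The finder's keyword arguments live in their own section
    find_kwargs = {}
    if cfg.has_section("options.packages.find"):
        find_section = cfg["options.packages.find"]
        if "where" in find_section:
            find_kwargs["where"] = find_section["where"].strip()
        if "exclude" in find_section:
            find_kwargs["exclude"] = find_section["exclude"].split()
    return finder, package_dir, find_kwargs


if __name__ == "__main__":
    print(read_package_options(EXAMPLE_CFG))
```

Note that the continuation lines (=src and tests) must be indented; otherwise they are not read as part of the preceding key.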

Conclusion

We have examined various examples of package discovery. Knowing how to do it yourself helps you debug packages in the wild, but I would not recommend writing these files by hand.

The best way to set up a package is to use a project generator such as Cookiecutter, poetry or pyscaffold and to follow a common project structure such as the src layout.
