PyInstaller with Pandas — Problems, solutions, and workflow with code examples

Liron Soffer
Aug 7, 2019

A few days ago, I had to turn a small Python script I wrote into a standalone application. This seemed like a simple task but turned out to be trickier than I thought. The catch: it needed to run without an internet connection. This introduced me to “dependency hell”: the annoying problem where you send someone your code and, for some reason, it doesn’t work on their machine, and you have no idea why, since they installed everything, so you start asking for God’s advice.

Why not use Docker?

Searching the web for a solution, I ran into a few ways to do this. One of them is Docker. I weighed PyInstaller against Docker and decided against Docker for the following reasons:

  • It has to be installed on the recipient’s machine, which I prefer to avoid.
  • It appears harder to learn, and I wanted to get the task done as fast as possible.
  • It’s overkill. Although setting up infrastructure for future needs is good practice, I couldn’t see the benefits of using it for my small project.

Now let’s get to business…

Important commands

Once you understand the whole process, using PyInstaller is as simple as running these two commands. I’ll explain them later; I place them here so you can scroll back to find them.

pyi-makespec --onefile my_script.py

pyinstaller my_script.spec

Note that I wanted to create a single .exe file that I could double-click and run as is. Since I could not rely on an internet connection, I used the “onefile” option, which packs everything into that one file. The option is recorded in the generated .spec file. Although the package offers other options, I haven’t investigated them.

Preparations

  1. Create a virtual environment
    Don’t skip this step! I can’t emphasize this enough: it creates a clean environment and helps deal with the “dependency hell”.
    I used the ‘venv’ module with Python 3; the official guide is in the Python documentation.
    Install all the packages your project needs in the virtual environment. Since version 1.17.0 of numpy caused me dependency issues, I had to roll back to version 1.16.4, and I recommend you do the same. If you encounter other problems with numpy, you might want to try an even older version.
  2. Change the pandas hook file
    Navigate to “<Project Path>\venv\Lib\site-packages\PyInstaller\hooks” and find the ‘hook-pandas.py’ file (PyInstaller hook files are named hook-<module>.py). If it does not exist, create one. You want the file to contain the following text:
from PyInstaller.utils.hooks import collect_submodules

# Pandas keeps Python extensions loaded with dynamic imports here.
hiddenimports = collect_submodules('pandas._libs')
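
If you would rather not create the hook by hand, the step above can be scripted. This is only a minimal sketch, not something PyInstaller provides: the function name ensure_pandas_hook is an assumption for illustration, and the file name follows PyInstaller's hook-<module>.py naming. It writes the file only when it is missing, so an existing hook is never overwritten.

```python
from pathlib import Path

# Contents of the hook file as described in the step above.
HOOK_SOURCE = (
    "from PyInstaller.utils.hooks import collect_submodules\n"
    "\n"
    "# Pandas keeps Python extensions loaded with dynamic imports here.\n"
    "hiddenimports = collect_submodules('pandas._libs')\n"
)


def ensure_pandas_hook(hooks_dir):
    """Create hook-pandas.py in hooks_dir if it does not exist yet.

    Hypothetical helper. Returns True if the file was created,
    False if a hook was already there (it is left untouched).
    """
    hook_file = Path(hooks_dir) / "hook-pandas.py"
    if hook_file.exists():
        return False
    hook_file.write_text(HOOK_SOURCE, encoding="utf-8")
    return True
```

You would call it with the hooks folder of your own virtual environment, e.g. ensure_pandas_hook(r"venv\Lib\site-packages\PyInstaller\hooks").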

Creating the .spec file

This is one of the most important steps to understand. Run this command only once: it creates the .spec file, which you will then edit. Every time you run it, you get a new .spec file that overwrites the old one, so remember to back up your changes.

pyi-makespec --onefile my_script.py
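
Since the command overwrites the .spec file, it can help to copy the file aside before regenerating it. Here is one possible sketch; the helper name backup_spec and the backup naming scheme are assumptions for illustration, not anything PyInstaller does for you.

```python
import shutil
import time
from pathlib import Path


def backup_spec(spec_path):
    """Copy e.g. my_script.spec to my_script.spec.<timestamp>.bak.

    Hypothetical helper. Returns the path of the backup copy.
    """
    spec = Path(spec_path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = spec.with_name(f"{spec.name}.{stamp}.bak")
    shutil.copy2(spec, backup)  # copy2 also keeps the file's timestamps
    return backup
```

Calling backup_spec("my_script.spec") right before pyi-makespec leaves your edited version next to the freshly generated one.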

Edit the spec file

Open the .spec file and notice this part of it:

a = Analysis(['<my_script.py>'],
             pathex=['<my_script_path>'],
             binaries=[],
             datas=[],
             hiddenimports=[],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)

Notice the variable a, which is an object of type ‘Analysis’. We only care about two of the arguments it receives: hiddenimports and datas.

Adding implicit imports to ‘hiddenimports’

PyInstaller finds imports by analyzing your code, and it can miss some of them. In my case it missed implicit imports, i.e. imports used within other .py files in my project. If that happens, list the missing modules in ‘hiddenimports’, e.g.

hiddenimports=['pandas','matplotlib.pyplot','os','sys']
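
A typo in this list, or a package that is not installed in the virtual environment, is easy to overlook. As a rough sanity check before building, you could test whether each name can be found by Python. The sketch below uses the standard importlib.util.find_spec; the function name missing_modules is an assumption for illustration.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be found in the current environment.

    Hypothetical helper for checking a 'hiddenimports' list.
    """
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when the parent package of a dotted name is missing,
            # or when a loaded module has no usable spec.
            found = False
        if not found:
            missing.append(name)
    return missing
```

Run it inside the same virtual environment you build from, for example missing_modules(['pandas', 'matplotlib.pyplot', 'os', 'sys']); an empty list means every name was found.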

Adding your files to ‘datas’

To include files within the .exe one-click-file, you have to specify them as a list in the .spec file. The way to do so is by creating a list of tuples, each tuple corresponding to a different file.

First, define a list named ‘added_files’ above the line that creates a, and pass it as ‘datas’, i.e.

datas=added_files,

Now fill ‘added_files’ list with tuples of the form:

( '<your_file_path>', '<your_desired_file_path>' ),

The original <your_file_path> can be either a path relative to your script’s location or the file’s absolute path.
The path inside the executable, a.k.a. <your_desired_file_path>, is the folder the file is placed in when the program runs; it must be a relative path, interpreted relative to the top-level folder of the bundled application.
If you want your files at the same level as your script, write:

( '<your_file_path>', '.' )
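
At run time, a one-file executable unpacks its bundled files into a temporary folder, which PyInstaller exposes as sys._MEIPASS. A small helper can resolve a bundled file both when running the plain script and when running the .exe. This is a sketch under assumptions: the name resource_path is made up for illustration, and the fallback assumes your data files sit next to the script you launch.

```python
import os
import sys


def resource_path(relative_path):
    """Return an absolute path to a file that was added through 'datas'.

    Inside a PyInstaller bundle the files live under sys._MEIPASS; when
    running the plain script, fall back to the launched script's folder.
    """
    script_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    base_dir = getattr(sys, "_MEIPASS", script_dir)
    return os.path.join(base_dir, relative_path)
```

With a file added as ( '<your_file_path>', '.' ), you would open it through resource_path and the file's name instead of a hard-coded path.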

Finally, you should have a list of all your files:

added_files = [
    ('<your_file_path_1>', '<your_desired_file_path_1>'),
    ('<your_file_path_2>', '<your_desired_file_path_2>'),
    ('<your_file_path_3>', '<your_desired_file_path_3>'),
]
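
Because the .spec file is ordinary Python, the list can also be built and checked in code. The following sketch fails early when a source file is missing, rather than leaving you to discover it when the .exe runs. The function name build_added_files and its dest parameter are assumptions for illustration.

```python
import os


def build_added_files(sources, dest="."):
    """Return a list of (source, destination_folder) tuples for 'datas'.

    Hypothetical helper. Raises FileNotFoundError if a source path does
    not exist, so the problem shows up before the build starts.
    """
    added_files = []
    for source in sources:
        if not os.path.exists(source):
            raise FileNotFoundError(f"data file not found: {source}")
        added_files.append((source, dest))
    return added_files
```

One way to use it is to paste the function at the top of the .spec file and write added_files = build_added_files([...]) with your own file paths.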

distutils patch

After going through the above steps, I ran into the following issue: ModuleNotFoundError: No module named 'distutils'

So frustrating!
Luckily, someone wrote a workaround for this issue.
Copy-paste the following lines of code to the beginning of your spec file:

# work-around for https://github.com/pyinstaller/pyinstaller/issues/4064
import distutils
import os

if distutils.distutils_path.endswith('__init__.py'):
    distutils.distutils_path = os.path.dirname(distutils.distutils_path)

Compile your program using the spec file

You are now finally ready to compile your script using the .spec file.
From the command line, run the following command (the one-file setting is already stored in the .spec file, so the option is not repeated here):

pyinstaller my_script.spec

Et voilà, you get new directories in your folder. You can now find your .exe file in the ‘dist’ folder and share it with others.

Last note: If you don’t see an .exe file in the dist folder, you may find solutions online that require editing the .spec file. Remember to delete the build folder before you re-run the pyinstaller command above (not pyi-makespec, which would overwrite your edited .spec file). Otherwise, the build might not include your changes.
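
The clean-and-rebuild cycle from the note above can be scripted so the build folder is never forgotten. This is a minimal sketch: the function names clean_build_dirs and rebuild are assumptions for illustration, and it assumes PyInstaller is installed in the interpreter that runs the script, so that it can be started with python -m PyInstaller.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def clean_build_dirs(project_dir, names=("build",)):
    """Delete the given folders under project_dir; return those removed."""
    removed = []
    for name in names:
        folder = Path(project_dir) / name
        if folder.is_dir():
            shutil.rmtree(folder)
            removed.append(folder)
    return removed


def rebuild(spec_file, project_dir="."):
    """Remove the old build folder, then run PyInstaller on the spec file."""
    clean_build_dirs(project_dir)
    # Uses the same interpreter (and virtual environment) as this script.
    subprocess.run(
        [sys.executable, "-m", "PyInstaller", str(spec_file)],
        cwd=project_dir,
        check=True,
    )
```

Calling rebuild("my_script.spec") from the project folder then replaces the manual delete-and-rerun steps.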

This is my first ever technical post and I’m super excited to hear what you have to say. Please leave a comment and share your thoughts with me!
