Python Development Setup for Data Scientists in 2022

Ryo Koyajima / 小矢島 諒
Published in CodeX
6 min read, Jun 18, 2022


Photo by ian dooley on Unsplash

Many useful tools and libraries have appeared in recent years. Some are widely used by software engineers but are not yet well known among data scientists. In this article, I introduce my favorite Python development tools for data science, with data scientists who are new to Python or to software development in mind.

This article is aimed at data scientists who want to:

  • use both Mac and Windows (WSL)
  • deploy code to cloud services like Google Cloud Run
  • handle several projects simultaneously
  • manage environment settings with Git

Table of Contents

  • Visual Studio Code (vscode); free and useful editor
  • Peacock; color scheme manager [Recommended]
  • Rainbow CSV; coloring CSV files
  • autoDocstring; document generator
  • pyenv; version manager
  • Poetry; powerful package manager [Recommended]
  • Black, Flake8, isort, and Mypy; formatter and linter

Visual Studio Code (vscode); free and useful editor

https://code.visualstudio.com/

Visual Studio Code (vscode) is one of the most popular editors.
It also suits data scientists, because you can work with both Jupyter Notebooks and Python files inside it. You no longer have to code in a browser.

Jupyter Notebook in vscode (Image by author)
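
As a small illustration of working from a plain Python file: as far as I know, vscode's Python and Jupyter extensions treat "# %%" comments as cell boundaries, so a .py file can be run cell by cell like a notebook. The sketch below assumes those extensions are installed. The data and names are made up for the example.

```python
# %% Load a tiny in-memory dataset (values are invented for illustration)
heights_cm = [160.0, 170.0, 180.0]


# %% Define a helper in its own cell
def mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# %% Compute and inspect the result
average_height = mean(heights_cm)
print(f"average height: {average_height} cm")
```

The same file still runs as an ordinary script, since the cell markers are only comments.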

Peacock; color scheme manager

Peacock is one of my favorite vscode extensions.
You can change the color scheme of a workspace with Peacock in the following steps.

  • press "Ctrl (Command on Mac) + Shift + P" in vscode
  • type "Peacock: Change to a Favorite Color"
  • select your favorite one

You can also set a custom color by typing "Peacock: Enter a Color" and entering its hex code.

Select your favorite color (Image by author)

Advantages for data scientists:
When you work on several projects simultaneously, Peacock is a dependable helper.
Because each project window looks different, you can tell projects apart at a glance and avoid mixing them up.
In addition, Peacock stores the color in the workspace settings, so you can manage it with Git and use the same color on different computers.

You can distinguish the project you want to work on (Image by author)
You can control the color by Git (Image by author)

Rainbow CSV; coloring CSV files

https://marketplace.visualstudio.com/items?itemName=mechatroner.rainbow-csv

As a data scientist, you often need to look at CSV files. Rainbow CSV highlights each column of a CSV file in a different color. Excel is a good tool for viewing CSVs, but it takes a long time to open files. Try this extension if you want to check a CSV at a glance.

Colorizing dataset (Image by author)

autoDocstring; document generator

https://marketplace.visualstudio.com/items?itemName=njpwerner.autodocstring

autoDocstring is a docstring generator that helps you write maintainable code. Once you have defined the arguments and return values of a function or method, this extension generates a docstring template for it.

Type three double quotes, and the docstring template is generated (Image by author)
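
To give an idea of the result, here is what a filled-in docstring might look like. This is a sketch that assumes the Google docstring format (the extension supports several formats, and the exact template depends on your settings). The function itself is a made-up example.

```python
def normalize(values: list[float], scale: float = 1.0) -> list[float]:
    """Rescale values so that the largest absolute value equals `scale`.

    Args:
        values (list[float]): Numbers to rescale.
        scale (float, optional): Target maximum magnitude. Defaults to 1.0.

    Returns:
        list[float]: The rescaled values, or a copy of the input if every
            value is zero.
    """
    peak = max((abs(v) for v in values), default=0.0)
    if peak == 0.0:
        # Nothing to scale by, so return the input unchanged (as a copy).
        return list(values)
    return [v / peak * scale for v in values]


if __name__ == "__main__":
    print(normalize([1.0, -2.0, 4.0]))
```

The extension only produces the section headings and argument names. The descriptions are still yours to write.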

pyenv; version manager

https://github.com/pyenv/pyenv

pyenv is a well-known version manager for Python. On Mac, you can install it with the brew install pyenv command. If you are a Windows (WSL) user, try the following commands.

git clone https://github.com/pyenv/pyenv.git ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc

Then install a specific version (e.g., 3.9.11) of Python.

pyenv install 3.9.11

I recommend pinning the version for the working directory with this command.

pyenv local 3.9.11

The command generates a version file (.python-version) in the directory, so you can manage the Python version with Git.

pyenv generates version file (Image by author)
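
If you want to confirm that the interpreter you are running matches the pinned one, a check like the following may help. It is only a sketch: it assumes the version file is named .python-version and holds the version as its first entry, and the function names are my own.

```python
import platform
from pathlib import Path
from typing import Optional


def parse_pinned_version(text: str) -> Optional[str]:
    """Return the first whitespace-separated entry of the file, if any."""
    entries = text.split()
    return entries[0] if entries else None


def interpreter_matches(pinned: Optional[str], running: str) -> bool:
    """True when nothing is pinned or the running version equals the pin."""
    return pinned is None or pinned == running


if __name__ == "__main__":
    version_file = Path(".python-version")
    pinned = None
    if version_file.is_file():
        pinned = parse_pinned_version(version_file.read_text())
    running = platform.python_version()
    status = "OK" if interpreter_matches(pinned, running) else "MISMATCH"
    print(f"pinned={pinned} running={running} -> {status}")
```

Running this from the project directory prints whether the active Python agrees with the version committed to Git.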

Poetry; a powerful package manager

Poetry is a Python package manager that resolves dependencies between libraries. Compared to pip, Poetry manages libraries more smartly. It separates them into two files: one (pyproject.toml) lists the libraries you chose to install, and the other (poetry.lock) lists every library that those depend on. (This is just like npm in JavaScript.)

For instance, if you install pandas with Poetry, pandas is declared in the former file, and the full set of packages is recorded in the latter.

Former defines only pandas and Python itself (Image by author)
Latter describes all the packages that are used by pandas (Image by author)

These files are updated automatically when you install new packages. You don't need to run the pip freeze command anymore.
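
To make the difference between the two files concrete, the sketch below lists the packages declared in pyproject.toml and those recorded in poetry.lock. It rests on a few assumptions: the sample texts are abbreviated, hand-written excerpts rather than real Poetry output, the function names are mine, and tomllib is only in the standard library from Python 3.11 (on 3.9 the tomli package offers the same loads function).

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # older interpreters: assumes tomli is installed
    import tomli as tomllib

# Hand-written, abbreviated samples for illustration only.
SAMPLE_PYPROJECT = """
[tool.poetry]
name = "demo"
version = "0.1.0"
description = ""
authors = []

[tool.poetry.dependencies]
python = "^3.9"
pandas = "^1.4.2"
"""

SAMPLE_LOCK = """
[[package]]
name = "numpy"
version = "1.22.4"

[[package]]
name = "pandas"
version = "1.4.2"

[[package]]
name = "python-dateutil"
version = "2.8.2"
"""


def direct_dependencies(pyproject_text: str) -> list[str]:
    """Names you asked for explicitly (Python itself is excluded)."""
    data = tomllib.loads(pyproject_text)
    deps = data.get("tool", {}).get("poetry", {}).get("dependencies", {})
    return sorted(name for name in deps if name != "python")


def locked_packages(lock_text: str) -> list[str]:
    """Every package recorded in the lock file, direct or transitive."""
    data = tomllib.loads(lock_text)
    return sorted(pkg["name"] for pkg in data.get("package", []))


if __name__ == "__main__":
    print("declared:", direct_dependencies(SAMPLE_PYPROJECT))
    print("locked:  ", locked_packages(SAMPLE_LOCK))
```

In a real project you would pass the contents of the two files instead of the sample strings.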

Moreover, Poetry can generate a virtual environment so that you can execute Python in an isolated environment. Therefore, you don't need to worry about unintended dependencies.

Here is a quick start to Poetry.

$ pip install poetry # install Poetry
$ poetry config virtualenvs.in-project true --local # generate venv in working directory
$ poetry init # initial settings of Poetry
$ poetry add pandas # install package e.g. pandas
$ poetry shell # launch virtual environment

Once you've installed Poetry, don't forget to select Poetry's virtual environment as the default interpreter in vscode.

select poetry virtual environment (Image by author)
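
If you are unsure whether vscode or your terminal really picked up the virtual environment, a quick check from Python can help. This is a minimal sketch with a function name of my own choosing. It relies on the fact that inside a venv sys.prefix points at the environment while sys.base_prefix points at the base installation.

```python
import sys


def in_virtualenv(prefix: str = sys.prefix,
                  base_prefix: str = sys.base_prefix) -> bool:
    """True when the interpreter prefix differs from the base installation."""
    return prefix != base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtualenv())
```

With the in-project setting from the quick start, the interpreter path should point somewhere under the .venv folder of your working directory.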

Once you have set up Poetry and put pyproject.toml, poetry.lock, and poetry.toml under Git, you can reproduce the environment you created and share it with your teammates.

Black, Flake8, isort, and Mypy; formatter and linter

These packages speed up your coding and keep your programs tidy.

They are needed only in the development environment, so you can install them with the -D option.

poetry add -D black flake8 isort mypy

Then modify vscode settings via settings.json. You can enable the above linters and formatters explicitly.

{
    "python.formatting.provider": "black",
    "python.linting.flake8Enabled": true,
    "[python]": {
        "editor.codeActionsOnSave": {
            "source.organizeImports": true
        }
    },
    "python.linting.mypyEnabled": true
}

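To show roughly what each tool contributes, here is a tiny before-and-after sketch. The "before" version is kept in comments, and the notes describe what I would expect each tool to report or change with its default settings. The function and variable are invented for the example.

```python
# Before (typed in a hurry):
#
#   from typing import List
#   import sys, math
#   def mean(xs:List[float])->float:
#       return math.fsum(xs)/len(xs)
#   label: int = 'average'
#
# After:
#   isort   puts plain imports ahead of "from" imports.
#   Flake8  reports the unused "sys" import (F401), so it is removed.
#   Black   fixes spacing and switches to double quotes.
#   Mypy    reports that a str is assigned to a name annotated as int.
import math
from typing import List


def mean(xs: List[float]) -> float:
    """Return the arithmetic mean of a non-empty list."""
    return math.fsum(xs) / len(xs)


label: str = "average"

if __name__ == "__main__":
    print(label, mean([1.0, 2.0, 3.0]))
```

Black and isort rewrite the file for you, while Flake8 and Mypy only report problems that you then fix by hand.
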
Conclusion

I've introduced several valuable tools for setting up a Python environment for data science. I uploaded the source files to this repository (https://github.com/koyaaarr/python-setup).

I hope this article is helpful to you.
