Python Development Setup for Data Scientists in 2022
Many useful tools and libraries have appeared in recent years. Engineers use some of them routinely, yet they are not well known among data scientists. In this article, I introduce my favorite Python development tools for data science, with data scientists who are new to Python or software development in mind.
This article is intended for data scientists who want to:
- use both Mac and Windows (WSL)
- deploy code to cloud services like Google Cloud Run
- handle several projects simultaneously
- manage environment settings with Git
Table of Contents
- Visual Studio Code (vscode); free and useful editor
- Peacock; color scheme manager [Recommended]
- Rainbow CSV; coloring CSV files
- autoDocstring; document generator
- pyenv; version manager
- Poetry; powerful package manager [Recommended]
- Black, Flake8, isort, and Mypy; formatter and linter
Visual Studio Code (vscode); free and useful editor
Visual Studio Code (vscode) is one of the most popular editors.
Vscode also suits data scientists because it can open Jupyter Notebooks as well as plain Python files. You don't have to code in a browser anymore.
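As a small illustration of working in Python files, here is a sketch that assumes the Python and Jupyter extensions are installed in vscode. With them, a comment starting with # %% marks a cell inside an ordinary .py file, and each cell can be run on its own much like a notebook cell. The dataset below is made up for the example.

```python
# %% Load a tiny dataset (hard-coded so the cell runs anywhere)
import csv
import io
from statistics import mean

# Hypothetical sample data, standing in for a real CSV file.
RAW = "city,temp_c\nTokyo,21.5\nOsaka,23.0\nSapporo,14.5\n"
rows = list(csv.DictReader(io.StringIO(RAW)))

# %% Summarize it; rerun only this cell while exploring
average_temp = mean(float(row["temp_c"]) for row in rows)
print(f"{len(rows)} rows, average temperature {average_temp:.1f} C")
```

Because the file is still plain Python, it runs unchanged from the command line and diffs cleanly in Git.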
Peacock; color scheme manager
Peacock is one of my favorite vscode extensions, available from the Visual Studio Marketplace. It subtly changes the color of your Visual Studio Code workspace, which is ideal when you have multiple VS Code instances open.
You can change the color scheme with Peacock in the following steps.
- press "Ctrl (Command) + Shift + P" in vscode
- type "Peacock: Change to a Favorite Color"
- select your favorite one
You can also set your own color by typing "Peacock: Enter a Color" and entering a hex code.
Advantages for data scientists:
When you work on several projects simultaneously, Peacock is very helpful.
You can tell the projects apart by their appearance, which keeps you from mixing them up.
In addition, Peacock saves the color in the workspace settings (.vscode/settings.json), so you can track it with Git and get the same color on different computers.
Rainbow CSV; coloring CSV files
As a data scientist, you look at CSV files often. Rainbow CSV gives each column of a CSV file its own color. Excel is a good tool for viewing CSV files, but it takes a long time to open them. Try this extension if you want to inspect a CSV at a glance.
autoDocstring; document generator
autoDocstring is a docstring generator that helps you write maintainable code. Once you define the arguments and return value of a function or method, this extension generates a docstring template for it.
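To give a feel for the result, here is a sketch of a function whose docstring follows the Google-style layout that autoDocstring produces by default, as far as I know. The function min_max_scale is a made-up example, and the descriptions were filled in by hand after the template was generated.

```python
from typing import List


def min_max_scale(values: List[float], upper: float = 1.0) -> List[float]:
    """Scale values linearly into the range [0, upper].

    Args:
        values (List[float]): Numbers to rescale; must not be empty.
        upper (float, optional): Largest value after scaling. Defaults to 1.0.

    Raises:
        ValueError: If values is empty.

    Returns:
        List[float]: The rescaled numbers, in their original order.
    """
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so map every one of them to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) * upper for v in values]
```

The extension only writes the skeleton (section names, argument names, types, and defaults); the explanations are still up to you.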
pyenv; version manager
pyenv is a well-known version manager for Python. To install it on a Mac, run
brew install pyenv
If you are a Windows (WSL) user, try the following commands.
git clone https://github.com/pyenv/pyenv.git ~/.pyenv
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc
Then install a specific version of Python (e.g., 3.9.11).
pyenv install 3.9.11
I recommend pinning the version for the working directory with this command.
pyenv local 3.9.11
The command writes a .python-version file to the working directory, so you can track the Python version in Git.
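If you want to confirm that the interpreter you are running matches the pinned one, a minimal sketch could look like the following. It assumes .python-version holds a single full version such as 3.9.11 on its first line; the helper names are hypothetical and are not part of pyenv.

```python
import sys
from pathlib import Path
from typing import Optional


def read_pinned_version(project_dir: Path) -> Optional[str]:
    """Return the first line of .python-version, or None if there is none."""
    version_file = project_dir / ".python-version"
    if not version_file.exists():
        return None
    lines = version_file.read_text().splitlines()
    return lines[0].strip() if lines and lines[0].strip() else None


def matches_running_interpreter(pinned: str) -> bool:
    """Compare a pinned 'major.minor.patch' string with this interpreter."""
    running = ".".join(str(part) for part in sys.version_info[:3])
    return running == pinned


if __name__ == "__main__":
    pinned = read_pinned_version(Path.cwd())
    if pinned is None:
        print("No .python-version file in this directory")
    else:
        print(f"pinned {pinned}, match: {matches_running_interpreter(pinned)}")
```

Running it from the project root is a quick way to notice that a terminal or editor picked up a different Python than the one pyenv selected.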
Poetry; a powerful package manager
Poetry (GitHub: python-poetry/poetry) helps you declare, manage, and install the dependencies of Python projects. It tracks them in two files, pyproject.toml and poetry.lock.
For instance, if you install pandas with Poetry, pandas is declared in pyproject.toml, and the full set of resolved packages, including the packages pandas itself depends on, is recorded in poetry.lock.
These files are updated automatically when you install new packages. You don't need to run the pip freeze command anymore.
Moreover, Poetry can generate a virtual environment so that you can execute Python in an isolated environment. Therefore, you don't need to worry about unintended dependencies.
Here is a quick start to Poetry.
$ pip install poetry # install Poetry
$ poetry config virtualenvs.in-project true --local # generate venv in working directory
$ poetry init # initial settings of Poetry
$ poetry add pandas # install package e.g. pandas
$ poetry shell # launch virtual environment
After installing Poetry, don't forget to select Poetry's virtual environment as the default interpreter in vscode.
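To check which interpreter vscode or your terminal is actually using, a small sketch like this one can help. It relies only on the standard library and assumes the in-project setting from the quick start, under which Poetry places the environment in a .venv folder inside the project; the function names are hypothetical.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def uses_in_project_venv(project_dir: Path) -> bool:
    """Return True when the interpreter lives in project_dir/.venv."""
    return Path(sys.prefix).resolve() == (project_dir / ".venv").resolve()


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
    print("in-project .venv:", uses_in_project_venv(Path.cwd()))
```

If the last line prints False when run from the project root, the selected interpreter is probably not the one Poetry created.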
Once you've set up Poetry and track pyproject.toml, poetry.lock, and poetry.toml with Git, you can recreate the same environment on another machine and share it with your teammates.
Black, Flake8, isort, and Mypy; formatter and linter
- Black (GitHub: psf/black); the uncompromising Python code formatter
- Flake8 (GitHub: PyCQA/flake8); a tool that glues together pycodestyle, pyflakes, mccabe, and third-party plugins to check style
- isort (GitHub: PyCQA/isort); a utility and library that sorts your imports
- Mypy (GitHub: python/mypy); optional static typing for Python
These packages speed up your coding and help you write clean programs.
They are only needed in a development environment, so you can install them as development dependencies with
poetry add -D black flake8 isort mypy
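As a rough sketch of what each tool looks at, consider the small module below. The function column_means is a made-up example; the comments describe, in general terms, what each tool would check in a file like this.

```python
# isort: keeps the import block sorted and grouped.
from statistics import mean
from typing import Dict, List

# Flake8: would warn if one of the imports above were never used,
# or if a line were too long for the configured limit.


# Black: normalizes layout such as quotes, line breaks, and blank lines.
def column_means(rows: List[Dict[str, float]]) -> Dict[str, float]:
    """Return the mean of every column that appears in rows."""
    columns: Dict[str, List[float]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return {name: mean(values) for name, values in columns.items()}


# Mypy: a call such as column_means(["a", "b"]) would be reported,
# because a list of strings does not match List[Dict[str, float]].
```

For example, column_means([{"a": 1.0, "b": 2.0}, {"a": 3.0, "b": 4.0}]) returns {"a": 2.0, "b": 3.0}.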
Then edit vscode's settings.json to enable these linters and formatters explicitly.
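The sketch below prints one possible settings fragment that you could paste into .vscode/settings.json. The keys are an assumption: they are the ones the vscode Python extension used around the time of writing, and newer versions of the extension may have moved these features into separate extensions with different keys, so please check them against your installed version.

```python
import json

# ASSUMPTION: setting keys of the vscode Python extension as of about 2022.
# They may be deprecated or renamed in newer releases.
SETTINGS = {
    "editor.formatOnSave": True,  # run the formatter on every save
    "editor.codeActionsOnSave": {"source.organizeImports": True},  # sort imports
    "python.formatting.provider": "black",
    "python.linting.enabled": True,
    "python.linting.flake8Enabled": True,
    "python.linting.mypyEnabled": True,
}


def render_settings(settings: dict) -> str:
    """Return the settings as pretty-printed JSON text."""
    return json.dumps(settings, indent=2)


if __name__ == "__main__":
    print(render_settings(SETTINGS))
```

Keeping the resulting file under Git means every teammate gets the same linters and formatter without configuring the editor by hand.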
I've introduced several valuable tools that help data scientists set up a Python environment. I uploaded the source files to this repository (https://github.com/koyaaarr/python-setup).
I hope this article is helpful to you.