At its core, a virtual environment created by the venv module is a directory structure. From the official venv documentation:
“…supports creating lightweight “virtual environments”, each with their own independent set of Python packages installed in their site directories. A virtual environment is created on top of an existing Python installation, known as the virtual environment’s “base” Python”
So what does that mean?
When you create a virtual environment using the venv module in Python, it creates a directory with the following internal file structure:
myenv/
├── bin/
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── easy_install
│   ├── easy_install-3.8
│   ├── pip
│   ├── pip3
│   ├── pip3.8
│   ├── python -> /usr/bin/python3.8
│   └── python3 -> python
├── include/
│   └── python3.8/…
├── lib/
│   └── python3.8/
│       └── site-packages/…
└── pyvenv.cfg
bin/: This directory contains the environment's executables: the python interpreter (typically a symlink to the base Python), pip, and the activate scripts for the different shells. There is no separate deactivate script; sourcing activate defines a deactivate shell function that returns you to the global Python environment.
include/: This directory holds C header files used when building compiled extension modules inside the environment.
lib/: This directory contains the third-party packages installed with pip, in the site-packages subdirectory. The Python standard library is not copied into the environment; it is loaded from the base installation.
pyvenv.cfg: This file records the location of the base Python installation (the home key) and other settings that the venv module uses to set the environment up.
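To inspect this layout yourself, here is a minimal sketch that builds a throwaway environment with the standard-library venv module and prints its top-level entries. The helper name top_level_entries is made up for illustration, and with_pip=False is an assumption of this sketch to keep creation fast, so pip will not appear in the result.

```python
import tempfile
import venv
from pathlib import Path


def top_level_entries(env_dir: Path) -> list:
    """Create a virtual environment at env_dir and return its top-level entry names."""
    # with_pip=False skips bootstrapping pip so creation stays quick (sketch assumption).
    venv.create(env_dir, with_pip=False)
    return sorted(entry.name for entry in env_dir.iterdir())


if __name__ == "__main__":
    # Build the environment in a temporary directory that is removed afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        for name in top_level_entries(Path(tmp) / "myenv"):
            print(name)
```

On Windows the executables directory is named Scripts rather than bin, so the exact entries depend on the platform.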
The bin/activate script prepends the environment's bin/ directory to the PATH environment variable (and sets VIRTUAL_ENV), so that the python and pip executables inside the virtual environment are found before those installed globally on the system. When the virtual environment is activated, the python executable in the bin/ directory is used to run Python scripts and imports packages from the environment's site-packages, and the pip executable is used to install and manage packages within the environment.
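A quick way to check which interpreter is actually running is to compare sys.prefix with sys.base_prefix. The sketch below assumes nothing beyond the standard library; the function name in_virtual_environment is chosen here for illustration.

```python
import os
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("inside a virtual environment:", in_virtual_environment())
    # VIRTUAL_ENV is exported by the activate script; it is absent when the
    # environment's python was started directly without activation.
    print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV"))
    # After activation the first PATH entry is normally the environment's bin/.
    print("first PATH entry:", os.environ.get("PATH", "").split(os.pathsep)[0])
```

Running this once before and once after activation should show the interpreter path and the first PATH entry changing.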
Overall, the internal file structure of a virtual environment created using the venv module is designed to be self-contained and isolated from the global Python environment, so that you can manage packages and dependencies separately for each project you work on.
There are several major benefits of using a virtual environment. The first is isolation.
Isolation
Virtual environments in Python are used to create an isolated environment for a specific Python project. This is particularly useful when you have multiple Python projects with different dependencies that might conflict with each other.
For example, my old Flask application requires older versions of the NVIDIA CUDA packages (here nvidia-cublas-cu11), whereas a newer project pulls in a new dependency, torchvision, that upgrades them.
$ python3 -m pip list
...
numpy 1.24.2
nvidia-cublas-cu11 11.10.3.61
nvidia-nccl-cu11 2.14.3
...
$ python3 -m pip install torchvision
numpy 1.24.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-runtime-cu11 11.7.99
nvidia-nccl-cu11 2.14.3
torchvision 0.15.1
...
Since a single Python environment can hold only one version of a given package, the newer dependency overwrites the older one, and only one of the two projects keeps working.
When you create a virtual environment, you are essentially creating a new Python environment with its own set of installed packages and dependencies. Any packages or modules that you install within the virtual environment are isolated from the global Python environment, which means that you can have different versions of the same package installed in different virtual environments without any conflicts.
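To confirm which version of a package a given environment holds, you can query the installed metadata from Python. This is a minimal sketch using importlib.metadata (Python 3.8+); the helper name installed_version is an assumption for illustration, and the package names are the ones from the pip listing above.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version of a distribution in this environment, or None."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the environment running this code.
        return None


if __name__ == "__main__":
    for name in ("numpy", "nvidia-cublas-cu11", "torchvision"):
        print(name, installed_version(name))
```

Running the same script with the interpreter of two different virtual environments should report each environment's own versions.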
To create a virtual environment in Python, you can use the built-in venv module, which is included with Python 3.3 and later versions. Here's how you can create a virtual environment:
- Open your terminal or command prompt and navigate to the directory where you want to create the virtual environment.
- Run the following command (replace myenv with the name you want to give to your virtual environment):
python3 -m venv myenv
- This will create a new directory called myenv in your current directory, which contains the virtual environment.
To activate the virtual environment, run the following command:
- On Windows (Command Prompt):
myenv\Scripts\activate.bat
- On Windows (PowerShell):
myenv\Scripts\Activate.ps1
- On Linux or macOS (bash or zsh):
source myenv/bin/activate
Once the virtual environment is activated, you can install packages using pip just like you would in the global Python environment. Any packages you install will be installed within the virtual environment and will not affect the global Python environment.
To deactivate the virtual environment, simply type deactivate in your terminal.
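Activation is a convenience for interactive shells; in scripts you can call the environment's interpreter by its path instead. The sketch below creates a throwaway environment and asks its interpreter for sys.prefix. The helper names env_python and prefix_reported_by are made up for illustration, and with_pip=False is used only to keep the sketch fast.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a venv (Scripts on Windows, bin elsewhere)."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def prefix_reported_by(env_dir: Path) -> str:
    """Run the environment's interpreter directly and return its sys.prefix."""
    result = subprocess.run(
        [str(env_python(env_dir)), "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "myenv"
        venv.create(env_dir, with_pip=False)  # sketch assumption: no pip needed here
        print(prefix_reported_by(env_dir))
```

The printed prefix is the environment directory, even though no activate script was sourced.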
Reproducibility
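Let’s say you’re working on a project that requires the NumPy package. You want to ensure that the project always uses the same version of NumPy, even if other projects or your global environment move to a newer release. By creating a virtual environment, installing NumPy within that environment, and recording the installed versions, you can ensure that the project keeps using the exact same version of NumPy, regardless of changes to your global packages. Note that the environment is still tied to its base Python interpreter, so replacing that interpreter may require recreating the environment from the recorded versions.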
# create a new virtual environment
python3 -m venv myenv
# activate the virtual environment
source myenv/bin/activate
# install the required packages
pip install numpy
# save the list of installed packages to a requirements file
pip freeze > requirements.txt
# deactivate the virtual environment
deactivate
This creates a new virtual environment called myenv, installs NumPy within that environment, and saves the list of installed packages, with their exact versions, to a requirements.txt file. This requirements.txt file can then be used to recreate the same set of packages on another machine or at a later time by running pip install -r requirements.txt inside a fresh virtual environment.
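One way to check that an environment still matches its requirements.txt is to compare the pinned versions with what is installed. This is a simplified sketch, not a replacement for pip: it only understands plain name==version lines, as produced by pip freeze for ordinary packages, and the helper names parse_pins and mismatches are assumptions for illustration.

```python
from importlib import metadata


def parse_pins(requirements_text: str) -> dict:
    """Map package name -> pinned version for every 'name==version' line."""
    pins = {}
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # skip blanks and anything that is not an exact pin
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def mismatches(pins: dict) -> dict:
    """Return {name: (pinned, installed)} for packages that differ or are missing."""
    result = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != pinned:
            result[name] = (pinned, installed)
    return result


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        print(mismatches(parse_pins(handle.read())))
```

An empty dictionary means every pinned package is installed at the recorded version.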
Dependency management
As with isolation, a virtual environment reduces the risk of running into conflicting dependencies. Let’s say you’re working on a project that requires several packages, including NumPy, Pandas, and Matplotlib. You want to ensure that all of these packages are installed and available to your project. By creating a virtual environment and using pip to install each of these packages within the environment, you can easily manage the dependencies for your project and ensure that all required packages are installed and available.
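A small startup check can confirm that the required packages are importable in the active environment. The sketch below uses importlib.util.find_spec from the standard library; the helper name missing_modules is made up for illustration, and the list uses import names, which do not always match the names used with pip install.

```python
from importlib import util


def missing_modules(module_names) -> list:
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in module_names if util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names for the packages mentioned above (NumPy, Pandas, Matplotlib).
    required = ["numpy", "pandas", "matplotlib"]
    missing = missing_modules(required)
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all required packages are importable")
```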
Testing
You want to ensure that your tests are not affected by other packages or dependencies installed on your system. By creating a separate virtual environment for testing and running your tests within that environment, you can ensure that your tests are isolated from other packages and dependencies on your system, which can help make them more reliable and accurate.
# create a new virtual environment for testing
python3 -m venv testenv
# activate the virtual environment
source testenv/bin/activate
# install required packages
pip install pytest
# run tests
pytest myproject/tests.py
# deactivate the virtual environment
deactivate
This creates a new virtual environment called testenv, installs pytest within that environment, and runs tests located in myproject/tests.py. By running the tests within the virtual environment, you can ensure that they are isolated from other packages and dependencies on your system.
Virtual environments also help with integration tests. When integrating with an external API, you want to make sure that your code works correctly with the API, but you also want to avoid accidentally making real requests to the API during development and testing. A virtual environment does not block network access by itself, but it gives you a clean place to install a mocking library alongside your test framework without touching other projects.
Here’s an example workflow for using virtual environments to perform isolation testing and integration testing.
1. Create a new virtual environment for your project using the venv module and activate it:
python -m venv myproject-env
source myproject-env/bin/activate
2. Install your project dependencies using pip, including any testing frameworks and mocking libraries you want to use:
pip install requests requests-mock pytest
3. Write isolation tests for your code using pytest, mocking out the external API using the requests-mock library:
import requests
import requests_mock


def test_my_api_mocked():
    # Mocker intercepts calls made through requests, so nothing reaches the network.
    with requests_mock.Mocker() as m:
        m.get('http://example.com/api/data', json={'result': 'success'})
        response = requests.get('http://example.com/api/data')
        assert response.json() == {'result': 'success'}
This test simulates making a request to the external API, but uses a mocked response instead of actually sending a request over the network.
4. Run your isolation tests:
pytest
This will run your isolation tests in the virtual environment, verifying that your code works correctly in isolation from the external API.
5. Write integration tests for your code using pytest, making real requests to the external API:
import requests


def test_my_api_integration():
    # This test sends a real HTTP request to the external API.
    response = requests.get('http://example.com/api/data')
    assert response.json() == {'result': 'success'}
By installing your test tooling in a virtual environment and combining mocked isolation tests with a small set of integration tests, you can verify that your code works both in isolation and when integrated with other systems. The mocks, not the virtual environment itself, are what keep the isolation tests from making real requests to external systems during development.
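Because an integration test does hit the network, it can help to make it opt-in so that a plain test run never sends real requests. The sketch below uses unittest.skipUnless from the standard library, which pytest can also collect, rather than a pytest-specific marker. The environment variable name RUN_INTEGRATION_TESTS and the class name are assumptions made for this example, and the URL is the placeholder used above.

```python
import os
import unittest

# Hypothetical opt-in switch; the variable name is an assumption of this sketch.
RUN_INTEGRATION = os.environ.get("RUN_INTEGRATION_TESTS") == "1"


class ApiIntegrationTest(unittest.TestCase):
    @unittest.skipUnless(RUN_INTEGRATION, "set RUN_INTEGRATION_TESTS=1 to call the real API")
    def test_real_api(self):
        # Imported lazily so this module still loads where requests is not installed.
        import requests

        response = requests.get("http://example.com/api/data", timeout=10)
        self.assertEqual(response.json(), {"result": "success"})


if __name__ == "__main__":
    unittest.main()
```

With the variable unset the test is reported as skipped; exporting RUN_INTEGRATION_TESTS=1 before running the tests enables the real request.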
This wraps up a quick walk through the venv module and how to use it for isolation, reproducibility, package management, and testing. Next, we will compare it with conda, another favorite environment management tool in the machine learning community.