Prepare Your Data Science Environment
- Set the context
- Manage your environment
- Anaconda commands guide
- Extras (PyCharm & EC2 instances)
Set the context
There are many cases where you need to set up your own environment: working on a local machine, preparing a Docker container, or working directly on a server, which can be cheaper and more flexible than off-the-shelf platforms.
It is not that complicated to prepare your machine learning (ML) environment with distribution tools like Anaconda, which helps to manage and distribute Python and R libraries.
Anaconda resolves dependencies: when you install a new library or framework, its dependencies are installed as well. For example, the command to install pandas installs NumPy implicitly.
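To see this dependency relationship from inside Python, the following minimal sketch reads the requirements that an installed package declares in its metadata. It assumes Python 3.8 or later and that the package ships standard metadata, which may not hold for every conda build. The helper name `declared_requirements` is hypothetical and not part of Anaconda.

```python
from importlib import metadata


def declared_requirements(dist_name):
    """Return the requirement strings a distribution declares, or None if it is not installed."""
    try:
        # metadata.requires returns None when the package declares no requirements
        return metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    reqs = declared_requirements("pandas")
    if reqs is None:
        print("pandas is not installed in this environment")
    else:
        # Keep only the declared requirements that refer to numpy
        print([r for r in reqs if r.lower().startswith("numpy")])
```

Run it inside the environment where pandas was installed; an empty list would mean the metadata does not list NumPy, not necessarily that NumPy is absent.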
We are going to go through the steps to prepare your Python ML environment and launch notebooks to write code. We also cover PyCharm integration with Anaconda and preparing Amazon’s Elastic Compute Cloud (EC2) instances for Jupyter notebooks.
Anaconda provides an installer for the major operating systems. Installation should be easy: run the installer file and follow the default instructions.
The major components:
- The command line, which manages Anaconda and requires some setup in certain cases.
- Anaconda Navigator, a UI for managing environments that does not require any setup.
Next are the steps to set up and execute Anaconda commands on macOS, Ubuntu, and Windows.
To execute Anaconda commands, the Anaconda path must be defined in your terminal profile; the Anaconda installer usually handles this.
1. Open the terminal application.
2. Make sure the bash profile points to Anaconda. Open ~/.bash_profile in a text editor and check that the path to the Anaconda bin directory is present; if it is not, add it yourself (the exact path depends on your installation).
3. (Optional, for the zsh shell) If your terminal uses zsh, the bash profile is not loaded by default and an extra step is required. Open your zsh profile (typically ~/.zshrc) in a text editor and add the path to the Anaconda bin directory if it is missing, or add the following script so that zsh loads the bash profile:
if [ -f ~/.bash_profile ]; then . ~/.bash_profile; fi
On other operating systems, running the installer is straightforward. On a Linux-based OS, however, you install Anaconda by running the downloaded installer script from the terminal.
Then you have to define the terminal path, which is usually done by the installer:
1. Open the terminal application.
2. Open the ~/.bashrc file in a text editor.
3. Verify that the path to Anaconda is present, or add it to the end of the file.
4. Refresh the terminal so that the change takes effect:
source ~/.bashrc
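Once the profile has been updated, a quick check from Python can confirm that the terminal actually sees Anaconda. The sketch below is only illustrative: the helper names `find_conda` and `anaconda_path_entries` are hypothetical, and matching on the substring "conda" assumes a default install directory name such as anaconda3 or miniconda3.

```python
import os
import shutil


def find_conda(path_value=None):
    """Return the full path of the conda executable found on PATH, or None."""
    # shutil.which searches the PATH environment variable when path is None
    return shutil.which("conda", path=path_value)


def anaconda_path_entries(path_value=None):
    """Return the PATH entries whose text mentions 'conda' (case-insensitive)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    return [p for p in path_value.split(os.pathsep) if "conda" in p.lower()]


if __name__ == "__main__":
    print("conda executable:", find_conda())
    print("conda-related PATH entries:", anaconda_path_entries())
```

If the first line prints None, the profile edit has not taken effect in the current session; open a new terminal or reload the profile and try again.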
To start writing Anaconda commands on a Windows machine, go to the Start menu and search for “Anaconda Prompt”.
For a user interface that covers the major environment actions, open “Anaconda Navigator”.
Manage your environment
Anaconda allows you to create a separate environment for each of your projects. There are two ways to manage environments: (1) using commands, which are flexible and portable, or (2) using a user interface that covers the major actions.
Way 1: Anaconda commands
Open the terminal and execute the commands.
- Create a new conda environment:
conda create -n env_name
- Create a new environment and define Python version:
conda create -n env_name python=3.8
- Remove conda environment:
conda env remove -n env_name
- List conda environments:
conda env list
- Activate conda environment: before you start executing any code make sure the relevant environment is active
conda activate env_name
- Export conda environment: will export the active environment
conda env export > environment.yaml
- Import conda environment: will create the environment
conda env create -f environment.yaml
- Export the requirements: a common requirements file for most deployments
pip freeze > requirements.txt
- Import the requirements: will import to the active environment
pip install -r requirements.txt
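Because several of the commands above act on whichever environment is active, it can help to confirm the active environment from Python before installing or exporting anything. This is a minimal sketch that relies on the CONDA_DEFAULT_ENV variable, which `conda activate` sets; the helper name `active_conda_env` is hypothetical.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name (or path) of the active conda environment, or None if none is active."""
    if environ is None:
        environ = os.environ
    # `conda activate` exports CONDA_DEFAULT_ENV for the activated environment
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("active conda environment:", active_conda_env())
    # sys.prefix is the root directory of the environment the interpreter runs from
    print("interpreter prefix:", sys.prefix)
```

If the printed name is not the one you expect, run the activate command again before exporting or installing packages.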
Way 2: Anaconda Navigator interface
1. Open the Anaconda Navigator application.
2. Go to the Environments section to create, clone, import, or remove an environment.
3. Click “Create” to start a new Python or R environment.
Anaconda commands guide
Install the major libraries
- Create an environment (skip this step if you have already created it):
conda create -n env_name
- Activate your environment:
conda activate env_name
- Install NumPy:
conda install numpy
- Install pandas (NumPy is installed with it as a dependency):
conda install pandas
- Install scikit-learn:
conda install -c conda-forge scikit-learn
- Install TensorFlow: the TensorFlow team does not publish an official conda package, so it is better to install it with pip inside the active environment
pip install --upgrade pip
pip install tensorflow
- Install PyTorch:
conda install pytorch torchvision torchaudio -c pytorch
NOTE: Python 3.9 users will need to add ‘-c=conda-forge’ for installation
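After the installs finish, a short script can report which libraries the active environment actually contains. The sketch below assumes the usual distribution names (for example, PyTorch is published as torch); adjust the list if your packages use different names. `installed_versions` is a hypothetical helper, not a conda or pip API.

```python
from importlib import metadata

# Assumed distribution names; edit this list to match what you installed
PACKAGES = ["numpy", "pandas", "scikit-learn", "tensorflow", "torch"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```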
Start jupyter notebook
- Start a Jupyter notebook from a local environment:
jupyter notebook
- Start a Jupyter notebook on a server (0.0.0.0 listens on all network interfaces; replace it with the server IP to bind to a single address):
jupyter notebook --ip=0.0.0.0 --no-browser
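If you start the notebook server from a script rather than by hand, the same options can be assembled in Python. This is a sketch under assumptions: it uses `python -m notebook`, which requires the notebook package in the active environment, and `notebook_command` is a hypothetical helper that only builds the argument list without starting anything.

```python
import sys


def notebook_command(ip="0.0.0.0", port=8888, open_browser=False):
    """Build the argument list that launches Jupyter Notebook with the given options."""
    # sys.executable keeps the launch inside the currently active environment
    cmd = [sys.executable, "-m", "notebook", f"--ip={ip}", f"--port={port}"]
    if not open_browser:
        cmd.append("--no-browser")
    return cmd


if __name__ == "__main__":
    # Printing keeps the sketch free of side effects;
    # pass the list to subprocess.run(...) to actually start the server.
    print(" ".join(notebook_command()))
```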
Extras (PyCharm & EC2 instances)
You can connect your PyCharm project to an existing Anaconda environment.
- When you start a new PyCharm project:
Select “Previously configured interpreter” and click the three dots.
Select the Python executable for the desired environment from the interpreter menu.
If the menu is empty, click the three dots and navigate to “anaconda3/envs/environment_name/bin/python”.
- For existing PyCharm projects:
Go to Preferences and select “Python Interpreter”.
Look for the environment in the drop-down menu, or add it from the settings button if it is not listed.
Environment location: “anaconda3/envs/environment_name/bin/python”
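To find the exact interpreter path to paste into PyCharm, you can compute it from the environment name. The sketch below assumes Anaconda is installed in an anaconda3 folder under your home directory unless you pass another root, and that Windows environments keep python.exe at the top of the environment folder. `env_interpreter` is a hypothetical helper.

```python
import os
from pathlib import Path


def env_interpreter(env_name, conda_root=None, windows=None):
    """Return the expected path of the Python executable for a named conda environment."""
    # Assumed default install location; pass conda_root if yours differs
    root = Path(conda_root) if conda_root else Path.home() / "anaconda3"
    if windows is None:
        windows = os.name == "nt"
    env_dir = root / "envs" / env_name
    # Windows environments place python.exe directly in the environment folder
    return env_dir / "python.exe" if windows else env_dir / "bin" / "python"


if __name__ == "__main__":
    path = env_interpreter("environment_name")
    print(path, "(exists)" if path.exists() else "(not found)")
```

Alternatively, activate the environment and print `sys.executable` from Python; that reports the interpreter that is actually running.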
Prepare EC2 instance
- When you launch a new EC2 instance, search for a “deep learning” machine image, which comes with the common data science requirements preinstalled.
- Make sure to open the default port for Jupyter notebooks. You can define this in the 6th step of instance creation.
Add a custom TCP rule with port range “8888”. For the source, it is recommended to allow only your own IP.
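After adding the rule and starting the notebook server, you can check from your own machine whether the port is reachable. This is a minimal sketch using only the standard library; `port_is_open` is a hypothetical helper, and the host defaults to 127.0.0.1 here as a placeholder for your instance's public address.

```python
import socket
import sys


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # Pass the instance address as the first argument; 127.0.0.1 is only a placeholder
    host = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(f"{host}:8888 reachable:", port_is_open(host, 8888))
```

A False result can mean that the security group rule is missing, that the source IP does not match, or that the notebook server is not running yet.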
I tested all the steps before writing this article. Hopefully, you find this blog useful for starting your machine learning projects.