Setting Up a Python Environment for a Data Science/NLP Project

Ruhee
4 min read · Sep 3, 2023

What we need to run any program:

  1. A computer
  2. A code editor for writing and changing the code, installed locally, e.g. PyCharm or Visual Studio Code (I chose VS Code because of its excellent debugging and because it is 100 percent free)
  3. A running environment: a web server (for a website) or a terminal with the language installed (e.g. Python installed for Python code)
  4. Version control for tracking code changes when more than one person works on the same project, e.g. Git
  5. Optional: a GitHub.com account for sharing the code and collaborating with the other people on the team

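Important Git commands:

  • git init: initialize a Git repository in the current directory
  • git status: show the status of the repository (changed, staged, and untracked files)
  • git add <filename> or git add <file1> <file2> … <fileN>: stage files for the next commit (file names are separated by spaces, not commas)
  • git rm <filename> or git rm <file1> <file2> … <fileN>: delete files and stage the removal
  • git commit -m "Message for that commit": save the staged changes with a message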
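To see how these commands fit together, here is a minimal sketch that runs init, add, commit, and status from Python in a throwaway folder. The function name first_commit, the file name hello.py, and the placeholder name and email are all made up for illustration, and the sketch assumes git is already installed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def first_commit(repo_dir, filename="hello.py", message="First commit"):
    """Run git init, add, and commit in repo_dir, then return `git status --short`."""

    def git(*args):
        # The -c options apply to this one command only. They set a placeholder
        # identity (an assumption for this sketch) so the commit works even on
        # a machine where git has not been configured yet.
        result = subprocess.run(
            [
                "git",
                "-c", "user.name=Example",
                "-c", "user.email=example@example.com",
                "-c", "commit.gpgsign=false",
                *args,
            ],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout

    # Create one small file so there is something to commit.
    (Path(repo_dir) / filename).write_text('print("hello")\n')

    git("init")                   # start tracking this folder
    git("add", filename)          # stage the file
    git("commit", "-m", message)  # save the staged change
    return git("status", "--short")  # empty when nothing is left to commit


if shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        leftover = first_commit(tmp)
        print("uncommitted changes:", leftover.strip() or "none")
else:
    print("git is not installed or not on PATH")
```

In everyday work you would type the same commands straight into the terminal; wrapping them in Python here is only a way to show the order they usually run in.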

3. Install Python 3.x (the latest version)

If you see “python not found” even after installing, make sure the install path has been added to your PATH environment variable. On macOS, you can install Python through the Homebrew (brew) installer.
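If at least one interpreter starts (for example python3 works but python does not), a short script can show what the terminal actually finds on PATH. This is only a sketch: the helper name find_on_path is made up, and the command names it looks for are assumptions about how Python is installed on your machine.

```python
import os
import shutil


def find_on_path(names=("python3", "python")):
    """Return {command name: full path, or None if it is not found on PATH}."""
    return {name: shutil.which(name) for name in names}


for name, location in find_on_path().items():
    print(f"{name}: {location or 'not found on PATH'}")

# PATH is one long string of folders, separated by ":" on macOS/Linux
# and ";" on Windows (os.pathsep holds the right separator).
print("Folders searched, in order:")
for folder in os.environ.get("PATH", "").split(os.pathsep):
    print("  ", folder)
```

If the folder where Python was installed is missing from the printed list, that is the path that needs to be added.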

How do we make sure modules installed for one project do not conflict with another project?

Download and install Anaconda to create separate virtual environments, so that each project runs with its own isolated set of packages. Anaconda also comes with other integrated tools such as Jupyter Notebook.

After installing Anaconda Navigator, create a new environment for your project, e.g. nlp-basics. You can choose a different Python version for each environment/project.

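You need to tell the VS Code editor which Python interpreter to use to run the code, i.e. the conda environment specific to the project. With the Python extension installed, this is done with the “Python: Select Interpreter” command in the Command Palette.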

Now the conda virtual environment nlp-basics is set up and connected to the VS Code editor. The next step is to install all the needed modules in the virtual environment so the code can run.
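A quick way to confirm that VS Code is really using the new environment is to run a small script from the editor and look at where Python is running from. This is a sketch, with environment_report as a made-up helper name. It assumes conda sets the CONDA_DEFAULT_ENV variable when an environment is activated, so the value can be empty if the editor starts Python without activating anything.

```python
import os
import sys


def environment_report():
    """Collect a few facts about the Python interpreter running this script."""
    return {
        # Full path of the interpreter; for a conda environment this
        # normally sits inside that environment's folder.
        "interpreter": sys.executable,
        # Root folder of the current Python installation or environment.
        "prefix": sys.prefix,
        # Name of the activated conda environment, or None if not set.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If the interpreter path does not mention nlp-basics, the editor is probably still pointing at a different Python.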

Basic Python modules needed for NLP:

  1. nltk
  2. pandas
  3. spacy
  4. numpy
  5. matplotlib
  6. networkx (optional)
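  7. prover9 (optional; this is a separate theorem prover program that nltk can call, so it is installed on its own rather than as a Python module)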
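To find out which of these modules are already available in the active environment, a small check like the one below can help. It is a sketch: missing_modules is a made-up helper name, and prover9 is left out because it is not something Python imports.

```python
from importlib.util import find_spec

# Import names of the modules from the list above.
MODULES = ["nltk", "pandas", "spacy", "numpy", "matplotlib", "networkx"]


def missing_modules(names):
    """Return the names that Python cannot find in the current environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(MODULES)
if missing:
    print("Still to install:", ", ".join(missing))
else:
    print("All listed modules are available.")
```

Running it again after installing is an easy way to confirm that nothing was missed.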

Steps to install a Python package/module into a conda virtual environment:

(a) Activate the virtual environment created for the project

You can activate it through the Anaconda Navigator GUI or from the command line interface (CLI) in the Terminal, e.g. with conda activate nlp-basics.

Make sure pip is installed in your Anaconda environment. If not, first install pip by running

conda install -c conda-forge pip

Run the command below to make sure pip is installed in the Anaconda virtual environment.

pip --version

With the environment active and pip available, install the modules you need, for example:

pip install nltk pandas spacy numpy matplotlib networkx

You can save your whole environment as a .yml file using the command below.

conda env export > <filename>.yml

This .yml file can be shared with another team member so they can recreate the same environment. They can either use the Anaconda Navigator GUI > Import option or the command below.

conda env create -f <filename>.yml

Now the Python environment is fully set up and you are ready to code!

Ruhee

Middle School student and this is my journey through the tech world!