Data Environment Setup: The First Step to Mastering Machine Learning with Python

Learn how to install and configure the essential Python tools and libraries for a seamless machine-learning experience.

Asish Biswas
AnalyticSoul
May 22, 2024


Before starting your learning journey, we highly encourage you to join our vibrant learning community on Discord! Feel free to ask any questions; our supportive members are here to help you get started smoothly. Dive in, collaborate, and let’s grow together! We can’t wait to see you there!

Now that you’ve joined our community, let’s begin this exciting journey. The first thing you need is a good development environment. We recommend Visual Studio Code (VSCode) with the Jupyter extension: VSCode’s robust features and user-friendly interface, combined with Jupyter’s interactive notebooks, make a seamless and efficient workspace for your Python projects.

Next, you need to set up a clean, isolated Python environment for this machine learning course. Giving each project its own environment ensures that one project’s dependencies don’t interfere with another’s: it prevents version conflicts (for example, two projects that need different versions of pandas) and makes your code reproducible. This is a good practice to follow for every project.
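The reproducibility benefit is easier to see once you can list exactly what an environment contains. As a minimal sketch (our own illustration, not a required step), the snippet below uses the standard-library importlib.metadata module (Python 3.8+) to build pip-freeze-style name==version lines for whichever environment runs it. The function name snapshot_installed_packages is hypothetical.

```python
from importlib.metadata import distributions


def snapshot_installed_packages():
    """Return sorted 'name==version' lines for every package in the running environment."""
    lines = set()  # a set drops duplicates if a package is visible twice on sys.path
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs that lack a Name field
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    for line in snapshot_installed_packages():
        print(line)
```

Running it inside two different environments produces two different lists, which is the isolation described above. From the command line, pip freeze gives you a similar list.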

We’ll cover both Python’s built-in venv and the popular package manager conda.

Option 1: Python venv

venv is Python’s built-in tool for creating isolated virtual environments. It ships with the standard library, so there is nothing extra to install. It manages Python environments only, and it’s straightforward to use.
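Because venv is part of the standard library, you can also create an environment from Python code instead of the terminal. A minimal sketch, assuming you want the same result as the command-line steps: venv.create is the module’s documented function, and with_pip=True is passed because, unlike python -m venv, venv.create does not install pip by default. The helper name create_env is hypothetical.

```python
import sys
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir and return its scripts folder."""
    env_dir = Path(env_dir)
    # with_pip=True bootstraps pip via ensurepip, matching `python -m venv`.
    venv.create(env_dir, with_pip=with_pip)
    # Activation scripts and the interpreter live in Scripts\ on Windows, bin/ elsewhere.
    return env_dir / ("Scripts" if sys.platform == "win32" else "bin")


if __name__ == "__main__":
    print(create_env("learn_data_science"))
```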

Steps to Set Up a venv Environment

  • Open your terminal or command prompt.
  • Navigate to your project directory.
  • Create a new virtual environment:
python -m venv learn_data_science

In the example above, learn_data_science is the name of the new environment; venv creates a folder with that name in the current directory. Feel free to use any name you prefer.

  • Activate the environment:
      • On Windows:
learn_data_science\Scripts\activate
      • On macOS/Linux:
source learn_data_science/bin/activate
  • Install necessary packages using pip.
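To confirm that activation worked, you can ask Python itself. This minimal sketch relies on documented venv behavior: inside a virtual environment, sys.prefix points at the environment folder while sys.base_prefix still points at the base installation. The helper name in_virtualenv is our own. Running it in a VSCode notebook cell is also a quick way to check that the selected Jupyter kernel uses your new environment.

```python
import sys


def in_virtualenv():
    """Return True when running inside a venv (sys.prefix differs from sys.base_prefix)."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a venv:", in_virtualenv())
```

Note that this check is specific to venv: a conda environment is a separate Python installation, so both prefixes match there.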

Option 2: conda

conda is a cross-platform package and environment manager that ships with the Anaconda and Miniconda distributions. Unlike venv, it manages both packages and environments, and it isn’t limited to Python: it can install packages for other languages too.

Steps to Set Up a conda Environment

  • Install Anaconda or Miniconda (if not already installed).
  • Open your terminal or Anaconda Prompt.
  • Create a new environment:
conda create --name learn_data_science python=3.11
  • Feel free to replace learn_data_science with your preferred environment name, and adjust the python= value to the Python version you want.
  • Activate the environment:
conda activate learn_data_science
  • Install necessary packages using conda or pip.
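The venv check does not work for conda, but conda activate exports environment variables you can inspect: CONDA_DEFAULT_ENV holds the active environment’s name and CONDA_PREFIX its path. A minimal sketch, assuming those two variables; both helper names are hypothetical. The second function also guards against a common mix-up where the terminal has an environment activated but Python (or a notebook kernel) runs from a different installation.

```python
import os
import sys


def active_conda_env():
    """Return the name of the activated conda environment, or None if none is active."""
    return os.environ.get("CONDA_DEFAULT_ENV")


def interpreter_matches_conda_env():
    """True if the running interpreter belongs to the activated conda environment."""
    prefix = os.environ.get("CONDA_PREFIX")
    if prefix is None:
        return False

    def normalize(path):
        # Resolve symlinks and normalize case so equivalent paths compare equal.
        return os.path.normcase(os.path.realpath(path))

    return normalize(prefix) == normalize(sys.prefix)


if __name__ == "__main__":
    print("Active conda env:", active_conda_env())
    print("This interpreter belongs to it:", interpreter_matches_conda_env())
```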

Installing Required Libraries

Once your environment is set up and activated (whether with venv or conda), install the Python libraries used in this course.

Using pip:

pip install pandas numpy matplotlib seaborn plotly scikit-learn

Using conda:

conda install pandas numpy matplotlib seaborn plotly scikit-learn
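Whichever installer you used, you can confirm that every library is present by reading the installed package metadata. A minimal sketch using the standard-library importlib.metadata.version function: it looks packages up by their distribution names (so scikit-learn, not the import name sklearn), and packages installed with conda normally ship this metadata as well. The names REQUIRED and check_packages are our own.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names, exactly as passed to pip/conda above.
REQUIRED = ("pandas", "numpy", "matplotlib", "seaborn", "plotly", "scikit-learn")


def check_packages(names=REQUIRED):
    """Map each distribution name to its installed version, or None if it is missing."""
    report = {}
    for name in names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    for name, ver in check_packages().items():
        print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```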

Here’s what each library brings to the course:

  • NumPy: NumPy provides fast operations on large, multi-dimensional arrays and matrices. We’ll use NumPy for numerical operations, linear algebra, and statistical computations. 🚀
  • Pandas: pandas is your Swiss Army knife for data manipulation. It handles tabular data (like Excel spreadsheets) using DataFrames. We’ll use pandas to clean, transform, and analyze data efficiently.
  • Matplotlib: Matplotlib is a powerful plotting library. We’ll use it to create static, 2D visualizations (line plots, scatter plots, histograms, etc.).
  • Seaborn: Seaborn builds on Matplotlib and adds statistical visualizations. It’s great for exploring relationships between variables.
  • Plotly: Plotly is an interactive plotting library. We’ll create dynamic, interactive visualizations (scatter plots, bar charts, etc.) with it.
  • Scikit-Learn (sklearn): Scikit-Learn is the go-to library for machine learning. It provides tools for classification, regression, clustering, and more. We’ll use it to build predictive models.
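To see how three of these libraries hand data to one another, here is a minimal illustrative sketch on synthetic data: NumPy generates the numbers, pandas stores them in a DataFrame, and scikit-learn fits a linear regression. The formula y = 3x + 2, the noise level, and the seed are arbitrary choices made for this example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# NumPy: a reproducible synthetic dataset, y = 3x + 2 plus small Gaussian noise.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 0.5, size=100)

# pandas: hold the data in a DataFrame and print summary statistics.
df = pd.DataFrame({"x": x, "y": y})
print(df.describe().round(2))

# scikit-learn: fit a linear regression and inspect the learned parameters.
model = LinearRegression()
model.fit(df[["x"]], df["y"])
print(f"slope={model.coef_[0]:.2f}, intercept={model.intercept_:.2f}")
```

Because the data was generated with slope 3 and intercept 2, the printed estimates should land close to those values.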

Conclusion

In this lesson, we explored two methods for creating isolated Python environments, venv and conda, and installed the core libraries for the course. In the next lesson, we’ll dive into the exciting world of data exploration techniques and exploratory data analysis (EDA). We’ll uncover hidden patterns, relationships, and insights within your datasets. Understanding your data thoroughly is the first step toward building robust machine learning models.
