End To End Data Science Project — Part-2: Project Setup Using Setup.py

Ayman Ejaz
3 min read · Nov 19, 2023

Part-2: Project Setup Using Setup.py

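The setup.py file is the build script that setuptools uses to package and distribute your Python project. It declares the project's metadata, the packages to include, and the dependencies that must be installed alongside your code.

With setup.py in place, the project can be installed from its root directory:

python setup.py install

Note that recent setuptools releases deprecate invoking setup.py directly. Running pip install . (or pip install -e . for an editable, development install) from the same directory is the recommended equivalent.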

For Full Code Click Here: GitHub

Here’s a brief explanation of the setup.py code:

1. Import Statements:

from setuptools import find_packages, setup
from typing import List

Imports necessary modules for package setup and type hinting.

2. Function Definition:

def get_requirements(file_path: str) -> List[str]:
    '''
    This function returns the list of requirements
    '''
    with open(file_path) as file_obj:
        # Strip the trailing newline and surrounding whitespace from each line
        requirements = [req.strip() for req in file_obj]
    # Drop blank lines so no empty requirement strings are returned
    return [req for req in requirements if req]

Defines a function `get_requirements` that takes a file path as input, reads the requirements from the file, and returns them as a list of strings.
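
As a quick illustration of how `get_requirements` behaves, the sketch below writes a small sample requirements file to a temporary directory and reads it back. The file contents (pandas, numpy, scikit-learn) are assumptions made for the example, not the project's actual dependency list. This variant also skips an `-e .` line, an editable-install marker that sometimes sits in requirements.txt but is not a valid `install_requires` entry; that filter is an addition and is not part of the original code.

```python
import os
import tempfile
from typing import List

# Assumption: an editable-install line that may be kept in requirements.txt
EDITABLE_MARKER = "-e ."


def get_requirements(file_path: str) -> List[str]:
    """Return the requirements listed in file_path, one string per package."""
    with open(file_path) as file_obj:
        lines = [line.strip() for line in file_obj]
    # Skip blank lines and the editable-install marker
    return [line for line in lines if line and line != EDITABLE_MARKER]


# Hypothetical file contents, used only for this demonstration
sample = "pandas\nnumpy\n\nscikit-learn\n-e .\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "requirements.txt")
    with open(path, "w") as f:
        f.write(sample)
    requirements = get_requirements(path)

print(requirements)  # ['pandas', 'numpy', 'scikit-learn']
```

The returned list is exactly what gets passed to `install_requires`, so every entry should be a valid requirement string.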

3. Package Setup:

setup(
    name='End-To-End-Data-Science-Project',
    version='0.0.1',
    description='This is a project to demonstrate the end-to-end data science process using Python and Machine Learning libraries like Scikit Learn, Tensor',
    packages=find_packages(),
    install_requires=get_requirements('requirements.txt')
)

Configures the package using `setup` from `setuptools`. It specifies the package name, version, description, the packages to include (discovered by `find_packages()`), and the dependencies listed in ‘requirements.txt’, which are read by the `get_requirements` function.

In short, the script defines a function that reads package requirements from a file and then sets up a Python package with `setuptools`, specifying its name, version, description, and dependencies. This is a common structure for Python projects, especially those involving data science and machine learning.
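
To make `packages=find_packages()` less opaque: setuptools looks for directories that contain an `__init__.py` file and reports them as importable packages. The sketch below approximates that discovery using only the standard library. `discover_packages` is a hypothetical helper written for illustration, not setuptools' actual implementation, and the `src`, `components`, and `notebooks` directory names are assumed.

```python
import os
import tempfile
from typing import List


def discover_packages(root: str) -> List[str]:
    """Hypothetical helper: dotted names of nested directories under root
    that contain an __init__.py (a rough stand-in for find_packages)."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            continue  # the project root itself is not a package
        if "__init__.py" not in filenames:
            dirnames[:] = []  # not a package, so do not descend further
            continue
        rel = os.path.relpath(dirpath, root)
        found.append(rel.replace(os.sep, "."))
    return sorted(found)


with tempfile.TemporaryDirectory() as project_dir:
    # Assumed layout: src/ and src/components/ are packages, notebooks/ is not
    for rel in ("src", os.path.join("src", "components"), "notebooks"):
        os.makedirs(os.path.join(project_dir, rel))
    for rel in ("src", os.path.join("src", "components")):
        open(os.path.join(project_dir, rel, "__init__.py"), "w").close()
    packages = discover_packages(project_dir)

print(packages)  # ['src', 'src.components']
```

A folder without `__init__.py`, such as the assumed `notebooks` directory, is skipped, which is why forgetting that file is a common reason code goes missing from an installed package.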

Purpose of setup.py:

  1. Packaging: The setup.py file defines the structure and contents of your project's package, including its name, version, description, and dependencies.
  2. Installation: It enables the installation of your project using package managers like pip, ensuring that all required dependencies are installed along with your code.
  3. Distribution: It facilitates the distribution of your project to others by allowing them to easily install and use your code using pip or other package managers.
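
Once a project has been installed with pip, the name, version, and dependencies declared in setup.py become metadata that Python can query. The sketch below reads that metadata through the standard library's `importlib.metadata`. `describe_installed` is a hypothetical helper, and the code assumes it may run in an environment where the project has not been installed, so it handles that case explicitly.

```python
from importlib import metadata
from typing import List, Optional, Tuple


def describe_installed(dist_name: str) -> Optional[Tuple[str, List[str]]]:
    """Return (version, declared dependencies) for an installed distribution,
    or None if no distribution with that name is installed."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # requires() returns None when the distribution declares no dependencies
    return version, list(metadata.requires(dist_name) or [])


info = describe_installed("End-To-End-Data-Science-Project")
if info is None:
    print("Project is not installed in this environment")
else:
    version, dependencies = info
    print(f"version {version}, depends on: {dependencies}")
```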

Key Components of setup.py:

  1. Project Metadata: This includes information like project name, version, description, author, and URL.
  2. Dependencies: This specifies the external libraries or packages required for your project to function properly.
  3. Build Instructions: Arguments such as packages=find_packages() tell setuptools which code to include when it builds the project’s distribution package.
  4. Installation Instructions: The install_requires argument lists the dependencies that pip installs on the target system together with the project.

Benefits of using setup.py:

  1. Standardized Packaging: It adheres to a standardized packaging format, ensuring compatibility with popular package managers.
  2. Easy Installation: It simplifies the installation process for users by automating dependency management.
  3. Reproducible Builds: It supports reproducible builds. When the versions in requirements.txt are pinned, every environment installs the same dependencies.
  4. Simplified Distribution: It streamlines the distribution of your project to others, making it easily accessible for reuse.
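
Whether installs are actually reproducible depends on how strictly the dependency versions are specified. As a minimal sketch, the check below flags requirement strings that lack an exact `==` pin. `find_unpinned` is a hypothetical helper, and the package versions shown are placeholders rather than the project's real requirements.

```python
from typing import List


def find_unpinned(requirements: List[str]) -> List[str]:
    """Return the requirement strings that do not pin an exact version with '=='."""
    return [req for req in requirements if "==" not in req]


# Hypothetical requirement strings; the versions are placeholders
requirements = ["pandas==2.1.3", "numpy", "scikit-learn>=1.3"]
unpinned = find_unpinned(requirements)

print(unpinned)  # ['numpy', 'scikit-learn>=1.3']
```

Entries reported here can resolve to different versions on different machines or at different times, so pinning them is one way to keep environments consistent.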

In summary, the setup.py file plays a vital role in managing and distributing your data science project, promoting code reusability and collaboration within the Python community.

Follow me for more! LinkedIn || GitHub.
