End To End Data Science Project — Part-2: Project Setup Using Setup.py

Nov 19, 2023

Part-2: Project Setup Using Setup.py

The setup.py file is an essential component of Python data science projects as it plays a crucial role in packaging and distributing your code. It provides the necessary instructions for building, installing, and managing your project's dependencies.

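To install the project, run the following from the directory that contains setup.py:

python setup.py install

Recent setuptools releases mark this command as deprecated; running pip install . from the same directory is the recommended equivalent.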

For Full Code Click Here: GitHub

Here’s a brief explanation of the setup.py code:

1. Import Statements:

from setuptools import find_packages, setup
from typing import List

Imports the functions needed for package setup (find_packages and setup) and the List type used for type hinting.

2. Function Definition:

def get_requirements(file_path: str) -> List[str]:
    '''
    This function returns the list of requirements
    '''
    with open(file_path) as file_obj:
        # Read the file line by line; each entry still ends with a newline
        requirements = file_obj.readlines()
    # Remove the trailing newline from every requirement
    requirements = [req.replace('\n', "") for req in requirements]
    return requirements

Defines a function `get_requirements` that takes a file path as input, reads the requirements from the file, and returns them as a list of strings.
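To see what the function returns, here is a small self-contained sketch. It writes a throwaway requirements file (the package names are only illustrative) and reads it back. Note that this is a variant, not the exact code above: it also skips blank lines, which is an assumption about what you would usually want.

```python
import os
import tempfile
from typing import List


def get_requirements(file_path: str) -> List[str]:
    """Return the non-empty lines of a requirements file, without newlines."""
    with open(file_path) as file_obj:
        lines = file_obj.readlines()
    # strip() removes the trailing newline and any surrounding spaces
    stripped = [line.strip() for line in lines]
    # Assumption: blank lines are not requirements, so drop them
    return [req for req in stripped if req]


# Write a throwaway requirements file (package names are illustrative only)
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "requirements.txt")
    with open(demo_path, "w") as handle:
        handle.write("pandas\nnumpy\n\nscikit-learn\n")
    demo_requirements = get_requirements(demo_path)

print(demo_requirements)
```

The returned list is what ends up in install_requires, so any stray empty string or newline in it would be handed to the installer as if it were a package name.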

3. Package Setup:

setup(
    name='End-To-End-Data-Science-Project',
    version='0.0.1',
    description='This is a project to demonstrate the end-to-end data science process using Python and Machine Learning libraries like Scikit Learn, Tensor',
    packages=find_packages(),
    install_requires=get_requirements('requirements.txt'),
)

Configures the package using `setup` from `setuptools`. The call sets the package name, version, description, the packages to include (discovered by find_packages()), and the dependencies listed in ‘requirements.txt’, which are read by the get_requirements function.
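It can help to picture which folders count as "packages to include". The sketch below is a simplified, standard-library approximation of that discovery step, not setuptools' actual implementation: it treats a directory as a package only if it contains an __init__.py file and does not look inside directories that are not packages. The helper name discover_packages and the folder names are hypothetical, chosen only for illustration.

```python
import os
import tempfile
from typing import List


def discover_packages(where: str) -> List[str]:
    """Hypothetical helper: list dotted names of directories holding __init__.py."""
    found = []
    for root, dirs, _files in os.walk(where):
        kept = []
        for name in sorted(dirs):
            full_path = os.path.join(root, name)
            if not os.path.isfile(os.path.join(full_path, "__init__.py")):
                continue  # not a package, so do not descend into it
            rel_path = os.path.relpath(full_path, where)
            found.append(rel_path.replace(os.sep, "."))
            kept.append(name)
        dirs[:] = kept  # prune the walk to package directories only
    return sorted(found)


with tempfile.TemporaryDirectory() as project_root:
    # Illustrative layout: src/ and src/components/ are packages, notebooks/ is not
    for folder in ("src", os.path.join("src", "components"), "notebooks"):
        os.makedirs(os.path.join(project_root, folder))
    for package_dir in ("src", os.path.join("src", "components")):
        open(os.path.join(project_root, package_dir, "__init__.py"), "w").close()
    discovered = discover_packages(project_root)

print(discovered)
```

The practical takeaway is that a source folder without an __init__.py file is likely to be left out of the installed package, so it is worth checking that every folder you expect to ship has one.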

In short, the code defines a function that reads package requirements from a file and then sets up a Python package with `setuptools`, specifying the package name, version, description, and dependencies. This is a common structure for Python projects, especially those involving data science and machine learning.

Purpose of setup.py:

Packaging: The setup.py file defines the structure and contents of your project's package, including its name, version, description, and dependencies.
Installation: It enables the installation of your project using package managers like pip, ensuring that all required dependencies are installed along with your code.
Distribution: It facilitates the distribution of your project to others by allowing them to easily install and use your code using pip or other package managers.

Key Components of setup.py:

Project Metadata: This includes information like project name, version, description, author, and URL.
Dependencies: This specifies the external libraries or packages required for your project to function properly.
Build Instructions: These instructions define the steps involved in building the project’s distribution package.
Installation Instructions: These instructions specify how to install the project and its dependencies on a target system.

Benefits of using setup.py:

Standardized Packaging: It adheres to a standardized packaging format, ensuring compatibility with popular package managers.
Easy Installation: It simplifies the installation process for users by automating dependency management.
Reproducible Builds: It supports reproducible builds, helping keep project setups consistent across environments, particularly when dependency versions are pinned in the requirements file.
Simplified Distribution: It streamlines the distribution of your project to others, making it easily accessible for reuse.
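On the reproducibility point, how consistent an install is depends a lot on whether the requirements carry exact versions. As a rough sketch, the hypothetical helper below flags requirement strings that have no exact "==" pin. The requirement strings and version numbers are illustrative only, and treating "==" as the only form of pinning is a simplifying assumption.

```python
from typing import List


def find_unpinned(requirements: List[str]) -> List[str]:
    """Hypothetical helper: return requirement strings without an exact '==' pin."""
    unpinned = []
    for requirement in requirements:
        # Assumption: only 'name==version' counts as pinned in this sketch
        if "==" not in requirement:
            unpinned.append(requirement)
    return unpinned


# Illustrative requirement strings; the versions are examples, not recommendations
example_requirements = ["pandas==2.1.3", "numpy", "scikit-learn>=1.3"]
print(find_unpinned(example_requirements))
```

A check like this could be run on the list that get_requirements returns, as a reminder that unpinned entries may resolve to different versions on different machines.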

In summary, the setup.py file plays a vital role in managing and distributing your data science project, promoting code reusability and collaboration within the Python community.


Written by Ayman Ejaz