End To End Data Science Project — Part-2: Project Setup Using Setup.py
The setup.py file is an essential component of Python data science projects because it plays a central role in packaging and distributing your code. It provides the instructions for building and installing your project and for managing its dependencies.
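To install the project, run the following from the project root:
python setup.py install
Note that recent versions of setuptools deprecate invoking setup.py directly; `pip install .` (or `pip install -e .` for an editable install during development) is the recommended equivalent.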
Here’s a brief explanation of the setup.py code:
1. Import Statements:
from setuptools import find_packages, setup
from typing import List
Imports `find_packages` and `setup` from setuptools for configuring the package, and `List` from the typing module for type hints.
2. Function Definition:
def get_requirements(file_path: str) -> List[str]:
    '''
    This function returns the list of requirements
    '''
    requirements = []
    with open(file_path) as file_obj:
        for line in file_obj:
            # Remove the trailing newline and surrounding whitespace
            req = line.strip()
            # Skip blank lines and comment lines
            if req and not req.startswith('#'):
                requirements.append(req)
    return requirements
Defines a function `get_requirements` that takes a file path as input, reads the requirements from the file, and returns them as a list of strings.
3. Package Setup:
setup(
    name='End-To-End-Data-Science-Project',
    version='0.0.1',
    description='This is a project to demonstrate the end-to-end data science process using Python and Machine Learning libraries like Scikit Learn, Tensor',
    packages=find_packages(),
    install_requires=get_requirements('requirements.txt')
)
Configures the package using `setup` from `setuptools`. It includes details such as the package name, version, description, the packages to include (discovered automatically by `find_packages()`), and the dependencies listed in the ‘requirements.txt’ file. The dependencies are read by the `get_requirements` function.
In short, the code defines a function that reads package requirements from a file and then sets up a Python package with `setuptools`, specifying the package name, version, description, packages, and dependencies. This is a common structure for Python projects, especially those involving data science and machine learning.
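To see exactly what `get_requirements` hands to `install_requires`, here is a minimal, self-contained sketch that writes a throwaway requirements file into a temporary directory and reads it back. The file contents are hypothetical, chosen only for illustration. The function body strips whitespace and skips blank lines and `#` comments, both of which pip-style requirements files commonly contain.

```python
import os
import tempfile
from typing import List


def get_requirements(file_path: str) -> List[str]:
    """Return the requirement specifiers listed in file_path."""
    requirements = []
    with open(file_path) as file_obj:
        for line in file_obj:
            # Remove the trailing newline and surrounding whitespace
            req = line.strip()
            # Skip blank lines and comment lines
            if req and not req.startswith("#"):
                requirements.append(req)
    return requirements


# Hypothetical requirements.txt contents, for illustration only
SAMPLE = "pandas\nnumpy\n\n# models\nscikit-learn\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "requirements.txt")
    with open(path, "w") as f:
        f.write(SAMPLE)
    reqs = get_requirements(path)

print(reqs)  # ['pandas', 'numpy', 'scikit-learn']
```

The resulting list is what `install_requires` receives, so pip would install pandas, numpy, and scikit-learn alongside the project.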
Purpose of setup.py:
- Packaging: The setup.py file defines the structure and contents of your project's package, including its name, version, description, and dependencies.
- Installation: It enables the installation of your project using package managers like pip, ensuring that all required dependencies are installed along with your code.
- Distribution: It facilitates the distribution of your project to others by allowing them to easily install and use your code using pip or other package managers.
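After installing the project with pip, one way to confirm that the distribution is registered is the standard-library `importlib.metadata` module (Python 3.8+). This is a minimal sketch: the helper name `installed_version` is hypothetical, and the distribution name passed to it is assumed to match the `name` given to `setup()`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of dist_name, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Distribution name taken from setup(name=...); None means it is not installed
print(installed_version("End-To-End-Data-Science-Project"))
```

Returning `None` instead of raising keeps the check usable in quick scripts or CI steps that only need a yes/no answer.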
Key Components of setup.py:
- Project Metadata: This includes information like project name, version, description, author, and URL.
- Dependencies: This specifies the external libraries or packages required for your project to function properly.
- Build Instructions: These define what goes into the project’s distribution package, such as which Python packages to include (here, found automatically by `find_packages()`).
- Installation Instructions: These instructions specify how to install the project and its dependencies on a target system.
Benefits of using setup.py:
- Standardized Packaging: It adheres to a standardized packaging format, ensuring compatibility with popular package managers.
- Easy Installation: It simplifies the installation process for users by automating dependency management.
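- Reproducible Builds: It helps keep project setups consistent across environments, especially when the versions in requirements.txt are pinned (for example, `numpy==1.26.4`), since every installation then gets the same versions of those packages.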
- Simplified Distribution: It streamlines the distribution of your project to others, making it easily accessible for reuse.
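Reproducibility depends largely on whether dependency versions are pinned. As a minimal sketch, the hypothetical helper below flags requirement entries that lack an exact `==` pin. It treats any specifier containing `==` as pinned and ignores environment markers after `;`. It is not a full PEP 508 parser, and the sample requirement list is illustrative only.

```python
from typing import List


def find_unpinned(requirements: List[str]) -> List[str]:
    """Return requirement strings that are not pinned to an exact version with '=='."""
    unpinned = []
    for req in requirements:
        # Drop any environment marker (text after ';') before checking the specifier
        spec = req.split(";", 1)[0].strip()
        if "==" not in spec:
            unpinned.append(req)
    return unpinned


# Hypothetical requirement list, e.g. as returned by get_requirements()
reqs = ["pandas==2.1.0", "numpy>=1.24", "scikit-learn"]
print(find_unpinned(reqs))  # ['numpy>=1.24', 'scikit-learn']
```

Running a check like this before sharing a project makes it easier to spot dependencies that could resolve to different versions on another machine.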
In summary, the setup.py file plays a vital role in managing and distributing your data science project, promoting code reusability and collaboration within the Python community.