Data Science Project Structure Simplified! 📖


I recently came across a way to create a project structure with very little effort, using a single Python file. Yes, you heard it right: a few lines of code, and you no longer have to create folders and files manually.

I would like to thank Mr. Krish Naik for sharing this information, which helps budding data scientists (including me ✌️) upgrade their skills through “subtle tweaks” in their project implementation.

Obsolete method (which I will never ever use after this, and you too should not):

Previously, to be honest, I used to create all the folders of a data science project manually, and then their subfolders as well. If you are a data science practitioner, you probably know that writing modular code and following industry standards calls for a consistent folder structure, which in turn supports the DRY (Don’t Repeat Yourself) principle.

New method (which I am going to adopt, and you too should):

As I mentioned at the start, we will create all of this using a single Python file. So, let’s take a look at how it works.

Step 1: Create your main data science project folder.

Step 2: Inside the folder, create a Python file (.py extension) with a name of your choice. I named mine “template.py”.

Step 3: Now write the code below. If you are short of time, just copy and paste it. (I would recommend typing it out, as that will help you understand it better.)

import os
from pathlib import Path
import logging

logging.basicConfig(level=logging.INFO, format='[%(asctime)s]: %(message)s:')

project_name = "<your-project-name>"

list_of_files = [
    ".github/workflows/.gitkeep",
    f"src/{project_name}/__init__.py",
    f"src/{project_name}/components/__init__.py",
    f"src/{project_name}/utils/__init__.py",
    f"src/{project_name}/utils/utils.py",
    f"src/{project_name}/logging/__init__.py",
    f"src/{project_name}/config/__init__.py",
    f"src/{project_name}/config/configuration.py",
    f"src/{project_name}/pipeline/__init__.py",
    f"src/{project_name}/entity/__init__.py",
    f"src/{project_name}/constants/__init__.py",
    "config/config.yaml",
    "params.yaml",
    "app.py",
    "main.py",
    "Dockerfile",
    "requirements.txt",
    "setup.py",
    "research/trials.ipynb",
]

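for filepath in list_of_files:
    # Convert the string into a path suited to the current operating system
    filepath = Path(filepath)
    filedir, filename = os.path.split(filepath)

    # Top-level files have no parent folder, so filedir is empty for them
    if filedir != "":
        os.makedirs(filedir, exist_ok=True)
        logging.info(f"Created directory: {filedir} for the file {filename}")

    # Create the file only if it is missing or empty, so existing content is kept
    if (not os.path.exists(filepath)) or (os.path.getsize(filepath) == 0):
        with open(filepath, 'w') as f:
            pass
        logging.info(f"Created empty file: {filepath}")
    else:
        logging.info(f"{filename} already exists")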

I am not going to explain the code line by line here, because at the end of this blog I have attached a YouTube video link that you can watch to understand it thoroughly.

Regardless, let’s get a high-level understanding of this code:

  • The script uses Python’s os module together with the pathlib and logging modules. A Python list holds every file to create, written in “path” format, so that pathlib can convert each entry into a path suited to the operating system.
  • The loop then goes through each item in the list, creates the parent folder if the entry names one, and creates an empty file unless a non-empty file already exists at that path.
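To make the two points above concrete, here is a minimal sketch of how a single list entry is handled. The helper name `ensure_empty_file` is hypothetical (it is not part of the original script), and a temporary directory is used as an assumption so that the example does not touch your real project folder.

```python
import os
import tempfile
from pathlib import Path


def ensure_empty_file(filepath: Path) -> bool:
    """Hypothetical helper: create parent folders and an empty file.

    Returns True if the file was created (or was empty), False if a
    non-empty file already existed and was left alone.
    """
    filedir, _filename = os.path.split(filepath)
    if filedir != "":
        os.makedirs(filedir, exist_ok=True)
    if (not os.path.exists(filepath)) or (os.path.getsize(filepath) == 0):
        with open(filepath, "w"):
            pass
        return True
    return False


# os.path.split separates the folder part from the file name.
split_nested = os.path.split(Path("config/config.yaml"))
split_top_level = os.path.split(Path("params.yaml"))  # folder part is ""

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "config" / "config.yaml"
    first_run = ensure_empty_file(target)   # file is missing, so it is created
    target.write_text("key: value\n")       # simulate real content being added
    second_run = ensure_empty_file(target)  # non-empty file is left untouched
    preserved = target.read_text()
```

The practical consequence is that re-running the script later should be safe: files that already contain content are skipped rather than overwritten.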

Important note: The “list_of_files” list contains file and folder names chosen to match the project’s requirements. Some of them are common to almost all projects, and some may go unused, so adjust the list to fit your project.
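One way to adapt the list per project is to build it from a small function. This is only a sketch: `build_file_list`, its `extra_files` parameter, and the `demo_project` name are hypothetical and are not part of the original script.

```python
def build_file_list(project_name: str, extra_files: tuple = ()) -> list:
    """Hypothetical helper: common entries plus project-specific extras."""
    package_root = f"src/{project_name}"
    common = [
        f"{package_root}/__init__.py",
        f"{package_root}/components/__init__.py",
        "requirements.txt",
        "setup.py",
    ]
    # dict.fromkeys drops duplicate entries while keeping their order
    return list(dict.fromkeys(common + list(extra_files)))


# "setup.py" is already in the common entries, so it appears only once.
files = build_file_list("demo_project", extra_files=("Dockerfile", "setup.py"))
```

The resulting list can then be looped over exactly like “list_of_files” in the script.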

After running the code:

Ta-da! You have created the much-awaited folders and files.
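To confirm what was created, one option is to list every file under the project folder. The sketch below is an assumption-laden illustration: `list_tree` is a hypothetical helper, and it runs against a small temporary folder with three sample entries rather than your actual project. To check a real project, you could pass `Path(".")` from the project root instead.

```python
import tempfile
from pathlib import Path


def list_tree(root: Path) -> list:
    """Hypothetical helper: sorted relative paths of all files under root."""
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Recreate a few entries from the list, purely for demonstration
    for rel in ("src/demo_project/__init__.py", "config/config.yaml", "params.yaml"):
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    created = list_tree(root)

for path in created:
    print(path)
```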

That’s it!

I hope you will adopt this method and make your data science journey smoother and more efficient!

This method is taken from the video linked below. Feel free to watch it and learn the industry-standard, end-to-end data science project implementation.

Link to video :

That’s all from my side. Keep exploring and learning.


Yash Wasalwar
π€πˆ 𝐦𝐨𝐧𝐀𝐬.𝐒𝐨

Ex-Research Intern @DRDO Β· Always learning Β· Loves to talk about Data Science and Life Experiences