Folder Structure for Machine Learning Projects
Simple steps to generate a project folder structure automatically!
A well-organized Machine Learning project structure makes a project easy to understand and change. Moreover, the same structure can be reused across multiple projects, which avoids confusion when you move between them. In this post, we will use the Cookiecutter package with the Cookiecutter Data Science template to create a Machine Learning project structure.
Step 1: Make sure that you have a recent version of Python and pip installed in your environment.
Step 2: Install cookiecutter
pip install cookiecutter
Step 3: Create an empty repository on GitHub (e.g., my-test)
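Note: Don’t check any options under ‘Initialize this repository with:’ while creating the repository. The repository must stay empty; otherwise GitHub creates files (README, .gitignore, or license) that your first push from the local project would conflict with.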
Step 4: Create a project structure
Go to the folder on your local system where you want to set up the project and run the following (the -c v1 option, short for --checkout, selects the v1 branch of the template):
cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science
If you run the above command more than once (for example, while practicing), Cookiecutter asks whether it may replace the copy of the template it downloaded earlier. Answer yes:
You've downloaded \.cookiecutters\cookiecutter-data-science before. Is it okay to delete and re-download it? [yes]:yes
It will then ask for the following values (the answers used in this example follow each colon):
project_name [project_name]: my-test
repo_name [my-test]: my-test
author_name [Your name (or your organization/company/team)]: Your name
description [A short description of the project.]: This is a test proj
Select open_source_license:
1 - MIT
2 - BSD-3-Clause
3 - No license file
Choose from 1, 2, 3 [1]: 1
s3_bucket [[OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')]:
aws_profile [default]:
Select python_interpreter:
1 - python3
2 - python
Choose from 1, 2 [1]: 1
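Note: You can leave the ‘s3_bucket’ and ‘aws_profile’ options at their defaults by pressing Enter; they are only needed if you want to sync your data with an Amazon S3 bucket.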
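The same answers can also be passed on the command line, which is handy when you regenerate the project repeatedly. This sketch uses Cookiecutter's --no-input flag plus key=value pairs for the template variables shown in the prompts above. Variables you leave out (open_source_license, s3_bucket, aws_profile, python_interpreter) get the same defaults you would get by pressing Enter, which for the license and interpreter are MIT and python3.

```shell
# Generate the project without any prompts; each key=value pair overrides
# the template's default for one of the variables from the prompts above
cookiecutter --no-input -c v1 https://github.com/drivendata/cookiecutter-data-science \
    project_name="my-test" \
    repo_name="my-test" \
    author_name="Your name" \
    description="This is a test proj"
```

With --no-input, Cookiecutter also replaces a previously downloaded copy of the template without asking. If a my-test folder already exists, add -f (--overwrite-if-exists) to generate into it anyway.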
Step 5: Add the project to the Git repository
cd my-test
# Initialize Git
git init
# Add all the files and folders
git add .
# Commit the files
git commit -m "Initialized the repo with cookiecutter data science structure"
# Set the remote repo URL (replace your_user_id with your GitHub username)
git remote add origin https://github.com/your_user_id/my-test.git
# Check the configured remote
git remote -v
# Name the branch main and push the changes from the local repo to GitHub
git branch -M main
git push -u origin main
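The final structure follows the Cookiecutter Data Science layout, with top-level folders such as data, docs, models, notebooks, references, reports, and src.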
Note: The data folder won’t appear on GitHub; it exists only in your local folder. It is not pushed because the template lists it in the ignore list (the .gitignore file). If you want to check it in as well, comment out the /data/ line in .gitignore as shown below, and then add the data folder to the repository:
# /data/