Published in Geek Culture

How to Build a Strong Data Science Portfolio as a Beginner

Learn a unique way to build a data science portfolio that gets you hired.

Photo by Hal Gatewood on Unsplash

As a beginner, I had many questions: How do I start? How do I learn? Where do I get ideas for projects? After a long search, I found a data analysis project. It took me three days just to write the code, and I was happy with my first try, but then came the big question: how do I share it with the world? I did not have the coding or documentation skills to showcase my work, so I stored it in the cloud and forgot about it. A month later, I was browsing GitHub for more projects and found an amazing profile that motivated me to create my own portfolio. That was the best decision I made: it put me on the map in the developer community, and soon I started getting emails from recruiters and beginners about my projects.

Platinum Award — KDnuggets

Getting a job is usually the main reason for building a portfolio. Sometimes it’s necessary if we don’t have the relevant education or experience (eugeneyan.com). Employers are often skeptical about hiring new graduates, so how do you convince them that you are the best fit for the job? You display your skills by showing the work you have done in previous projects. The stronger your online portfolio, the higher your chance of getting hired for your dream job.

“The portfolios are extremely critical to have because when you’re in the interview it shows your real-world experience, so you can explain to an employer from A to Z the entire data science workflow.” — David Yakobovitch.

The other motivation is to create personal projects that satisfy your curiosity about learning new things. When we learn a new skill, we want to experiment and eventually build a working product that can be used in the real world.

In this article, we will look at the ways you can showcase your work as a data science beginner. You will learn about some newer platforms that make your life easier, along with tips for building a strong portfolio.

GitHub

Let me clear up a misconception among data scientists: yes, GitHub is necessary, and we should all learn Git. As a data scientist, I use GitHub daily to look for interesting datasets and projects. It is the most popular platform among developers and, to be honest, recruiters do check your GitHub profile before calling you for an interview.

Getting started with Julia Machine Learning Library with FastAI.jl
Image from GitHub

GitHub is a global platform where people share and collaborate on projects. As you can see in my profile below, I have contributed to other people’s projects and also worked on my own.

Abid’s GitHub portfolio
Image by Author | kingabzpro

Tips to create a solid profile:

  1. Create your profile page; for a complete tutorial, check Sarah Hart’s blog.
  2. Document every project with links, cover images, and detailed descriptions.
  3. Fork the project that you like the most and send your first pull request (freecodecamp.org).
  4. Be active on the platform by contributing, reporting bugs, and pushing your current projects.
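
To make tip 2 concrete, here is a minimal sketch of a helper that generates a README skeleton with a title, cover image, description, and links. The `Project` class, its field names, and the sample paths are hypothetical and only for illustration; adapt the layout to whatever your project needs.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Minimal description of one portfolio project (all fields are illustrative)."""
    title: str
    description: str
    cover_image: str = ""                      # URL or relative path to a cover image
    links: dict = field(default_factory=dict)  # label -> URL


def render_readme(project: Project) -> str:
    """Return Markdown text with a title, optional cover image, description, and links."""
    lines = [f"# {project.title}", ""]
    if project.cover_image:
        lines += [f"![{project.title} cover]({project.cover_image})", ""]
    lines += [project.description, ""]
    if project.links:
        lines.append("## Links")
        lines += [f"- [{label}]({url})" for label, url in project.links.items()]
        lines.append("")
    return "\n".join(lines)


# Placeholder project used only to show the output format.
demo = Project(
    title="My First Data Analysis",
    description="Exploratory analysis with charts and a summary of key findings.",
    cover_image="images/cover.png",
    links={"Notebook": "https://example.com/notebook"},
)
print(render_readme(demo))
```

Saving the returned text as README.md in the repository root gives every project the same basic structure, which you can then extend by hand.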

Deepnote

Deepnote is much simpler than GitHub, and it’s beginner-friendly too. If you are familiar with Jupyter notebooks, publishing your first project will be a piece of cake. My experience with Deepnote has been great: the platform offers many of the sharing features of GitHub but is much simpler and focused on the data science community.

Deepnote notebook
Image by Author | Pakistan Vaccination Progress

Recently, they introduced the Deepnote profile, which showcases all the notebooks you publish along with your information and profile picture.

Abid’s Deepnote profile
Image by Author | Abid Ali Awan

Just like a GitHub Gist, you can share a snippet of your code with your team or the public. I have used Deepnote cells in all my Medium publications and on social media platforms. You can check my previous article to understand how to embed a Deepnote cell. Sharing code snippets together with their output lets you present your projects on multiple platforms.

The reason I prefer an embedded Deepnote cell over a GitHub Gist is that it comes with output, and not just static output but interactive output.

For example, you can use Plotly and display an interactive chart in a Medium article.
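
As a rough illustration of what such a cell might contain, the sketch below builds a bar chart as a plain figure spec and hands it to Plotly. The helper names `build_bar_spec` and `show` are my own, not part of Plotly or Deepnote, and the chart data you pass in is up to you. Plotly is imported lazily, so only `show` needs the `plotly` package installed.

```python
def build_bar_spec(labels, values, title):
    """Return a Plotly-style figure spec: a dict with 'data' and 'layout' keys."""
    if len(labels) != len(values):
        raise ValueError("labels and values must have the same length")
    return {
        "data": [{"type": "bar", "x": list(labels), "y": list(values)}],
        "layout": {"title": {"text": title}},
    }


def show(spec):
    """Render the spec as an interactive chart (requires `pip install plotly`)."""
    import plotly.graph_objects as go  # lazy import keeps build_bar_spec dependency-free

    fig = go.Figure(data=spec["data"], layout=spec["layout"])
    fig.show()
```

In a notebook cell, calling `show(build_bar_spec(["A", "B"], [3, 5], "Sample chart"))` should render a chart you can hover over and zoom; how it looks once embedded depends on the platform hosting the cell.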

Tips to create a solid profile:

  1. Update your bio, profile photo, and contact information.
  2. Always add detailed descriptions of your project using Markdown cells.
  3. Use the cover photo to make your project stand out.
  4. Use the App feature in Deepnote to create an interactive web app.
  5. Keep posting your old projects, or even repost notebooks from GitHub.

DagsHub

DagsHub is a newer platform, and it is quickly making a name for itself by providing a one-stop solution for machine learning practitioners and data engineers. DagsHub comes with a DVC server, MLflow, pipeline visualization, and GitHub synchronization. We won’t go deep into every feature; instead, we will focus on the ones that make it stand out.

DagsHub allows you to connect your GitHub repository and create a data science project with the ability to visualize machine learning and data pipelines. It also has a lesser-known feature: you can use README.ipynb as your project description file, which is great for beginners who are not used to Markdown and for data scientists who love working in Jupyter Notebook. It works much like GitHub, which means you need to learn both Git and DVC to use the platform properly.

What I’ve seen other users enjoy is the ability to visualize their project structure, via the pipeline, as well as the ability to see their data and models as an integral part of the project. Also, the fact that we are based on open-source tools instead of reinventing existing solutions is something people like. — Dean

DagsHub pipelines
Image by Dean | dagshub

My profile is quite new, but I am loving this platform because it gives me a complete machine learning ecosystem. I think I prefer it to GitHub in terms of features and UI simplicity.

Abid Ali Awan’s DagsHub profile
Image by Author | Abid Ali Awan

Tips to create a solid profile:

  1. Learn DVC, Git, and MLflow to take full advantage of the platform.
  2. Add a project description to your notebook and README.
  3. Update your profile by adding a bio, avatar, and contact information.
  4. Try to add dvc.yaml and dvc.lock to your project to display data pipelines; for more information, check Defining the Pipeline.
  5. Keep an active profile by contributing to open-source projects and by pushing your personal projects. You can use the fds CLI to make your life easier and avoid mistakes.
  6. Make full use of DVC by uploading your data and models to a remote server. Recruiters are interested in candidates who know the complete data science cycle, from data ingestion to dashboards.
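
To give a feel for what the dvc.yaml in tip 4 looks like, here is a small sketch that renders a two-stage pipeline as text. The stage names, scripts, and file paths are placeholders I made up, and `render_dvc_yaml` is a hypothetical helper, not part of DVC. In practice you would usually write the file by hand or let `dvc stage add` create the entries, and `dvc repro` then produces dvc.lock.

```python
def render_dvc_yaml(stages):
    """Render {stage: {"cmd": str, "deps": [...], "outs": [...]}} as dvc.yaml text.

    This is a naive renderer for simple paths and commands; it does not quote
    or escape special YAML characters.
    """
    lines = ["stages:"]
    for name, stage in stages.items():
        lines.append(f"  {name}:")
        lines.append(f"    cmd: {stage['cmd']}")
        for key in ("deps", "outs"):
            paths = stage.get(key, [])
            if paths:
                lines.append(f"    {key}:")
                lines += [f"      - {path}" for path in paths]
    return "\n".join(lines) + "\n"


# Hypothetical pipeline: every script and file name below is a placeholder.
PIPELINE = {
    "prepare": {
        "cmd": "python src/prepare.py",
        "deps": ["data/raw.csv", "src/prepare.py"],
        "outs": ["data/prepared.csv"],
    },
    "train": {
        "cmd": "python src/train.py",
        "deps": ["data/prepared.csv", "src/train.py"],
        "outs": ["models/model.pkl"],
    },
}

print(render_dvc_yaml(PIPELINE))
```

Because the train stage lists the output of the prepare stage as a dependency, a pipeline view can draw the two stages as a connected graph.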

Kaggle

If you want to get noticed faster in the world of data science, you should create a Kaggle account and start contributing to competitions, datasets, notebooks, and discussions. When you become a Grandmaster, people respect you and offer you better career opportunities. If you ask me, I suggest creating a Kaggle profile while you are still learning the basics. Learn from experts and discover your niche. I am a huge fan of this platform because it helps beginners compete and develop innovative solutions for various industries. In my view, it is one of the backbones of the applied machine learning community.

Kaggle notebook
Image by Author | Kaggle

You can check out my profile below; from the start, I have been contributing in various categories to gain ranks. Currently, I am an Expert, but with one gold and silver medal in competitions I will become a Master, which is not easy. Honestly, I respect Grandmasters because they have proven that they are among the best data practitioners.

Abid’s Kaggle Profile
Image by Author | Kaggle

Tips to create a solid profile:

  1. Be active on the platform by using new datasets and creating data analysis or machine learning models.
  2. Participate in discussions, learn from experts, and ask for help.
  3. Use web scraping to publish a new dataset.
  4. Participate in as many competitions as you can to learn several types of machine learning problems and to earn badges.
  5. Focus on publishing your best work with detailed descriptions and high-quality code.
  6. Write about yourself in your bio and add contact details.
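
For tip 3, the sketch below shows one simple way to turn an HTML table into CSV text using only the Python standard library. The sample HTML and its values are made up for illustration, and `TableParser` and `html_table_to_csv` are hypothetical names. For a real page you would first download the HTML (for example with `urllib.request`) and check that the site allows scraping.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html: str) -> str:
    """Parse the table rows found in `html` and return them as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up sample table, used only to demonstrate the parser.
SAMPLE_HTML = """
<table>
  <tr><th>item</th><th>count</th></tr>
  <tr><td>alpha</td><td>12</td></tr>
  <tr><td>beta</td><td>34</td></tr>
</table>
"""

print(html_table_to_csv(SAMPLE_HTML))
```

The resulting CSV file, together with a clear description of where the data came from, is the core of a dataset you can publish.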

Blog

Writing blogs is the next step after creating your projects on the platforms above. If you want to expand your audience, I highly suggest starting with Medium. Writing a blog is not necessary, but it brings you traction from people in various fields. Medium lets you create a profile and publish your articles under various publications such as Towards Data Science and Towards AI. You can also develop your own blogging site or use a similar platform such as Analytics Vidhya.

Abid Ali Awan Medium Blog
Image by Author | Medium

Tips to create a solid profile:

  1. Write blogs about projects you personally worked on.
  2. Write blogs on emerging technology or new data science applications.
  3. Do proper research while writing blogs and add citations to avoid violating platform rules.
  4. Use attractive cover photos for every blog.
  5. Always write about what you learned from your experience developing a data science project.
  6. Don’t just follow the trend; focus on the things you are good at.

Portfolio Website

You can also display your projects on a personal website, and if you are not a web developer, there are simple tools that make the process quite easy. You can check out How to Build a Data Science Portfolio Website with Hugo & GitHub Pages, and Hugo for various templates.

My portfolio website has projects from all the platforms, with short descriptions and subcategories. It took me three days to create the entire website and deploy it on GitHub Pages.

Abid’s Portfolio
Image by Author | Abid’s Portfolio

Tips to create a solid portfolio website:

  1. Add your skills, bio, and CV.
  2. Display your experience and awards.
  3. Showcase your projects with links to your GitHub or Deepnote projects.
  4. Keep your website minimal and interactive so that recruiters have an easy time scrolling through your entire portfolio.
  5. Keep your portfolio website up to date with the latest projects you are working on.

Weights & Biases

I usually use Weights & Biases for machine learning experimentation and for logging the performance metrics of my models, but that changed with the introduction of the W&B profile. You can write a blog about your current project using embedded links and graph integration. It is quite similar to the other portfolio platforms I mentioned, but it comes with the perk of direct integration with Python libraries.

Ayush’s profile has impressed me the most, as he has been contributing to other organizations while writing blogs about machine learning.

Ayush’s wandb profile
Image by Ayush | Weights & Biases

The W&B project has model performance metrics as shown below.

Image by Author | kaggle-seti

Tips to create a solid profile:

  1. Join other data science organizations and participate in group projects.
  2. Use the W&B API to display your machine learning project results.
  3. Write blogs using the W&B metrics integration.
  4. Add a bio, profile picture, and contact information.
  5. Try to engage in community discussion and always look for a new interesting project.
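
As a minimal sketch of tip 2, the code below logs per-epoch metrics to a W&B run so they appear as charts in your project. The metric values come from a placeholder function rather than a real model, and the project name "portfolio-demo" is an assumption. `log_run` needs the `wandb` package installed and an authenticated account.

```python
def fake_training_metrics(epochs):
    """Produce placeholder metrics; replace with values from a real training loop."""
    return [
        {"epoch": epoch, "loss": round(1.0 / epoch, 4), "accuracy": round(1.0 - 0.5 / epoch, 4)}
        for epoch in range(1, epochs + 1)
    ]


def log_run(metrics, project="portfolio-demo"):
    """Send each metrics dict to a W&B run (requires `pip install wandb` and `wandb login`)."""
    import wandb  # lazy import so the helper above works without wandb installed

    run = wandb.init(project=project)
    for row in metrics:
        wandb.log(row)  # each call adds one step to the run's charts
    run.finish()
```

Calling `log_run(fake_training_metrics(5))` creates a run with loss and accuracy curves, which you can then reference from a W&B report.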

Conclusion

W&B is a wildcard: it is known for logging experiments rather than for portfolios, but the introduction of interactive blogs gives you a unique way to display your projects and build a strong portfolio.
If you are a beginner, I suggest starting with Deepnote, as it’s free for teams and gives you beginner-friendly tools to get started. If you want to get noticed by the data science community, try creating a profile on GitHub and Kaggle. If you are into building your brand, start with blogging sites or create your own website.
In the end, I want you all to create a profile on every platform I mentioned above, as each comes with unique advantages for impressing a potential employer. I know it’s quite overwhelming at the start, but once you get used to documenting and showcasing your projects, it gets easier.

Top Blogs Rewards for October 2021 — KDnuggets

The original blog is posted on KDnuggets.
