A Real Dataset for Building a Data Science Project as a Beginner Data Scientist

This fintech dataset can help you build a real ML model for loan default prediction as a beginner data scientist.

Maria Gusarova
4 min read · Jul 19, 2022

This article is part of a series where we walk step by step through solving fintech problems with different Machine Learning techniques using the “All Lending Club loan data” dataset. Here you can find the complete end-to-end data science project for beginners to learn data science.

Kaggle hosts over 50,000 public datasets and 400,000 public notebooks, so it is easy to get lost in the details. A good way to narrow the choice is to pick a dataset from the industry or area where you would like to work with data.

For example, it could be Health (World Health Statistics 2020 dataset, Mental Health in Tech Survey dataset, Heart Failure Prediction dataset), Finance (Stock Exchange Data dataset, Credit Card Fraud Detection dataset), Fashion/Shopping (H&M Personalized Fashion Recommendations dataset, Groceries dataset), Travel (Trip Advisor Hotel Reviews dataset), and so on. I encourage you to explore Kaggle datasets that match your interests.

If your interest is fintech, one dataset you should get your hands on is All Lending Club loan data.

This dataset can be used to tackle lending risk with data science. The data is based entirely on real applicants, so if you are a beginner data scientist, working with it will help you gain real knowledge and experience.

You can use the Lending Club dataset and solve real lending risk problems for a real company.

Lending Club is a peer-to-peer (P2P) lending community founded in 2007. So far, it has issued more than $50 billion in loans and connected more than three million borrowers with investors. Borrowers typically request up to $40,000, while investors (lenders) provide these funds for returns of up to 8%.

This is what the P2P lending process looks like:

Drawing by the author using www.flaticon.com

Borrowers are assessed based on credit history, credit score, and other details. The higher the credit score, the lower the debt-to-income (DTI) ratio, and the longer the credit history, the lower the interest rate the applicant gets (and the lower the risk to investors).

The problem you can solve

Before a loan is granted, a credit analysis is performed. It estimates the probability that the applicant pays the loan back on time, as opposed to defaulting and failing to pay it back. There are two basic risks:

  1. The business loss of potential revenue that results from rejecting a good candidate, or from rejecting too many candidates;
  2. The financial loss that results from approving a candidate who ends up not paying the loan back.
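One way to frame this credit analysis as an ML problem is binary classification: each finished loan is labeled as repaid or defaulted, and the model learns to predict that label. Here is a minimal sketch of the labeling step. It assumes the dataset has a loan status field with values such as "Fully Paid" and "Charged Off"; those status strings and the helper name `to_default_label` are illustrative assumptions, so check them against the actual data before relying on them.

```python
from typing import Optional

# Assumed status strings; verify them against the values in your copy of the data.
REPAID_STATUSES = {"Fully Paid"}
DEFAULT_STATUSES = {"Charged Off", "Default"}


def to_default_label(loan_status: str) -> Optional[int]:
    """Map a loan status to 1 (defaulted), 0 (repaid), or None (outcome unknown)."""
    status = loan_status.strip()
    if status in DEFAULT_STATUSES:
        return 1
    if status in REPAID_STATUSES:
        return 0
    # Loans still in progress (e.g. "Current") have no final outcome yet.
    return None


statuses = ["Fully Paid", "Charged Off", "Current", "Default"]
labels = [to_default_label(s) for s in statuses]

# Keep only loans with a known outcome for training.
training_labels = [label for label in labels if label is not None]
```

Loans without a final outcome are returned as `None` instead of being forced into one of the two classes, since treating an unfinished loan as "repaid" would bias the model toward approving.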

These challenges grow as the number of applications reviewed by loan officers increases. Manual approval requires extensive hours of effort to review each application, yet the company will always seek to optimize costs and improve productivity. This pressure sometimes leads to human error and bias, because it is not practical to weigh all the factors involved across a large number of applicants.

ML as the solution

With ML, you will be able to tackle these challenges effectively. First, you can semi-automate the massive approval process, so that loan officers can focus on the most important part of the application. Second, you can optimize the trade-off between revenue and default loss to yield maximum benefit to the business.
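To make the trade-off between revenue and default loss concrete, here is a minimal sketch of choosing an approval threshold on a model's predicted default probability. It is a simplification under stated assumptions, not a production method: every approved loan that is repaid earns a flat 8% of its amount, every approved loan that defaults loses its full amount, and rejected loans earn nothing. The function names, the toy numbers, and the candidate thresholds are all hypothetical.

```python
from typing import Sequence


def portfolio_profit(
    default_probs: Sequence[float],
    defaulted: Sequence[bool],
    amounts: Sequence[float],
    threshold: float,
    interest_margin: float = 0.08,  # assumed flat return on a repaid loan
) -> float:
    """Profit if we approve every loan whose predicted default probability is below threshold."""
    profit = 0.0
    for prob, is_default, amount in zip(default_probs, defaulted, amounts):
        if prob >= threshold:
            continue  # rejected: no revenue and no loss
        if is_default:
            profit -= amount  # assumed: the full amount is lost on default
        else:
            profit += amount * interest_margin
    return profit


def best_threshold(
    default_probs: Sequence[float],
    defaulted: Sequence[bool],
    amounts: Sequence[float],
    thresholds: Sequence[float],
) -> float:
    """Return the candidate threshold with the highest profit on past loans."""
    return max(
        thresholds,
        key=lambda t: portfolio_profit(default_probs, defaulted, amounts, t),
    )


# Toy data: predicted default probabilities and what actually happened.
probs = [0.05, 0.10, 0.30, 0.60, 0.90]
outcomes = [False, False, False, True, True]
loan_amounts = [10_000.0] * 5

chosen = best_threshold(probs, outcomes, loan_amounts, [0.2, 0.5, 0.8, 1.01])
```

On this toy data, a strict threshold of 0.2 rejects a good borrower and gives up revenue, while a loose threshold of 0.8 approves a loan that defaults. The threshold in between avoids both, which is the kind of balance the model is meant to find.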

Want to learn more? Here is the complete end-to-end data science project for beginners to learn data science. By completing this project: 1) you will experience the entire data science cycle yourself, 2) you will develop a project that you can use to prove your experience, and 3) you will answer the most popular interview questions in case you decide to pursue a career as a data scientist.

What do you struggle with in your early journey? Please share it with me here, and I am happy to help! I listen to your stories carefully and want to produce content that helps you in this journey. For more content like this, sign up for my newsletter.
