Udacity “communicate data findings” — Prosper Loan Exploratory Data Analysis (EDA)

Olamide Emida
Sep 28, 2023

I completed the Udacity “communicate data findings” project from the Data Analysis Nanodegree course. I chose the Prosper loan dataset that you can find here, and the task was to perform an Exploratory Data Analysis using Python and to create a presentation with explanatory plots that convey my findings.

I summarized the presentation in this blog post and included visuals to aid comprehension. I looked at the loan variables that could influence the borrower rate, focusing on three: loan amount, monthly income and rating grade. I introduced each variable, showed the purpose of the loans, and plotted the relationship between borrower rate and each of loan amount, monthly income and rating grade.

About Prosper

Prosper Loan

Prosper is the first peer-to-peer lending marketplace in the United States. It has facilitated over $23 billion in loans to more than 1.4 million people. Prosper allows individuals to invest in each other. Borrowers can easily apply for a fixed-rate, fixed-term loan online between $2,000 and $50,000. Individuals or institutions can invest in these loans and earn appealing returns. Prosper manages all loan servicing on behalf of the matched borrowers and investors.

Dataset Overview

The data contains 113,937 loans from 2006 to 2014, with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others.

Preliminary Wrangling

I loaded the data and assessed it for quality issues, both visually and programmatically. I then identified and performed the following cleaning steps to make the data ready for exploration:

  1. Selected variables of interest
  2. Changed data types for the listing creation date column
  3. Filled null values in credit grade and prosper rating columns
  4. Renamed the listing category column and created a new average credit score column
  5. Deleted extraneous columns
  6. Updated the numeric values in the listing category column and the employment status column
  7. Removed null values
  8. Removed duplicate rows
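
The steps above can be sketched in pandas roughly as follows. This is a minimal illustration, not the exact code from the project: the column names follow the public Prosper CSV but are assumptions here, `RatingGrade` and `AvgCreditScore` are hypothetical names for the combined rating and the new credit score column, the category label map is partial, and the employment status update in step 6 is left out. Step 3 is interpreted as merging the two rating columns into one.

```python
import pandas as pd

# Assumed column names (the post does not list the 15 variables it kept).
COLUMNS = [
    "ListingCreationDate", "CreditGrade", "ProsperRating (Alpha)",
    "ListingCategory (numeric)", "CreditScoreRangeLower",
    "CreditScoreRangeUpper", "BorrowerRate", "LoanOriginalAmount",
    "StatedMonthlyIncome",
]

# Partial, assumed mapping of listing-category codes to readable labels.
CATEGORY_LABELS = {1: "Debt Consolidation", 2: "Home Improvement", 3: "Business"}


def clean_loans(raw: pd.DataFrame) -> pd.DataFrame:
    df = raw[COLUMNS].copy()                                      # step 1
    df["ListingCreationDate"] = pd.to_datetime(df["ListingCreationDate"])  # step 2
    # Step 3 (one interpretation): use the Prosper rating where present,
    # otherwise fall back to the older credit grade.
    df["RatingGrade"] = df["ProsperRating (Alpha)"].fillna(df["CreditGrade"])
    # Step 4: rename the listing category and add an average credit score.
    df = df.rename(columns={"ListingCategory (numeric)": "ListingCategory"})
    df["AvgCreditScore"] = (
        df["CreditScoreRangeLower"] + df["CreditScoreRangeUpper"]
    ) / 2
    # Step 5: drop columns that are no longer needed.
    df = df.drop(columns=["CreditGrade", "ProsperRating (Alpha)",
                          "CreditScoreRangeLower", "CreditScoreRangeUpper"])
    # Step 6: replace numeric codes with labels; unmapped codes become "Other".
    df["ListingCategory"] = df["ListingCategory"].map(CATEGORY_LABELS).fillna("Other")
    # Steps 7 and 8: remove rows with nulls, then exact duplicates.
    return df.dropna().drop_duplicates().reset_index(drop=True)


# Toy rows invented for illustration (not taken from the dataset).
raw = pd.DataFrame({
    "ListingCreationDate": ["2013-05-01 10:00:00", "2013-05-01 10:00:00",
                            "2008-02-11 08:30:00", "2012-07-19 14:45:00"],
    "CreditGrade": [None, None, "C", None],
    "ProsperRating (Alpha)": ["A", "A", None, None],
    "ListingCategory (numeric)": [1, 1, 0, 2],
    "CreditScoreRangeLower": [700, 700, 640, 680],
    "CreditScoreRangeUpper": [719, 719, 659, 699],
    "BorrowerRate": [0.12, 0.12, 0.21, 0.15],
    "LoanOriginalAmount": [10000, 10000, 4000, 7500],
    "StatedMonthlyIncome": [5000.0, 5000.0, 3200.0, 4100.0],
})
cleaned = clean_loans(raw)
print(cleaned.shape)
```

In the toy data, the second row duplicates the first and the last row has no rating in either column, so both are removed by the final two steps.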

Here is the information about the final cleaned data:

Summary of the cleaned data, showing the column names and the shape of the dataset

The 81 variables of the original dataset have been reduced to 15 variables of interest.

Exploratory Data Analysis

Univariate Exploration

Distribution of Borrower Rate

The borrower rate ranges from about 5% to 35%, and the most frequent rate is around 14%. Prosper's borrower rates appear to be on the high side. However, this is not necessarily a cause for concern, as the rate is influenced by factors such as credit score, loan term, and loan amount.

Distribution of borrower rate
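
A histogram like this one could be drawn along the following lines. It is a sketch under assumptions: it uses matplotlib, takes rates stored as fractions (for example 0.14 for 14%), and uses one bin per percentage point so the most frequent rate is easy to read off. `plot_rate_distribution` is a hypothetical helper name, and the rates are toy values.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_rate_distribution(rates):
    """Histogram of borrower rates with one bin per percentage point."""
    rates_pct = np.asarray(rates, dtype=float) * 100
    # Integer bin edges 0, 1, 2, ... up to the first whole percent >= the max.
    edges = np.arange(0, math.ceil(rates_pct.max()) + 1)
    fig, ax = plt.subplots(figsize=(8, 5))
    counts, edges, _ = ax.hist(rates_pct, bins=edges)
    ax.set_xlabel("Borrower rate (%)")
    ax.set_ylabel("Number of loans")
    ax.set_title("Distribution of borrower rate")
    return fig, counts, edges


# Toy rates invented for illustration (not taken from the dataset).
toy_rates = [0.08, 0.14, 0.145, 0.149, 0.20, 0.31]
fig, counts, edges = plot_rate_distribution(toy_rates)
peak_bin_start = edges[np.argmax(counts)]  # left edge of the tallest bar
```

On the real data, `fig.savefig(...)` or a notebook cell would display the figure; `peak_bin_start` gives the left edge of the most populated one-point bin.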

Distribution of Loan Amount

The loan amount ranges from $1,000 to $25,000, with most loans around $4,000. The distribution is right-skewed, with a long tail toward larger loans.

Loan Amount distribution
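
One way to back up the visual impression of skew with numbers is to compare the mean with the median and check the sign of the sample skewness. The sketch below assumes pandas; `describe_skew` is a hypothetical helper and the amounts are toy values.

```python
import pandas as pd


def describe_skew(values: pd.Series) -> dict:
    """Return summary statistics that reveal the direction of skew."""
    return {
        "min": values.min(),
        "max": values.max(),
        "mode": values.mode().iloc[0],
        "median": values.median(),
        "mean": values.mean(),
        "skew": values.skew(),  # > 0 means a longer right tail
    }


# Toy loan amounts invented for illustration (not taken from the dataset).
toy_amounts = pd.Series(
    [1000, 2000, 4000, 4000, 4000, 5000, 7500, 10000, 15000, 25000]
)
stats = describe_skew(toy_amounts)
print(stats)
```

A mean well above the median, together with positive skewness, is the numeric signature of a right-skewed distribution.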

Purpose of the Loans

Most loans were acquired to consolidate debt, which could be why most loans were around $4,000.

Purpose of the loans
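
The suggested link between loan purpose and loan size can be checked by counting loans per purpose and looking at the median amount within each. This is a sketch with assumed column names (`ListingCategory`, `LoanOriginalAmount`), a hypothetical helper, and toy rows.

```python
import pandas as pd


def purpose_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Loan count and median loan amount per purpose, most common first."""
    return (
        df.groupby("ListingCategory")["LoanOriginalAmount"]
        .agg(["count", "median"])
        .sort_values("count", ascending=False)
    )


# Toy rows invented for illustration (not taken from the dataset).
toy_loans = pd.DataFrame({
    "ListingCategory": ["Debt Consolidation"] * 4
                       + ["Home Improvement"] * 2 + ["Business"],
    "LoanOriginalAmount": [4000, 4000, 5000, 3500, 10000, 8000, 15000],
})
summary = purpose_summary(toy_loans)
print(summary)
```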

Bivariate Exploration

What features of loans influence the borrower rate?

Loan Amount Vs. Borrower Rate

Surprisingly, there is a negative relationship between loan amount and borrower rate. Some borrowers of smaller loans (below $10,000) paid higher rates, while borrowers of loans above $25,000 paid rates below 20%.

The relationship between loan amount and borrower rate
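
The direction of this relationship can also be summarized numerically, for instance with a Pearson correlation and the mean rate per loan-amount band. The sketch below assumes pandas and the column names `LoanOriginalAmount` and `BorrowerRate`; the band edges and the toy rows are illustrative choices, not values from the project.

```python
import pandas as pd


def rate_by_amount_band(df: pd.DataFrame, edges: list) -> pd.Series:
    """Mean borrower rate within each loan-amount band."""
    bands = pd.cut(df["LoanOriginalAmount"], bins=edges, include_lowest=True)
    return df.groupby(bands, observed=True)["BorrowerRate"].mean()


# Toy rows invented for illustration (not taken from the dataset).
toy = pd.DataFrame({
    "LoanOriginalAmount": [2000, 4000, 8000, 12000, 20000, 30000],
    "BorrowerRate": [0.30, 0.25, 0.22, 0.18, 0.14, 0.10],
})

# Pearson correlation: a negative value means larger loans carry lower rates.
r = toy["LoanOriginalAmount"].corr(toy["BorrowerRate"])
band_means = rate_by_amount_band(toy, edges=[0, 10000, 25000, 35000])
print(round(r, 2))
print(band_means)
```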

Rating Grade Vs. Borrower Rate

The loans are graded based on credit risk, ranging from AA (the least risky) to HR (the highest risk). There is a positive relationship between the riskiness of the rating grade and the borrower rate: as the grade moves from AA toward HR, the borrower rate increases on average.

The relationship between rating grade and borrower rate
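
Because the grades have a natural order, it helps to store them as an ordered categorical so that tables and plots run from AA to HR instead of alphabetically. A minimal sketch, assuming pandas and a hypothetical `RatingGrade` column, with toy rows:

```python
import pandas as pd

# Grades from least risky to most risky, as described in the text.
GRADE_ORDER = ["AA", "A", "B", "C", "D", "E", "HR"]


def median_rate_by_grade(df: pd.DataFrame) -> pd.Series:
    """Median borrower rate per rating grade, ordered from AA to HR."""
    grade_type = pd.CategoricalDtype(categories=GRADE_ORDER, ordered=True)
    grades = df["RatingGrade"].astype(grade_type)
    return df.groupby(grades, observed=True)["BorrowerRate"].median()


# Toy rows invented for illustration (not taken from the dataset).
toy = pd.DataFrame({
    "RatingGrade": ["HR", "AA", "C", "A", "C", "E", "AA"],
    "BorrowerRate": [0.32, 0.07, 0.19, 0.10, 0.21, 0.29, 0.08],
})
medians = median_rate_by_grade(toy)
print(medians)
```

The same ordered column can be passed to a box plot or violin plot so the x-axis follows the risk order.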

Multivariate Exploration

Monthly Income and Borrower Rate by Loan Amount

There is a negative relationship between monthly income and borrower rate. Some borrowers of smaller loans (below $15,000) with monthly incomes below $10,000 paid higher rates, while borrowers earning above $10,000 a month and borrowing more than $15,000 paid rates below 25%.

The relationship between monthly income and borrower rate by loan amount
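
The two thresholds mentioned above ($10,000 monthly income and $15,000 loan amount) can be turned into a small two-by-two table of mean borrower rates. This is a sketch with assumed column names, a hypothetical helper `mean_rate_grid`, and toy rows.

```python
import numpy as np
import pandas as pd


def mean_rate_grid(df, income_cut=10_000, amount_cut=15_000):
    """Mean borrower rate for each income band x loan-amount band."""
    banded = df.assign(
        IncomeBand=np.where(df["StatedMonthlyIncome"] >= income_cut,
                            f">= {income_cut}", f"< {income_cut}"),
        AmountBand=np.where(df["LoanOriginalAmount"] >= amount_cut,
                            f">= {amount_cut}", f"< {amount_cut}"),
    )
    return banded.pivot_table(index="IncomeBand", columns="AmountBand",
                              values="BorrowerRate", aggfunc="mean")


# Toy rows invented for illustration (not taken from the dataset).
toy = pd.DataFrame({
    "StatedMonthlyIncome": [3000, 4500, 8000, 12000, 15000, 20000],
    "LoanOriginalAmount": [4000, 6000, 20000, 10000, 25000, 30000],
    "BorrowerRate": [0.30, 0.26, 0.20, 0.18, 0.12, 0.10],
})
grid = mean_rate_grid(toy)
print(grid)
```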

Monthly Income and Borrower Rate by Rating Grade

The negative correlation between monthly income and borrower rate applied only to low-risk loans (AA, A, and B). For these grades, the borrower rate decreases as income increases.

For riskier grades, the relationship between monthly income and borrower rate turned positive. For higher-risk loans (C, D, E, HR), the borrower rate rose with the riskiness of the grade, irrespective of monthly income.

Therefore, the grades assigned to borrowers significantly influenced their interest rates. Irrespective of the borrowers’ monthly incomes and loan amounts, loans with riskier grades tend to have higher borrower rates.

The relationship between monthly income and borrower rate by rating grade
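
To see whether the sign of the income and rate relationship changes with grade, the correlation can be computed separately within each grade. The sketch below assumes pandas and the column names `RatingGrade`, `StatedMonthlyIncome` and `BorrowerRate`; the helper name and the toy rows are made up for illustration.

```python
import pandas as pd

GRADE_ORDER = ["AA", "A", "B", "C", "D", "E", "HR"]


def income_rate_corr_by_grade(df: pd.DataFrame) -> pd.Series:
    """Pearson correlation of income and borrower rate within each grade."""
    corr = {
        grade: group["StatedMonthlyIncome"].corr(group["BorrowerRate"])
        for grade, group in df.groupby("RatingGrade")
    }
    # Present the grades from least to most risky.
    return pd.Series(corr).reindex([g for g in GRADE_ORDER if g in corr])


# Toy rows invented for illustration (not taken from the dataset).
toy = pd.DataFrame({
    "RatingGrade": ["A", "A", "A", "D", "D", "D"],
    "StatedMonthlyIncome": [3000, 6000, 9000, 3000, 6000, 9000],
    "BorrowerRate": [0.12, 0.10, 0.08, 0.24, 0.25, 0.27],
})
by_grade = income_rate_corr_by_grade(toy)
print(by_grade)
```

A negative value for a grade means higher incomes go with lower rates within that grade; a positive value means the opposite.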

Conclusion

These are the findings I have gathered from the analysis.

  • Borrower rates ranged from 5% to 35%, and the most frequent rate was around 14%.
  • Prosper mostly gave out loans below $5,000 and required a credit score between 650 and 750.
  • Most loans were taken out to consolidate debt. The most common loan statuses were current and completed, and the most common grade was C.
  • The negative correlation between the borrower rate and the loan amount was surprising. The borrower’s interest rate was driven mainly by the credit grade assigned to the loan, so irrespective of the loan amount, the risk attached to a loan largely determines its borrower rate.

This was an exciting project for me, as it allowed me to apply data analysis to my field of interest.

You can access the complete code for this project on my GitHub here.

Here is the link to my other projects.

Finally, connect with me on LinkedIn and Twitter. Thank you so much.

I hope you found this enjoyable to read!

Olamide Emida

I am a Chartered Accountant and a Data Analyst. I make sense of complex data using SQL, Excel, Power BI and Python.