Udacity “communicate data findings” — Prosper Loan Exploratory Data Analysis (EDA)
I completed the Udacity “communicate data findings” project from the Data Analysis Nanodegree course. I chose the Prosper loan dataset that you can find here, and the task was to perform an Exploratory Data Analysis using Python and to create a presentation with explanatory plots that convey my findings.
I summarized the presentation in this blog post and included visuals to make the findings easier to follow. I looked at the loan variables that could influence the borrower rate, focusing on three of them: loan amount, monthly income, and credit grade. I introduced each variable, showed the purpose of the loans, and plotted the relationship between borrower rate and loan amount, monthly income, and rating grade.
About Prosper
Prosper is the first peer-to-peer lending marketplace in the United States. It has facilitated over $23 billion in loans to more than 1.4 million people. Prosper allows individuals to invest in each other. Borrowers can easily apply for a fixed-rate, fixed-term loan online between $2,000 and $50,000. Individuals or institutions can invest in these loans and earn appealing returns. Prosper manages all loan servicing on behalf of the matched borrowers and investors.
Dataset Overview
The data contains 113,937 loans from 2006 to 2014, with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others.
Preliminary Wrangling
I loaded the data and assessed it for quality issues, both visually and programmatically. Then I performed the following cleaning steps to make the data ready for exploration:
- Selected variables of interest
- Changed data types for the listing creation date column
- Filled null values in credit grade and prosper rating columns
- Renamed the listing category column and created a new average credit score column
- Deleted extraneous columns
- Updated the numeric values in the listing category column and the employment status column
- Removed null values
- Removed duplicate rows
Here is the information about the final cleaned data:
The 81 variables of the original dataset have been reduced to 15 variables of interest.
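Some of the cleaning steps listed above can be sketched in pandas roughly as follows. This is a minimal illustration, not the notebook code from the project. The column names are assumed to match the public Prosper loan CSV, the toy rows and the three category labels are illustrative, and merging the two rating columns into a hypothetical `RatingGrade` column is only one possible way of filling the nulls.

```python
import pandas as pd

# Toy stand-in for the raw loan table. Column names and values are
# assumptions for illustration, not rows from the real dataset.
raw = pd.DataFrame({
    "ListingCreationDate": ["2007-08-26 19:09:29", "2014-02-27 08:28:07",
                            "2014-02-27 08:28:07", "2013-10-01 12:00:00"],
    "CreditGrade": ["C", None, None, None],
    "ProsperRating (Alpha)": [None, "A", "A", None],
    "ListingCategory (numeric)": [0, 1, 1, 2],
    "CreditScoreRangeLower": [640, 680, 680, 700],
    "CreditScoreRangeUpper": [659, 699, 699, 719],
    "BorrowerRate": [0.158, 0.092, 0.092, 0.12],
})


def clean_loans(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Change the listing creation date from string to datetime
    out["ListingCreationDate"] = pd.to_datetime(out["ListingCreationDate"])
    # Fill nulls by combining the two rating columns (assumed approach)
    out["RatingGrade"] = out["ProsperRating (Alpha)"].fillna(out["CreditGrade"])
    # Rename the listing category column and map numeric codes to labels
    out = out.rename(columns={"ListingCategory (numeric)": "ListingCategory"})
    labels = {0: "Not Available", 1: "Debt Consolidation", 2: "Home Improvement"}
    out["ListingCategory"] = out["ListingCategory"].map(labels)
    # Create the average credit score column from the range bounds
    out["AverageCreditScore"] = (
        out["CreditScoreRangeLower"] + out["CreditScoreRangeUpper"]
    ) / 2
    # Delete columns that are no longer needed
    out = out.drop(columns=["CreditGrade", "ProsperRating (Alpha)",
                            "CreditScoreRangeLower", "CreditScoreRangeUpper"])
    # Remove rows with nulls, then remove duplicate rows
    return out.dropna().drop_duplicates().reset_index(drop=True)


cleaned = clean_loans(raw)
```

In this toy example, the row with no rating in either column is dropped as a null, and the two identical rows collapse into one.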
Exploratory Data Analysis
Univariate Exploration
Distribution of Borrower Rate
Borrower rates range from about 5% to 35%, and the most frequent rate is around 14%. Rates on Prosper appear to be generally high. However, this is not necessarily a cause for concern, as the rate is influenced by various factors such as credit score, loan term, and loan amount.
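One way to locate the peak of a distribution like this without reading it off a plot is to bin the values and pick the fullest bin. The sketch below uses NumPy with made-up rates, so the numbers are illustrative only and the 5-point bin width is an arbitrary choice.

```python
import numpy as np

# Made-up borrower rates as fractions (0.14 means 14%); illustration only
rates = np.array([0.06, 0.09, 0.12, 0.13, 0.14, 0.14, 0.16, 0.19, 0.26, 0.33])

# Bin the rates (in percent) into 5-point-wide bins from 5% to 35%
counts, edges = np.histogram(rates * 100, bins=np.arange(5, 40, 5))

# The modal bin is the one with the highest count
peak = int(np.argmax(counts))
modal_bin = (int(edges[peak]), int(edges[peak + 1]))
```

With these toy values, `modal_bin` is the 10% to 15% bin. Narrower bins give a sharper estimate of the peak on a large dataset.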
Distribution of Loan Amount
The loan amount ranges from $1,000 to $25,000, with most loans around $4,000. The distribution is right-skewed: most loans are small, with a long tail of larger amounts.
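A right-skewed shape can also be checked numerically: the sample skewness is positive and the mean sits above the median. A minimal sketch with made-up loan amounts (not values from the dataset):

```python
import pandas as pd

# Made-up loan amounts in dollars; illustration only
amounts = pd.Series([1000, 2000, 3000, 4000, 4000, 4000,
                     5000, 10000, 15000, 25000])

skewness = amounts.skew()        # positive for a right-skewed sample
mean_amount = amounts.mean()     # pulled upward by the large loans
median_amount = amounts.median() # stays near the bulk of the loans
```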
Purpose of the Loans
Most loans were acquired to consolidate debt, which could be why most loans were around $4,000.
Bivariate Exploration
What features of loans influence the borrower rate?
Loan Amount Vs. Borrower Rate
Surprisingly, there is a negative relationship between loan amount and borrower rate. Some borrowers of smaller loans (below $10,000) paid higher borrower rates, while borrowers of loans above $25,000 paid rates below 20%.
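The direction of a relationship like this can be summarized with a Pearson correlation coefficient alongside the scatter plot. The sketch below uses made-up values and assumes the columns are named `LoanOriginalAmount` and `BorrowerRate`; it shows the calculation, not the coefficient from the actual data.

```python
import pandas as pd

# Made-up loans where larger amounts carry lower rates; illustration only
loans = pd.DataFrame({
    "LoanOriginalAmount": [2000, 4000, 6000, 10000, 15000, 25000],
    "BorrowerRate": [0.30, 0.26, 0.22, 0.18, 0.15, 0.10],
})

# Pearson correlation: a negative value means the rate falls as the amount rises
r = loans["LoanOriginalAmount"].corr(loans["BorrowerRate"])
```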
Rating Grade Vs. Borrower Rate
The loans are graded based on credit risk, ranging from AA (lowest risk) to HR (highest risk). There is a positive relationship between rating grade and borrower rate: as the grade gets riskier, the borrower rate increases on average.
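For a comparison like this to read correctly, the grades need to be treated as an ordered category rather than sorted alphabetically. A minimal sketch, assuming a hypothetical `RatingGrade` column and made-up rates:

```python
import pandas as pd

# Grades from lowest risk to highest risk
grade_order = ["AA", "A", "B", "C", "D", "E", "HR"]

# Made-up loans; illustration only
loans = pd.DataFrame({
    "RatingGrade": ["HR", "AA", "C", "A", "E", "B", "D", "AA", "HR"],
    "BorrowerRate": [0.32, 0.07, 0.19, 0.11, 0.29, 0.15, 0.24, 0.09, 0.30],
})

# An ordered categorical makes groupby and plots follow the risk order
loans["RatingGrade"] = pd.Categorical(
    loans["RatingGrade"], categories=grade_order, ordered=True
)

# Average borrower rate per grade, indexed from AA to HR
mean_rate = loans.groupby("RatingGrade", observed=True)["BorrowerRate"].mean()
```

In this toy data the average rate rises with each step from AA to HR, which is the pattern described above.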
Multivariate Exploration
Monthly Income and Borrower Rate by Loan Amount
There is a negative relationship between monthly income and borrower rate. Some borrowers of smaller loans (below $15,000) with monthly income below $10,000 paid higher borrower rates. Other borrowers earning above $10,000 monthly and borrowing above $15,000 paid borrower rates below 25%.
Monthly Income and Borrower Rate by Rating Grade
The negative correlation between monthly income and borrower rate only applied to low-risk loans (AA, A, and B). For these grades, the borrower rate decreases as income increases.
As the level of risk increased, monthly income and borrower rate became positively correlated. For higher-risk loans (C, D, E, HR), the borrower rate increased with each step up in risk, irrespective of monthly income.
Therefore, the grades assigned to borrowers significantly influenced their interest rates. Irrespective of the borrowers’ monthly incomes and loan amounts, loans with riskier credit grades tend to have higher borrower rates.
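The per-grade comparison above amounts to computing the income-rate correlation separately within each grade. The sketch below shows the idea on two made-up grades. The column names `StatedMonthlyIncome` and `RatingGrade` are assumptions, and the values are chosen only to show a sign flip between a low-risk and a high-risk grade.

```python
import pandas as pd

# Made-up loans for one low-risk grade (A) and one high-risk grade (D)
loans = pd.DataFrame({
    "RatingGrade": ["A"] * 4 + ["D"] * 4,
    "StatedMonthlyIncome": [3000, 5000, 8000, 12000,
                            3000, 5000, 8000, 12000],
    "BorrowerRate": [0.13, 0.12, 0.10, 0.09,
                     0.24, 0.25, 0.25, 0.26],
})

# Pearson correlation between income and rate, computed within each grade
per_grade_r = {
    grade: group["StatedMonthlyIncome"].corr(group["BorrowerRate"])
    for grade, group in loans.groupby("RatingGrade")
}
```

Here `per_grade_r["A"]` is negative and `per_grade_r["D"]` is positive, mirroring the pattern in the faceted plots.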
Conclusion
These are the findings I have gathered from the analysis.
- Borrower rates ranged from 5% to 35%, and the most frequent rate was around 14%.
- Prosper mostly gave out loans below $5,000 and required a credit score between 650 and 750.
- Most loans were taken out to consolidate debt. The most common loan statuses were current and completed, and the most common grade was C.
- The negative correlation between the borrower rate and the loan amount was surprising. The borrower’s interest rate was impacted by the credit grade assigned to their loan. So, irrespective of the loan amount, the risk attached to the loan determines its borrower rate.