Loan Prediction Problem by Analytics Vidhya

Sachin Kumar
3 min read · May 13, 2019

Taken from AV website

New Article — Deep Learning Made Easy: Part 1: Introduction to Neural Networks

This loan prediction problem from Analytics Vidhya was my first ever data science project. Below is my step-by-step solution, which reached rank 960 on the public leaderboard of the @AnalyticsVidhya hackathon Practice Problem: Loan Prediction III.

Here is another solution by me — Twitter Sentiment Analysis by AV

Here is the GitHub repository for this solution: https://github.com/sachink382/Loan-Prediction-Analytics-Vidya/tree/master

Let’s start

Here is the Dataset. We will be doing this problem in two steps.

Step 1: After loading your data into RStudio, the first step is data preprocessing, which means transforming the raw data into a clean, consistent format that a model can work with. In a typical ML problem, most of the effort goes into this stage.

The ID column of the training set carries no predictive information, so we remove it. XGBoost requires numeric input, so you have to convert your variables to numeric, otherwise you'll get an error. The same steps apply to the test set.
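A minimal sketch of these two steps, assuming a toy data frame in place of the real training file. The column names (Loan_ID, Gender, LoanAmount, Loan_Status) and the helper to_numeric are illustrative assumptions, not necessarily what the repository uses.

```r
# Toy stand-in for the training data (column names are assumed).
train <- data.frame(
  Loan_ID     = c("LP001", "LP002", "LP003", "LP004"),
  Gender      = c("Male", "Female", NA, "Male"),
  LoanAmount  = c(128, NA, 66, 120),
  Loan_Status = c("Y", "N", "Y", "Y"),
  stringsAsFactors = FALSE
)

# Drop the identifier column: it does not help the model.
train$Loan_ID <- NULL

# Hypothetical helper: turn character columns into factors, then into
# their integer codes. Missing values stay NA through both conversions.
to_numeric <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) col <- factor(col)
    as.numeric(col)
  })
  df
}

train_num <- to_numeric(train)
str(train_num)
```

One thing to watch: factor codes depend on the levels present, so if you convert the training and test sets separately, it is safer to build the factors with the same explicit levels in both.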

Now, let's deal with the NAs, the missing values in our dataset. If you convert characters into factors, you have to handle any NAs yourself. If you use as.numeric to convert your data instead, the NAs simply stay NA, and XGBoost can work with them because it handles missing values natively; most of the other models still need the gaps filled. I have shown both steps.
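One simple way to fill the gaps is to use the most frequent level for factors and the median for numeric columns. This is only a sketch under that assumption; the helpers mode_level and impute and the toy data are made up for illustration, and the original solution may impute differently.

```r
# Toy data with one missing value per column (assumed column names).
train <- data.frame(
  Gender     = factor(c("Male", "Female", NA, "Male")),
  LoanAmount = c(128, NA, 66, 120)
)

# Most frequent level of a factor; table() ignores NAs by default.
mode_level <- function(x) names(which.max(table(x)))

# Fill factor NAs with the mode and numeric NAs with the median.
impute <- function(df) {
  for (name in names(df)) {
    col <- df[[name]]
    if (is.factor(col)) {
      col[is.na(col)] <- mode_level(col)
    } else if (is.numeric(col)) {
      col[is.na(col)] <- median(col, na.rm = TRUE)
    }
    df[[name]] <- col
  }
  df
}

train_filled <- impute(train)
colSums(is.na(train_filled))
```

For the test set it is usually better to reuse the mode and median computed on the training set, so that both sets are filled with the same values.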

The test set gets the same treatment. Now we have almost completed our first step. One last thing we can do to improve the model is feature scaling, which matters most for distance-based models such as KNN and SVM.
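Feature scaling can be sketched with base R's scale(). The example assumes two numeric columns with made-up values; the key point is that the test set is scaled with the training set's means and standard deviations rather than its own.

```r
# Toy numeric features (column names and values are assumptions).
train_x <- data.frame(ApplicantIncome = c(5849, 4583, 3000, 2583),
                      LoanAmount      = c(128, 128, 66, 120))
test_x  <- data.frame(ApplicantIncome = c(5720, 3076),
                      LoanAmount      = c(110, 126))

# Standardise the training features to mean 0 and standard deviation 1.
train_scaled <- scale(train_x)

# scale() stores the statistics it used as attributes; reuse them so the
# test set goes through exactly the same transformation.
centers <- attr(train_scaled, "scaled:center")
sds     <- attr(train_scaled, "scaled:scale")
test_scaled <- scale(test_x, center = centers, scale = sds)

round(colMeans(train_scaled), 10)
```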

Step 2: Building models and selecting the best one.

We will start with Logistic Regression. I have shown below the complete code, from fitting the model to predicting and saving the predictions in a CSV file.
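The whole fit, predict and save cycle can be sketched with base R's glm(). The data here is synthetic, and the feature names, the 0.5 cut-off and the two-column submission layout (Loan_ID, Loan_Status) are assumptions on my side, so treat it as a template rather than the exact solution.

```r
set.seed(42)
n <- 200

# Synthetic training data (assumed features, made-up relationship).
train <- data.frame(
  Credit_History  = rbinom(n, 1, 0.8),
  ApplicantIncome = round(rnorm(n, 5000, 1500))
)
p <- plogis(-2 + 3 * train$Credit_History)
train$Loan_Status <- factor(ifelse(runif(n) < p, "Y", "N"), levels = c("N", "Y"))

# Synthetic test data, keeping the ID for the submission file.
test <- data.frame(
  Loan_ID         = sprintf("LP%03d", 1:5),
  Credit_History  = c(1, 0, 1, 1, 0),
  ApplicantIncome = c(4200, 3100, 6000, 2500, 5200)
)

# With a factor response, glm() models the probability of the second
# level, which is "Y" here.
model <- glm(Loan_Status ~ Credit_History + ApplicantIncome,
             data = train, family = binomial)

# Predicted probabilities of approval, turned into Y/N at 0.5.
prob <- predict(model, newdata = test, type = "response")
submission <- data.frame(Loan_ID     = test$Loan_ID,
                         Loan_Status = ifelse(prob > 0.5, "Y", "N"))

# Save the predictions; tempdir() keeps the sketch side-effect free.
out_file <- file.path(tempdir(), "logistic_submission.csv")
write.csv(submission, out_file, row.names = FALSE)
```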

Our second model is K-Nearest Neighbours (KNN).
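A small KNN sketch using knn() from the class package, which ships with standard R installations. The two synthetic clusters and k = 5 are assumptions for illustration; on the real data you would pass the scaled, NA-free numeric features.

```r
library(class)

set.seed(1)

# Two synthetic clusters standing in for rejected (N) and approved (Y) loans.
train_x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
                 matrix(rnorm(40, mean = 3), ncol = 2))
train_y <- factor(rep(c("N", "Y"), each = 20))

# Two new points, one near each cluster centre.
test_x <- rbind(c(0, 0), c(3, 3))

# knn() has no separate training step: it classifies each test row by a
# majority vote among its k nearest training rows.
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)
pred
```

knn() stops with an error if the inputs contain missing values, which is one reason the NA handling in Step 1 matters.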

Our third model is SVM with a linear kernel.
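A linear-kernel SVM can be sketched with svm() from the e1071 package. I am assuming e1071 here (it has to be installed separately), and the data and feature names x1 and x2 are synthetic placeholders.

```r
library(e1071)

set.seed(1)

# Synthetic two-class data; x1 and x2 are placeholder feature names.
train <- data.frame(
  x1 = c(rnorm(20, mean = 0), rnorm(20, mean = 3)),
  x2 = c(rnorm(20, mean = 0), rnorm(20, mean = 3)),
  Loan_Status = factor(rep(c("N", "Y"), each = 20))
)
test <- data.frame(x1 = c(0, 3), x2 = c(0, 3))

# C-classification with a linear kernel; svm() scales the features
# internally by default (scale = TRUE).
model <- svm(Loan_Status ~ ., data = train,
             type = "C-classification", kernel = "linear")

pred <- predict(model, newdata = test)
pred
```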

The next three models are Naive Bayes, Classification Tree and Random Forest. Up to this point, Random Forest gave me the best results.
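Of these three, here is a sketch of the classification tree, using rpart (a recommended package bundled with R). The data is synthetic and the feature names are assumptions. e1071::naiveBayes() and randomForest::randomForest() accept the same formula interface, so swapping the model line is the main change for the other two.

```r
library(rpart)

set.seed(7)
n <- 300

# Synthetic data: no credit history always means rejection here,
# a good credit history means approval about 90% of the time.
train <- data.frame(
  Credit_History = rbinom(n, 1, 0.8),
  LoanAmount     = round(runif(n, 50, 300))
)
train$Loan_Status <- factor(
  ifelse(train$Credit_History == 1 & runif(n) < 0.9, "Y", "N"),
  levels = c("N", "Y")
)

# method = "class" asks rpart for a classification tree.
tree <- rpart(Loan_Status ~ ., data = train, method = "class")

test <- data.frame(Credit_History = c(1, 0), LoanAmount = c(120, 200))

# type = "class" returns the predicted label instead of probabilities.
pred <- predict(tree, newdata = test, type = "class")
pred
```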

And lastly, I applied XGBoost to our loan prediction problem.
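A minimal XGBoost sketch, assuming the xgboost package is installed and using xgb.DMatrix with xgb.train. The synthetic data, max_depth = 3 and nrounds = 20 are arbitrary choices for illustration, not tuned values from my submission.

```r
library(xgboost)

set.seed(3)
n <- 200

# XGBoost wants a numeric matrix; NAs are allowed and treated as missing.
x <- cbind(Credit_History = rbinom(n, 1, 0.8),
           LoanAmount     = runif(n, 50, 300))
x[sample(n, 10), "LoanAmount"] <- NA

# For binary:logistic the label must be 0/1, not factor codes 1/2.
y <- as.numeric(x[, "Credit_History"] == 1 & runif(n) < 0.9)

dtrain <- xgb.DMatrix(data = x, label = y)
model <- xgb.train(
  params  = list(objective = "binary:logistic", max_depth = 3),
  data    = dtrain,
  nrounds = 20
)

# Predict approval probabilities for two new applicants.
x_test <- cbind(Credit_History = c(1, 0), LoanAmount = c(120, NA))
prob <- predict(model, xgb.DMatrix(data = x_test))
pred <- ifelse(prob > 0.5, "Y", "N")
pred
```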

I hope this helps beginners get started on their own projects.
