My Machine Learning Journey

Oluwabukunmi Ige
Jul 2, 2019

This will be a series of blog posts in which I implement the machine learning algorithms I have learnt so far. Each post will focus on a different algorithm, and I will preprocess the datasets so that each algorithm performs reasonably well. The series will focus on Supervised Learning and will cover both classification and regression problems. An overview of the datasets that will be used during the series is given below:

Classification:
For the classification problem, the dataset that will be used is the loan data from Lending Club. Lending Club is a US peer-to-peer lending company that enables borrowers to obtain unsecured personal loans between $1,000 and $40,000. Our goal is to predict whether a customer will pay back their loan.
The data is from 2007–2010. You can access the data set here.

Regression
For the regression problem, the dataset that will be used is the California Housing Prices data set. The task is to predict the median house value from the other features given in the data set.

Content
The data pertains to the houses found in a given California district and some summary stats about them based on the 1990 census data.
Columns
-longitude
-latitude
-housing_median_age
-total_rooms
-total_bedrooms
-population
-households
-median_income
-median_house_value: This is our target variable
-ocean_proximity
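
To make the column list concrete, here is a minimal sketch of how the target could be separated from the features before modelling. It uses only the Python standard library and a tiny in-memory CSV. The three rows are made-up values for illustration, not records from the real data set, and the helper names (`load_rows`, `split_features_target`, `one_hot`) are my own assumptions rather than anything the later posts are guaranteed to use.

```python
import csv
import io

# Illustrative rows only: the values are invented, not taken from the real data set.
SAMPLE_CSV = """longitude,latitude,housing_median_age,total_rooms,total_bedrooms,population,households,median_income,median_house_value,ocean_proximity
-122.2,37.8,41,880,129,322,126,8.3,452600,NEAR BAY
-121.9,37.3,30,1500,310,800,300,4.1,210000,INLAND
-118.3,34.0,25,2100,400,1200,380,3.2,180000,<1H OCEAN
"""

TARGET = "median_house_value"    # the column we want to predict
CATEGORICAL = "ocean_proximity"  # the only text-valued column in the list


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def split_features_target(rows):
    """Separate the target column from the features, converting numbers to float."""
    features, targets = [], []
    for row in rows:
        targets.append(float(row[TARGET]))
        features.append({
            name: (value if name == CATEGORICAL else float(value))
            for name, value in row.items()
            if name != TARGET
        })
    return features, targets


def one_hot(features, column=CATEGORICAL):
    """Replace a text column with one 0/1 column per category seen in the data."""
    categories = sorted({row[column] for row in features})
    encoded = []
    for row in features:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = load_rows(SAMPLE_CSV)
X, y = split_features_target(rows)
X = one_hot(X)
print(len(X), "rows;", len(X[0]), "feature columns")
print("targets:", y)
```

The one-hot step is there because ocean_proximity is the only non-numeric column, and most regression algorithms expect numeric inputs. Other encodings would work too.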

This series will compare all the machine learning algorithms I have learnt on the regression and classification datasets to see which of them performs best.
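
As a rough sketch of what such a comparison could look like, the snippet below scores two sets of predictions against the same held-out labels and picks the higher score. The labels, the predictions and the names `model_a` and `model_b` are hypothetical, invented purely for illustration. Accuracy and RMSE are my assumed choice of metrics here, not necessarily the ones the later posts will use.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of labels predicted correctly (classification); higher is better."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error (regression); lower is better."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


# Hypothetical held-out labels and model predictions, invented for illustration.
y_test = [1, 0, 1, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0],
    "model_b": [1, 1, 0, 1, 1],
}

# Score every model on the same held-out labels, then pick the best one.
scores = {name: accuracy(y_test, pred) for name, pred in predictions.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

For the regression dataset the same loop would apply with `rmse` in place of `accuracy` and `min` in place of `max`, since a lower error is better.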

The first post in the series will use Logistic Regression to solve the liver disease classification problem. The first blog post can be accessed here.

Thank you and I can’t wait to get started.
