My First Machine Learning Project

Yash · Published in DataX Journal · Jul 10, 2022 · 5 min read

Just like most of us, the seemingly endless lockdown had me explore something new every other month, in search of “my calling”. I was looking for something technical around the end of September, and I decided to try my hands on a machine learning course.


The main focus was learning intermediate Python (pandas, matplotlib, seaborn, etc.) and then studying, analyzing, and applying a few machine learning models. Here is my experience during the course:

  • We were guided by a professor throughout the course, who explained the various implementations and uses of each algorithm and helped us with every technical aspect we were missing.
  • It was my first time trying out machine learning, and the course helped me realize the potential of machine learning, deep learning and artificial intelligence as a whole.
  • At the end of the course, we were required to apply what we had learnt to a dataset of our choosing. This made sure that we understood, and could apply, what we had learnt throughout the course. That project is the highlight of this article.

Here I will summarize my experience in parts, followed by my project.

The Basics

In the first month, we were taught intermediate Python and the essential statistics we would need to get into machine learning. This also included an introduction to the various layers of artificial intelligence and its real-life applications. Plain and simple. We were taught how to work with pandas and matplotlib, how to work our way around means, medians, modes, standard deviations, and so on, and how to read data and interpret true positives, false positives, and the like. We were introduced to the world of machine learning and shown that machine learning, deep learning, and artificial intelligence are not the same thing: some cover broader aspects, while others are more specific fields of study. We also covered the types of machine learning models and the various terms that come with them and with the data.
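To give a flavour of these basics, here is a minimal sketch of the kind of exercise involved: loading a CSV with pandas, computing summary statistics, and plotting a histogram with matplotlib. The file name and the "age" column are placeholders I made up for this example, not material from the course.

```python
# A minimal sketch of the basics covered, not actual course material.
# The CSV file and the "age" column are hypothetical placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("example.csv")   # hypothetical dataset

# Descriptive statistics: mean, median, mode, standard deviation
print(df["age"].mean(), df["age"].median(), df["age"].mode()[0], df["age"].std())

# A quick histogram of one column
df["age"].plot(kind="hist", bins=20, title="Age distribution")
plt.show()
```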

Introduction to ML Models

We started how any newcomer would: we first learnt how logistic and linear regression work, and how to apply them, study them, and write the required code in Python. We were then introduced to, and taught how to work with, models such as Decision Tree, Random Forest, Naive Bayes, KNN, K-Means, and the Apriori algorithm. Side by side, we were also taught how to check how well a model is performing, be it through its accuracy, recall, and precision, by drawing ROC curves and checking the AUC (Area Under the Curve), by checking the log loss, or, in the case of regression, by checking R², and so on.
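As an illustration of those performance checks, here is a hedged sketch using scikit-learn (not necessarily what the course used); the labels and predicted probabilities below are made-up placeholders.

```python
# A sketch of common classification metrics with scikit-learn.
# y_true, y_pred, and y_prob are invented placeholder values.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score, roc_curve, log_loss)

y_true = [0, 1, 1, 0, 1]              # true labels
y_pred = [0, 1, 0, 0, 1]              # hard class predictions
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8]    # predicted probabilities for class 1

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:", recall_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_prob))
print("Log loss:", log_loss(y_true, y_prob))

fpr, tpr, _ = roc_curve(y_true, y_prob)   # points for plotting an ROC curve
```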

The Project

At the end of the course, we had to take what we had learnt (working with pandas, the different algorithms, reading data, and so on) and apply it to a dataset of our choice, which we could either ask the professor to provide or find on our own on sites like Kaggle (do check it out if you're looking to build a project of your own), and then present our findings to an invited professor.

We were divided into teams of four, and I was selected to lead the first team. We decided to do our project on a Kaggle dataset for predicting the chances of a heart attack. It was a binary classification problem: dividing patients into two classes, those at high risk of a heart attack and those at low risk.

As taught, we first moved on to data study and preprocessing. We used matplotlib to draw graphs and heat maps of the different features and found which of them contained outliers. We then decided to replace the outliers using two methods, first with the median of the column and then with its mean, so that we could check which suited the dataset best and gave us the best results.
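For illustration, here is a rough sketch of how such a preprocessing step could look. It is not our actual project code: the file name, the use of seaborn for the heat map, and the 1.5 × IQR rule for flagging outliers are all assumptions made for this example.

```python
# A rough sketch of the preprocessing step, not our actual project code.
# File name, seaborn heat map, and the 1.5*IQR outlier rule are assumptions.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("heart.csv")                 # placeholder file name
numeric = df.select_dtypes(include="number")  # keep only numeric columns

# Correlation heat map to study relationships between features
sns.heatmap(numeric.corr(), cmap="coolwarm")
plt.show()

def replace_outliers(series, replacement="median"):
    """Replace values outside 1.5 * IQR with the column's median or mean."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - 1.5 * iqr) | (series > q3 + 1.5 * iqr)
    value = series.median() if replacement == "median" else series.mean()
    return series.where(~mask, value)

# Two candidate versions of the data, as described above
df_median = numeric.apply(lambda col: replace_outliers(col, "median"))
df_mean = numeric.apply(lambda col: replace_outliers(col, "mean"))
```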

When it came to building the models, we decided to use Logistic Regression, Decision Tree, Naive Bayes, KNN, and K-Means. I divided the models among the team members and we got to work. As soon as the work was done, I compiled the code, finished the documentation, and then we started working on the presentation. We presented our findings to the visiting professor, who then counter-questioned us on them as well.
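To show roughly what building and comparing those models could look like, here is a minimal, hypothetical scikit-learn sketch for the supervised ones (K-Means, being unsupervised, is left out). The target column name ("output") and the train/test split are assumptions, not our exact setup.

```python
# A hypothetical sketch of training and comparing the supervised models.
# "output" as the target column and the 80/20 split are assumptions.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

# df_median comes from the preprocessing sketch above
X = df_median.drop(columns=["output"])
y = df_median["output"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "Naive Bayes": GaussianNB(),
    "KNN": KNeighborsClassifier(),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    prob = model.predict_proba(X_test)[:, 1]
    print(name,
          "accuracy:", accuracy_score(y_test, pred),
          "AUC:", roc_auc_score(y_test, prob))
```

Running the same comparison on the mean-replaced version of the data is then just a matter of swapping in the other DataFrame.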

We found that the model that suited the dataset best was logistic regression with the outliers replaced by the median, which gave us the best results across all checks (AUC, log loss, accuracy, precision, recall, and so on), so we decided to present that as the model best suited to our dataset.

My Takeaway

While this was not a large-scale project, it definitely was a huge learning experience about something absolutely new to me. It gave me insight into how to work with data, study it, and apply what I had learnt in the code I wrote to prepare both supervised and unsupervised machine learning models. I found it genuinely interesting, and through this project I went on to choose AI/ML as my specialization in my bachelor's degree.

Feel free to send me a connection request and message on LinkedIn to collaborate on the above-mentioned project or for a few tips on writing blogs. I also enjoy graphic design, video editing, and game development, so we could always have a chat about that!

Stay safe!
