Anemia Prediction Using Machine Learning Techniques


Remarkable advances in the health industry have led to the production of a significant amount of data every day.

This data requires preprocessing to extract useful information for analysis, prediction, recommendation and decision making. In medical science, predicting a disease at the right time is a central problem for professionals, because it enables prevention and an effective treatment plan.

Anemia is a condition caused by a deficiency of healthy red blood cells (RBCs), which leaves the blood unable to deliver enough oxygen throughout the body. Sickle cell anemia is a type of anemia in which RBCs take on a sickle (crescent) shape instead of their normal disc shape and can block blood flow. It is caused by hemoglobin S.

Photo by ANIRUDH on Unsplash

In our study, we found that classifier algorithms such as Random Forest, Decision Tree and Naïve Bayes can predict early-stage sickle cell anemia, so that patients can take the required medicine on time and prevent anemia. We also learned that a real-time dataset of anemic patients is required for predicting anemia. We therefore plan to use classification algorithms such as Random Forest and Naïve Bayes in our proposed methodology.

Anemia is a growing problem among young children living in rural India. However, there has not previously been a detailed study of the biological etiology of this anemia that explores the relative contributions of iron, vitamin B12, folate and vitamin A deficiency, inflammation, hookworm and malaria.

Symptoms Of Malaria



We follow this methodology to implement our idea:

1. Take Input Data

  • First, collect the dataset.
  • Provide the dataset in CSV format.
  • Import the dataset using a library such as pandas.
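
As a rough illustration of this step, the sketch below reads CSV data with pandas. The sample rows and the column names (Gender, Hemoglobin, MCV, Result) are made up for the example and are not the project's real dataset; with a real file, you would pass its path to the same function.

```python
import io

import pandas as pd

# Made-up sample rows standing in for a real CSV file. The column
# names (Gender, Hemoglobin, MCV, Result) are assumptions.
SAMPLE_CSV = """Gender,Hemoglobin,MCV,Result
F,10.1,72.5,1
M,14.8,88.0,0
F,9.6,70.2,1
M,15.2,90.1,0
"""


def load_dataset(source):
    """Read a CSV file path or file-like object into a DataFrame."""
    return pd.read_csv(source)


# A file path such as "anemia.csv" (hypothetical) would work the same way.
df = load_dataset(io.StringIO(SAMPLE_CSV))
print(df.shape)
print(df.head())
```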

2. Pre-Processing And Cleaning Dataset

  • Describe the dataset to learn about it, for example its columns.
  • Check for null values in the dataset.
  • Drop the null values, if any are present.
  • Clean the dataset by discarding the attributes that are not required.
  • Because our model accepts only numeric (integer) data, convert any categorical data to numeric form.
  • Use the required libraries to prepare the dataset for prediction.

3. Feature Extraction / Feature Selection

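Feature selection assumes a cleaned, all-numeric table. A minimal sketch of the step 2 cleaning that produces one is shown here; the records and column names are invented for illustration, and encoding categories as integer codes is just one of several possible choices.

```python
import pandas as pd

# Made-up records with hypothetical column names: one missing value,
# one categorical column and one column the model does not need.
raw = pd.DataFrame({
    "PatientName": ["A", "B", "C", "D", "E"],
    "Gender": ["F", "M", "F", "M", "F"],
    "Hemoglobin": [10.1, 14.8, None, 15.2, 9.6],
    "MCV": [72.5, 88.0, 70.2, 90.1, 69.8],
    "Result": [1, 0, 1, 0, 1],
})


def clean(df, unused_columns):
    """Drop null rows and unneeded columns, then encode categoricals."""
    df = df.dropna().drop(columns=unused_columns).copy()
    # Replace every remaining non-numeric column with integer codes.
    for col in df.select_dtypes(exclude="number").columns:
        df[col] = df[col].astype("category").cat.codes
    return df.reset_index(drop=True)


print(raw.describe())      # summary statistics of the numeric columns
print(raw.isnull().sum())  # number of null values per column
cleaned = clean(raw, unused_columns=["PatientName"])
print(cleaned)
```
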
  • Once preprocessing is done, the dataset is ready for feature extraction.
  • The dataset contains a lot of information.
  • We have to extract only the attributes required for prediction.
  • Apply an efficient feature extraction technique so that only the attributes most correlated with the prediction target are selected.
  • Feature extraction helps reduce overfitting of our model.

Some methods that can be used for Feature Extraction are:

  • I. SelectKBest
  • II. PCA (Principal Component Analysis)

Here we extract the features that are required for model training.
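
The two methods named above can be sketched with scikit-learn as follows. The data is synthetic (generated with make_classification) rather than real patient data, and the choices of k=3, n_components=3 and the ANOVA F-score are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in for the cleaned dataset: 200 rows and 8 numeric
# features, of which 3 carry information about the label.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3,
    n_redundant=0, random_state=0,
)

# SelectKBest keeps the k original columns with the highest score
# (here the ANOVA F-score between each feature and the label).
selector = SelectKBest(score_func=f_classif, k=3)
X_best = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)  # indices of the kept columns

# PCA instead builds new columns as linear combinations of all inputs,
# ordered by how much variance each one explains. It ignores the label.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X)

print("kept columns:", kept)
print(X_best.shape, X_pca.shape)
```

Note the difference in what comes out: SelectKBest returns a subset of the original attributes, so they stay interpretable, while PCA returns transformed components.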

4. Apply Classification Algorithms

  • After feature extraction comes model training.
  • First, divide the dataset into training and testing sets using the train_test_split() function.
  • Next, select the classification algorithms and import them from their respective libraries.
  • The algorithms we are going to use (as of now) are: A. Random Forest, B. SVM, C. Naïve Bayes, etc.
  • Apply the algorithms to the training dataset to predict the diagnosis, i.e. whether the patient is anemic or not.

5. Performance Evaluation

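Evaluation starts from fitted models. A minimal sketch of step 4, splitting the data and training the three classifiers named there, might look like this. The data is synthetic, and the 75/25 split, the RBF kernel and the Gaussian variant of Naïve Bayes are assumptions, not settings taken from the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Synthetic stand-in for the selected features and the anemic/not label.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3,
    n_redundant=0, random_state=0,
)

# Hold out 25% of the rows for testing; stratify keeps the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y,
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
    "Naive Bayes": GaussianNB(),
}

# Fit each model on the training rows and predict the held-out rows.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    print(name, predictions[name][:10])
```
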
  • After generating results from our model, we have to evaluate its accuracy, i.e. how close its predictions are to the actual results.
  • For performance evaluation, a confusion matrix shows how many records were predicted correctly and how many were predicted wrongly.
  • We can also evaluate the model with scores such as accuracy, precision, recall and F1. MSE and R-squared are regression metrics, so they are less suited to this anemic/not-anemic classification task.
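
The evaluation step can be sketched as below, again on synthetic data with a Random Forest chosen only as an example model. In scikit-learn's confusion matrix, rows are the actual classes and columns are the predicted classes, so the diagonal counts the correct predictions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data and a quick model, only so there is something to score.
X, y = make_classification(
    n_samples=300, n_features=5, n_informative=3,
    n_redundant=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y,
)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

# Rows: actual class (0, 1). Columns: predicted class (0, 1).
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
# Accuracy is the share of test records on the diagonal of the matrix.
acc = accuracy_score(y_test, y_pred)

print(cm)
print("accuracy:", acc)
# Precision, recall and F1 for each class.
print(classification_report(y_test, y_pred))
```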

Tech Stack That can be Used To Implement the Idea

Technologies and tools that we are going to use in our project:

a) Python

We are using the Python 3 programming language in our project because Python is simple as well as powerful. It is also the go-to language for machine learning projects and has robust library support for machine learning.

b) Anaconda/Google Colab

These provide a Jupyter notebook environment where we can easily run each cell and see its output right away. We will use Google Colab as it already has many of the required libraries installed.

c) Pandas

This is one of the most important libraries for data science applications. It is used for cleaning and preparing our dataset before feeding it to the machine learning model.

d) scikit-learn

It is a machine learning library containing many classification, regression and clustering algorithms. It also has a metrics module, which is used for checking the accuracy of the models.

e) Matplotlib

It is a library used to create various types of graphs for data analysis.

f) Seaborn

It is a library built on top of Matplotlib, used for creating many types of statistical graphs.


Malaria, one of the contributors to anemia, remains a major problem worldwide. By implementing our idea, we can predict anemia at an earlier stage and help save lives.




Raunak Mishra
