Making an AI that can Diagnose your Symptoms!

Aditya Dewan
Published in Analytics Vidhya
8 min read · May 19, 2021


Being sick sucks.

No, seriously: there’s nothing more frustrating than waking up one morning, sniffling around for 10 seconds, and then it hits you. You, my friend, have a cold. 🥺

Well…probably, anyway. Sure, you might have a cold — but, you can’t be 100% sure that you don’t have like, pneumonia or something.

Okay that was an extreme example — but, it proves the point. With the monster that is COVID, hospitals are packed, doctors overwhelmed, and people are stuck inside houses.

So, if it does turn out that an illness has taken your body hostage, there’s virtually no way for you to know. It’s just risky to go out in the first place!

And this doesn’t just affect developing countries. Over 46,000 Americans have died from a lack of access to doctors, whether because of faulty health insurance, physical barriers, or just fear.

That is insane. We’re living in an age where you can order pizza without getting off the couch, talk to someone through a bunch of pixels, and have virtually anything delivered to your doorstep in 24 hours.

So if we can use the internet to deliver products…why not healthcare? Why can’t we automate clinical diagnosis?

That is the question I asked myself — as I embarked on a month-long endeavour to solve this problem. The result? A webapp that can somewhat diagnose a patient (given 1–4 symptoms)!

So, without further ado — let’s get into it.

Uh…how do we start?

Before embarking on this quest, we first need a map. Otherwise, we’ll end up being stuck in a random desert where a cough is diagnosed as AIDS, and stomach-ache as a migraine. Not fun. 👀

In a traditional setting, this is how clinical diagnosis works:

If we’re going to emulate this process, we need a way to:

  1. Input the patient’s symptoms,
  2. Infer (predict) a potential disease (based on prior knowledge)
  3. And finally, combine all of this to provide a single diagnosis.

The first part is relatively simple — it’s just getting information from the user. The key challenges here are parts 2 and 3, because…well, we need to make sure that our predicting method actually works. Especially, if we were dealing with real patients!🙈

So…how do we tackle these problems?

The answer lies in AI: specifically, MLPs.

Using your brain on a computer to diagnose people.

Okay, maybe that was a slight exaggeration. Regardless, that’s what AI is at its core — loosely modelling the brain to find trends within data.

While I won’t be covering the fundamentals in too much depth (here’s an article for more info), there’s still some room for explanation.

MLPs (multi-layer perceptrons) are a type of neural network. As the name suggests, they’re built on top of small building blocks called perceptrons.

Trust me, they sound more complicated than they actually are. A perceptron is just a function — you give it some set of input values, and based on that, you receive an output.

So, a neural network is a bunch of these little functions stacked together, in a sort of monstrous, mega-function. The advantage? The complexity lets you go from 1x + 1 = 2, to diagnosing symptoms!
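To make the "a perceptron is just a function" idea concrete, here is a minimal sketch in plain Python. The weights, biases, and the ReLU activation are made-up illustrative choices, not values from this project's model.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return max(0.0, total)  # ReLU: negative sums become 0


def layer(inputs, weight_rows, biases):
    """A layer is several perceptrons looking at the same inputs."""
    return [perceptron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Two stacked layers: the outputs of the first feed the second.
# All numbers here are arbitrary, chosen only for illustration.
hidden = layer([1.0, 2.0], [[0.5, -1.0], [1.0, 1.0]], [0.0, -1.0])
output = layer(hidden, [[2.0, 0.5]], [0.1])
```

Stacking more calls to `layer` is all it takes to go from one little function to the "mega-function".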

And that is why we’re using MLPs. As far as neural networks go, they’re on the simpler side, but the payoff is speed and a natural fit for numerical data.

Cool! Now, how does it help us?

For this problem, we want to classify diseases based on symptoms. And since this is a (creatively named) classification problem, we can use the MLP to predict what disease the user most likely has. It’ll never be 100% sure (that’s still impossible), but it can get pretty close!

In total, there are 41 different diseases that we can predict — not too many, but a pretty good start!
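Here is a rough sketch of how a classifier can turn its raw output scores into one predicted disease plus a confidence. It assumes a softmax over the outputs, which is a common choice but not something confirmed for this project, and the three labels and scores are invented (the real model would have 41 outputs).

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores, labels):
    """Return the most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical labels and scores, for illustration only.
diseases = ["Common Cold", "Pneumonia", "Migraine"]
label, confidence = predict([2.0, 0.5, -1.0], diseases)
```

The confidence is always strictly below 1, which is the "never 100% sure" part in code form.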

So…we’re done?

Not exactly — MLPs need numerical data to work. And, as you’ve probably noticed by now, symptoms are strings (words).

We need a way to convert our users’ symptoms, to actual numbers.

This can be solved pretty simply — we just get a list of common symptoms, take its position n, and encode it as n-1. So, the first symptom would be converted to 0, the second 1, the third 2 — you get the point.

What this means, is that we take a list of symptoms (all_symptoms):

['itching', 'skin_rash', 'continous_sneezing']

And turn it into a list of numbers:

[0, 1, 2]

Once we’ve gotten our output, we can just convert the predicted disease to a string, using a list called all_diseases, which contains…all diseases. This way, our MLP can work entirely in numbers!
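A minimal sketch of that round trip, assuming plain Python lists. The names `all_symptoms` and `all_diseases` come from the article, while the helper functions and the two disease names are hypothetical stand-ins.

```python
# Toy lists; the real ones are built from the dataset.
all_symptoms = ["itching", "skin_rash", "continous_sneezing"]
all_diseases = ["Fungal infection", "Allergy"]  # illustrative entries


def encode_symptoms(symptoms):
    """Map each symptom string to its position in all_symptoms."""
    return [all_symptoms.index(s) for s in symptoms]


def decode_disease(index):
    """Map the model's predicted class index back to a disease name."""
    return all_diseases[index]


encoded = encode_symptoms(["itching", "skin_rash", "continous_sneezing"])
name = decode_disease(1)
```

Note that `list.index` raises a `ValueError` for a symptom that is not in the list, so a real app would want to validate user input first.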

But, there’s (another) problem. MLPs are really picky, meaning that you can’t input different-sized data. As of now, we have 4 symptoms as the limit — though, not every patient will have 4 symptoms. What if you just have a cough? A mild rash? A fever? We can’t force a patient to make up symptoms!

This means that we need to make a slight change — something that lets the user leave a symptom blank, but still doesn’t change the size of the data inputted.

And it turns out that we can! We just add a “nan” entry to the front of our symptom list, so that it encodes to 0 and every real symptom shifts up by one. Whenever the patient leaves a slot blank, we fill it with that 0. Over time, our network learns to ignore these zeros as meaningless noise, excluding them completely from the end diagnosis.

So this:

['itching', 'skin_rash', 'continous_sneezing', (blank)]

Becomes this:

[1, 2, 3, 0]

Problem solved!
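The padding trick can be sketched like this. The function name `encode_padded` and the constant are hypothetical, and the symptom list is a toy version with "nan" placed at index 0.

```python
MAX_SYMPTOMS = 4  # the app accepts at most four symptoms

# "nan" sits at index 0, so every real symptom shifts up by one.
all_symptoms = ["nan", "itching", "skin_rash", "continous_sneezing"]


def encode_padded(symptoms):
    """Encode up to four symptoms, filling empty slots with "nan" (0)."""
    if len(symptoms) > MAX_SYMPTOMS:
        raise ValueError("at most 4 symptoms are supported")
    padded = list(symptoms) + ["nan"] * (MAX_SYMPTOMS - len(symptoms))
    return [all_symptoms.index(s) for s in padded]


encoded = encode_padded(["itching", "skin_rash", "continous_sneezing"])
```

Whatever the patient enters, the output always has exactly four numbers, which is what the network needs.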

Ultimately, our model has one input layer, one output layer, and three hidden layers (where all of the processing magic happens).

There’s also a dropout layer, letting us turn parts of the network on and off for better training. Check out our diagram below:

(The hidden layers are actually 1000 and 600 nodes, but, that would be a bit too large to show in a diagram)
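For the curious, here is roughly what a dropout layer does, as a plain-Python sketch. It uses the common "inverted dropout" formulation, and the 0.5 rate is an assumption, since the article does not state the value used.

```python
import random


def dropout(activations, rate, training, rng=random):
    """Inverted dropout: during training, zero each value with probability
    `rate` and scale the survivors so the expected total is unchanged."""
    if not training or rate == 0.0:
        return list(activations)  # at prediction time nothing is dropped
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
hidden = [0.2, 1.5, 0.7, 3.0]
train_out = dropout(hidden, rate=0.5, training=True, rng=rng)
eval_out = dropout(hidden, rate=0.5, training=False)
```

Randomly silencing nodes during training keeps the network from leaning too hard on any single one.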

Boom! We’ve just created a pretty solid network. But, that’s not all.

Right now, our network is basically just a shell. It knows nothing — so, if someone were to actually use it, they would just get a bunch of random diseases. To actually get it to learn, we need to train the model.

Basically, give it a ton of data and allow it to make inferences!

Sending our model to school 🏫

You can’t drive a car without wheels. And, you can’t train a model without data.

For this project, we’re using a dataset from Kaggle, with existing classes of symptoms and diseases for us to draw from! To make our symptom and disease lists, we need to find all the different (unique) symptoms and diseases in this dataset.

How do we do this? Using Pandas!

(Before you ask: no, not actual pandas. I mean the Python data analysis library.)

Here are the steps we can take to get us to the end lists:

#1 — Shorten the dataset.

There are way more columns in here than the average person would care about. Plus, nearly all the columns after symptom #4 are blank, so we’ll just use the first four symptom columns. By the way, that’s why our patient has a limit of 4.
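As a sketch of this step, here is the same idea with Python's built-in `csv` module on a tiny made-up file (in pandas it would be a column selection on the DataFrame). The column names `Disease` and `Symptom_1` through `Symptom_5` are assumptions about the dataset's layout.

```python
import csv
import io

# Tiny stand-in for the Kaggle CSV; the column names are assumptions.
raw = io.StringIO(
    "Disease,Symptom_1,Symptom_2,Symptom_3,Symptom_4,Symptom_5\n"
    "Fungal infection,itching,skin_rash,,,\n"
    "Allergy,continous_sneezing,shivering,chills,,\n"
)

KEEP = ["Disease", "Symptom_1", "Symptom_2", "Symptom_3", "Symptom_4"]

# Keep only the label and the first four symptom columns of every row.
rows = [{col: row[col] for col in KEEP} for row in csv.DictReader(raw)]
```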

#2 — Split into training/testing data.

After we’ve shortened the dataset, we need to split it into data that the model can train on, and data we can use to test it! We can do this with scikit-learn (thank you for existing), and then convert all the dataset values to strings (since they’re currently objects).
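To show what a train/test split actually does, here is a plain-Python stand-in for scikit-learn's `train_test_split`. The helper `split_rows`, the 80/20 ratio, and the seed are all assumptions made for illustration.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows, then cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Ten fake (disease, symptom) rows, just to have something to split.
rows = [("disease_%d" % i, "symptom_%d" % i) for i in range(10)]
train, test = split_rows(rows)
```

The model only ever sees `train`, so its score on `test` tells us how well it handles patients it has never met.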

#3 — Make a (unique) list of all diseases and symptoms.

Now, we can use the .unique method to grab all of the different…well, symptoms and diseases. After adding in the nan value for no symptoms (we talked about this earlier), we’re good to go!
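Here is a sketch of building the two master lists, using a small helper in place of the pandas `.unique` method (which likewise returns values in order of first appearance). The helper name and the toy cells are hypothetical.

```python
def unique_in_order(values):
    """Return each distinct value once, in order of first appearance."""
    return list(dict.fromkeys(values))


# Toy symptom cells; empty strings stand for blank cells in the dataset.
symptom_cells = ["itching", "skin_rash", "", "itching", "chills", ""]
disease_cells = ["Fungal infection", "Allergy", "Fungal infection"]

# "nan" goes first so that a blank slot always encodes to 0.
all_symptoms = ["nan"] + unique_in_order(s for s in symptom_cells if s)
all_diseases = unique_in_order(disease_cells)
```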

#4 — Convert said list to integers.

This is simple: we remove all underscores and unnecessary spaces, and then find the position of each disease or symptom inside our master list. So, if I input cough, and cough is second inside the list (not including nan), it’ll be converted to 2. This is combined in a function called data_to_index.
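One possible shape for `data_to_index` is sketched below. Only the function name comes from the article. The body, the `normalize` helper, and the choice to turn underscores into spaces are guesses at the behaviour described.

```python
def normalize(text):
    """Replace underscores with spaces and collapse stray whitespace."""
    return " ".join(text.replace("_", " ").split())


def data_to_index(values, master_list):
    """Convert strings to their positions in master_list after cleaning."""
    cleaned_master = [normalize(m) for m in master_list]
    return [cleaned_master.index(normalize(v)) for v in values]


# Toy master list with "nan" at position 0 and cough second after it.
all_symptoms = ["nan", "itching", "cough", "skin_rash"]
indices = data_to_index([" cough", "skin rash ", "nan"], all_symptoms)
```

Cleaning both sides before the lookup means " cough" and "skin rash " still match their entries in the master list.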

And we’re good to go!

Now, we just need to train our model. We’ll be using the Adam optimizer for this (helping us improve our network’s accuracy), and our MLP ends up with an accuracy of over 92%!
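If you're wondering what Adam does under the hood, here is its update rule applied to a toy one-parameter problem. This is a sketch of the standard algorithm, not this project's training loop, and the learning rate and step count are arbitrary choices for the toy.

```python
import math


def adam_minimize(grad, x, steps=500, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Minimize a one-parameter function with the Adam update rule."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g        # running mean of gradients
        v = b2 * v + (1 - b2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - b1 ** t)        # bias correction for early steps
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy loss (x - 3)^2, whose gradient is 2 * (x - 3); the minimum is at x = 3.
best_x = adam_minimize(lambda x: 2 * (x - 3), x=0.0)
```

In the real model the same kind of update is applied to every weight in the network, with the gradient coming from the classification loss.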

Not bad for an AI, eh?

But, we can do better. We don’t want our bot to live the rest of its sad days on someone’s PC — we actually need to get the AI out into the world!

And that, is what we’re doing in the last step — deploying the model in a web app.

Hello, World!

All things considered, deploying is actually one of the harder parts of ML (yes, including the crazy multivariate calculus 😅). It’s one thing to have a model work in isolation — and another entirely, to get it working in real life.

Here’s a quick diagram on the structure of our webapp:

First, we’re going to add some HTML and CSS. This is going to take in the input, and display the final prediction — so you know, the user actually sees something (kind of important).

Next, we’re going to use Flask to set up a Python-based app. This is going to be the meat of our code sandwich: taking the input from the actual page, applying preprocessing, and then running the model on top of everything! It ends off by displaying the prediction on the app with HTML.

Finally, we actually need somewhere to host our app, and Heroku is perfect for this! We can just upload all the code onto a GitHub branch, and then deploy our app!
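The logic behind such a request handler might look something like the sketch below. It is written as a plain function so it runs without Flask installed. In a Flask view, `form` would be `request.form`, and the returned string would be passed to the HTML template. The field names, the toy lists, and `fake_model` are all hypothetical placeholders.

```python
# Hypothetical stand-ins: the real app would load the trained MLP and
# the master lists built from the dataset.
all_symptoms = ["nan", "itching", "skin_rash", "cough"]
all_diseases = ["Fungal infection", "Common Cold"]


def fake_model(encoded):
    """Placeholder for the trained network: returns a class index."""
    return 1 if all_symptoms.index("cough") in encoded else 0


def diagnose(form, model=fake_model):
    """Turn submitted form fields into a disease name."""
    fields = ["symptom1", "symptom2", "symptom3", "symptom4"]
    # Blank or missing fields become "nan", which encodes to 0.
    symptoms = [form.get(f, "").strip() or "nan" for f in fields]
    encoded = [all_symptoms.index(s) for s in symptoms]
    return all_diseases[model(encoded)]


result = diagnose({"symptom1": "cough", "symptom2": ""})
```

Keeping the preprocessing and prediction in one function like this also makes it easy to test without spinning up a web server.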

And, we’re done! Next steps would be to a) retrain the model, b) make the app look prettier, and c) test it in the real world!

Ultimately, this project is a testament.

A testament that we do have the power to solve the world’s biggest problems.

No matter who you are or where you’re from, you have the power to make an impact.

You can make a dent in our demons.

So — let’s take that first step.

Aditya Dewan

That’s pretty much it from me. I hope you guys liked this, and don’t forget to follow for more :D

See you guys next time! 😉

Code for this article (main branch) and webapp (WebApp branch):

My monthly newsletter:


