Analytics Vidhya

Analytics Vidhya is a community of Generative AI and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com


Dengue Fever Prediction

Jack Ross
Published in Analytics Vidhya
5 min read · Mar 6, 2020


Dengue viruses are spread to people through the bite of an infected Aedes species (Ae. aegypti or Ae. albopictus) mosquito. Dengue is common in more than 100 countries around the world. Forty percent of the world’s population, about 3 billion people, live in areas with a risk of dengue. Dengue is often a leading cause of illness in these areas.

I’ll be using data from San Juan, Puerto Rico and Iquitos, Peru to predict the total number of dengue fever cases for each week. Let’s start by looking at the total cases of dengue plotted over time.

As we can see above, we have 18 years’ worth of data for San Juan (1990–2007) but only 10 years for Iquitos (2000–2009). To handle this imbalance, I first held out a validation set from the training data (to avoid leakage) and then split the data into 2 groups based on which city each row belonged to. It’s also hard to see any seasonal pattern in the plot above, so I engineered a “month” feature to get a better understanding of when infections are most likely to occur. This turned out to be the most important feature in the San Juan data, as shown in the plot further down.
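A minimal sketch of the month feature and the per-city split might look like the following. It is an illustration only, not the code used for this article: the column names (`city`, `week_start_date`, `total_cases`), the city codes (`sj`, `iq`), the helper names, and the toy rows are all assumptions, and the validation hold-out step is left out.

```python
import pandas as pd

# Hypothetical toy rows; column names and city codes are assumptions.
df = pd.DataFrame({
    "city": ["sj", "sj", "sj", "iq", "iq", "iq"],
    "week_start_date": ["1990-04-30", "1990-07-09", "1990-11-12",
                        "2000-07-01", "2000-08-12", "2000-12-02"],
    "total_cases": [4, 12, 30, 0, 3, 9],
})


def add_month(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with an integer month column (1-12) taken from the week start date."""
    out = frame.copy()
    out["month"] = pd.to_datetime(out["week_start_date"]).dt.month
    return out


def split_by_city(frame: pd.DataFrame) -> dict:
    """Return one DataFrame per city, keyed by the value in the city column."""
    return {city: group.reset_index(drop=True)
            for city, group in frame.groupby("city")}


by_city = split_by_city(add_month(df))
sj, iq = by_city["sj"], by_city["iq"]
print(sj[["week_start_date", "month", "total_cases"]])
```

Keeping each city in its own frame means a separate model can be fit per city, so the longer San Juan history does not drown out the shorter Iquitos series.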

The evaluation metric I’ll be using for my models is mean absolute error (MAE).

I chose MAE because it penalizes outlier values less harshly than metrics like mean squared error (MSE), which squares each error before averaging. This matters because the plot above shows large spikes in the number of infections, and we want to anticipate these spikes as well as possible.
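To make the difference between the two metrics concrete, here is a small worked sketch in plain Python. The weekly case counts and forecasts are made up for illustration; the last week stands in for an outbreak spike that the forecast underestimates.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all weeks."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (actual - predicted)^2 over all weeks."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical weekly case counts; the final week is a spike.
actual = [10, 15, 20, 25, 60]
predicted = [15, 10, 25, 20, 40]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)

# Share of the total penalty that comes from the spike week alone.
spike_error = abs(actual[-1] - predicted[-1])
mae_share = spike_error / (mae * len(actual))
mse_share = spike_error ** 2 / (mse * len(actual))

print(f"MAE={mae}, MSE={mse}")
print(f"spike share of MAE={mae_share:.0%}, of MSE={mse_share:.0%}")
```

In this toy example the spike week accounts for half of the MAE but 80% of the MSE, which is what "penalizing outliers less harshly" means in practice.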

This plot gives us a better understanding of our data and shows that infections in San Juan start to rise around July and decline starting in November, while in Iquitos we see infections begin to rise in August…
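A seasonal view like this one can be built by averaging weekly cases per calendar month for each city. The sketch below assumes a `month` column already exists alongside `city` and `total_cases` (all assumed names), and the numbers are toy values rather than the real dataset.

```python
import pandas as pd

# Hypothetical toy data; column names, city codes, and values are assumptions.
df = pd.DataFrame({
    "city": ["sj"] * 4 + ["iq"] * 4,
    "month": [6, 7, 7, 11, 7, 8, 8, 12],
    "total_cases": [5, 10, 20, 40, 1, 4, 6, 9],
})

# Mean weekly cases per (city, month), pivoted so each city is a column.
monthly = (
    df.groupby(["city", "month"])["total_cases"]
      .mean()
      .unstack("city")
)

print(monthly)
# monthly.plot() would draw one line per city (requires matplotlib).
```

Months with no rows for a city show up as NaN in the pivoted table, which is worth checking for before reading too much into the shape of either curve.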



