Chicago has recently had trouble controlling the spread of West Nile Virus. In response, traps were set up to capture mosquitos. These traps are checked frequently to see whether any of the captured mosquitos carry the virus. Knowing which traps are more likely to contain West Nile is important because it shows the city where spraying to eradicate mosquitos will do the most good. Spraying can have adverse effects on the environment and is also very expensive, so it is necessary to target specific areas at specific times.
The chart above clearly shows the time of year when mosquito populations are largest. This is not at all surprising, but it does confirm common beliefs.
The chart below shows the number of mosquitos, grouped by species, found in traps that contained West Nile Virus. Volume is not the only thing that matters: the percentage of mosquitos of each species that carry the virus is important as well. Knowing which species are more likely to carry the virus will be useful if the species tend to live in different areas.
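The difference between volume and rate by species can be made concrete with a small tally. This is only a sketch under stated assumptions: the records, the species labels, and the `(species, mosquitos_caught, virus_detected)` layout are invented for illustration and are not the project's data or code. It assumes each record is one trap check, so the rate is the share of checks that tested positive rather than a per-mosquito infection rate.

```python
from collections import defaultdict

# Hypothetical trap checks: (species, mosquitos_caught, virus_detected).
# All values are invented for illustration.
records = [
    ("species_a", 10, True),
    ("species_a", 30, False),
    ("species_b", 5, False),
    ("species_b", 15, False),
    ("species_c", 20, True),
    ("species_c", 20, True),
]


def positive_rate_by_species(rows):
    """Return {species: (total mosquitos, share of trap checks with virus)}."""
    totals = defaultdict(int)     # mosquitos caught per species
    checks = defaultdict(int)     # number of trap checks per species
    positives = defaultdict(int)  # trap checks where the virus was detected
    for species, count, virus in rows:
        totals[species] += count
        checks[species] += 1
        positives[species] += int(virus)
    return {s: (totals[s], positives[s] / checks[s]) for s in totals}


summary = positive_rate_by_species(records)
for species, (total, rate) in sorted(summary.items()):
    print(f"{species}: {total} mosquitos, {rate:.0%} of checks positive")
```

In this made-up data, two species have the same total volume but very different positive rates, which is the distinction the paragraph above draws.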
The major features of this dataset, however, are the weather conditions. Time of sunrise and sunset, wind speed, temperature, precipitation, dew point, and many other weather-related factors tell us a lot about the prevalence of the virus. These weather features, along with others such as land area, contributed significantly to our models.
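As one example of turning raw weather fields into a model input, sunrise and sunset can be combined into a single day-length value. This is a minimal sketch, not the feature pipeline actually used here: it assumes the times arrive as four-character `HHMM` strings, and the function name `day_length_minutes` is hypothetical.

```python
def day_length_minutes(sunrise, sunset):
    """Minutes of daylight between two 'HHMM' strings on the same day."""
    def to_minutes(hhmm):
        # '0430' -> 4 * 60 + 30 = 270 minutes after midnight
        return int(hhmm[:2]) * 60 + int(hhmm[2:])

    return to_minutes(sunset) - to_minutes(sunrise)


# Assumed example values, not taken from the dataset.
length = day_length_minutes("0430", "1930")
print(length)
```

A single numeric column like this is easier for tree-based models to split on than two separate clock times.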
The models were built using gradient boosted trees as well as random forests. With them, I was able to achieve an accuracy score of 0.69 and an AUC score of 0.72.
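For readers unfamiliar with the two metrics, the sketch below shows how each is computed from a model's outputs. The labels and scores are invented for illustration and are unrelated to the results above. The 0.5 threshold is also an assumption. Accuracy is the share of thresholded predictions that match the labels. AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC by comparing every positive/negative pair of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    # A pair counts 1 if the positive outranks the negative, 0.5 on a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = virus present) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.45, 0.70]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy={acc:.2f}, AUC={auc:.2f}")
```

Because AUC depends only on how the scores are ranked, it does not change with the choice of threshold, whereas accuracy does. This is why the two numbers are usually reported together.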
The final visual below is a heat map showing the zip codes for which we predict the presence of the virus. The darker the area, the greater the predicted chance of West Nile Virus occurring.