Case Study: The Summery Nuisance — A Box-Plot Analysis

Manali Shinde
One Datum At A Time
Feb 5, 2018

Hello Readers!

I wanted to present a really simple way of analysing a small dataset (only about 45 rows in Microsoft Excel) in Python 3. A box plot is a great, visually appealing way of comparing a numerical variable across two groups in one dataset. It lets you view all the data at a glance, including the median, the quartiles, and the range. I hope you find this analysis fun and perhaps even insightful! Let’s begin.

Executive Summary

The study used in this analysis was conducted by Lefèvre, T., et al. and published in the scientific journal PLoS ONE. In the paper, titled “Beer Consumption Increases Human Attractiveness to Malaria Mosquitoes”, the researchers tested whether drinking beer, compared with drinking water, increases a person’s attractiveness to mosquitos.

Researchers gave one group of (human) subjects beer to drink, while the others drank water. The subjects were then taken into a controlled environment, and mosquitos were released to measure how many were attracted to the individuals who drank beer and how many to those who drank water. These results are relevant in the summer, when both mosquito season and outdoor BBQ season are at their peak.

A full summary of the article and the experiment can be found in John Rauser’s keynote at the Strata Conference. Rauser raises an important question: when, and how, can we conclude that the results we find are significant? In this analysis, we cannot simply assume that the conclusion the data points to is significant, even though the data does point to one particular conclusion. Instead, we will use the mean, median, and standard deviation to see which group of individuals in this experiment mosquitos were more likely to target.

Libraries

For this analysis, I used a few different Python libraries. I started with pandas for simple DataFrame manipulation; since I am working with numerical data, it offers useful data structures and operations for handling it.

The next important libraries I imported were Seaborn and Matplotlib. Both can plot data in many different ways. However, for the purpose of this analysis, I found Seaborn not only simpler to use but also more visually appealing. For now, I was having issues with my Matplotlib code, but I’m sure that is subject to change in due time.

The final library that I found to be most important was the statistics module. Importing statistics gave me access to Python’s built-in statistics library, which lets you calculate the mean, median, and standard deviation quickly.
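To show what the statistics module offers, here is a minimal sketch of a helper that returns the three measures used in this analysis. The function name summarize and the input list are illustrative only, not the study’s data. Note that statistics.stdev() computes the sample standard deviation; statistics.pstdev() would give the population version.

```python
import statistics


def summarize(values):
    """Return the mean, median, and sample standard deviation of a list of counts."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # stdev() is the *sample* standard deviation (divides by n - 1)
        "stdev": statistics.stdev(values),
    }


# Illustrative numbers only, not the mosquito data
print(summarize([1, 2, 3, 4]))
```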

Lastly, I used pandas to read the data, which was stored in a CSV file, and proceeded with my analysis.
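Loading the data might look like the sketch below. The helper name load_mosquito_data is hypothetical, and the column names Response and Treatment are assumed from the variables described in the next section; adjust them to match your own file.

```python
import pandas as pd


def load_mosquito_data(path):
    """Read the CSV and check that the two expected columns are present.

    The column names "Response" and "Treatment" are assumptions, not confirmed.
    """
    df = pd.read_csv(path)
    missing = {"Response", "Treatment"} - set(df.columns)
    if missing:
        raise ValueError(f"CSV is missing expected column(s): {sorted(missing)}")
    return df
```

For example, `df = load_mosquito_data("mosquito_data.csv")` (a hypothetical file name) followed by `df.head()` shows the first five rows, and `df["Treatment"].value_counts()` shows how many subjects are in each group.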

The Data at a Glance and Using Seaborn

Table 1: The Data Head and Seaborn Code

The data contained two variables: Response and Treatment. Overall, there were 44 test subjects, each given either a Beer or a Water treatment (at random). Mosquito attraction was then measured by how many mosquitos would fly to and bite individuals in the Beer group or the Water group.

Using Seaborn, I first called sns.set() to style the box plot so that it shows tick marks. Then I used sns.boxplot() to create a box and whisker plot of the data, with Treatment on the x-axis and Response on the y-axis. The great thing about Seaborn is that you can make your graph visually appealing by adding a colour palette.
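A minimal sketch of this plotting step might look like the code below. It assumes a DataFrame with the Response and Treatment columns described above; the function name, the "Set2" palette, and the y-axis label are my own choices, not necessarily the ones used to produce Table 2. It calls sns.set_theme(style="ticks"), the newer name for sns.set(). Recent Seaborn versions may print a FutureWarning about passing palette without hue, but the plot is still drawn.

```python
import matplotlib.pyplot as plt
import seaborn as sns


def plot_treatment_boxplot(df, palette="Set2"):
    """Draw one box per treatment group: Treatment on the x-axis, Response on the y-axis."""
    # set_theme() is the newer name for sns.set(); "ticks" adds tick marks to the axes
    sns.set_theme(style="ticks")
    fig, ax = plt.subplots(figsize=(6, 4))
    sns.boxplot(x="Treatment", y="Response", data=df, palette=palette, ax=ax)
    ax.set_xlabel("Treatment")
    ax.set_ylabel("Mosquitos attracted")
    return fig, ax
```

For example, `fig, ax = plot_treatment_boxplot(df)` followed by `fig.savefig("boxplot.png")` writes the chart to a file; in a Jupyter notebook the figure also displays on its own.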

The Comparison

Table 2: Mosquito Response Based on Human Liquid Intake

This graph compares the individuals who received the Beer treatment with those who received the Water treatment. The responses range from 12 to 30 mosquitos, and the box and whisker plot shows that individuals who had the beer treatment were more attractive to mosquitos than individuals who had the water treatment.

Although we cannot assume that these results are significant, Table 2 shows that the median of the beer group, 24 mosquitos, lies at about the upper quartile of the water group. Meanwhile, the median of the water group, 20 mosquitos, lies at about the lower quartile of the beer group. Since only a quarter of a group’s values fall below its lower quartile, roughly three quarters of the beer drinkers attracted at least as many mosquitos as the median water drinker. In other words, more mosquitos targeted the beer group. Therefore, if we interpret this plot alone, it supports the argument that drinking beer increases the number of mosquitos attracted to an individual.
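To check this reading of the box plot numerically, a sketch like the following computes each group’s quartiles with the standard library (statistics.quantiles needs Python 3.8 or later). The helper names are hypothetical. Using method="inclusive" should reproduce the box edges and median that matplotlib draws, but min and max here are the raw data extremes, not the whiskers, which by default extend only to the furthest data points within 1.5 times the interquartile range of the box.

```python
import statistics


def box_summary(values):
    """Five-number summary: min, lower quartile, median, upper quartile, max.

    method="inclusive" interpolates linearly between data points, which should
    match NumPy's default percentile method used by matplotlib's box plots.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


def compare_groups(beer, water):
    """Return both summaries plus the two comparisons discussed in the text."""
    b, w = box_summary(beer), box_summary(water)
    return {
        "beer": b,
        "water": w,
        # Does the beer median reach the water group's upper quartile?
        "beer_median_at_or_above_water_q3": b["median"] >= w["q3"],
        # Does the beer lower quartile reach the water median? If so, about 75%
        # of beer drinkers attracted at least as many mosquitos as the median
        # water drinker.
        "beer_q1_at_or_above_water_median": b["q1"] >= w["median"],
    }
```

With the data in a pandas DataFrame df, the inputs could be built as `df.loc[df["Treatment"] == "Beer", "Response"].tolist()` and likewise for "Water"; the exact spelling of the group labels is an assumption.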

Statistical Measurements

These statistical measurements match the patterns we see in Table 2. On average, around 24 mosquitos were attracted to beer drinkers, while the average number of mosquitos attracted to water drinkers was 19. As we saw in the box and whisker plot, the medians are 24 and 20 for the beer and water treatments, respectively. The difference between the groups, about 4 to 5 mosquitos, is roughly the size of the standard deviation, which is about 4 mosquitos for both treatment groups.
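The same measures can be computed per group in one step with pandas. This sketch assumes the Response and Treatment columns shown in Table 1 and group labels spelled "Beer" and "Water"; both functions are hypothetical helpers. Like statistics.stdev(), pandas’ std() computes the sample standard deviation (ddof=1).

```python
import pandas as pd


def group_statistics(df):
    """Count, mean, median, and sample standard deviation of Response per Treatment."""
    return df.groupby("Treatment")["Response"].agg(["count", "mean", "median", "std"])


def difference_between_groups(stats, high="Beer", low="Water"):
    """Gap in mean and median between two groups, alongside each group's SD."""
    return {
        "mean_diff": stats.loc[high, "mean"] - stats.loc[low, "mean"],
        "median_diff": stats.loc[high, "median"] - stats.loc[low, "median"],
        "sd_high": stats.loc[high, "std"],
        "sd_low": stats.loc[low, "std"],
    }
```

Comparing mean_diff and median_diff with the two standard deviations is a quick way to see whether the gap between the groups is large relative to the spread within each group.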

Additional Questions That Can be Asked

One of the most important questions to ask is whether these results are significant: do they mean something? One way to answer this is to randomly shuffle the sample labels many times, recompute the difference between the groups for each shuffle, and plot the results. For the purpose of this case study, we will not be diving into random shuffling, although it will be done for future data. For a full random-shuffling result, you can watch the Strata Conference video, in which the results were found to be significant after 50,000 iterations.
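Shuffling the labels in this way is known as a permutation test. Below is a minimal sketch using only Python’s random module. The function name, the use of the difference in means as the test statistic, and the one-sided p-value are my assumptions about how to set it up, not a reproduction of the code from the keynote.

```python
import random


def permutation_test(group_a, group_b, iterations=50_000, seed=None):
    """One-sided permutation test for mean(group_a) - mean(group_b).

    Shuffling the pooled values is equivalent to shuffling the treatment
    labels. Each shuffle is split back into groups of the original sizes, and
    we count how often the shuffled difference is at least the observed one.
    """
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(iterations):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        # Small tolerance guards against floating-point rounding when a
        # shuffle happens to reproduce the observed split
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    # Fraction of shuffles at least as extreme as the real data (the p-value)
    return observed, at_least_as_extreme / iterations
```

For example, `observed, p_value = permutation_test(beer, water, seed=42)`, where beer and water are lists of Response values for each group. A small p_value means that very few random relabellings produced a difference as large as the one actually observed.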

While this experiment took place in a controlled indoor environment, multiple external factors affect mosquito behaviour. For instance, how would mosquitos behave outdoors? It would be worthwhile to observe mosquitos in a controlled outdoor environment where there is a mix of beer and water individuals. Observing mosquito behaviour outdoors, with the subjects in a mixed group, may give us insight into what would occur in a typical summer situation.

Also, as we saw, mosquitos seem to be more attracted to people who drank beer. However, it should be explored whether these insects are attracted to the alcohol content or to specific ingredients in beer. Additional experiments could measure mosquito attraction to people who drank wine, whiskey, or similar beverages and compare the results with the beer and water subjects to determine what exactly is attracting mosquitos.

Conclusion

In conclusion, this very simple analysis gave us a little insight into mosquito behaviour. According to Table 2, beer intake does seem to increase mosquito attraction, but additional research could test other beverages and check whether this result is significant. Perhaps next time you have a backyard BBQ, it could be worthwhile to compare who has the most mosquito bites: beer drinkers or non-beer drinkers.

Hope you found this case study interesting! Look out for similar case studies in the near future. Thank you for reading, and be sure to let me know what you think!

A health informatician and aspiring health data analyst. I am a photographer, writer, dancer, and public health advocate. Join me on my journey!