What Fruits Are Bad?

Graham Albers
INST414: Data Science Techniques
Oct 12, 2023

Most people think of fruit as a healthy snack, but is that really the case? Many people do not realize how much sugar a serving of fruit contains. That sugar can affect a person's health, including their blood pressure, insulin levels, liver, and weight. The goal here is to figure out which fruits have a sugar-to-calorie percentage above 20%. Any fruit over 20% is identified as a high-sugar fruit, with a surplus of sugar compared to other fruits.
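As a rough sketch of the metric, the snippet below assumes the sugar-to-calorie percentage is grams of sugar divided by calories, times 100. That reading is an assumption, although it is consistent with the percentages reported later in this post. The function names and the sample serving are made up for illustration.

```python
HIGH_SUGAR_THRESHOLD = 20.0  # percent, the cutoff used in this article


def sugar_to_calorie_pct(sugar_g: float, calories: float) -> float:
    """Grams of sugar per calorie, as a percentage (assumed formula)."""
    if calories <= 0:
        raise ValueError("calories must be positive")
    return sugar_g / calories * 100


def is_high_sugar(sugar_g: float, calories: float) -> bool:
    """True when the percentage is strictly above the 20% cutoff."""
    return sugar_to_calorie_pct(sugar_g, calories) > HIGH_SUGAR_THRESHOLD


# Illustrative serving: 6.4 g of sugar and 21 calories (assumed values).
print(round(sugar_to_calorie_pct(6.4, 21), 2))
print(is_high_sugar(6.4, 21))
```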

To find data on fruits, I went through a free list of public APIs on GitHub. From that list, the website fruityvice.com provides a free API with about 40 different fruits. Each record in the data set contains the name of the fruit, its ID, its family, its genus, its order, and its nutrition facts. I decided to parse the response as JSON and then load it into a pandas data frame. All of this was done in Python using the requests, json, and pandas libraries. I then created a second data frame containing only the name and nutrition categories for each fruit.
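The loading step could look roughly like the sketch below. It uses only the standard library (`urllib` and `json`) instead of requests and pandas so that it runs without extra installs. The endpoint URL and the `nutritions` key are assumptions about the Fruityvice response, and the sample record is hand-written rather than real API output.

```python
import json
from urllib.request import urlopen

# Assumed endpoint; check the Fruityvice documentation before relying on it.
FRUITYVICE_URL = "https://www.fruityvice.com/api/fruit/all"


def fetch_fruits(url: str = FRUITYVICE_URL) -> list[dict]:
    """Download the full fruit list and parse the JSON body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def flatten_fruit(record: dict) -> dict:
    """Keep only the name and the nutrition fields of one fruit record."""
    # "nutritions" is the assumed key that holds the nutrition facts.
    return {"name": record["name"], **record["nutritions"]}


# Hand-written record in the assumed response shape (not real API output).
sample = {
    "name": "ExampleFruit",
    "id": 0,
    "family": "ExampleFamily",
    "genus": "ExampleGenus",
    "order": "ExampleOrder",
    "nutritions": {"calories": 50, "sugar": 10.0},
}
rows = [flatten_fruit(sample)]
print(rows)
```

To get a pandas data frame like the one described above, passing a list such as `[flatten_fruit(f) for f in fetch_fruits()]` to `pd.DataFrame` should be enough.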

In the graph pictured above, the nodes are blue circles labeled with the names of the fruits, while the edges are the pink lines that connect the fruits that have a sugar-to-calorie percentage above 20%. The graph was made with the NetworkX library in Python. The matplotlib and NumPy libraries were also used to help work with the data. For a node to be considered important in this case, it must have a sugar-to-calorie percentage over 20%.
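The post does not spell out exactly how the edges were formed, so the sketch below assumes one plausible rule: every pair of fruits above the 20% cutoff gets an edge. The fruit names and percentages are hypothetical, and plain Python lists stand in for the NetworkX graph.

```python
from itertools import combinations

# Hypothetical percentages for illustration only (not from the dataset).
percentages = {"FruitA": 25.0, "FruitB": 22.0, "FruitC": 12.0, "FruitD": 21.0}
THRESHOLD = 20.0

# Every fruit is a node, whether or not it passes the cutoff.
nodes = sorted(percentages)

# Fruits strictly above the cutoff.
high_sugar = [name for name in nodes if percentages[name] > THRESHOLD]

# Assumed edge rule: connect every pair of high-sugar fruits.
edges = list(combinations(high_sugar, 2))

print(edges)
```

With NetworkX, the same lists could be loaded into an `nx.Graph()` through `add_nodes_from(nodes)` and `add_edges_from(edges)`, then drawn with `nx.draw`.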

The three most important nodes are green apples, melons, and grapes, since they have the highest sugar-to-calorie percentages: 30.48%, 23.53%, and 23.19%, respectively. We can see this from the code provided above, where each fruit and its sugar-to-calorie percentage are added to a dictionary and the three most important nodes are then displayed. To clean the data, I kept only the features that were necessary for the analysis: the name of each fruit and its nutrition facts. When the edges were created, every fruit was accounted for, since a percentage was calculated for each individual fruit from its own calorie and sugar counts.
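A minimal sketch of the dictionary-and-ranking step is shown below. The (sugar, calories) pairs are illustrative values chosen so that the first three entries reproduce the percentages quoted above. They are not pulled from the API, and `ExampleFruit` is a made-up low-sugar entry.

```python
# Illustrative (sugar in grams, calories) pairs; see the note above.
nutrition = {
    "Green Apple": (6.4, 21),
    "Melon": (8.0, 34),
    "Grape": (16.0, 69),
    "ExampleFruit": (5.0, 50),  # hypothetical low-sugar entry
}

# Map each fruit to its sugar-to-calorie percentage, rounded to two places.
percentages = {
    name: round(sugar / calories * 100, 2)
    for name, (sugar, calories) in nutrition.items()
}

# Sort by percentage, highest first, and keep the top three.
top_three = sorted(percentages.items(), key=lambda item: item[1], reverse=True)[:3]

for name, pct in top_three:
    print(f"{name}: {pct}%")
```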

The data had some limitations. The data set only had 44 fruits in it, and as you may know, there are many more fruits than that, which leaves a somewhat limited selection. The data shows that green apples, melons, and grapes have a very high sugar-to-calorie percentage. It also shows that 14 fruits are over 20%, which is about 32% of the fruits in the dataset (14 of 44). Because their sugar content is so high relative to their calories, these fruits could be linked to sugar-related health problems, though this data set alone cannot show that.

Here is a link to my GitHub repository with my code and results: https://github.com/Galbers2/INST414/blob/main/Assingment_2.ipynb
