Mushroom Classification using Visualization

Prabhath R
Published in Analytics Vidhya
5 min read · Oct 25, 2019

This article continues the one written by my friend Shravan (https://medium.com/@shravan.adulapuram/mushroom-classification-edible-or-poisonous-9327a56c6fc9), in which he introduced the problem and set out to bust the common myths about the edibility of mushrooms. Busting those myths solves one part of the problem: it stops people from judging mushroom edibility by a single simple feature. But that is clearly not enough. We need a framework that predicts the mushroom class (edible or poisonous) from the given features. Normally this is done by modelling with logistic regression, random forest and so on. Here we will try to solve the problem without any of those algorithms, using only visualization and simple data exploration.

Bi-variate Analysis

We start by examining the chi-square statistic of every mushroom feature against the class/edibility of the mushroom, in order to find out which features are important.
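As a rough illustration of what that statistic measures, here is a minimal sketch that computes it for one categorical feature against the class using only the standard library. The function name `chi_square` and the toy data are assumptions made for this example and are not taken from the notebook; the p-value is left out because it needs the chi-square distribution, which `scipy.stats.chi2_contingency` would supply along with the statistic.

```python
from collections import Counter

def chi_square(feature_values, class_values):
    """Return (chi-square statistic, degrees of freedom) for two categorical columns."""
    n = len(feature_values)
    observed = Counter(zip(feature_values, class_values))  # contingency table cells
    row_totals = Counter(feature_values)
    col_totals = Counter(class_values)

    statistic = 0.0
    for value, row_total in row_totals.items():
        for label, col_total in col_totals.items():
            # Count expected in this cell if feature and class were independent.
            expected = row_total * col_total / n
            statistic += (observed[(value, label)] - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Toy data, invented for illustration (not the mushroom dataset).
odor = ["foul", "foul", "none", "none", "almond", "almond"]
label = ["p", "p", "e", "p", "e", "e"]
stat, dof = chi_square(odor, label)
print(stat, dof)
```

The further the observed counts sit from the counts expected under independence, the larger the statistic becomes.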

Chi Square and P values for different mushroom features

From the table we can see that all the features are significant (p < 0.05). To understand how significant each one is, we need to look at bi-variate cross-tabulated plots. While it is incorrect to compare chi-square statistic values directly to decide which feature matters more in determining the mushroom class, we observed that features with a high chi-square statistic had more feature values that occur in only one mushroom class. Some of the cross-tabulated plots are shown below.

Odor vs Class of mushroom
Spore print color vs mushroom class
Gill color vs mushroom class
Ring type vs mushroom class
Stalk shape vs mushroom class

In the first four graphs, certain feature values identify the mushroom class without ambiguity; for example, a mushroom with a foul odor is poisonous. The stalk shape vs class graph is shown only to demonstrate why that feature has a lower chi-square statistic than the other four.
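The same reading of the plots can be sketched in code: cross-tabulate a feature against the class and keep the values that appear with only one class. This is a minimal sketch on invented rows; the helper names `crosstab` and `unambiguous_values` and the column names are assumptions for illustration, not the notebook's own code.

```python
from collections import Counter, defaultdict

def crosstab(rows, feature, target="class"):
    """Count how often each feature value occurs with each class."""
    table = defaultdict(Counter)
    for row in rows:
        table[row[feature]][row[target]] += 1
    return table

def unambiguous_values(rows, feature, target="class"):
    """Map every feature value seen with exactly one class to that class."""
    return {
        value: next(iter(counts))
        for value, counts in crosstab(rows, feature, target).items()
        if len(counts) == 1
    }

# Toy rows, invented for illustration.
rows = [
    {"odor": "foul", "class": "poisonous"},
    {"odor": "foul", "class": "poisonous"},
    {"odor": "almond", "class": "edible"},
    {"odor": "none", "class": "edible"},
    {"odor": "none", "class": "poisonous"},
]
print(unambiguous_values(rows, "odor"))
```

Values such as `none` in the toy rows, which occur with both classes, are the ones that need further analysis.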

Simple predictor

With this we can make a basic mushroom class predictor:

  1. Feature values which indicate that the mushroom is poisonous:
    a) Buff or green gill color.
    b) Green spore print color.
    c) Buff, cinnamon or yellow stalk color.
    d) Pungent, foul, creosote, fishy, musty or spicy odor.
    e) No rings.
    f) Ring type of none.
  2. Feature values which indicate that the mushroom is edible:
    a) Sunken cap shape.
    b) Purple or green cap color.
    c) Red or orange gill color.
    d) Purple, orange, yellow or buff spore print color.
    e) Rooted stalk root.
    f) Grey, red or orange stalk color.
    g) Almond or anise odor.
    h) Waste habitat.
    i) Numerous or abundant population.
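The two lists above can be written down as a lookup. The sketch below is one possible encoding, not the notebook's code: the dictionary keys (`ring_number`, `stalk_color` and so on) are hypothetical column names, and checking the poisonous list first is an assumption made here as the cautious choice, since the lists do not say what to do if a mushroom matched both.

```python
# Feature values taken from the lists above; key names are hypothetical.
POISONOUS = {
    "gill_color": {"buff", "green"},
    "spore_print_color": {"green"},
    "stalk_color": {"buff", "cinnamon", "yellow"},
    "odor": {"pungent", "foul", "creosote", "fishy", "musty", "spicy"},
    "ring_number": {"none"},
    "ring_type": {"none"},
}
EDIBLE = {
    "cap_shape": {"sunken"},
    "cap_color": {"purple", "green"},
    "gill_color": {"red", "orange"},
    "spore_print_color": {"purple", "orange", "yellow", "buff"},
    "stalk_root": {"rooted"},
    "stalk_color": {"grey", "red", "orange"},
    "odor": {"almond", "anise"},
    "habitat": {"waste"},
    "population": {"numerous", "abundant"},
}

def simple_predict(mushroom):
    """Return 'poisonous', 'edible', or None when no listed value matches."""
    # Assumption: poisonous indicators are checked first.
    for feature, values in POISONOUS.items():
        if mushroom.get(feature) in values:
            return "poisonous"
    for feature, values in EDIBLE.items():
        if mushroom.get(feature) in values:
            return "edible"
    return None  # not decidable from a single feature

print(simple_predict({"odor": "foul", "habitat": "woods"}))
print(simple_predict({"odor": "none", "spore_print_color": "white"}))
```

A result of `None` marks exactly the cases that a single feature cannot settle.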

Tri-variate analysis

But the mushroom class can’t be predicted in all other cases. To address this we turn to tri-variate analysis, where we compare the mushroom class against two features at a time. Using a loop, graphs were generated for every possible pair of features vs class; a few of the significant ones are shown below.

Odor and spore print color vs mushroom class
Odor and gill attachment vs mushroom class
Veil color and gill attachment vs mushroom class

The value above a mixed marker denotes the probability of the mushroom being edible for that feature combination. From the above three graphs it can be seen that, except for one combination per graph (none odor and white spore print color in graph 1, none odor and free gill attachment in graph 2, free gill attachment and white veil color in graph 3), we are able to predict the mushroom class without any problem. But this is still not enough. So we take one of these cases (odor as none and spore print color as white, since every other combination of odor and spore print color is unambiguous) and try to break it down further.
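The pairwise pass and the edible probability shown on the mixed markers can be sketched without any plotting. This is a minimal sketch under assumed names: `edible_probabilities`, `ambiguous`, the column names and the toy rows are all invented for illustration.

```python
from collections import defaultdict
from itertools import combinations

def edible_probabilities(rows, features, target="class"):
    """For every pair of features, compute P(edible) per value combination."""
    result = {}
    for a, b in combinations(features, 2):
        counts = defaultdict(lambda: [0, 0])  # (value_a, value_b) -> [edible, total]
        for row in rows:
            cell = counts[(row[a], row[b])]
            cell[0] += row[target] == "edible"
            cell[1] += 1
        result[(a, b)] = {
            combo: edible / total for combo, (edible, total) in counts.items()
        }
    return result

def ambiguous(probabilities):
    """Keep only the combinations that contain both classes."""
    return {combo: p for combo, p in probabilities.items() if 0 < p < 1}

# Toy rows, invented for illustration.
rows = [
    {"odor": "none", "spore_print_color": "white", "class": "edible"},
    {"odor": "none", "spore_print_color": "white", "class": "poisonous"},
    {"odor": "none", "spore_print_color": "green", "class": "poisonous"},
    {"odor": "foul", "spore_print_color": "white", "class": "poisonous"},
    {"odor": "almond", "spore_print_color": "brown", "class": "edible"},
]
probs = edible_probabilities(rows, ["odor", "spore_print_color"])
print(ambiguous(probs[("odor", "spore_print_color")]))
```

A probability of 0 or 1 corresponds to a pure marker; anything in between is a mixed marker that needs another feature.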

To do that we sliced the dataframe to contain only the rows with odor as none and spore print color as white. For this dataframe we again plotted all possible combinations of features vs class. One of the resulting graphs is shown below.

Mushroom class distribution with none odor and white spore print color

From the above graph we can infer that, when odor is none and spore print color is white, knowing the population and habitat of the mushroom lets us predict its class.
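The slicing step can be sketched as a filter followed by a second tabulation. The sketch below uses plain lists of dictionaries rather than a dataframe, and the function name, column names and toy rows are assumptions for illustration only.

```python
from collections import defaultdict

def classes_by_population_and_habitat(rows):
    """Restrict to odor none and white spore print, then list classes per combination."""
    subset = [
        row for row in rows
        if row["odor"] == "none" and row["spore_print_color"] == "white"
    ]
    classes = defaultdict(set)
    for row in subset:
        classes[(row["population"], row["habitat"])].add(row["class"])
    return dict(classes)

# Toy rows, invented for illustration.
rows = [
    {"odor": "none", "spore_print_color": "white",
     "population": "clustered", "habitat": "leaves", "class": "poisonous"},
    {"odor": "none", "spore_print_color": "white",
     "population": "clustered", "habitat": "waste", "class": "edible"},
    {"odor": "none", "spore_print_color": "white",
     "population": "solitary", "habitat": "woods", "class": "edible"},
    {"odor": "foul", "spore_print_color": "white",
     "population": "several", "habitat": "woods", "class": "poisonous"},
]
for combo, found in classes_by_population_and_habitat(rows).items():
    print(combo, found)
```

A combination that maps to a single class can be turned directly into a rule.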

Final predictor

By combining all the results from above we get the final predictor framework:

  1. If odor is almond or anise, then the mushroom is edible.
  2. If odor is pungent, foul, spicy, musty, fishy or creosote, then the mushroom is poisonous.
  3. If odor is none and spore print color is green, then the mushroom is poisonous.
  4. If odor is none and spore print color is white, then the class depends on the habitat and population properties (rules 5 to 8).
  5. If population is numerous, scattered or solitary, then the mushroom is edible.
  6. If population is several and habitat is leaves or path, then the mushroom is edible.
  7. If population is clustered and habitat is waste, then the mushroom is edible.
  8. If population is clustered and habitat is leaves, then the mushroom is poisonous.
  9. In all other cases where odor is none, the mushroom is edible.
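Written as code, the rules above read like a small decision tree. The sketch below is one reading of the list, not the notebook's implementation: the function name `predict` and its parameter names are assumptions, rule 9 is applied only to spore print colors other than green and white, and population and habitat combinations that rules 5 to 8 do not mention are returned as "unknown" rather than guessed.

```python
def predict(odor, spore_print_color=None, population=None, habitat=None):
    """Return 'edible', 'poisonous', or 'unknown' following rules 1 to 9."""
    if odor in {"almond", "anise"}:
        return "edible"  # rule 1
    if odor in {"pungent", "foul", "spicy", "musty", "fishy", "creosote"}:
        return "poisonous"  # rule 2
    if odor != "none" or spore_print_color is None:
        return "unknown"  # not covered by the rules
    if spore_print_color == "green":
        return "poisonous"  # rule 3
    if spore_print_color == "white":  # rule 4
        if population in {"numerous", "scattered", "solitary"}:
            return "edible"  # rule 5
        if population == "several" and habitat in {"leaves", "path"}:
            return "edible"  # rule 6
        if population == "clustered" and habitat == "waste":
            return "edible"  # rule 7
        if population == "clustered" and habitat == "leaves":
            return "poisonous"  # rule 8
        return "unknown"  # assumption: combination not listed in rules 5 to 8
    return "edible"  # rule 9

print(predict("foul"))
print(predict("none", "white", "clustered", "leaves"))
print(predict("none", "brown"))
```

Returning "unknown" for unlisted combinations is a deliberate design choice in this sketch; treating them as edible under rule 9 is the other possible reading.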

It should be noted that this is just one set of features that can be used to predict the mushroom class. It's possible to obtain other combinations of features from the other graphs at the beginning of the tri-variate analysis.

And there we have it: a framework to predict mushroom class without modelling. The final result looks like the output of a classification tree, but it was arrived at with simple visualization and data exploration.

The above process was done in a Jupyter notebook with Python. The code can be found at: https://github.com/prabhathur/Data-Science/tree/master/Mushroom%20Classification

Prabhath R
Analytics Vidhya

Currently pursuing Post Graduate Program in Data Science from Praxis Business School, Bangalore