Wine Review PT6— EDA — ML
Introduction
In part 5, we applied NLP to the sommeliers' descriptions of each wine.
Now we can find the frequency of words for a single wine, for a grape variety and across all descriptions. In this part, we visualise these word frequencies.
Dataset
We will use the winemag-data-130k-v2.csv dataset for machine learning.
Source Code
Task
- Frequency of words for a wine
- Frequency of words for a grape
- Frequency of words
wordcloud
The wordcloud module lets us visualise both the frequency and the weight of words. Here we import the module.
import wordcloud as wc
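To make "frequency" and "weight" concrete: as far as I can tell from the wordcloud source, `WordCloud.generate_from_frequencies` scales every count by the largest count, so the most frequent word gets weight 1.0 and font sizes follow from those relative weights. The sketch below reproduces only that scaling idea with the standard library. `relative_weights` is a hypothetical helper, not part of wordcloud, and it splits on whitespace instead of using wordcloud's own tokeniser and stop words.

```python
from collections import Counter


def relative_weights(text: str) -> dict[str, float]:
    """Count whitespace-separated words and scale counts so the top word is 1.0.

    Hypothetical helper: it mimics the idea of word weight in a word cloud,
    but does not reproduce wordcloud's tokenisation or stop-word filtering.
    """
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    max_count = max(counts.values())
    # Divide each count by the largest count to get a relative weight in (0, 1].
    return {word: count / max_count for word, count in counts.items()}


weights = relative_weights("red fruit red cherry fruit red")
# red -> 1.0, fruit -> about 0.67, cherry -> about 0.33
```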
Frequency of words for a wine
Here we create a function that uses wordcloud to turn the frequency of words into an image and then shows the image. We also pick a wine at random to see its word frequencies.
The corresponding description:
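A minimal sketch of such a function, assuming the processed descriptions from part 5 sit in a pandas DataFrame column named `description`. The helper names `show_wordcloud` and `random_description`, the canvas size and the white background are illustrative assumptions, not the author's original code. The wordcloud and matplotlib imports live inside `show_wordcloud` so the helper that picks a wine can be used on its own.

```python
from typing import Optional

import pandas as pd


def random_description(df: pd.DataFrame, column: str = "description",
                       seed: Optional[int] = None) -> str:
    """Return the description of one randomly chosen wine (hypothetical helper)."""
    # Series.sample draws one row; a fixed seed makes the pick reproducible.
    return df[column].sample(n=1, random_state=seed).iloc[0]


def show_wordcloud(text: str) -> None:
    """Render a word cloud for `text` and display it with matplotlib."""
    # Imported here so the module still loads without these packages installed.
    import matplotlib.pyplot as plt
    import wordcloud as wc

    cloud = wc.WordCloud(width=800, height=400, background_color="white").generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")  # WordCloud objects can be shown directly
    plt.axis("off")  # hide the axes; only the words matter
    plt.show()
```

For example, `show_wordcloud(random_description(wines, seed=42))`, where `wines` is a hypothetical name for the DataFrame loaded from winemag-data-130k-v2.csv. Passing a fixed `seed` makes it easy to reproduce the same wine later.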
tart snappy lime flesh rind dominate green pineapple pokes crisp acidity underscoring stainless steel fermented
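A quick count shows what to expect from a single processed description: every word in the text above occurs exactly once, so by frequency alone no word stands out. Any size differences in such a cloud are more likely down to how wordcloud fits words onto the canvas. The sketch uses `collections.Counter` with a plain whitespace split, which is an assumption and not how wordcloud tokenises.

```python
from collections import Counter

# Processed description of the randomly picked wine, copied from above.
description = (
    "tart snappy lime flesh rind dominate green pineapple pokes crisp "
    "acidity underscoring stainless steel fermented"
)

counts = Counter(description.split())
print(len(counts))           # 15 distinct words
print(set(counts.values()))  # {1}: every word occurs exactly once
```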
Frequency of words for a grape
This is similar to the above, except that we first need to group by variety (grape).
We group by variety, concatenate all descriptions for each variety and create a new DataFrame from the result.
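One way to do the grouping with pandas, as a sketch: it assumes the columns are named `variety` and `description` (the names in the raw Kaggle CSV; the processed column from part 5 may be named differently). `descriptions_by_variety` is a hypothetical helper name.

```python
import pandas as pd


def descriptions_by_variety(df: pd.DataFrame,
                            variety_col: str = "variety",
                            text_col: str = "description") -> pd.DataFrame:
    """Concatenate every description of each variety into one string per variety."""
    return (
        df.dropna(subset=[variety_col, text_col])  # skip rows missing either value
          .groupby(variety_col)[text_col]
          .apply(" ".join)   # join all descriptions of a variety with spaces
          .reset_index()     # back to a DataFrame with columns [variety, description]
    )
```

Each row of the result holds one variety and its combined text, so picking a row (for example `by_variety.iloc[0]["description"]`) gives the text to visualise.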
We then visualise it.
The corresponding description:
comprised rare variety given time ferment age french oak half new ashy red fruit meets mild structure considerable tannic grip tiny amount made abouriou grape found almost exclusively southwest france produces balances acidity juicy red fruits herbal edge light layer tannin structured aftertaste despite proximity bordeaux marmandais managed retain abouriou grape variety part conservatory obscure grape varieties producer made fine fruity smoky attractive tannins swathes juicy black fruits fine
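Unlike the single-wine text, a concatenated variety description repeats words, which is what makes some words larger in the cloud. Counting the text above with `collections.Counter` (a plain whitespace split, which is an assumption) shows "grape" as the most frequent word. wordcloud's own counts can differ slightly: by default its `normalize_plurals` option merges words such as "fruits" into "fruit".

```python
from collections import Counter

# Concatenated, processed description of the variety, copied from above.
variety_text = (
    "comprised rare variety given time ferment age french oak half new "
    "ashy red fruit meets mild structure considerable tannic grip tiny "
    "amount made abouriou grape found almost exclusively southwest france "
    "produces balances acidity juicy red fruits herbal edge light layer "
    "tannin structured aftertaste despite proximity bordeaux marmandais "
    "managed retain abouriou grape variety part conservatory obscure grape "
    "varieties producer made fine fruity smoky attractive tannins swathes "
    "juicy black fruits fine"
)

counts = Counter(variety_text.split())  # no plural merging, unlike wordcloud's default
print(counts.most_common(1))            # [('grape', 3)]
print(counts["abouriou"])               # 2
```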
Frequency of words
To find the frequency of words across all descriptions, we concatenate them and then use wordcloud to visualise the result.
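Besides the image, it can help to see the top words as numbers. This sketch concatenates a pandas Series of descriptions and counts words with `collections.Counter`. `top_words` is a hypothetical helper and the whitespace split is an assumption. The resulting counts could also be passed as a dict to `WordCloud.generate_from_frequencies`, which draws the cloud from precomputed frequencies. Note that WordCloud shows at most 200 words by default (its `max_words` parameter).

```python
from collections import Counter

import pandas as pd


def top_words(descriptions: pd.Series, n: int = 10) -> list[tuple[str, int]]:
    """Concatenate all descriptions and return the n most frequent words."""
    all_text = " ".join(descriptions.dropna())  # missing descriptions are skipped
    return Counter(all_text.split()).most_common(n)
```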
Conclusion
With wordcloud, we can visualise the frequency of words easily. Combined with NLP, this method also helps us spot words that appear frequently but are not key words for describing a wine or a variety.
If such a word behaves like a stop word and appears frequently, we can add it to the stop-word list used in NLP so that it is filtered out when the descriptions are analysed.
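The post does not say how candidate stop words are chosen, so the sketch below swaps in a simple document-frequency heuristic: words that appear in at least `min_share` of all descriptions become candidates, and the 0.5 threshold is an arbitrary assumption. Both function names are hypothetical. Candidates should still be reviewed by hand, since a frequent word may also be a genuine key word. For the cloud itself, wordcloud exposes a built-in `STOPWORDS` set and a `stopwords` parameter on `WordCloud`, so an extended set can be passed there as well.

```python
from collections import Counter
from typing import Iterable


def frequent_word_candidates(descriptions: Iterable[str], min_share: float = 0.5) -> set[str]:
    """Return words that appear in at least `min_share` of the descriptions."""
    docs = [set(text.split()) for text in descriptions]  # each word counted once per description
    if not docs:
        return set()
    doc_freq = Counter(word for words in docs for word in words)
    return {word for word, count in doc_freq.items() if count / len(docs) >= min_share}


def remove_stop_words(text: str, stop_words: set[str]) -> str:
    """Drop every whitespace-separated token that is in `stop_words`."""
    return " ".join(word for word in text.split() if word not in stop_words)
```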
With this, we have finished the exploratory data analysis.
Next, we will prepare and preprocess the data for training our model.