Wine Review PT6— EDA — ML

Nelson Punch

Published in

Software-Dev-Explore

3 min read · Sep 13, 2021

Introduction

In part 5, we applied NLP to the sommeliers' description of each wine.

Now we can find the frequency of words for a single wine, for a grape variety, and across all descriptions. In this part we visualise those frequencies.

Dataset

Kaggle

We will use the winemag-data-130k-v2.csv dataset for machine learning.

Source Code

Code in Google Colab

Task

Frequency of words for a wine
Frequency of words for a grape
Frequency of words across all descriptions

wordcloud

The wordcloud module enables us to visualise the frequency of words: the more often a word appears, the more weight (size) it gets in the image. Here we import the module.

import wordcloud as wc

Frequency of words for a wine

Here we create a function that uses wordcloud to render the word frequencies as an image and then shows the image. We also pick a wine at random to see its word frequencies.
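A minimal sketch of such a function, not the notebook's exact code. The helper names (pick_random_description, top_words, show_wordcloud) are hypothetical, and the descriptions are assumed to be the cleaned strings produced in part 5. The plotting libraries are imported inside show_wordcloud so that the counting helpers run without them.

```python
import random
from collections import Counter


def pick_random_description(descriptions, seed=None):
    """Return one description chosen at random from a non-empty sequence."""
    return random.Random(seed).choice(list(descriptions))


def top_words(text, n=10):
    """Count whitespace-separated words and return the n most common."""
    return Counter(text.split()).most_common(n)


def show_wordcloud(text, title=None):
    """Render text as a word cloud; more frequent words are drawn larger."""
    # Imported here so the helpers above work without the plotting libraries.
    import matplotlib.pyplot as plt
    import wordcloud as wc

    cloud = wc.WordCloud(
        width=800, height=400, background_color="white"
    ).generate(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    if title:
        plt.title(title)
    plt.show()


# Stand-in data: a single cleaned description quoted in this post.
# With the real data this would be the cleaned description column as a list.
descriptions = [
    "tart snappy lime flesh rind dominate green pineapple pokes crisp acidity "
    "underscoring stainless steel fermented",
]
text = pick_random_description(descriptions, seed=0)
print(top_words(text, n=3))
# show_wordcloud(text)  # uncomment in a notebook to display the image
```

The top_words helper is only a plain-text cross-check of the picture. wordcloud does its own counting inside generate, including its built-in stop words, so the two can differ slightly.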

The corresponding description:

tart snappy lime flesh rind dominate green pineapple pokes crisp acidity underscoring stainless steel fermented

Frequency of words for a grape

This is similar to the above, except that we need to group by variety (grape) first.

We group by variety, concatenate all descriptions for each variety, and finally create a new DataFrame from the result.
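The grouping step could look roughly like this in pandas. The column names variety and description are assumptions (the cleaned text column from part 5 may be named differently), the function name is hypothetical, and the small frame is made-up data used only to show the shape of the result.

```python
import pandas as pd


def descriptions_by_variety(df, text_col="description", group_col="variety"):
    """Return one row per variety with all of its descriptions joined by spaces."""
    return (
        df.groupby(group_col)[text_col]
        .apply(" ".join)  # concatenate every description in the group
        .reset_index()    # turn the group key back into a regular column
    )


# Tiny made-up frame, only to illustrate the transformation.
reviews = pd.DataFrame(
    {
        "variety": ["Riesling", "Abouriou", "Riesling"],
        "description": ["tart lime", "juicy red fruits", "crisp acidity"],
    }
)
by_variety = descriptions_by_variety(reviews)
print(by_variety)
```

Joining with a space assumes every description is a string. Missing values would need to be dropped or filled before this step.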

We then visualise it.

The corresponding description:

comprised rare variety given time ferment age french oak half new ashy red fruit meets mild structure considerable tannic grip tiny amount made abouriou grape found almost exclusively southwest france produces balances acidity juicy red fruits herbal edge light layer tannin structured aftertaste despite proximity bordeaux marmandais managed retain abouriou grape variety part conservatory obscure grape varieties producer made fine fruity smoky attractive tannins swathes juicy black fruits fine

Frequency of words

To find the frequency of words across all descriptions, we concatenate all descriptions and then use wordcloud to visualise the result.
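One possible variation on the concatenation approach, offered as a sketch rather than the notebook's code: count the words across all descriptions first, then hand the counts to wordcloud through generate_from_frequencies. The function names are hypothetical. Unlike generate, this path uses the counts exactly as given, so any stop-word filtering has to happen beforehand.

```python
from collections import Counter


def count_words(descriptions):
    """Count whitespace-separated words across an iterable of descriptions."""
    counts = Counter()
    for text in descriptions:
        counts.update(text.split())
    return counts


def show_frequency_cloud(counts):
    """Draw a word cloud from a word -> count mapping."""
    # Imported here so count_words works without the plotting libraries.
    import matplotlib.pyplot as plt
    import wordcloud as wc

    cloud = wc.WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(counts)
    plt.figure(figsize=(10, 5))
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.show()


# Made-up descriptions, only to show the counting step.
all_counts = count_words(["tart lime crisp", "crisp acidity", "juicy red fruits"])
print(all_counts.most_common(3))
# show_frequency_cloud(all_counts)  # uncomment in a notebook to display the image
```

Counting first avoids building one very large string and leaves the raw counts available for inspection.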

Conclusion

With wordcloud we can easily visualise the frequency of words. Furthermore, we can use this method alongside NLP to spot words that appear frequently but are not key words for describing a wine or a variety.

If any frequent words turn out to behave like stop words, we can add them to the NLP stop-word list so that they are filtered out when the descriptions are analysed.
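As an illustration of that idea, the sketch below removes an extra set of stop words from a description and also passes the same set to wordcloud through its stopwords parameter. The words in EXTRA_STOP_WORDS are hypothetical picks, not results from the dataset, and the function names are assumptions. The actual stop-word list lives in the NLP step of part 5, which is not shown here.

```python
# Hypothetical candidates; in practice these would be chosen from the word clouds.
EXTRA_STOP_WORDS = {"wine", "drink"}


def remove_stop_words(text, stop_words):
    """Drop every whitespace-separated word that is in stop_words."""
    return " ".join(word for word in text.split() if word not in stop_words)


def make_cloud(text, extra_stop_words=EXTRA_STOP_WORDS):
    """Build a WordCloud that ignores the built-in and the extra stop words."""
    # Imported here so remove_stop_words works without wordcloud installed.
    import wordcloud as wc

    stop_words = wc.STOPWORDS | set(extra_stop_words)
    return wc.WordCloud(stopwords=stop_words).generate(text)


cleaned = remove_stop_words("tart wine crisp drink acidity", EXTRA_STOP_WORDS)
print(cleaned)
```

Filtering in the NLP step changes the data used later for training, while the stopwords parameter only changes what the picture shows.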

Finally, we have finished exploratory data analysis.

Next, we are going to prepare and preprocess the data for training our model.

Part 7