
How to uncover the predictive potential of textual data using topic modeling, word embedding, transfer learning and transformer models with R

Textual data is everywhere: reviews, customer questions, log files, books, transcripts, news articles, files, interview reports … Yet text is still used too rarely, alongside the available structured data, to answer analysis questions. In our opinion, this leaves part of the possible predictive power unused and misses an important part of the explanation. Why is that? Textual data, in contrast to structured data from databases and tables, is not ready for usage…


Training Word Embedding models and visualize results

This story is written by Jurriaan Nagelkerke and Wouter van Gils. It is part of our NLP with R series ‘Natural Language Processing for predictive purposes with R’ where we use Topic Modeling, Word Embeddings, Transformers and BERT.

In a sequence of articles we compare different NLP techniques to show you how we get valuable information from unstructured text. About a year ago we gathered reviews of Dutch restaurants. We wondered whether ‘the wisdom of the crowd’ (reviews from restaurant visitors) could be used to predict which restaurants are most likely to receive a new Michelin star. Read…



This blog is about getting corporate identity graphics ready in R using ggplot. Many corporates have decent corporate identity PowerPoint decks and Excel templates to work with, and they might even have developed a Power BI or Tableau template to fit their corporate identity. Yet identity templates for R (or other languages used) are often not readily available. Marketing often does not have these tools in scope when developing the templates to be used once a new corporate identity has been launched. The default graphics that ggplot provides in R are already of good quality. Sometimes the color palette needs some improvement and occasionally the…







Wouter van Gils

Consultant, teacher, data scientist @Cmotions
