The beautiful juxtaposition of two branches of a tree. Source: Pexels

Collocations are phrases or expressions of two or more words that are highly likely to co-occur, for example ‘social media’, ‘school holiday’, ‘machine learning’, and ‘Universal Studios Singapore’.
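To make the idea of "highly likely to co-occur" concrete, here is a minimal sketch that ranks adjacent word pairs by pointwise mutual information (PMI), one common way of scoring collocations. The function name `top_collocations`, the `min_count` threshold, and the sample reviews are all assumptions made up for illustration. They are not taken from the original post, which may well use a different method or library.

```python
import math
import re
from collections import Counter


def top_collocations(texts, min_count=2, top_n=5):
    """Rank adjacent word pairs (bigrams) by pointwise mutual information."""
    unigrams, bigrams = Counter(), Counter()
    for text in texts:
        # Naive tokenizer: lowercase alphabetic words only (an assumption).
        tokens = re.findall(r"[a-z']+", text.lower())
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / total_bigrams
        p_w1 = unigrams[w1] / total_unigrams
        p_w2 = unigrams[w2] / total_unigrams
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))

    # Highest PMI first: pairs that co-occur more often than chance predicts.
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_n]


# Made-up sample reviews, purely for illustration.
reviews = [
    "I read about this place on social media and loved it",
    "Great deals during the school holiday, saw them on social media",
    "We visited during the school holiday and the food was great",
]
collocations = top_collocations(reviews)
for pair, score in collocations:
    print(" ".join(pair), round(score, 2))
```

On a real corpus you would typically also filter stop words and raise `min_count`. Libraries such as NLTK ship ready-made collocation finders that could replace this hand-rolled version.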

Why do you need Collocations?

Imagine you need to understand the text reviews left by your customers. You want behavioural insights: who your customers are, how many of them visit your place, what they are interested in, what they buy, what activities they engage in, and so on.

To keep things simple, suppose you own a restaurant and have several thousand reviews. …


Photo by Taras Shypka on Unsplash

In my previous blog, I discussed how I ended up interning at Dentsu. I also mentioned that I worked on scouting and building a POC for a cloud-agnostic, open-source API management tool/platform that could help in setting up API design, gateway, store, and analytics. In this blog, I will describe that work in much more detail.

We will be exploring four API Management platforms, namely:

The main requirements against which we will evaluate an API Management tool are:

  1. Documenting our APIs
  2. Building an API Gateway
  3. Securing our APIs
  4. Versioning APIs

I…


Predicting TED Talk Ratings

Photo by Hermes Rivera on Unsplash

This blog continues my NLP blog series. In the previous blogs, I discussed data pre-processing steps in R and recognizing the emotions present in TED Talks. In this blog, I am going to predict the ratings that viewers gave to the TED Talks. This requires multi-class classification and quite a bit of data cleaning and pre-processing. We will discuss each step in detail below.

So, let’s dive in!


This post continues my NLP blog series. You might want to check out my previous blog, in which I discussed data pre-processing in R. In this blog, I will determine the emotions associated with the TED Talks. At the end, I will plot a heatmap of emotions against talks to help visualize the results.

So, without further ado, let’s dive in!

Since I covered the data pre-processing steps in detail in my last blog post, I will assume that you have already prepared a document-term matrix.

We will begin by importing General Inquirer Category Listings


I have recently got my hands dirty with Natural Language Processing (NLP). I know I am a little late to the party, but at least I am at the party!

To give a general overview, I implemented quite a few NLP tasks, including Text Classification, Document Similarity, Part-of-Speech (POS) Tagging, and Emotion Recognition, among others. These tasks rely on text pre-processing (noise removal, stemming) and text-to-features techniques (TF-IDF, N-Grams, Topic Modeling, etc.). I implemented these in both R and Python, so I will try to jot down my experiences in both environments. …


I have recently shifted gears in my life. Moving to academia after spending quite some time in the industry has been both exciting and challenging. Even with all the diversity in my program's cohort (in terms of work experience and country of origin) and a campus bustling with activities all day, I wanted to explore the communities ingrained deep within the culture of Singapore. A natural choice for me was to look for meetups.

After subscribing to quite a few meetups of my interest, I finally got a notification of DataKind Singapore


Machine Learning Crash Course

I came to know about Google’s Machine Learning Crash Course (MLCC) from Sundar Pichai’s tweet.

I then asked some close acquaintances working at Google about it. Their good words about the course and my own research on its content soon convinced me to take it. This post is an account of what I learned from MLCC. I will structure it so that it reads more like a review. I will also include what I really liked about the course and things which I think they can possibly…


At Truebil, I was fortunate enough to be given an opportunity to solve a unique engineering problem. We had already outsourced CRM development to a third party, but I had to integrate the data flow from our product to the CRM and back. I call the problem unique because of the challenges it posed. The challenge didn’t lie in the CRM integration alone, but in the fact that Truebil has numerous in-house products which together generate close to 1 million data streams per hour. Besides building a bastion of such magnitude, I knew…

Shubhanshu Gupta

Pursuing an MS in Data Science at NUS. Worked for 3 years as a Software Development Engineer in early- and mid-stage startups. Read more: https://shubhanshugupta.com/
