This is a lasso; it is used to catch animals. As a non-native English speaker, I first encountered the word in supervised learning. In this LASSO data science tutorial, we discuss the strengths of Lasso logistic regression by stepping through how to apply this statistical method to classification problems in R, and how the Lasso can “similarly” be used to pick out the input variables that are relevant to the classification problem at hand.

Data analysts and data scientists use different regression methods for different kinds of analytics problems. From the simplest ones to…

In this post, we will go through the steps to plot pie charts on a world map, just like the one below.

Plot showing the leading causes of death in the year 2014 for various countries

This image probably scared you as much as it scared me when I realized I needed to create something just like it. “How am I supposed to do this when I’m not even good at R programming, and in fact a newbie?” You’re probably asking yourself this right now, or you already have. I feel you, and I’ve got you. …

Which advertisements have been effective?

“Half of my advertising dollars are wasted, the problem is that I don’t know which half” — John Wanamaker.

With marketing mix modeling (MMM), analysts attempt to answer causal questions like “how does TV spend drive my sales? How much should I spend on TV next year?”

In research, the best practice for addressing causal questions is to use randomized experiments. However, this is not practical for companies because an advertisement is either on or off for the whole population at any given moment. In other words, with above-the-line, non-targeted campaigns, it is not possible to have a control vs…

Drawing inferences from A/B tests is an integral part of many data scientists' work. Often, we hear about the frequentist (classical) approach, where we specify the alpha and beta error rates and see if we can reject the null hypothesis in favor of the alternative hypothesis. Bayesian inference, on the other hand, uses Bayes’ Theorem to update the probability that a hypothesis is true as more evidence becomes available.
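To make the Bayesian update concrete, here is a minimal sketch in Python, assuming a Beta prior on a conversion rate and binomial data. The counts and the uniform Beta(1, 1) prior are made up for illustration and do not come from the post or from [1].

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior plus binomial data
    gives a Beta(alpha + successes, beta + failures) posterior."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept exact with Fraction."""
    return Fraction(alpha, alpha + beta)


# Assumption: a uniform Beta(1, 1) prior for both variants.
prior = (1, 1)

# Hypothetical A/B data: (conversions, non-conversions) per variant.
posterior_a = update_beta(*prior, successes=12, failures=88)
posterior_b = update_beta(*prior, successes=20, failures=80)

print("Posterior A:", posterior_a, "mean:", float(beta_mean(*posterior_a)))
print("Posterior B:", posterior_b, "mean:", float(beta_mean(*posterior_b)))
```

Because the update only adds counts, feeding the evidence in batches gives the same posterior as feeding it all at once, which is what "updating as more evidence becomes available" means in this setting.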

In this blog post, we are going to use R to follow the example in [1] and extend it with a sensitivity analysis to observe the impact of tweaking the priors on…

As Artificial Intelligence continues to push the boundaries of cognition, it takes on a challenge that we humans handle so naturally: understanding and responding in natural language. Human conversations are incredibly rich in content. The information they carry rests on the words at face value, tempered by prosodic features like tone, pitch and volume, by the power difference between the two speakers, and by the emotions and attitudes hinted at through facial expression, eye contact, body language and even the time delay of the response. …

In this blog post, we explore two sets of emotion combinations using word2vec: one posited by Robert Plutchik in 1980, and the other a popular media chart that uses characters from Inside Out. We limit the scope to dyads, i.e. combinations of two basic emotions that make up a more complex emotion.

Just as blue and red give purple, joy and surprise give delight.
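One way to test such a dyad with word vectors is to add the two emotion vectors and look for the nearest remaining word by cosine similarity. The sketch below assumes toy three-dimensional vectors invented for illustration; they are not real word2vec output, and the post's actual procedure may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combine(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def nearest_word(query, vectors, exclude=()):
    """Word whose vector is most similar to `query`, skipping `exclude`."""
    candidates = {w: vec for w, vec in vectors.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(query, candidates[w]))


# Assumption: hand-made toy vectors, chosen only to illustrate the idea.
toy_vectors = {
    "joy": [1.0, 0.0, 0.2],
    "surprise": [0.0, 1.0, 0.1],
    "delight": [0.9, 0.8, 0.3],
    "sadness": [-1.0, 0.0, 0.1],
    "disappointment": [-0.8, 0.7, 0.2],
}

dyad = combine(toy_vectors["joy"], toy_vectors["surprise"])
result = nearest_word(dyad, toy_vectors, exclude=("joy", "surprise"))
print(result)
```

The two input words are excluded from the search because the sum of two vectors tends to stay close to both of them.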

In this review, we explore various distributed representations of anything we find on the Internet — words, paragraphs, people, photographs. These representations can be used for a variety of purposes as illustrated below. We try to select subjects that seem disparate, instead of providing a comprehensive review of all applications of distributed representation.

Input: Model purpose
Word vectors: Sentiment analysis
Paragraph vectors: Clustering paragraphs
People vectors (Wiki articles): Comparisons
Photo and word vectors: Photograph retrieval

Excited? I am! Let’s jump in.

Distributed representation of words

This is where the story begins: the idea of representing some qualitative concept (e.g. words)…

Convolutional neural networks (CNNs) have been successful in various text classification tasks. In [1], the author showed that a simple CNN with little hyperparameter tuning and static vectors achieves excellent results on multiple benchmarks, improving upon the state of the art on 4 out of 7 tasks.

However, when learning to apply a CNN to word embeddings, keeping track of the dimensions of the matrices can be confusing. The aim of this short post is simply to keep track of these dimensions and understand how a CNN works for text classification. We will use a one-layer CNN on a 7-word sentence, with word…
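The dimension bookkeeping can be sketched in plain Python. Only the 7-word sentence comes from the text above; the embedding size of 5, the filter region sizes 2, 3 and 4, the all-ones filters and the ReLU activation are assumptions chosen for illustration.

```python
def conv_feature_map(sentence, kernel):
    """Slide a (region_size x embed_dim) kernel down the sentence matrix.

    A sentence of n words and a kernel covering h words yield a feature
    map of length n - h + 1. ReLU is applied to each entry.
    """
    n_words, region_size = len(sentence), len(kernel)
    embed_dim = len(kernel[0])
    feature_map = []
    for start in range(n_words - region_size + 1):
        total = sum(
            sentence[start + r][c] * kernel[r][c]
            for r in range(region_size)
            for c in range(embed_dim)
        )
        feature_map.append(max(0.0, total))
    return feature_map


n_words = 7    # from the text: a 7-word sentence
embed_dim = 5  # assumption: hypothetical embedding size

# Sentence matrix of shape (7, 5), one row per word, filled with dummy values.
sentence = [[float(i + j) for j in range(embed_dim)] for i in range(n_words)]

# Assumption: one all-ones filter per region size, each of shape (h, 5).
feature_maps = {
    h: conv_feature_map(sentence, [[1.0] * embed_dim for _ in range(h)])
    for h in (2, 3, 4)
}

shapes = {h: len(fm) for h, fm in feature_maps.items()}
pooled = [max(fm) for fm in feature_maps.values()]  # 1-max pooling per filter

print("feature map lengths:", shapes)
print("pooled vector length:", len(pooled))
```

Each filter spans the full embedding width, so it only slides along the word axis; max pooling then reduces every feature map to a single number, and the pooled vector has one entry per filter regardless of sentence length.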


Word embeddings are commonly used in many Natural Language Processing (NLP) tasks because they are useful representations of words and often lead to better performance on those tasks. Given their widespread use, this post seeks to introduce the concept of word embeddings to the prospective NLP practitioner.

Word embeddings allow words to be represented by a series of numbers, which we will refer to as real-valued vectors from now on. For example, the following phrase can be represented by a series of vectors, each vector having a dimension of 2.
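As a rough sketch of that idea, the snippet below maps each word of a phrase to a 2-dimensional vector through a lookup table. The phrase and the numbers are hypothetical stand-ins, not the example from the original post.

```python
# Assumption: a tiny hand-written embedding table with made-up values.
embedding_table = {
    "have": [0.3, 0.9],
    "a": [0.1, 0.1],
    "good": [0.8, 0.7],
    "day": [0.5, 0.2],
}


def embed(phrase, table):
    """Turn a whitespace-separated phrase into a list of word vectors."""
    return [table[word] for word in phrase.lower().split()]


vectors = embed("Have a good day", embedding_table)

for word, vec in zip("have a good day".split(), vectors):
    print(word, vec)
```

A phrase of four words becomes a 4 x 2 matrix, one row per word; real embeddings work the same way but are learned from data and typically have tens to hundreds of dimensions.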

Joshua Kim

PhD Candidate — NLP; Founder of
