Photo by Annie Spratt on Unsplash

The need for Aspect-based Similarity

Recommender systems help people find more relevant items. One use case is helping researchers find relevant papers for their work. One way to improve such models is to use user feedback to update them. But where user feedback is sparse or unavailable, content-based approaches and the corresponding document similarity measures are used instead. Generally, a recommender system recommends a document depending on whether it is similar or dissimilar to the seed document. This similarity assessment neglects the many aspects that can make two documents similar. One can even argue that similarity is an ill-defined notion…
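To make the "single notion of similarity" concrete, here is a minimal sketch of a common content-based measure: cosine similarity over bag-of-words term counts. This is an illustrative assumption, not the method from the paper discussed. Note that it collapses everything into one number, which is exactly what the aspect-based view questions.

```python
import math
from collections import Counter


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between two documents' bag-of-words term counts."""
    # Lowercased whitespace tokenization; a real system would use a proper tokenizer
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())

    # Dot product over the terms both documents share
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical seed and candidate documents
seed = "neural networks for document recommendation"
candidate = "document recommendation with neural embeddings"
print(round(cosine_similarity(seed, candidate), 3))  # 0.6
```

The score says the two texts overlap, but not *how*: whether they share a task, a method or a dataset is invisible in a single value.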


Photo by Glenn Carstens-Peters on Unsplash

One of the primary goals of training NLP models is generalization. Testing in the wild is expensive and does not allow for fast iterations, so the standard paradigm for evaluating models is to use train-validation-test splits to estimate the accuracy of the model. The held-out datasets are often not comprehensive and contain the same biases as the training data, which can result in overestimating real-world performance. Also, aggregating performance into a single statistic makes it difficult to figure out where the model is failing and how to fix it.
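To illustrate how a single aggregate statistic can hide failures, here is a minimal sketch (plain Python, not CheckList's own API) that reports accuracy both overall and per slice of a test set. The slice names and predictions are made-up toy data.

```python
from collections import defaultdict


def accuracy_by_slice(examples):
    """Return overall accuracy and accuracy per slice.

    Each example is a (slice_name, gold_label, predicted_label) tuple.
    """
    correct_total = 0
    per_slice = defaultdict(lambda: [0, 0])  # slice -> [correct, total]
    for slice_name, gold, pred in examples:
        hit = int(gold == pred)
        correct_total += hit
        per_slice[slice_name][0] += hit
        per_slice[slice_name][1] += 1
    overall = correct_total / len(examples)
    return overall, {s: c / n for s, (c, n) in per_slice.items()}


# Hypothetical sentiment predictions on a held-out split, tagged by phenomenon
examples = [
    ("plain", "pos", "pos"),
    ("plain", "neg", "neg"),
    ("plain", "pos", "pos"),
    ("plain", "neg", "neg"),
    ("plain", "pos", "pos"),
    ("plain", "neg", "neg"),
    ("negation", "neg", "pos"),
    ("negation", "pos", "neg"),
]
overall, slices = accuracy_by_slice(examples)
print(overall)  # 0.75
print(slices)   # {'plain': 1.0, 'negation': 0.0}
```

A 75% headline number looks acceptable, yet the per-slice view shows the toy model fails every negated example.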

Introducing CheckList

This paper proposes a new evaluation methodology and an accompanying…


Using pre-trained language representation models

There are two strategies for applying pre-trained models to downstream tasks: feature-based approaches, where the pre-trained language representations are used as additional features for a different model architecture, and fine-tuning, where the pre-trained model itself is used for the task and all of its pre-trained parameters are fine-tuned.

Introducing BERT

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a language representation model designed to pre-train deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context in all layers. …


I recently started exploring Machine Learning for text (so far, I had been working with images) and I was introduced to Facebook’s StarSpace. For those who are unaware of StarSpace, this is how Facebook Research describes it.

StarSpace is a general-purpose neural model for efficient learning of entity embeddings for solving a wide variety of problems

One of the use cases mentioned in the repository is TagSpace for generating word / tag embeddings and that is what this article is about.

This article is not going to explain what embeddings are and why we need them as I believe…


Deep Learning algorithms are designed to mimic the workings of the human brain. Our brain is a powerful computer, so an algorithm that mimics it needs a lot of processing power. This is one of the many disadvantages of Deep Learning. A few others are:

Large amount of data

Training a Deep Learning model requires a lot of data. But it does not stop there. In a classification task, the distribution of the data across classes should be roughly uniform for good results. A non-uniform (imbalanced) distribution can bias the model toward the classes that dominate the data.
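A quick way to check how uniform the class distribution is before training is simply to count the labels. This is a minimal sketch assuming the labels fit in a plain Python list; `class_distribution` and `imbalance_ratio` are hypothetical helper names, not functions from any library.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset, e.g. {'cat': 0.9, 'dog': 0.1}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


# Hypothetical labels: 90 'cat' images and 10 'dog' images
labels = ["cat"] * 90 + ["dog"] * 10
print(class_distribution(labels))  # {'cat': 0.9, 'dog': 0.1}
print(imbalance_ratio(labels))     # 9.0
```

A ratio close to 1 means the classes are roughly balanced. On the toy data above, a model that always predicts "cat" would already reach 90% accuracy, which is the kind of bias described here.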

Possibility of overfitting

While training a model, there is a…


The deadline for the final phase of GSoC just got over and I’ve made my submission. In this article, I share info on what I did, how I documented my work and what I plan to do in the future.

What did I do?

As mentioned in my earlier articles, in the final phase of GSoC, I focused more on documenting my work and building a smaller model for AutoBound. Why documentation? GSoC is aimed at promoting and encouraging students to contribute to open source projects. So, the project we build during GSoC is open source and you can expect other developers contributing to…


The second evaluation of Google Summer of Code is complete and the results came out a week ago. This article explains what I did during the second phase of GSoC building AutoBound.

The two main tasks I focused on during the second phase of GSoC were:

  1. Data Collection
  2. The plugin-server pipeline

Data Collection

The plugin and server ends for data collection were completed during the first phase of GSoC, so here I will explain the workflow I used to collect data.

To train the model, I needed high-resolution aerial images and good-quality map data. I got them from OpenAerialMap and OpenStreetMap, respectively.


It has been almost two weeks since coding officially started with Google Summer of Code. As mentioned in my previous post, I have been selected to work as a Student Developer for OpenStreetMap through Google Summer of Code.

I have been working on my project and the experience so far has been great. In this post, I’ll be explaining what I’ve done and you can find the code in the development branch of this repo.

The Action Class

First, I had to create an Action class. The methods of this class will be called when an action occurs. For my plugin, I wanted…


Google Summer of Code

This year, my proposal was selected for Google Summer of Code 2019. (You can find some details about my proposal here.) Ever since I got the news and shared it with people, I have been getting three questions:

  • What is Google Summer of Code (GSoC) and is it an internship?
  • What do I need to know to get selected for GSoC?
  • Will you help me with my proposal when I apply for the next year?

In this article, I’ll be addressing these questions one by one followed by some stuff to keep in…


My proposal got accepted for Google Summer of Code 2019. I will be working as a Student Developer for OpenStreetMap from May 27, 2019, to August 19, 2019.

The Project — AutoBound

I will be working on building a plugin for JOSM that can automatically identify the rooftops of buildings in a given map area. This will make it easier for OSM contributors to mark buildings and add data.

The tool will have two parts:

  • The Front End — This will be written in Java and will act as an interface between the user and JOSM. The plugin will allow users…

Vishal R

Likes to talk about Machine Learning and plays the Harmonica
