Recommender systems help people find relevant items; one use case is helping researchers find papers relevant to their work. One way to improve such models is to use user feedback to update them, but where feedback is sparse or unavailable, content-based approaches and the document similarity measures behind them are used instead. Typically, a recommender system recommends a document based on whether it is similar or dissimilar to the seed document. This similarity assessment neglects the many aspects that can make two documents similar. One can even argue that similarity is an ill-defined notion…
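To make that limitation concrete, here is a minimal sketch (plain Python, with made-up one-line "documents") of the kind of single-score similarity a content-based recommender relies on: a bag-of-words cosine similarity collapses all the different ways two documents can be alike into one number.

```python
from collections import Counter
import math

def cosine_sim(doc_a, doc_b):
    """Cosine similarity between bag-of-words vectors of two documents."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical seed paper title and candidate titles.
seed = "graph neural networks for recommendation"
candidates = [
    "neural networks for graph classification",
    "a survey of recommendation systems",
    "cooking recipes for busy weeknights",
]

# Rank candidates by their single similarity score to the seed.
ranked = sorted(candidates, key=lambda d: cosine_sim(seed, d), reverse=True)
```

Note what the single score cannot express: whether two papers are similar in method, in application domain, or merely in vocabulary.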
One of the primary goals of training NLP models is generalization. Testing in the wild is expensive and does not allow for fast iteration, so the standard paradigm is to evaluate models on train-validation-test splits and report accuracy on the held-out data. But the held-out datasets are often not comprehensive and contain the same biases as the training data, which can lead to over-estimating real-world performance. Aggregating performance into a single statistic also makes it difficult to figure out where the model is failing and how to fix it.
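A tiny illustration of that last point, with entirely made-up predictions, labels, and data slices (not real model output): an aggregate accuracy can look merely mediocre while one slice of the data fails badly.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Hypothetical binary predictions and labels; the last four examples
# belong to a "negation" slice the model handles poorly.
preds  = [1, 1, 0, 1, 0, 0, 1, 0]
labels = [1, 1, 0, 1, 1, 1, 1, 1]
groups = ["plain"] * 4 + ["negation"] * 4

overall = accuracy(preds, labels)   # 0.625 -- one number hides the story

# Break the same predictions down per slice.
per_slice = {}
for g in set(groups):
    idx = [i for i, grp in enumerate(groups) if grp == g]
    per_slice[g] = accuracy([preds[i] for i in idx], [labels[i] for i in idx])
# plain: 1.0, negation: 0.25 -- the failure is concentrated in one slice
```

The per-slice breakdown points at *where* to fix the model; the aggregate number alone does not.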
This paper proposes a new evaluation methodology and an accompanying…
There are two strategies for applying pre-trained models to downstream tasks: feature-based approaches, where the pre-trained language representations are used as additional features in a separate task-specific architecture, and fine-tuning, where the same representations are used as is after all the pre-trained parameters are fine-tuned on the downstream task.
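A toy numpy sketch of the two strategies, where the "encoder" is just a fixed random projection standing in for a real pre-trained model (everything here is a hypothetical stand-in): the feature-based approach trains only a new head on frozen features, while fine-tuning also updates the encoder's own weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pre-trained" encoder: a fixed tanh projection (not a real LM).
W_pre = rng.normal(size=(10, 4))

def encode(X, W):
    return np.tanh(X @ W)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy downstream task data.
X = rng.normal(size=(64, 10))
y = (X[:, 0] > 0).astype(float)

# Strategy 1: feature-based -- freeze the encoder, train only a new head.
W_enc, w_head = W_pre.copy(), np.zeros(4)
for _ in range(200):
    h = encode(X, W_enc)              # features from the frozen encoder
    err = (sigmoid(h @ w_head) - y) / len(y)
    w_head -= 0.5 * (h.T @ err)       # only the head is updated

# Strategy 2: fine-tuning -- update *all* parameters, encoder included.
W_ft, w_head_ft = W_pre.copy(), np.zeros(4)
for _ in range(200):
    h = encode(X, W_ft)
    err = (sigmoid(h @ w_head_ft) - y) / len(y)
    grad_head = h.T @ err
    # backpropagate through tanh into the encoder weights as well
    grad_enc = X.T @ (np.outer(err, w_head_ft) * (1 - h**2))
    W_ft -= 0.5 * grad_enc
    w_head_ft -= 0.5 * grad_head
```

After training, the feature-based encoder is identical to the pre-trained weights, while the fine-tuned encoder has moved away from them — which is exactly the distinction between the two strategies.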
BERT, which stands for Bidirectional Encoder Representations from Transformers, is a language representation model designed to pre-train deep bidirectional representations from unlabelled text by jointly conditioning on both left and right context in all layers. …
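That bidirectional conditioning is trained with a masked language modelling objective. Below is a simplified sketch of just the masking step. Real BERT masks about 15% of tokens and, of those, replaces 80% with [MASK], 10% with a random token, and keeps 10% unchanged; this sketch only does the [MASK] replacement.

```python
import random

random.seed(0)

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """Replace roughly `mask_prob` of the tokens with a mask token.

    The masked-out originals are returned as prediction targets: the
    model has to recover them using context on *both* sides, which is
    what forces the representations to be bidirectional.
    """
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            targets[i] = tok          # remember what was here
            masked.append(mask_token)
        else:
            masked.append(tok)
    return masked, targets

sentence = "the quick brown fox jumps over the lazy dog".split()
masked, targets = mask_tokens(sentence)
```

A left-to-right language model could never be trained this way, since predicting a masked word requires looking at the words that come after it.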
I recently started exploring Machine Learning for text (so far, I had been working with images) and was introduced to Facebook’s StarSpace. For those who are unaware of StarSpace, this is how Facebook Research describes it.
StarSpace is a general-purpose neural model for efficient learning of entity embeddings for solving a wide variety of problems
One of the use cases mentioned in the repository is TagSpace for generating word / tag embeddings and that is what this article is about.
This article is not going to explain what embeddings are and why we need them as I believe…
Deep Learning algorithms are designed to mimic the workings of the human brain. We know the brain is a powerful computer, so an algorithm that mimics such a computer needs a lot of processing power. This is one of the many disadvantages of Deep Learning. A few other disadvantages are:
Training a Deep Learning model requires a lot of data. But it does not stop there. For good results in a classification task, the class distribution of the data should be roughly balanced. A heavily skewed distribution can bias the model towards the majority classes.
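One simple check, and a common mitigation, sketched with a hypothetical skewed dataset: count the classes, and if they are imbalanced, weight the loss by inverse class frequency so the minority class is not drowned out.

```python
from collections import Counter

# Hypothetical skewed labels: 90 cats, only 10 dogs.
labels = ["cat"] * 90 + ["dog"] * 10

counts = Counter(labels)
n, k = len(labels), len(counts)

# Inverse-frequency class weights, one common mitigation:
#   weight(c) = n / (k * count(c))
# Rare classes get larger weights: dog gets 5.0, cat about 0.56.
weights = {c: n / (k * cnt) for c, cnt in counts.items()}
```

These weights can then be passed to a weighted loss function during training, so each misclassified dog costs roughly nine times as much as a misclassified cat.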
While training a model, there is a…
The deadline for the final phase of GSoC has just passed and I’ve made my submission. In this article, I share what I did, how I documented my work and what I plan to do in the future.
As mentioned in my earlier articles, in the final phase of GSoC I focused more on documenting my work and building a smaller model for AutoBound. Why documentation? GSoC aims to promote and encourage students to contribute to open-source projects. So the project built during GSoC is open source, and you can expect other developers contributing to…
The second evaluation of Google Summer of Code is complete and the results came out a week back. This article explains what I did during the second phase of GSoC while building AutoBound.
The two main tasks I focused on during the second phase of GSoC were
The plugin and server ends for data collection were completed during the first phase of GSoC. So, here I will explain the data collection workflow I used.
To train the model, I needed high-resolution aerial images and good data. I got them from OpenAerialMap and OpenStreetMap…
It has been almost two weeks since coding officially started with Google Summer of Code. As mentioned in my previous post, I have been selected to work as a Student Developer for OpenStreetMap through Google Summer of Code.
I have been working on my project and the experience so far has been great. In this post, I’ll be explaining what I’ve done and you can find the code in the development branch of this repo.
First, I had to create an Action class. The methods of this class will be called when an action occurs. For my plugin, I wanted…
This year, my proposal was selected for Google Summer of Code 2019. (You can find some details about my proposal here.) Ever since I got the news and shared it with people, I have been getting three questions:
In this article, I’ll be addressing these questions one by one followed by some stuff to keep in…
My proposal was accepted for Google Summer of Code 2019. I will be working as a Student Developer for OpenStreetMap from May 27, 2019 to August 19, 2019.
I will be working on building a plugin for JOSM that can automatically identify the rooftops of buildings in a given map area. This will make it easier for OSM contributors to mark buildings and add data.
The tool will have two parts:
Likes to talk about Machine Learning and plays the Harmonica