[Week 2 — GourmetNet]

Metehan Yıldırım
bbm406f16
Dec 11, 2016
Image taken from → http://insights.principa.co.za

Introduction

This week we worked on grouping our businesses so that we can handle the sparse-matrix problem from last week. We tried different approaches built on Word2Vec, and in this post we explain each of them.

What Did We Do?

Word2Vec and K-Means Clustering

We first tried Google's pretrained Word2Vec model. Word2Vec is a neural network that learns vector representations of words which capture their semantic similarities. The first problem we faced was memory: the Google corpus is 3.5 GB and our dataset is 2.5 GB, so some of our computers didn't have enough RAM and we could only use one machine. (Once we had extracted the word vectors we no longer needed to load Word2Vec itself, so this problem evaporated.) The pipeline was as follows: first we collected the tags of every business; then we looked up the Word2Vec vector of each tag; finally, for every business we took the arithmetic mean of its tag vectors, leaving one vector per business. We then applied K-Means clustering to these vectors. When we tested the resulting clusters with some sample words, they failed to differentiate restaurant categories: for example, "Chinese" and "American" landed in the same cluster, even when we increased the number of clusters K. We therefore concluded that this approach wasn't going to work.
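The averaging-and-clustering pipeline above can be sketched as follows. The toy 3-dimensional vectors, the business names, and the tag lists are all hypothetical stand-ins: in the real project the vectors come from Google's pretrained 300-dimensional Word2Vec model (loadable via gensim), and K-Means would come from a library such as scikit-learn rather than the minimal NumPy loop shown here.

```python
import numpy as np

# Stand-in for Google's pretrained Word2Vec vectors (toy 3-d values, hypothetical).
word_vectors = {
    "chinese":  np.array([0.9, 0.1, 0.0]),
    "noodles":  np.array([0.8, 0.2, 0.1]),
    "american": np.array([0.1, 0.9, 0.0]),
    "burgers":  np.array([0.2, 0.8, 0.1]),
}

def business_vector(tags):
    """Arithmetic mean of the tag vectors -> one vector per business."""
    vecs = [word_vectors[t] for t in tags if t in word_vectors]
    return np.mean(vecs, axis=0)

# Hypothetical businesses with their tags.
businesses = {
    "Golden Dragon":  ["chinese", "noodles"],
    "Route 66 Diner": ["american", "burgers"],
}
X = np.stack([business_vector(tags) for tags in businesses.values()])

def kmeans(X, k, iters=20, seed=0):
    """Minimal K-Means: assign points to the nearest centroid, recompute means."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(axis=-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(axis=0)
    return labels

labels = kmeans(X, k=2)
```

With these toy vectors the two businesses separate cleanly; the problem we describe above is that with the real embeddings, averaged tag vectors of different cuisines were not separated this well.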

Manual Clustering On Cuisines

On closer inspection of the dataset we observed that most restaurants' tags include their cuisine. So we decided that manually clustering our data by cuisine was a better approach, and we divided the businesses accordingly. We still think this categorization is too coarse, so next week we will try to split each cuisine into finer clusters using K-Means.
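A minimal sketch of this manual bucketing, assuming a hand-picked cuisine list and the hypothetical businesses below (the real project applies the same idea to the tags in our dataset):

```python
# Hand-picked cuisine keywords (illustrative subset, not the project's full list).
CUISINES = ["chinese", "italian", "mexican", "american", "japanese"]

def cluster_by_cuisine(businesses):
    """Put each business into the bucket of the first cuisine named in its tags."""
    clusters = {c: [] for c in CUISINES}
    clusters["other"] = []  # businesses whose tags name no known cuisine
    for name, tags in businesses.items():
        tags = [t.lower() for t in tags]
        match = next((c for c in CUISINES if c in tags), "other")
        clusters[match].append(name)
    return clusters

# Hypothetical sample businesses.
businesses = {
    "Golden Dragon": ["Chinese", "Noodles"],
    "La Trattoria":  ["Italian", "Pizza"],
    "Mystery Cafe":  ["Coffee", "Breakfast"],
}
clusters = cluster_by_cuisine(businesses)
```

Businesses that name no cuisine fall into an "other" bucket, which is one reason this split alone still feels too coarse.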

Here is our project: https://github.com/metehanyildirim/GourmetNet

That’s it for this week. See you next week! By the way, here is the Google corpus we are using.
