Uncharted: Yelp Dataset Edition

Michael Chen
Feb 27, 2016

Last weekend of February, late Friday night: I’m at VTHacks and I think to myself, let’s apply what I learned in economics and econometrics, combined with my very basic understanding of MapReduce and machine learning, to create some models and crunch the Yelp Academic Dataset (alternate link). My goal was a project with two parts: build a model for the backend, then test the model/backend from iOS. I did half of the official Apple iOS tutorial. I thought, “Alright, time to do the real challenge. Build the model. Apply my knowledge. I’ll come back to finish the iOS app later tomorrow.”

Saturday 1:50pm: I realize I have no idea what I’m doing. The examples I learned in econometrics have very contained datasets. The Yelp dataset is very… not contained. Lots of collinearity. Most likely lots of omitted variables. I also forgot most of the methods for alleviating the problems caused by collinearity and omitted variables, remembering only the terminology. For example, I recall you use 2SLS to estimate a model with instrumental variables. What does that mean? I have no idea. I don’t have my econometrics textbook to go look. So I went down a rabbit hole instead. Apparently, judging by the demos and examples, I’m supposed to throw the dataset into Amazon EMR and let the magic happen. Something to do with mrjob.
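For anyone else without a textbook handy, here is a rough sketch of what that 2SLS line refers to, in plain Python. It is a toy illustration, not something I ran on the Yelp data: the numbers are made up, `u` stands in for an omitted variable, `z` is a hypothetical instrument, and the helper names are my own. Stage one regresses the problem variable `x` on the instrument, and stage two regresses `y` on the fitted values from stage one.

```python
def ols(x, y):
    """Return (intercept, slope) of a simple least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(z, x, y):
    """Two-stage least squares with one regressor x and one instrument z."""
    # Stage 1: regress x on the instrument z and keep the fitted values.
    a, b = ols(z, x)
    x_hat = [a + b * zi for zi in z]
    # Stage 2: regress y on the fitted values instead of on x itself.
    return ols(x_hat, y)


# Made-up data: u is an omitted variable that drives both x and y.
# z moves x but is uncorrelated with u in this sample.
z = [-1.0, -1.0, 1.0, 1.0]
u = [-1.0, 1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]  # true effect of x is 2

naive_slope = ols(x, y)[1]                      # biased by the omitted u
iv_slope = two_stage_least_squares(z, x, y)[1]  # recovers the true effect
print(naive_slope, iv_slope)  # 3.5 2.0
```

The plain regression blames `x` for what `u` is doing, so its slope overshoots. The two-stage version uses only the part of `x` that the instrument explains, which in this constructed sample gives back the true slope of 2.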

Sunday 12:35am: So I went back to finishing the iOS app. The whole build-a-model endeavor? I don’t think so… I’m going to need to do more research before I can make anything useful. I got to the point where I understood, on a surface level, what the mrjob package does (well explained here). Make a Python script with it, run it on AWS EMR, watch magic happen. How do I make the script? Not sure. That’s for another day. Anyways, onwards to the good news. I’m understanding Swift a bit more and have gained quite a bit of comfort working in Xcode. Highly recommend the official Apple tutorial linked above. Maybe I can even get to the part where I call the Yelp API and at least use something related to Yelp…
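A rough sketch of what such a script boils down to, written in plain Python so it runs without mrjob or EMR. In mrjob the `mapper` and `reducer` below would be methods on a subclass of `MRJob`, and the library would handle the grouping step that `run_local` fakes here. The field names `business_id` and `stars` are my assumption about the review records, the sample lines are invented, and `run_local` is a hypothetical helper, not part of any library.

```python
import json
from collections import defaultdict


def mapper(line):
    """Map step: turn one JSON review line into a (business_id, stars) pair."""
    review = json.loads(line)
    yield review["business_id"], review["stars"]


def reducer(business_id, stars):
    """Reduce step: average all star ratings collected for one business."""
    stars = list(stars)
    yield business_id, sum(stars) / len(stars)


def run_local(lines):
    """Stand-in for the framework: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    results = {}
    for key in sorted(grouped):
        for out_key, out_value in reducer(key, grouped[key]):
            results[out_key] = out_value
    return results


# Invented sample input: one JSON object per line.
sample_lines = [
    '{"business_id": "b1", "stars": 4}',
    '{"business_id": "b2", "stars": 2}',
    '{"business_id": "b1", "stars": 5}',
]
averages = run_local(sample_lines)
print(averages)  # {'b1': 4.5, 'b2': 2.0}
```

The point of the split is that the mapper only ever sees one line and the reducer only ever sees one key, which is what lets a cluster spread the work across machines.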

Sunday 2:30pm: I finished the iOS app! Woo. I barely even began making my ‘model’. Woo! Why am I cheering for both? It’s because I didn’t know what to do. More importantly, I kept my head in the game. I’m not saying this was a huge feat. I’m simply glad I stayed tenacious. At least I learned that there is a package for MapReduce called mrjob. I also learned what AWS EMR is.

What will you learn over a weekend? Dare to try something new?

Special thanks to my buddies Julia and Alkesh for stopping by to say hi! Go Hokies :p

Michael Chen

ML@ROBLOX — Trying to make some sense in a hectic world