Week 4 — Estimating Preferences By Region Using Yelp Data

Machine Likes It · Published in bbm406f16 · Dec 18, 2016

Hello again!

So far we have decided what data we should use, how to store it, how to use it, and what our options are in terms of algorithms. And now the most fun part: trying algorithms on the data and seeing the results!

This week we tried LDA and NMF on a sample of our data and got some satisfactory results. Here is an NMF vs. LDA comparison:

NMF

So first, the NMF results. After tweaking the parameters of our TfidfVectorizer for the BoW model, we obtained these results with an ngram range of (1,4).

Topic #0:
great | selection | food | beer selection | beer
Topic #1:
coleslaw | service | customer service | customer | notch customer service
Topic #2:
fish | sandwich | chips | small | fish sandwich
Topic #3:
tuesday thursday | tuesday | night tuesday thursday | night tuesday | fantastic
Topic #4:
mushrooms | canned | pizza | night | place
Topic #5:
amazing | years | stopped | place | sandwiches
Topic #6:
good | wings | quick lunch | nice ndoesnt look like | ndoesnt
Topic #7:
night crowd wasn fault | understaffed light | took hour sandwiches best | monday | monday night
Topic #8:
make | menu | neighborhood | decent | service
Topic #9:
bar | food | bar food | cool bar | great

We treated each review as a single document, which gave us 16 documents for a randomly chosen test restaurant. As you can see, we have some valuable information, but not all of the topics are useful. Either we need more data or we need to tweak the parameters.

So in our second attempt with NMF, we decided to divide every review into sentences. After all, almost every sentence has its own topic, right? With the same parameters we obtained the following results:
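For readers who want to reproduce this kind of run, here is a minimal sketch of TF-IDF plus NMF with scikit-learn. The ngram range (1, 4) comes from the post; the sample reviews, the `top_terms` helper, the English stop-word list and the number of topics are assumptions made only for illustration.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-in reviews; the real input would be the Yelp review texts.
reviews = [
    "Great beer selection and great bar food.",
    "The fish sandwich and chips were small but tasty.",
    "Top notch customer service and friendly staff.",
    "Cool bar with a great beer selection.",
    "The fish sandwich came with homemade chips.",
    "Friendly staff and quick customer service.",
]


def top_terms(model, feature_names, n_top=5):
    """Return the n_top highest-weighted terms for each topic."""
    return [
        [feature_names[i] for i in weights.argsort()[::-1][:n_top]]
        for weights in model.components_
    ]


# ngram_range=(1, 4) is from the post; stop_words="english" is an assumption.
vectorizer = TfidfVectorizer(ngram_range=(1, 4), stop_words="english")
tfidf = vectorizer.fit_transform(reviews)

# Order the vocabulary by column index to map matrix columns back to terms.
feature_names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# n_components is the number of topics (10 in the post, 3 for this toy corpus).
nmf = NMF(n_components=3, max_iter=500, random_state=0).fit(tfidf)

nmf_topics = top_terms(nmf, feature_names)
for idx, terms in enumerate(nmf_topics):
    print(f"Topic #{idx}:")
    print(" | ".join(terms))
```

With only a handful of documents the topics are noisy, which matches what we saw with 16 reviews: long n-grams that occur once can dominate a topic.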

Topic #0:
good | selection | beer | beer selection | just
Topic #1:
area | pittsburgh | pittsburgh area | relocate pittsburgh | relocate pittsburgh area
Topic #2:
chips | came | nice homemade chips | nice homemade | nice
Topic #3:
food | 20 | minutes | fried | menu
Topic #4:
sandwiches | hour | took | bread | sandwiches cut
Topic #5:
place | great place | lunch | place lunch | crowd
Topic #6:
wings | excellent | types | types wings | different
Topic #7:
fish | cheese | sandwich | chicken | buffalo
Topic #8:
great | bar | service | neighborhood | neighborhood bar
Topic #9:
staff | friendly | amazing | great | friendly staff

Surprisingly better results! We can clearly see what's good or important about this restaurant. With more reviews and further improvements, we believe we can achieve even better results.
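The sentence-splitting step can be sketched as follows. This is a naive regex splitter written for illustration, not necessarily the one we used; `split_into_sentences` and the sample reviews are hypothetical names and data.

```python
import re


def split_into_sentences(review):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


# Hypothetical stand-in reviews.
reviews = [
    "Great beer selection. The wings were excellent!",
    "Friendly staff. Food took 20 minutes though.",
]

# Flatten: each sentence becomes its own "document" for the vectorizer.
sentences = [s for review in reviews for s in split_into_sentences(review)]
print(sentences)
```

A splitter like this breaks on abbreviations such as "Dr.", so a proper sentence tokenizer (for example NLTK's `sent_tokenize`) may be a better fit for real reviews.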

LDA

We also tried LDA for topic modelling on the same data with the same parameters. After seeing great results from sentence parsing with NMF, we decided to run LDA both with and without sentence parsing to compare.

No sentence parsing, ngram(1,4):

Topic #0:
great | place | good | bar | food
Topic #1:
took hour | understaffed | reubens ve | understaffed light monday | took hour sandwiches best
Topic #2:
mushrooms | canned | townie bar | canned mushrooms | music
Topic #3:
good | food | fish | great | friendly
Topic #4:
try time | place stick crowd | ordered reuben wanted | today nwe ordered | reuben wanted
Topic #5:
coleslaw | amazing | drink | wings | tuesday thursday
Topic #6:
away | service | nice drink great | notch customer service awesome | notch customer service
Topic #7:
bar | cool bar | cool | menu | typical bar food
Topic #8:
remember staff | average | crispy | place yelp | overall delicious pizza aside
Topic #9:
nice little neighborhood | town biz hotel right | town biz hotel | menu make place worthy | place worthy check

Similar to NMF without sentence parsing: we can't really tell what is good or bad about the restaurant.

Sentence parsing, ngram(1,4):

Topic #0:
special 10 | special | food | sandwiches | type food large
Topic #1:
chips | great | nice homemade chips | nice homemade | friendly
Topic #2:
wings zucchini planks | offering samples drafts selected | planks excellent good | offering samples drafts | zucchini
Topic #3:
minutes | 20 | food | ncheers | canned
Topic #4:
place | crowd | pittsburgh area | don | area
Topic #5:
food | alexion | coleslaw | sandwiches | 00
Topic #6:
good | wings | excellent | selection | beer
Topic #7:
flavor | fish | nice | sandwich | slaw nice good
Topic #8:
area | service reasonable | reasonable prices make | service reasonable prices make | prices make good
Topic #9:
service | drink | nice | nice drink great company | nice drink great

Slightly better than without sentence parsing, but in our opinion NMF did a better job on this test. It's too soon to decide what we are going to use for all of our data, though; we need more tests with different parameters.

We will run some tests on cities, on restaurants with more reviews, and on high- and low-rated restaurants, and see the results. Stay tuned for more updates! (Which may come sooner than a week… Or not.)
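For comparison, here is a minimal LDA sketch with scikit-learn. It uses raw term counts from `CountVectorizer`, which is the common choice for LDA; the post does not say which vectorizer fed the LDA runs, so treat that, the sample sentences, the stop-word list and the number of topics as assumptions.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in sentences (one document per sentence).
sentences = [
    "Great beer selection.",
    "The wings were excellent.",
    "Friendly staff and great service.",
    "Food took 20 minutes.",
    "Nice homemade chips with the fish sandwich.",
    "Great neighborhood bar.",
]

# Assumption: raw counts rather than TF-IDF weights as LDA input.
vectorizer = CountVectorizer(ngram_range=(1, 4), stop_words="english")
counts = vectorizer.fit_transform(sentences)

# Order the vocabulary by column index to map matrix columns back to terms.
feature_names = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# 3 topics for this toy corpus; the post used 10.
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(counts)

# Top 5 terms per topic, printed in the same layout as the results above.
lda_topics = [
    [feature_names[i] for i in weights.argsort()[::-1][:5]]
    for weights in lda.components_
]
for idx, terms in enumerate(lda_topics):
    print(f"Topic #{idx}:")
    print(" | ".join(terms))

# Per-document topic mixture: one row per sentence, one column per topic.
doc_topics = lda.transform(counts)
```

The `doc_topics` matrix is one way to compare the two models beyond eyeballing the top terms, since it shows how strongly each sentence is assigned to each topic.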
