Our learnings from Huddle on Machine Learning


By Naresh Mehta

This post was originally published on Zomato Blog on November 08, 2017.

Last month, we hosted the fourth edition of Huddle by Zomato, a meetup that this time focused on Machine Learning (ML). We chose the topic because of the intense discussion it attracts, and because of our own aggressive focus on ML over the past 18 months.

The gathering consisted of engineers, data scientists and CTOs from several leading technology and product companies, including Mobikwik, Times Internet, Policy Bazaar, Snapdeal, Grofers, Shuttl and BabyChakra. The sessions by the teams at Hike Messenger, Delhivery and Zomato made for an insightful evening, with engaging discussions during the Q&A and open knowledge sharing after the talks.

As always, we are sharing some of the key learnings from the sessions below. We hope they are useful to other ML enthusiasts.

The 3 ingredients of personalization
Debdoot Mukherjee (Director, Data Science at Hike) opened the evening’s session with a talk on product personalization. He explained the role and interaction of three key ingredients (Content, Context and User) in getting personalization right.

Debdoot highlighted that the ‘cold start problem’ can be solved by starting with popularity and gradually introducing relevance as we learn more about the user. He further emphasized the importance of the ‘explainability’ of recommendations, i.e. telling the user why a particular item is being recommended, which can be taken a step further by introducing manual feedback. Below is an example from a popular e-commerce portal that follows the ‘explainability’ mantra.

Hike Presentation
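
To make the cold-start idea concrete, here is a minimal Python sketch of one way to move from popularity to relevance as a user's history grows. It is an illustration only, not Hike's implementation: the function names, the weighting formula and the smoothing constant `k` are all assumptions made for this example.

```python
def blended_score(popularity, relevance, n_interactions, k=20.0):
    """Blend a global popularity score with a personalised relevance score.

    The weight on relevance grows from 0 (brand-new user) toward 1 as the
    number of observed interactions increases. `k` is a hypothetical
    smoothing constant: at n_interactions == k both signals count equally.
    """
    w = n_interactions / (n_interactions + k)
    return (1.0 - w) * popularity + w * relevance


def rank_items(items, n_interactions, k=20.0):
    """Sort (name, popularity, relevance) tuples by blended score, best first."""
    return sorted(
        items,
        key=lambda it: blended_score(it[1], it[2], n_interactions, k),
        reverse=True,
    )


# Made-up items: one globally popular, one that suits this particular user.
items = [
    ("trending_sticker", 0.9, 0.2),
    ("niche_sticker", 0.3, 0.8),
]
new_user_top = rank_items(items, n_interactions=0)[0][0]
active_user_top = rank_items(items, n_interactions=200)[0][0]
```

With no history, the ranking is driven purely by popularity; after a couple of hundred interactions, the personalised item takes over. A real system would also need to estimate the relevance score itself, which this sketch takes as given.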

Logistics optimization and reducing mis-routes
Dr. Kabir Rustogi and Rahul Kumar Singla from the Data Science team at Delhivery led the second session, on logistics, which they highlighted is a billion-dollar challenge for the Indian economy today!

Kabir pointed out that to solve any vehicle routing, facility location or network design problem, one needs to know exactly where their customers are located. However, with addresses coming in all shapes, sizes and threats (as seen in the hilarious image below), finding the optimal shipping route is a big challenge that logistics companies face today.

Delhivery presentation

To address this, Delhivery moved from heuristic rules for address parsing to a Machine Learning-based approach, significantly increasing the number of localities for which the product can predict accurate geocodes.

The ML-based approach not only parses addresses but also fixes typos and errors in them, using a proprietary generative algorithm based on ‘probabilistic graph models’ and phonetics-based fuzzy matching. Besides allowing Delhivery to do rooftop-level geocoding for addresses, it also enables them to accurately identify polygon boundaries for each of their serviceable localities. All of this leverages their in-house historical delivery data (~150 million deliveries) and location tracking data from the delivery fleet.
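
Delhivery's algorithm is proprietary, so the following is only a rough Python sketch of the general idea behind phonetics-based fuzzy matching. It swaps in two well-known stand-ins: a simplified Soundex key for the phonetic part and the standard library's `difflib` for string similarity. The locality list, the `match_locality` helper and the `cutoff` value are hypothetical.

```python
import difflib

# Soundex digit for each consonant; vowels, h, w and y have no code.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit


def soundex(word):
    """Return a 4-character Soundex-style key for a word (simplified variant)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            key += code
        if ch not in "hw":  # h and w do not reset the previous code
            prev = code
    return (key + "000")[:4]


def match_locality(token, known_localities, cutoff=0.6):
    """Map a possibly misspelled token to the closest known locality, or None."""
    key = soundex(token)
    # Prefer candidates that sound alike; fall back to the full list.
    candidates = [loc for loc in known_localities if soundex(loc) == key]
    pool = candidates or known_localities
    lowered = {loc.lower(): loc for loc in pool}
    best = difflib.get_close_matches(token.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[best[0]] if best else None


KNOWN = ["Gurgaon", "Ghaziabad", "Noida", "Dwarka"]  # hypothetical gazetteer
fixed = [match_locality(t, KNOWN) for t in ("Gurgoan", "Noyda")]
```

Here the phonetic key narrows the candidate set to names that sound like the input, and the edit-based similarity picks the best spelling among them. A production system would have to handle multi-word addresses, transliteration variants and ambiguity between localities, none of which this sketch attempts.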

Solving the image aesthetics problem
The ML team at Zomato spoke about building an ensemble model, combining deep and feature-based learning, to assess the aesthetics of photos uploaded by users on Zomato. Showcasing only high-quality, rich content from the millions of user-generated photos added every month is critical, not only for user experience but also for ensuring high click-through rates on the platform.

They explained how the aesthetics problem was initially set up as a classification problem, with historical manual labelling of images serving as training data. However, the classic AlexNet architecture, which works very well for classifying photos into food shots vs. ambience shots, didn’t produce high-quality results for aesthetics (low vs. high quality images) due to aspect ratio constraints: the network expects a fixed-size input, so photos have to be cropped or warped before they are fed in.

Introducing spatial pyramid pooling into the network helped them circumvent the constraint, and finally an ensemble of a Fisher vector-based feature model with deep learning resulted in 90%+ accuracy for image aesthetics classification.
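
For readers unfamiliar with spatial pyramid pooling, the following pure-Python sketch shows why it removes the fixed-input-size requirement: whatever the height and width of a feature map, the pooled output has the same length. This is a single-channel toy version written for illustration, not the network the team used; the pyramid levels `(1, 2, 4)` are an assumption.

```python
def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a 2-D feature map (list of rows) into a fixed-length vector.

    For each pyramid level n the map is split into an n x n grid and the
    maximum of every cell is kept, so the output always has
    sum(n * n for n in levels) values regardless of input height/width.
    Assumes height and width are both >= max(levels).
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Integer bin edges that cover every row exactly once.
            r0, r1 = (i * height) // n, ((i + 1) * height) // n
            for j in range(n):
                c0, c1 = (j * width) // n, ((j + 1) * width) // n
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(r0, r1)
                    for c in range(c0, c1)
                ))
    return pooled


# Two toy "feature maps" with different shapes and aspect ratios.
wide = [[r * 7 + c for c in range(7)] for r in range(4)]  # 4 rows x 7 columns
tall = [[r * 5 + c for c in range(5)] for r in range(9)]  # 9 rows x 5 columns
wide_vec = spatial_pyramid_pool(wide)
tall_vec = spatial_pyramid_pool(tall)
```

Both vectors have 1 + 4 + 16 = 21 entries, so a fully connected classifier layer can follow without first cropping or warping the photo. In a real convolutional network the same pooling is applied to every channel of the last convolutional layer's output.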

Overall, the Huddle was highly engaging, with each session touching upon different types of problems being solved across domains using varied aspects of machine learning. It was interesting to note how age-old problems are now being solved more efficiently and smartly by leveraging the exponentially increasing power of computation and storage.

Fun times ahead indeed!

We’ll be back with our next Huddle within a few months. Watch this space for more! For any information on Huddle, do reach out to us at huddle@zomato.com.

Naresh Mehta leads the data science and advanced analytics vertical at Zomato, with a key focus on recommendation engines, search algorithms and ensuring harmony between statisticians/analysts and ML engineers in the data science team.
