Industry @ RecSys 2016

sbourke
5 min read · Oct 18, 2016


RecSys (recsys.acm.org) is the annual conference on recommender systems, where researchers and industry practitioners from around the world share their latest results and thoughts about the field.

This year marked the conference's 10-year anniversary. Because the industry presence was so much heavier this year than in most other years (the exception being Foster City), I thought it would be useful to write a post focussed specifically on the industry track, its learnings, and its outcomes.

RecSys 2016 in Cambridge / Boston, Massachusetts


This year there were three industry sessions in total, as well as research papers from different companies in dedicated research sessions, not to mention many talks across the workshops. From talking to various people who either run teams or contribute to teams at companies doing cool and interesting work, I reinforced some thoughts I had in my head before the conference (so I could actually be very wrong :-)):

Deep learning: It's occasionally used, and, with the exception of one company, it does not provide any wow factor in the underlying engagement / revenue numbers. Embeddings did seem to provide pretty useful signals in many different ways, but this is generally done via shallow techniques such as word2vec. Criteo gave a pretty interesting presentation on some research they had done: Meta-Prod2Vec: Product Embeddings Using Side-Information for Recommendation.
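As a rough illustration of the shallow approach, here is a minimal sketch (assuming the gensim library, 3.x-style API, and made-up session data): treat each user's interaction history as a sentence and each item id as a word, then train word2vec over those sequences so that items which co-occur land close together in the embedding space.

```python
from gensim.models import Word2Vec

# Each inner list is one user session of item ids (purely illustrative data).
sessions = [
    ["item_1", "item_7", "item_3"],
    ["item_7", "item_3", "item_9"],
    ["item_2", "item_1", "item_7"],
]

# sg=1 selects skip-gram; size is the embedding dimension.
model = Word2Vec(sessions, size=32, window=5, min_count=1, sg=1)

# Items that co-occur in sessions end up close in the embedding space,
# so nearest neighbours can be served as item-to-item recommendations.
print(model.wv.most_similar("item_7", topn=2))
```

Meta-Prod2Vec extends this basic setup by injecting item side-information (category, brand, and so on) into the training pairs.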

The main challenge highlighted with deep learning was that it's really difficult to debug when the model learns something weird. When anyone did use it, it appeared to be one part of a larger ensemble rather than the whole system. Most likely deep learning is a great tool for a problem we just haven't identified yet. That said, the keynote given by Sander Dieleman of DeepMind at the Deep Learning workshop, on predicting latent factors from audio signals alone for cold-start items, is pretty cool.
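To make the cold-start idea concrete, here is a toy sketch of the general recipe (not Dieleman's actual method, which used a deep convolutional network on raw audio; the random data and the sklearn regressor here are stand-ins): learn a mapping from content features to the collaborative-filtering latent factors, so brand-new items can be placed in the same space as items with interaction history.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
audio_features = rng.randn(1000, 40)  # e.g. 40 spectral descriptors per track
latent_factors = rng.randn(1000, 32)  # factors learnt by CF on play counts

# Regress latent factors from content features (a shallow stand-in for
# the deep audio model described in the keynote).
model = MLPRegressor(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
model.fit(audio_features, latent_factors)

# A cold-start track has no plays, so predict its latent factors from
# audio alone and score it against user factors as usual.
new_track = rng.randn(1, 40)
predicted_factors = model.predict(new_track)
```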

Everything is a recommendation: It should go without saying that recommender systems are a fantastic opportunity for a system to engage with a user and help them discover useful and interesting information.

This can be somewhat missed when attending the research tracks at RecSys, because they are so focussed on improving one particular aspect of an overall system. The industry sessions highlighted the significant range of challenges being addressed across a whole system.

In particular, the talk Past, present and future view of recommender systems from an industry perspective, given by Justin Basilico from Netflix and Xavier Amatriain from Quora, demonstrated the different signals each company tries to extract from its product to understand user interests, intent, and future activity. The two-part tutorial session from Deepak Agarwal and Xavier Amatriain on building real-world recommender systems also hit this message home.

Google gave a keynote talk about their Google Now / universal search product and the various signals they incorporate into the system. Google mentioned their multi-modal user representation, coupled with their pretty extensive data infrastructure (products such as the Knowledge Graph), used when trying to match a user with a particular end goal. This was reminiscent of Jeff Dean's talk at RecSys 2014.

In the end the same underlying theme was present: lots of different signals are used, and lots of different algorithms run in different parts of the system, either to infer signals that support recommendation or to produce recommendations directly. The Facebook feed presentation was another example of this… it even had some deep learning.
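A hypothetical sketch of what that pattern often reduces to in serving code: each signal gets its own scorer (a model, a heuristic, or a service), and a blending layer, here just a weighted sum with made-up names and weights, combines them into one ranking score.

```python
import random

# Hypothetical sub-model scorers; in practice each would be a trained
# model or a separate service rather than a random number.
def cf_score(user, item):      return random.random()  # behavioural signal
def content_score(user, item): return random.random()  # item-metadata signal
def recency_score(item):       return random.random()  # freshness signal
def popularity_score(item):    return random.random()  # global prior

WEIGHTS = {"cf": 0.5, "content": 0.2, "recency": 0.2, "popularity": 0.1}

def final_score(user, item):
    signals = {
        "cf": cf_score(user, item),
        "content": content_score(user, item),
        "recency": recency_score(item),
        "popularity": popularity_score(item),
    }
    return sum(WEIGHTS[name] * value for name, value in signals.items())

# Rank a candidate set for one user.
ranked = sorted(["a", "b", "c"], key=lambda i: final_score("user_1", i), reverse=True)
```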

Distributed machine learning is hard, and may not be worth it: Taking raw, uncooked data and turning it into a cooked dataset is usually the most intensive part of a machine learning pipeline. Once the cooked dataset exists, however, its memory footprint is usually dramatically smaller, and most likely manageable on a single machine. Companies such as Google and Facebook most likely have a genuine need for distributed machine learning; below that scale, if you are operating in the world of one or two hundred million users, a single-machine approach could suffice.
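A back-of-the-envelope check (my numbers, not from any talk) of why this is plausible: the cooked artefact of a factorisation model is tiny compared to the raw interaction logs it came from.

```python
users = 200_000_000   # "one or two hundred million users"
factors = 50          # latent dimension
bytes_per_float = 4   # float32

user_matrix_gb = users * factors * bytes_per_float / 1e9
print(f"user factor matrix: {user_matrix_gb:.0f} GB")  # ~40 GB

# ~40 GB fits in RAM on a single large machine, so training and serving
# on one box is often viable even at this scale.
```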

Scaling out a linear model may be somewhat straightforward in a map-reduce context, but something like collaborative filtering via matrix factorisation has significant communication overhead, due to the iterative nature of the job and the size of the learnt model. As a consequence, data structures need to be created to make this more efficient; see the ALS implementation in Spark, for example.
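To make the iterative structure concrete, here is a minimal single-machine ALS sketch in numpy (toy dense data, no implicit-feedback weighting): each pass re-solves one factor matrix while holding the other fixed. In a distributed setting, every such pass has to ship the freshly updated factors between workers, which is exactly the communication overhead described above.

```python
import numpy as np

rng = np.random.RandomState(0)
R = rng.rand(100, 80)   # toy user x item interaction matrix
k, reg = 10, 0.1        # latent dimension, L2 regularisation
U = rng.rand(100, k)    # user factors
V = rng.rand(80, k)     # item factors

for _ in range(10):
    # Solve for all user factors with V fixed, then all item factors with U fixed.
    U = np.linalg.solve(V.T @ V + reg * np.eye(k), V.T @ R.T).T
    V = np.linalg.solve(U.T @ U + reg * np.eye(k), U.T @ R).T
    print(np.sqrt(np.mean((R - U @ V.T) ** 2)))  # reconstruction RMSE per pass
```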

A talk specifically on this topic, called (Some) pitfalls of distributed learning, was given by Yves Raimond of Netflix in the large-scale recommender systems workshop. A key message is not that distributed machine learning is bad, just that it's hard, and that a single-machine implementation is potentially faster and easier to debug than a scaled-out version. You can make a back-of-the-envelope guess about the scale at which Netflix operates from their Recommending for the World talk.

An interesting thought: map-reduce was created to do away with hand-written parallelism, yet a comparably good abstraction for machine learning is still not mainstream. TensorFlow, for example, does allow for distributed machine learning; perhaps an abstraction like theirs is still missing from tools such as Spark.
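For reference, the abstraction in question looks roughly like this in TensorFlow's circa-2016 distributed API: the cluster layout is declared once, and the runtime handles shipping variables and gradients between parameter servers and workers (the host/port values here are placeholders for a single-machine demo).

```python
import tensorflow as tf

# Declare the cluster once; the runtime handles the communication.
cluster = tf.train.ClusterSpec({
    "ps":     ["localhost:2222"],
    "worker": ["localhost:2223"],
})
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Variables are pinned to parameter-server tasks...
with tf.device("/job:ps/task:0"):
    weights = tf.Variable(tf.zeros([1000, 10]))

# ...while the heavy computation is placed on workers.
with tf.device("/job:worker/task:0"):
    logits = tf.matmul(tf.ones([32, 1000]), weights)
```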

Overall, a bunch of very practical tips were picked up, either from the presentations themselves or from the opportunity to talk with the presenters during the conference.

Links to industry-focussed papers in the main track which did not sit in the industry sessions


sbourke

Data Lead @ Outfittery, previously Zapier, ThredUp, Schibsted. #recsys #ranking #search #ir #machinelearning #engineering #data