5 Data takeaways from Netflix, SignalFire and Ocean Protocol

Keertan Menon
Published in DataSeries
Nov 21, 2018 · 4 min read

Panel @ World Summit AI: Tony Jebara, Director of Machine Learning at Netflix; Ilya Kirnos, CTO SignalFire; Trent McConaghy, CTO Ocean Protocol; Mike Reiner, Venture Partner at OpenOcean

Highlights video:

5 Key Lessons Learned

  1. Have a process for gathering data in place before you actually gather data. If data is scarce, what can you do with it? Bootstrap your data: draw bootstrap samples from the data you have, build a model on each, and repeat the process. Do it five times, then make predictions across all five models. That way you get not just predictions but also their uncertainty. Once you have the uncertainties, it becomes really important to communicate them to the user. Trent, from Ocean Protocol, has found that you can even co-design the UX and the algorithm simultaneously. What’s the key link between the UX and the algorithm? Confidence intervals. In fact, the best part is that a really good UX can overcome the shortcomings of the algorithm or even of the data, and vice versa. (A minimal sketch of this bootstrap-and-ensemble idea appears right after this list.)
  2. Understand the biases in your data and do something about them (de-bias your data). Newsflash: data is not the truth. You can have all the data you want, but it will ultimately reflect society’s biases. Data will also reflect the biases of the tools and hardware used to gather it in the first place, i.e. data gathered on the iPhone version of a UI won’t necessarily generalize to the MacBook Air version of that UI. Keep in mind that the goal of gathering data is primarily to find patterns so that a machine learning model can drive some action and produce a beneficial business outcome. The problem, though, is that more often than not you will find unactionable correlations in the data. So, yes, you can say that you’ve got a great data set and built a great model (with exceptional predictive accuracy). But until you’ve gone out and A/B tested things, it’s difficult to say what impact the data really has. If you don’t introduce some form of randomization, biases will likely remain in the data.
  3. The most complicated models can, at times, be the most brittle ones. People often strive to build the most complicated models. The most complicated ones also usually have the most lines of code, and tend to be relatively more “buggy”, with pipeline modules that can fail. Opt for a simple model that captures the key behaviour of the more complicated deep neural network, but that is much more linear and does not require as much data. It’s always nice to have a model (or even a few models) with fewer dimensions as a backup plan!
  4. Be data-defensible. The textbook approach to data, and in particular to learning from data, is not as consistent as you might think. Tony stresses the importance of injecting “chaos monkey”-type exercises into the data and finding out how the algorithms would change as a result. For example, what would happen if Netflix’s entire dataset from the Asia-Pacific region were wiped out? How would the algorithm react? Try to make data more reliable, but always handle values with their uncertainties in mind! (A sketch of this kind of data-ablation check also follows the list.)
  5. Know the difference between predictability (correlation) and causation! Deep learning is great, and so are other machine learning techniques…great at predicting, that is! They tend, however, to learn mappings from one field to the next, but they don’t necessarily learn the causal relationships between those fields. The reality is that we, as human beings, are much more than prediction machines (or correlation machines). What we are, and should be, interested in is causality. As a result, it’s always important to ask what could go wrong if you act on correlations that merely look nice.
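
For the resampling idea in point 1, here is a minimal Python sketch of bootstrapping five models and reading uncertainty off their disagreement. The dataset, the ridge regressor, and the numbers are placeholders for illustration, not anything Netflix actually uses.

```python
# Minimal sketch: bootstrap five models and report prediction uncertainty.
# Assumes scikit-learn; the data and model here are stand-ins.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Toy dataset standing in for whatever scarce data you actually have.
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)

n_models = 5
models = []
for _ in range(n_models):
    # Draw a bootstrap sample (rows sampled with replacement) and fit a model.
    idx = rng.integers(0, len(X), size=len(X))
    models.append(Ridge().fit(X[idx], y[idx]))

# Predict with all five models; the spread across them is a rough
# uncertainty estimate that the UX can surface as a confidence interval.
X_new = rng.normal(size=(3, 5))
preds = np.stack([m.predict(X_new) for m in models])
mean, std = preds.mean(axis=0), preds.std(axis=0)
for m_i, s_i in zip(mean, std):
    print(f"prediction ~ {m_i:.2f} +/- {2 * s_i:.2f}")
```

The two-sigma spread printed at the end is the kind of interval the panel suggests communicating to the user.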
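And for the “chaos monkey” exercise in point 4, a hedged sketch of a data-ablation check: drop one whole slice of the data (a region, in this toy example) and compare how the model behaves. The region labels, the logistic model, and the AUC metric are assumptions made for illustration.

```python
# Sketch of a "chaos monkey for data" check: wipe out one segment
# (e.g. an entire region) and see how much the model's behaviour shifts.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

n = 1000
region = rng.choice(["apac", "emea", "amer"], size=n)
X = rng.normal(size=(n, 4))
y = (X[:, 0] + (region == "apac") * 0.8 + rng.normal(size=n) > 0).astype(int)

def fit_and_score(mask):
    # Train on the kept rows, then score on the full dataset.
    model = LogisticRegression().fit(X[mask], y[mask])
    return roc_auc_score(y, model.predict_proba(X)[:, 1])

baseline = fit_and_score(np.ones(n, dtype=bool))
without_apac = fit_and_score(region != "apac")
print(f"AUC with all data:     {baseline:.3f}")
print(f"AUC without APAC rows: {without_apac:.3f}")
# A large gap signals the model leans heavily on that slice of the data.
```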

My favorite line of the panel discussion:

Mike Reiner: “What’s data quality to you, Trent?”

Trent McConaghy: “Everything’s a PDF! You can assume that there are error bars on incoming data. And you can propagate that all the way to the model, statistically, using Monte Carlo (method).”
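
Trent’s “everything’s a PDF” point translates into a few lines of Monte Carlo: treat each incoming value as a distribution rather than a point, push samples through the model, and read the error bars off the outputs. A minimal sketch, with an invented stand-in model and assumed one-sigma error bars:

```python
# Sketch of Monte Carlo error propagation: assume each input has an error bar,
# sample from those distributions, and push every sample through the model.
import numpy as np

rng = np.random.default_rng(42)

def model(x):
    # Placeholder for a trained model's prediction function.
    return 3.0 * x[..., 0] + 0.5 * x[..., 1] ** 2

x_measured = np.array([2.0, 1.5])   # incoming data point
x_errbars = np.array([0.1, 0.3])    # assumed one-sigma error bars

# Draw many plausible versions of the input and evaluate the model on each.
samples = rng.normal(loc=x_measured, scale=x_errbars, size=(10_000, 2))
outputs = model(samples)

print(f"prediction ~ {outputs.mean():.2f} +/- {outputs.std():.2f}")
```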

Background of speakers

Moderator:

Mike Reiner, Co-founder of City AI and Venture Partner at OpenOcean. Mike invests in innovative startups from Europe and beyond, across several areas and industries, including AI (which he is particularly fond of).

Panelists:

Dr Tony Jebara is a professor at Columbia University (currently on sabbatical) and Director of Machine Learning at Netflix. Integral to the successful implementation of machine learning at the company, Tony leads a team driving engagement that has saved Netflix a billion dollars a year.

Ilya Kirnos is the Founding Partner and CTO at SignalFire. Prior to co-founding SignalFire, Ilya was a Software Engineer at Google (2004–2012). During his time at Google, Ilya held several technical leadership positions. He was a Technical Lead for Gmail Ads and was responsible for predicting consumer purchase intent and consumer Gmail monetization. Ilya was also a Technical Lead for AdWords Performance and Scalability where he managed responsiveness and uptime of the AdWords frontend.

Trent McConaghy is the Founder and CTO of Ocean Protocol, a decentralized substrate for AI data & services. It’s designed to catalyze a data commons side-by-side with many data marketplaces, make verified & privacy-preserving compute more accessible and give provenance to training data, model building, and prediction. This, in turn, can help catalyze autonomous driving, medical research, and more. Trent’s long-term goal is to help ensure that humanity has a role in an increasingly autonomous world.

Comment, share, and be sure to follow us on DataSeries, for the latest stories from our network! You can find the full-length video below. Definitely worth a watch!

