An ‘Entity Embeddings’ sharing with New York’s AI Community
On 6th June 2019, I had the good fortune to address around 50 Artificial Intelligence (AI) professionals of the New York City (NYC) Deep Learning (DL) Meetup Group on the topic of ‘Entity Embeddings’. I was kindly invited to speak by Kris Skrinak @skrinak, AI Architect at Amazon AWS, who hosts the NYC DL meetup along with Pallavi Gadgil.
The venue was the Amazon midtown NYC office, 10 levels above its book store. With pizza & wine served at the start, the hour-long talk drew keen interest and lively interaction from an audience of AI and software professionals, data scientists, industry executives, enthusiasts, students and others.
‘Entity Embeddings’ is an emerging AI technique for applying deep learning. It represents each categorical value of an information system's entity as a vector with several learned dimensions, which helps a model generate better quality predictions. It is used extensively in several large AI production systems at companies such as Google, Instacart, OpenAI, Twitter & many others.
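To make the idea concrete, here is a minimal sketch in plain Python. It is not code from the talk or from any of the papers: the function names, the day-of-week column and the choice of 4 dimensions are all assumptions made for illustration. It only shows the lookup step, where each category maps to a small vector of numbers. In a real model those numbers start out random, as they do here, and are then adjusted during training along with the rest of the network's weights.

```python
import random


def build_embedding_table(categories, dim, seed=0):
    """Give each distinct category a small random vector of length `dim`.

    Hypothetical helper for illustration; a deep learning library would
    store these vectors as rows of a trainable weight matrix instead.
    """
    rng = random.Random(seed)
    return {
        category: [rng.uniform(-0.05, 0.05) for _ in range(dim)]
        for category in sorted(set(categories))
    }


def embed(table, category):
    """Look up the vector that currently represents one categorical value."""
    return table[category]


# Assumed example column: the day of the week of a store visit.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
table = build_embedding_table(days, dim=4)

# Each day is now a 4-dimensional vector rather than a single label.
saturday_vector = embed(table, "Sat")
print(len(table), len(saturday_vector))
```

After training, categories that behave alike (for example, Saturday and Sunday in retail data) would typically end up with vectors that lie close together, which is what makes the representation useful to a model.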
Why is it important? Business leaders can no longer ignore AI, which Forbes estimates will be a US$150 trillion industry by 2025. Within AI, Entity Embeddings is a powerful technique that works across different business domains and verticals. I call it the ‘Mathematization of Organizational Intelligence’. For AI technology leaders, it is important to know that Entity Embeddings is independent of any specific Machine Learning (ML) method and does not need domain-specific feature engineering knowledge or sector expertise for designing AI models.
Scope of my talk: I presented the ‘Entity Embeddings’ concept and examined its use in the following 3 AI papers: two Kaggle competition winner papers, ‘Artificial Neural Networks Applied to Taxi Destination Prediction’ (Yoshua Bengio’s team) and ‘Entity Embeddings of Categorical Variables’, and the Google Research paper ‘Deep Neural Networks for YouTube Recommendations’.
The YouTube recording is here. I covered the following topics at the timestamps shown:
0:00 — Why Talk of Entity Embeddings ?
2:45 — What Are Entity Embeddings ?
8:32 — Importance of Entity Embeddings
9:34 — Two Perspectives — Word Embeddings & Real World Tabular Data
10:15 — Word Embeddings (including references to contemporary research)
27:00 — Real World Tabular Data
34:55 — Machine Learning Library Support
39:39 — Artificial Neural Networks Applied to Taxi Destination Prediction
42:33 — Entity Embeddings of Categorical Variables
45:24 — Deep Neural Networks for YouTube Recommendations
52:30 — Industry Usage — Twitter, OpenAI, Healthcare Domain, etc.
55:13 — Articles, Summary, Call To Action
Interactions & Discussions: My talk was interspersed with many questions, which I tried to answer to the best of my ability with simple everyday examples. Some of the areas where people had questions were as follows:
- Clarity on Embeddings, initialization, update & sharing mechanisms
- t-SNE tool to visualize Embeddings outputs to understand their impact
- Understanding Embeddings usage in various organizational data types
Since Entity Embeddings is an extremely important area of applied research, I hope to address some of the questions raised in more detail in a future article.
Earlier References: I had earlier presented on the same topic at a ‘This Week in Machine Learning & AI’ (TWiML & AI, by Sam Charrington) study group session and also wrote about its use in Collaborative Filtering algorithms for Movie Recommendations.
FastAI Shoutout: A huge thanks to Jeremy Howard & Rachel Thomas of fastai for introducing me to the concept of Entity Embeddings, and for the excellent groundbreaking work they have been doing in AI research and cutting-edge online AI education for the masses. References to FastAI were made at the following timestamps:
14:07 — Size of Embedding
26:23 & 35:35 — Fastai Library Support function — add_datepart
37:38 — Jeremy Howard on Embedding size
44:37 — Rachel Thomas on ‘Rossmann Stores Competition’ paper
52:30 — Jeremy Howard on commercial & scientific opportunities
Thanks to Kris Skrinak @skrinak & Pallavi Gadgil for their amazing support and hard work in organizing the meetup, and to all the participants who made time to attend. All feedback & suggestions are appreciated. You are also welcome to visit our Easy AI page from time to time for more information.