Entity Embeddings in the context of categorical variables

Romain Bouges
unpack
3 min read · Jan 18, 2021


What are entity embeddings in the context of categorical variables?

Categories can be as diverse as cloud types (Cumulus, Cirrus, Altostratus …), words (the list of all French words) or ages (integers from 0 to 120+), and may be represented by objects as abstract as numbers or vectors. Combinations of those categories can help to predict another category, for instance predicting a salary range from categories such as age, level of study, family situation, origin and the parents' working class.

Mathematically, links between those categories (each composed of elements) can be represented, in the case of metric spaces for instance (spaces whose elements carry a notion of distance between them, such as the absolute difference or the Euclidean norm), by the distance between their elements. More specifically, entity embedding is a transformation (a morphism) that moves those abstract elements (numbers, vectors …) closer together or further apart according to the initially unknown relations between the underlying concepts, relations that hint at a prediction (a salary range as mentioned before, or a cloud type from categories such as location, season, humidity, temperature and altitude). Elements of the input space separated by similar distances (no apparent relation) are mapped into an output space where the distance between them shrinks or grows according to their actual relationship.
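As a minimal sketch of this idea, the toy lookup table below maps cloud types to 2-D vectors; the names and values are purely illustrative (not trained), but they show how a learned embedding would place related categories closer together than unrelated ones:

```python
import numpy as np

# Hypothetical learned embedding table: each cloud type maps to a 2-D vector.
# These values are illustrative, not the output of any real training run.
embeddings = {
    "Cumulus":     np.array([0.9, 0.1]),
    "Cirrus":      np.array([0.1, 0.9]),
    "Altocumulus": np.array([0.8, 0.2]),  # assumed similar to Cumulus
}

def distance(a, b):
    """Euclidean distance between two embedded categories."""
    return float(np.linalg.norm(embeddings[a] - embeddings[b]))

# Related categories end up closer than unrelated ones.
print(distance("Cumulus", "Altocumulus"))  # small
print(distance("Cumulus", "Cirrus"))       # large
```

After training on real data, the distances would reflect how the categories relate to the quantity being predicted, rather than being hand-picked as here.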

The embedding concept stems from the fact that this transformation lowers the dimension of the representation: the output space has far fewer dimensions than the (one-hot encoded) input space. Similar categories, or combinations of categories, end up close together along the same dimensions. Mathematically, an embedding is an injection from a lower-dimensional space into a higher-dimensional one; in the case of entity embeddings of categorical variables, we go in the opposite direction, from a sparse high-dimensional encoding to a dense low-dimensional one.
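The dimension reduction can be made concrete with a sketch (the category count and embedding size below are arbitrary choices, and the table is random rather than learned): a one-hot encoded category lives in as many dimensions as there are categories, while its embedding is just one short row of a lookup matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

n_categories = 1000   # e.g. distinct postal codes
embedding_dim = 8     # dense dimension, far smaller than n_categories

# One-hot encoding: category 42 becomes a 1000-dimensional sparse vector.
one_hot = np.eye(n_categories)[42]

# Entity embedding: a lookup table (random here; learned in practice).
E = rng.normal(size=(n_categories, embedding_dim))
dense = E[42]                     # the same category, only 8 numbers

print(one_hot.shape)              # (1000,)
print(dense.shape)                # (8,)

# Multiplying the one-hot vector by E selects the same row: an embedding
# layer is just an efficient lookup of that matrix product.
assert np.allclose(one_hot @ E, dense)
```

This is why an embedding layer in a neural network is equivalent to a fully connected layer applied to a one-hot input, implemented as a cheap row lookup.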

What are the advantages of entity embeddings?

The first one is that feeding the embedded entities as input to a neural network speeds up its learning process [1] and reduces memory usage in the context of words [2].

Another advantage is the possibility of highlighting unknown relations between categories that seemed unrelated (through data clustering, visualization via 2D plotting …) [3] [4].
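A simple way to surface such relations is a nearest-neighbour query over the embedding table. The sketch below uses made-up 2-D vectors for days of the week (again illustrative, not trained) and cosine similarity to find each category's closest neighbour, the kind of structure a 2D plot would reveal visually:

```python
import numpy as np

# Toy embedding table: rows are categories, values are illustrative only.
names = ["Monday", "Tuesday", "Saturday", "Sunday"]
vecs = np.array([
    [0.90, 0.10],   # weekdays point in one direction
    [0.85, 0.15],
    [0.10, 0.95],   # weekend days point in another
    [0.15, 0.90],
])

def nearest(i):
    """Name of the category most similar (cosine) to category i."""
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    sims = unit @ unit[i]
    sims[i] = -np.inf              # exclude the category itself
    return names[int(np.argmax(sims))]

print(nearest(names.index("Saturday")))  # → Sunday: a weekend cluster emerges
```

With real trained embeddings the same query can reveal groupings nobody encoded by hand, which is exactly the clustering effect the references describe.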

Word embedding is a field of Natural Language Processing that benefits from this technique and has recently made it especially popular [5].

(Figure: data clustering using the entity embedding technique.)

Credit

[1] https://arxiv.org/pdf/1301.3781.pdf, Tomas Mikolov, Kai Chen, Greg Corrado and Jeffrey Dean, “Efficient Estimation of Word Representations in Vector Space” (2013).

[2] https://arxiv.org/pdf/1604.06737.pdf, Cheng Guo and Felix Berkhahn, “Entity Embeddings of Categorical Variables” (2016).

[3] https://www.ijcai.org/papers15/Papers/IJCAI15-513.pdf, Yitan Li, Linli Xu, Fei Tian, Liang Jiang, Xiaowei Zhong and Enhong Chen, “Word Embedding Revisited: A New Representation Learning and Explicit Matrix Factorization Perspective” (2015).

[4] http://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9342, Fei Wu, Jun Song, Yi Yang, Xi Li, Zhongfei Zhang and Yueting Zhuang, “Structured Embedding via Pairwise Relations and Long-Range Interactions in Knowledge Base” (2015).

[5] https://code.google.com/archive/p/word2vec/
