Summary: Smoothing the Geometry of Probabilistic Box Embeddings

Pouya Pezeshkpour
4 min read · Apr 23, 2019


Authors: Xiang Li, Luke Vilnis, Dongxu Zhang, Michael Boratko, Andrew McCallum

In this paper, the authors present a novel hierarchical embedding model, inspired by a relaxation of box embeddings into parameterized density functions using Gaussian convolutions over the boxes. There has been recent interest in geometrically inspired embeddings that capture specific relations between entities, such as hierarchies, partial orders, and lattice structures. The overall goal of this work is to address an Achilles' heel of one of these previously introduced geometry-based methods, namely box embeddings.

Relaxed Box Lattice

Box embedding is a method that provides a geometry-based representation for every element in a task-specific space. Before jumping into the details of the model, let us first review some of the mathematical background used in this work.

Non-strict partially ordered set (poset): a poset is a pair (P, ≤), where P is a set and ≤ is a binary relation satisfying, for all a, b, c ∈ P: reflexivity (a ≤ a), antisymmetry (a ≤ b and b ≤ a imply a = b), and transitivity (a ≤ b and b ≤ c imply a ≤ c).

Using the concept of a poset, we define a lattice as a poset in which any two elements have a unique least upper bound and a unique greatest lower bound. A lattice is equipped with two binary operations, ∨ (join) and ∧ (meet): a ∨ b denotes the least upper bound of a, b ∈ P, and a ∧ b denotes their greatest lower bound. Furthermore, a vector lattice is a vector space endowed with a lattice structure. Using these concepts, the box lattice is an embedding method wherein each concept in a knowledge graph is associated with two vectors: the minimum and maximum coordinates of an axis-aligned hyperrectangle, or box (a product of intervals). A simple representation of box embeddings is depicted in the following figure:
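As a concrete illustration, here is a minimal sketch of these lattice operations on boxes (plain NumPy; the Box class and helper names are mine for illustration, not from the paper's code):

```python
import numpy as np

class Box:
    """An axis-aligned hyperrectangle given by its min/max corner vectors."""
    def __init__(self, minimum, maximum):
        self.m = np.asarray(minimum, dtype=float)  # lower corner, xm
        self.M = np.asarray(maximum, dtype=float)  # upper corner, xM

def join(a, b):
    """a ∨ b: the smallest box enclosing both a and b (least upper bound)."""
    return Box(np.minimum(a.m, b.m), np.maximum(a.M, b.M))

def meet(a, b):
    """a ∧ b: the intersection of a and b (greatest lower bound); may be empty."""
    return Box(np.maximum(a.m, b.m), np.minimum(a.M, b.M))
```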

To associate a measure, marginal probabilities of events are given by the volumes of boxes, their complements, and their intersections under a suitable probability measure. Under the uniform measure, if event x has an associated box with interval boundaries (xm, xM), the probability p(x) is given by the box's volume, p(x) = ∏_i (xM_i − xm_i). Use of the uniform measure requires the boxes to be constrained to the unit hypercube so that p(x) ≤ 1.
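Continuing the sketch above, the uniform-measure volume is just the product of (nonnegative) side lengths:

```python
def volume(box):
    """p(x) under the uniform measure on the unit hypercube.
    Side lengths are clamped at zero so a disjoint meet gets probability 0."""
    return float(np.prod(np.clip(box.M - box.m, 0.0, None)))

x = Box([0.1, 0.2], [0.6, 0.9])
y = Box([0.4, 0.1], [0.8, 0.5])
print(volume(x))           # 0.5 * 0.7 = 0.35
print(volume(meet(x, y)))  # overlap [0.4, 0.6] x [0.2, 0.5] -> 0.2 * 0.3 = 0.06
```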

When using gradient-based optimization to learn box embeddings, an immediate problem identified in the original work is that when two concepts are incorrectly modeled as disjoint, no gradient signal can flow, since the meet (intersection) is exactly zero, with zero derivative. In this work, the authors propose kernel smoothing: instead of the "hard edges" of standard box embeddings, which lead to unwanted gradient sparsity, each box is convolved with a normalized Gaussian kernel. As a result, instead of representing entities with sharply bounded boxes, they consider a convolution of those boxes with a Gaussian signal. An overview of this idea is presented in the following figure:
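To see the gradient problem concretely, here is a small PyTorch sketch (illustrative only, not the authors' implementation): when two boxes are disjoint, the clamped overlap is identically zero in a neighborhood of the current parameters, so no gradient flows:

```python
import torch

# Two disjoint 1-D boxes: x = [0.1, 0.3] and y = [0.6, 0.9].
x_m = torch.tensor(0.1, requires_grad=True)
x_M = torch.tensor(0.3, requires_grad=True)
y_m, y_M = torch.tensor(0.6), torch.tensor(0.9)

# Hard meet volume: overlap length clamped at zero.
overlap = torch.clamp(torch.min(x_M, y_M) - torch.max(x_m, y_m), min=0.0)
overlap.backward()
print(x_M.grad)  # tensor(0.) -- nothing tells x to move toward y
```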

To make this relaxation trainable, the authors derive an approximate closed form for the marginal probabilities of events. For boxes x and y, the probability of their joint event is approximated dimension-wise by the smoothed overlap:

p(x ∧ y) ≈ (1/Z) ∏_i soft(min(xM_i, yM_i) − max(xm_i, ym_i)),

where Z is a normalizing constant (the exact convolution integral also involves the Gaussian variances, which act as temperature parameters) and soft(x) = log(1 + exp(x)) is the softplus function, a smooth approximation of the hinge max(0, x).
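In code, the approximation might look like the following hedged sketch (my simplification; the paper folds the Gaussian variances into learned scale parameters, which I omit here):

```python
import torch
import torch.nn.functional as F

def smoothed_meet_volume(x_m, x_M, y_m, y_M):
    """Approximate p(x ∧ y) = prod_i soft(min(xM_i, yM_i) - max(xm_i, ym_i)),
    with soft(z) = log(1 + exp(z)) (softplus), up to normalization."""
    return torch.prod(F.softplus(torch.min(x_M, y_M) - torch.max(x_m, y_m)))

x_m = torch.tensor([0.1], requires_grad=True)
x_M = torch.tensor([0.3], requires_grad=True)
y_m, y_M = torch.tensor([0.6]), torch.tensor([0.9])

vol = smoothed_meet_volume(x_m, x_M, y_m, y_M)
vol.backward()
print(x_M.grad)  # positive: disjoint boxes now receive a useful training signal
```

Unlike the hard clamp in the previous sketch, softplus is strictly positive everywhere, so even disjoint boxes produce a nonzero (if small) overlap score with a nonzero gradient.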

Experiments

The authors demonstrate increased or matching performance on WordNet hypernymy prediction, Flickr caption entailment, and a MovieLens-based market basket dataset. The results of the WordNet hypernymy prediction task are shown in the following table:

As the table shows, the smoothed box model performs nearly as well as the original box lattice in terms of test accuracy, while requiring less hyperparameter tuning. The authors also conduct the same experiment in a sparser setting, considering only the mammal subset and varying the ratio of negative samples when calculating the F1 score; the results are presented in the following table:

As we can see from the table, with balanced data all models perform nearly identically, while as the number of negative examples increases, the performance of the original box model drops, but the smoothed box model still outperforms OE and Box in all settings. The relaxed box embedding method performs similarly well on the Flickr caption entailment and MovieLens-based market basket datasets, achieving state-of-the-art results.

This work provides a novel solution to one of the most important drawbacks of the box embedding technique. I consider geometry-based embeddings such as box embeddings and Poincaré embeddings, and all efforts to make them more applicable, essential: they can help us achieve a better understanding of the embedding space instead of relying entirely on neural networks to decide the fate of our desired tasks.
