Vec2Text: Can We Invert Embeddings Back to Text?

Yuan An, PhD
3 min read · Oct 16, 2023

Current NLP techniques rely heavily on text embeddings for similarity computation. A piece of text is encoded into a sequence of numerical values called an embedding. A natural question is whether it is possible to decode, or invert, a text embedding back into the original text.

In the study “Text Embeddings Reveal (Almost) As Much As Text” [1], Morris et al. explored solutions to the problem of inverting embeddings. Their initial, single-pass method was not very successful. However, by refining the technique iteratively, generating a hypothesis text, re-embedding it, and correcting it step by step, they obtained impressive results: the improved method exactly reconstructed 92% of texts that were 32 tokens long. A minimal sketch of this iterative loop is shown below.
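To make the iterative idea concrete, here is a minimal sketch of the correction loop. The `embed` and `corrector` callables are hypothetical stand-ins for the black-box encoder being inverted and a trained correction model; this is an illustration of the round-trip refinement, not the authors' implementation.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def invert_embedding(target_emb, embed, corrector, num_steps=20):
    """Iteratively refine a text hypothesis so its embedding approaches target_emb.

    embed:     callable, text -> vector (the black-box encoder being inverted)
    corrector: callable, (target_emb, hypothesis, hypothesis_emb) -> revised text
    """
    # Step 0: generate an initial guess conditioned only on the target embedding.
    hypothesis = corrector(target_emb, hypothesis=None, hypothesis_emb=None)

    for _ in range(num_steps):
        hyp_emb = embed(hypothesis)                # re-embed the current guess
        if cosine(hyp_emb, target_emb) > 0.999:    # close enough: stop early
            break
        # Propose a corrected text, conditioned on the target embedding,
        # the current hypothesis, and how that hypothesis embeds.
        hypothesis = corrector(target_emb, hypothesis=hypothesis, hypothesis_emb=hyp_emb)

    return hypothesis
```

The key design choice is that the corrector sees both the target embedding and the embedding of its own previous guess, so each round can reduce the gap between the two.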

Notably, when they applied their technique to more advanced embedding models, they found it could recover personal details, such as full names, from embeddings of clinical notes. This raises privacy concerns about how such embeddings are stored and shared, and underscores the importance of treating embedding data as sensitive.

They proposed a vec2text model inspired by a method called controlled generation [2]. Figure 1, taken from the paper [1], illustrates the architecture of vec2text.

Figure 1: Overview of the vec2text approach; extracted from [1]
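The authors also released an open-source vec2text package implementing this approach. A rough usage sketch is below; the corrector name and function signatures follow the project's public README at the time of writing and should be treated as assumptions, since the API may differ across versions.

```python
# pip install vec2text
import vec2text

# Load a pretrained corrector for a specific embedding model
# (here OpenAI's text-embedding-ada-002, per the project's README).
corrector = vec2text.load_pretrained_corrector("text-embedding-ada-002")

# Invert strings: the library embeds the inputs with the target encoder,
# then runs the iterative correction loop to reconstruct the text
# from the embeddings alone.
reconstructed = vec2text.invert_strings(
    ["The quick brown fox jumps over the lazy dog."],
    corrector=corrector,
    num_steps=20,  # number of correction rounds
)
print(reconstructed)
```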
