Photo by Romain Vignes on Unsplash

Document Clustering Based On Non-negative Matrix Factorization

Wei Xu, Xin Liu, Yihong Gong

Yoav Navon
2 min read · Aug 29, 2019


The paper presents a document partitioning method based on the non-negative factorization of the term-document matrix of a given document corpus. In this way the authors construct a topic model in which each axis of the derived space captures a certain topic. Each document is represented as an additive combination of the base topics, and the axes of the semantic space are not necessarily orthogonal.
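To get a feel for the idea, here is a minimal sketch, not the authors' code. It assumes the widely used multiplicative update rules for the factorization X ≈ U Vᵀ, and the toy term-document matrix, the function name `nmf`, and its parameters are all made up for illustration. Each document is assigned to the topic on which it has the largest weight.

```python
import numpy as np

def nmf(X, k, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix X (terms x docs) as X ~ U @ V.T.

    Illustrative sketch using multiplicative updates; U (terms x k) holds
    the topic directions and V (docs x k) holds each document's weights.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps
    V = rng.random((n, k)) + eps
    for _ in range(n_iter):
        # Multiplying by non-negative ratios keeps U and V non-negative.
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Hypothetical toy corpus: 6 terms (rows) x 4 documents (columns).
# Documents 0 and 1 use only the first three terms, documents 2 and 3
# only the last three.
X = np.array([
    [2, 4, 0, 0],
    [2, 4, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 6],
    [0, 0, 2, 4],
], dtype=float)

U, V = nmf(X, k=2)

# Cluster label of each document = topic with the largest weight.
labels = V.argmax(axis=1)
print(labels)
```

On this toy matrix the two groups of documents should end up with different labels, although which group gets label 0 depends on the random initialization.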

I find it ingenious to take advantage of the non-negativity of the data to build this factorization method.

I have a question about the non-orthogonal basis. The paper mentions that each document takes only non-negative values in all the latent semantic directions. How, then, would it be possible to represent the point marked in Figure 1?

Figure 1

This picture is taken from the paper; the arrows represent the basis vectors of the new space. We can see that no linear combination of the black arrows with non-negative coefficients produces the blue dot, so it is not clear to me how this document is represented.
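To make the question concrete, here is a small sketch with made-up 2-D coordinates (they are not read off the paper's figure). It takes two non-orthogonal, non-negative basis vectors and solves for the coefficients of two points: one inside the cone spanned by the basis and one outside it. The names `B`, `p_inside`, and `p_outside` are purely illustrative.

```python
import numpy as np

# Hypothetical basis vectors, stored as the columns of B.
B = np.array([
    [1.0, 0.2],
    [0.2, 1.0],
])

p_inside = np.array([1.0, 1.0])   # lies between the two basis vectors
p_outside = np.array([1.0, 0.0])  # lies outside the cone they span

# Solve B @ c = p for the coefficients c of each point.
coeffs_inside = np.linalg.solve(B, p_inside)
coeffs_outside = np.linalg.solve(B, p_outside)

print(coeffs_inside)
print(coeffs_outside)
```

In this toy setup the point inside the cone gets two non-negative coefficients, while the point outside it needs a negative coefficient on the second basis vector, which is exactly the situation my question is about.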
