The Same, or Not the Same — That is the Problem of Entity Resolution

Four methods for finding identical or similar entities in biomedical and customer data

Sixing Huang
CodeX

Photo by Raquel Martínez on Unsplash

Humans are highly skilled at judging whether two things are the same. For instance, we know that HIV and Human Immunodeficiency Virus refer to the same virus. We also know that Jimmy Kimmel and Jimmy Fallon are not the same person, even though they have similar names and share the same profession. The process of identifying and linking records that refer to the same entity is called entity resolution.

Entity resolution can be seen as a special case of classification in machine learning: given two records, decide whether they refer to the same entity. It is a fundamental step in knowledge graph construction, fraud detection, and customer relationship management, and it is needed for two reasons. First, different people name the same thing differently, even within the same organization. Second, we often group similar entities together so that we can understand them as a whole. For example, we can group addresses into a district and then calculate the district's average house price or crime rate.
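To make the classification view concrete, here is a minimal sketch of a naive pairwise classifier in Python. It is not one of the methods discussed in this article. It assumes that character-level similarity from the standard library's `difflib` is a usable signal, and the function names and the 0.8 threshold are hypothetical choices made only for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] between two case-folded names."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def same_entity(a: str, b: str, threshold: float = 0.8) -> bool:
    """Label a pair as 'same entity' if its similarity reaches the threshold.

    The threshold is an arbitrary, illustrative value, not a recommendation.
    """
    return name_similarity(a, b) >= threshold


# The two example pairs from the introduction.
pairs = [
    ("HIV", "Human Immunodeficiency Virus"),  # same entity, very different strings
    ("Jimmy Kimmel", "Jimmy Fallon"),         # different entities, similar strings
]

for left, right in pairs:
    score = name_similarity(left, right)
    print(f"{left!r} vs {right!r}: score={score:.2f}, same={same_entity(left, right)}")
```

With this naive score, the look-alike pair (Kimmel and Fallon) comes out more similar than the true synonym pair (HIV and its full name). That is the opposite of what a human would say, and it shows why surface string similarity alone is usually not enough for entity resolution.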

Entity resolution can be quite challenging for computers, as it requires the system to understand the context of unstructured data and to identify the patterns that are associated with different…

A Neo4j Ninja and German bioinformatician at Gemini Data. I like to try things: Cloud, ML, satellite imagery, Japanese, plants, and traveling the world.