[archive] Different Flavors of Relatedness

Milan Stankovic, PhD
Milstan’s Old Blog
Aug 18, 2011

This is a reprint of a post originally published on my blog on 18 August 2011.

In earlier blog posts I talked about the Semantic Proximity of concepts and about using Linked Data to derive a notion of semantic relatedness. Driven by a more theoretical part of my thesis, I was led to consider other ways to compute the relatedness of concepts, such as those based on co-occurrence in texts or those relying on the social graph. While these approaches may perform differently in different situations, nothing stops us from combining them. Obviously, if two different notions identify a pair of concepts as mutually related, we can be more certain about their relatedness. But combinations offer an additional richness: different combinations of notions may yield different types of relatedness. The following image represents the different notions of relevance and the classes of relatedness emerging from their combinations.

Figure: Different notions of relevance

Social Relevance comes out of social connections or similarity between people. Systems that use this notion rely on the assumption that a person is likely to be interested in what that person’s friends are interested in. Facebook suggests friends of our friends as people we might be interested in befriending. It also shows content liked by our friends as relevant to us. Other systems construct user profiles and, lacking any information about friendship, infer which people are similar and use the profiles of those similar people to recommend things (in a way similar to what Amazon does).
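
To make the friend-based assumption concrete, here is a minimal sketch of how such a recommendation could be computed. The `friends` and `interests` dictionaries and the function `socially_relevant` are hypothetical toy names invented for illustration; this is not how Facebook or Amazon actually implement it.

```python
from collections import Counter

# Hypothetical toy data: who is connected to whom, and what each person likes.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice"},
    "carol": {"alice"},
}
interests = {
    "alice": {"semantic web"},
    "bob": {"semantic web", "football"},
    "carol": {"football", "biology"},
}


def socially_relevant(person, friends, interests):
    """Rank concepts by how many of the person's friends are interested in them.

    Concepts the person already has are skipped, since they are not new suggestions.
    """
    own = interests.get(person, set())
    counts = Counter()
    for friend in friends.get(person, set()):
        for concept in interests.get(friend, set()):
            if concept not in own:
                counts[concept] += 1
    # Most frequent first; ties broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(socially_relevant("alice", friends, interests))
```

For "alice", the sketch ranks "football" (liked by two friends) ahead of "biology" (liked by one), and leaves out "semantic web" because she already has it.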

Advantages

  • The basic assumption of this approach is strongly confirmed by actual human practice [find studies that show that]. People often like to know what their friends are interested in. Friendships and connections contribute to the development of interests, so recommendations based on this assumption are likely to be welcomed.
  • Familiar to users, who can easily understand why something is recommended to them.

Disadvantages

  • Often difficult to construct due to the lack of transparency of the social graph. Social graph information is difficult to obtain, so this approach is mostly applicable only to social networks that have access to such data.

Content Relevance comes out of co-occurrence of concepts/terms in texts. The basic assumption behind this approach is that if two terms or concepts frequently appear together in texts, or in similar sets of concepts, they are likely to be related and relevant to one another. Such an approach is used by Google AdWords, which looks at terms that co-occur in search queries to suggest relevant terms for advertising campaigns, and by Google Suggest, which proposes useful additional keywords in Web search.
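
The co-occurrence idea can be sketched in a few lines. The corpus `docs` below is invented toy data, and scoring pairs with a Jaccard ratio is my own choice for the illustration (an assumption, not a measure named in this post or used by Google); other statistics such as pointwise mutual information would fit the same slot.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is reduced to the set of terms it mentions.
docs = [
    {"linked data", "rdf", "sparql"},
    {"linked data", "rdf"},
    {"football", "world cup"},
    {"rdf", "sparql"},
]


def cooccurrence_scores(docs):
    """Score each co-occurring term pair by Jaccard similarity of their document sets.

    score(a, b) = (#docs containing both) / (#docs containing either)
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in docs:
        term_counts.update(terms)
        pair_counts.update(frozenset(p) for p in combinations(sorted(terms), 2))

    scores = {}
    for pair, both in pair_counts.items():
        a, b = sorted(pair)
        either = term_counts[a] + term_counts[b] - both
        scores[(a, b)] = both / either
    return scores


scores = cooccurrence_scores(docs)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Pairs that never appear in the same document, such as "football" and "rdf", simply get no score, which is the content-based way of saying "not related".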

Advantages

  • Relatively easy to obtain a corpus on the Web, which makes this method highly accessible.
  • Tools for performing it are available as open source.
  • Widely used and known by developers.

Disadvantages

  • The quality of recommendations depends heavily on the corpus used, and its fitness for the recommendation domain and scenario.
  • Relatively easy to influence the results by producing content intended to create false relevance. Content farms are a threat to the approach if Web content is used without restriction.

Semantic Relevance comes out of relations between concepts made explicit in a semantic knowledge base/graph. Approaches using WordNet, DBpedia and similar knowledge bases have been proposed, mostly in research, to establish a notion of semantic relatedness and to use those knowledge bases for concept suggestion.
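
As a rough illustration of relatedness derived from a knowledge graph, the sketch below scores two concepts by the length of the shortest path between them. The `edges` list is a made-up toy graph, and the `1 / (1 + distance)` score is an assumption chosen for simplicity; it is not the measure from my thesis, nor a WordNet or DBpedia API.

```python
from collections import deque

# Hypothetical toy knowledge graph: undirected links between concepts.
edges = [
    ("Paris", "France"),
    ("France", "Europe"),
    ("Berlin", "Germany"),
    ("Germany", "Europe"),
    ("Football", "Sport"),
]


def build_graph(edges):
    """Turn an edge list into an adjacency map, treating links as undirected."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def semantic_relatedness(graph, source, target):
    """Return 1 / (1 + shortest path length), or 0.0 if no path exists."""
    if source not in graph or target not in graph:
        return 0.0
    if source == target:
        return 1.0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:  # breadth-first search finds the shortest path first
        node, dist = queue.popleft()
        for neighbor in graph[node]:
            if neighbor == target:
                return 1.0 / (1 + dist + 1)
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return 0.0


graph = build_graph(edges)
print(semantic_relatedness(graph, "Paris", "France"))
print(semantic_relatedness(graph, "Paris", "Berlin"))
print(semantic_relatedness(graph, "Paris", "Football"))
```

Directly linked concepts score highest, concepts linked through several hops score lower, and concepts in disconnected parts of the graph score zero.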

Advantages

  • The approach is based on meaning, and is therefore likely to yield more complete and less expected recommendations than statistics-based approaches.

Disadvantages

  • The quality of recommendations depends heavily on the chosen knowledge base, and its fitness for the recommendation domain and scenario.
  • Few knowledge bases are usable with this approach, and in some cases applying the method would require constructing a specific knowledge base.

Combined Approaches

Now that we have outlined the three basic notions of relevance, it is interesting to look at their possible combinations. Being grounded in different basic assumptions, the three approaches produce qualitatively different suggestions of related concepts. We look at those differences and give an overview of the possible combinations, trying to predict the qualitative nature of the recommendations that each combined approach would be able to provide.

Social, Content and Semantic relevance

Concepts that are considered relevant by all three notions of relevance are likely to be the most highly relevant concepts, almost the same as the initial input concepts.

Social and Semantic relevance, non-Content

Concepts that are related by meaning and are also used by connected and similar people indicate things that are used by the same circle of people and are related by meaning. Recommendations based on this combined notion can help define communities of practice, and especially point to concepts that are not often used in the same context, but rather by the same and similar people in different contexts.

Social and Content relevance, non-Semantic

Concepts that often co-occur in content and are used by people who are connected are likely to define common situations and contexts that a particular community usually faces. The co-occurrence in texts indicates that the concepts are used in the same context (the one that the text is about), and the additional relevance contributed by connected people indicates that this context is actually shared by people who know each other (or who may otherwise be considered similar). However, because of the lack of semantic relations between the concepts, it is not likely that the people are connected by their domain of knowledge and activity, but rather by other interests and affinities.

Semantic and Content relevance, non-Social

Concepts that are related by their meaning and also co-occur in content are likely to represent similar or interdependent things that are often mentioned together because of their functional interdependence.

Social, non-Content, non-Semantic

Concepts that are relevant only in the social sense, with no semantic relevance and no co-occurrence in content, are likely to be interest associations: things that similar and like-minded people are interested in, but that are so different that they are rarely referred to in the same context. Relevance in this sense might, for instance, result from the fact that people interested in Football often befriend people interested in Biology.

Content, non-Social, non-Semantic

Concepts related only by co-occurrence in content, without any semantic similarity and without a community using them together, are likely to define a vocabulary of situations and contexts that people who are neither like-minded nor connected can face.

Semantic, non-Social, non-Content

Concepts related only by meaning, neither used by similar/connected people nor co-occurring in content, are likely to be related concepts that a common user would not think of as related but would recognize as such. Their lack of joint use means that such semantic connections are often overlooked, possibly even by experts, as those relevance relations do not take part in defining the communities of practice.
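
The seven combinations discussed above can be written down as a simple lookup over three yes/no signals. The function names, the numeric scores and the `threshold` parameter below are hypothetical, introduced only to show how a pair of concepts could be assigned to one of the classes once each notion has produced its own verdict.

```python
def relatedness_class(social, content, semantic):
    """Map three boolean relevance signals to the class names used in this post."""
    labels = {
        (True, True, True): "Social, Content and Semantic relevance",
        (True, False, True): "Social and Semantic relevance, non-Content",
        (True, True, False): "Social and Content relevance, non-Semantic",
        (False, True, True): "Semantic and Content relevance, non-Social",
        (True, False, False): "Social, non-Content, non-Semantic",
        (False, True, False): "Content, non-Social, non-Semantic",
        (False, False, True): "Semantic, non-Social, non-Content",
    }
    key = (bool(social), bool(content), bool(semantic))
    # No notion fires: the pair is not considered related at all.
    return labels.get(key, "not related")


def classify_scores(social_score, content_score, semantic_score, threshold=0.5):
    """Hypothetical helper: binarize per-notion scores before classifying.

    The threshold value is an arbitrary assumption, not a recommended setting.
    """
    return relatedness_class(
        social_score >= threshold,
        content_score >= threshold,
        semantic_score >= threshold,
    )


# Football and Biology: linked only through friendships between their fans.
print(classify_scores(social_score=0.8, content_score=0.1, semantic_score=0.0))
```

In practice each notion would need its own calibrated cut-off, since the three scores come from unrelated computations and are not directly comparable.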

Originally published at web.archive.org on August 18, 2011.

Milan is a Parisian Tech Founder. PhD in Computer Science from Sorbonne. Startup made and sold. Making computers better companions to humans. http://milstan.net