**A High-Level Overview of Probabilistic Soft Logic**

Since early times, data and information have been the pivot of evolution, whether documented or passed down by word of mouth. With the advent of the internet, data generation has escalated, producing enormous amounts of data. This mass generation has made it harder to maintain and pass down all the elements of a piece of information without losing its value or content. So, in order to comprehend and extract valuable information from data, machine learning algorithms have been applied for data analysis.

___________________________________________________________________

*Credits: This blog is written based on the work performed by the developers of PSL and on the publications of Jay Pujara, Stephen Bach and Matthias Bröcheler.*

*Check out the next post on “**The Logic behind the Probabilistic Soft Logic**” for the mathematical work-around of PSL.*

___________________________________________________________________

Data can be visualized as a set of points in a feature space. Applying machine learning to these data points, for tasks such as classification, clustering and regression, helps us garner useful information.

But such machine learning algorithms assume the data points to be independent and identically distributed. When we say that data points are independent, we mean that we cannot arrive at conclusive decisions about one data point based solely on the features or information we have about another data point.

In reality, this is not how the world works. Truly independent data is scarce: real-world entities and the relationships among them are tightly connected. This interconnection can be visualized as a network or a massive web, and such connected data is known as “Relational Data”.

Relational Data is a large network that constitutes interdependent entities linked to one another through relationships. The Internet and Social Networking sites serve as well known examples of relational data. It is easy to observe how the user profiles, interaction, social presence and platforms can all be converged to form a network of information.

Meanwhile, there are some other examples that do not explicitly scream “relational data” but are still rich sources of relational data.

E.g. Free text: extracting entities and relationships from free text can build a useful comprehension of the nature of that content.

Hence, we need to identify how different entities are related to each other and build probabilistic models over these relationships for data analysis. This is where Statistical Relational Learning (SRL) comes into play.

*Statistical Relational Learning*

As the term suggests, ‘Relational Learning’ is trying to identify whether a certain type of relationship exists between two entities. ‘Statistical Relational Learning’ goes further: it quantifies to what extent, or with what probability, a relationship exists between two entities.

For example, let’s take the following scenario, where A and B are two persons and X is a political party. The *husbandOf* relationship between A and B indicates that B is the wife of A. The *supports* relationship indicates that A supports Political Party X. But there is no relevant evidence regarding Person B’s (the wife’s) stance toward Political Party X.

In reality, there are three possible relationships between Person B and Political Party X with regard to Person B’s stance.

- Person B *supports* Political Party X.
- Person B *isAgainst* Political Party X.
- Person B *isNeutralTowards* Political Party X.

Since we do not have enough evidence to support any of the above directly, we turn to the other relationships between the entities in this graph. We can see that Person A supports Political Party X, and since Person A is the husband of Person B, there is a higher probability that Person B supports Political Party X as well. Hence, we avoid settling on the idea that Person B is against Political Party X, which is rather unlikely.

This is a trivial example of ‘Relational Dependency’. Based on the existence of one relationship, we can probabilistically infer the existence or absence of another relationship. And these dependencies are used to model the interdependent, connected data.

*Statistical Relational Models*

Probabilistic models fall under two major categories.

- Generative Models
- Discriminative Models

Generative models model the joint probability distribution over both the observed variables and the unknown variables.

E.g : Probabilistic Relational Models [1], BLOG [2], Relational Topic Models [3].

Discriminative models, in contrast, score configurations of the unknown variables directly, conditioning on the observations. They work based on a discriminant function f(X, Y), where we model the unknown variables Y based on the observed variables X.

This discriminant function can be translated into a probability distribution whose log-probability is proportional to the discriminant function.
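The discriminant function itself is not reproduced here (the original post showed it as an image), but in the standard log-linear form it would read, with f(x, y) scoring a configuration y of the unknowns given observations x:

```
P(Y = y \mid X = x) = \frac{\exp\big(f(x, y)\big)}{Z(x)},
\qquad Z(x) = \sum_{y'} \exp\big(f(x, y')\big)
```

Taking logarithms gives \log P(Y = y \mid X = x) = f(x, y) - \log Z(x), i.e. the log-probability equals the discriminant function up to a normalising constant.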

Examples of discriminative models: Markov Random Fields [4], Max-Margin Markov Networks [5], Structural Support Vector Machines [6] and Markov Logic Networks [7].

*The PSL Framework*

PSL is an SRL framework that uses logic to define SRL problems [8, 9]. The problem and its domain are stated in the form of logical rules, which together constitute the model. The framework is made up of the following components.

- Predicates
- Atoms
- Rules
- Sets

# Predicates

A relationship forming a link between two entities is known as a predicate.

For example, in Friends (Rachel, Phoebe), ‘Friends’ is the predicate, capturing the friendship between Rachel and Phoebe.

# Atoms

When a relationship is defined in a general sense, in terms of unknown, random or observed variables, it is called an atom. Simply,

Predicate + Arguments → Atom

When we substitute actual values (real-world entities) into an atom’s arguments, we get a ground atom. A ground atom can be evaluated to be true or false to some degree.

E.g.

Predicate : Friends

Atom : Friends (ENT1, ENT2)

Ground atom : Friends (Rachel, Phoebe)

In PSL, these atoms are continuous random variables: beyond being simply true or false, they take truth values anywhere in the interval [0, 1]. If the truth value of Friends (Rachel, Phoebe) is 0.9, that represents a strong friendship.

# Rules

Rules are written down in a manner such that they display the relational dependencies between the entities.

E.g.

6.0: Friends(A, B) & Friends(B, C) -> Friends(A, C)

The above rule states the transitivity of friendship, the triadic closure: ‘a friend of my friend is also my friend’. Such rules are neither always true nor always false; they hold often, but not universally. Hence, a weight is assigned to the rule (6.0 in this example) to indicate how often or how strongly the rule holds. These weights are relative and can change from domain to domain.
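PSL relaxes the logical connectives using the Łukasiewicz t-norm (the follow-up post covers the math), and penalizes each weighted ground rule by its “distance to satisfaction”. A minimal sketch, with hypothetical truth values for the triadic-closure rule:

```python
# Łukasiewicz relaxations of the logical connectives, as used in PSL.
def l_and(a, b):
    """Soft conjunction: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

def l_or(a, b):
    """Soft disjunction: min(1, a + b)."""
    return min(1.0, a + b)

def l_not(a):
    """Soft negation: 1 - a."""
    return 1.0 - a

def distance_to_satisfaction(body, head):
    """A rule body -> head is satisfied when head >= body; otherwise this
    distance measures how strongly the ground rule is violated."""
    return max(0.0, body - head)

# Grounding the triadic-closure rule
#   6.0: Friends(A, B) & Friends(B, C) -> Friends(A, C)
# with made-up truth values:
friends_ab, friends_bc, friends_ac = 0.9, 0.8, 0.5

body = l_and(friends_ab, friends_bc)                  # 0.7
penalty = 6.0 * distance_to_satisfaction(body, friends_ac)
print(round(body, 2), round(penalty, 2))              # 0.7 1.2
```

The weight scales the penalty, so a heavily weighted rule exerts more pressure on inference than a lightly weighted one.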

# Sets

Sets are aggregates over atoms.

Average[Friends(Rachel, X)]

This aggregation over ground atoms helps to garner an idea about Rachel’s friendships: whether Rachel maintains strong friendships or not.
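An aggregate like Average[Friends(Rachel, X)] can be sketched as the mean of the soft truth values of all matching ground atoms (the atoms and values below are made up):

```python
# Hypothetical soft truth values for the ground atoms Friends(Rachel, X).
ground_atoms = {
    ("Rachel", "Phoebe"): 0.9,
    ("Rachel", "Monica"): 0.8,
    ("Rachel", "Joey"):   0.4,
}

def average_friendship(person, atoms):
    """Aggregate Average[Friends(person, X)]: mean truth value over all
    ground atoms whose first argument is `person`."""
    values = [v for (p, _), v in atoms.items() if p == person]
    return sum(values) / len(values) if values else 0.0

print(round(average_friendship("Rachel", ground_atoms), 2))  # 0.7
```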

So this is how relational rules are written and evaluated on a very high level.

So far we’ve seen a brief summary of PSL and its components. Now the question at hand is: how do we use these relational rules to solve real-world scenarios?

*PSL’s Applications*

**1. Entity Resolution**

Two different references may refer to the same underlying entity. Because the two references are treated as distinct, their friendship links are also split, causing two different models to be built around the same entity. Entity resolution is the process of identifying the references that point to the same entity.

Based on the above, we can come up with the following relational dependencies.

*If 2 entities have the same name, then they are probably the same.*

- 4.0: Name(A, name_A) & Name(B, name_B) & Similar(name_A,name_B) -> samePerson(A, B)
- In this case, we have translated the intuitive assumption or dependency into a computationally reasonable form.

*If 2 entities have similar friends, then they are probably the same.*

- 2.5: SimilarFriends (A, B) -> SamePerson (A, B)
- This states that, if two people have similar friendship circles or social networks, then they are probably the same person.

*If A=B and B=C, then A and C represent the same person.*

- 20.0: SamePerson(A, B) & SamePerson(B, C) -> SamePerson(A, C)
- This rule is essentially the transitivity principle, which almost always holds in practice. Hence it is assigned a comparatively higher weight.
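The actual PSL engine performs MAP inference over all unknowns jointly (as a hinge-loss Markov random field, optimised with ADMM). The toy sketch below shows the intuition for a single unknown, samePerson(A, B): the weighted name-similarity rule competes against a hypothetical weak prior that two references differ, and the inferred value minimises the total weighted violation. All weights and values here are made up.

```python
SIMILAR = 0.8  # hypothetical output of a string-similarity metric

def total_loss(x):
    """Total weighted distance to satisfaction as a function of the
    truth value x of samePerson(A, B)."""
    # 4.0: Similar(name_A, name_B) -> samePerson(A, B)
    rule1 = 4.0 * max(0.0, SIMILAR - x)
    # 1.0: !samePerson(A, B)  -- a weak prior that two references differ
    rule2 = 1.0 * x
    return rule1 + rule2

# Pick the truth value in [0, 1] that minimises the total weighted violation.
best = min((i / 100 for i in range(101)), key=total_loss)
print(best)  # 0.8: the similarity rule outweighs the prior
```

Raising the prior’s weight above the similarity rule’s weight would flip the outcome toward 0, which is how the weights encode domain knowledge.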

**2. Link Prediction**

When constructing relational dependencies, not all entity pairs will have enough data to support the relational link between them. This is the same scenario as illustrated in Figure 6, where we do not directly have enough evidence of the wife’s support for Political Party X, but resort to predicting it based on her husband’s support.

We can elaborate on this application using a different real-world scenario, where a network of relational dependencies is built from email messages exchanged within an office. Link prediction in this setting yields the following rules.

*If an email message contains words characteristic of a particular category or type, then it is of that category or type.*

- 1.0: Contains (Email, ‘due’) -> HasType(Email, ‘deadline’)
- If an email contains the word ‘due’, as in ‘due by’, then it is probably of type ‘deadline’.

*If ENT1 sends an email of type deadline to ENT2, then ENT1 is the supervisor of ENT2.*

- 2.5: Sent(ENT1, ENT2) & HasType(Email, ‘deadline’) -> supervisor(ENT1, ENT2)
- If entity *ENT1* sends an email to *ENT2* which has the type ‘deadline’, then *ENT1* is the supervisor of *ENT2*.

*If ENT1 is the supervisor of ENT2 and ENT1 is the supervisor of ENT3, then ENT2 and ENT3 are colleagues.*
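The two email rules chain together: the first pushes HasType up from the observed word, and the second combines it with the observed Sent link to put pressure on the unknown supervisor atom. A minimal sketch with made-up observations:

```python
def l_and(a, b):
    """Łukasiewicz soft conjunction: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

contains_due = 1.0   # Contains(Email, 'due') observed in the text
sent = 1.0           # Sent(ENT1, ENT2) observed in the mail logs

# 1.0: Contains(Email, 'due') -> HasType(Email, 'deadline')
# The rule incurs no penalty once the head is at least the body, so
# inference is free to push HasType up to the body's value.
has_type = contains_due

# 2.5: Sent(ENT1, ENT2) & HasType(Email, 'deadline') -> supervisor(ENT1, ENT2)
body = l_and(sent, has_type)
print(body)  # 1.0 -- strong pressure toward supervisor(ENT1, ENT2)
```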

**3. Collective Classification**

Collective classification comes in handy for predicting political opinions and biases of the public. This task can be performed to a certain extent using conventional machine learning methods such as logistic regression and Naive Bayes.

E.g.

- If we know that someone adopted a puppy from the shelter, then it is safe to assume that that person is a dog-person.

Although this makes the task seem like an individual classification decision, the decision needs to take relational information and social networks into consideration. Opinions are correlated and rely on tightly-knit relational dependencies. Therefore, we take the entire network at once and label it jointly. This joint labeling fills in the relational dependency gaps and makes it easier to predict or classify data.
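The joint labeling idea can be illustrated with a hypothetical rule such as 1.0: Friends(A, B) & DogPerson(A) -> DogPerson(B), where one observed label leaks through the friendship network. The sketch below is a toy single-pass propagation, not PSL’s joint inference; all predicates, names and values are made up.

```python
def l_and(a, b):
    """Łukasiewicz soft conjunction: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)

friends = {("Ann", "Ben"): 0.9, ("Ben", "Cara"): 0.8}
dog_person = {"Ann": 1.0, "Ben": 0.0, "Cara": 0.0}  # only Ann is observed

# One round of propagation: each ground rule pushes its head up to the
# truth value of its body.
for (a, b), f in friends.items():
    body = l_and(f, dog_person[a])
    dog_person[b] = max(dog_person[b], body)

print(dog_person)  # Ben picks up ~0.9, then Cara ~0.7 via Ben
```

Labeling the whole network at once lets evidence flow along relational links instead of treating each person’s label as an isolated decision.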

# Summary

So to summarize the above,

- Conventional ML algorithms work on data points that are independent of each other.
- E.g. Classification, Clustering and Regression
- But with the Internet and the mass generation of data, relational dependency among data has been identified.
- This relational dependency forms a huge network of interconnected data that can be visualized as a graph.
- In graphs, at the lowest level, there are entities and relationships connecting two entities. Such a relationship is known as a predicate.
- A predicate together with its arguments forms an atom; atoms over general variables serve as the building blocks of rules.
- Substituting actual values to general rules is known as ‘Grounding’ and results in a set of ‘Ground Rules’.
- Rules are assigned weights ranging from 0 to infinity. These weights serve as an indication of how often the rules hold.
- And finally we looked into how we can translate an intuitive assumption into a logic rule by substituting literals.

So PSL helps in finding how far identified rules hold, and this blog has provided an overview of how PSL works at a high level. The follow-up post on the same topic will elucidate this theory mathematically, dealing with the functions and formulas PSL uses.

# References

[1] Koller, D. (1999, June). Probabilistic relational models. In International Conference on Inductive Logic Programming (pp. 3–13). Springer, Berlin, Heidelberg.

[2] Milch, B., Marthi, B., Russell, S., Sontag, D., Ong, D. L., & Kolobov, A. (2007). BLOG: Probabilistic models with unknown objects. Statistical relational learning, 373.

[3] Chang, J., & Blei, D. (2009, April). Relational topic models for document networks. In Artificial Intelligence and Statistics (pp. 81–88).

[4] Rozanov, Y. A. (1982). Markov random fields. In Markov Random Fields (pp. 55–102). Springer, New York, NY.

[5] Taskar, B., Guestrin, C., & Koller, D. (2004). Max-margin Markov networks. In Advances in neural information processing systems (pp. 25–32).

[6] Xue, H., Chen, S., & Yang, Q. (2008, September). Structural support vector machine. In International Symposium on Neural Networks (pp. 501–511). Springer, Berlin, Heidelberg.

[7] Richardson, M., & Domingos, P. (2006). Markov logic networks. Machine learning, 62(1–2), 107–136.

[8] Pujara, J., Miao, H., Getoor, L., & Cohen, W. (2013, October). Knowledge graph identification. In *International Semantic Web Conference* (pp. 542–557). Springer, Berlin, Heidelberg.

[9] Brocheler, M., Mihalkova, L., & Getoor, L. (2012). Probabilistic similarity logic. *arXiv preprint arXiv:1203.3469*.