This is a blog post for inzva.

# Bayesian Networks (BNs)

Bayesian Networks (BNs) are a family of probabilistic graphical models for modeling uncertainty. BNs are a powerful tool for subjective logic [2] and are also useful in probabilistic logic learning thanks to their flexible applicability [3]. A BN is a directed acyclic graph (DAG) whose nodes denote random variables and whose edges represent conditional dependencies between them.

Equation 1 illustrates Bayesian inference with Boolean variables:

*p(H|E) = p(E|H) · p(H) / p(E)* (Equation 1)

In this equation, *p(H)* denotes the prior probability of an issue belonging to class *H* (hypothesis), and *p(E|H)* is the likelihood of observing *E* (evidence) when the given issue belongs to *H*. Note that *p(E|H)* is generally not equal to *p(H|E)*. *You can check [4] to review the concepts of prior, posterior, likelihood, and evidence.*

*Example for an intuitive explanation of Bayes’ Rule (by Michael Hochster, Ph.D. in Statistics, Stanford):*

Your roommate, who’s a bit of a slacker, is trying to convince you that money can’t buy happiness, citing a Harvard study showing that only 10% of happy people are rich.

After giving it some thought, it occurs to you that this statistic isn’t very compelling. What you really want to know is what percent of *rich people* are *happy*. This would give a better idea of whether becoming rich might make you happy.

Bayes’ Theorem tells you how to calculate this other, **reversed** statistic using two additional pieces of information:

- The percent of people overall who are happy
- The percent of people overall who are rich

The key idea of Bayes’ theorem is **reversing the statistic using the overall rates.** It says that the fraction of rich people who are happy is the fraction of happy people who are rich, times the overall fraction who are happy, divided by the overall fraction who are rich.

So if

- 40% of people are happy; and
- 5% of people are rich

And if the Harvard study is correct, then the fraction of rich people who are happy is:

10% × 40% / 5% = 80%

So a pretty strong majority of rich people are happy.

It’s not hard to see why this arithmetic works out if we just plug in some specific numbers. Let’s say the population of the whole world is 1000, just to keep it easy. Then Fact 1 tells us there are 400 happy people, and the Harvard study tells us that 40 of these people are rich. So there are 40 people who are both rich and happy. According to Fact 2, there are 50 rich people altogether, so the fraction of them who are happy is 40/50, or 80%.
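The arithmetic above can be checked in a few lines of Java, using the numbers from the example (the 1000-person population is the illustrative figure from the text):

```java
public class BayesReversal {
    public static void main(String[] args) {
        double pHappy = 0.40;           // Fact 1: fraction of people who are happy
        double pRich = 0.05;            // Fact 2: fraction of people who are rich
        double pRichGivenHappy = 0.10;  // Harvard study: fraction of happy people who are rich

        // Bayes' theorem: reverse the statistic using the overall rates
        double pHappyGivenRich = pRichGivenHappy * pHappy / pRich;
        System.out.println(pHappyGivenRich);  // ~0.8, i.e. 80% of rich people are happy

        // Sanity check with the concrete population of 1000 from the text
        long happy = Math.round(pHappy * 1000);                  // 400 happy people
        long richAndHappy = Math.round(pRichGivenHappy * happy); // 40 of them are rich
        long rich = Math.round(pRich * 1000);                    // 50 rich people overall
        System.out.println((double) richAndHappy / rich);        // 40 / 50 = 0.8
    }
}
```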

A conditional dependency relation (arc) from *node A* to another *node B* indicates that *node B* is a child of *node A*, or, put differently, that *node A* is a parent of *node B*, as in Figure 2.

Each node has a prior or conditional probability distribution (cpd) according to its structure (topology). Graph structure supports the representation of knowledge, distributed algorithms for inference and learning, and intuitive interpretation.

Figure 2 demonstrates the relationship between nodes. For instance, for the common effect network configuration, we have *p(A, B, C) = p(A) · p(B) · p(C|A, B)*. That is, two causes of one effect, which is also known as a *v-structure*. Let’s assume that *node A* represents *pollution*, *node B* is *smoking*, and *node C* represents *lung cancer*. So, the probability of lung cancer depends on whether the patient smokes and on the amount of pollution in the patient’s home.
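To make the factorization concrete, here is a small Java sketch of the v-structure; all probability values below are invented for illustration, not taken from any study:

```java
public class VStructure {
    public static void main(String[] args) {
        // Made-up priors for the two independent causes
        double pPollution = 0.3;  // p(A = true): pollution in the patient's home
        double pSmoking = 0.2;    // p(B = true): patient smokes

        // Made-up CPT p(C = cancer | A, B); rows: A, columns: B (index 0 = true)
        double[][] pCancerGiven = {
                {0.10, 0.05},   // A = true:  {B = true, B = false}
                {0.02, 0.001},  // A = false: {B = true, B = false}
        };

        // v-structure factorization: p(A, B, C) = p(A) * p(B) * p(C | A, B)
        double joint = pPollution * pSmoking * pCancerGiven[0][0];
        System.out.println(joint);  // p(A = true, B = true, C = true)
    }
}
```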

The alarm network is a well-known application of a real Bayesian network. Figure 3 shows the alarm example.

In this network, global semantics defines the full joint distribution as the product of the local conditional distributions:

*p(x₁, …, xₙ) = ∏ᵢ p(xᵢ | parents(Xᵢ))*

We can calculate probabilities according to the given network. For instance:

*p(J, M, A, ¬B, ¬E) = p(J|A) · p(M|A) · p(A|¬B, ¬E) · p(¬B) · p(¬E)*
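This product can be evaluated directly. The CPT values below are the ones commonly used for the alarm example in AI textbooks and are assumed here, since the numbers from Figure 3 are not reproduced in the text:

```java
public class AlarmJoint {
    public static void main(String[] args) {
        // Assumed CPT entries, as commonly given for the alarm example
        double pB = 0.001;               // p(Burglary)
        double pE = 0.002;               // p(Earthquake)
        double pAGivenNotBNotE = 0.001;  // p(Alarm | no burglary, no earthquake)
        double pJGivenA = 0.90;          // p(JohnCalls | Alarm)
        double pMGivenA = 0.70;          // p(MaryCalls | Alarm)

        // p(J, M, A, ¬B, ¬E) = p(J|A) p(M|A) p(A|¬B,¬E) p(¬B) p(¬E)
        double joint = pJGivenA * pMGivenA * pAGivenNotBNotE * (1 - pB) * (1 - pE);
        System.out.printf("%.8f%n", joint);  // ~0.00062811
    }
}
```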

Bayesian Networks appear in several real-world applications:

- Document Classification,
- Image Processing,
- Spam Filters,
- Semantic Search*,
- Medical Diagnosis Systems,
- Turbo Decoding Problem [6].

*For instance, search accuracy can be improved by understanding searcher intent and the contextual meaning of terms using Bayesian Networks. That is, the search takes the semantics of the query into consideration.

In addition to these applications, there are some works that are using Bayesian networks to predict the outcome of sports matches [7, 8].

## Implementation

Let’s create a simple Bayesian Network with Jayes, a Bayesian Network library for Java [9]. Jayes was implemented by Michael Kutschke as his bachelor’s thesis.

In this example, I create a Bayesian Network with three nodes. These have two, two, and three outcomes, respectively. The outcomes can be binary, such as *(true, false)*, or multi-valued, such as *(good, normal, bad)*. In this network, I use the common effect configuration. The *classNode* is conditionally dependent on the *firstNode* and the *secondNode*.
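The original code listing did not survive here. As a stand-in, the sketch below builds the same three-node common-effect structure in plain Java with no Jayes dependency, using made-up CPT values; Jayes itself would express this structure through its own network and node classes.

```java
public class CommonEffectNet {
    public static void main(String[] args) {
        // firstNode: two outcomes (true, false) -- made-up prior
        double[] first = {0.6, 0.4};
        // secondNode: two outcomes (true, false) -- made-up prior
        double[] second = {0.3, 0.7};
        // classNode: three outcomes (good, normal, bad), conditioned on both parents.
        // Rows are the parent combinations (first, second) in the order TT, TF, FT, FF.
        double[][] classGivenParents = {
                {0.7, 0.2, 0.1},  // first = true,  second = true
                {0.5, 0.3, 0.2},  // first = true,  second = false
                {0.3, 0.4, 0.3},  // first = false, second = true
                {0.1, 0.3, 0.6},  // first = false, second = false
        };

        // One entry of the full joint via the v-structure factorization:
        // p(first = true, second = true, class = good)
        double joint = first[0] * second[0] * classGivenParents[0][0];
        System.out.println(joint);
    }
}
```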

We can also compute a posterior probability using Bayes’ theorem. Here is a simple example:
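The snippet itself is likewise missing from this copy; the following dependency-free sketch computes a posterior by brute-force enumeration over a hypothetical three-node common-effect network (all CPT values are made up for illustration):

```java
public class PosteriorByEnumeration {
    public static void main(String[] args) {
        // Hypothetical CPTs: parents firstNode, secondNode; child classNode
        double[] pFirst = {0.6, 0.4};   // index 0 = true, 1 = false
        double[] pSecond = {0.3, 0.7};
        // p(class | first, second); rows TT, TF, FT, FF; columns good, normal, bad
        double[][] pClass = {
                {0.7, 0.2, 0.1},
                {0.5, 0.3, 0.2},
                {0.3, 0.4, 0.3},
                {0.1, 0.3, 0.6},
        };

        // Bayes' theorem: p(first = true | class = good)
        //   = p(first = true, class = good) / p(class = good),
        // summing the joint over the unobserved secondNode
        int good = 0;
        double numerator = 0.0, evidence = 0.0;
        for (int f = 0; f < 2; f++) {
            for (int s = 0; s < 2; s++) {
                double joint = pFirst[f] * pSecond[s] * pClass[2 * f + s][good];
                evidence += joint;               // accumulates p(class = good)
                if (f == 0) numerator += joint;  // accumulates p(first = true, class = good)
            }
        }
        System.out.println(numerator / evidence);  // ~0.84 with these values
    }
}
```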

## Bonus

- You can use software to create Bayesian networks and analyze them. Weka is a popular data mining tool in Java which has various Bayesian network classifier learning algorithms [10].

Figure 4 demonstrates the sample data. We have five attributes (nodes: *f1, f2, f3, f4, and f5*) and their discrete values *(0 or 1)*. The values of the *class node* represent the behavior of information sources, which are *{honest, flip, random}* [11].

- Visualizations of Bayes’ theorem are also worth exploring.

## Further Readings

If you are interested in this topic, I would like to recommend some resources. *Bold ones are more popular than others.*

- (**Course**) Probabilistic Graphical Models by Daphne Koller on Coursera,
- (Course) Machine Learning by Tom Mitchell at CMU,
- (**Book**) Bayesian Reasoning and Machine Learning by David Barber,
- (Book) Learning Bayesian Networks by Richard E. Neapolitan,
- (Book) Modeling and Reasoning with Bayesian Networks by Adnan Darwiche,
- (**Paper**) Bayesian Networks without Tears by Eugene Charniak,
- (**Paper**) Learning Bayesian Networks: The Combination of Knowledge and Statistical Data by David Heckerman, Dan Geiger, and David M. Chickering.

## Notes

- “**,**” denotes the **AND** operator,
- “**¬**” denotes the **NEGATION** operator.

If you discover any bugs in the implementation parts or if you have any questions, please do not hesitate to write them as a comment.

## Acknowledgment

Many thanks to Burak Suyunu and Yusuf Hakan Kalaycı for their reviews. This post became more understandable after their comments.

## References

[1] Wikipedia: Thomas Bayes

[2] A. Jøsang. Subjective logic. Draft book in preparation, July 2011

[3] L. De Raedt and K. Kersting. Probabilistic logic learning. ACM SIGKDD Explorations Newsletter, pages 31–48, 2003

[4] Bayes’ Theorem

[5] M. S. Lewicki. Artificial Intelligence Bayes Nets-I Lecture Notes. Carnegie Mellon University, 2007

[6] Bayesian network applications

[7] Karlis, Dimitris, and Ioannis Ntzoufras. Bayesian modelling of football outcomes: using the Skellam’s distribution for the goal difference. IMA Journal of Management Mathematics 20.2 (2008): 133–145

[8] Rue, Havard, and Oyvind Salvesen. Prediction and retrospective analysis of soccer matches in a league. Journal of the Royal Statistical Society: Series D (The Statistician) 49.3 (2000): 399–418

[9] Jayes

[10] Bayesian Network Classifiers in Weka

[11] My M. Sc. Thesis

[12] Wikipedia: Bayes’ Theorem

[13] Quora: What is a good source for learning about Bayesian networks

[14] Quora: What are some real-life applications of Bayesian Belief Networks