Basics of Bayesian Networks (also known as Bayes Nets): Representation Only

Luv Verma
3 min read · Apr 12, 2023


This blog is inspired by the lectures from CS5100 at NEU, taught by Prof. Smith, and CS188 at UC Berkeley. I have written (and rewritten) this post because I use the concept heavily when deriving the objective function for the DDPM model in my other blog, and it is difficult to know which part of the material to search for.

Bayes Net

A Bayes Net is a type of probabilistic graphical model that represents the conditional dependencies between a set of random variables using a directed acyclic graph (DAG).

The formal definition of a Bayes Net consists of (some of it is quoted as is from the CS188 lecture notes):

  1. A directed acyclic graph (DAG) with one node per variable X. The DAG consists of:
  • Nodes: Each node in the graph represents a random variable, which can be discrete or continuous, and can take on any value from a predefined domain.
  • Edges: Directed edges between nodes represent the conditional dependencies between the random variables. An edge from node A to node B indicates that the probability distribution of the variable represented by node B depends on the value of the variable represented by node A.

2. A conditional distribution for each node, P(X | A1 … An), where Ai is the i-th parent of X, stored as a conditional probability table (CPT). Each CPT has n+2 columns: one for the values of each of the n parent variables A1 … An, one for the values of X, and one for the conditional probability of X given its parents.

When working with Bayesian Networks, CPTs are essential for:

  • Representing the structure and the quantitative relationships between variables in the network.
  • Computing the joint probability distribution of all variables in the network.
  • Performing inference and prediction tasks, such as updating beliefs when new evidence is observed.
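As a minimal sketch (not tied to any particular library), a CPT for a binary node X can be stored as a dictionary that maps each assignment of the parent variables to P(X = true | parents). The variable names and numbers below are purely illustrative, not from the lecture notes:

```python
# A hypothetical CPT for a binary node X with two binary parents A1, A2.
# Each key is an assignment of (A1, A2); each value is P(X=True | A1, A2).
# With n parents there are 2**n rows, matching the n+2-column table above.
cpt_X = {
    (True, True): 0.9,     # P(X=True | A1=True,  A2=True)
    (True, False): 0.6,    # P(X=True | A1=True,  A2=False)
    (False, True): 0.3,    # P(X=True | A1=False, A2=True)
    (False, False): 0.05,  # P(X=True | A1=False, A2=False)
}

def prob(cpt, x_value, parent_values):
    """Look up P(X = x_value | parents) from a CPT keyed by parent values."""
    p_true = cpt[tuple(parent_values)]
    return p_true if x_value else 1.0 - p_true

# P(X=False | A1=True, A2=False) = 1 - 0.6
print(prob(cpt_X, False, (True, False)))
```

Storing only P(X=True | parents) and deriving P(X=False | parents) as its complement halves the table size for binary variables; for multi-valued variables you would store one row per (parents, value) pair instead.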

Let me briefly reiterate.

Specifically, each node in the graph represents a single random variable and each directed edge represents one of the conditional probability distributions we choose to store (i.e. an edge from node A to node B indicates that we store the probability table for P(B|A)). Each node is conditionally independent of all its ancestor nodes in the graph, given all of its parents. Thus, if we have a node representing variable X, we store P(X|A1,A2,…,AN), where A1,…,AN are the parents of X.

Example:

Let us look at an example. Consider a model where we have five binary random variables described below:

  • B: Burglary occurs
  • A: Alarm goes off
  • E: Earthquake occurs
  • J: John calls
  • M: Mary calls

Assume the alarm can go off if either a burglary or an earthquake occurs, and that Mary and John will call if they hear the alarm. We can represent these dependencies with the graph shown below:

In the alarm model above, we would store probability tables:

P(B), P(E), P(A | B, E), P(J | A), and P(M | A)
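To make this concrete, here is one way those five tables might look in Python. The numbers are illustrative values commonly used with the textbook version of this example; they are an assumption here, not part of the post:

```python
# CPTs for the alarm network. True means the event occurs (+), False means
# it does not (-). All numbers are illustrative, assumed for this sketch.
P_B = {True: 0.001, False: 0.999}        # P(B): prior on burglary
P_E = {True: 0.002, False: 0.998}        # P(E): prior on earthquake
P_A_given_BE = {                         # P(A=True | B, E)
    (True, True): 0.95,
    (True, False): 0.94,
    (False, True): 0.29,
    (False, False): 0.001,
}
P_J_given_A = {True: 0.90, False: 0.05}  # P(J=True | A)
P_M_given_A = {True: 0.70, False: 0.01}  # P(M=True | A)
```

Note how small this representation is: five tables with 2 + 2 + 4 + 2 + 2 = 12 independent entries, versus the 2^5 - 1 = 31 entries a full joint table over five binary variables would need.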

Given all of the CPTs for a graph, we can calculate the probability of a full assignment using the chain rule: the joint probability is the product, over all variables Xi, of P(Xi | parents(Xi)).

Suppose we want the joint probability of the case where there is no burglary (-b), no earthquake (-e), the alarm goes off (+a), John calls (+j), and Mary doesn't call (-m). Using the Bayes Net, this can be written as:

P(-b, -e, +a, +j, -m) = P(-b) P(-e) P(+a | -b, -e) P(+j | +a) P(-m | +a)

As you can see from the equation above, even though we wanted an entry of a big joint distribution, the Bayes Net reduces it to a product of simple marginals (P(-b), P(-e)) and conditional probabilities (P(+a|-b,-e), P(+j|+a), P(-m|+a)).
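The factorization described above can be checked directly in code. The sketch below multiplies the five factors for the assignment (-b, -e, +a, +j, -m), again using assumed illustrative CPT numbers rather than values from the post:

```python
# Joint probability P(-b, -e, +a, +j, -m) as a product of the five factors,
# exactly as the Bayes Net factorization prescribes.
# CPT numbers are illustrative, assumed for this sketch.
P_B_true = 0.001
P_E_true = 0.002
P_A_true_given = {(True, True): 0.95, (True, False): 0.94,
                  (False, True): 0.29, (False, False): 0.001}
P_J_true_given_A = {True: 0.90, False: 0.05}
P_M_true_given_A = {True: 0.70, False: 0.01}

joint = ((1 - P_B_true)                    # P(-b)
         * (1 - P_E_true)                  # P(-e)
         * P_A_true_given[(False, False)]  # P(+a | -b, -e)
         * P_J_true_given_A[True]          # P(+j | +a)
         * (1 - P_M_true_given_A[True]))   # P(-m | +a)

print(joint)  # five simple lookups and multiplications, no big joint table
```

Each factor is a single table lookup, so evaluating any full assignment costs one lookup per variable instead of indexing into an exponentially large joint table.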

If you like the above introduction, please clap and share.
