Monday Notes 8–27–18

Markov Chains

Markov chains are a powerful tool for modeling the probabilistic evolution of a system. In its basic form, a Markov chain is a set of states. At each time step, there is a probability that the current state will transition to another state. These probabilities are collected in a transition matrix, where the element in the i-th column and j-th row is the probability of transitioning from state i to state j.

A 3x3 transition matrix describes a Markov chain with 3 states. If we start in state 1, then the 1st column (shown in red) gives the probability of being in each state after one transition.

  • The probability of transitioning from state 1 back to itself is 0.25
  • The probability of transitioning from state 1 to state 2 is 0.50
  • The probability of transitioning from state 1 to state 3 is 0.25

Likewise, if we start in the 2nd state, then we can read the 2nd column (blue) to get those probabilities.

Here’s an example using the Python library numpy:
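A minimal sketch of such an example: the first column below matches the probabilities listed above, while the other two columns are made-up values for illustration.

```python
import numpy as np

# Column-stochastic transition matrix: entry (j, i) is the probability
# of moving from state i to state j, so each column sums to 1.
P = np.array([[0.25, 0.50, 0.25],
              [0.50, 0.25, 0.25],
              [0.25, 0.25, 0.50]])

# Start in state 1 with certainty.
a = np.array([1.0, 0.0, 0.0])

# Distribution after one step: exactly the first column of P.
print(P @ a)

# Powers of P converge, and their columns become nearly identical.
P20 = np.linalg.matrix_power(P, 20)
print(P20)
```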

We can see that in the long term, powers of the transition matrix start to converge. We can also see that the columns of the matrix have very similar entries. Given what was said above about how to interpret the columns of the matrix, we can say that in the long term, the probability distribution of states “forgets” where it started.

That last property is important for machine learning applications. It means that no matter how bad your initial guess is about the state of a system, if you can find an ergodic Markov process that describes the evolution of the system, then you will eventually arrive at the proper distribution of states. In the example above, the matrix powers converged rather quickly. In most real applications we are not so lucky. But with computational speed increasing every year, this problem is becoming less and less of an obstacle.
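This forgetting can be checked directly. As a sketch, using a made-up 3-state transition matrix, two very different initial guesses converge to the same distribution after repeated transitions:

```python
import numpy as np

# Hypothetical column-stochastic transition matrix (columns sum to 1).
P = np.array([[0.25, 0.50, 0.25],
              [0.50, 0.25, 0.25],
              [0.25, 0.25, 0.50]])

# Two very different initial guesses about the state of the system.
good_guess = np.array([0.4, 0.3, 0.3])
bad_guess = np.array([0.0, 0.0, 1.0])

# Repeatedly apply the transition matrix to each guess.
for _ in range(50):
    good_guess = P @ good_guess
    bad_guess = P @ bad_guess

# Both guesses have converged to the same stationary distribution.
print(good_guess, bad_guess)
```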

Continuous Markov Chains

It is possible to extend the notion of a state to a continuous quantity. Likewise, it is possible to extend the transition from discrete time steps to continuous time. Let’s look at both of these cases.

For discrete states, we compute the next state vector b from the previous state vector a as

b_j = Σ_i P_ji a_i, or in matrix form, b = Pa.

The continuous state analog is

b(y) = ∫_S P(y, x) a(x) dx,

where P(y, x) is the density for transitioning from state x to state y.

Whereas a and b represent discrete probability distributions in the first example, in the second they are probability densities over the continuous state space S.

For continuous time Markov chains, the result of applying the transition matrix gives the derivative of the state: da/dt = Qa(t), where Q is now a matrix of transition rates whose columns sum to zero. To compute the change in state from time p to time q, we have to integrate over that interval:

a_j(q) − a_j(p) = ∫_p^q Σ_i Q_ji a_i(t) dt
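A numerical sketch of the continuous time case, using a made-up rate matrix Q (columns summing to zero, so total probability is conserved) and simple Euler integration of da/dt = Qa:

```python
import numpy as np

# Hypothetical rate matrix Q for a 3-state continuous-time chain:
# off-diagonal entry (j, i) is the rate of jumping from state i to
# state j; each column sums to 0 so total probability is conserved.
Q = np.array([[-1.0,  0.5,  0.2],
              [ 0.6, -0.9,  0.3],
              [ 0.4,  0.4, -0.5]])

def evolve(a, p, q, steps=10000):
    """Integrate da/dt = Q a from time p to time q with Euler steps."""
    dt = (q - p) / steps
    for _ in range(steps):
        a = a + dt * (Q @ a)
    return a

a0 = np.array([1.0, 0.0, 0.0])  # start in state 1 at time p
a1 = evolve(a0, p=0.0, q=2.0)
print(a1, a1.sum())             # still a valid distribution: sums to 1
```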

The above equation tells us how to compute the change in the probability of state j. With that, we can combine our equations for continuous state and continuous time Markov chains:

a(y, q) − a(y, p) = ∫_p^q ∫_S Q(y, x) a(x, t) dx dt

To clarify, this equation allows us to compute the change in the probability density a (which is supported over the continuous state space S) from time p to time q.

Notice a mistake? Let me know in the comments!


What I’m Reading

Mathematical Logic, Chiswell & Hodges pages 55–81

After reading in depth about natural deduction in the previous chapters, I’ve begun to skim. The selection I’ve read from the book this week begins with the mechanization of proofs. The authors prove that every well-formed formula has a unique parsing tree by introducing the concept of depth in a formula. Depth is essentially the difference between how many parentheses have been “opened” and how many have been “closed.” They prove that there is exactly one functor (e.g. ∧, ∨, →, ↔) at depth 1, and that this functor is the unique root of the parsing tree. The proof then proceeds recursively on the branches of the tree.
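The depth computation can be sketched in a few lines. The formula syntax and function names here are my own simplification (fully parenthesized formulas, binary functors only), not the book’s:

```python
# Scan a fully parenthesized formula, tracking how many parentheses
# have been opened minus closed. The unique binary functor sitting at
# depth 1 is the root of the parsing tree.
FUNCTORS = {"∧", "∨", "→", "↔"}

def root_functor(formula):
    """Return (index, symbol) of the unique functor at depth 1."""
    depth = 0
    hits = []
    for i, ch in enumerate(formula):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch in FUNCTORS and depth == 1:
            hits.append((i, ch))
    assert len(hits) == 1, "not a well-formed binary formula"
    return hits[0]

formula = "((p∧q)→(¬r))"
i, f = root_functor(formula)
print(f)                              # root of the parsing tree: →
print(formula[1:i], formula[i+1:-1])  # the two branches: (p∧q) (¬r)
```

The recursion in the book’s proof corresponds to calling the same scan again on each of the two branches.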

There is some discussion of the duality between parsing trees and natural deduction: natural deduction proofs can be viewed as a kind of upside-down parsing tree. The concept of signatures introduced in previous chapters is expanded upon in the discussion of truth tables. The signature gives us the set of symbols representing variables, each of which may be either True or False. The truth value of a formula can be computed by binding its variables to various truth assignments. The truth value assignment function, called a σ-structure, is represented by the character A. For instance, if φ is the formula (p→q) and A(p)=True and A(q)=True, then A(φ)=True. But if A(p)=True and A(q)=False, then A(φ)=False. The following are definitions based on the concept of σ-structures:

  1. If A(φ)=True in every binding, then φ is a tautology.
  2. If A(φ)=False in every binding, then φ is a contradiction.
  3. If A(φ)=True in some binding, then φ is satisfiable.
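A sketch of these definitions in code, modeling a binding as a dict from variable names to truth values (this representation is my own, not the book’s):

```python
from itertools import product

def classify(formula, variables):
    """Return 'tautology', 'contradiction', or 'satisfiable' by
    evaluating the formula under every binding of its variables."""
    values = [formula(dict(zip(variables, binding)))
              for binding in product([True, False], repeat=len(variables))]
    if all(values):
        return "tautology"
    if not any(values):
        return "contradiction"
    return "satisfiable"

# φ = (p→q), written with Python operators: (p → q) is (not p) or q.
phi = lambda A: (not A["p"]) or A["q"]

print(classify(phi, ["p", "q"]))                         # satisfiable
print(classify(lambda A: A["p"] or not A["p"], ["p"]))   # tautology
print(classify(lambda A: A["p"] and not A["p"], ["p"]))  # contradiction
```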

The next major proof is that of logical equivalence, which shows that proving a statement using natural deduction is equivalent to proving the same statement by the method of truth tables. This theorem allows us to define an equivalence relation on formulas. De Morgan’s laws and several other logical principles are introduced as examples of such logical equivalences.
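The truth table side of such an equivalence can be checked by brute force over all assignments. Here is a sketch verifying one of De Morgan’s laws, ¬(p∧q) ≡ (¬p∨¬q), with my own helper function:

```python
from itertools import product

def equivalent(f, g, variables):
    """Two formulas are logically equivalent when they agree under
    every truth assignment to their variables."""
    for binding in product([True, False], repeat=len(variables)):
        A = dict(zip(variables, binding))
        if f(A) != g(A):
            return False
    return True

lhs = lambda A: not (A["p"] and A["q"])        # ¬(p∧q)
rhs = lambda A: (not A["p"]) or (not A["q"])   # (¬p∨¬q)
print(equivalent(lhs, rhs, ["p", "q"]))        # True
```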


Gödel, Escher, Bach by Douglas Hofstadter, pages 111–160

The Dialogue of chapter V is an excellent piece of writing. The Tortoise and Achilles jump through various modes of meta-dialogue by reading stories about themselves. Even more layers are added as these stories involve “pushing into” and “popping out of” various Escher works. A rather esoteric pun that I found quite amusing involves a precious item rolling down a flight of stairs, to which Achilles asks, “Should we go down a story?”

One of the intentions of the Dialogue is to demonstrate that the linguistic mind does not have a very deep stack, or ability to remember layers of context. The initial conflict (where a dastardly villain threatens to turn poor Tortoise into soup!) in the 0th layer of the story is never resolved, as the tale ends with a strong resolution of the 1st layer. After so many shifts of context, it is easy to forget where the initial story began.

The chapter proper then begins, and its theme (perhaps you will not be surprised) is recursion! Hofstadter astoundingly ties in examples from linguistics, computation, music, and even theoretical physics. This last point is of particular (pardon the pun) interest, as fundamental particles exhibit a strange recursive property.

It is in fact not possible to define even a single particle, as each particle is capable of self-interactions. An electron may emit a photon and then reabsorb it on its path from A to B. However, that photon may also decay into a particle-antiparticle pair which then annihilates to produce the original photon! And those descendants of the photon have the possibility for such self-interactions, and so on. Therefore, to define the path of a particle, one must consider the entire recursive structure of the way in which that particle might self-interact. This technique is called renormalization and is still a topic of research today; for example, quantum gravity remains a non-renormalizable theory.

Hofstadter also proceeds to discuss the work of his PhD thesis on the theoretical energy levels of an electron bound to a crystal lattice in the presence of a magnetic field. The crystal lattice induces a periodic behavior of the electron, and the magnetic field induces a different periodic behavior. The ratio of these periods then determines the allowable energy levels. The plot, shown below, admits a bizarre recursive structure, where the entire graph is defined in terms of smaller, somewhat distorted versions of itself.

What makes this plot, and recursive structures in general, so fascinating is that the whole can only be determined by the parts, which themselves contain the whole. There is no “atom” which can be used as a starting point, as any such atom must contain itself and therefore is not an atom at all. Hofstadter takes care to point out that the above plot is purely of theoretical interest and most likely would not agree with any experiment. My own intuition as to why this might be true is that Nature very often has sly ways of side-stepping infinities.


Like what I’ve written? Have any suggestions? Let me know in the comments!