Continuous Time Markov Chain Analysis

Joost VanderBorgh
Published in nieuwsgierigheid
Nov 1, 2019

5G networks are coming. But what risks do they carry? And how can we model their errors?

Across the world, cellular networks are made up of many base stations. These are complicated systems, and they sometimes suffer outages.

Searching the news for cellular outages yields many results.

Farooq, Parwez, and Imran (2015) state that the network complexity of 5G could lead to even “higher cell outage rates due to their higher parametric complexity”. What separates 5G from previous generations is the density of the network. But that density could also be a drawback, because of hardware or software errors or the greater number of parameters (overcomplexity) in the network. This blog post explains the work that Farooq, Parwez, and Imran did.

Farooq, Parwez, and Imran model a base station as a continuous-time Markov chain (CTMC) that can be in one of three states:

The first is the optimal state, in which the system runs normally; the second is the sub-optimal state, in which at time t one or more parameters are misconfigured; and the third is the outage state, in which the base station is completely down.

Trivial failures, occurring at rate $\lambda_t$, do not cause an outage but move the base station into the sub-optimal state; critical failures, occurring at rate $\lambda_c$, lead to outage. The time to return from the sub-optimal state to the optimal state is exponentially distributed with mean $1/\mu_{dc}$, and the time to go from outage back to the optimal state is exponentially distributed with mean $1/\mu_c$.

To obtain the generator matrix of this continuous-time Markov chain $X(t)$, let $p_j(t) = \Pr\{X(t) = j\}$ and let $p(t) = [p_j(t)]$ denote the row vector of transient state probabilities.

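The transition rates between the states are collected in the rate matrix $R = [r_{ij}]$, where $r_{ij}$ is the rate of moving from state $i$ to state $j$. The generator matrix $Q$ has the same off-diagonal entries, and diagonal entries $q_{ii} = -\sum_{j \ne i} r_{ij}$.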
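As an illustration, here is a minimal sketch of the generator matrix in Python with NumPy. The function name `generator_matrix`, the state ordering, and the transition structure are assumptions, not taken from the paper: it assumes a critical failure can strike from both the optimal and the sub-optimal state, and that every recovery returns directly to the optimal state.

```python
import numpy as np

# State indices (hypothetical ordering of the three states described above).
OPTIMAL, SUBOPTIMAL, OUTAGE = 0, 1, 2


def generator_matrix(lam_t, lam_c, mu_dc, mu_c):
    """Build the 3x3 CTMC generator Q from failure and recovery rates.

    Assumption: critical failures (rate lam_c) can occur from both the
    optimal and the sub-optimal state; all recoveries go to OPTIMAL.
    """
    # Rate matrix R: R[i, j] is the transition rate from state i to state j.
    R = np.zeros((3, 3))
    R[OPTIMAL, SUBOPTIMAL] = lam_t   # trivial failure
    R[OPTIMAL, OUTAGE] = lam_c       # critical failure
    R[SUBOPTIMAL, OUTAGE] = lam_c    # critical failure while degraded (assumed)
    R[SUBOPTIMAL, OPTIMAL] = mu_dc   # detection and compensation
    R[OUTAGE, OPTIMAL] = mu_c        # recovery from outage
    # Generator: same off-diagonal entries, diagonal q_ii = -(row sum of R).
    return R - np.diag(R.sum(axis=1))


# Illustrative rates per hour (assumed reading of Case Study I).
Q = generator_matrix(lam_t=1 / 8, lam_c=1 / 80, mu_dc=1 / 6, mu_c=12)
print(Q)
```

Every row of a generator sums to zero, which is a quick sanity check on any hand-built $Q$.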

We can use Kolmogorov’s forward differential equation in matrix form, where $Q$ is the generator matrix:

$$\frac{d}{dt}p(t) = p(t)\,Q$$

Solving it gives the transient state probability vector:

$$p(t) = p(0)\,e^{Qt}$$

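In practice, the transient state probability vector can be computed with the uniformization method:

$$p(t) = \sum_{n=0}^{\infty} e^{-Bt}\,\frac{(Bt)^n}{n!}\; p(0)\,\hat{P}^{\,n}$$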

where $B \ge \max_i |q_{ii}|$ is a uniform rate parameter and $\hat{P}$ is the transition probability matrix given by:

$$\hat{P} = I + \frac{Q}{B}$$
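The uniformization series can be sketched directly in NumPy. This is an illustrative implementation, not the authors’ code: the name `transient_probs`, the choice $B = \max_i |q_{ii}|$ (the smallest allowed value), and the stopping rule (stop once the Poisson weights used cover all but `tol` of the probability mass) are choices made here.

```python
import math

import numpy as np


def transient_probs(Q, p0, t, tol=1e-12):
    """Transient distribution p(t) = p(0) e^{Qt} computed by uniformization."""
    Q = np.asarray(Q, dtype=float)
    p0 = np.asarray(p0, dtype=float)
    B = np.max(np.abs(np.diag(Q)))        # uniformization rate B = max_i |q_ii|
    if B == 0.0:
        return p0.copy()                  # Q = 0: the chain never moves
    if B * t > 700:
        raise ValueError("B*t too large: exp(-B*t) underflows in this sketch")
    P_hat = np.eye(len(Q)) + Q / B        # P_hat = I + Q / B
    weight = math.exp(-B * t)             # Poisson(B*t) probability of n = 0
    term = p0.copy()                      # p0 @ P_hat**n, starting at n = 0
    result = weight * term
    covered = weight                      # Poisson mass included so far
    n = 0
    # Each term is a probability vector, so the dropped terms change p(t)
    # by at most 1 - covered (in total absolute value).
    while 1.0 - covered > tol and weight > 0.0:
        n += 1
        term = term @ P_hat
        weight *= B * t / n               # Poisson(B*t) probability of n
        result += weight * term
        covered += weight
    return result
```

With a three-state generator whose rates are expressed per hour and $p(0) = [1, 0, 0]$ (starting in the optimal state), `transient_probs(Q, [1, 0, 0], 24.0)` would return the state probabilities after 24 hours.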

Because the series has to be cut off after a finite number of terms, the accuracy error of the uniformization method depends on how far the series for the transient probability vector is carried.
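One standard way to make this concrete: if the series is truncated after $N$ terms, the total absolute error in $p(t)$ is at most the Poisson tail $\Pr\{Y > N\}$ with $Y \sim \text{Poisson}(Bt)$, because every $p(0)\hat{P}^n$ is a probability vector. The sketch below picks $N$ for a given tolerance; the function name `truncation_point` is hypothetical.

```python
import math


def truncation_point(B, t, eps):
    """Smallest N with P(Y > N) <= eps, where Y ~ Poisson(B * t).

    Keeping terms n = 0..N of the uniformization series then bounds the
    total absolute error of the truncated p(t) by eps.
    """
    if B * t > 700:
        raise ValueError("B*t too large: exp(-B*t) underflows in this sketch")
    weight = math.exp(-B * t)   # P(Y = 0)
    covered = weight            # P(Y <= n)
    n = 0
    while 1.0 - covered > eps and weight > 0.0:
        n += 1
        weight *= B * t / n     # P(Y = n) from P(Y = n - 1)
        covered += weight
    return n


print(truncation_point(12.0, 24.0, 1e-10))
```

The number of terms grows roughly in proportion to $Bt$, so a fast transition (a large $|q_{ii}|$, such as a quick outage recovery) over a long horizon makes the series longer.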

Occupancy time is used to quantify the reliability of the network. The occupancy time $m_{ij}(T)$ is the expected amount of time the CTMC spends in state $j$ during the interval $[0, T]$, given that it starts in state $i$; it can be derived using the transition probability matrix $\hat{P}$.

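In matrix form, with $M(T) = [m_{ij}(T)]$:

$$M(T) = \int_0^T e^{Qt}\,dt = \frac{1}{B}\sum_{k=0}^{\infty} \Pr\{Y > k\}\,\hat{P}^{\,k},$$

where $Y$ is a Poisson random variable with mean $BT$.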
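The occupancy matrix can be computed with the same uniformization machinery. This is a sketch under the notation above, not the authors’ implementation; the name `occupancy_times` is hypothetical. It uses the identity $\int_0^T e^{-Bt}(Bt)^k/k!\,dt = \Pr\{Y > k\}/B$.

```python
import math

import numpy as np


def occupancy_times(Q, T, tol=1e-12):
    """Occupancy matrix M(T) = integral of e^{Qt} over [0, T], via uniformization.

    Uses M(T) = (1/B) * sum_k P(Y > k) * P_hat^k with Y ~ Poisson(B*T).
    M[i, j] is the expected time spent in state j during [0, T] when the
    chain starts in state i.
    """
    Q = np.asarray(Q, dtype=float)
    n_states = len(Q)
    B = np.max(np.abs(np.diag(Q)))
    if B == 0.0:
        return T * np.eye(n_states)       # the chain never leaves its start state
    if B * T > 700:
        raise ValueError("B*T too large: exp(-B*T) underflows in this sketch")
    P_hat = np.eye(n_states) + Q / B
    weight = math.exp(-B * T)             # P(Y = 0)
    tail = 1.0 - weight                   # P(Y > 0)
    power = np.eye(n_states)              # P_hat^0
    M = tail * power
    k = 0
    while tail > tol and weight > 0.0:
        k += 1
        power = power @ P_hat
        weight *= B * T / k               # P(Y = k)
        tail -= weight                    # P(Y > k)
        M += tail * power
    return M / B
```

Each row of $M(T)$ sums to $T$. With rates per hour and $T = 24$, the entry `M[0, 0]` would be the expected number of hours, out of the first 24, that a base station starting in the optimal state spends in the optimal state.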

We can find the steady-state distribution from the long-run balance equations:

$$\Psi Q = 0, \qquad \sum_j \Psi_j = 1$$

where $\Psi$ is the limiting (steady-state) distribution, and $\Psi_j = \lim_{t \to \infty} \Pr\{X(t) = j\}$.

This is a small linear system that can be solved with standard linear algebra.
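For example, a minimal NumPy sketch (the function name `steady_state` is hypothetical) that replaces one redundant balance equation with the normalization condition and solves the resulting square system. It assumes the chain is irreducible, so the solution is unique.

```python
import numpy as np


def steady_state(Q):
    """Solve Psi Q = 0 with sum(Psi) = 1 for an irreducible CTMC generator Q."""
    Q = np.asarray(Q, dtype=float)
    n = len(Q)
    # Psi Q = 0 is equivalent to Q^T Psi^T = 0. Its equations are linearly
    # dependent (the rows of Q sum to zero), so the last one is replaced by
    # the normalisation condition sum_j Psi_j = 1.
    A = Q.T.copy()
    A[-1, :] = 1.0
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)


print(steady_state([[-2.0, 2.0], [3.0, -3.0]]))
```

For the three-state model, the result gives the long-run fraction of time spent in the optimal, sub-optimal, and outage states.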

The authors examine three case studies. In Case Study I, trivial errors occur once every eight hours (simulating traffic patterns during a workday as people come to work and go home), and critical failures occur at one tenth of the rate of trivial errors. The recovery time to bounce back from an error is exponentially distributed with a mean of 5 minutes. In addition, the mean time to go from the sub-optimal state back to the optimal state is 6 hours.
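To connect the case study to the formulas, here is a sketch that converts the Case Study I figures into rates per hour and computes the probability of being in the optimal state after 24 hours. Several pieces are assumptions rather than the paper’s values: that the 5-minute mean applies to outage recovery ($1/\mu_c$) and the 6-hour mean to sub-optimal recovery ($1/\mu_{dc}$), the transition structure of the generator, and the faster values of $1/\mu_{dc}$ (3 hours and 1 hour), which are hypothetical. It is not meant to reproduce the paper’s plots.

```python
import math

import numpy as np

# Case Study I, converted to rates per hour (hypothetical constant names).
# Assumption: the 5-minute mean is outage -> optimal (mu_c) and the
# 6-hour mean is sub-optimal -> optimal (mu_dc).
LAM_T = 1 / 8          # one trivial error per 8 hours
LAM_C = LAM_T / 10     # critical failures at 1/10 of the trivial rate
MU_C = 60 / 5          # mean outage recovery of 5 minutes = 12 per hour
MU_DC = 1 / 6          # mean sub-optimal -> optimal time of 6 hours


def generator(lam_t, lam_c, mu_dc, mu_c):
    # Assumed structure: critical failures possible from optimal and sub-optimal.
    R = np.array([[0.0, lam_t, lam_c],
                  [mu_dc, 0.0, lam_c],
                  [mu_c, 0.0, 0.0]])
    return R - np.diag(R.sum(axis=1))


def transient(Q, p0, t, tol=1e-12):
    # Uniformization: p(t) = sum_n e^{-Bt} (Bt)^n / n! * p0 P_hat^n.
    B = np.max(np.abs(np.diag(Q)))
    P_hat = np.eye(len(Q)) + Q / B
    weight = math.exp(-B * t)
    term = np.asarray(p0, dtype=float)
    result, covered, n = weight * term, weight, 0
    while 1.0 - covered > tol and weight > 0.0:
        n += 1
        term = term @ P_hat
        weight *= B * t / n
        result = result + weight * term
        covered += weight
    return result


p0 = [1.0, 0.0, 0.0]   # start in the optimal state
for hours in (1 / MU_DC, 3.0, 1.0):   # 3 h and 1 h are hypothetical speed-ups
    Q = generator(LAM_T, LAM_C, 1 / hours, MU_C)
    p24 = transient(Q, p0, 24.0)
    print(f"1/mu_dc = {hours:.1f} h -> P(optimal at 24 h) = {p24[0]:.3f}")
```

Under these assumptions, a shorter mean detection and compensation time never lowers the probability of being in the optimal state, in line with the transient analysis the authors report.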

The same model parameters are then varied to define two further cases.

Case II simulates a 5G network, where errors may occur at a much higher rate because of the higher density of base stations and the added network complexity. Case III has the same circumstances as Case I, but the recovery time to bounce back from an error is longer.

Transient analysis shows that decreasing detection and recovery time (by adjusting the parameters) has a major effect on network reliability. Failure detection and compensation times should therefore be kept as low as possible.

Figure: Probability of being in the optimal state

Case III maintains the greatest probability of being in the optimal state after 24 hours, higher than Case I, because of its reduced compensation time.

The authors also propose a methodology for a fault prediction framework.

This shows why an understanding of continuous-time Markov chains is important: it can help plan what to do when the system breaks.

Maximizing the time the system spends in the optimal state means ensuring that the network can self-correct when an error occurs. This is reminiscent of quantum error correction, as covered in a previous post.

Article Discussed:

Hasan Farooq, Md. Salik Parwez, and Ali Imran, “Continuous Time Markov Chain Based Reliability Analysis for Future Cellular Networks” (2015).
