1. What is the Cramér-Rao Lower Bound (CRLB)?
In statistics, the CRLB concerns the estimation of a deterministic parameter. When evaluating an estimator, bias and variance are the two key metrics. The CRLB is a useful result because it states, theoretically, the lowest variance achievable by ANY unbiased estimator of a parameter; how close an unbiased estimator’s variance comes to this bound is defined as the estimator’s efficiency.
Let f(x;θ) be the probability density function (or probability mass function) of a random variable X conditioned on the value of θ, and let θ̂(X) be an unbiased estimator of θ.
Then CRLB is defined as,
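$$\mathrm{Var}\big(\hat{\theta}(X)\big) \;\ge\; \frac{1}{I(\theta)}$$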
Where I(𝜃) is the Fisher Information,
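$$I(\theta) \;=\; \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^{2}\right]$$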
2. Derivation of the CRLB.
Step 1: Leverage the defining property of the probability density function (PDF) that it integrates to 1.
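Since the density integrates to 1 for every θ, differentiating both sides with respect to θ and moving the derivative inside the integral shows that the score has expectation zero:

$$0 \;=\; \frac{\partial}{\partial\theta}\int f(x;\theta)\,dx \;=\; \int \frac{\partial f(x;\theta)}{\partial\theta}\,dx \;=\; \int \frac{\partial \ln f(x;\theta)}{\partial\theta}\,f(x;\theta)\,dx \;=\; \mathbb{E}\!\left[\frac{\partial \ln f(X;\theta)}{\partial\theta}\right]$$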
Note: In the derivation above we interchange the integral and the derivative; however, this interchange is only conditionally valid. Say we have a function f(x,θ) with a≤x≤b and m≤θ≤n. Then the interchange of integral and derivative is allowed if:
- ∫f(x,θ)dx (a function of the single variable θ) is differentiable on m≤θ≤n.
- The two-variable functions f(x,θ) and ∂f(x,θ)/∂θ are uniformly continuous on the region [a,b]×[m,n].
When these two conditions are met, we obtain the following equation:
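$$\frac{\partial}{\partial\theta}\int_{a}^{b} f(x,\theta)\,dx \;=\; \int_{a}^{b} \frac{\partial f(x,\theta)}{\partial\theta}\,dx$$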
This is the so-called regularity condition in the CRLB. The exponential family of distributions, which includes most common distributions such as the Gaussian, binomial, multinomial, Poisson, gamma, and beta distributions, satisfies this condition.
Step 2: Leveraging the unbiasedness of θ̂, i.e. E[θ̂(X)] = θ, we have
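$$\theta \;=\; \mathbb{E}\big[\hat{\theta}(X)\big] \;=\; \int \hat{\theta}(x)\,f(x;\theta)\,dx
\;\;\Longrightarrow\;\;
1 \;=\; \int \hat{\theta}(x)\,\frac{\partial f(x;\theta)}{\partial\theta}\,dx \;=\; \mathbb{E}\!\left[\hat{\theta}(X)\,\frac{\partial \ln f(X;\theta)}{\partial\theta}\right]$$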
Note: The interchange of integral and derivative is also allowed here, as in Step 1.
Step 3: Apply the Cauchy–Schwarz inequality.
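By Steps 1 and 2, the covariance between θ̂(X) and the score equals 1 (the score has mean zero). Applying the Cauchy–Schwarz inequality to this covariance gives:

$$1 \;=\; \mathrm{Cov}\!\left(\hat{\theta}(X),\,\frac{\partial \ln f(X;\theta)}{\partial\theta}\right)^{2} \;\le\; \mathrm{Var}\big(\hat{\theta}(X)\big)\,\mathrm{Var}\!\left(\frac{\partial \ln f(X;\theta)}{\partial\theta}\right)$$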
Step 4: Recognize the variance of the score as the Fisher Information.
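Since Step 1 shows the score has mean zero, its variance equals its second moment, which is exactly the Fisher Information:

$$\mathrm{Var}\!\left(\frac{\partial \ln f(X;\theta)}{\partial\theta}\right) \;=\; \mathbb{E}\!\left[\left(\frac{\partial \ln f(X;\theta)}{\partial\theta}\right)^{2}\right] \;=\; I(\theta)$$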
Finally, let’s combine the results from Step 3 and Step 4 and obtain the Cramér-Rao Lower Bound inequality:
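$$1 \;\le\; \mathrm{Var}\big(\hat{\theta}(X)\big)\,I(\theta) \quad\Longrightarrow\quad \mathrm{Var}\big(\hat{\theta}(X)\big) \;\ge\; \frac{1}{I(\theta)}$$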
3. How to understand the CRLB and the Fisher Information?
In maximum likelihood estimation, we want to locate the θ that maximizes the likelihood, and typically we solve the first-order condition by setting the score function, the gradient of the log-likelihood with respect to the parameter θ, to 0.
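Writing s(θ) for the score of a sample x with likelihood L(θ; x), the first-order condition is:

$$s(\theta) \;=\; \frac{\partial}{\partial\theta}\ln L(\theta;x), \qquad s\big(\hat{\theta}_{\mathrm{MLE}}\big) \;=\; 0$$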
The Fisher Information can equivalently be written using the negative expected second partial derivative of the log-likelihood with respect to θ, the so-called curvature of the log-likelihood. The curvature can be thought of as the acceleration of the log-likelihood and provides a way to quantify how sensitively the log-likelihood responds to changes in θ. The Fisher Information takes the expectation of this curvature, which can be considered a weighted average of the curvature. Thus, the larger the Fisher Information, the more sensitively the log-likelihood responds to changes in θ, and the more information about θ the log-likelihood contains.
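Under the regularity condition, the squared-score form of the Fisher Information and this curvature form coincide:

$$I(\theta) \;=\; \mathbb{E}\!\left[\left(\frac{\partial}{\partial\theta}\ln f(X;\theta)\right)^{2}\right] \;=\; -\,\mathbb{E}\!\left[\frac{\partial^{2}}{\partial\theta^{2}}\ln f(X;\theta)\right]$$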
The CRLB is inversely proportional to the Fisher Information, which can be intuitively phrased as: “the more information about θ the log-likelihood contains, the lower the variance you can reach when performing a point estimation of θ”.
4. Example — Normal Distribution.
Let X1, X2, …, Xn be i.i.d. random variables with Xi ~ N(θ, σ²), where σ is known and θ is unknown. Compute the CRLB for any unbiased estimator of θ.
Step 1: Compute the Joint Probability Distribution (JPD).
In this case, the joint probability density is a product of n normal densities, since the Xi are i.i.d.
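$$f(x_{1},\dots,x_{n};\theta) \;=\; \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x_{i}-\theta)^{2}}{2\sigma^{2}}\right)$$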
Step 2: Compute the log of the JPD.
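$$\ln f(x_{1},\dots,x_{n};\theta) \;=\; -\frac{n}{2}\ln\!\big(2\pi\sigma^{2}\big) \;-\; \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\theta)^{2}$$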
Step 3: Compute the second partial derivative of the log-likelihood with respect to θ, or compute the expected square of the gradient of the log-likelihood with respect to θ.
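$$\frac{\partial \ln f}{\partial\theta} \;=\; \frac{1}{\sigma^{2}}\sum_{i=1}^{n}(x_{i}-\theta), \qquad \frac{\partial^{2}\ln f}{\partial\theta^{2}} \;=\; -\frac{n}{\sigma^{2}}$$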
Step 4: Compute the Fisher Information and the CRLB.
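$$I(\theta) \;=\; -\,\mathbb{E}\!\left[\frac{\partial^{2}\ln f}{\partial\theta^{2}}\right] \;=\; \frac{n}{\sigma^{2}}, \qquad \mathrm{Var}\big(\hat{\theta}(X)\big) \;\ge\; \frac{\sigma^{2}}{n}$$

The sample mean is unbiased with variance σ²/n, so it attains the bound and is an efficient estimator of θ. As a sanity check, here is a minimal simulation sketch (using NumPy, with arbitrarily chosen values of θ, σ, and n) that compares the empirical variance of the sample mean against the CRLB:

```python
import numpy as np

# Arbitrary settings assumed for this sketch.
theta, sigma, n = 2.0, 3.0, 50     # true mean, known std-dev, sample size
n_trials = 100_000                 # number of simulated datasets

rng = np.random.default_rng(0)

# Draw n_trials datasets of size n from N(theta, sigma^2)
# and estimate theta by the sample mean of each dataset.
samples = rng.normal(loc=theta, scale=sigma, size=(n_trials, n))
estimates = samples.mean(axis=1)

crlb = sigma**2 / n                     # Cramer-Rao lower bound: sigma^2 / n
empirical_var = estimates.var(ddof=1)   # empirical variance of the estimator

print(f"CRLB          : {crlb:.5f}")
print(f"Empirical var : {empirical_var:.5f}")
```

The empirical variance should land very close to σ²/n, illustrating that the sample mean is efficient for this model.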