Maximum Likelihood Estimate

Jatin Gupta
2 min read · Apr 21, 2024


Before learning any ML concepts, one must know a single algorithm/concept/method: MLE. This is a concept first seen in 1912, and it changed the course of statistics. Ronald Fisher spent five years trying to prove it, but could not. Wilks spent his whole life trying to prove it and could not either, though he gave an estimate of the error bound, which was enough to convince scientists that the method would work.

Now we use this method to solve most of our estimation problems. When fitting a function to data, we simply want to reduce the error; the parameter set that reduces the error the most can be used as the estimate. In mathematical terms:

Let Y = f(X; A), where X and Y are the observed variables and A is a set of parameters to be found. We want to know which values of A make f(X; A) fit Y best. If we define an error function G(Y − f(X; A)), the problem becomes a search: find the values of A that minimize G(Y − f(X; A)). By defining G, we have arrived at the parameter space.
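As a sketch of that search (with a hypothetical linear f and a squared-error G, assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical model: Y = f(X; A) with f(X; A) = A[0] + A[1] * X.
# The data are simulated from true parameters A = [3.0, 2.0] plus noise.
X = np.linspace(0, 10, 50)
Y = 3.0 + 2.0 * X + np.random.default_rng(1).normal(scale=0.5, size=X.size)

def f(X, A):
    return A[0] + A[1] * X

def G(A):
    # Error function G(Y - f(X; A)): here, the sum of squared residuals.
    return np.sum((Y - f(X, A)) ** 2)

# Search the parameter space for the A that minimizes G.
A_hat = minimize(G, x0=[0.0, 0.0]).x
print(A_hat)  # should be close to the true parameters [3.0, 2.0]
```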

Similarly, the goal of maximum likelihood estimation is to determine the values of the model parameters for which the observed data have the highest joint probability over the parameter space (which is generally a Euclidean space).

What is MLE?

Given observations $\mathbf{y} = (y_1, \ldots, y_n)$ drawn from a distribution with joint density $f_n(\mathbf{y}\,;\theta)$, the likelihood function is $L_n(\theta\,;\mathbf{y}) = f_n(\mathbf{y}\,;\theta)$, viewed as a function of the parameter $\theta$. The maximum likelihood estimate is the parameter value that maximizes it:

$$\hat{\theta} = \underset{\theta \in \Theta}{\arg\max}\; L_n(\theta\,;\mathbf{y})$$

For independent and identically distributed random variables, the joint density is the product of univariate density functions:

$$f_n(\mathbf{y}\,;\theta) = \prod_{k=1}^{n} f(y_k\,;\theta)$$
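To make this concrete, here is a minimal sketch (assuming NumPy and SciPy; the data and starting values are hypothetical) that finds the MLE of a normal distribution's mean and standard deviation by minimizing the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical data: 1000 draws from a normal distribution with
# true mean 5.0 and true standard deviation 2.0.
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=1000)

def neg_log_likelihood(params, data):
    mu, log_sigma = params          # optimize log(sigma) so sigma stays positive
    sigma = np.exp(log_sigma)
    # log L_n(theta; y) is the sum of univariate log-densities (i.i.d. assumption),
    # so the negative log-likelihood is minus that sum.
    return -np.sum(norm.logpdf(data, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(y,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)            # should be close to 5.0 and 2.0
```

Reparameterizing with log(sigma) is just one way to keep the scale parameter positive during an unconstrained search.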

Sufficient condition

The likelihood function is continuous over a parameter space Θ that is compact (by the extreme value theorem, a continuous function on a compact set attains its maximum).

  • For an open Θ, the likelihood function may increase without ever reaching a supremum value.

Necessary condition

Writing $\ell(\theta\,;\mathbf{y}) = \log L_n(\theta\,;\mathbf{y})$ for the log-likelihood, the necessary conditions for the occurrence of a maximum (or a minimum) are

$$\frac{\partial \ell(\theta\,;\mathbf{y})}{\partial \theta} = 0$$

These are known as the likelihood equations.

  • In general, no closed-form solution to the maximization problem is known or available, so the MLE must be found numerically (see the worked example after this list for a case where a closed form does exist).
  • Another problem is that in finite samples there may exist multiple roots of the likelihood equations.
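For intuition, here is one well-known case where the likelihood equations do have a closed-form solution: the rate $\lambda$ of an exponential distribution. A minimal worked derivation (standard textbook material, not specific to this article):

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% MLE for the rate \lambda of an exponential distribution,
% given i.i.d. observations y_1, ..., y_n.
\begin{align*}
\ell(\lambda) &= \sum_{k=1}^{n} \log\!\left(\lambda e^{-\lambda y_k}\right)
               = n \log \lambda - \lambda \sum_{k=1}^{n} y_k \\
\intertext{Setting the score (the likelihood equation) to zero and solving:}
\frac{\partial \ell}{\partial \lambda}
  &= \frac{n}{\lambda} - \sum_{k=1}^{n} y_k = 0
  \quad\Longrightarrow\quad
  \hat{\lambda} = \frac{n}{\sum_{k=1}^{n} y_k} = \frac{1}{\bar{y}}
\end{align*}
\end{document}
```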

Second-order condition

To verify that the stationary point is a maximum, we check whether the matrix of second-order partial and cross-partial derivatives, the so-called Hessian matrix,

$$\mathbf{H}(\hat{\theta}) = \left[\, \frac{\partial^2 \ell(\theta\,;\mathbf{y})}{\partial \theta_i \, \partial \theta_j} \,\right]_{\theta = \hat{\theta}}$$

is negative semi-definite at $\hat{\theta}$, as this indicates local concavity and hence that $\hat{\theta}$ is a local maximum.
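A minimal numerical check of this condition (a sketch; it reuses the hypothetical neg_log_likelihood, y, mu_hat, and sigma_hat from the normal-distribution example above):

```python
import numpy as np

def numerical_hessian(log_lik, theta_hat, eps=1e-5):
    # Approximate the Hessian of log_lik at theta_hat with central
    # finite differences (for the diagonal this reduces to the standard
    # second-difference formula with step 2 * eps).
    n = len(theta_hat)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            t = np.array(theta_hat, dtype=float)
            t[i] += eps; t[j] += eps; f_pp = log_lik(t)
            t[j] -= 2 * eps;          f_pm = log_lik(t)
            t[i] -= 2 * eps;          f_mm = log_lik(t)
            t[j] += 2 * eps;          f_mp = log_lik(t)
            H[i, j] = (f_pp - f_pm - f_mp + f_mm) / (4 * eps ** 2)
    return H

# Usage with the earlier normal example (log-likelihood = -neg_log_likelihood):
# H = numerical_hessian(lambda th: -neg_log_likelihood(th, y),
#                       [mu_hat, np.log(sigma_hat)])
# print(np.all(np.linalg.eigvalsh(H) <= 1e-8))  # True if negative semi-definite
```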

