Maximum Likelihood Estimation — Parameter Estimation Technique — Machine Learning with Python Code

Aman Agrawal
6 min read · May 23, 2024


Maximum Likelihood Estimation (MLE) is a method of parameter estimation and perhaps the most important technique for estimating the parameters involved in machine learning; it sits at the very core of the field.

First, let’s understand what likelihood actually is. Suppose you have n data points (x1, x2, x3, …, xn). You now have to choose a mathematical function, a probability density function (PDF), that represents the data points well enough. It could be any well-known distribution such as the Gaussian, Bernoulli, or Poisson, or any other specific function f(xi) that depends entirely on the data. The likelihood L(Θ) (where Θ denotes the parameters involved in the PDF) is the product of the probabilities (density values) at our data points, as per the PDF we have chosen.

Mathematically,

L(Θ) = f(x1; Θ) · f(x2; Θ) · … · f(xn; Θ) = ∏ f(xi; Θ), the product running over i = 1, …, n

where L(Θ) is the likelihood function, f(x; Θ) is the probability density function, and Θ is the set of parameters.

The higher the value of the likelihood function, the better f(x; Θ) represents the data points. So we are always keen on choosing the function that achieves the maximum likelihood; that is, we have to choose the parameter values that maximize L(Θ). In other words, MLE finds argmax L(Θ), the best value of the parameters Θ.
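As a quick numeric illustration (a minimal sketch, with a hypothetical standard normal PDF and three made-up data points), the likelihood is simply the product of the density values at each point:

import numpy as np
from scipy.stats import norm

# Hypothetical data points and a candidate PDF: normal with mean 0 and std 1
points = np.array([-0.5, 0.1, 0.7])
densities = norm.pdf(points, loc=0, scale=1)  # f(xi; Θ) evaluated at each data point
likelihood = np.prod(densities)               # L(Θ) = product of the density values
print(likelihood)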

The steps to calculate the best parameters, i.e. the ones that maximize the likelihood, are:

1. Decide on a distribution that best represents the data points, i.e. choose a PDF that describes the sample.

2. Write out the log of the likelihood (we usually do this to make calculations easier; since log is a monotonically increasing function, argmax(f(x)) = argmax(log(f(x)))).

3. State that the optimal parameters are the argmax of the log-likelihood function.

4. Use an optimization technique (such as calculus, i.e. setting derivatives to zero) to find the argmax.

This way we get the best possible parameters (Θ), the ones that make the observed data most likely, and those are considered the best fit for the PDF we have chosen to represent the data. A minimal sketch of this recipe is shown below.
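To make the recipe concrete, here is a minimal sketch of the four steps for a hypothetical exponential sample (the names and numbers are made up for illustration). For the exponential distribution, the calculus gives λ̂ = 1/mean, so the numerical argmax should agree with that:

import numpy as np
from scipy.optimize import minimize

# Step 1: assume the data come from an exponential distribution, f(x; lam) = lam * exp(-lam * x)
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=500)   # hypothetical data, true lambda = 0.5

# Step 2: write the log-likelihood, LL(lam) = n*log(lam) - lam*sum(x)
def neg_log_likelihood(lam):
    return -(len(sample) * np.log(lam) - lam * np.sum(sample))

# Steps 3-4: the optimal parameter is the argmax of LL, found here numerically
result = minimize(lambda p: neg_log_likelihood(p[0]), x0=[1.0],
                  method='L-BFGS-B', bounds=[(1e-6, None)])
print(result.x[0], 1 / np.mean(sample))   # the two values should agree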

Example 1

Suppose we have data points (x1, x2, x3, …, xn) drawn from a Poisson distribution. The step-by-step calculation of the parameter by MLE goes like this:
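For the Poisson distribution with parameter λ, the PMF is P(x; λ) = λ^x · e^(−λ) / x!. Writing x̄ for the sample mean, the derivation is:

L(λ) = ∏ λ^(xi) · e^(−λ) / xi!

LL(λ) = (Σ xi) · log(λ) − n·λ − Σ log(xi!)

dLL/dλ = (Σ xi)/λ − n = 0  ⇒  λ̂ = (1/n) · Σ xi = x̄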

So, by maximum likelihood estimation we discovered that the argmax for the Poisson distribution comes out to be λ = the mean of the data points.

Example 2

Now let’s look at the normal (Gaussian) distribution. In this section, our focus lies on determining the maximum likelihood estimates for a distribution defined by two parameters. Given the prominence of normal distributions in this context, we will illustrate the procedure for obtaining MLEs for its two parameters, the mean (µ) and the variance (σ²). The method to estimate the parameters of this distribution goes like this:

1. Determine log(L(Θ)), that is, LL(Θ).

2. Differentiate it with respect to each parameter in Θ and equate it to 0.

3. Solve the resulting equations (worked out below).
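Carrying out these steps for the normal PDF f(x; µ, σ²) = (1/√(2πσ²)) · exp(−(x − µ)² / (2σ²)) gives:

LL(µ, σ²) = −(n/2) · log(2πσ²) − Σ (xi − µ)² / (2σ²)

∂LL/∂µ = Σ (xi − µ)/σ² = 0  ⇒  µ̂ = (1/n) · Σ xi

∂LL/∂σ² = −n/(2σ²) + Σ (xi − µ)²/(2σ⁴) = 0  ⇒  σ̂² = (1/n) · Σ (xi − µ̂)²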

In this way we calculated the argmax for the normal distribution, which has two parameters. Calculating it was a comparatively more involved task, but we got our answer: the MLE of the mean is the sample mean, and the MLE of the variance is the average squared deviation from it.

From here it’s the coding part…

### Example 1: Poisson Distribution

# Import necessary libraries
import numpy as np
import math
import matplotlib.pyplot as plt
from scipy.stats import poisson
from scipy.optimize import minimize

# Generate Poisson distributed data points
np.random.seed(42) # for reproducibility
lambda_true = 5
data = np.random.poisson(lambda_true, 1000)

# Define the log-likelihood function for Poisson distribution
def poisson_log_likelihood(lambda_estimated, data):
    # Calculate the log-likelihood: sum(x * log(lambda) - lambda - log(x!))
    log_likelihood = np.sum(data * np.log(lambda_estimated) - lambda_estimated - np.log([math.factorial(x) for x in data]))
    return log_likelihood

# Maximum Likelihood Estimation (MLE) for the Poisson distribution
# (bounds keep lambda positive so np.log never sees a non-positive value)
mle_result = minimize(lambda params: -poisson_log_likelihood(params[0], data), x0=[1],
                      method='L-BFGS-B', bounds=[(1e-6, None)])
lambda_mle = mle_result.x[0]

print("MLE estimate of lambda:", lambda_mle)  # the best value of the parameter calculated through MLE

data_mean = np.mean(data)
print("Mean of the generated data:", data_mean)
# For the Poisson distribution the best value of lambda is the sample mean, as proved above.

# Calculate log-likelihood for a range of lambda values
lambda_values = np.linspace(0.1, 10, 100)
log_likelihood_values = [poisson_log_likelihood(l, data) for l in lambda_values]

# Plotting the log-likelihood function
plt.figure(figsize=(10, 6))
plt.plot(lambda_values, log_likelihood_values, label='Log-Likelihood Function')
plt.axvline(lambda_mle, color='r', linestyle='--', label=f'MLE (λ={lambda_mle:.2f})')
plt.scatter([lambda_mle], [poisson_log_likelihood(lambda_mle, data)], color='red')

plt.title('Log-Likelihood Function for Poisson Distribution')
plt.xlabel('λ (Lambda)')
plt.ylabel('Log-Likelihood')
plt.legend()
plt.grid(True)
plt.show()
# It's clear that the maximum likelihood is achieved at the lambda estimated by MLE,
# so this lambda is the best choice to represent our Poisson distribution curve.
### Example 2: Normal (Gaussian) Distribution

from scipy.stats import norm, invgamma
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# Generate random data
np.random.seed(42)
data = np.random.normal(loc=5, scale=2, size=100)

# Calculate MLE for mean and std deviation
mle_mean = np.mean(data)
mle_std = np.std(data, ddof=0)

# Define the log-likelihood function
def log_likelihood(mean, std, data):
    # Log-likelihood of the data under a normal distribution with the given mean and std
    return np.sum(norm.logpdf(data, loc=mean, scale=std))

# Define the prior distributions
mu_0 = 0 # Prior mean
tau = 10 # Prior standard deviation for the mean
alpha = 1 # Shape parameter for inverse gamma (sigma prior)
beta = 1 # Scale parameter for inverse gamma (sigma prior)

# Log-prior over the parameters (for the Bayesian/posterior view plotted below)

def log_prior(mean, std):
    log_prior_mu = norm.logpdf(mean, loc=mu_0, scale=tau)
    log_prior_sigma = invgamma.logpdf(std, a=alpha, scale=beta)
    return log_prior_mu + log_prior_sigma

# Define the log-posterior function
def log_posterior(mean, std, data):
    return log_likelihood(mean, std, data) + log_prior(mean, std)

# Calculate log-likelihood and log-posterior for MLE parameters
mle_log_likelihood = log_likelihood(mle_mean, mle_std, data)
mle_log_posterior = log_posterior(mle_mean, mle_std, data)

# Calculate log-likelihood and log-posterior for other arbitrary parameters
mean_values = np.linspace(4, 6, 100)
std_values = np.linspace(1, 3, 100)
log_likelihoods = np.zeros((len(mean_values), len(std_values)))
log_posteriors = np.zeros((len(mean_values), len(std_values)))

for i, mean in enumerate(mean_values):
    for j, std in enumerate(std_values):
        log_likelihoods[i, j] = log_likelihood(mean, std, data)
        log_posteriors[i, j] = log_posterior(mean, std, data)

# Plotting the log-likelihood values
plt.figure(figsize=(14, 6))

# Plot log-likelihood
plt.subplot(1, 2, 1)
X, Y = np.meshgrid(mean_values, std_values)
Z = log_likelihoods.T # Transpose to align the axes correctly
plt.contourf(X, Y, Z, levels=50, cmap='viridis')
plt.colorbar(label='Log-Likelihood')
plt.scatter(mle_mean, mle_std, color='red', label='MLE (mean, std)', zorder=5)
plt.xlabel('Mean')
plt.ylabel('Standard Deviation')
plt.title('Log-Likelihood Function')
plt.legend()

# Plot log-posterior
plt.subplot(1, 2, 2)
Z = log_posteriors.T # Transpose to align the axes correctly
plt.contourf(X, Y, Z, levels=50, cmap='viridis')
plt.colorbar(label='Log-Posterior')
plt.scatter(mle_mean, mle_std, color='red', label='MLE (mean, std)', zorder=5)
plt.xlabel('Mean')
plt.ylabel('Standard Deviation')
plt.title('Log-Posterior Distribution')
plt.legend()

plt.tight_layout()
plt.show()

# 3D Plot of the Posterior Distribution
fig = plt.figure(figsize=(14, 6))
ax = fig.add_subplot(111, projection='3d')
X, Y = np.meshgrid(mean_values, std_values)
Z = log_posteriors.T # Transpose to align the axes correctly
ax.plot_surface(X, Y, Z, cmap='viridis', alpha=0.8)
ax.scatter(mle_mean, mle_std, mle_log_posterior, color='red', s=100, label='MLE (mean, std)', zorder=5)
ax.set_xlabel('Mean')
ax.set_ylabel('Standard Deviation')
ax.set_zlabel('Log-Posterior')
ax.set_title('3D View of the Log-Posterior Distribution')
ax.legend()

plt.show()

print(f"MLE Mean: {mle_mean}")
print(f"MLE Standard Deviation: {mle_std}")
print(f"Log-Likelihood at MLE: {mle_log_likelihood}")
print(f"Log-Posterior at MLE: {mle_log_posterior}")

# Over a range of different mean and std deviation values there are different likelihood values; the plots verify
# that the maximum likelihood occurs at the parameters found by MLE. The variable parameters involved
# in the normal distribution are the mean and std deviation, and the results are visible on the plots.

Conclusion

The purpose of this article was to see MLEs not as abstract functions, but as mesmerizing mathematical constructs whose roots are deeply seated in solid logical and conceptual foundations. I hope you enjoyed going through this guide!

In case you have any doubts or suggestions, do reply in the comment box. Please feel free to contact me via mail — amannagrawall002@gmail.com


Aman Agrawal

Passionate 3rd-year B.Tech student @IITRoorkee diving into data science & machine learning, also exploring the depths of deep learning