Essential Math for Machine Learning: Confidence Level and Z-Score

Understanding Uncertainty in Your Models

Dagang Wei
4 min read · Jun 17, 2024

This article is part of the series Essential Math for Machine Learning.

Introduction

Welcome back, ML enthusiasts! Today, we’re diving into two fundamental concepts that are critical for interpreting your machine learning model’s predictions: confidence levels and Z-scores. While machine learning models are powerful, they’re not perfect. Understanding the uncertainty behind their predictions is key to making informed decisions.

Why Uncertainty Matters

Think of your model’s predictions like weather forecasts. A forecast might say there’s an 80% chance of rain, but that doesn’t mean it will rain. Similarly, a model might predict a customer will churn with 95% confidence, but there’s still a 5% chance they won’t.

In machine learning, this uncertainty is often quantified using confidence levels. Let’s break down what they mean and how they’re calculated.

Confidence Levels: A Measure of Certainty

A confidence level is a percentage that expresses how confident we are that a procedure captures the true value. For example, a 95% confidence level means that if we were to repeat our experiment or data collection many times and compute an interval each time, we'd expect about 95% of those intervals to contain the true value.

Example: If our model predicts customer churn with 95% confidence, it is saying that, among customers with similar characteristics, it expects about 95% of them to churn.

Z-Scores: Standardizing Uncertainty

Z-scores are a way to standardize data points to see how many standard deviations they are away from the mean. This helps us understand how unusual or typical a data point is within a distribution.

In machine learning, Z-scores can be used to assess how unusual a model’s prediction is. A higher absolute Z-score (positive or negative) indicates a prediction that’s further away from the average, i.e., more unusual relative to the rest of the distribution.

Example: If our model predicts a customer’s spending with a Z-score of 2.5, their predicted spending is 2.5 standard deviations above the average customer’s. Under a normal distribution, fewer than 1% of customers would spend that much, so this prediction marks the customer as unusually high-spending.
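The calculation behind this is only a few lines of Python (the spending figures below are made-up illustration data, not from the article):

```python
from statistics import mean, stdev

# Hypothetical daily spending amounts for a group of customers
spending = [120, 95, 130, 110, 105, 98, 125, 115]

def z_score(x, data):
    """How many sample standard deviations x lies from the sample mean."""
    return (x - mean(data)) / stdev(data)

print(f"Z-score for a $140 spender: {z_score(140, spending):.2f}")
```

A value around 2.2 here tells us this customer spends noticeably more than the group average.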

Confidence Levels and Z-Scores in Normal Distributions

In a normal distribution (the bell curve), there’s a direct relationship between confidence levels and Z-scores:

  • 68% Confidence: Approximately 68% of data falls within 1 standard deviation of the mean (Z-score between -1 and 1).
  • 95% Confidence: Approximately 95% of data falls within 1.96 standard deviations of the mean (Z-score between -1.96 and 1.96).
  • 99% Confidence: Approximately 99% of data falls within 2.58 standard deviations of the mean (Z-score between -2.58 and 2.58; the familiar “3 standard deviations” captures about 99.7%).

This relationship is crucial for calculating confidence intervals. The Z-score corresponding to your desired confidence level tells you how many standard deviations wide your interval needs to be to capture the specified percentage of data.
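These critical values don’t have to be memorized; Python’s standard library can recover them from the inverse CDF of the standard normal distribution:

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-tailed critical Z-value: the Z such that `level` of the
    standard normal distribution lies between -Z and +Z."""
    return NormalDist().inv_cdf(0.5 + level / 2)

for level in (0.68, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z = {z_for_confidence(level):.2f}")
```

Running this reproduces the familiar 1.96 for 95% and 2.58 for 99%, and shows that 68% corresponds to roughly 1 standard deviation.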

Calculating Confidence Levels and Z-Scores

The formulas for confidence levels and Z-scores can vary depending on the specific statistical test or machine learning algorithm being used. However, the general principle remains the same:

  • Confidence Level: Often involves calculating a confidence interval based on the sample mean, standard deviation, and desired level of confidence (e.g., 95%).
  • Z-Score: Calculated as: (data point - mean) / standard deviation

Practical Implications for Machine Learning

Understanding confidence levels and Z-scores is crucial for:

  • Decision Making: Helps you weigh the risks and benefits of acting on model predictions.
  • Model Evaluation: Allows you to assess the reliability and accuracy of your models.
  • Communication: Enables you to communicate model results and their uncertainty to stakeholders in a clear and understandable way.

Example Problem

Let’s illustrate confidence intervals and Z-scores with a practical example in Python. You’ve collected data on the daily commute times (in minutes) for a group of employees:

commute_times = [35, 42, 28, 50, 38, 45, 32, 39, 48]

You want to estimate the average commute time for all employees with 95% confidence.

import math

# Data
commute_times = [35, 42, 28, 50, 38, 45, 32, 39, 48]

# Calculations
n = len(commute_times) # Sample size
mean = sum(commute_times) / n # Sample mean
std_dev = math.sqrt(sum([(x - mean) ** 2 for x in commute_times]) / (n - 1)) # Sample standard deviation

# 95% Confidence (Z-score of 1.96 for two-tailed)
z = 1.96
margin_of_error = z * (std_dev / math.sqrt(n))
lower_bound = mean - margin_of_error
upper_bound = mean + margin_of_error

# Output
print(f"The average commute time is {mean:.2f} minutes with a 95% confidence interval of ({lower_bound:.2f}, {upper_bound:.2f}) minutes.")

Explanation:

  • Data: We start with the sample commute times.
  • Calculations:
    • Calculate the sample size (n), mean (mean), and standard deviation (std_dev).
    • For a 95% confidence level, use a Z-score of 1.96 (from statistical tables).
    • Calculate the margin of error.
    • Determine the lower and upper bounds of the confidence interval.
  • Output: Print a clear statement summarizing the average commute time and the confidence interval.

Output:

The average commute time is 39.67 minutes with a 95% confidence interval of (34.88, 44.46) minutes.

Interpretation:

We are 95% confident that the true average commute time for all employees lies between 34.88 and 44.46 minutes.

Important Note:

  • This example uses a normal-distribution (Z) approximation, which may not always be appropriate. With a sample this small (n = 9), the Student’s t-distribution is more suitable; in general, consider the type of data and its distribution when choosing how to compute a confidence interval.
  • This example demonstrates the basic calculations; for more complex scenarios or larger datasets, you might use statistical libraries like SciPy for convenience and accuracy.
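For instance, the t-based interval mentioned above can be computed with SciPy; this is a sketch assuming SciPy is installed:

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

commute_times = [35, 42, 28, 50, 38, 45, 32, 39, 48]

n = len(commute_times)
sample_mean = mean(commute_times)
std_err = stdev(commute_times) / sqrt(n)  # standard error of the mean

# Student's t-distribution with n - 1 degrees of freedom replaces the
# fixed Z = 1.96, widening the interval to account for the small sample
lower, upper = stats.t.interval(0.95, df=n - 1, loc=sample_mean, scale=std_err)
print(f"95% t-interval: ({lower:.2f}, {upper:.2f})")
```

The t-interval comes out slightly wider than the Z-based one, which is the appropriate penalty for estimating the standard deviation from only nine observations.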

Key Takeaways

  • Confidence levels and Z-scores are essential tools for understanding the uncertainty inherent in machine learning predictions.
  • They help us quantify how sure we can be about a prediction and how unusual a data point is.
  • By considering uncertainty, we can make more informed decisions and build trust in our models.
