<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Statistics in Machine Learning - Medium]]></title>
        <description><![CDATA[Explore the intersection of statistics and machine learning. From foundational concepts to advanced techniques, our publication provides insights, tutorials, and real-world applications that bridge data analysis with cutting-edge AI innovations. - Medium]]></description>
        <link>https://medium.com/statistics-in-machine-learning?source=rss----c61491722dde---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Statistics in Machine Learning - Medium</title>
            <link>https://medium.com/statistics-in-machine-learning?source=rss----c61491722dde---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 26 May 2026 07:11:38 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/statistics-in-machine-learning" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Converting AUC to Odds Ratio (OR): A Comprehensive Guide Using Python and MLstatkit]]></title>
            <link>https://medium.com/statistics-in-machine-learning/converting-auc-to-odds-ratio-or-a-comprehensive-guide-using-python-and-mlstatkit-111fcc17c172?source=rss----c61491722dde---4</link>
            <guid isPermaLink="false">https://medium.com/p/111fcc17c172</guid>
            <category><![CDATA[auc]]></category>
            <category><![CDATA[clinical-interpretation]]></category>
            <category><![CDATA[odds-ratio]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[mlstatkit]]></category>
            <dc:creator><![CDATA[Yong Zhen Huang]]></dc:creator>
            <pubDate>Wed, 16 Oct 2024 23:22:21 GMT</pubDate>
            <atom:updated>2024-10-15T07:55:27.837Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/761/1*nygummoMsCN3vDGmPjwr3Q.png" /></figure><h3>Introduction</h3><p>In the evaluation of diagnostic tests and binary classification models, the <strong>Area Under the Curve (AUC)</strong> of the Receiver Operating Characteristic (ROC) curve is a widely used metric. While AUC provides a measure of a model’s discriminative ability, clinicians and researchers often prefer effect size measures like the <strong>Odds Ratio (OR)</strong> for their interpretability in clinical contexts.</p><p>Converting AUC to OR bridges the gap between statistical model evaluation and clinical interpretation. In this article, we will explore the relationship between AUC and OR, discuss their clinical significance, and demonstrate how to perform the conversion using Python. We’ll also introduce <strong>MLstatkit</strong>, a library that simplifies this process with its AUC2OR function.</p><h3>Understanding AUC and Odds Ratio</h3><h3>What is AUC?</h3><p>The <strong>Area Under the ROC Curve (AUC)</strong> is a measure of a classifier’s ability to distinguish between positive and negative classes. It ranges from 0 to 1, where:</p><ul><li><strong>AUC = 0.5</strong>: Model has no discriminative ability (equivalent to random guessing).</li><li><strong>AUC &gt; 0.5</strong>: Model performs better than random.</li><li><strong>AUC = 1</strong>: Perfect classification.</li></ul><h3>What is Odds Ratio (OR)?</h3><p>The <strong>Odds Ratio (OR)</strong> is a measure of the association between an exposure and an outcome. 
It represents the odds that an outcome will occur given a particular exposure, compared to the odds of the outcome occurring without that exposure.</p><ul><li><strong>OR = 1</strong>: No association between exposure and outcome.</li><li><strong>OR &gt; 1</strong>: Exposure is associated with higher odds of the outcome.</li><li><strong>OR &lt; 1</strong>: Exposure is associated with lower odds of the outcome.</li></ul><h3>Clinical Significance</h3><p>While AUC is useful for evaluating model performance, OR is more interpretable in clinical settings. OR provides a direct measure of how much more likely (or unlikely) an event is to occur in one group compared to another.</p><p>Converting AUC to OR allows clinicians to understand the impact of diagnostic tests or risk factors in terms that are more actionable for patient care.</p><h3>Converting AUC to Odds Ratio</h3><h3>The Relationship Between AUC and OR</h3><p>The conversion from AUC to OR is based on the assumption that the underlying distributions of the test results for the positive and negative classes are normally distributed with equal variance. Under this assumption, a mathematical relationship between AUC and OR can be established.</p><h3>Mathematical Formulation</h3><p>The conversion involves several steps, including logarithmic transformations and polynomial approximations. 
The key intermediate variables are:</p><ol><li>𝓉: Derived from the AUC using a logarithmic transformation.</li><li>𝓏: Calculated from 𝓉 using a polynomial approximation (Beasley’s approximation for the inverse normal cumulative distribution function).</li><li>𝒹: A scaling of 𝓏, representing the standardized mean difference (Cohen’s 𝒹).</li><li><strong>ln_OR</strong>: The natural logarithm of the Odds Ratio, derived from 𝒹.</li><li><strong>OR</strong>: The Odds Ratio.</li></ol><h4>Step-by-Step Formulas</h4><p>1. Calculating 𝓉</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/558/1*W8sdoNpvK0BnSpa3EMos0w.png" /></figure><ul><li><strong>Explanation</strong>: Transforms the AUC into an intermediate variable 𝓉 using a logarithmic function.</li></ul><p>2. Calculating 𝓏</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/874/1*5qFNQehoj5toj_bmWv4V-Q.png" /></figure><ul><li><strong>Explanation</strong>: Approximates the inverse cumulative distribution function (probit function) using Beasley’s approximation. The coefficients are directly substituted into the formula.</li></ul><p>3. Calculating 𝒹</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/284/1*9N0MY_giL9V364IrH-podg.png" /></figure><ul><li><strong>Explanation</strong>: Converts the z-score into Cohen’s 𝒹, a measure of effect size.</li></ul><p>4. Calculating ln(OR)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/338/1*oaoDbMI7xku0VwZH6MVhsA.png" /></figure><ul><li><strong>Explanation</strong>: Derives the natural logarithm of the Odds Ratio from the effect size 𝒹, utilizing the properties of the logistic distribution.</li></ul><p>5. 
Calculating Odds Ratio (OR)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/254/1*nRIgo3Ecs73Yl-wEXMroYg.png" /></figure><ul><li><strong>Explanation</strong>: Exponentiates ln⁡(OR) to obtain the Odds Ratio (OR).</li></ul><h3>Implementing the Conversion in Python</h3><p>Let’s walk through the implementation of the AUC to OR conversion step by step.</p><h4>The Code:</h4><pre>import math<br><br>def AUC2OR(AUC, return_all=False):<br>    &quot;&quot;&quot;<br>    Converts Area Under the Curve (AUC) to Odds Ratio (OR) and optionally returns intermediate values.<br>    <br>    Parameters:<br>    -----------<br>    AUC : float<br>        The Area Under the Curve (AUC) value to be converted.<br>    return_all : bool, default=False<br>        If True, returns intermediate values t, z, d, and ln_OR in addition to OR.<br>    <br>    Returns:<br>    --------<br>    OR : float<br>        The calculated Odds Ratio (OR) from the given AUC value.<br>    t : float, optional<br>        Intermediate value calculated from AUC.<br>    z : float, optional<br>        Intermediate value calculated from t.<br>    d : float, optional<br>        Intermediate value calculated from z.<br>    ln_OR : float, optional<br>        The natural logarithm of the Odds Ratio.<br>    &quot;&quot;&quot;<br>    <br>    def calculate_t(AUC):<br>        return math.sqrt(math.log(1 / ((1 - AUC) ** 2)))<br><br>    def calculate_z(AUC):<br>        t = calculate_t(AUC)<br>        numerator = 2.515517 + 0.802853 * t + 0.0103328 * (t ** 2)<br>        denominator = 1 + 1.432788 * t + 0.189269 * (t ** 2) + 0.001308 * (t ** 3)<br>        z = t - (numerator / denominator)<br>        return z<br><br>    def calculate_d(AUC):<br>        z = calculate_z(AUC)<br>        d = z * math.sqrt(2)<br>        return d<br><br>    t = calculate_t(AUC)<br>    z = calculate_z(AUC)<br>    d = calculate_d(AUC)<br>    ln_OR = (math.pi * d) / math.sqrt(3)<br>    OR = math.exp(ln_OR)<br>    <br>    if return_all:<br>        
return t, z, d, ln_OR, OR<br>    else:<br>        return OR</pre><h3>Explaining the Implementation</h3><h4>1. Calculating 𝓉</h4><pre>def calculate_t(AUC):<br>    return math.sqrt(math.log(1 / ((1 - AUC) ** 2)))</pre><ul><li><strong>Purpose</strong>: Transforms the AUC into an intermediate variable 𝓉 using a logarithmic function.</li><li><strong>Explanation</strong>: This step computes 𝓉 = √(ln(1 / (1 − AUC)²)), the input that the approximation in the next step expects for the tail probability 1 − AUC.</li></ul><h4>2. Calculating 𝓏</h4><pre>def calculate_z(AUC):<br>    t = calculate_t(AUC)<br>    numerator = 2.515517 + 0.802853 * t + 0.0103328 * (t ** 2)<br>    denominator = 1 + 1.432788 * t + 0.189269 * (t ** 2) + 0.001308 * (t ** 3)<br>    z = t - (numerator / denominator)<br>    return z</pre><ul><li><strong>Purpose</strong>: Approximates the inverse of the normal cumulative distribution function (probit function) using Beasley’s approximation.</li><li><strong>Explanation</strong>: This rational approximation (a ratio of two polynomials in 𝓉) provides a computationally efficient way to estimate the z-score corresponding to the given AUC. It requires 0.5 ≤ AUC &lt; 1; at AUC = 1 the expression for 𝓉 divides by zero.</li></ul><h4>3. Calculating 𝒹</h4><pre>def calculate_d(AUC):<br>    z = calculate_z(AUC)<br>    d = z * math.sqrt(2)<br>    return d</pre><ul><li><strong>Purpose</strong>: Converts the z-score into Cohen’s 𝒹, a measure of effect size.</li><li><strong>Explanation</strong>: Under the equal-variance normal assumption, AUC = Φ(𝒹/√2), because the difference between a positive and a negative score has variance 2. Multiplying the z-score by √2 therefore recovers 𝒹.</li></ul><h4>4. 
Calculating ln_OR and OR</h4><pre>ln_OR = (math.pi * d) / math.sqrt(3)<br>OR = math.exp(ln_OR)</pre><ul><li><strong>Purpose</strong>: Calculates the natural logarithm of the Odds Ratio and then exponentiates it to obtain the OR.</li><li><strong>Explanation</strong>: The factor π/√3 is the standard deviation of the standard logistic distribution, so multiplying 𝒹 by π/√3 rescales a standardized mean difference into a log odds ratio.</li></ul><h3>Example Usage</h3><pre>AUC = 0.7  # Example AUC value<br><br># Convert AUC to OR and retrieve all intermediate values<br>t, z, d, ln_OR, OR = AUC2OR(AUC, return_all=True)<br><br>print(f&quot;t: {t:.5f}, z: {z:.5f}, d: {d:.5f}, ln_OR: {ln_OR:.5f}, OR: {OR:.5f}&quot;)<br><br># Convert AUC to OR without intermediate values<br>OR = AUC2OR(AUC)<br>print(f&quot;OR: {OR:.5f}&quot;)</pre><p><strong>Output</strong>:</p><pre>t: 1.55176, z: 0.52400, d: 0.74105, ln_OR: 1.34411, OR: 3.83477<br>OR: 3.83477</pre><p><strong>Interpretation</strong>:</p><ul><li>𝓉: Intermediate value derived from AUC.</li><li>𝓏: Approximate z-score corresponding to the AUC.</li><li>𝒹: Cohen’s 𝒹, representing the effect size.</li><li><strong>ln_OR</strong>: Natural logarithm of the Odds Ratio.</li><li><strong>OR</strong>: An AUC of 0.7 corresponds to an Odds Ratio of approximately 3.83.</li><li>This means that the odds of a positive outcome are about 3.83 times higher given a positive test result.</li></ul><h3>Introducing MLstatkit’s AUC2OR Function</h3><p>To streamline this conversion process, <strong>MLstatkit</strong> provides the AUC2OR function, which encapsulates all the calculations we&#39;ve discussed.</p><h4>Using MLstatkit’s AUC2OR</h4><h4>Installation</h4><p>You can install MLstatkit using pip:</p><pre>pip install MLstatkit</pre><h4>Implementation</h4><pre>from MLstatkit.stats import AUC2OR<br><br>AUC = 0.7  # Example AUC value<br><br># Convert AUC to OR and retrieve all intermediate values<br>t, z, d, ln_OR, OR = AUC2OR(AUC, return_all=True)<br><br>print(f&quot;t: {t:.5f}, z: {z:.5f}, d: {d:.5f}, ln_OR: 
{ln_OR:.5f}, OR: {OR:.5f}&quot;)<br><br># Convert AUC to OR without intermediate values<br>OR = AUC2OR(AUC)<br>print(f&quot;OR: {OR:.5f}&quot;)</pre><p><strong>Output</strong>:</p><pre>t: 1.55176, z: 0.52400, d: 0.74105, ln_OR: 1.34411, OR: 3.83477<br>OR: 3.83477</pre><h3>Advantages of Using MLstatkit</h3><ul><li><strong>Simplicity</strong>: Provides a straightforward interface for converting AUC to OR.</li><li><strong>Efficiency</strong>: Optimized for performance and accuracy.</li><li><strong>Convenience</strong>: Eliminates the need to implement complex mathematical transformations manually.</li></ul><h3>Clinical Interpretation of the Results</h3><p>Converting AUC to OR allows for a more intuitive understanding of a diagnostic test’s effectiveness:</p><ul><li><strong>AUC of 0.7</strong>: Indicates a fair level of discrimination between positive and negative cases.</li><li><strong>OR of 3.83</strong>: Suggests that the odds of correctly identifying a positive case are nearly four times higher than misclassifying it.</li></ul><p>This information can aid clinicians in decision-making processes, such as evaluating the usefulness of a diagnostic test or the impact of a risk factor.</p><h3>Conclusion</h3><p>Understanding the relationship between AUC and Odds Ratio enhances the interpretability of model performance metrics in clinical contexts. By converting AUC to OR, we can translate statistical measures into actionable insights.</p><p>The <strong>AUC2OR</strong> function in <strong>MLstatkit</strong> simplifies this conversion, making it accessible for researchers and practitioners. Whether you’re evaluating diagnostic tests or comparing predictive models, this tool bridges the gap between statistical evaluation and clinical relevance.</p><h3>References</h3><ul><li>Hanley, J. A., &amp; McNeil, B. J. (1982). <em>The meaning and use of the area under a receiver operating characteristic (ROC) curve</em>. Radiology, 143(1), 29–36. 
<a href="https://doi.org/10.1148/radiology.143.1.7063747">https://doi.org/10.1148/radiology.143.1.7063747</a></li><li>Bamber, D. (1975). <em>The area above the ordinal dominance graph and the area below the receiver operating characteristic graph</em>. Journal of Mathematical Psychology, 12(4), 387–415. <a href="https://doi.org/10.1016/0022-2496(75)90001-2">https://doi.org/10.1016/0022-2496(75)90001-2</a></li><li>Szumilas, M. (2010). <em>Explaining odds ratios</em>. Journal of the Canadian Academy of Child and Adolescent Psychiatry, 19(3), 227. PMID: <a href="https://pubmed.ncbi.nlm.nih.gov/20842279">20842279</a></li><li>García, M. R., Sánchez, P., &amp; Alvarado, J. M. (2018). <em>Obtaining a Confidence Interval for AUC in Presence of Non-normality</em>. European Journal of Psychology Applied to Legal Context, 10(2), 49–53. <a href="https://doi.org/10.5093/ejpalc2018a5">https://doi.org/10.5093/ejpalc2018a5</a></li></ul><h3>Additional Resources</h3><ul><li><strong>MLstatkit Documentation</strong>: <a href="https://github.com/Brritany/MLstatkit">GitHub Repository</a></li></ul><hr><p><a href="https://medium.com/statistics-in-machine-learning/converting-auc-to-odds-ratio-or-a-comprehensive-guide-using-python-and-mlstatkit-111fcc17c172">Converting AUC to Odds Ratio (OR): A Comprehensive Guide Using Python and MLstatkit</a> was originally published in <a href="https://medium.com/statistics-in-machine-learning">Statistics in Machine Learning</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Comparing ROC Curves in Machine Learning Model with DeLong’s Test: A Practical Guide Using Python…]]></title>
            <link>https://medium.com/statistics-in-machine-learning/comparing-roc-curves-in-machine-learning-model-with-delongs-test-a-practical-guide-using-python-e70b5d20abde?source=rss----c61491722dde---4</link>
            <guid isPermaLink="false">https://medium.com/p/e70b5d20abde</guid>
            <category><![CDATA[roc-curve]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[mlstatkit]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[delong-test]]></category>
            <dc:creator><![CDATA[Yong Zhen Huang]]></dc:creator>
            <pubDate>Wed, 16 Oct 2024 23:21:39 GMT</pubDate>
            <atom:updated>2024-10-13T19:04:24.550Z</atom:updated>
            <content:encoded><![CDATA[<h3>Comparing ROC Curves in Machine Learning Model with DeLong’s Test: A Practical Guide Using Python and MLstatkit</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*8Dl1hq0BS4Gwes3yTEj9lg.png" /></figure><h3>Introduction</h3><p>In binary classification tasks, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are fundamental metrics for evaluating model performance. When comparing two models, it’s essential to determine if the difference in their AUCs is statistically significant. <strong>DeLong’s test</strong> provides a statistical method to assess this significance.</p><p>In this article, we’ll delve into the principles and applications of DeLong’s test, explain how it’s implemented in Python using a provided code snippet, and introduce <strong>MLstatkit</strong>, a library offering a convenient and efficient implementation of DeLong’s test.</p><h3>Understanding DeLong’s Test</h3><h3>What is DeLong’s Test?</h3><p>DeLong’s test is a non-parametric statistical method used to compare the AUCs of two correlated ROC curves. It evaluates whether the observed difference between the AUCs of two models is statistically significant, accounting for the fact that the two models are tested on the same dataset and their predictions are therefore correlated.</p><h3>Why Use DeLong’s Test?</h3><p>When evaluating multiple classifiers on the same dataset, differences in AUC values might occur due to random chance rather than actual performance differences. 
DeLong’s test allows us to statistically assess whether one model significantly outperforms another.</p><ul><li><strong>Null Hypothesis (H</strong>₀<strong>)</strong>: The difference between the AUCs of the two models is zero (no significant difference).</li><li><strong>Alternative Hypothesis (H₁)</strong>: The difference between the AUCs is not zero (there is a significant difference).</li></ul><p>By calculating a p-value, DeLong’s test helps determine whether to reject the null hypothesis in favor of the alternative.</p><h3>Implementing DeLong’s Test in Python</h3><p>Below is the implementation of DeLong’s test in Python. We’ll explain each part of the code to help you understand how the test works.</p><h3>The Code:</h3><pre>import numpy as np<br>import scipy.stats<br><br>def Delong_test(true, prob_A, prob_B):<br>    &quot;&quot;&quot;<br>    Perform DeLong&#39;s test for comparing the AUCs of two models.<br><br>    Parameters<br>    ----------<br>    true : array-like of shape (n_samples,)<br>        True binary labels in range {0, 1}.<br>    prob_A : array-like of shape (n_samples,)<br>        Predicted probabilities by the first model.<br>    prob_B : array-like of shape (n_samples,)<br>        Predicted probabilities by the second model.<br><br>    Returns<br>    -------<br>    z_score : float<br>        The z score from comparing the AUCs of two models.<br>    p_value : float<br>        The p value from comparing the AUCs of two models.<br><br>    Example<br>    -------<br>    &gt;&gt;&gt; true = [0, 1, 0, 1]<br>    &gt;&gt;&gt; prob_A = [0.1, 0.4, 0.35, 0.8]<br>    &gt;&gt;&gt; prob_B = [0.2, 0.3, 0.4, 0.7]<br>    &gt;&gt;&gt; z_score, p_value = Delong_test(true, prob_A, prob_B)<br>    &gt;&gt;&gt; print(f&quot;Z-Score: {z_score}, P-Value: {p_value}&quot;)<br>    &quot;&quot;&quot;<br><br>    def compute_midrank(x):<br>        J = np.argsort(x)<br>        Z = x[J]<br>        N = len(x)<br>        T = np.zeros(N, dtype=np.float64)<br>        i = 0<br>        
while i &lt; N:<br>            j = i<br>            while j &lt; N and Z[j] == Z[i]:<br>                j += 1<br>            T[i:j] = 0.5 * (i + j - 1)<br>            i = j<br>        T2 = np.empty(N, dtype=np.float64)<br>        T2[J] = T + 1<br>        return T2<br><br>    def compute_ground_truth_statistics(true):<br>        assert np.array_equal(np.unique(true), [0, 1]), &quot;Ground truth must be binary.&quot;<br>        order = (-true).argsort()<br>        label_1_count = int(true.sum())<br>        return order, label_1_count<br><br>    # Prepare data<br>    order, label_1_count = compute_ground_truth_statistics(np.array(true))<br>    sorted_probs = np.vstack((np.array(prob_A), np.array(prob_B)))[:, order]<br><br>    # Fast DeLong computation starts here<br>    m = label_1_count  # Number of positive samples<br>    n = sorted_probs.shape[1] - m  # Number of negative samples<br>    k = sorted_probs.shape[0]  # Number of models (2)<br><br>    # Initialize arrays for midrank computations<br>    tx, ty, tz = [np.empty([k, size], dtype=np.float64) for size in [m, n, m + n]]<br>    for r in range(k):<br>        positive_examples = sorted_probs[r, :m]<br>        negative_examples = sorted_probs[r, m:]<br>        tx[r, :], ty[r, :], tz[r, :] = [<br>            compute_midrank(examples) for examples in [positive_examples, negative_examples, sorted_probs[r, :]]<br>        ]<br><br>    # Calculate AUCs<br>    aucs = tz[:, :m].sum(axis=1) / (m * n) - (m + 1.0) / (2.0 * n)<br><br>    # Compute variance components<br>    v01 = (tz[:, :m] - tx[:, :]) / n<br>    v10 = 1.0 - (tz[:, m:] - ty[:, :]) / m<br><br>    # Compute covariance matrices<br>    sx = np.cov(v01)<br>    sy = np.cov(v10)<br>    delongcov = sx / m + sy / n<br><br>    # Calculating z-score and p-value<br>    l = np.array([[1, -1]])<br>    # Signed difference (model A minus model B), so the sign of z shows which AUC is higher<br>    z = (aucs[0] - aucs[1]) / np.sqrt(np.dot(np.dot(l, delongcov), l.T)).flatten()<br>    p_value = scipy.stats.norm.sf(abs(z)) * 2<br><br>    z_score = z[0].item()<br>    
p_value = p_value[0].item()<br><br>    return z_score, p_value</pre><h3>Explaining the Implementation</h3><h4>1. Data Preparation</h4><p><strong>Inputs</strong>:</p><ul><li>true: True binary labels (0 or 1).</li><li>prob_A: Predicted probabilities from Model A.</li><li>prob_B: Predicted probabilities from Model B.</li></ul><p><strong>Ground Truth Statistics</strong>:</p><ul><li>The compute_ground_truth_statistics function checks that the true labels are binary and computes:</li><li>order: Indices that sort the true labels in descending order (positives first).</li><li>label_1_count: Number of positive samples.</li></ul><p><strong>Sorting Probabilities</strong>:</p><ul><li>sorted_probs: Predicted probabilities of both models sorted according to the true labels (positives first).</li></ul><h4>2. Midrank Computation</h4><p>The compute_midrank function calculates the midranks of the predicted probabilities, handling ties appropriately.</p><p><strong>Process</strong>:</p><ul><li><strong>Sorting</strong>: Sorts the scores and keeps track of the original indices.</li><li><strong>Ranking</strong>: Assigns ranks to the scores, averaging ranks for tied values.</li><li><strong>Adjustment</strong>: Adds 1 to the ranks to start ranking from 1 instead of 0.</li></ul><h4>3. Fast DeLong Computation</h4><p><strong>Variables</strong>:</p><ul><li>m: Number of positive samples.</li><li>n: Number of negative samples.</li><li>k: Number of models (2 in this case).</li></ul><p><strong>Midrank Arrays</strong>:</p><ul><li>tx: Midranks for positive examples for each model.</li><li>ty: Midranks for negative examples for each model.</li><li>tz: Midranks for all examples for each model.</li></ul><p><strong>Loop Over Models</strong>:</p><ul><li>For each model (r), compute midranks for positive, negative, and all examples.</li></ul><h4>4. 
AUC Calculation</h4><p><strong>AUC Calculation Formula:</strong></p><p>The AUC is calculated using the following formula:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/560/1*GwI-SThFOQJhp6k99P4eWw.png" /></figure><p>Where:</p><ul><li>AUC𝓇: The AUC value for the 𝓇-th model.</li><li>𝓂: The number of positive samples.</li><li>𝓃: The number of negative samples.</li><li>tz𝓇,𝒾: The total midrank of the 𝒾-th positive sample in the 𝓇-th model.</li></ul><p><strong>Implementation in Code:</strong></p><pre>aucs = tz[:, :m].sum(axis=1) / (m * n) - (m + 1.0) / (2.0 * n)</pre><p><strong>Explanation:</strong></p><ul><li>tz[:, :m].sum(axis=1): Calculates the sum of midranks for all positive samples in each model.</li><li>/ (m * n): Divides the sum by 𝓂𝓃 to normalize the AUC value.</li><li>- (m + 1.0) / (2.0 * n): Subtracts the smallest possible rank sum of the positives, 𝓂(𝓂 + 1)/2, divided by 𝓂𝓃, so that the AUC lies between 0 and 1.</li></ul><h4>5. Variance and Covariance Calculation</h4><p><strong>Z-Score Calculation Formula:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/486/1*pcxqh9pkDIoh8Qj59eAHnQ.png" /></figure><p>Where:</p><ul><li>AUC₁ and AUC₂: The AUC values of the two models.</li><li>𝐥 = [1,−1]: The contrast vector representing the difference between the two models.</li><li>Cov(AUC): The covariance matrix of the AUC estimates.</li><li>𝓏: The standardized Z-score.</li></ul><p><strong>Implementation in Code:</strong></p><pre>l = np.array([[1, -1]])<br>z = (aucs[0] - aucs[1]) / np.sqrt(np.dot(np.dot(l, delongcov), l.T)).flatten()</pre><p><strong>Explanation:</strong></p><ul><li>aucs[0] - aucs[1]: Computes the signed difference between the AUCs of the two models, AUC₁ − AUC₂, so a positive Z-score means the first model has the higher AUC.</li><li>delongcov: The covariance matrix Cov(AUC).</li><li>np.dot(np.dot(l, delongcov), l.T): Computes 𝐥 · Cov(AUC) · 𝐥ᵀ = Var(AUC₁) + Var(AUC₂) − 2·Cov(AUC₁, AUC₂), the variance of the AUC difference.</li><li>np.sqrt(...): Takes the square root to obtain the standard error of the difference.</li><li>z: The resulting Z-score.</li></ul><h4>6. 
Z-Score and P-Value Computation</h4><p><strong>Z-Score</strong>:</p><ul><li>z_score = z[0].item(): Extracts the Z-score from the one-element array as a Python float.</li></ul><p><strong>P-Value</strong>:</p><ul><li>The two-tailed p-value is calculated using the standard normal distribution:</li></ul><pre>p_value = scipy.stats.norm.sf(abs(z)) * 2<br>p_value = p_value[0].item()</pre><h4>Example Usage</h4><pre>true = [0, 1, 0, 1]<br>prob_A = [0.1, 0.4, 0.35, 0.8]<br>prob_B = [0.2, 0.3, 0.4, 0.7]<br><br>z_score, p_value = Delong_test(true, prob_A, prob_B)<br>print(f&quot;Z-Score: {z_score}, P-Value: {p_value}&quot;)</pre><p><strong>Output</strong>:</p><pre>Z-Score: 0.8660254037844385, P-Value: 0.3864762307712327</pre><p><strong>Interpretation</strong>:</p><ul><li><strong>Z-Score</strong>: A positive value indicates that Model A has a higher AUC than Model B. The value of 0.8660 represents the standardized difference between the two AUCs.</li><li><strong>P-Value</strong>: A p-value of 0.3865 is greater than the typical significance level of 0.05, which means we <strong>fail to reject</strong> the null hypothesis that the two models have equal AUCs.</li></ul><p><strong>Conclusion</strong>:</p><p>Based on the results of DeLong’s test, although Model A’s AUC is higher than Model B’s, the difference is not statistically significant. 
Therefore, we cannot conclude that Model A outperforms Model B in terms of AUC.</p><h3>Introducing MLstatkit</h3><p>To simplify the process of performing DeLong’s test, we’ve developed <strong>MLstatkit</strong>, a Python library that provides statistical tools for machine learning evaluation, including an efficient implementation of DeLong’s test.</p><h4>Installing MLstatkit</h4><p>Install MLstatkit using pip:</p><pre>pip install MLstatkit</pre><h4>Using MLstatkit’s DeLong Test</h4><p>Here’s how to use the Delong_test function from MLstatkit:</p><pre>from MLstatkit.stats import Delong_test<br><br># Example data<br>true = [0, 1, 0, 1]<br>prob_A = [0.1, 0.4, 0.35, 0.8]<br>prob_B = [0.2, 0.3, 0.4, 0.7]<br><br># Perform DeLong&#39;s test<br>z_score, p_value = Delong_test(true, prob_A, prob_B)<br>print(f&quot;Z-Score: {z_score}, P-Value: {p_value}&quot;)</pre><p><strong>Output</strong>:</p><pre>Z-Score: 0.8660254037844385, P-Value: 0.3864762307712327</pre><p>The results are consistent with the previous implementation, demonstrating that MLstatkit provides a reliable and convenient method for performing DeLong’s test.</p><h4>Advantages of Using MLstatkit</h4><ul><li><strong>Simplicity</strong>: Provides a straightforward interface for performing DeLong’s test.</li><li><strong>Efficiency</strong>: Optimized for performance with large datasets.</li><li><strong>Reliability</strong>: Tested and validated against standard statistical methods.</li></ul><h3>Practical Example: Comparing Two Models</h3><p>Let’s demonstrate how to use MLstatkit to compare two classifiers on simulated data.</p><h4>Generating Simulated Data</h4><pre>import numpy as np<br>from scipy.stats import norm<br><br>np.random.seed(42)<br><br># Positive and negative class distributions<br>pos_dist = norm(loc=0.5, scale=1)<br>neg_dist = norm(loc=-0.5, scale=1)<br><br># Sample sizes<br>n_pos = 50<br>n_neg = 50<br><br># True labels<br>labels = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])<br><br># Model 
predictions<br>scores_model1 = np.concatenate([pos_dist.rvs(n_pos), neg_dist.rvs(n_neg)])<br>scores_model2 = np.concatenate([pos_dist.rvs(n_pos), neg_dist.rvs(n_neg)])</pre><h4>Performing DeLong’s Test</h4><pre>from sklearn.metrics import roc_auc_score<br>from MLstatkit.stats import Delong_test<br><br>z_score, p_value = Delong_test(labels, scores_model1, scores_model2)<br><br>print(f&quot;Model 1 AUC: {roc_auc_score(labels, scores_model1):.4f}&quot;)<br>print(f&quot;Model 2 AUC: {roc_auc_score(labels, scores_model2):.4f}&quot;)<br>print(f&quot;Z-Score: {z_score:.4f}, P-Value: {p_value:.4f}&quot;)</pre><p><strong>Output</strong>:</p><pre>Model 1 AUC: 0.7180<br>Model 2 AUC: 0.7440<br>Z-Score: -0.3426, P-Value: 0.7319</pre><h4>Interpreting the Results</h4><ul><li><strong>AUC Values</strong>: Both models have moderate AUCs (0.7180 and 0.7440), with Model 2 slightly outperforming Model 1.</li><li><strong>Z-Score</strong>: The negative value indicates that Model 1 has a lower AUC than Model 2.</li><li><strong>P-Value</strong>: The p-value is greater than 0.05, indicating that the difference in AUCs is not statistically significant.</li></ul><p><strong>Conclusion</strong>:</p><p>Based on DeLong’s test, we conclude that there is no statistically significant difference between the performances of the two models.</p><h3>Conclusion</h3><p>Comparing ROC curves is crucial when evaluating classifier performance. DeLong’s test offers a statistically rigorous method for determining whether differences in AUCs are significant. Implementing DeLong’s test in Python allows for automated and repeatable analysis.</p><p><strong>MLstatkit</strong> simplifies this process, providing an accessible and efficient way to perform DeLong’s test and other statistical evaluations in machine learning workflows.</p><h3>References</h3><ul><li>DeLong, E. R., DeLong, D. M., &amp; Clarke-Pearson, D. L. (1988). <em>Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach</em>. Biometrics, 44(3), 837–845. 
<a href="https://doi.org/10.2307/2531595">https://doi.org/10.2307/2531595</a></li><li>Fawcett, T. (2006). <em>An introduction to ROC analysis</em>. Pattern Recognition Letters, 27(8), 861–874. <a href="https://doi.org/10.1016/j.patrec.2005.10.010">https://doi.org/10.1016/j.patrec.2005.10.010</a></li></ul><h3>Additional Resources</h3><ul><li><strong>MLstatkit Documentation</strong>: <a href="https://github.com/Brritany/MLstatkit">GitHub Repository</a></li></ul><hr><p><a href="https://medium.com/statistics-in-machine-learning/comparing-roc-curves-in-machine-learning-model-with-delongs-test-a-practical-guide-using-python-e70b5d20abde">Comparing ROC Curves in Machine Learning Model with DeLong’s Test: A Practical Guide Using Python…</a> was originally published in <a href="https://medium.com/statistics-in-machine-learning">Statistics in Machine Learning</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>