<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Amar katare on Medium]]></title>
        <description><![CDATA[Stories by Amar katare on Medium]]></description>
        <link>https://medium.com/@amarkatare2004?source=rss-da1a8d067831------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*zZnjB9W7WlF7IDOC</url>
            <title>Stories by Amar katare on Medium</title>
            <link>https://medium.com/@amarkatare2004?source=rss-da1a8d067831------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 24 May 2026 02:26:17 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@amarkatare2004/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Mastering Logistic Regression: Step-by-Step Implementation in Python with Visualisation]]></title>
            <link>https://medium.com/@amarkatare2004/mastering-logistic-regression-step-by-step-implementation-in-python-267570fc0934?source=rss-da1a8d067831------2</link>
            <guid isPermaLink="false">https://medium.com/p/267570fc0934</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[algortihms]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[logistic-regression]]></category>
            <dc:creator><![CDATA[Amar katare]]></dc:creator>
            <pubDate>Thu, 01 May 2025 20:17:11 GMT</pubDate>
            <atom:updated>2025-05-04T06:04:21.202Z</atom:updated>
            <content:encoded><![CDATA[<p>Logistic Regression is one of the most fundamental yet powerful machine learning algorithms for binary classification. In this article, we’ll dive deep into how it works, from the theory and maths to a complete implementation in Python that builds the model itself with NumPy instead of relying on a ready-made classifier from an ML library like Scikit-learn.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MVFxeivE7cBh6oQVVbaVVQ.png" /></figure><p>Before diving in, we need to understand a few related terms:</p><blockquote><strong>1. Sigmoid Function:</strong><br> A mathematical function that maps any real-valued input to a value between 0 and 1, which lets us interpret the output as a probability. The formula below defines the sigmoid function.</blockquote><figure><img alt="Sigmoid Function" src="https://cdn-images-1.medium.com/max/198/1*ndex71fYRj9GpPiB2ShOgg.png" /></figure><p><strong><em>Where x</em></strong><em> = input feature vector.<br> </em><strong><em>w</em></strong><em> = weight vector.<br></em><strong><em>w^T x</em></strong><em> = dot product of weights and inputs.<br></em>For example, consider the input vector x = [1, 75] and the weight vector w = [0.5, 0.01]:<br> w^T x = (0.5 × 1) + (0.01 × 75) = 0.5 + 0.75 = 1.25<br>Now apply the sigmoid:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/527/1*RpJvndye9lPqk9yByrEJmQ.png" /></figure><p>Here we get an output of approximately 0.777 for our input of 75.</p><blockquote><strong>2. Decision Boundary:<br></strong>In logistic regression, the <strong>decision boundary</strong> is the point where the model changes its prediction from one class to the other (for example, from class 0 to class 1). It divides the feature space into two regions: one where the model predicts class 0 and one where it predicts class 1.</blockquote><p>In logistic regression, a <strong>threshold value</strong> is set (typically 0.5). 
If the predicted probability is greater than or equal to this threshold, the input is classified as <strong>class 1</strong>. If it is below the threshold, the input is classified as <strong>class 0</strong>.<br> By this rule, the example input above is classified as <strong>class 1</strong>, since its predicted probability exceeds 0.5.</p><blockquote><strong>3. Cost Function and Gradient Descent:<br></strong>In logistic regression, we use a <strong>cost function</strong> to measure how well the model’s predicted probabilities match the actual outcomes. Since logistic regression deals with <strong>binary classification</strong>, the squared error used in linear regression is a poor choice: combined with the sigmoid, it gives a non-convex cost. Instead, we use the <strong>log loss</strong>, also called the <strong>binary cross-entropy</strong> cost function.</blockquote><p><strong>Cost Function Formula:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/666/1*YzbetD5jOO9q1Dqz5sDBHg.png" /></figure><p>To minimize this cost, we use <strong>Gradient Descent</strong>, an iterative optimization algorithm that repeatedly adjusts the model’s parameters in the direction that reduces the cost.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/666/1*KxaJNqf7piNeHgoTUqsT1A.png" /></figure><h4>Implementing Each Step in Python:</h4><p>We will now implement each step of logistic regression from scratch with NumPy, without using a ready-made model from a library like sklearn.</p><ol><li>Compute the <strong>sigmoid function</strong> (for the hypothesis):</li></ol><pre>import numpy as np<br><br>def sigmoid(z):<br>    # Map any real value to the range (0, 1)<br>    return 1 / (1 + np.exp(-z))<br><br>def hypothesis(X, theta):<br>    # Predicted probability of class 1 for each row of X<br>    return sigmoid(np.dot(X, theta))</pre><p>2. Classify the hypothesis output using the <strong>decision boundary</strong>:</p><pre>def predict(X, theta):<br>    # Class 1 when the predicted probability is at least 0.5<br>    prob = hypothesis(X, theta)<br>    return prob &gt;= 0.5</pre><p>3. 
Update the parameters with <strong>gradient descent</strong>:</p><pre>def gradient_descent(X, y, theta, lr=100, epochs=1000):<br>    m = len(y)  # number of training examples<br>    for _ in range(epochs):<br>        h = hypothesis(X, theta)  # current predicted probabilities<br>        gradient = (1 / m) * np.dot(X.T, (h - y))  # gradient of the log loss<br>        theta -= lr * gradient  # step against the gradient (updates theta in place)<br>    return theta</pre><blockquote><em>Below is the complete implementation of logistic regression, combining all the steps discussed above. The model itself is written with NumPy; scikit-learn is used only to encode, scale, and split the data.</em></blockquote><pre>import numpy as np<br>import pandas as pd<br>from sklearn.model_selection import train_test_split<br>from sklearn.preprocessing import LabelEncoder, StandardScaler<br><br># Load dataset<br>df = pd.read_csv(&quot;Customer_Data.csv&quot;)<br><br># Encode &#39;Gender&#39;<br>le = LabelEncoder()<br>df[&#39;Gender_encoded&#39;] = le.fit_transform(df[&#39;Gender&#39;])  # Male=1, Female=0<br><br># Extract features (X) and label (y)<br>X = df[[&#39;Age&#39;, &#39;EstimatedSalary&#39;, &#39;Gender_encoded&#39;]].values<br>y = df[&#39;Purchased&#39;].values.reshape(-1, 1)<br><br># Normalize X<br>scaler = StandardScaler()<br>X = scaler.fit_transform(X)<br><br># Add intercept term<br>X = np.hstack([np.ones((X.shape[0], 1)), X])  # Shape becomes (m, n+1)<br><br># Initialize parameters (theta)<br>theta = np.zeros((X.shape[1], 1))  # Shape: (n+1, 1)<br><br># Sigmoid function<br>def sigmoid(z):<br>    return 1 / (1 + np.exp(-z))<br><br># Hypothesis<br>def hypothesis(X, theta):<br>    return sigmoid(np.dot(X, theta))<br><br># Gradient Descent<br>def gradient_descent(X, y, theta, lr=100, epochs=1000):<br>    m = len(y)<br>    for _ in range(epochs):<br>        h = hypothesis(X, theta)<br>        gradient = (1 / m) * np.dot(X.T, (h - y))<br>        theta -= lr * gradient<br>    return theta<br><br># Predict using threshold = 0.5<br>def predict(X, theta):<br>    probs = hypothesis(X, theta)<br>    return probs &gt;= 0.5<br><br># Split data<br>X_train, X_test, y_train, y_test = train_test_split(X, y, 
test_size=0.2, random_state=42)<br><br># Train<br>theta_final = gradient_descent(X_train, y_train, theta)<br><br># Predict<br>y_pred = predict(X_test, theta_final)<br><br># Accuracy: fraction of test predictions that match the labels<br>accuracy = np.mean(y_pred == y_test)<br>print(&quot;Final Accuracy:&quot;, round(accuracy, 2) * 100, &quot;%&quot;)</pre><p><strong><em>To download the CSV file used in this implementation, </em></strong><a href="https://github.com/amarkatare/Machine-Learning/blob/main/Customer_Data.csv"><strong><em>click here</em></strong></a>.</p><p>Evaluation of the Model:</p><pre>Final Accuracy: 91.0 %</pre><p>The model’s accuracy is 91%, meaning that about 91% of its predictions on the test data are correct.</p><p>The model predicts whether a customer will make a purchase based on features such as <strong>Gender</strong>, <strong>Age</strong>, and <strong>Estimated Salary</strong>. To aid understanding and visualisation, we can use the <strong>Weka tool</strong> for graphical representation and analysis.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/556/1*Kg8NUCweiJcrvH1ibNvYPg.png" /></figure><blockquote>The plot matrix shows how features like <strong>Gender</strong>, <strong>Age</strong>, and <strong>Estimated Salary</strong> relate to the <strong>Purchased</strong> outcome. 
Generated using Weka, it highlights how <strong>logistic regression</strong> separates customers who purchased (orange) from those who didn’t (blue).<br> <strong>Age</strong> and <strong>Estimated Salary</strong> are strong predictors: purchasers tend to be older and earn more.<br> <strong>Gender</strong> has little impact, as the data points are evenly spread.<br>The model’s decision boundary is visible through the clear separation in the scatterplots.</blockquote><p><strong>Advantages of Logistic Regression:</strong></p><ul><li>Simple and easy to implement.</li><li>Interpretable coefficients that show each feature’s impact.</li><li>Works well for binary classification problems.</li><li>Efficient to train and computationally inexpensive.</li></ul><p><strong>Disadvantages of Logistic Regression:</strong></p><ul><li>Assumes a linear relationship between the features and the log-odds.</li><li>Not suitable for complex relationships without feature engineering.</li></ul><p><strong>Applications of Logistic Regression:</strong></p><ul><li>Medical diagnosis (e.g., predicting diseases like diabetes or cancer).</li><li>Credit scoring and risk assessment in finance.</li><li>Customer purchase prediction in marketing.</li><li>Email spam detection.</li><li>Customer churn prediction.</li><li>Voting behaviour analysis.</li><li>Fraud detection in banking and e-commerce.</li></ul><p><strong>Thank you for reading!</strong> If you found this article helpful, feel free to show your support by clapping (up to 50 times). Let’s connect on <a href="https://www.linkedin.com/in/amar-katare-20337a315/"><strong><em>LinkedIn</em></strong></a>.</p>]]></content:encoded>
        </item>
    </channel>
</rss>