<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Atul Kumar on Medium]]></title>
        <description><![CDATA[Stories by Atul Kumar on Medium]]></description>
        <link>https://medium.com/@atulgoswami310?source=rss-e5ef82a1ed50------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/0*DZqProDYTtMSg5oj</url>
            <title>Stories by Atul Kumar on Medium</title>
            <link>https://medium.com/@atulgoswami310?source=rss-e5ef82a1ed50------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Fri, 22 May 2026 18:19:44 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@atulgoswami310/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[K-Nearest Neighbors (KNN)]]></title>
            <link>https://medium.com/@atulgoswami310/k-nearest-neighbors-knn-c039b235a983?source=rss-e5ef82a1ed50------2</link>
            <guid isPermaLink="false">https://medium.com/p/c039b235a983</guid>
            <dc:creator><![CDATA[Atul Kumar]]></dc:creator>
            <pubDate>Tue, 30 Dec 2025 19:11:31 GMT</pubDate>
            <atom:updated>2025-12-30T19:11:31.176Z</atom:updated>
            <content:encoded><![CDATA[<h3>Definition:</h3><p><strong>KNN (K-Nearest Neighbors)</strong> is a <strong>supervised machine learning algorithm</strong> used for:</p><ul><li><strong>Classification</strong> (most common)</li><li><strong>Regression</strong></li></ul><p>The idea behind KNN is simple:</p><p><em>Tell me who your neighbors are, and I will tell you who you are.</em></p><p>KNN does not learn an explicit mathematical model during training.<br>Instead, it stores the entire training dataset and does its work only when a prediction is needed.</p><p><strong>What does “K” mean in KNN?</strong></p><ul><li><strong>K</strong> = number of nearest neighbors to consider</li></ul><p>Example:</p><ul><li>K = 3 → look at the 3 nearest points</li><li>K = 5 → look at the 5 nearest points</li></ul><p>In two-class problems, <strong>odd values of K</strong> are usually chosen to avoid ties in the vote.</p><p><strong>How KNN Works:</strong></p><p>When a new data point arrives, KNN follows these steps:</p><ol><li>Calculate the distance between the new point and every training point</li><li>Select the <strong>K nearest neighbors</strong></li><li>Take a <strong>majority vote</strong> among their labels (classification)</li><li>Assign the most common class</li></ol><p>For regression, KNN averages the target values of the K neighbors instead of voting.</p><h4>Example:</h4><p><strong>Real-Life Intuition</strong></p><p>Imagine you move to a new city and want to know if a place is <strong>safe or unsafe</strong>.</p><p>You ask your <strong>5 nearest neighbors</strong>:</p><ul><li>If most say “safe” → you believe it is safe</li><li>If most say “unsafe” → you believe it is unsafe</li></ul><p>This is exactly how <strong>KNN works</strong>.</p><h4>Important Note About Distance:</h4><p>KNN depends completely on <strong>distance</strong>.</p><p>The most common distance measure is the Euclidean distance:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/303/1*L4mpXW7QQSQG1rJ8Eyg-cA.png" /></figure><p>In simple words:</p><ul><li>Smaller distance → more similar</li><li>Larger distance → less similar</li></ul><p><strong>Let’s understand this using a 
problem:</strong></p><p>Predict whether a student will <strong>Pass or Fail</strong> based on <strong>hours studied</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/626/1*5eLSAZOilOBPaxSeJ8ZHSw.png" /></figure><p>A new student studies <strong>5 hours</strong>.<br> Will the student pass or fail?</p><h4><strong>Let’s solve this problem step by step:</strong></h4><p><strong>Step 1: Import Required Libraries</strong></p><pre>import numpy as np<br>from sklearn.neighbors import KNeighborsClassifier</pre><p>Explanation:</p><ul><li>numpy → handles numerical data</li><li>KNeighborsClassifier → the KNN classification model from scikit-learn</li></ul><p><strong>Step 2: Prepare the Data</strong></p><pre>X = np.array([[1], [2], [3], [6], [7], [8]])<br>y = np.array([&#39;Fail&#39;, &#39;Fail&#39;, &#39;Fail&#39;, &#39;Pass&#39;, &#39;Pass&#39;, &#39;Pass&#39;])</pre><p>Explanation:</p><ul><li>X contains the input feature (hours studied)</li><li>y contains the output labels</li><li>Double brackets create a 2D array (one row per student, one column per feature), which scikit-learn expects</li></ul><p><strong>Step 3: Choose the Value of K</strong></p><pre>knn = KNeighborsClassifier(n_neighbors=3)</pre><p>Explanation:</p><ul><li>We choose K = 3</li><li>The model will consider the 3 nearest neighbors</li></ul><p><strong>Step 4: Train the Model</strong></p><pre>knn.fit(X, y)</pre><p>Explanation:</p><ul><li>KNN simply stores the dataset</li><li>No real “learning” happens here</li></ul><p><strong>Step 5: Make a Prediction</strong></p><pre>new_student = np.array([[5]])<br>prediction = knn.predict(new_student)<br><br>print(&quot;Prediction:&quot;, prediction[0])</pre><p>Explanation:</p><ul><li>Input = 5 hours studied</li><li>The model finds the 3 nearest neighbors</li><li>A majority vote decides the result</li></ul><p><strong>How KNN Makes This Decision</strong></p><p>Nearest values to <strong>5 hours</strong>:</p><ul><li>6 → Pass (distance 1)</li><li>3 → Fail (distance 2)</li><li>7 → Pass (distance 2)</li></ul><p>Votes:</p><ul><li>Pass → 2</li><li>Fail → 1</li></ul><p>Final Prediction: <strong>PASS</strong></p><h4><strong>Some 
important points about KNN:</strong></h4><p><strong>Pros:</strong></p><ul><li>Easy to understand</li><li>Almost no training time (it only stores the data)</li><li>Works well with small datasets</li><li>Makes no assumptions about the data distribution</li></ul><p><strong>Cons:</strong></p><ul><li>Slow predictions on large datasets (it computes the distance to every training point)</li><li>High memory usage (the whole training set must be kept)</li><li>Sensitive to noisy data</li><li>Needs feature scaling, because features with large ranges dominate the distance</li></ul><h4>Conclusion:</h4><p>KNN is not the fastest or smartest algorithm, but it is <strong>one of the best teachers</strong> in Machine Learning.</p><p>If you truly understand KNN:</p><ul><li>You understand distance</li><li>You understand classification</li><li>You understand prediction logic</li></ul><p>And that makes learning other algorithms much easier.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c039b235a983" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Decision Tree]]></title>
            <link>https://medium.com/@atulgoswami310/decision-tree-49c461dde00a?source=rss-e5ef82a1ed50------2</link>
            <guid isPermaLink="false">https://medium.com/p/49c461dde00a</guid>
            <category><![CDATA[decision-tree]]></category>
            <dc:creator><![CDATA[Atul Kumar]]></dc:creator>
            <pubDate>Mon, 29 Dec 2025 18:56:53 GMT</pubDate>
            <atom:updated>2025-12-29T18:56:53.287Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Introduction:</strong></h3><p>A <strong>Decision Tree</strong> is a <strong>supervised machine learning algorithm</strong> used for <strong>classification and regression</strong>.</p><p>It works by:</p><ul><li>Asking <strong>simple questions</strong></li><li>Splitting data based on the answers</li><li>Reaching a final decision</li></ul><p>Because of their <strong>simple structure and clear logic</strong>, decision trees are widely used in <strong>real-world applications</strong>.</p><h3>Definition:</h3><p>A <strong>Decision Tree</strong> is a <strong>machine learning algorithm</strong> that makes decisions <strong>step by step</strong>, much like humans do.</p><p>It works by asking <strong>simple questions</strong> and splitting data based on the answers.</p><h3>Real-life example:</h3><p><em>Should I play cricket today?</em></p><ul><li>Is it raining?<ul><li>Yes → Don’t play</li><li>No → Do I have free time?<ul><li>Yes → Play</li><li>No → Don’t play</li></ul></li></ul></li></ul><p>This question–answer structure is exactly how a <strong>decision tree</strong> works.</p><p><strong>Why Is It Called a “Tree”?</strong></p><p>Because its structure looks like an upside-down tree:</p><ul><li><strong>Root node</strong> → first question</li><li><strong>Decision nodes</strong> → middle questions</li><li><strong>Leaf nodes</strong> → final answer (Yes / No)</li></ul><p><strong>Visual diagram:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/519/1*fi_iKrum6xrfmucg-Ps2DQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*epBrKokOHXm2dGegOg3EVg.png" /></figure><p><strong>Where Are Decision Trees Used?</strong></p><ul><li>Exam pass / fail prediction</li><li>Loan approval</li><li>Spam detection</li><li>Medical diagnosis</li><li>Customer behavior analysis</li></ul><p>They are popular because they are <strong>easy to understand</strong>.</p><p><strong>Simple Problem We Will Solve</strong></p><p>Predict whether a student will 
PASS or FAIL based on hours studied:</p><ul><li>1 → Pass</li><li>0 → Fail</li></ul><p><strong>Now let’s solve the problem and explain the code:</strong></p><p><strong>Step 1: Import Required Libraries</strong></p><pre>import numpy as np<br>import matplotlib.pyplot as plt<br>from sklearn.tree import DecisionTreeClassifier, plot_tree</pre><p>Explanation:</p><ul><li>numpy → helps work with numbers and arrays</li><li>matplotlib → used to draw the tree</li><li>DecisionTreeClassifier → the decision tree model</li><li>plot_tree → shows the tree visually</li></ul><p><strong>Step 2: Create Dataset</strong></p><pre># Input feature: Hours studied<br>X = np.array([[1], [2], [3], [4], [5], [6]])<br># Output label: 0 = Fail, 1 = Pass<br>y = np.array([0, 0, 0, 1, 1, 1])</pre><p><strong>Explanation:</strong></p><ul><li>X → hours studied</li><li>y → result (fail or pass)</li><li>Each row in X matches one value in y</li></ul><p><strong>Step 3: Create Decision Tree Model</strong></p><pre>model = DecisionTreeClassifier()</pre><p>Explanation:</p><ul><li>This line <strong>creates the decision tree</strong></li><li>At this point, the model is empty</li><li>It has not learned anything yet</li></ul><p><strong>Step 4: Train the Model</strong></p><pre>model.fit(X, y)</pre><p><strong>Explanation:</strong></p><ul><li>The model looks at X and y</li><li>Finds the <strong>best questions</strong> to split the data</li><li>Learns a rule like: “If hours studied ≤ 3.5 → Fail, otherwise → Pass”</li></ul><p><strong>Step 5: Make Predictions</strong></p><pre>predictions = model.predict(X)<br>print(predictions)</pre><ul><li>The model predicts pass/fail for the given data</li><li>Output:</li></ul><p>[0 0 0 1 1 1]</p><ul><li>This matches the actual labels (these are the training points, so it only shows that the tree fits its training data)</li></ul><p><strong>Step 6: Predict for a New Student</strong></p><pre>hours = [[4.5]]<br>result = model.predict(hours)<br>print(&quot;Prediction:&quot;, result)</pre><p>Explanation:</p><ul><li>Predicts the result for <strong>4.5 hours of study</strong></li><li>Output: Prediction: [1]</li><li>1 → Pass</li><li>0 → 
Fail</li></ul><p><strong>Step 7: Visualize the Decision Tree</strong></p><pre>plt.figure(figsize=(10,6))<br>plot_tree(<br>    model,<br>    feature_names=[&quot;Hours Studied&quot;],<br>    class_names=[&quot;Fail&quot;, &quot;Pass&quot;],<br>    filled=True<br>)<br>plt.show()</pre><p><strong>Explanation (Line by Line):</strong></p><p>plt.figure(figsize=(10,6))</p><ul><li>Sets the size of the figure</li></ul><p>plot_tree(model, ...)</p><ul><li>Draws the decision tree</li></ul><p>feature_names=["Hours Studied"]</p><ul><li>Names the input feature</li></ul><p>class_names=["Fail", "Pass"]</p><ul><li>Names the output classes (in the order 0, 1)</li></ul><p>filled=True</p><ul><li>Colors each box by its majority class for easy reading</li></ul><p>plt.show()</p><ul><li>Displays the tree</li></ul><h3>What the Tree Shows</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/631/0*nWtmw3tDBCxxKDaJ.png" /></figure><ul><li>Top box → first decision</li><li>Left branch → Fail</li><li>Right branch → Pass</li><li>Final boxes → predictions</li></ul><p><strong>How a Decision Tree Makes Decisions</strong></p><ol><li>Looks at all possible questions (splits)</li><li>Chooses the <strong>best split</strong>, for example the one that most reduces Gini impurity (the scikit-learn default)</li><li>Repeats until each group is pure or a stopping rule (such as a maximum depth) is reached</li><li>Gives the final decision</li></ol><p>No heavy math is needed to follow it: just <strong>logic and comparison</strong>.</p><h4><strong>Some important points about Decision Trees</strong></h4><ul><li>Very easy to understand</li><li>No complex math</li><li>Works with numbers &amp; categories</li><li>Can overfit the data</li><li>Not good for very large datasets</li><li>A small change in the data can change the tree</li></ul><h3>Thank you</h3><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=49c461dde00a" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Confusion Matrix]]></title>
            <link>https://medium.com/@atulgoswami310/confusion-matrix-6fa1be20f99c?source=rss-e5ef82a1ed50------2</link>
            <guid isPermaLink="false">https://medium.com/p/6fa1be20f99c</guid>
            <category><![CDATA[confusion-matrix]]></category>
            <dc:creator><![CDATA[Atul Kumar]]></dc:creator>
            <pubDate>Mon, 29 Dec 2025 17:40:28 GMT</pubDate>
            <atom:updated>2025-12-29T17:40:28.450Z</atom:updated>
            <content:encoded><![CDATA[<h3>Definition:</h3><p>A <strong>confusion matrix</strong> is a <strong>table</strong> used to evaluate how well a classification model performs.</p><p>When a machine learning model makes predictions, it can:</p><ul><li>Predict correctly</li><li>Predict wrongly</li></ul><p>A confusion matrix helps us <strong>see these results clearly</strong>.</p><p><strong>Why Is It Called a “Confusion” Matrix?</strong></p><p>Because it shows:</p><ul><li>Where the model is <strong>confused</strong></li><li>Where it is <strong>correct</strong></li><li>Which type of mistake it makes more often</li></ul><p>It is mainly used in <strong>classification problems</strong> such as:</p><ul><li>Spam / Not Spam</li><li>Pass / Fail</li><li>Disease / No Disease</li></ul><h4>Let’s understand it using a diagram:</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/822/1*xck4D8o4BIf1LJoD-QgK_w.png" /></figure><p><strong>Example:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/805/1*KEV57SAUXMkNEsCQqVSwxQ.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/879/1*iBkS9vh-ERp2bsc80VRLYQ.png" /></figure><h4>Understanding the 4 Terms</h4><p>1. True Positive (TP)</p><ul><li>Model says <strong>YES</strong></li><li>Actual answer is also <strong>YES</strong></li><li>Correct prediction</li></ul><p>Example:<br> Patient has disease → Model predicts disease</p><p>2. True Negative (TN)</p><ul><li>Model says <strong>NO</strong></li><li>Actual answer is also <strong>NO</strong></li><li>Correct prediction</li></ul><p>Example:<br> Email is not spam → Model predicts not spam</p><p>3. False Positive (FP)</p><ul><li>Model says <strong>YES</strong></li><li>Actual answer is <strong>NO</strong></li><li>Wrong prediction</li></ul><p>Example:<br> Email is not spam → Model predicts spam</p><p>4. 
False Negative (FN)</p><ul><li>Model says <strong>NO</strong></li><li>Actual answer is <strong>YES</strong></li><li>Wrong prediction</li></ul><p>Example:<br> Patient has disease → Model predicts no disease</p><h4><strong>Example:</strong></h4><p>Let’s understand this with a small problem.</p><h4>Dataset:</h4><p>Let’s say we have:</p><ul><li>Actual results (y_true)</li><li>Model predictions (y_pred)</li></ul><pre>y_true = [1, 0, 1, 1, 0, 1, 0, 0]<br>y_pred = [1, 0, 0, 1, 0, 1, 1, 0]</pre><p>Where:</p><ul><li>1 → Positive</li><li>0 → Negative</li></ul><h4>Python Code for Confusion Matrix</h4><p>Step 1: Import Libraries</p><pre>import matplotlib.pyplot as plt<br>from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay</pre><h4>Explanation:</h4><ul><li>matplotlib → for plotting</li><li>confusion_matrix → computes the confusion matrix</li><li>ConfusionMatrixDisplay → displays it visually</li></ul><p>Step 2: Define Actual and Predicted Values</p><pre>y_true = [1, 0, 1, 1, 0, 1, 0, 0]<br>y_pred = [1, 0, 0, 1, 0, 1, 1, 0]</pre><h4>Explanation:</h4><ul><li>y_true → real answers</li><li>y_pred → model predictions</li></ul><p>Step 3: Create Confusion Matrix</p><pre>cm = confusion_matrix(y_true, y_pred)<br>print(cm)</pre><h4>Explanation:</h4><ul><li>This line compares <strong>actual vs. predicted</strong> values</li><li>Output:</li></ul><pre>[[3 1]<br> [1 3]]</pre><p><strong>How to Read This Output</strong></p><p>Rows are the actual classes and columns are the predicted classes, in sorted label order (0, then 1):</p><pre>[[TN  FP]<br> [FN  TP]]</pre><p>So here:</p><ul><li>TN = 3</li><li>FP = 1</li><li>FN = 1</li><li>TP = 3</li></ul><p>Step 4: Visualize Confusion Matrix</p><pre>disp = ConfusionMatrixDisplay(confusion_matrix=cm)<br>disp.plot()<br>plt.title(&quot;Confusion Matrix Example&quot;)<br>plt.show()</pre><ul><li>ConfusionMatrixDisplay → prepares the matrix for display</li><li>.plot() → draws the matrix</li><li>plt.title() → adds a title</li><li>plt.show() → shows the graph</li></ul><h4>What the Graph Shows</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/533/0*y6s7MK6meVSUjvAZ.png" 
/></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/693/0*O5NXzRjaHIjGokYc.png" /></figure><ul><li>Each box shows the <strong>number of predictions</strong></li><li>Diagonal boxes (TN and TP) → correct predictions</li><li>Other boxes (FP and FN) → mistakes</li></ul><h3>Thank you</h3><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6fa1be20f99c" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Logistic Regression]]></title>
            <link>https://medium.com/@atulgoswami310/logistic-regression-1114292b53c7?source=rss-e5ef82a1ed50------2</link>
            <guid isPermaLink="false">https://medium.com/p/1114292b53c7</guid>
            <dc:creator><![CDATA[Atul Kumar]]></dc:creator>
            <pubDate>Sun, 28 Dec 2025 18:03:04 GMT</pubDate>
            <atom:updated>2025-12-28T18:03:04.630Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Introduction:</strong></h3><p>Logistic Regression is one of the most important algorithms in <strong>Machine Learning</strong>, especially for <strong>classification problems</strong>. Even though its name contains the word “regression,” Logistic Regression is actually used to <strong>predict categories</strong>, not continuous values.</p><h4>Definition:</h4><p>Logistic Regression is a <strong>supervised machine learning algorithm</strong> used for <strong>binary classification</strong> problems.</p><p>Binary classification means:</p><ul><li>The output has <strong>only two possible values</strong></li><li>Examples:<ul><li>Pass / Fail</li><li>Yes / No</li><li>Spam / Not Spam</li><li>Disease / No Disease</li><li>0 / 1</li></ul></li></ul><p>Logistic Regression predicts the <strong>probability</strong> of an event happening and then converts it into a class by comparing it with a threshold.</p><h3>#<strong> Mathematical Foundation of Logistic Regression</strong></h3><h4>1. Linear Equation (Same as Linear Regression):</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/148/1*-fmM0a7_65AkLQYapgybRA.png" /></figure><p>For multiple features:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/310/1*4IUBQYIY0uFe3hq7JkmmeA.png" /></figure><p>Where:</p><ul><li>x → input features</li><li>w → weights</li><li>b → bias</li><li>z → linear output (can be any real number)</li></ul><h4>2. 
The Logistic (Sigmoid) Function:</h4><p>Logistic Regression uses a special function called the <strong>Sigmoid Function</strong>.</p><h4>Sigmoid Function:</h4><p>The <strong>sigmoid function</strong> is a <strong>mathematical function</strong> that converts any real-valued number into a value <strong>between 0 and 1</strong>.<br> It has an <strong>S-shaped curve</strong> and is widely used in <strong>machine learning</strong>, especially in <strong>logistic regression</strong> and <strong>neural networks</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/199/1*oZmACj-kd6Y5gyIWb3bkwQ.png" /></figure><p><strong>Graph:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/970/1*5D_M0mBX5eUX7XGDJ1F4_A.png" /></figure><h3>How Logistic Regression Works:</h3><p><strong>Step 1: Take Input Features</strong></p><p>Example:</p><ul><li>Hours studied</li><li>Attendance</li><li>Previous marks</li></ul><p><strong>Step 2: Apply Linear Equation</strong></p><p>The model calculates a weighted sum of the inputs plus the bias (z).</p><p><strong>Step 3: Apply Sigmoid Function</strong></p><p>The result z is converted into a <strong>probability</strong> between 0 and 1.</p><p><strong>Step 4: Make Final Decision</strong></p><p>If the probability is above the threshold (usually 0.5), the output is <strong>1</strong>; otherwise it is <strong>0</strong>.</p><h3>Graphical Interpretation</h3><p>Unlike linear regression (a straight line), logistic regression produces an <strong>S-shaped curve</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*7NUZOvBKi5YO--HU.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/725/0*RH2gI7KFnaC_WtWY.png" /></figure><p><strong>Why S-Shape?</strong></p><ul><li>Probability increases slowly at first</li><li>Then increases rapidly</li><li>Finally saturates near 1</li></ul><p>This behavior is well suited to <strong>classification problems</strong>.</p><h4><strong># Some important points about logistic regression:</strong></h4><ul><li>Simple and easy to understand</li><li>Fast to 
train</li><li>Outputs probabilities</li><li>Works well for linearly separable data</li><li>Requires little computational power</li><li>Cannot handle complex non-linear data (without extra features)</li><li>Sensitive to outliers</li><li>Often requires feature engineering</li><li>Not suitable for multi-class problems (without extensions)</li></ul><p><strong>Some Real-World Applications:</strong></p><ul><li>Medical diagnosis (disease: yes/no)</li><li>Credit approval systems</li><li>Spam email detection</li><li>Customer churn prediction</li><li>Fraud detection</li></ul><h4><strong># Problem statement and explanation using logistic regression:</strong></h4><p>We want to predict:</p><blockquote><strong><em>Will a student pass the exam or not based on hours studied?</em></strong></blockquote><ul><li>Input (X) → Hours studied</li><li>Output (y) → pass or fail<ul><li>1 = Pass</li><li>0 = Fail</li></ul></li></ul><p>This is a <strong>binary classification problem</strong>, so we use <strong>logistic regression</strong>.</p><h4>Step 1: Import Required Libraries:</h4><pre>import numpy as np<br>import matplotlib.pyplot as plt<br>from sklearn.linear_model import LogisticRegression</pre><ul><li>numpy → used for numerical calculations</li><li>matplotlib → used for plotting graphs</li><li>LogisticRegression → logistic regression model from scikit-learn</li></ul><p><strong>Step 2: Create Dataset:</strong></p><pre># Input data (Hours studied)<br>X = np.array([1, 2, 3, 4, 5, 6]).reshape(-1, 1)<br><br># Output data (0 = Fail, 1 = Pass)<br>y = np.array([0, 0, 0, 1, 1, 1])</pre><ul><li>X contains hours studied</li><li>y contains the result (fail or pass)</li><li>.reshape(-1, 1) is required because scikit-learn expects 2D input</li></ul><p><strong>Step 3: Create Logistic Regression Model:</strong></p><pre>model = LogisticRegression()</pre><h4>Explanation:</h4><ul><li>This line creates the logistic regression model</li><li>At this point, the model knows <strong>how</strong> to learn, but not <strong>what</strong> to 
learn</li></ul><h4>Step 4: Train the Model</h4><pre>model.fit(X, y)</pre><h4>Explanation:</h4><ul><li>The model learns the relationship between X and y</li></ul><p>Internally:</p><ul><li>Applies the linear equation</li><li>Uses the sigmoid function</li><li>Adjusts the weights to minimize the (L2-regularized) log loss; scikit-learn’s default solver, L-BFGS, is a gradient-based optimizer</li></ul><h4>Step 5: Make Predictions</h4><pre>y_pred = model.predict(X)</pre><h4>Explanation:</h4><ul><li>Predicts class values (0 or 1)</li><li>Uses probability + threshold (0.5)</li></ul><h4>Step 6: Get Prediction Probabilities</h4><pre>y_prob = model.predict_proba(X)</pre><h4>Explanation:</h4><ul><li>Gives the probability of each class for every row</li><li>Example: [0.2, 0.8]<ul><li>20% → Fail (class 0)</li><li>80% → Pass (class 1)</li></ul></li></ul><h4>Step 7: Visualize Logistic Regression Curve</h4><pre># Generate smooth values for curve<br>X_test = np.linspace(0, 7, 100).reshape(-1, 1)<br>y_test_prob = model.predict_proba(X_test)[:, 1]<br># Plot<br>plt.scatter(X, y, color=&#39;blue&#39;, label=&quot;Actual Data&quot;)<br>plt.plot(X_test, y_test_prob, color=&#39;red&#39;, label=&quot;Logistic Curve&quot;)<br>plt.xlabel(&quot;Hours Studied&quot;)<br>plt.ylabel(&quot;Probability of Passing&quot;)<br>plt.title(&quot;Logistic Regression Example&quot;)<br>plt.legend()<br>plt.show()</pre><h4>Explanation:</h4><ul><li>np.linspace() creates smooth input values</li><li>predict_proba(X_test)[:, 1] selects the probability of class 1</li><li>Blue dots → real data</li><li>Red curve → logistic regression curve</li></ul><h4>Logistic Regression Curve</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/959/1*Kuj1dYHa24Z0D0Po8rqzgw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/947/1*kQEvO8org-80iPyI_SanMA.png" /></figure><h3>Curve Meaning:</h3><ul><li>Left side → low probability of passing</li><li>Middle → decision boundary (probability = 0.5)</li><li>Right side → high probability of passing</li></ul><h4>Step 8: Predict for a New Student</h4><pre>hours = [[4.5]]<br>result = 
model.predict(hours)<br>probability = model.predict_proba(hours)<br>print(&quot;Prediction:&quot;, result)<br>print(&quot;Probability:&quot;, probability)</pre><h4>Explanation:</h4><ul><li>Predicts the result for a student who studied <strong>4.5 hours</strong></li><li>Output: 1 → Pass</li><li>The probability shows how confident the model is</li></ul><h3>Final Output Meaning</h3><p>Example output (the exact probabilities depend on the model’s settings, such as regularization):</p><pre>Prediction: [1]<br>Probability: [[0.18 0.82]]</pre><p>The model says:</p><ul><li><strong>82% chance the student will pass</strong></li><li>Final decision → <strong>PASS</strong></li></ul><h3>Thank you</h3><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1114292b53c7" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Linear regression]]></title>
            <link>https://medium.com/@atulgoswami310/linear-regression-051676fcee15?source=rss-e5ef82a1ed50------2</link>
            <guid isPermaLink="false">https://medium.com/p/051676fcee15</guid>
            <dc:creator><![CDATA[Atul Kumar]]></dc:creator>
            <pubDate>Sun, 28 Dec 2025 14:09:23 GMT</pubDate>
            <atom:updated>2025-12-28T14:09:23.070Z</atom:updated>
            <content:encoded><![CDATA[<h3>Introduction:</h3><p>Linear Regression is the <strong>foundation of Machine Learning</strong>. It helps beginners understand how relationships in data work and how predictions are made. Even though it is simple, Linear Regression is still widely used in real-world applications. Learning it properly makes advanced machine learning concepts much easier to understand.</p><h3>Definition:</h3><p>Linear Regression is a <strong>supervised learning algorithm</strong> used to predict <strong>continuous values</strong>. Continuous values are numbers that can change smoothly, such as price, salary, marks, temperature, or distance.</p><h4>The main idea behind Linear Regression is very simple:</h4><p><strong>Find a straight line that best represents the relationship between input and output data.</strong></p><h4>Examples:</h4><ul><li>Study hours → Exam marks</li><li>Area of house → House price</li><li>Experience → Salary</li></ul><p>If we know the input, it helps us <strong>predict the output</strong>.</p><p><strong># Equation of Linear Regression:</strong></p><p>It is based on a simple mathematical equation: the equation of a straight line.</p><h4><strong>y = mx + c</strong></h4><p>Let’s understand each term clearly:</p><ul><li><strong>y</strong> → Output value (what we want to predict)</li><li><strong>x</strong> → Input value (feature)</li><li><strong>m</strong> → Slope of the line</li><li><strong>c</strong> → Y-intercept (value of y when x = 0)</li></ul><p>The slope (<strong>m</strong>) tells us how much y changes when x increases by one unit.</p><h3>How Linear Regression Actually Works</h3><p>The algorithm follows these basic steps:</p><ol><li>Take input data (X) and output data (Y)</li><li>Assume a straight line</li><li>Predict output values using the line</li><li>Calculate the error (difference between actual and predicted values)</li><li>Adjust the line to reduce the error</li><li>Repeat the process until the error 
becomes as small as possible</li><li>Choosing the line that minimizes the <strong>sum of squared errors</strong> is called the <strong>Least Squares Method</strong></li></ol><p>Scikit-learn’s LinearRegression computes this least-squares line directly, rather than by repeatedly adjusting it.</p><p><strong>Visual Understanding of Linear Regression:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/666/1*fZN9VMykDGCii1jAA2BwdA.png" /></figure><p>In the graph:</p><ul><li>The dots represent <strong>actual data points</strong></li><li>The straight line represents the <strong>best-fit line</strong></li><li>The goal is to keep the line as close as possible to all points</li></ul><h3>Types of Linear Regression</h3><h4>1. Simple Linear Regression</h4><ul><li>Only one input variable</li><li>Example: Study hours → Marks</li></ul><h4>2. Multiple Linear Regression</h4><ul><li>More than one input variable</li><li>Example: Area + Rooms + Location → House price</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/938/1*wN0_oJoF1dTz3ouvGpaupA.png" /></figure><h3>Python Implementation of Linear Regression</h3><p><strong>Step 1: Import Required Libraries</strong></p><pre>import numpy as np<br>import matplotlib.pyplot as plt<br>from sklearn.linear_model import LinearRegression</pre><ul><li>numpy → used for numerical calculations and arrays</li><li>matplotlib.pyplot → used for data visualization (graphs)</li><li>LinearRegression → a built-in Linear Regression model from scikit-learn</li></ul><p><strong>Step 2: Create Sample Dataset:</strong></p><pre>X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)<br>y = np.array([2, 4, 6, 8, 10])</pre><ul><li>X represents the <strong>input feature</strong></li><li>y represents the <strong>output values</strong></li><li>reshape(-1, 1) is required because scikit-learn expects input data in a 2D format</li></ul><p>This dataset represents a simple relationship:</p><ul><li>When X increases, y increases proportionally (here y = 2X)</li></ul><p><strong>Step 3: Create the Linear Regression Model:</strong></p><pre>model = LinearRegression()</pre><ul><li>This line creates a Linear Regression object</li><li>The 
model will automatically calculate the best values of the <strong>slope (m)</strong> and <strong>intercept (c)</strong> when it is trained</li></ul><p><strong>Step 4: Train the Model:</strong></p><pre>model.fit(X, y)</pre><ul><li>fit() trains the model using input data (X) and output data (y)</li><li>During training, the model learns the relationship between X and y</li><li>It finds the best-fit line by minimizing the sum of squared errors</li></ul><p><strong>Step 5: Make Predictions:</strong></p><pre>y_pred = model.predict(X)</pre><ul><li>This line uses the trained model to predict output values</li><li>y_pred contains predicted values based on the best-fit line</li></ul><p><strong>Step 6: Visualize the Result:</strong></p><pre>plt.scatter(X, y)<br>plt.plot(X, y_pred)<br>plt.xlabel(&quot;Input Feature (X)&quot;)<br>plt.ylabel(&quot;Output Value (Y)&quot;)<br>plt.title(&quot;Simple Linear Regression&quot;)<br>plt.show()</pre><ul><li>scatter() plots the actual data points</li><li>plot() draws the best-fit line</li><li>Labels and title make the graph readable</li><li>show() displays the graph</li></ul><p><strong>Important points about Linear Regression:</strong></p><ul><li>Simple and easy to understand</li><li>Works well with small datasets</li><li>Fast and efficient</li><li>Good for trend prediction</li><li>Assumes a linear relationship</li><li>Sensitive to outliers</li><li>Cannot handle complex data patterns</li><li>Performance drops with non-linear data</li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=051676fcee15" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[City Traffic Signal Violation Analysis]]></title>
            <link>https://medium.com/@atulgoswami310/city-traffic-signal-violation-analysis-827b0e3c69f3?source=rss-e5ef82a1ed50------2</link>
            <guid isPermaLink="false">https://medium.com/p/827b0e3c69f3</guid>
            <dc:creator><![CDATA[Atul Kumar]]></dc:creator>
            <pubDate>Fri, 26 Dec 2025 18:17:58 GMT</pubDate>
            <atom:updated>2025-12-26T18:17:58.413Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Overview:</strong></h3><p>Traffic signal violations play a significant role in causing road accidents in urban areas. This project analyzes city traffic signal violations using Python, Pandas, and Matplotlib to identify the most common rule violations, high-risk locations, and vehicle-related patterns. The dataset includes the date, city, signal ID, location, violation type, vehicle type, and violation count. After cleaning and inspecting the data, violations were analyzed by type, vehicle, signal, and location.<br> The goal of this analysis is to understand real-world traffic behavior and show how data-driven insights can help improve traffic safety and encourage better rule compliance.</p><p><strong>Dataset:</strong></p><p>A dataset is a collection of related data stored together in a structured form, usually in rows and columns, so that it can be easily analyzed.</p><p><strong>To start, I created a small dataset in CSV format.</strong></p><h4><strong>It contains:</strong></h4><ul><li>Date</li><li>City</li><li>Signal ID</li><li>Location</li><li>Violation type</li><li>Vehicle type</li><li>Violation count</li></ul><h4><strong>Here is the code with a complete explanation:</strong></h4><pre>import pandas as pd<br>import matplotlib.pyplot as plt</pre><p>This code imports the libraries required for the project.</p><h4>pandas:</h4><p>It is used to work with the dataset. It reads the data into a table and performs analysis such as counting and summarizing traffic violations. The name pd is a short alias that makes the code easier to write.</p><h4>matplotlib.pyplot:</h4><p>It is used to create graphs. It draws bar charts and other plots so that the traffic violation data can be understood visually. 
The name plt is a commonly used short alias.<br>These two libraries are used together to analyze the data and show the results in a simple and clear way.<br> — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>df = pd.read_csv(&quot;indian_Traffic_violation.csv&quot;)</pre><p><strong>This line reads the dataset file.</strong></p><ul><li>pd.read_csv() loads data from a CSV file</li><li>The file indian_Traffic_violation.csv contains traffic signal violation records</li><li>The data is stored in a Pandas DataFrame named df</li><li>A DataFrame works like a table with rows and columns</li><li>Once the data is loaded, it is easy to analyze and visualize</li></ul><p>— — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>df.head()</pre><ul><li>This method displays the first few rows of the dataset.</li><li>By default, it shows the first 5 rows.</li><li>It is a quick way to check whether the data has loaded correctly.</li><li>It gives an idea of the columns and sample values.</li></ul><p><strong>It looks like:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/864/1*KYi9AvRAfPXdyQG2MKzfjg.png" /></figure><p><strong>In this table:</strong><br> → date: the day when the traffic violation was recorded.<br> → city: the name of the city where the violation occurred.<br> → signal_id: the unique ID of the traffic signal.<br> → location: the area where the traffic signal is located.<br> → violation_type: the type of traffic rule that was broken.<br> → vehicle_type: the type of vehicle involved in the violation.<br> → violation_count: the number of violations recorded for that entry.<br> — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>df.info()</pre><p><strong>This method:</strong></p><ul><li>Shows the total number of rows and columns in the dataset</li><li>Displays the names of all 
columns</li><li>Shows the data type of each column</li><li>Tells how many non-null values are present in each column</li><li>Helps in identifying missing values</li><li>Gives a clear overview of the dataset structure</li></ul><p><strong>It looks like:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/445/1*5RfqoaJrWuEBqqQsCHIGRQ.png" /></figure><p>→ The dataset is a Pandas DataFrame<br> → It contains a total of 20 rows<br> → The row index starts from 0 and ends at 19<br> → There are 7 columns in the dataset<br> → All columns have 20 non-null values, so there are no missing values<br> — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>violation_counts = df[&#39;violation_type&#39;].value_counts()<br>violation_counts</pre><ul><li>This code counts how many times each type of traffic violation appears in the dataset</li><li>value_counts() calculates the frequency of each violation type</li><li>The result is stored in a variable named violation_counts</li><li>Writing violation_counts on the last line of a notebook cell displays the counted values</li><li>It shows which violation types are recorded most often</li><li>Note: value_counts() counts rows (records), not the totals stored in the violation_count column; ranking types by total violations would need df.groupby(&#39;violation_type&#39;)[&#39;violation_count&#39;].sum()</li></ul><p>It looks like:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/258/1*C_nROOAU4dz_sPw3afdnhg.png" /></figure><p>— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>vehicle_counts = df[&#39;vehicle_type&#39;].value_counts()<br>vehicle_counts</pre><p><strong>This code counts how many times each type of vehicle appears in the dataset</strong></p><ul><li>value_counts() calculates the frequency of each vehicle type</li><li>The result is stored in a variable named vehicle_counts</li><li>Writing vehicle_counts displays the counted values</li><li>It helps show which vehicles are involved in more traffic violations</li></ul><p>The output is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/264/1*PbWn8DZefC3t8FCDzzQLYw.png" /></figure><p>— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — 
—</p><pre>signal_counts = df[&#39;signal_id&#39;].value_counts()<br>signal_counts</pre><p><strong>This code counts how many times each traffic signal ID appears in the dataset</strong></p><ul><li>value_counts() calculates the frequency of each signal ID</li><li>The result is stored in a variable named signal_counts</li><li>Writing signal_counts displays the counted values</li><li>It helps identify which traffic signals appear most often in the violation records</li></ul><p>The output is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/263/1*Ex8ZjRzwtFzxOcoECPhuBA.png" /></figure><p>— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>location_counts = df[&#39;location&#39;].value_counts()<br>location_counts</pre><p><strong>This code counts how many times each location appears in the dataset</strong></p><ul><li>value_counts() calculates the frequency of each location</li><li>The result is stored in a variable named location_counts</li><li>Writing location_counts displays the counted values</li><li>It helps identify locations where traffic violations occur more frequently</li></ul><p>The output is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/254/1*qIq98oYc1lfhV_Ctw6e4kA.png" /></figure><p>— — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>violation_counts.plot(kind=&#39;bar&#39;)<br>plt.title(&quot;Traffic Violations by Type&quot;)<br>plt.xlabel(&quot;Violation Type&quot;)<br>plt.ylabel(&quot;Number of Violations&quot;)<br>plt.xticks(rotation=45)<br>plt.tight_layout()<br>plt.show()</pre><ul><li><strong>violation_counts.plot(kind=’bar’)</strong><br> Creates a bar chart from the violation counts</li><li><strong>plt.title(“Traffic Violations by Type”)</strong><br> Sets the chart title</li><li><strong>plt.xlabel(“Violation Type”)</strong><br> Labels the X-axis</li><li><strong>plt.ylabel(“Number of Violations”)</strong><br>Labels the Y-axis</li><li><strong>plt.xticks(rotation=45)</strong><br>Rotates 
labels by 45 degrees so long names do not overlap</li><li><strong>plt.tight_layout()</strong><br>Adjusts spacing so the labels are not cut off</li><li><strong>plt.show()</strong><br>Displays the chart</li></ul><p>Graph:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/936/1*9APPTB2Sni5cBWdLyiLtbg.png" /></figure><p><strong>The graph shows different types of traffic violations.</strong></p><ul><li>Speeding is the most common violation.</li><li>Red light jumping is the second most common violation.</li><li>No helmet, signal jumping, and wrong lane violations occur at similar levels.</li><li>No seatbelt is the least common violation.</li></ul><p>— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>vehicle_counts.plot(kind=&#39;bar&#39;)<br>plt.title(&quot;Violations by Vehicle Type&quot;)<br>plt.xlabel(&quot;Vehicle Type&quot;)<br>plt.ylabel(&quot;Count&quot;)<br>plt.show()</pre><ul><li><strong>vehicle_counts.plot(kind=’bar’)</strong><br> Creates a bar chart using vehicle violation data</li><li><strong>plt.title(“Violations by Vehicle Type”)</strong><br> Sets the title of the graph</li><li><strong>plt.xlabel(“Vehicle Type”)</strong><br> Labels the X-axis as vehicle type</li><li><strong>plt.ylabel(“Count”)</strong><br> Labels the Y-axis as the number of violations</li><li><strong>plt.show()</strong><br> Displays the graph on the screen</li></ul><p>Graph:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/822/1*zokKLtLmUS88MKYm1J-E8g.png" /></figure><p><strong>The graph shows traffic violations by vehicle type.</strong></p><ul><li>Bikes are involved in the highest number of violations.</li><li>Cars are the second highest in violation count.</li><li>Autos have fewer violations than bikes and cars.</li><li>Buses have the fewest violations.</li></ul><p>— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>violation_counts.idxmax()</pre><p><strong>This code tells us which traffic violation happens the most.</strong></p><ul><li>idxmax() returns the index label with the largest value, here the violation type with the highest count</li><li>As the graph also shows, speeding is the most common traffic violation</li></ul><p>— — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><pre>location_counts.idxmax()</pre><p><strong>This code tells us which location has the most traffic violations.</strong><br>According to the location counts above, this location is ‘Connaught Place’.<br> — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><h3><strong>Conclusions:</strong></h3><p>Speeding is the most common traffic violation (by number of records).<br>Bikes are involved in the highest number of violations.<br>Connaught Place and Kashmere Gate are high-risk locations.<br>Repeated violations at the same signals indicate the need for stricter monitoring.<br>Traffic safety awareness and enforcement should be improved during peak hours.</p><p>- — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=827b0e3c69f3" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>