<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Ayushman chaurasia on Medium]]></title>
        <description><![CDATA[Stories by Ayushman chaurasia on Medium]]></description>
        <link>https://medium.com/@ayushmanchaurasia7366?source=rss-6e4a1e552b93------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*dmbNkD5D-u45r44go_cf0g.png</url>
            <title>Stories by Ayushman chaurasia on Medium</title>
            <link>https://medium.com/@ayushmanchaurasia7366?source=rss-6e4a1e552b93------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 27 May 2026 23:11:47 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@ayushmanchaurasia7366/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Transformation architecture]]></title>
            <link>https://medium.com/@ayushmanchaurasia7366/transformation-architecture-6b06a08f85e9?source=rss-6e4a1e552b93------2</link>
            <guid isPermaLink="false">https://medium.com/p/6b06a08f85e9</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[transformation]]></category>
            <dc:creator><![CDATA[Ayushman chaurasia]]></dc:creator>
            <pubDate>Tue, 30 Dec 2025 18:07:41 GMT</pubDate>
            <atom:updated>2025-12-31T04:02:21.952Z</atom:updated>
            <content:encoded><![CDATA[<h3>Transformer Networks: A Simple Guide to How AI Understands Language</h3><p>Transformers have completely changed how artificial intelligence understands and works with language. They are used in translation apps, chatbots, and smart tools like GPT. What makes transformers special is that they try to understand language the way humans do — by focusing on the most important words instead of treating every word the same.</p><h3>1. What Are Transformer Networks?</h3><p>Transformers are a type of neural network designed to work with sequences, such as sentences or paragraphs. Older models processed text word by word, but transformers look at <strong>the entire sentence at once</strong>.</p><p>They do this using something called <strong>self-attention</strong>, which allows the model to understand relationships between words, even if those words are far apart in the sentence.</p><p>Transformers were introduced in <strong>2017</strong> in a famous research paper called <strong>“Attention Is All You Need”</strong>. This paper changed how AI models are built for tasks like translation, summarizing text, and generating sentences.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TuJmwwWKal1kU9WW1xWpMg.png" /></figure><h3>🔎 Why Are Transformers So Important?</h3><ul><li>They process text faster by working in parallel</li><li>They understand long and complex sentences better</li><li>They are the base of modern AI models like <strong>GPT and BERT</strong></li></ul><h3>2. 
The Main Idea: Attention</h3><p>The key idea behind transformers is <strong>attention</strong>.</p><p>Attention means that, for each word, the model learns <strong>which other words matter the most</strong> for understanding it.</p><h3>Example</h3><p>When you read the sentence:<br> <strong>“I saw a huge dog at the park”</strong>,<br> you naturally focus more on <strong>“huge dog”</strong> because those words carry the main meaning.</p><p>Transformers try to do the same thing — they give more importance to meaningful words.</p><h3>3. Transformer Architecture (Big Picture)</h3><p>The original transformer is built from two main parts:</p><ul><li><strong>Encoder</strong> — understands the input sentence</li><li><strong>Decoder</strong> — creates the output sentence</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/316/1*fluyCogPR1eEIdoA67yyAA.png" /></figure><h3>How It Works:</h3><ol><li>The input sentence goes into the <strong>encoder</strong></li><li>The encoder builds a representation of its meaning and context</li><li>The <strong>decoder</strong> uses that representation to generate output step by step</li></ol><p>The decoder reads the encoder’s output through cross-attention layers.</p><h3>4. Inside the Encoder</h3><h3>🧱 Encoder Structure</h3><p>Each encoder block has three main parts:</p><ol><li><strong>Multi-Head Self-Attention</strong> — finds relationships between words</li><li><strong>Feed-Forward Neural Network</strong> — processes the information</li><li><strong>Add &amp; Normalize Layers</strong> — keep training stable</li></ol><p>The original transformer uses <strong>6 encoder layers</strong>, stacked one after another. Each layer refines the representation produced by the previous one.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*AdJHoVeHX94vbDfH7-HMKQ.png" /></figure><h3>5. Preparing Text for the Encoder</h3><p>Before the encoder can work, raw text must be converted into numbers.</p><h3>✉️ A. 
Tokenization</h3><p>The sentence is broken into smaller pieces called tokens.</p><p>Example:<br> “I love reading books”<br> → [&quot;I&quot;, &quot;love&quot;, &quot;reading&quot;, &quot;books&quot;]</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/794/1*46N03pTWXSHm1MaLIdCCvg.jpeg" /></figure><h3>🔢 B. Word Embeddings</h3><p>Computers don’t understand words — only numbers.<br> So, each word is turned into a <strong>numeric vector</strong> that represents its meaning.</p><p>Similar words have similar vectors. These vectors can be large, like 512 or 768 numbers long.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uATTt40gbJ1HJQgIqE-VPA.png" /></figure><h3>📍 C. Positional Encoding</h3><p>Because transformers read all words at once, they don’t automatically know word order.</p><p>Positional encoding adds information about <strong>where each word appears</strong> in the sentence, so the model knows which word comes first, second, and so on.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*VG8fWHIy9wJpxa-NYXra2Q.png" /></figure><h3>6. Multi-Head Self-Attention (The Heart of Transformers)</h3><p>This is the most important part of a transformer.</p><p>For every word, the model creates three vectors:</p><ul><li><strong>Query (Q)</strong> — what the word is looking for</li><li><strong>Key (K)</strong> — what the word offers</li><li><strong>Value (V)</strong> — the actual information</li></ul><p>The model compares these vectors to decide <strong>how much attention one word should give to another</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BQ_UGdiWWgapJniSFcGNkA.png" /></figure><h3>Example</h3><p>In “I love reading books”, the word <strong>“I”</strong> is more related to <strong>“love”</strong> than <strong>“books”</strong>, so it gives more attention to “love”.</p><h3>7. 
Add &amp; Normalize Layers</h3><p>After each attention and feed-forward step:</p><ul><li><strong>Add</strong> keeps the original information by adding the layer’s input back to its output (a shortcut, or residual, connection)</li><li><strong>Normalize</strong> (layer normalization) rescales the values so they stay in a stable range during training</li></ul><p>Together, these make deep stacks of layers easier to train.</p><h3>8. Understanding the Decoder</h3><p>The decoder is similar to the encoder but has one extra job — <strong>generating text word by word</strong>.</p><h3>Decoder Layers Include:</h3><ul><li><strong>Masked Self-Attention</strong> — stops the model from seeing future words</li><li><strong>Cross-Attention</strong> — connects decoder with encoder output</li><li><strong>Feed-Forward Network</strong></li><li><strong>Add &amp; Normalize layers</strong></li></ul><p>Masked attention ensures the model predicts words <strong>one at a time</strong>, using only the words that came before, much as people speak one word after another.</p><h3>9. Final Output: Softmax Layer</h3><p>At the end, the decoder produces a score for every possible next word.</p><ul><li><strong>Softmax</strong> converts scores into probabilities</li><li>In the simplest (greedy) approach, the word with the highest probability is chosen</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/570/1*Hcueagt-gthZzD7f1ZPjOw.png" /></figure><h3>Example:</h3><ul><li>“padhna” (Hindi for “to read”) → 50% chance</li><li>“pasand” (Hindi for “like”) → 4% chance</li></ul><p>The model selects <strong>“padhna”</strong>.</p><pre>| Component            | What It Does                  |<br>| -------------------- | ----------------------------- |<br>| Attention            | Focuses on important words    |<br>| Positional Encoding  | Adds word order               |<br>| Encoder              | Understands input             |<br>| Decoder              | Generates output              |<br>| Multi-Head Attention | Finds different relationships |<br>| Softmax              | Chooses next word             |</pre><h3>Why Transformers Changed AI</h3><p>Transformers have largely replaced older sequence models like RNNs and LSTMs for language tasks. 
They made models faster to train and better at understanding long sentences. Today, almost all modern language AI systems are built using transformers.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Hostel Occupancy Data Analysis Using Python (Beginner Project)]]></title>
            <link>https://medium.com/@ayushmanchaurasia7366/hostel-occupancy-data-analysis-using-python-beginner-project-0a3f45ff30bb?source=rss-6e4a1e552b93------2</link>
            <guid isPermaLink="false">https://medium.com/p/0a3f45ff30bb</guid>
            <dc:creator><![CDATA[Ayushman chaurasia]]></dc:creator>
            <pubDate>Tue, 30 Dec 2025 17:12:12 GMT</pubDate>
            <atom:updated>2025-12-30T17:12:12.843Z</atom:updated>
            <content:encoded><![CDATA[<p>Data analysis doesn’t always require huge datasets or complex models.<br> Sometimes, <strong>simple data + clear logic</strong> can already provide meaningful insights.</p><p>In this beginner-friendly project, I analyze <strong>hostel room occupancy data</strong> using Python.<br> The goal is to understand:</p><ul><li>How many beds are occupied</li><li>How many are vacant</li><li>Which rooms are underutilized</li><li>Overall occupancy trends</li></ul><p>This project is ideal for students who are starting with <strong>Python, NumPy, Pandas, and Matplotlib</strong>.</p><h3>Tools and Libraries Used</h3><p>We use three core Python libraries:</p><ul><li><strong>NumPy</strong> — for numerical and statistical calculations</li><li><strong>Pandas</strong> — for handling and analyzing tabular data</li><li><strong>Matplotlib</strong> — for data visualization</li></ul><pre>import pandas as pd<br>import numpy as np<br>import matplotlib.pyplot as plt</pre><h3>Creating the Hostel Dataset</h3><p>First, we create a <strong>sample hostel dataset</strong>.</p><p>Each room has:</p><ul><li>A fixed <strong>capacity</strong></li><li>A number of <strong>occupied beds</strong></li></ul><pre>data = {<br>    &quot;RoomID&quot;: [&quot;R101&quot;,&quot;R102&quot;,&quot;R103&quot;,&quot;R104&quot;,&quot;R105&quot;,&quot;R106&quot;,&quot;R107&quot;,&quot;R108&quot;],<br>    &quot;Capacity&quot;: [4,4,3,2,4,3,2,4],<br>    &quot;Occupied&quot;: [3,4,1,0,2,3,1,4]<br>}<br><br>df = pd.DataFrame(data)<br>df</pre><pre><br>RoomID Capacity Occupied<br>R101     4        3<br>R102     4        4<br>R103     3        1<br>R104     2        0<br>R105     4        2<br>R106     3        3<br>R107     2        1<br>R108     4        4</pre><h3>Dataset Explanation</h3><ul><li>RoomID → Unique room number</li><li>Capacity → Maximum beds available</li><li>Occupied → Students currently staying</li></ul><p>The data is stored in a <strong>Pandas DataFrame</strong>, which 
makes analysis easier.</p><h3>Calculating Vacant Beds</h3><p>To find how many beds are empty in each room, we use a simple formula:</p><p><strong>Vacant = Capacity − Occupied</strong></p><pre>df[&quot;Vacant&quot;] = df[&quot;Capacity&quot;] - df[&quot;Occupied&quot;]<br>df</pre><p>OUTPUT:</p><pre>RoomID Capacity Occupied Vacant<br>R101     4         3      1<br>R102     4         4      0<br>R103     3         1      2<br>R104     2         0      2<br>R105     4         2      2<br>R106     3         3      0<br>R107     2         1      1<br>R108     4         4      0</pre><p>This helps us immediately identify:</p><ul><li>Fully occupied rooms</li><li>Partially filled rooms</li><li>Completely empty rooms</li></ul><h3>Mean Occupancy Statistics (Using NumPy)</h3><p>Next, we calculate average values to understand overall trends.</p><pre>mean_occupied = np.mean(df[&quot;Occupied&quot;])<br>mean_vacant = np.mean(df[&quot;Vacant&quot;])<br><br>print(&quot;Mean Occupancy:&quot;, mean_occupied)<br>print(&quot;Mean Vacancy:&quot;, mean_vacant)</pre><p>OUTPUT :</p><pre>Mean Occupancy: 2.25<br>Mean Vacancy: 1.0</pre><h3>What This Tells Us</h3><ul><li><strong>Mean Occupancy</strong> → Average number of students per room</li><li><strong>Mean Vacancy</strong> → Average number of empty beds per room</li></ul><p>NumPy is used here because it is fast and optimized for numerical calculations.</p><h3>Total Occupied vs Vacant Beds</h3><p>To analyze hostel usage at a global level, we calculate totals.</p><pre>total_occupied = df[&quot;Occupied&quot;].sum()<br>total_vacant = df[&quot;Vacant&quot;].sum()<br><br>print(total_occupied, total_vacant)</pre><p>OUTPUT:</p><pre>18        8</pre><p>These values represent:</p><ul><li>Total students staying in the hostel</li><li>Total beds currently unused</li></ul><h3>Visualizing Occupancy Using a Pie Chart</h3><p>Numbers are good — visuals are better.</p><p>We use a <strong>pie chart</strong> to compare occupied and vacant beds.</p><pre>labels 
= [&quot;Occupied Beds&quot;, &quot;Vacant Beds&quot;]<br>sizes = [total_occupied, total_vacant]<br><br>plt.figure(figsize=(6,6))<br>plt.pie(sizes, labels=labels, autopct=&#39;%1.1f%%&#39;, startangle=90)<br>plt.title(&quot;Hostel Occupancy Distribution&quot;)<br>plt.show()</pre><p>OUTPUT:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/544/1*G3VaT6LOW-cD2Yx-KhzT4w.png" /></figure><h3>Why a Pie Chart?</h3><ul><li>Clearly shows proportion</li><li>Easy to understand at a glance</li><li>Useful for non-technical audiences</li></ul><h3>Analyzing Vacancy Patterns</h3><p>Now we identify rooms that are <strong>completely vacant</strong>.</p><pre>vacant_rooms = df[df[&quot;Occupied&quot;] == 0]<br>vacant_rooms</pre><p>OUTPUT:</p><pre>RoomID  Capacity  Occupied  Vacant<br>R104           2         0       2</pre><p>This is important because:</p><ul><li>Empty rooms indicate poor space utilization</li><li>Management can redistribute students if needed</li></ul><h3>Calculating Occupancy Percentage</h3><p>Finally, we calculate <strong>occupancy percentage per room</strong>.</p><pre>df[&quot;Occupancy_Percentage&quot;] = (df[&quot;Occupied&quot;] / df[&quot;Capacity&quot;]) * 100<br>df</pre><p>OUTPUT:</p><pre>RoomID  Capacity  Occupied  Vacant  Occupancy_Percentage<br>R101           4         3       1             75.000000<br>R102           4         4       0            100.000000<br>R103           3         1       2             33.333333<br>R104           2         0       2              0.000000<br>R105           4         2       2             50.000000<br>R106           3         3       0            100.000000<br>R107           2         1       1             50.000000<br>R108           4         4       0            100.000000</pre><h3>Why This Matters</h3><ul><li>Normalizes data across rooms of different sizes</li><li>Helps compare utilization fairly</li><li>Makes insights more meaningful</li></ul><h3>Key Insights from the Project</h3><ul><li>Some rooms are <strong>fully occupied</strong>, while 
others are underutilized</li><li>One room (R104) is <strong>completely vacant</strong></li><li>Average occupancy (2.25 students per room) is below the average capacity (3.25 beds per room)</li><li>Visualization makes trends easy to interpret</li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Support Vector Machine (SVM) With Decision Boundary Visualization]]></title>
            <link>https://medium.com/@ayushmanchaurasia7366/support-vector-machine-svm-with-decision-boundary-visualization-e573db6280e4?source=rss-6e4a1e552b93------2</link>
            <guid isPermaLink="false">https://medium.com/p/e573db6280e4</guid>
            <dc:creator><![CDATA[Ayushman chaurasia]]></dc:creator>
            <pubDate>Fri, 26 Dec 2025 14:09:22 GMT</pubDate>
            <atom:updated>2025-12-26T14:09:22.476Z</atom:updated>
            <content:encoded><![CDATA[<p>Definition: Support Vector Machine (SVM) is a <strong>supervised machine learning algorithm</strong> mainly used for <strong>classification</strong>.<br> It works by finding the <strong>separating line (or boundary)</strong> that leaves the widest possible margin between the classes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/822/1*rLTtHlfQJv9nliIGfyfNIA.png" /></figure><h3>Code</h3><p><strong>Import the necessary libraries.</strong></p><pre>from sklearn.svm import SVC<br>from sklearn.model_selection import train_test_split<br>from sklearn.datasets import load_breast_cancer<br>import matplotlib.pyplot as plt<br>from sklearn.inspection import DecisionBoundaryDisplay</pre><p><strong>Load the Dataset.</strong></p><p>The Breast Cancer dataset contains features related to tumor measurements and a target variable indicating whether the tumor is <strong>malignant or benign</strong>.</p><pre>cancer = load_breast_cancer()</pre><p><strong>Select Features and Target.</strong></p><p>Here:</p><ul><li>Only the <strong>first two features</strong> are selected for visualization</li><li>y contains the class labels</li></ul><pre>x = cancer.data[:, :2]<br>y = cancer.target</pre><p><strong>Create and train the SVM model.</strong></p><ul><li>kernel=&#39;linear&#39; creates a straight decision boundary</li><li>C=1 controls the trade-off between margin size and misclassification</li></ul><pre>svm = SVC(kernel=&#39;linear&#39;, C=1)<br>svm.fit(x, y)</pre><p><strong>Plot Decision Boundary</strong></p><p>This displays the decision boundary learned by the SVM model.</p><pre>DecisionBoundaryDisplay.from_estimator(<br>    svm,<br>    x,<br>    response_method=&quot;predict&quot;,<br>    alpha=0.99,<br>    cmap=&quot;Pastel1&quot;,<br>    xlabel=cancer.feature_names[0],<br>    ylabel=cancer.feature_names[1],<br>)</pre><p><strong>Plot Data Points</strong></p><p>This scatter plot shows:</p><ul><li>Data points colored by class</li><li>Black edges for better 
visibility</li><li>The decision boundary separating the classes</li></ul><pre>plt.scatter(x[:, 0], x[:, 1], c=y, s=20, edgecolors=&quot;k&quot;)<br>plt.show()</pre><p><strong>Output:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/562/1*eTflsJ71bb7mjOOYQutj2Q.png" /></figure>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[K-Nearest Neighbors(KNN) Model]]></title>
            <link>https://medium.com/@ayushmanchaurasia7366/k-nearest-neighbors-knn-model-00a7373f81ce?source=rss-6e4a1e552b93------2</link>
            <guid isPermaLink="false">https://medium.com/p/00a7373f81ce</guid>
            <category><![CDATA[k-nearest-neighbours]]></category>
            <category><![CDATA[mls]]></category>
            <category><![CDATA[nearest-neighbour-model]]></category>
            <category><![CDATA[knn-model]]></category>
            <dc:creator><![CDATA[Ayushman chaurasia]]></dc:creator>
            <pubDate>Fri, 26 Dec 2025 12:00:47 GMT</pubDate>
            <atom:updated>2025-12-26T12:00:47.800Z</atom:updated>
            <content:encoded><![CDATA[<p>K-Nearest Neighbors (KNN) is a supervised learning algorithm used for classification. It predicts the class of a new (test) data point using the following steps:</p><ol><li><strong>Calculate the distance</strong> between the test data point and all existing data points (usually using Euclidean distance).</li><li><strong>Select the K nearest neighbors</strong>, where <strong>K</strong> is a number chosen by the user.</li><li><strong>Count the classes</strong> of these K nearest neighbors.</li><li><strong>Assign the majority class</strong> among them to the test data point.</li></ol><figure><img alt="" src="https://cdn-images-1.medium.com/max/720/1*4wRWm3wgzVtGadmTbFV3hg.png" /></figure><p>We use the Euclidean distance formula to measure how far the target point is from each existing data point.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/720/1*580E94CCtjwEgYKMUEQYIg.png" /></figure><p><strong>Code Part</strong></p><h3>Option 1: Writing our own KNN function (model)</h3><p>First, we import the <strong>NumPy</strong> library, which is used to calculate the Euclidean distance.<br> Then, we import <strong>Counter</strong> to count how many times each class appears among the K nearest neighbors.</p><pre>import numpy as np<br>from collections import Counter</pre><p>Then we define a function for the Euclidean distance formula.</p><pre>def euclidean_distance(point1,point2):<br>  return np.sqrt(np.sum((np.array(point1)-np.array(point2))**2))</pre><h3>KNN Function Explanation (Simple and Clear)</h3><ol><li><strong>Define a KNN function</strong> where the user provides:</li></ol><ul><li>training data</li><li>training labels</li><li>target point</li><li>value of K</li></ul><p><strong>2. Create an empty list</strong> to store distances along with their corresponding labels.</p><p><strong>3. 
Loop through the training data</strong>:</p><ul><li>Calculate the Euclidean distance between each training point and the target point.</li><li>Store both the label and the calculated distance in the list using append().</li></ul><p><strong>4. Sort the distance list</strong> based on distance (from smallest to largest).</p><p><strong>5. Select the labels of the K nearest data points</strong>.</p><p><strong>6. Find the most common label</strong> among those K labels and return it as the predicted class.</p><pre>def knn_predict(training_data,training_labels,target_point,k):<br>  # collect a (label, distance) pair for every training point<br>  distance = []<br>  for i in range(len(training_data)):<br>    dist = euclidean_distance(training_data[i],target_point)<br>    distance.append((training_labels[i],dist))<br>  # sort by distance, smallest first<br>  distance.sort(key=lambda x:x[1])<br>  # keep the labels of the k closest points and return the most common one<br>  k_nearest_labels = [label for label,_ in distance[:k]]<br>  return Counter(k_nearest_labels).most_common(1)[0][0]</pre><p>We define our dataset using the same variable names as the parameters of the KNN function above, for ease of use.</p><pre>training_data = [[1,2],[2,3],[3,4],[6,7],[7,8]]<br>training_labels = [&#39;B&#39;,&#39;A&#39;,&#39;B&#39;,&#39;B&#39;,&#39;A&#39;]<br>target_point = [4,5]<br>k = 3</pre><p>Our model is ready. 
Now we pass our dataset and the target point to the function to get the predicted class. KNN has no separate training step: it simply compares the target point with the stored data.</p><pre>predicted_label = knn_predict(training_data,training_labels,target_point,k)<br>print(predicted_label)</pre><p><strong><em>OUTPUT: B</em></strong></p><h3>Option 2: Using the existing KNN model from sklearn</h3><p>We use the same dataset: training data and labels, target point, and value of k.</p><pre>training_data = [[1,2],[2,3],[3,4],[6,7],[7,8]]<br>training_labels = [&#39;B&#39;,&#39;A&#39;,&#39;B&#39;,&#39;B&#39;,&#39;A&#39;]<br>target_point = [4,5]<br>k = 3</pre><p>We import KNeighborsClassifier from sklearn.neighbors, set n_neighbors to K, and assign the model to a variable called knn.</p><pre>from sklearn.neighbors import KNeighborsClassifier<br>knn = KNeighborsClassifier(n_neighbors=k)</pre><p>Then we fit the model on the <em>labeled data</em> and use it to predict the class of the target point.</p><pre>knn.fit(training_data,training_labels)<br>predicted_label2 = knn.predict([target_point])<br>print(predicted_label2)</pre><p><strong><em>OUTPUT: [&#39;B&#39;]</em></strong></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Random Forest]]></title>
            <link>https://medium.com/@ayushmanchaurasia7366/random-forest-7544ec61ad16?source=rss-6e4a1e552b93------2</link>
            <guid isPermaLink="false">https://medium.com/p/7544ec61ad16</guid>
            <category><![CDATA[random-forest]]></category>
            <dc:creator><![CDATA[Ayushman chaurasia]]></dc:creator>
            <pubDate>Fri, 19 Dec 2025 06:14:38 GMT</pubDate>
            <atom:updated>2025-12-19T06:14:38.570Z</atom:updated>
            <content:encoded><![CDATA[<p>Random Forest is a <strong>supervised machine learning algorithm</strong> used for <strong>classification and regression</strong> problems.<br> It works by creating <strong>multiple decision trees</strong> and combining their results (a majority vote for classification, an average for regression) to make a final prediction.</p><p>In this post, we will use a <strong>Random Forest Classifier</strong> to predict passenger survival using the <strong>Titanic dataset</strong>.</p><p>Import Required Libraries</p><pre>import pandas as pd<br>from sklearn.model_selection import train_test_split<br>from sklearn.ensemble import RandomForestClassifier<br>from sklearn.metrics import accuracy_score, classification_report<br>import warnings<br>warnings.filterwarnings(&#39;ignore&#39;)</pre><p>Load the Dataset</p><pre>tt = pd.read_csv(&quot;titanic.csv&quot;)</pre><p>Remove rows where the target column is missing:</p><pre>tt = tt.dropna(subset=[&quot;Survived&quot;])</pre><p>Select Features and Target Variable</p><p>Here:</p><ul><li><strong>x</strong> contains input features</li><li><strong>y</strong> contains the survival status</li></ul><pre># copy so the edits below do not modify a view of tt<br>x = tt[[&#39;Pclass&#39;,&#39;Sex&#39;,&#39;Age&#39;,&#39;SibSp&#39;,&#39;Parch&#39;,&#39;Fare&#39;]].copy()<br>y = tt[&#39;Survived&#39;]</pre><p>Data Preprocessing</p><p>Convert categorical values into numerical form and handle missing values.</p><pre>x[&#39;Sex&#39;] = x[&#39;Sex&#39;].map({&#39;female&#39;: 0, &#39;male&#39;: 1})<br>x[&#39;Age&#39;] = x[&#39;Age&#39;].fillna(x[&#39;Age&#39;].median())</pre><p>Split the Dataset</p><p>The data is split into training and testing sets to evaluate model performance.</p><pre>x_train, x_test, y_train, y_test = train_test_split(<br>    x, y, test_size=0.2, random_state=42<br>)</pre><p>Train the Random Forest Model</p><pre>rf_class = RandomForestClassifier(n_estimators=100, random_state=42)<br>rf_class.fit(x_train, y_train)</pre><p>Make Predictions</p><pre>y_pred = rf_class.predict(x_test)</pre><p>Model 
Evaluation</p><ul><li><strong>Accuracy</strong> shows the fraction of predictions that were correct</li><li><strong>Classification report</strong> provides precision, recall, and F1-score</li></ul><pre>acc = accuracy_score(y_test, y_pred)<br>class_rep = classification_report(y_test, y_pred)<br><br>print(f&quot;Accuracy: {acc:.2f}&quot;)<br>print(&quot;Classification Report&quot;)<br>print(class_rep)</pre><p>output</p><pre>Accuracy: 0.80<br>Classification Report<br>                 precision   recall  f1-score   support<br><br>           0       0.82      0.85      0.83       105<br>           1       0.77      0.73      0.75        74<br><br>    accuracy                           0.80       179<br>   macro avg       0.79      0.79      0.79       179<br>weighted avg       0.80      0.80      0.80       179</pre><p>Predict Survival for a Sample Passenger</p><p>This demonstrates how the trained model predicts survival for an individual passenger.</p><pre>sample = x_test.iloc[0:1]<br>pred = rf_class.predict(sample)<br><br>sample_dict = sample.iloc[0].to_dict()<br>print(&quot;Sample Passenger:&quot;, sample_dict)<br>print(f&quot;Predicted Survival: {&#39;Survived&#39; if pred[0] == 1 else &#39;Did Not Survive&#39;}&quot;)</pre><p>output</p><pre>Sample Passenger: {sample_dict}<br>Predicted Survival: Did Not Survive</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Decision Tree]]></title>
            <link>https://medium.com/@ayushmanchaurasia7366/decision-tree-9a3602a15f6b?source=rss-6e4a1e552b93------2</link>
            <guid isPermaLink="false">https://medium.com/p/9a3602a15f6b</guid>
            <category><![CDATA[decision-tree]]></category>
            <dc:creator><![CDATA[Ayushman chaurasia]]></dc:creator>
            <pubDate>Fri, 19 Dec 2025 05:56:30 GMT</pubDate>
            <atom:updated>2025-12-19T05:56:30.843Z</atom:updated>
            <content:encoded><![CDATA[<p>Decision Tree is a <strong>supervised machine learning algorithm</strong> used for <strong>classification and regression</strong>.<br> It works by repeatedly splitting the data on feature conditions, forming a tree-like structure to make decisions.</p><p>In this post, we will build a <strong>Decision Tree Classifier</strong> using the <strong>Iris dataset</strong> with Python and scikit-learn.</p><p>Import the required libraries</p><pre>from sklearn.datasets import load_iris<br>from sklearn.preprocessing import MinMaxScaler<br>import pandas as pd<br>from sklearn.model_selection import train_test_split<br>from sklearn.tree import DecisionTreeClassifier<br>from sklearn.metrics import accuracy_score,f1_score,confusion_matrix,classification_report<br>from sklearn.model_selection import GridSearchCV</pre><p>Load the Iris dataset</p><p>Convert the dataset into a DataFrame for better understanding</p><pre>iris = load_iris()<br><br>x = iris.data<br>y = iris.target<br>iris[&#39;feature_names&#39;]<br>data = pd.DataFrame(iris[&quot;data&quot;],columns=iris[&quot;feature_names&quot;])<br>data</pre><p>output</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/618/1*NdLLHV1gS57jKIjtnBeVQQ.png" /></figure><p>We use MinMaxScaler to normalize the feature values between 0 and 1</p><pre>scaler=MinMaxScaler()<br>x_normalized = scaler.fit_transform(x)</pre><p>Split the dataset into training and testing sets.</p><pre>x_train,x_test,y_train,y_test = train_test_split(x_normalized,y,test_size=0.2,random_state=42)</pre><p>Train the Decision Tree model</p><pre>clf=DecisionTreeClassifier()<br>clf.fit(x_train,y_train)</pre><p>Make Predictions</p><pre>y_pred = clf.predict(x_test)<br>y_pred</pre><p>output</p><pre>array([1, 0, 2, 1, 1, 0, 1, 2, 1, 1, 2, 0, 0, 0, 0, 1, 2, 1, 1, 2, 0, 2,<br>       0, 2, 2, 2, 2, 2, 0, 0])</pre><p>The confusion matrix shows how many predictions were correct and where the model made mistakes.</p><pre>con = 
confusion_matrix(y_test,y_pred)<br>print(con)</pre><p>output</p><pre>[[10  0  0]<br> [ 0  9  0]<br> [ 0  0 11]]</pre><p>This report gives precision, recall, F1-score, and support for each class.</p><pre>cl= classification_report(y_test,y_pred)<br>print(cl)</pre><p>output</p><pre>              precision    recall  f1-score   support<br><br>           0       1.00      1.00      1.00        10<br>           1       1.00      1.00      1.00         9<br>           2       1.00      1.00      1.00        11<br><br>    accuracy                           1.00        30<br>   macro avg       1.00      1.00      1.00        30<br>weighted avg       1.00      1.00      1.00        30</pre>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Logistic Regression]]></title>
            <link>https://medium.com/@ayushmanchaurasia7366/logistic-regression-668c75051738?source=rss-6e4a1e552b93------2</link>
            <guid isPermaLink="false">https://medium.com/p/668c75051738</guid>
            <dc:creator><![CDATA[Ayushman chaurasia]]></dc:creator>
            <pubDate>Sun, 14 Dec 2025 18:09:15 GMT</pubDate>
            <atom:updated>2025-12-14T18:09:15.500Z</atom:updated>
            <content:encoded><![CDATA[<p>Definition: Logistic regression is a classification algorithm. It estimates the probability of a binary outcome and uses it to give a yes-or-no answer for each data point.</p><h3>Code Started</h3><ol><li>The 1st line imports the ‘load_breast_cancer’ function, which gives you access to the breast cancer dataset from scikit-learn.</li><li>The 2nd line imports the ‘LogisticRegression’ model from scikit-learn’s linear_model module.</li><li>The 3rd line imports ‘train_test_split’, which divides the dataset into training and testing sets. This helps evaluate your model’s performance on unseen data.</li><li>The 4th line imports the accuracy_score function, which measures how often your model correctly classifies data points.</li></ol><pre>from sklearn.datasets import load_breast_cancer<br>from sklearn.linear_model import LogisticRegression<br>from sklearn.model_selection import train_test_split<br>from sklearn.metrics import accuracy_score</pre><ol><li>The 1st line loads the Breast Cancer dataset. The argument return_X_y=True ensures that the data is returned directly as two separate arrays: x holds the input features (data), and y holds the target labels (answers).</li><li>The 2nd line splits the data: 80% is kept for learning (train) and 20% is kept for the exam (test).</li><li>The 3rd line creates the untrained model, which we name clf. We give it a generous iteration limit (max_iter) so that training can converge.</li></ol><blockquote><em>‘max_iter’ is the maximum number of iterations the solver is allowed to run during training.</em></blockquote><p>4. The 4th line is the training step. The model studies the training data to find the pattern between x and y.</p><p>5. The 5th line is the exam step. The model predicts the answers for the unseen test data (x_test).</p><p>6. The 6th line calculates the accuracy score. 
It compares the <em>actual</em> answers (y_test) with the <em>predicted</em> answers (y_pred).</p><pre>x,y = load_breast_cancer(return_X_y=True)<br>x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,random_state=23)<br>clf = LogisticRegression(max_iter=10000,random_state=0)<br>clf.fit(x_train,y_train)<br>y_pred = clf.predict(x_test)<br>acc = accuracy_score(y_test,y_pred)</pre><p>This line prints the predicted labels for the test set.</p><pre>print(y_pred)</pre><h3>Code Ended</h3><p><strong>Output</strong>: [1 0 0 1 0 0 0 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 1 0 1 0 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 0 1 1 1 1 0 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 1 1 1 0 1 1 1 0]</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Linear Regression]]></title>
            <link>https://medium.com/@ayushmanchaurasia7366/linear-regression-3211a9592d0c?source=rss-6e4a1e552b93------2</link>
            <guid isPermaLink="false">https://medium.com/p/3211a9592d0c</guid>
            <category><![CDATA[linear-regression]]></category>
            <dc:creator><![CDATA[Ayushman chaurasia]]></dc:creator>
            <pubDate>Sun, 14 Dec 2025 18:02:17 GMT</pubDate>
            <atom:updated>2025-12-20T17:18:45.376Z</atom:updated>
            <content:encoded><![CDATA[<p>Definition: Linear regression helps us find the average straight-line relationship between two variables in our data. In our code we take two variables, X and Y, and fit a linear regression to find the relationship between them.</p><h3>CODE STARTED</h3><p><strong>We import the necessary libraries.</strong></p><ol><li>We import the numpy library, which is essential for numerical operations and data manipulation.</li><li>Next we import matplotlib.pyplot as plt for visualization.</li><li>Finally we import the LinearRegression model from sklearn, which is the main algorithm for linear regression.</li></ol><pre>import numpy as np<br>import matplotlib.pyplot as plt<br>from sklearn.linear_model import LinearRegression</pre><ol><li>The ‘np.random.seed(42)’ line sets the seed for the random number generator. This is important for reproducibility: it ensures that the same sequence of random numbers is generated every time you run the code.</li><li>The line ‘X = np.random.rand(50,1) * 100’ creates a NumPy array ‘X’ of shape (50, 1) containing 50 random values between 0 and 100.</li><li>The ‘Y’ variable is created by establishing a linear relationship with ‘X’ (slope 3.5) and adding uniform random noise between 0 and 20.</li></ol><pre>np.random.seed(42)<br>X = np.random.rand(50,1) * 100<br>Y = 3.5 * X + np.random.rand(50,1) * 20</pre><ol><li>We create an instance of the “LinearRegression” model, named “hello”.</li><li>The next line trains the model using the X and Y data.</li></ol><pre>hello = LinearRegression()<br>hello.fit(X,Y)</pre><p>We then use the trained model to make predictions on the ‘X’ values, and store these predictions in ‘Y_pred’.</p><pre>Y_pred = hello.predict(X)</pre><ol><li>We set the plot size, create a scatter plot of the data points, and then plot the linear regression line with labels and a grid.</li><li>plt.show() then displays the plot.</li></ol><pre>plt.figure(figsize=(10,6))<br>plt.scatter(X,Y,color=&#39;blue&#39;,label=&#39;Data points&#39;)<br>plt.plot(X,Y_pred,color=&#39;red&#39;,linewidth=2,label=&#39;Linear Regression&#39;)<br>plt.title(&quot;Regression Analysis&quot;)<br>plt.xlabel(&#39;X&#39;)<br>plt.ylabel(&#39;Y&#39;)<br>plt.legend()<br>plt.grid(True)<br>plt.show()</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/850/1*ZDqtDaybP4DcPovcb0teag.png" /></figure><h3>Code Completed.</h3><p>The blue dots show the actual data points.<br>The red line shows the predictions of the linear regression.</p><p>We can calculate the Mean Squared Error (MSE), the average squared difference between the actual and predicted Y values:</p><pre>from sklearn.metrics import mean_squared_error<br>mse = mean_squared_error(Y,Y_pred)<br>print(mse)</pre><p>The output is 36.764638874837246.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Lab 1: Importing Data and Creating Visualizations Using Python]]></title>
            <link>https://medium.com/@ayushmanchaurasia7366/lab-1-importing-data-and-creating-visualizations-using-python-f88b57c9183f?source=rss-6e4a1e552b93------2</link>
            <guid isPermaLink="false">https://medium.com/p/f88b57c9183f</guid>
            <dc:creator><![CDATA[Ayushman chaurasia]]></dc:creator>
            <pubDate>Sun, 23 Nov 2025 18:35:53 GMT</pubDate>
            <atom:updated>2025-11-23T18:35:53.723Z</atom:updated>
            <content:encoded><![CDATA[<ul><li><strong>random</strong> → helps us create random data.</li><li><strong><em>pandas </em>(pd)</strong> → helps us store and work with data.</li><li><strong>matplotlib (plt)</strong> → is used to create different types of charts and graphs in Python.</li></ul><pre>import random<br>import pandas as pd<br>import matplotlib.pyplot as plt</pre><ul><li>This code creates a small sample dataset by generating random ages, genders, and incomes for 100 people. It then turns that data into a table using pandas and saves it as a CSV file named data.csv. It’s a quick way to make your own dataset when you don’t have real data.</li></ul><pre>data = {<br>    &#39;age&#39;: [random.randint(20, 60) for _ in range(100)],<br>    &#39;gender&#39;: [random.choice([&#39;Male&#39;, &#39;Female&#39;]) for _ in range(100)],<br>    &#39;income&#39;: [random.randint(20000, 100000) for _ in range(100)]<br>}<br><br># Convert data to a pandas DataFrame and save it to a CSV file<br>df = pd.DataFrame(data)<br>df.to_csv(&#39;data.csv&#39;, index=False)</pre><ul><li>df is the DataFrame that stores all the generated data in a table format. When you write df on its own in a notebook cell, it displays the table so we can see the age, gender, and income values clearly.</li></ul><pre>df</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/243/1*d3RSMonHb04ggIB6a5GfRA.png" /></figure><ul><li>This code draws a histogram that shows how many people fall into each age group. 
It uses the age data, adds axis labels, gives the chart a title, and then shows the graph on the screen.</li></ul><pre>plt.hist(data[&#39;age&#39;])<br>plt.xlabel(&#39;Age&#39;)<br>plt.ylabel(&#39;Frequency&#39;)<br>plt.title(&#39;Age Distribution&#39;)<br>plt.show()</pre><p>Output</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*CoTLQa4goK2huLfuNMnNuQ.png" /></figure><p>This code creates a histogram of the age data using 10 bins and sets the bar color to black.</p><pre>plt.hist(data[&#39;age&#39;],color=&#39;black&#39;,bins = 10)<br>#plt.hist([mens_age, female_age], color=[&#39;black&#39;,&#39;red&#39;], label=[&#39;Male&#39;,&#39;Female&#39;])<br>plt.xlabel(&#39;Age&#39;)<br>plt.ylabel(&#39;Frequency&#39;)<br>plt.title(&#39;Age Distribution&#39;)<br>plt.show()<br></pre><p>Output</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*v1-cFXjEsMzw8CW3Zz4UAA.png" /></figure><ul><li>The next cell raises an error because data is a plain dictionary, not a DataFrame. data[&#39;gender&#39;] is therefore a Python list, which has no unique() or value_counts() methods. Using df[&#39;gender&#39;] instead avoids the error.</li></ul><pre>plt.bar(data[&#39;gender&#39;].unique(), data[&#39;gender&#39;].value_counts())<br>plt.xlabel(&#39;Gender&#39;)<br>plt.ylabel(&#39;Count&#39;)<br>plt.title(&#39;Gender Comparison&#39;)<br>plt.show()</pre><p>Output</p><pre>---------------------------------------------------------------------------<br>AttributeError                            Traceback (most recent call last)<br>/tmp/ipython-input-475948260.py in &lt;cell line: 0&gt;()<br>----&gt; 1 plt.bar(data[&#39;gender&#39;].unique(), data[&#39;gender&#39;].value_counts())<br>      2 plt.xlabel(&#39;Gender&#39;)<br>      3 plt.ylabel(&#39;Count&#39;)<br>      4 plt.title(&#39;Gender Comparison&#39;)<br>      5 plt.show()<br><br>AttributeError: &#39;list&#39; object has no attribute &#39;unique&#39;</pre><ul><li>Using df[&#39;gender&#39;] instead of data[&#39;gender&#39;] fixes the error. Taking both the labels and the bar heights from the same value_counts() result also keeps each bar matched to the correct gender, because unique() and value_counts() can return the categories in different orders.</li></ul><pre>counts = df[&#39;gender&#39;].value_counts()<br>plt.bar(counts.index, counts.values)<br>plt.xlabel(&#39;Gender&#39;)<br>plt.ylabel(&#39;Count&#39;)<br>plt.title(&#39;Gender Comparison&#39;)<br>plt.show()</pre><p>Output</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*GhOR_3FEi8NYvi1zBEVPEw.png" /></figure>]]></content:encoded>
        </item>
    </channel>
</rss>