<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Dr Sujoy K Goswami, hc on Medium]]></title>
        <description><![CDATA[Stories by Dr Sujoy K Goswami, hc on Medium]]></description>
        <link>https://medium.com/@sujoykumargoswami?source=rss-5d3cb8c43298------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*1Ba9d0sEMZu-tmiOboCW9Q.png</url>
            <title>Stories by Dr Sujoy K Goswami, hc on Medium</title>
            <link>https://medium.com/@sujoykumargoswami?source=rss-5d3cb8c43298------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 26 May 2026 07:39:17 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@sujoykumargoswami/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[RAG vs. Fine-Tuning in LLM]]></title>
            <link>https://medium.com/analytics-vidhya/rag-vs-fine-tuning-in-llm-a8534d5ec30a?source=rss-5d3cb8c43298------2</link>
            <guid isPermaLink="false">https://medium.com/p/a8534d5ec30a</guid>
            <category><![CDATA[gen-ai-revolution]]></category>
            <category><![CDATA[large-language-models]]></category>
            <category><![CDATA[llm]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Dr Sujoy K Goswami, hc]]></dc:creator>
            <pubDate>Sun, 10 Nov 2024 15:40:57 GMT</pubDate>
            <atom:updated>2024-11-10T16:08:30.806Z</atom:updated>
            <content:encoded><![CDATA[<p>In recent years, as large language models (LLMs) have grown in size and complexity, two prominent techniques — <strong>Retrieval-Augmented Generation (RAG)</strong> and <strong>Fine-Tuning</strong> — have emerged to improve their relevance, accuracy, and applicability across diverse fields. These methods address key limitations in LLMs: RAG enables real-time data retrieval, providing contextually accurate information from external knowledge bases, while fine-tuning specializes LLMs for specific tasks or domains, resulting in responses that align with specialized terminology and task requirements.</p><p>These methods not only improve the models’ utility and domain expertise but also extend their lifespans and reduce the need for frequent, costly retraining, giving them significant advantages in production environments.</p><h3>Retrieval-Augmented Generation (RAG)</h3><p><strong>Definition and Purpose</strong>:<br>Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with text generation, enhancing language models by giving them access to external knowledge bases, documents, or databases. This approach helps the model provide more accurate, up-to-date, and contextually relevant information. 
In RAG, an initial retrieval step is used to pull relevant documents or snippets from a knowledge base, and then a generative model (such as GPT) synthesizes the retrieved information into a coherent answer.</p><p><strong>How it Works</strong>:</p><ol><li><strong>Retrieval Step</strong>: The RAG model uses a retriever (often based on models like BERT or specialized retrieval models) to identify the top-k most relevant documents or pieces of information in response to a user’s query.</li><li><strong>Generation Step</strong>: A generative model then uses this retrieved information to generate a response, often improving the factual accuracy and relevance of the generated output.</li></ol><p><strong>Example</strong>:<br>Imagine you’re asking about the history of the Eiffel Tower. A RAG model would first retrieve relevant passages from a knowledge database or documents on Paris landmarks. Then, it generates an answer combining these details, which could result in a more accurate and informative response than a standalone LLM (which might be limited by the static training data available to it).</p><h3>Fine-Tuning</h3><p><strong>Definition and Purpose</strong>:<br>Fine-tuning is the process of training a pre-existing large language model (LLM) on additional, domain-specific data to improve its performance for a specific application or task. By exposing the model to custom data, fine-tuning enables it to learn the language, style, terminology, and nuances of the target domain. 
Fine-tuning can be either supervised, where the model learns from labeled data, or unsupervised, using relevant but unlabeled text.</p><p><strong>How it Works</strong>:</p><ol><li><strong>Data Preparation</strong>: Curate a dataset specific to the target task or domain (e.g., medical records for a healthcare model).</li><li><strong>Training</strong>: The model is trained on this dataset, adjusting its parameters to better align with the new, specific data.</li><li><strong>Evaluation and Tuning</strong>: The model is evaluated, and additional adjustments are made as needed to improve its accuracy and alignment with the target use case.</li></ol><p><strong>Example</strong>:<br>Suppose a model is being fine-tuned to assist medical professionals. The base LLM is exposed to medical literature, guidelines, and research data, allowing it to become proficient in understanding and responding with medical terminology and evidence-based information. This fine-tuned model would then deliver responses that are more accurate for medical questions, compared to a general-purpose LLM.</p><h3>Comparison Between RAG and Fine-Tuning</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/902/1*4eFpulJhT0CnYee_Abo1Gg.jpeg" /><figcaption>RAG vs. Fine-Tuning</figcaption></figure><p><strong>Summary</strong>:<br>RAG is highly effective for tasks requiring current, context-specific responses and can adapt quickly to new information. Fine-tuning, however, excels when there is a need for deep, domain-specific expertise that a model must consistently demonstrate.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a8534d5ec30a" width="1" height="1" alt=""><hr><p><a href="https://medium.com/analytics-vidhya/rag-vs-fine-tuning-in-llm-a8534d5ec30a">RAG vs. 
Fine-Tuning in LLM</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[MediaPipe with Python for Dummies]]></title>
            <link>https://medium.com/analytics-vidhya/mediapipe-with-python-for-dummies-3d3021da6705?source=rss-5d3cb8c43298------2</link>
            <guid isPermaLink="false">https://medium.com/p/3d3021da6705</guid>
            <category><![CDATA[mediapipe]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[ml-so-good]]></category>
            <category><![CDATA[computer-vision]]></category>
            <category><![CDATA[augmented-reality]]></category>
            <dc:creator><![CDATA[Dr Sujoy K Goswami, hc]]></dc:creator>
            <pubDate>Mon, 22 Aug 2022 12:20:46 GMT</pubDate>
            <atom:updated>2022-09-04T21:33:36.660Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Hgg6bLceoIjubE2hBiJK4g.png" /></figure><p>MediaPipe is a project by Google that offers “open-source, cross-platform, customizable ML solutions for live and streaming media”. In other words, MediaPipe provides access to a wide variety of powerful machine learning models built with the hardware limitations of mobile devices in mind.</p><p>MediaPipe is available for C++, Android, and more, but in this tutorial we will work only with Python. For the basic ideas, see reference [1]. Here, we present a few examples with short code samples. Please note that we have used <strong>MediaPipe version 0.8.3</strong>.</p><p>Example-1: 3D Face Mesh</p><blockquote>Here we capture the 3D face mesh &amp; redraw it on a blank canvas to get an output like the one below:</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/802/1*VyPTcSjAfKaXkprIUC0ZVA.png" /></figure><pre><strong>#Code </strong>with comments<br>import cv2 as cv<br>import mediapipe as mp<br>import numpy as np</pre><pre>#face-mesh model (one face) &amp; drawing helpers<br>mpfacemesh = mp.solutions.face_mesh<br>FaceMesh = mpfacemesh.FaceMesh(max_num_faces=1)<br>mpdraw = mp.solutions.drawing_utils<br>drawspec1 = mpdraw.DrawingSpec(color = (255,255,0), circle_radius = 0, thickness = 1)<br>drawspec2 = mpdraw.DrawingSpec(color = (0,255,0), circle_radius = 0, thickness = 1)<br>webcam = cv.VideoCapture(0)</pre><pre>while True:<br>  <br> scc,img = webcam.read()<br> if not scc: #stop when no frame could be read<br>  break<br> img = cv.flip(img,1) #mirror the frame<br> h,w,c = img.shape<br> blank_img = np.zeros((h,w,c), np.uint8)<br> #MediaPipe expects RGB input, OpenCV delivers BGR<br> results = FaceMesh.process(cv.cvtColor(img, cv.COLOR_BGR2RGB))<br> <br> if results.multi_face_landmarks:<br>  for face_lm in results.multi_face_landmarks:<br>   img = blank_img #draw on the blank canvas instead of the frame<br>   mpdraw.draw_landmarks(img,face_lm,<br>         mpfacemesh.FACE_CONNECTIONS,<br>         drawspec1,drawspec2)<br> k = cv.waitKey(1)<br> if k == ord(&#39;q&#39;): #press q to quit<br>  break<br> cv.imshow(&#39;face mesh 3d&#39;, img)</pre><pre>webcam.release()  
<br>cv.destroyAllWindows()</pre><p>Example-2: Simple Augmented Reality</p><blockquote>Here we first detect the eyes &amp; eyebrows, then draw virtual spectacles (2D) to get an output like the one below:</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/802/1*-8exLucKlS4fu2OQVLKB1g.png" /></figure><pre><strong>#Code </strong>with comments<br>import cv2 as cv<br>import mediapipe as mp<br>import numpy as np</pre><pre>mpfacemesh = mp.solutions.face_mesh<br>FaceMesh = mpfacemesh.FaceMesh(max_num_faces=1)<br>mpdraw = mp.solutions.drawing_utils<br>drawspec1 = mpdraw.DrawingSpec(color = (255,255,0), circle_radius = 0, thickness = 1)<br>drawspec2 = mpdraw.DrawingSpec(color = (0,255,0), circle_radius = 0, thickness = 1)<br>webcam = cv.VideoCapture(0)</pre><pre>#following indices are available in mediapipe dev site<br>EYE_LEFT_CONTOUR = [<br>    249, 263, 362, 373, 374,<br>    380, 381, 382, 384, 385,<br>    386, 387, 388, 390, 398, 466]<br>EYE_RIGHT_CONTOUR = [<br>    7, 33, 133, 144, 145,<br>    153, 154, 155, 157, 158,<br>    159, 160, 161, 163, 173, 246]<br>LEFT_EYEBROW = [<br>    276, 282, 283, 285, 293, 295, 296, 300, 334, 336]<br> <br>RIGHT_EYEBROW = [<br>    46, 52, 53, 55, 63, 65, 66, 70, 105, 107]</pre><pre>while True:<br>  <br> scc,img = webcam.read()<br> if not scc: #stop when no frame could be read<br>  break<br> img = cv.flip(img,1) #mirror the frame<br> h,w,c = img.shape<br> #MediaPipe expects RGB input, OpenCV delivers BGR<br> results = FaceMesh.process(cv.cvtColor(img, cv.COLOR_BGR2RGB))<br> <br> if results.multi_face_landmarks:<br>  for face_lm in results.multi_face_landmarks:<br>   #landmarks are normalized, so scale them to pixel coordinates<br>   X=[]<br>   Y=[]<br>   for lm in face_lm.landmark:<br>    X.append(int(lm.x*w))<br>    Y.append(int(lm.y*h))<br>   #left eye center<br>   xl = int(np.mean([X[i] for i in EYE_LEFT_CONTOUR]))<br>   yl = int(np.mean([Y[i] for i in EYE_LEFT_CONTOUR]))<br>   cv.circle(img,(xl,yl),9,(255,0,255),7)<br>   #right eye center<br>   xr = int(np.mean([X[i] for i in EYE_RIGHT_CONTOUR]))<br>   yr = int(np.mean([Y[i] for i in EYE_RIGHT_CONTOUR]))<br>   cv.circle(img,(xr,yr),9,(255,0,255),7)<br>   
cv.line(img,(xl,yl),(xr,yr),(255,0,255),3)<br>   #eyebrows<br>   xlb = int(np.mean([X[i] for i in LEFT_EYEBROW]))<br>   ylb = int(np.mean([Y[i] for i in LEFT_EYEBROW]))<br>   xrb = int(np.mean([X[i] for i in RIGHT_EYEBROW]))<br>   yrb = int(np.mean([Y[i] for i in RIGHT_EYEBROW]))<br>   #final drawing<br>   cv.putText(img,&#39;*&#39;,(xl-9,yl+9),cv.FONT_HERSHEY_SIMPLEX,1,(0,255,0),3)<br>   cv.putText(img,&#39;*&#39;,(xr-9,yr+9),cv.FONT_HERSHEY_SIMPLEX,1,(0,255,0),3)<br>   cv.putText(img,&#39;^&#39;,(xlb-9,ylb),cv.FONT_HERSHEY_SIMPLEX,1,(0,255,0),3)<br>   cv.putText(img,&#39;^&#39;,(xrb-9,yrb),cv.FONT_HERSHEY_SIMPLEX,1,(0,255,0),3)<br>   <br> k = cv.waitKey(1)<br> if k == ord(&#39;q&#39;):<br>  break<br> cv.imshow(&#39;augmented reality&#39;, img)</pre><pre>webcam.release()  <br>cv.destroyAllWindows()</pre><p>Note that Example-1 gives 3D output, while Example-2 gives 2D output. If you like the post, please do clap. Stay connected for more posts on Vision. Thanks.</p><pre>References:</pre><pre>[1] <a href="https://google.github.io/mediapipe/">https://google.github.io/mediapipe/</a></pre><p><a href="https://medium.com/mlearning-ai/mlearning-ai-submission-suggestions-b51e2b130bfb">Mlearning.ai Submission Suggestions</a></p><hr><p><a href="https://medium.com/analytics-vidhya/mediapipe-with-python-for-dummies-3d3021da6705">MediaPipe with Python for Dummies</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[SUJOY Filter: A Generic First- Derivative Filter For Image Edge Detection]]></title>
            <link>https://medium.com/analytics-vidhya/sujoy-filter-a-better-first-derivative-approach-for-image-edge-detection-db8f19c8c502?source=rss-5d3cb8c43298------2</link>
            <guid isPermaLink="false">https://medium.com/p/db8f19c8c502</guid>
            <category><![CDATA[sobel-filter]]></category>
            <category><![CDATA[computer-vision]]></category>
            <category><![CDATA[open-source]]></category>
            <category><![CDATA[image-processing]]></category>
            <category><![CDATA[edge-detection]]></category>
            <dc:creator><![CDATA[Dr Sujoy K Goswami, hc]]></dc:creator>
            <pubDate>Sun, 12 Jun 2022 16:11:23 GMT</pubDate>
            <atom:updated>2025-04-13T13:26:22.448Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/689/1*DTNOvZfez6-xRk76WDG5Ww.png" /></figure><p>The SUJOY filter offers a first-derivative approach to image edge detection that improves on the other commonly used first-derivative methods (such as the Roberts, Prewitt, and Sobel operators).</p><p>The most general masks of the SUJOY filter for detecting image edges are given below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/480/1*yYrtv9RgzurbK16rUv4alA.png" /><figcaption>horizontal &amp; vertical masks of SUJOY filter</figcaption></figure><p>The full paper can be found <a href="https://www.ijert.org/research/a-better-first-derivative-approach-for-edge-detection-IJERTV2IS110616.pdf">here</a>. Open-source code is available <a href="https://juliaimages.org/stable/examples/contours/sujoy_edge_demo/#Edge-detection-using-Sujoy-Filter">here</a>.</p><p>[<a href="https://1drv.ms/u/c/e4ee02a8467815ba/ERUTvSsLQ-pLsZWaEdYYq-sB75K7Pi1x1g-R4vR7n1ih-A?e=8FxSIx">Android App</a> for Sujoy Filter]</p><p>Also note that using averages, medians, or weighted averages of the neighbors around pixels (r-1,c) &amp; (r+1,c) for the horizontal mask (similarly, pixels (r,c-1) &amp; (r,c+1) for the vertical mask; see the figure below) makes the SUJOY filter generic &amp; robust.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/552/1*Bqy4hjHqzMfrh6T_L0EaWA.png" /><figcaption>(r,c): candidate pixel</figcaption></figure><p><strong><em>Note:</em></strong><em> [1] The SUJOY filter has been accepted by a few open-source communities. [2] </em><strong><em>Please cite the publication</em></strong><em> as given </em><a href="https://www.ijert.org/a-better-first-derivative-approach-for-edge-detection-2"><em>here</em></a><em>. 
[3] I am looking for developers proficient in any programming language to help contribute this algorithm to other open-source platforms.</em></p><hr><p><a href="https://medium.com/analytics-vidhya/sujoy-filter-a-better-first-derivative-approach-for-image-edge-detection-db8f19c8c502">SUJOY Filter: A Generic First- Derivative Filter For Image Edge Detection</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Multilevel thresholding for image segmentation]]></title>
            <link>https://medium.com/analytics-vidhya/multilevel-thresholding-for-image-segmentation-d5805ad596b7?source=rss-5d3cb8c43298------2</link>
            <guid isPermaLink="false">https://medium.com/p/d5805ad596b7</guid>
            <category><![CDATA[image-segmentation]]></category>
            <category><![CDATA[otsu]]></category>
            <category><![CDATA[image-processing]]></category>
            <category><![CDATA[scikit-learn]]></category>
            <category><![CDATA[thresholding]]></category>
            <dc:creator><![CDATA[Dr Sujoy K Goswami, hc]]></dc:creator>
            <pubDate>Tue, 07 Sep 2021 11:52:48 GMT</pubDate>
            <atom:updated>2021-09-17T04:50:17.149Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/235/1*eQz9o4eKEh3wfqEZjr6LuQ.jpeg" /></figure><p>Thresholding techniques can be divided into bi-level and multi-level categories, depending on the number of image segments. In bi-level thresholding, the image is segmented into two different regions. Pixels with gray values greater than a certain value T are classified as object pixels, and the remaining pixels, with gray values less than or equal to T, are classified as background pixels.</p><p>Multilevel thresholding is a process that segments a gray-level image into several distinct regions. This technique determines more than one threshold for the given image and segments the image into brightness regions that correspond to one background and several objects. The method works very well for objects with colored or complex backgrounds, on which bi-level thresholding fails to produce satisfactory results.</p><p>The full paper can be found <a href="https://people.ece.cornell.edu/acharya/papers/mlt_thr_img.pdf">here</a>. There, the authors use the mean and the variance of the image to find optimum thresholds for segmenting the image into multiple levels. 
The algorithm is applied recursively on sub-ranges computed from the previous step so as to find a threshold and a new sub-range for the next step.</p><p>The Python (&gt;3.0) code for the above approach for n Thresholds is given below:</p><pre>import cv2<br>import numpy as np<br>import math<br><br>img = cv2.imread(&#39;path-to-image&#39;)<br>img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)<br>a = 0<br>b = 255<br>n = 6 # number of thresholds (better choose even value)<br>k = 0.7 # free variable to take any positive value<br>T = [] # list which will contain &#39;n&#39; thresholds<br><br>def multiThresh(img, a, b):<br>    if a&gt;b:<br>        s=-1<br>        m=-1<br>        return m,s<br><br>    img = np.array(img)<br>    t1 = (img&gt;=a)<br>    t2 = (img&lt;=b)<br>    X = np.multiply(t1,t2)<br>    Y = np.multiply(img,X)<br>    s = np.sum(X)<br>    m = np.sum(Y)/s<br>    return m,s<br><br>for i in range(int(n/2-1)):<br>    img = np.array(img)<br>    t1 = (img&gt;=a)<br>    t2 = (img&lt;=b)<br>    X = np.multiply(t1,t2)<br>    Y = np.multiply(img,X)<br>    mu = np.sum(Y)/np.sum(X)<br><br>    Z = Y - mu<br>    Z = np.multiply(Z,X)<br>    W = np.multiply(Z,Z)<br>    sigma = math.sqrt(np.sum(W)/np.sum(X))<br><br>    T1 = mu - k*sigma<br>    T2 = mu + k*sigma<br><br>    x, y = multiThresh(img, a, T1)<br>    w, z = multiThresh(img, T2, b)<br><br>    T.append(x)<br>    T.append(w)<br><br>    a = T1+1<br>    b = T2-1<br>    k = k*(i+1)<br><br>T1 = mu<br>T2 = mu+1<br>x, y = multiThresh(img, a, T1)<br>w, z = multiThresh(img, T2, b)    <br>T.append(x)<br>T.append(w)<br>T.sort()<br>print(T)</pre><p>You can find another approach, <a href="https://scikit-image.org/docs/dev/auto_examples/segmentation/plot_multiotsu.html">Multi-Otsu Thresholding</a>, by scikit-image library. Several other approaches are there. 
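</p><p>To make the mean-and-standard-deviation split used in the code above more concrete, here is a minimal pure-Python sketch of a single step on a toy list of gray values. The function name <em>split_step</em> and the sample values are assumptions made for illustration; they are not taken from the paper or from the code above.</p>

```python
import math


def split_step(pixels, a, b, k):
    """One step of the mean/standard-deviation split on gray values in [a, b].

    Returns the mean of the lower band, the mean of the upper band,
    and the narrowed (a, b) range for the next step.
    """
    in_range = [p for p in pixels if a <= p <= b]
    if not in_range:
        raise ValueError("no pixels fall inside the range [a, b]")

    # Mean and (population) standard deviation of the pixels in range.
    mu = sum(in_range) / len(in_range)
    sigma = math.sqrt(sum((p - mu) ** 2 for p in in_range) / len(in_range))

    # Boundaries of the two outer bands: [a, t1] and [t2, b].
    t1 = mu - k * sigma
    t2 = mu + k * sigma
    low = [p for p in in_range if p <= t1]
    high = [p for p in in_range if p >= t2]

    # Each outer band contributes its mean as a threshold (None if empty).
    low_mean = sum(low) / len(low) if low else None
    high_mean = sum(high) / len(high) if high else None
    return low_mean, high_mean, (t1 + 1, t2 - 1)


# Toy "image": three dark, three mid-gray and three bright pixels.
toy_pixels = [0, 10, 20, 100, 110, 120, 230, 240, 250]
low_t, high_t, next_range = split_step(toy_pixels, 0, 255, 0.7)
print(low_t, high_t, next_range)
```

<p>With these toy values the dark and bright pixels fall into the two outer bands, so the step yields one threshold per band, and the mid-gray pixels remain inside the narrowed range for the next step.</p><p>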
Thank you !!!</p><hr><p><a href="https://medium.com/analytics-vidhya/multilevel-thresholding-for-image-segmentation-d5805ad596b7">Multilevel thresholding for image segmentation</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[“AI Based COVID Social Distance Monitoring System”: Cost effective & easy deployable approach!]]></title>
            <link>https://medium.com/analytics-vidhya/ai-based-covid-social-distance-monitoring-system-cost-effective-easy-deployable-approach-cbee1dbd37c7?source=rss-5d3cb8c43298------2</link>
            <guid isPermaLink="false">https://medium.com/p/cbee1dbd37c7</guid>
            <category><![CDATA[coronavirus]]></category>
            <category><![CDATA[perspective]]></category>
            <category><![CDATA[computer-vision]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[social-distance]]></category>
            <dc:creator><![CDATA[Dr Sujoy K Goswami, hc]]></dc:creator>
            <pubDate>Fri, 03 Sep 2021 07:49:00 GMT</pubDate>
            <atom:updated>2024-11-02T15:52:40.027Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/760/1*YlJhSVuIuOH3bIBVTbEA1Q.jpeg" /></figure><p>The Coronavirus pandemic posed an unprecedented global threat, making physical distancing and mask-wearing essential in curbing the virus’s spread. Last year, I developed and deployed an <strong>AI-Based COVID Social Distance Monitoring and Mask Detection System</strong> across various facilities of my employer, <a href="https://www.tvsmotor.com/">TVS Motor</a> and its <a href="https://en.wikipedia.org/wiki/TVS_Group">Group of Companies</a>, both in India and abroad. Today, this system continues to function effectively, alerting teams to any safety violations in real-time. I am deeply honored to have received multiple awards and recognition globally for this work, especially given its importance during such a critical time.</p><p><em>Prerequisites: Computer Vision, Pedestrian Detection, Deep Learning, YOLO, OpenCV, COVID Protocol</em></p><p>The basis for the Social Distance Monitoring System was taken from Andrew Ng’s <a href="https://landing.ai/landing-ai-creates-an-ai-tool-to-help-customers-monitor-social-distancing-in-the-workplace/">Landing-AI</a>. Their approach transforms the perspective view into a bird’s-eye view to find the actual distances between persons.<br>However, this is computationally expensive, as every frame needs to be transformed. We don’t need a very accurate distance between two persons, right? It only needs to be roughly 6 ft.<br>Also, camera calibration in that approach requires the deployment team to be present at the site.<br>So, to remove the above challenges, I tweaked the idea a bit as described below; &amp; with this I deployed remotely at multiple places (India/abroad), i.e. without actually going to the site.<br>The fundamental idea is that as a person moves away from the camera, their on-screen height appears smaller &amp; it varies linearly (mostly) with distance. 
Of course, I am assuming that the world coordinate system &amp; the camera coordinate system have the same axis directions, which is usually the case.</p><p>Let h-&gt; minimum on-screen height of a person to monitor (i.e. a person whose on-screen height is less than h is not considered); d-&gt; minimum safe distance on screen (== 6 ft. in the real world);<br>H-&gt; average of the on-screen heights of the 2 persons detected;<br>D-&gt; projected safe distance on screen between the 2 persons detected;</p><p>Then,</p><blockquote><em>(d/h) == (D/H)</em></blockquote><p>We have the bounding boxes’ coordinates, so H is available. So, if we know the ratio (d/h), we can get D = H × (d/h). For most of the scenarios (I worked with ~15 different types/brands of cameras placed at different locations), I found that (h/d) ~2.5 (assuming that no kid is present in the working area; kids are shorter) gives pretty good results. You may fine-tune this value after observing a few alert snapshots.<br>Now, if E-&gt; Euclidean distance between the above 2 persons (we can get it from the bounding boxes’ coordinates), then an alert for a safe-distance violation is raised whenever E &lt; D.</p><p>I have deployed the solution with this tweak to ~100 cameras &amp; succeeded; the system is still working fine in all the places, giving alerts for violations. Needless to say, there will be a few false positives (Can you tell why? Write it in the comments.); but in this system we are worried about false negatives, right? 
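</p><p>The rule (d/h) == (D/H) and the check E &lt; D described above can be sketched in a few lines of plain Python. The function names, the box layout (left, top, width, height) and the pixel values below are assumptions chosen for illustration, with h/d = 2.5 as suggested in the text.</p>

```python
import math


def projected_safe_distance(avg_height, min_dist, min_height):
    """Return D from (d/h) == (D/H), i.e. D = H * d / h (all in pixels)."""
    return avg_height * min_dist / min_height


def is_violation(box_a, box_b, min_dist, min_height):
    """Check E < D for two boxes given as (left, top, width, height)."""
    # Box centers, used for the Euclidean distance E.
    ax, ay = box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0
    bx, by = box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0
    e = math.hypot(ax - bx, ay - by)
    # H is the average on-screen height of the two persons.
    avg_height = (box_a[3] + box_b[3]) / 2.0
    return e < projected_safe_distance(avg_height, min_dist, min_height)


MIN_HEIGHT = 100  # h: hypothetical minimum on-screen height in pixels
MIN_DIST = 40     # d: hypothetical safe distance in pixels, so h/d = 2.5

person_a = (100, 100, 50, 200)
person_b = (160, 100, 50, 200)  # center is 60 px away from person_a
person_c = (300, 100, 50, 200)  # center is 200 px away from person_a

close_pair = is_violation(person_a, person_b, MIN_DIST, MIN_HEIGHT)
far_pair = is_violation(person_a, person_c, MIN_DIST, MIN_HEIGHT)
print(close_pair, far_pair)
```

<p>Here both persons are 200 px tall on screen, so D is 80 px: the first pair (60 px apart) is flagged, the second pair (200 px apart) is not.</p><p>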
False negatives will be none, as long as the persons get detected.</p><p>The Python class “PeopleDetector”, which uses the OpenCV DNN module, is given below:</p><pre>import itertools<br>import cv2<br>import numpy as np<br><br>class PeopleDetector:<br>    flag = 0<br>    def __init__(self, mindist, minheight,<br>                yolocfg=&#39;yolo_weights/yolov3.cfg&#39;,<br>                yoloweights=&#39;yolo_weights/yolov3.weights&#39;,<br>                labelpath=&#39;yolo_weights/coco.names&#39;,<br>                confidence=0.5,<br>                nmsthreshold=0.5,<br>                ):<br>        self._yolocfg = yolocfg<br>        self._yoloweights = yoloweights<br>        self._confidence = confidence<br>        self._nmsthreshold = nmsthreshold<br>        self._labels = open(labelpath).read().strip().split(&quot;\n&quot;)<br>        self._colors = np.random.randint(<br>            0, 255, size=(len(self._labels), 3), dtype=&quot;uint8&quot;)<br>        self._net = None<br>        self._layer_names = None<br>        self._boxes = []<br>        self._confidences = []<br>        self._classIDs = []<br>        self._centers = []<br>        self._layerouts = []<br>        self._MIN_DIST = mindist<br>        self._mindistances = {}<br>        self._heights = []<br>        self._MIN_HEIGHT = minheight<br><br>    def load_network(self):<br>        self._net = cv2.dnn.readNetFromDarknet(<br>            self._yolocfg, self._yoloweights)<br>        self._net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)<br>        self._net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)<br>        # getUnconnectedOutLayers() returns Nx1 indices in older OpenCV<br>        # releases and a flat array in newer ones; flatten() handles both<br>        out_ids = np.array(self._net.getUnconnectedOutLayers()).flatten()<br>        all_names = self._net.getLayerNames()<br>        self._layer_names = [all_names[i - 1] for i in out_ids]<br>        print(&quot;people-detector model loaded successfully\n&quot;)<br><br>    def predict(self, image):<br>        blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),<br>                                     [0, 0, 0], 1, crop=False)<br>        
self._net.setInput(blob)<br>        self._layerouts = self._net.forward(self._layer_names)<br>        return(self._layerouts)<br><br>    def process_preds(self, image, outs, bbox_flag):<br>        (frameHeight, frameWidth) = image.shape[:2]<br>        for out in outs:<br>            for detection in out:<br>                scores = detection[5:]<br>                classId = np.argmax(scores)<br>                if classId != 0:  # filter person class<br>                    continue<br>                confidence = scores[classId]<br>                if confidence &gt; self._confidence:<br>                    center_x = int(detection[0] * frameWidth)<br>                    center_y = int(detection[1] * frameHeight)<br>                    width = int(detection[2] * frameWidth)<br>                    height = int(detection[3] * frameHeight)<br>                    left = int(center_x - width / 2.0)<br>                    top = int(center_y - height / 2.0)<br>                    if height&gt;self._MIN_HEIGHT and width&lt;frameWidth/2.0 and height&lt;frameHeight/2.0:<br>                        self._classIDs.append(classId)<br>                        self._confidences.append(float(confidence))<br>                        self._boxes.append([left, top, width, height])<br>                        #self._centers.append((center_x, center_y))<br>                        #self._heights.append(height)<br>        indices = cv2.dnn.NMSBoxes(<br>            self._boxes, self._confidences, self._confidence, self._nmsthreshold)<br><br>        # NMSBoxes returns Nx1 indices in older OpenCV releases and a flat<br>        # array in newer ones; flatten() handles both<br>        for i in np.array(indices).flatten():<br>            box = self._boxes[i]<br>            left = box[0]<br>            top = box[1]<br>            width = box[2]<br>            height = box[3]<br>            center_x = int(left + width/2.0)<br>            center_y = int(top + height/2.0)<br>            self._centers.append((center_x, center_y))<br>            self._heights.append(height)<br>            self.find_min_distance(self._centers, 
self._heights)<br>            if len(self._mindistances)&gt;0: PeopleDetector.flag = 1<br>            else: PeopleDetector.flag = 0<br>            if bbox_flag:<br>                self.draw_pred(image, self._classIDs[i], self._confidences[i], left,<br>                           top, left + width, top + height)<br><br>        return PeopleDetector.flag #self._centers<br><br>    def clear_preds(self):<br>        self._boxes = []<br>        self._confidences = []<br>        self._classIDs = []<br>        self._centers = []<br>        self._layerouts = []<br>        self._mindistances = {}<br>        self._heights = []<br>        PeopleDetector.flag = 0<br><br>    def draw_pred(self, frame, classId, conf, left, top, right, bottom):<br>        cv2.rectangle(frame, (left, top), (right, bottom), (255, 178, 50), 2)<br>        for k in self._mindistances:<br>            cv2.line(frame, k[0], k[1], (0, 0, 255), 3)<br><br>    def find_min_distance(self, centers, heights):<br>        centers = self._centers<br>        heights = self._heights<br>        temp = list(itertools.combinations(heights, 2))<br>        comp = list(itertools.combinations(centers, 2))<br>        ecdist = []<br>        avghgt = []<br>        for pts in comp:<br>            ecdist.append(np.linalg.norm(np.asarray(pts[0])-np.asarray(pts[1])))<br>        for hts in temp:<br>            avghgt.append((hts[0]+hts[1])/2.0)<br>        for i in range(len(avghgt)):<br>            rel_dist = self._MIN_DIST*avghgt[i]/self._MIN_HEIGHT<br>            if ecdist[i] &lt; rel_dist:<br>                self._mindistances.update({comp[i]: ecdist[i]})</pre><p><strong>Sample output results can be found </strong><a href="https://drive.google.com/drive/folders/131P928vQmHApWGeonw9MgHZDhYhJebKd?usp=sharing"><strong>here</strong></a><strong>. 
Please note that the RED lines/boxes in the images/videos represent COVID protocol violations.</strong></p><p>One could use the width of the person instead of the height; however, that is not recommended. Can you tell why? Write it in the comments!</p><hr><p><a href="https://medium.com/analytics-vidhya/ai-based-covid-social-distance-monitoring-system-cost-effective-easy-deployable-approach-cbee1dbd37c7">“AI Based COVID Social Distance Monitoring System”: Cost effective &amp; easy deployable approach!</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Video Classification Based On Action (from scratch without GPU)]]></title>
            <link>https://medium.com/analytics-vidhya/video-classification-based-on-action-without-gpu-f96ec9555197?source=rss-5d3cb8c43298------2</link>
            <guid isPermaLink="false">https://medium.com/p/f96ec9555197</guid>
            <category><![CDATA[action-recognition]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[lstm]]></category>
            <category><![CDATA[video-classification]]></category>
            <category><![CDATA[video-processing]]></category>
            <dc:creator><![CDATA[Dr Sujoy K Goswami, hc]]></dc:creator>
            <pubDate>Thu, 16 Apr 2020 08:56:07 GMT</pubDate>
            <atom:updated>2021-10-03T07:40:43.765Z</atom:updated>
            <content:encoded><![CDATA[<h3>Video Classification Based On Action (from scratch &amp; without GPU support)</h3><p>NO GPU!! NO EXTERNAL HEAVY DATA-SET!! Read on to learn &amp; implement a basic video classification technique based on temporal action, on any machine.</p><p>Here I shall create my own video data, in which a rectangle moves across the frame (to the left in this first sample). The sample code (use <strong>Jupyter Notebook</strong>) is below:</p><pre>import numpy as np<br>import skvideo.io as sk</pre><pre># creating sample video data<br>num_vids = 5<br>num_imgs = 100<br>img_size = 50<br>min_object_size = 1<br>max_object_size = 5<br> <br>for i_vid in range(num_vids):<br>    imgs = np.zeros((num_imgs, img_size, img_size)) # set background to 0<br>    vid_name = 'vid' + str(i_vid) + '.mp4'<br>    w, h = np.random.randint(min_object_size, max_object_size, size=2)<br>    x = np.random.randint(0, img_size - w)<br>    y = np.random.randint(0, img_size - h)<br>    i_img = 0<br>    while x&gt;0: # move the rectangle one pixel to the left per frame<br>        imgs[i_img, y:y+h, x:x+w] = 255 # set rectangle as foreground<br>        x = x-1<br>        i_img = i_img+1<br>    sk.vwrite(vid_name, imgs.astype(np.uint8))</pre><pre># play a video<br>from IPython.display import Video<br>Video("vid3.mp4") # the script &amp; video should be in same folder</pre><p>Now I shall create 4 different types of videos, in which a rectangle moves in one of 4 directions: left, right, up, down. Accordingly, there will be 4 classes, which I shall classify from these video data using deep learning. 
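</p><p>The four cases differ only in the direction of the per-frame step, so the idea can also be sketched with a single helper. This is a minimal sketch under stated assumptions, not the code used in the rest of this post: the names STEPS and make_clip are hypothetical, the sizes mirror the values used below, and (unlike the loops below) the rectangle keeps sliding until it has left the frame in every direction.</p><pre>import numpy as np

# per-frame (dx, dy) step for each class index: 0=left, 1=right, 2=up, 3=down
# (hypothetical lookup table, introduced only for this sketch)
STEPS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

def make_clip(label, num_imgs=30, img_size=20, min_object_size=1, max_object_size=5):
    # returns one clip of shape (num_imgs, img_size, img_size) with a moving rectangle
    dx, dy = STEPS[label]
    w, h = np.random.randint(min_object_size, max_object_size, size=2)
    x = np.random.randint(0, img_size - w)
    y = np.random.randint(0, img_size - h)
    imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0
    for i_img in range(num_imgs):
        # max(..., 0) keeps the slice bounds non-negative, so the rectangle
        # slides out of the frame instead of wrapping around to the other side
        imgs[i_img, max(y, 0):max(y + h, 0), max(x, 0):max(x + w, 0)] = 255
        x, y = x + dx, y + dy
    return imgs

# one label per video: 30 clips for each of the 4 classes
labels = np.repeat(np.arange(4), 30)
X = np.array([make_clip(k) for k in labels], dtype=np.float32) / 255
print(X.shape, labels.shape)
</pre><p>Because exactly one label is stored per clip, the label array always has the same length as the list of videos, whatever the number of frames per video.</p><p>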
Go through the code below (with <strong>python 3.6.9, keras 2.2.4</strong> in <strong>Jupyter Notebook</strong>), and be sure to read the comments.</p><pre>import numpy as np</pre><pre><strong># preparing dataset</strong><br>X_train = []<br>Y_train = []<br>labels = enumerate(['left', 'right', 'up', 'down']) #4 classes</pre><pre>num_vids = 30<br>num_imgs = 30<br>img_size = 20<br>min_object_size = 1<br>max_object_size = 5</pre><pre># video frames with left moving object<br>for i_vid in range(num_vids):<br>    imgs = np.zeros((num_imgs, img_size, img_size)) # set background to 0<br>    #vid_name = 'vid' + str(i_vid) + '.mp4'<br>    w, h = np.random.randint(min_object_size, max_object_size, size=2)<br>    x = np.random.randint(0, img_size - w)<br>    y = np.random.randint(0, img_size - h)<br>    i_img = 0<br>    while x&gt;0:<br>        imgs[i_img, y:y+h, x:x+w] = 255 # set rectangle as foreground<br>        x = x-1<br>        i_img = i_img+1<br>    X_train.append(imgs)<br>for i in range(0,num_vids): # one label per video<br>    Y_train.append(0)</pre><pre># video frames with right moving object<br>for i_vid in range(num_vids):<br>    imgs = np.zeros((num_imgs, img_size, img_size)) # set background to 0<br>    #vid_name = 'vid' + str(i_vid) + '.mp4'<br>    w, h = np.random.randint(min_object_size, max_object_size, size=2)<br>    x = np.random.randint(0, img_size - w)<br>    y = np.random.randint(0, img_size - h)<br>    i_img = 0<br>    while x&lt;img_size:<br>        imgs[i_img, y:y+h, x:x+w] = 255 # set rectangle as foreground<br>        x = x+1<br>        i_img = i_img+1<br>    X_train.append(imgs)<br>for i in range(0,num_vids): # one label per video<br>    Y_train.append(1)</pre>
<pre># video frames with up moving object<br>for i_vid in range(num_vids):<br>    imgs = np.zeros((num_imgs, img_size, img_size)) # set background to 0<br>    #vid_name = 'vid' + str(i_vid) + '.mp4'<br>    w, h = np.random.randint(min_object_size, max_object_size, size=2)<br>    x = np.random.randint(0, img_size - w)<br>    y = np.random.randint(0, img_size - h)<br>    i_img = 0<br>    while y&gt;0:<br>        imgs[i_img, y:y+h, x:x+w] = 255 # set rectangle as foreground<br>        y = y-1<br>        i_img = i_img+1<br>    X_train.append(imgs)<br>for i in range(0,num_vids): # one label per video<br>    Y_train.append(2)<br> <br># video frames with down moving object<br>for i_vid in range(num_vids):<br>    imgs = np.zeros((num_imgs, img_size, img_size)) # set background to 0<br>    #vid_name = 'vid' + str(i_vid) + '.mp4'<br>    w, h = np.random.randint(min_object_size, max_object_size, size=2)<br>    x = np.random.randint(0, img_size - w)<br>    y = np.random.randint(0, img_size - h)<br>    i_img = 0<br>    while y&lt;img_size:<br>        imgs[i_img, y:y+h, x:x+w] = 255 # set rectangle as foreground<br>        y = y+1<br>        i_img = i_img+1<br>    X_train.append(imgs)<br>for i in range(0,num_vids): # one label per video<br>    Y_train.append(3)</pre><pre># data pre-processing<br>from keras.utils import np_utils<br>X_train=np.array(X_train, dtype=np.float32) /255<br>X_train=X_train.reshape(X_train.shape[0], num_imgs, img_size, img_size, 1)<br>print(X_train.shape)<br>Y_train=np.array(Y_train, dtype=np.uint8)<br>Y_train = Y_train.reshape(X_train.shape[0], 1)<br>print(Y_train.shape)<br>Y_train = np_utils.to_categorical(Y_train, 4)</pre><blockquote>(120, 30, 20, 20, 1)<br>(120, 1)</blockquote><pre><strong># building model</strong><br>from keras.models import Sequential<br>from keras.layers import Dense, Conv2D, Flatten, Dropout<br>from keras.layers.pooling import MaxPooling2D<br>from keras.layers.recurrent import LSTM<br>from keras.layers.wrappers import TimeDistributed</pre><pre>model = Sequential()<br># TimeDistributed applies the same layer to every frame; the LSTM then models the temporal order<br>model.add(TimeDistributed(Conv2D(8, (3, 3), strides=(1, 1), activation='relu', padding='same'), input_shape=(num_imgs, img_size, img_size, 1)))<br>model.add(TimeDistributed(Conv2D(8, (3,3), kernel_initializer="he_normal", activation='relu')))<br>model.add(TimeDistributed(MaxPooling2D((1, 1), strides=(1, 1))))<br>model.add(TimeDistributed(Flatten()))<br>model.add(Dropout(0.3))<br>model.add(LSTM(64, return_sequences=False, dropout=0.3))<br>model.add(Dense(4, 
activation='softmax'))<br>model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])<br>model.summary()</pre><pre><strong># model training</strong><br>model.fit(X_train, Y_train, epochs=50, verbose=1)</pre><pre><strong># model testing with new data (4 videos)</strong><br>X_test=[]<br>Y_test=[]<br>for i_vid in range(2):<br>    imgs = np.zeros((num_imgs, img_size, img_size)) # set background to 0<br>    w, h = np.random.randint(min_object_size, max_object_size, size=2)<br>    x = np.random.randint(0, img_size - w)<br>    y = np.random.randint(0, img_size - h)<br>    i_img = 0<br>    while x&lt;img_size:<br>        imgs[i_img, y:y+h, x:x+w] = 255 # set rectangle as foreground<br>        x = x+1<br>        i_img = i_img+1<br>    X_test.append(imgs)<br># 2nd class - 'right'</pre><pre>for i_vid in range(2):<br>    imgs = np.zeros((num_imgs, img_size, img_size)) # set background to 0<br>    w, h = np.random.randint(min_object_size, max_object_size, size=2)<br>    x = np.random.randint(0, img_size - w)<br>    y = np.random.randint(0, img_size - h)<br>    i_img = 0<br>    while y&lt;img_size:<br>        imgs[i_img, y:y+h, x:x+w] = 255 # set rectangle as foreground<br>        y = y+1<br>        i_img = i_img+1<br>    X_test.append(imgs)<br># 4th class - 'down'</pre><pre>X_test=np.array(X_test, dtype=np.float32) /255<br>X_test=X_test.reshape(X_test.shape[0], num_imgs, img_size, img_size, 1)</pre><pre>pred=model.predict_classes(X_test)<br>pred</pre><blockquote>array([1, 1, 3, 3], dtype=int64)</blockquote><p>Here all 4 test videos are classified correctly.</p><p>Thanks for reading. 
Also go through my very first related post <a href="https://medium.com/tvs-motors-technology-blog/learning-cnn-using-simple-keras-python-programs-7be7b9efa852">here</a>.</p><hr><p><a href="https://medium.com/analytics-vidhya/video-classification-based-on-action-without-gpu-f96ec9555197">Video Classification Based On Action (from scratch without GPU)</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Ensemble Learning : Simple Techniques Implemented On Image Data]]></title>
            <link>https://medium.com/analytics-vidhya/ensemble-learning-simple-techniques-implemented-on-image-data-4885797e12a2?source=rss-5d3cb8c43298------2</link>
            <guid isPermaLink="false">https://medium.com/p/4885797e12a2</guid>
            <category><![CDATA[image-data]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[convolutional-neural-net]]></category>
            <category><![CDATA[ensemble-learning]]></category>
            <category><![CDATA[computer-vision]]></category>
            <dc:creator><![CDATA[Dr Sujoy K Goswami, hc]]></dc:creator>
            <pubDate>Fri, 10 Apr 2020 19:44:41 GMT</pubDate>
            <atom:updated>2020-04-14T14:45:43.274Z</atom:updated>
            <content:encoded><![CDATA[<h3>Ensemble Learning : Simple Techniques Implemented On Image Data</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*sSSHJeUE2WHp3xD35NoJ9w.png" /></figure><p>Ensemble models in machine learning combine the decisions from multiple models to improve the overall performance. This can be achieved in various ways. Here I will implement two simple ways (on Image Data):</p><ol><li><strong>Averaging:</strong> Multiple models are used to make predictions for each data point. Average of predictions from all the models is used to make the final prediction</li><li><strong>Max Voting:</strong> Multiple models are used to make predictions for each data point. The predictions by each model are considered as a ‘vote’. The predictions which we get from the majority of the models are used as the final prediction.</li></ol><p><strong>Implementation on MNIST data (python 3.6.9, keras 2.2.4)</strong></p><pre><strong>#CNN models</strong></pre><pre>from keras.callbacks import ModelCheckpoint<br>from keras.datasets import mnist<br>from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dropout, Activation, Average<br>from keras.losses import categorical_crossentropy<br>from keras.models import Model, Input<br>from keras.optimizers import Adam<br>from keras.utils import to_categorical</pre><pre>from tensorflow.python.framework.ops import Tensor<br>from scipy.stats import mode<br>from typing import List<br>import glob<br>import numpy as np<br>import os</pre><pre># data processing<br>def load_data():<br>    <br>    (x_train, y_train), (x_test, y_test) = mnist.load_data()<br>    x_train = x_train / 255.<br>    x_test = x_test / 255.<br>    y_train = to_categorical(y_train, num_classes=10)<br>    return x_train, x_test, y_train, y_test</pre><pre>x_train, x_test, y_train, y_test = load_data()<br>x_train = x_train.reshape((x_train.shape[0], 28, 28, 1))<br>x_test = x_test.reshape((x_test.shape[0], 28, 28, 
1))<br>input_shape = x_train[0].shape<br>model_input = Input(shape=input_shape)</pre><pre># models(3) building<br>def first(model_input: Tensor):<br>    <br>    x = Conv2D(96, kernel_size=(3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(model_input)<br>    x = Conv2D(96, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = Conv2D(96, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = MaxPooling2D(pool_size=(3, 3), strides = 2)(x)<br>    x = Conv2D(192, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = Conv2D(192, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = Conv2D(192, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = MaxPooling2D(pool_size=(3, 3), strides = 2)(x)<br>    x = Conv2D(192, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = Conv2D(192, (1, 1), activation=&#39;relu&#39;)(x)<br>    x = Conv2D(10, (1, 1))(x)<br>    x = GlobalAveragePooling2D()(x)<br>    x = Activation(activation=&#39;softmax&#39;)(x)<br>    <br>    model = Model(model_input, x, name=&#39;first&#39;)<br>    return model</pre><pre>def second(model_input: Tensor):<br>    <br>    x = Conv2D(96, kernel_size=(3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(model_input)<br>    x = Conv2D(96, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = Conv2D(96, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;, strides = 2)(x)<br>    x = Conv2D(192, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = Conv2D(192, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = Conv2D(192, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;, strides = 2)(x)<br>    x = Conv2D(192, (3, 3), activation=&#39;relu&#39;, padding = &#39;same&#39;)(x)<br>    x = Conv2D(192, (1, 1), activation=&#39;relu&#39;)(x)<br>    x = Conv2D(10, (1, 1))(x)<br>    x = 
GlobalAveragePooling2D()(x)<br>    x = Activation(activation=&#39;softmax&#39;)(x)<br>        <br>    model = Model(model_input, x, name=&#39;second&#39;)<br>    return model</pre><pre>def third(model_input: Tensor):<br>    <br>    #mlpconv block 1<br>    x = Conv2D(32, (5, 5), activation=&#39;relu&#39;,padding=&#39;valid&#39;)(model_input)<br>    x = Conv2D(32, (1, 1), activation=&#39;relu&#39;)(x)<br>    x = Conv2D(32, (1, 1), activation=&#39;relu&#39;)(x)<br>    x = MaxPooling2D((2,2))(x)<br>    x = Dropout(0.5)(x)<br>    <br>    #mlpconv block2<br>    x = Conv2D(64, (3, 3), activation=&#39;relu&#39;,padding=&#39;valid&#39;)(x)<br>    x = Conv2D(64, (1, 1), activation=&#39;relu&#39;)(x)<br>    x = Conv2D(64, (1, 1), activation=&#39;relu&#39;)(x)<br>    x = MaxPooling2D((2,2))(x)<br>    x = Dropout(0.5)(x)<br>    <br>    #mlpconv block3<br>    x = Conv2D(128, (3, 3), activation=&#39;relu&#39;,padding=&#39;valid&#39;)(x)<br>    x = Conv2D(32, (1, 1), activation=&#39;relu&#39;)(x)<br>    x = Conv2D(10, (1, 1))(x)<br>    <br>    x = GlobalAveragePooling2D()(x)<br>    x = Activation(activation=&#39;softmax&#39;)(x)<br>    <br>    model = Model(model_input, x, name=&#39;third&#39;)<br>    return model</pre><pre>first_model = first(model_input)<br>second_model = second(model_input)<br>third_model = third(model_input)</pre><pre># models compilation &amp; training<br>def compile_and_train(model: Model, num_epochs: int): <br>    <br>    model.compile(loss=categorical_crossentropy, optimizer=Adam(), metrics=[&#39;acc&#39;]) <br>    filepath = &#39;weights/&#39; + model.name + &#39;.hdf5&#39;<br>    checkpoint = ModelCheckpoint(filepath, monitor=&#39;loss&#39;, verbose=0, save_weights_only=True,<br>                                                 save_best_only=True, mode=&#39;auto&#39;, period=1)<br>    history = model.fit(x=x_train, y=y_train, batch_size=32, <br>                     epochs=num_epochs, verbose=1, callbacks=[checkpoint], validation_split=0.2)<br>    return 
filepath</pre><pre>NUM_EPOCHS = 5<br>first_weight_file = compile_and_train(first_model, NUM_EPOCHS)<br>second_weight_file = compile_and_train(second_model, NUM_EPOCHS)<br>third_weight_file = compile_and_train(third_model, NUM_EPOCHS)</pre><pre># models evaluation<br>def evaluate_error(model: Model):<br>    pred = model.predict(x_test, batch_size = 32)<br>    pred = np.argmax(pred, axis=1)<br>    error = np.sum(np.not_equal(pred, y_test))/ y_test.shape[0]  <br>    return error</pre><pre>e1=evaluate_error(first_model); print(e1)<br>e2=evaluate_error(second_model); print(e2)<br>e3=evaluate_error(third_model); print(e3)</pre><blockquote><em>Output errors:</em></blockquote><blockquote><em>0.0083<br>0.0112<br>0.0113</em></blockquote><pre><strong>#Ensemble models</strong></pre><pre>all_models = [first_model, second_model, third_model]<br>first_model.load_weights(first_weight_file)<br>second_model.load_weights(second_weight_file)<br>third_model.load_weights(third_weight_file)</pre><pre>def ensemble_average(models: List[Model]): # averaging<br>    <br>    # average the softmax outputs of the models passed in<br>    outputs = [model.outputs[0] for model in models]<br>    y = Average()(outputs)<br>    <br>    model = Model(model_input, y, name=&#39;ensemble_average&#39;)<br>    E = evaluate_error(model)<br>    return E</pre><pre>def ensemble_vote(models: List[Model]): # max-voting<br>    <br>    pred = []<br>    yhats = [model.predict(x_test) for model in models]<br>    yhats = np.argmax(yhats, axis=2) # shape: (number of models, number of test images)<br>    yhats = np.array(yhats)<br>    #print(yhats.shape)<br>    for i in range(0,len(x_test)):<br>        m = mode(yhats[:, i]) # most frequent label among the models for image i<br>        pred = np.append(pred, m[0])<br>    E = np.sum(np.not_equal(pred, y_test))/ y_test.shape[0]  <br>    return E</pre><pre>E1 = ensemble_average(all_models); print(E1)<br>E2 = ensemble_vote(all_models); print(E2)</pre><blockquote><em>Output errors:</em></blockquote><blockquote><em>0.0061<br>0.0068</em></blockquote><p>Clearly, ensemble learning gives better accuracy here: both ensembles (errors 0.0061 and 0.0068) beat the best single model (error 0.0083).</p><p>References:</p><ol><li><a href="https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/">https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/</a></li><li><a href="https://towardsdatascience.com/ensembling-convnets-using-keras-237d429157eb">https://towardsdatascience.com/ensembling-convnets-using-keras-237d429157eb</a></li><li><a href="https://machinelearningmastery.com/horizontal-voting-ensemble/">https://machinelearningmastery.com/horizontal-voting-ensemble/</a></li></ol><hr><p><a href="https://medium.com/analytics-vidhya/ensemble-learning-simple-techniques-implemented-on-image-data-4885797e12a2">Ensemble Learning : Simple Techniques Implemented On Image Data</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Learning CNN (with Image Data) using Simple PYTHON Programs]]></title>
            <link>https://medium.com/analytics-vidhya/learning-cnn-using-simple-keras-python-programs-7be7b9efa852?source=rss-5d3cb8c43298------2</link>
            <guid isPermaLink="false">https://medium.com/p/7be7b9efa852</guid>
            <category><![CDATA[image-classification]]></category>
            <category><![CDATA[convolution-neural-net]]></category>
            <category><![CDATA[keras]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[computer-vision]]></category>
            <dc:creator><![CDATA[Dr Sujoy K Goswami, hc]]></dc:creator>
            <pubDate>Sat, 07 Apr 2018 14:58:18 GMT</pubDate>
            <atom:updated>2023-07-25T16:08:06.022Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/259/1*b4YRt9UfsjJgymC6ljVZ2A.png" /><figcaption>CNN</figcaption></figure><p>[Edited &amp; revised in July 2023]</p><p>Here I share what I learned while studying CNNs, through small, simple code examples that are quick to understand. Python (≥3.6) &amp; Tensorflow (≥2.3) are used. A Jupyter notebook is needed to run these examples. What’s more? Run the code &amp; have fun!</p><h4>1. Handwriting Recognition</h4><p><em>Here the MNIST dataset is downloaded. After training &amp; validating the model, its performance is estimated on the test data. A GPU or higher RAM is required to run the code. An internet connection is also required.</em></p><pre>#importing libraries<br>from tensorflow.keras.datasets import mnist<br>from tensorflow.keras.models import Sequential<br>from tensorflow.keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D<br>from tensorflow.keras import utils</pre><pre>#loading MNIST data &amp; reshaping<br>(X_train, y_train), (X_test, y_test) = mnist.load_data()<br>X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype(&#39;float32&#39;)<br>X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype(&#39;float32&#39;)</pre><pre>#data pre-processing<br>X_train = X_train / 255<br>X_test = X_test / 255<br>y_train = utils.to_categorical(y_train)<br>y_test = utils.to_categorical(y_test)<br>num_classes = y_test.shape[1]</pre><pre>#function for creating deep network model<br>def create_model():<br>    model = Sequential()<br>    model.add(Conv2D(32, (3, 3), input_shape=(28, 28,1), activation=&#39;relu&#39;))<br>    model.add(MaxPooling2D(pool_size=(2, 2)))<br>    model.add(Dropout(0.2))<br>    model.add(Flatten())<br>    model.add(Dense(128, activation=&#39;relu&#39;))<br>    model.add(Dense(num_classes, activation=&#39;softmax&#39;))<br>    model.compile(loss=&#39;categorical_crossentropy&#39;, optimizer=&#39;adam&#39;, metrics=[&#39;accuracy&#39;])<br>    return model</pre><pre>#training, validating &amp; testing<br>#note: the test set doubles as the validation data here<br>model = create_model()<br>model.summary()<br>model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=1)<br>scores = model.evaluate(X_test, y_test, verbose=1)<br>print(&quot;CNN Error: %.2f%%&quot; % (100-scores[1]*100))</pre><h4>2. Object Recognition</h4><p><em>Here a VGG16 network pre-trained on the ImageNet dataset is used to recognize a common real-life object. A GPU is not required, but an internet connection is.</em></p><pre>#importing libraries<br>import numpy as np<br>from IPython.display import Image, display<br>from tensorflow.keras.applications import VGG16, imagenet_utils<br>from tensorflow.keras.preprocessing.image import img_to_array, load_img</pre><pre>#pre-processing input<br>inputShape = (224, 224)<br>preprocess = imagenet_utils.preprocess_input</pre><pre>#loading VGG16 with &#39;imagenet&#39; pre-trained weights<br>model = VGG16(weights=&quot;imagenet&quot;)</pre><pre>#displaying, loading &amp; pre-processing test image (set the path to your own test image)<br>display(Image(&#39;./test.jpg&#39;))<br>image = load_img(&quot;./test.jpg&quot;, target_size=inputShape)<br>image = img_to_array(image)<br>image = np.expand_dims(image, axis=0)<br>image = preprocess(image)</pre><pre>#predicting the output<br>preds = model.predict(image)<br>P = imagenet_utils.decode_predictions(preds)<br>for (i, (imagenetID, label, prob)) in enumerate(P[0]):<br>    print(&quot;{}. {}: {:.2f}%&quot;.format(i + 1, label, prob * 100))</pre><h4>3. Single Object Detection (with Bounding Box)</h4><p><em>Here the dataset is generated synthetically. Each image contains one rectangle as the object. A simple neural network is used. 
GPU/ Internet is not required.</em></p><pre>#importing libraries<br>import numpy as np<br>import matplotlib.pyplot as plt<br>import matplotlib</pre><pre>#creating database<br>num_imgs = 1000<br>img_size = 8<br>min_object_size = 1<br>max_object_size = 4<br>num_objects = 1<br>bboxes = np.zeros((num_imgs, num_objects, 4))<br>imgs = np.zeros((num_imgs, img_size, img_size))  # set background to 0<br>for i_img in range(num_imgs):<br>    for i_object in range(num_objects):<br>        w, h = np.random.randint(min_object_size, max_object_size, size=2)<br>        x = np.random.randint(0, img_size - w)<br>        y = np.random.randint(0, img_size - h)<br>        imgs[i_img, x:x+w, y:y+h] = 1.  # set rectangle to 1<br>        bboxes[i_img, i_object] = [x, y, w, h]<br>        <br>imgs.shape, bboxes.shape</pre><pre>#plotting sample data<br>i = 0<br>plt.imshow(imgs[i].T, cmap=&#39;Greys&#39;, interpolation=&#39;none&#39;, origin=&#39;lower&#39;, extent=[0, img_size, 0, img_size])<br>for bbox in bboxes[i]:<br>    plt.gca().add_patch(matplotlib.patches.Rectangle((bbox[0], bbox[1]), bbox[2], bbox[3], ec=&#39;r&#39;, fc=&#39;none&#39;))<br>    <br>#reshaping input<br>X = (imgs.reshape(num_imgs, -1) - np.mean(imgs)) / np.std(imgs)<br>X.shape, np.mean(X), np.std(X)</pre><pre>#reshaping output<br>y = bboxes.reshape(num_imgs, -1) / img_size<br>y.shape, np.mean(y), np.std(y)</pre><pre>#final training &amp; testing data<br>i = int(0.8 * num_imgs)<br>train_X = X[:i]<br>test_X = X[i:]<br>train_y = y[:i]<br>test_y = y[i:]<br>test_imgs = imgs[i:]<br>test_bboxes = bboxes[i:]</pre><pre>#creating deep network model<br>from tensorflow.keras.models import Sequential<br>from tensorflow.keras.layers import Dense, Activation, Dropout, Convolution2D, MaxPooling2D <br>from tensorflow.keras.optimizers import SGD<br>model = Sequential([<br>        Dense(500, input_dim=X.shape[-1]),<br>        Activation(&#39;relu&#39;),<br>        Dense(300), <br>        Activation(&#39;relu&#39;), <br>        Dense(100), 
<br>        Activation(&#39;relu&#39;), <br>        Dropout(0.2), <br>        Dense(y.shape[-1])<br>    ])<br>model.compile(&#39;adadelta&#39;, &#39;mse&#39;)</pre><pre>#training &amp; validating<br>model.fit(train_X, train_y, epochs=100, validation_data=(test_X, test_y), verbose=2)</pre><pre>#predicting on test data<br>pred_y = model.predict(test_X)<br>pred_bboxes = pred_y * img_size<br>pred_bboxes = pred_bboxes.reshape(len(pred_bboxes), num_objects, -1)<br>pred_bboxes.shape</pre><pre>#plotting the prediction<br>plt.figure(figsize=(12, 3))<br>for i_subplot in range(1, 6):<br>    plt.subplot(1, 5, i_subplot)<br>    i = np.random.randint(len(test_imgs))<br>    plt.imshow(test_imgs[i].T, cmap=&#39;Greys&#39;, interpolation=&#39;none&#39;, origin=&#39;lower&#39;, extent=[0, img_size, 0, img_size])<br>    for pred_bbox, exp_bbox in zip(pred_bboxes[i], test_bboxes[i]):<br>        plt.gca().add_patch(matplotlib.patches.Rectangle((pred_bbox[0], pred_bbox[1]), pred_bbox[2], pred_bbox[3], ec=&#39;r&#39;, fc=&#39;none&#39;))</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/700/1*r8CvNytQIK4PGxD52VHgcA.png" /><figcaption>Sample Outputs</figcaption></figure><h4>4. Multiple Objects Detection (with Shapes)</h4><p><em>Here dataset is getting created. Read the comments carefully. GPU/ Internet is not required.</em></p><pre># importing libraries<br>import numpy as np<br>import matplotlib.pyplot as plt<br>import matplotlib<br># creating dataset<br># here 0-4 black objects (different shapes with random sizes) are placed in a noisy image (24 x 24). <br># the image is divided into 4 quadrants (w.r.t. 
image center) &amp; each quadrant randomly contains 0 or 1 object.<br># 4000 such images are generated.<br># the objects with rectangular &amp; lower-triangular shapes are the objects of interest.<br># the upper-triangular shapes are dummies.<br># due to randomness, a few images may be blank or contain only an upper-triangular shape (dummy object).<br># bounding boxes of the objects of interest are also saved.<br>num_imgs = 4000<br>img_size = 24<br>min_rect_size = 3<br>max_rect_size = 9<br>max_num_objects = 5<br>bboxes = np.zeros((num_imgs, max_num_objects, 4))<br>imgs = np.random.rand(num_imgs, img_size, img_size)<br>shapes = np.zeros((num_imgs, max_num_objects, 1))<br>for i_img in range(num_imgs):<br>    i_object = 0<br>    if np.random.choice([True, False]):<br>        width, height = np.random.randint(min_rect_size, max_rect_size, size=2)<br>        x = np.random.randint(0, img_size/2 - width)<br>        y = np.random.randint(0, img_size/2 - height)<br>        imgs[i_img, x:x+width, y:y+height] = 1.<br>        bboxes[i_img, i_object] = [x, y, width, height]<br>        shapes[i_img, i_object] = [0]<br>        i_object += 1<br>    if np.random.choice([True, False]):<br>        size = np.random.randint(min_rect_size, max_rect_size)<br>        x, y = np.random.randint(img_size/2, img_size - size, size=2)<br>        mask = np.tril_indices(size)<br>        imgs[i_img, x + mask[0], y + mask[1]] = 1.<br>        bboxes[i_img, i_object] = [x, y, size, size]<br>        shapes[i_img, i_object] = [1]<br>        i_object += 1<br>    if np.random.choice([True, False]):<br>        width, height = np.random.randint(min_rect_size, max_rect_size, size=2)<br>        x = np.random.randint(img_size/2, img_size - width)<br>        y = np.random.randint(0, img_size/2 - height)<br>        imgs[i_img, x:x+width, y:y+height] = 1.<br>        bboxes[i_img, i_object] = [x, y, width, height]<br>        shapes[i_img, i_object] = [0]<br>        i_object += 1<br>    if np.random.choice([True, False]):<br>        size = 
np.random.randint(min_rect_size, max_rect_size)<br>        x = np.random.randint(0, img_size/2 - size)<br>        y = np.random.randint(img_size/2, img_size - size)<br>        mask = np.triu_indices(size)<br>        imgs[i_img, x + mask[0], y + mask[1]] = 1.<br>        #bboxes[i_img, i_object] = [x, y, size, size]<br>        #shapes[i_img, i_object] = [1]<br>        #i_object += 1<br>    for i in range(i_object, max_num_objects):<br>        bboxes[i_img, i] = [-1, -1, -1, -1]<br>        shapes[i_img, i] = [-1]<br>            <br>imgs.shape, bboxes.shape<br># plotting sample input data<br># see 5 randomly chosen input images. the bounding boxes of interested objects are marked red.<br>plt.figure(figsize=(24, 8))<br>for i_subplot in range(1, 6):<br>    plt.subplot(1, 5, i_subplot)<br>    i = np.random.randint(num_imgs)<br>    plt.imshow(imgs[i].T, cmap=&#39;Greys&#39;, interpolation=&#39;none&#39;, origin=&#39;lower&#39;, extent=[0, img_size, 0, img_size])<br>    for bbox, shape in zip(bboxes[i], shapes[i]):<br>        plt.gca().add_patch(matplotlib.patches.Rectangle((bbox[0], bbox[1]), bbox[2], bbox[3], ec=&#39;r&#39;, fc=&#39;none&#39;))<br># pre-processing data<br>X = (imgs.reshape(num_imgs, img_size, img_size, 1) - np.mean(imgs)) / np.std(imgs)<br>y = np.concatenate([bboxes / img_size, shapes], axis=-1).reshape(num_imgs, -1)<br>X.shape, y.shape<br># final training &amp; testing data<br>i = int(0.8 * num_imgs)<br>train_X = X[:i]<br>test_X = X[i:]<br>train_y = y[:i]<br>test_y = y[i:]<br>test_imgs = imgs[i:]<br>test_bboxes = bboxes[i:]<br># creating deep network model<br>from tensorflow.keras.models import Sequential<br>from tensorflow.keras.layers import Dense, Activation, Dropout, Convolution2D, MaxPooling2D, Flatten<br>from tensorflow.keras.optimizers import SGD<br>model = Sequential([<br>        Convolution2D(8, (3, 3), activation=&#39;relu&#39;, input_shape=(24, 24, 1)),<br>        Convolution2D(8, (3, 3), activation=&#39;relu&#39;),<br>        
MaxPooling2D(pool_size=(2, 2)),<br>        Convolution2D(8, (3, 3), activation=&#39;relu&#39;),<br>        MaxPooling2D(pool_size=(2, 2)),<br>        Flatten(),<br>        Dense(3000),<br>        Activation(&#39;relu&#39;),<br>        Dropout(0.3),<br>        Dense(1500), <br>        Activation(&#39;relu&#39;), <br>        Dense(500), <br>        Activation(&#39;relu&#39;),<br>        Dropout(0.3),<br>        Dense(50),<br>        Activation(&#39;relu&#39;),<br>        Dense(y.shape[-1])<br>    ])<br>model.compile(&#39;adadelta&#39;, &#39;mse&#39;)<br># training the model &amp; validating<br>model.fit(train_X, train_y, epochs=100, validation_data=(test_X, test_y), verbose=2)<br># predicting on test data<br>pred_y = model.predict(test_X)<br>pred_y = pred_y.reshape(len(pred_y), max_num_objects, -1)<br>pred_bboxes = pred_y[..., :4] * img_size<br>pred_shapes = pred_y[..., 4:5]<br>pred_bboxes.shape, pred_shapes.shape<br># plotting the predictions<br># see 5 randomly chosen output predictions (in blue/ green shapes). 
<br># note that no upper-triangular shape has been predicted.<br># accuracy could be improved with other deep models and/or by tuning the associated parameters/ variables/ methods.<br>plt.figure(figsize=(24, 8))<br>for i_subplot in range(1, 6):<br>    plt.subplot(1, 5, i_subplot)<br>    i = np.random.randint(len(test_X))<br>    plt.imshow(test_imgs[i].T, cmap=&#39;Greys&#39;, interpolation=&#39;none&#39;, origin=&#39;lower&#39;, extent=[0, img_size, 0, img_size])<br>    for pred_bbox, pred_shape in zip(pred_bboxes[i], pred_shapes[i]):<br>        if pred_shape[0] &lt;= 0.5:<br>            plt.gca().add_patch(matplotlib.patches.Rectangle((pred_bbox[0], pred_bbox[1]), pred_bbox[2], pred_bbox[3], fc=&#39;b&#39;, alpha=0.5))<br>        else:<br>            xy = ([[pred_bbox[0]+pred_bbox[2], pred_bbox[1]+pred_bbox[3]],<br>                    [pred_bbox[0]+pred_bbox[2], pred_bbox[1]],<br>                    [pred_bbox[0], pred_bbox[1]]])<br>            plt.gca().add_patch(matplotlib.patches.Polygon(xy, closed=True, fc=&#39;g&#39;, alpha=0.5))</pre><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*I16NrHk2ulK6BC99_iKl0Q.png" /><figcaption>Sample Random Inputs</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ShI5zQYlwbU-SVOXryFLTw.png" /><figcaption>Sample Random Outputs</figcaption></figure><pre>References:</pre><pre>- <a href="https://towardsdatascience.com/object-detection-with-neural-networks-a4e2c46b4491">https://towardsdatascience.com/object-detection-with-neural-networks-a4e2c46b4491</a></pre><p><strong>Please CLAP for the post if you like it, &amp; also share it. 
Stay connected, I will add more code soon… Thanks!</strong></p><hr><p><a href="https://medium.com/analytics-vidhya/learning-cnn-using-simple-keras-python-programs-7be7b9efa852">Learning CNN (with Image Data) using Simple PYTHON Programs</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>