<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Praveen Kumar Rajendran on Medium]]></title>
        <description><![CDATA[Stories by Praveen Kumar Rajendran on Medium]]></description>
        <link>https://medium.com/@Praveenkumar_Rajendran?source=rss-f1ed91aec547------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/2*cso2fAyzEcysbm0QWgLTeQ.jpeg</url>
            <title>Stories by Praveen Kumar Rajendran on Medium</title>
            <link>https://medium.com/@Praveenkumar_Rajendran?source=rss-f1ed91aec547------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 27 May 2026 09:12:10 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@Praveenkumar_Rajendran/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[UP-DETR: Unsupervised Pre-training for Object Detection with Transformers (A Review)]]></title>
            <link>https://medium.com/analytics-vidhya/up-detr-unsupervised-pre-training-for-object-detection-with-transformers-a-review-c4b996e12a9c?source=rss-f1ed91aec547------2</link>
            <guid isPermaLink="false">https://medium.com/p/c4b996e12a9c</guid>
            <category><![CDATA[sequence-to-sequence]]></category>
            <category><![CDATA[computer-vision]]></category>
            <category><![CDATA[transformers]]></category>
            <category><![CDATA[object-detection]]></category>
            <category><![CDATA[deep-learning]]></category>
            <dc:creator><![CDATA[Praveen Kumar Rajendran]]></dc:creator>
            <pubDate>Thu, 23 Sep 2021 14:01:12 GMT</pubDate>
            <atom:updated>2021-10-04T08:33:18.566Z</atom:updated>
            <content:encoded><![CDATA[<p>Unsupervised pretraining, to the rescue!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vmkYxdm504B8h_MwZWuKQQ.jpeg" /><figcaption><a href="https://towardsdatascience.com/explained-deep-learning-in-tensorflow-chapter-0-acae8112a98">Source</a></figcaption></figure><p>Researchers from <em>SCUT and Tencent Wechat AI</em> in China have proposed <a href="https://arxiv.org/abs/2011.09094"><strong>UP-DETR</strong></a>, an unsupervised pre-training approach for object detection that will be explored in this article. It builds on the <a href="https://arxiv.org/abs/2005.12872"><strong>DETR</strong></a> object detection approach put forth by <em>Facebook AI.</em></p><p><strong>Inspired by the great success of pre-training transformers</strong> in NLP, the authors of UP-DETR propose a pretext task named random query patch detection to Unsupervisedly Pre-train DETR (UP-DETR) for object detection.</p><blockquote>Before delving into the inner workings of UP-DETR, it’s important to understand what transformers do in deep learning and why they’re needed for computer vision tasks.</blockquote><h3><strong>1. Attention Is All You Need</strong></h3><p>In 2017, Vaswani et al. (<em>from Google</em>) proposed a network architecture, the <a href="https://arxiv.org/abs/1706.03762"><strong>Transformer</strong></a>, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. 
This model achieved superior results on machine translation tasks while also being more parallelizable, which makes training faster.</p><p>Recurrent neural networks can capture long-term dependencies in sequence-to-sequence NLP tasks, yet they are slow because of their sequential computation and easily suffer from vanishing/exploding gradients.</p><blockquote>Since Transformers do not use any recurrent units, you might wonder how they actually capture long-term dependency patterns. The answer in 1.. 2.. 3…</blockquote><blockquote>“ATTENTION” mechanism.</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/384/1*6nbWN5an1woAy6L4O7QRFA.png" /><figcaption>Source: <a href="https://arxiv.org/pdf/1706.03762.pdf">Link</a></figcaption></figure><p>I can’t tell you to treat the attention mechanism as a black box. For a deep understanding of how the transformer works, I strongly recommend reading the <strong>Jay Alammar </strong><a href="http://jalammar.github.io/illustrated-transformer/"><strong>article</strong></a> (nicely explained with visual aids).</p><p>It is essential to learn the roles of the <strong>Query (Q), Key (K) and Value (V)</strong> vectors.</p><p>For a further understanding of the Attention Is All You Need paper, watch the video.</p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FiDulhoQ2pro%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DiDulhoQ2pro&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FiDulhoQ2pro%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/8219452f400a63ace3780aae70b1bbc0/href">https://medium.com/media/8219452f400a63ace3780aae70b1bbc0/href</a></iframe><h3>2. 
Why do you need transformers for vision tasks?</h3><p>In comparison to RNNs, transformers allow for the modelling of long dependencies between input sequence elements and support <strong>parallel processing</strong> of sequences. Transformers’ <strong>uncomplicated design</strong> allows them to process multiple modalities (e.g., images, videos, text, and speech) with similar processing blocks and demonstrates <strong>excellent scalability</strong> to very large capacity networks and massive datasets. These advantages have resulted in exciting progress on a variety of vision tasks involving Transformer networks (<a href="https://arxiv.org/abs/2101.01169">link</a>).</p><h3><strong>3. DETR (a simple review)</strong></h3><p>DETR, a method proposed in 2020, treats object detection as a set prediction problem using a transformer encoder-decoder architecture. It leverages a global loss that forces unique predictions via bipartite matching. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/478/1*ViuDWBoyn0JwsyqMnTlp6w.png" /><figcaption><a href="https://arxiv.org/pdf/2005.12872.pdf">Fig. 2: DETR</a></figcaption></figure><p>DETR is a supervised learning approach that outputs a fixed-size set of N predictions. Here the <a href="https://www.geeksforgeeks.org/maximum-bipartite-matching/">bipartite matching</a> loss plays a pivotal role in ensuring that a single object is not detected multiple times in an input image.</p><blockquote><strong><em>It is important to note that this loss function considers the classification loss as well as the regression loss of the bounding box.</em></strong></blockquote><blockquote>1. Assume the given input image has <strong>2</strong> labelled ground-truth objects. 
2. Assume the total number of predictions (N) made by DETR is <strong>4</strong>.</blockquote><blockquote>This <strong>loss function</strong> encourages the model to output two predictions with their classes and bounding boxes and two predictions assigned to the “no object” class. It penalizes anything else.</blockquote><h3>4. Unsupervised Pretraining</h3><p>Deep feedforward neural network training can be difficult due to local optima in the objective function and complex models’ proclivity to overfitting. Unsupervised pre-training is the process of starting a discriminative neural network from one that has been trained using an unsupervised criterion, such as a deep belief network or a deep autoencoder. This method can occasionally aid in optimization.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/619/1*eAMu7htA8flTLclMeN1kkA.png" /><figcaption>Source: <a href="https://iq.opengenus.org/applications-of-autoencoders/">link</a></figcaption></figure><p>The idea is simple and straightforward. Instead of initializing the weights randomly, we pretrain them on a pretext task (usually feature reconstruction in autoencoders) and keep the resulting weights. Then we fine-tune the network for the downstream task (<strong>it starts from a more favourable region of feature space, so the model learns faster than it would if its weights were initialized randomly</strong>).</p><h3>5. UP-DETR</h3><blockquote>The main picture begins…</blockquote><p>The UP-DETR approach randomly crops patches from the given image and then feeds them as queries to the decoder. The model is pre-trained to detect these query patches in the original image. 
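</p><p>To make the pretext task a bit more concrete, here is a minimal sketch of how random query patches and their box targets could be generated. The function name <em>sample_query_patches</em>, the patch-size range and the number of patches are my own assumptions for illustration and are not taken from the UP-DETR code. The only thing borrowed from DETR is the box format: a normalised (cx, cy, w, h) vector.</p><pre>
```python
import random


def sample_query_patches(img_w, img_h, num_patches=10,
                         min_frac=0.1, max_frac=0.5, seed=None):
    """Return (crop_box, target_box) pairs for random query patches.

    crop_box is (left, top, right, bottom) in pixels.
    target_box is the normalised (cx, cy, w, h) regression target.
    The size range [min_frac, max_frac] is an assumption of this sketch.
    """
    rng = random.Random(seed)
    patches = []
    for _ in range(num_patches):
        # Pick a random patch size as a fraction of the image size.
        w = rng.uniform(min_frac, max_frac) * img_w
        h = rng.uniform(min_frac, max_frac) * img_h
        # Pick a top-left corner so that the patch stays inside the image.
        left = rng.uniform(0, img_w - w)
        top = rng.uniform(0, img_h - h)
        crop_box = (left, top, left + w, top + h)
        # The box the model should predict for this query patch.
        target_box = ((left + w / 2) / img_w, (top + h / 2) / img_h,
                      w / img_w, h / img_h)
        patches.append((crop_box, target_box))
    return patches


for crop_box, target_box in sample_query_patches(640, 480, num_patches=3, seed=0):
    print([round(v, 1) for v in crop_box], [round(v, 3) for v in target_box])
```
</pre><p>No labels are needed here: the crop coordinates themselves act as the ground truth, which is what makes the task unsupervised.</p><p>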
Two critical issues addressed in pretraining are as follows.</p><ol><li>Multi-task learning.</li><li>Multi-query localization.</li></ol><p>UP-DETR argues that even though DETR performs well on object detection tasks, it comes with hurdles in training and optimization: it requires large-scale training data and comparatively long training schedules.</p><p>You can infer from the figures below that UP-DETR takes less time to converge and performs well in the long run. It is also evident that DETR performs inadequately on PASCAL VOC [<a href="https://cv.gluon.ai/build/examples_datasets/pascal_voc.html#:~:text=Pascal%20VOC%20is%20a%20collection,and%202007%20test%20for%20validation.&amp;text=The%20total%20time%20to%20prepare,Internet%20speed%20and%20disk%20performance.">link</a>], which has relatively less training data and fewer instances than COCO [<a href="https://cocodataset.org/#home">link</a>].</p><p>This suggests that pretraining transformers is indispensable when training data is insufficient.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CQE1vAG1BqEAxAA-kdghlA.png" /></figure><blockquote>Multi-task learning</blockquote><p>To put it simply, object detection is a combination of object classification and localization.</p><p>To prevent query patch detection from destroying classification features, a <strong><em>frozen pre-training backbone</em></strong> and <strong><em>patch feature reconstruction</em></strong> are introduced to preserve the feature discrimination of the transformer.</p><p>Furthermore, an ablation study demonstrates that<strong> freezing the CNN backbone</strong> plays an important role in feature discrimination during the pretraining stage.</p><blockquote>Multi-query localization</blockquote><p>Different object queries concentrate on different position areas and box sizes. 
A simple single-query pre-training is proposed and expanded to a multi-query version to illustrate this property.</p><p>Object query shuffle and an attention mask are introduced to solve the assignment problem between query patches and object queries when multiple query patches are used.</p><h3><strong>A Two-stage Attack!</strong></h3><p><strong>I) Pretraining of transformers </strong>in an unsupervised manner.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/522/1*73_1R2PPH0bKxPvSnoc6fw.png" /><figcaption><a href="https://www.enjoyalgorithms.com/blogs/supervised-unsupervised-and-semisupervised-learning">Source</a></figcaption></figure><p>UP-DETR is pre-trained on the <strong>ImageNet</strong> training set without any labels. The CNN backbone (ResNet-50) is pre-trained with SwAV.</p><p><strong>II)</strong> <strong>Finetuning</strong></p><p>The model is initialized with the pre-trained UP-DETR parameters, and all parameters (including the CNN) are fine-tuned on VOC and COCO with labelled data.</p><p>As mentioned before, this stage starts from a favourable feature space, so the model performs well and converges well.</p><p>The model is fine-tuned with a short/long schedule of 150/300 epochs, and the learning rate is multiplied by 0.1 at 100/200 epochs, respectively.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/360/1*iDYuAOgV1LIMLweOPFmHCQ.gif" /><figcaption><a href="https://g-stat.com/optimization-gradients-overview/">Source</a></figcaption></figure><h3>ARCHITECTURE DETAILS</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/699/1*Sd1bJlK8WJ9cBFP3eVcvBg.png" /><figcaption>Source: <a href="https://arxiv.org/pdf/2011.09094.pdf">Link</a></figcaption></figure><p>As you can see, the input image is first passed through a CNN backbone to extract a feature map (f), which is added to positional encodings and fed into multiple transformer encoder layers. 
The output of the encoder feeds into the decoder.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/87/1*1A5lZW0S9Kat_koeFguPdQ.png" /><figcaption>C=Channel; H=Height; W=Width</figcaption></figure><p>A randomly cropped query patch from the same input image is fed into the CNN backbone followed by GAP (Global Average Pooling) to obtain the patch feature (p), which is then added to an object query of the same dimension and fed into the decoder.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/822/1*61ac02s8mQL1HUzey9hrIQ.png" /><figcaption><a href="https://alexisbcook.github.io/2017/global-average-pooling-layers-for-object-localization/">Source</a></figcaption></figure><p>There are N object queries. They are learnable parameters that are updated as the model trains.</p><blockquote>“The role of object queries is like a <strong>group of people</strong>”</blockquote><blockquote>These guys are each responsible for asking about a certain position and box size, which in turn helps the model give its predictions accordingly.</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*4sPcIQnrrKU-TLgO72Unsg.gif" /><figcaption>😆 Object queries, courtesy Zoo Zoo</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/467/1*M2i574F_qX2jDef95h3tBw.png" /><figcaption><a href="https://arxiv.org/pdf/2005.12872v3.pdf">From DETR Paper</a></figcaption></figure><p><strong>Loss Function</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/534/1*tC0PkF2bXDMbi-j5Nx4h7w.png" /></figure><blockquote>For the loss calculation in the pretraining stage, the predicted result consists of three elements.</blockquote><blockquote><strong>cˆi</strong> ==&gt; Binary classification of <strong>matching the query patch or not</strong> for each object query</blockquote><blockquote><strong>bˆi</strong> ==&gt; Vector that defines the box center coordinates and its width and height <strong>{x, y, w, h}</strong><br><br><strong>pˆi </strong>==&gt; 
Reconstructed feature with C = 2048 for the ResNet-50 backbone</blockquote><p>The <strong>L rec</strong> component is the reconstruction loss proposed in this paper to balance classification and localization during the unsupervised pre-training. It is the mean squared error between the L2-normalized reconstructed feature and the L2-normalized patch feature from the CNN backbone, and it is there to preserve the feature discrimination.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/231/1*4BGJ33D2tE3l1td2Kz43gA.png" /></figure><p>With multi-query patches:</p><p>If we have <strong>“M”</strong> query patches and <strong>“N”</strong> object queries, then we divide the N object queries into M groups, where each query patch is assigned to N/M object queries.</p><p>The authors hypothesize two requirements for better generalization: <br><strong>i) Independence of query patches (attention mask) ii) Diversity of object queries (object query shuffle)</strong></p><p>To satisfy the independence of query patches, an attention mask matrix is used to control the interactions between different object queries.</p><p>To simulate implicit group assignment between object queries, the permutation of all the object query embeddings is randomly shuffled during pre-training. In addition, 10% of the query patches are masked to zero during pre-training, similarly to dropout. In their further study, however, the authors note that “the object query shuffle is not helpful”.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/681/1*IvPOI1thmOJz9WJNJx0Tdw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/678/1*pjYQTtYZpmo-lrQz6cHFVA.png" /></figure><p><strong>Results suggest that pre-training transformers is still indispensable even on sufficient training data (i.e. 
∼ 118K images on COCO)</strong></p><p>UP-DETR is further extended to one-shot detection and panoptic segmentation, and it seems to perform well in those tasks too.</p><p>The following curves and results summarize why an unsupervised approach is important.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*smdteLX1bKTEbZL0n84KMw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/682/1*B-Y7F14I5zsrMcAbRm4-BA.png" /></figure><p>With unsupervised pre-training, UP-DETR significantly outperforms DETR on object detection, one-shot detection and panoptic segmentation.</p><h3>References</h3><ol><li><a href="https://jalammar.github.io/illustrated-transformer/">The Illustrated Transformer, Jay Alammar</a></li><li><a href="https://arxiv.org/abs/2005.12872">End-to-End Object Detection with Transformers</a></li><li><a href="https://arxiv.org/abs/2011.09094">UP-DETR: Unsupervised Pre-training for Object Detection with Transformers</a></li><li><a href="https://www.geeksforgeeks.org/maximum-bipartite-matching/">Maximum bipartite matching, GeeksforGeeks</a></li></ol><blockquote>See ya next time!</blockquote><blockquote><strong>Connect with me on LinkedIn<br></strong> linkedin.com/in/praveenkumar-rajendran/</blockquote><hr><p><a href="https://medium.com/analytics-vidhya/up-detr-unsupervised-pre-training-for-object-detection-with-transformers-a-review-c4b996e12a9c">UP-DETR: Unsupervised Pre-training for Object Detection with Transformers (A Review)</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[AlexNet TensorFlow 2.1.0]]></title>
            <link>https://medium.com/analytics-vidhya/alexnet-tensorflow-2-1-0-d398b7c76cf?source=rss-f1ed91aec547------2</link>
            <guid isPermaLink="false">https://medium.com/p/d398b7c76cf</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[imagedatagenerator]]></category>
            <category><![CDATA[tensorflow2]]></category>
            <category><![CDATA[tensorboard]]></category>
            <category><![CDATA[alexnet]]></category>
            <dc:creator><![CDATA[Praveen Kumar Rajendran]]></dc:creator>
            <pubDate>Thu, 30 Apr 2020 16:51:42 GMT</pubDate>
            <atom:updated>2020-06-20T18:38:34.294Z</atom:updated>
            <content:encoded><![CDATA[<h3><strong>Training AlexNet from scratch in TensorFlow 2.1.0 for our own classification task.</strong></h3><p><em>“AI is the new electricity.” (Andrew Ng)</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BzZC_LWHYK1LWISMxPmXuw.jpeg" /><figcaption>Demystifying Deep Learning</figcaption></figure><p><em>Hellooooo Everyone! This is my first ever post on the Medium site. It really took a long time to get here. I’m an Automotive Software test engineer with an Electrical Engineering background. Well, I know what you have in mind now: </em><strong><em>“What the heck is he doing with deep learning?” </em></strong><em>Wait for it,</em><strong><em> </em></strong><em>I’ll answer it later, because First Things First.</em></p><p>I’m going to go through creating AlexNet and training it on the five-class Flowers dataset, from scratch. This section will talk exclusively about creating <a href="https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf">AlexNet </a>in <a href="https://www.tensorflow.org/">TensorFlow </a>2.1.0, an end-to-end open-source machine learning platform.</p><h3><strong><em>Why TensorFlow 2.x?</em></strong></h3><p>TensorFlow 2.x makes the development of ML applications much easier, with tight integration of Keras into TensorFlow, eager execution by default, and Pythonic function execution.</p><blockquote>You no longer need to create a session to run the computational graph. You see the result of your code directly, without creating a Session as you had to in TensorFlow 1.x.</blockquote><blockquote><strong>HOW COOL IS THAT!</strong></blockquote><h3>AlexNet</h3><p>AlexNet is an influential computer vision paper that employed CNNs and GPUs to accelerate deep learning. As of 2020, the AlexNet paper has been cited more than 61,000 times according to the author’s Google Scholar profile.</p><p>AlexNet won ILSVRC-2012 by a large margin. 
The network demonstrated the potential of <strong>training large neural networks</strong> quickly on massive datasets using widely available gaming <strong>GPUs.</strong></p><blockquote>Availability of high computation power and large datasets together is ❤️ <br>Yeah! One of the reasons why deep learning is taking off.<br>“Seismic shift that broke the Richter scale!”</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/684/1*kQonmTVE73r5K449TIEvyQ.png" /><figcaption>AlexNet Architecture as given in the research paper</figcaption></figure><h4><strong>Six Main Ideas of AlexNet</strong></h4><p><strong>1.<em>ReLU nonlinearity</em></strong></p><p><strong><em>ReLU </em></strong><em>is a so-called non-saturating activation. This means that the gradient will never be close to zero for a positive activation and, as a result, the training will be faster. In other words, when the activation (a) is negative, ReLU(a) = 0; when the activation (a) is positive, ReLU(a) = a.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*w48zY6o9_5W9iesSsNabmQ.gif" /><figcaption><strong>ReLU Visualization</strong></figcaption></figure><p><strong><em>2.Multiple GPUs for training</em></strong></p><p><strong><em>3.Local response normalization</em></strong></p><p><strong><em>4.Data augmentation</em></strong></p><p><strong><em>5.Test time data augmentation</em></strong></p><p><em>Five crops of a single test image (4 corners &amp; center) and their horizontal flips are taken, and predictions are made on these 10 augmented images. The predictions are then averaged to make the final prediction.</em></p><p><strong><em>6.Dropout</em></strong></p><p>It uses 0.5 dropout during training. This means that during the forward pass, each activation of the layer is set to zero with probability 0.5, and the zeroed activations do not participate in backpropagation. 
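</p><p>Here is a minimal plain-Python sketch of that behaviour, just to build intuition. It uses the “inverted dropout” variant that modern frameworks typically use, where surviving activations are scaled up during training; the original AlexNet paper instead multiplies the outputs by 0.5 at test time. The function name and its arguments are my own illustrative choices, not a library API.</p><pre>
```python
import random


def dropout(activations, rate=0.5, training=True, rng=random):
    """Inverted dropout on a flat list of activations (illustrative sketch)."""
    if not training:
        # Inference: every unit is kept and nothing is rescaled.
        return list(activations)
    keep_prob = 1.0 - rate
    # Training: drop each unit with probability `rate` and scale the
    # survivors by 1 / keep_prob so the expected activation is unchanged.
    return [a / keep_prob if rng.random() >= rate else 0.0
            for a in activations]


random.seed(42)
layer_output = [1.0, 2.0, 3.0, 4.0]
print(dropout(layer_output, training=True))   # some entries zeroed, rest doubled
print(dropout(layer_output, training=False))  # unchanged
```
</pre><p>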
<strong>During testing, no neuron is dropped</strong>, just as in real-time inference.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/460/1*qM6ZRfezI6mVLz7B9X0LvA.gif" /><figcaption>Intuition for dropouts</figcaption></figure><h3>TensorFlow Implementation</h3><blockquote>Yes, it’s finally happening! You just came across a little theory that will be useful for you. Honestly, seeing it work out for yourself is a joy. Man, <strong>that’s the thing, </strong>let’s get it.</blockquote><p><strong><em>ENVIRONMENT USED:</em></strong></p><p>Editor: <a href="https://www.jetbrains.com/pycharm/">PyCharm IDE</a><br><em>OS: Windows 10 (64bit)<br>GPU: Nvidia GeForce GTX 1050<br>CPU: Intel i7-8750H</em></p><p>Training Time: ~17 minutes</p><h3><strong>What are we up to?</strong></h3><ol><li>Importing the necessary packages.</li><li>Getting the dataset &amp; analyzing it.</li><li>Defining the model architecture, <em>Yey!!!!! The </em><strong><em>AlexNet </em></strong><em>is coming…</em></li><li>Preprocessing the images in the dataset for the training process of our deep learning model.</li><li>Compiling it with the loss function and optimizer to be used for training.</li><li>Defining the callbacks to be used while training.</li><li>Finally, training the model and saving it.</li><li>Visualizing the training process and the model in TensorBoard.dev.</li><li>Evaluating the trained model.</li><li>Importance of the validation dataset.</li></ol><p><strong>Step 1:<br></strong>I will start off by importing the necessary packages: TensorFlow, NumPy, pathlib and datetime. 
I will print out the versions for reference.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a25935db6c94c411169afe83e63b4433/href">https://medium.com/media/a25935db6c94c411169afe83e63b4433/href</a></iframe><pre>Tensor Flow Version: 2.1.0<br>numpy Version: 1.18.2</pre><p><strong>Step 2:</strong></p><p>In this section, I’ve specified the directory of the unzipped <a href="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz">dataset</a>.</p><p>i) The total number of images is printed. <br>ii) The class names are printed as a list by reading the names of the subdirectories in the dataset. <br>iii) The total number of classes is printed.</p><p>The folder structure of the unzipped dataset is given below.</p><pre>flower_photos<br>|__daisy<br>|__dandelion<br>|__roses<br>|__sunflowers<br>|__tulips<br>|__LICENSE.txt</pre><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/7c4ec358ad54a0fd4dedde90c850036f/href">https://medium.com/media/7c4ec358ad54a0fd4dedde90c850036f/href</a></iframe><pre>3670<br>[&#39;daisy&#39; &#39;dandelion&#39; &#39;roses&#39; &#39;sunflowers&#39; &#39;tulips&#39;]<br>5</pre><p><strong>Step 3:</strong></p><p>Here we define the model architecture of <em>AlexNet</em>.</p><p>i) As you can see, batch normalization is used after each convolution layer instead of local response normalization. <br>ii) The dropout layers are not added, but they are left as comments at the two fully connected layers so that you can enable them if you want. <br>iii) Parameters like strides and kernel size are tweaked a little bit (<em>Yey! 
we are becoming a deep learning practitioner</em>); however, the number of kernels is kept the same as in AlexNet.</p><blockquote>The reason why I did not add a dropout layer is that it sometimes behaves weirdly during the backpropagation of the neural network.</blockquote><blockquote>Batch normalization does more than just reduce overfitting: it also speeds up training by letting us use a higher learning rate for the optimizer of the network.</blockquote><blockquote>As Andrew Ng explains, I’m talking about those <strong>“tiny tiny baby steps” </strong>❤️</blockquote><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/dbd8e6a1c42e9705c7d3ea1fc0785bb5/href">https://medium.com/media/dbd8e6a1c42e9705c7d3ea1fc0785bb5/href</a></iframe><p><strong>Step 4:</strong></p><p>In this section, we prepare the data for training, which means preprocessing it before we feed it to the neural network. We define the batch size, image height and width, and the steps per epoch. We then resize and preprocess the images as needed using the ImageDataGenerator, which lets you do everything on the fly. That’s a nice gift from Keras.</p><blockquote>I can’t stress enough how useful ImageDataGenerator is for deep learning.</blockquote><blockquote>ImageDataGenerator accepts the raw data, randomly transforms it according to the arguments we give, and returns <em>only</em> the new, transformed data to be used while training.</blockquote><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6cb2f994bf3ce18bff9567e1d49166e6/href">https://medium.com/media/6cb2f994bf3ce18bff9567e1d49166e6/href</a></iframe><pre>Found 3670 images belonging to 5 classes.</pre><p><strong>Step 5:</strong></p><p>In this section, we compile our deep learning model so that it can be trained on the data we have prepared. We specify the loss function and the optimizer. 
To know more about the Stochastic gradient optimizer, and how it differs from Normal Gradient descent look at the video below.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/caec91525c5b412b468d6670bc5efb6a/href">https://medium.com/media/caec91525c5b412b468d6670bc5efb6a/href</a></iframe><pre>Model: &quot;sequential&quot;<br>_________________________________________________________________<br>Layer (type)                 Output Shape              Param #   <br>=================================================================<br>conv2d (Conv2D)              (None, 55, 55, 96)        34944     <br>_________________________________________________________________<br>batch_normalization (BatchNo (None, 55, 55, 96)        384       <br>_________________________________________________________________<br>max_pooling2d (MaxPooling2D) (None, 27, 27, 96)        0         <br>_________________________________________________________________<br>conv2d_1 (Conv2D)            (None, 27, 27, 256)       2973952   <br>_________________________________________________________________<br>batch_normalization_1 (Batch (None, 27, 27, 256)       1024      <br>_________________________________________________________________<br>conv2d_2 (Conv2D)            (None, 27, 27, 384)       885120    <br>_________________________________________________________________<br>batch_normalization_2 (Batch (None, 27, 27, 384)       1536      <br>_________________________________________________________________<br>conv2d_3 (Conv2D)            (None, 27, 27, 384)       1327488   <br>_________________________________________________________________<br>batch_normalization_3 (Batch (None, 27, 27, 384)       1536      <br>_________________________________________________________________<br>conv2d_4 (Conv2D)            (None, 27, 27, 256)       884992    <br>_________________________________________________________________<br>batch_normalization_4 (Batch 
(None, 27, 27, 256)       1024      <br>_________________________________________________________________<br>max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256)       0         <br>_________________________________________________________________<br>flatten (Flatten)            (None, 43264)             0         <br>_________________________________________________________________<br>dense (Dense)                (None, 4096)              177213440 <br>_________________________________________________________________<br>dense_1 (Dense)              (None, 4096)              16781312  <br>_________________________________________________________________<br>dense_2 (Dense)              (None, 5)                 20485     <br>=================================================================<br>Total params: 200,127,237<br>Trainable params: 200,124,485<br>Non-trainable params: 2,752<br>_________________________________________________________________</pre><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FW9iWNJNFzQI%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DW9iWNJNFzQI&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FW9iWNJNFzQI%2Fhqdefault.jpg&amp;key=a19fcc184b9711e1b4764040d3dc5c07&amp;type=text%2Fhtml&amp;schema=youtube" width="854" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/0b23d8fd56a651f73133e15aed404da1/href">https://medium.com/media/0b23d8fd56a651f73133e15aed404da1/href</a></iframe><p><strong>Step 6:</strong></p><p>Here we define the callbacks to be used while our model is training.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f0ac6ec3eb57a457aa48410fc1097b55/href">https://medium.com/media/f0ac6ec3eb57a457aa48410fc1097b55/href</a></iframe><p><strong>Step 7:</strong></p><p>Finally, we train the model. 
Note that even though we specified epochs = 50, the model trained for only 17 epochs. That&#39;s because the callbacks we defined stop the training once a certain level of accuracy and loss is reached.</p><blockquote>I’ve been saying “training the model” for a while now. Well, what does that mean?<br>It means our model is <strong>learning weights</strong> for its neurons so that it maps the given input to the output.<br>What we are dealing with is a supervised learning problem, i.e. we show the neural network that for this input, this is the output. The model then learns from these input and output pairs, with the optimizer trying to reduce the loss that we specified.<br>The trained model can in turn be deployed to make predictions on images that we give it in real time.</blockquote><p>Saving the model is important because you can later deploy it wherever you want. We can deploy it on lightweight embedded devices like the Raspberry Pi or on mobile devices by converting it into a <a href="https://www.tensorflow.org/lite">TFLite</a> model. You can even deploy it in the browser using <a href="https://www.tensorflow.org/js">TensorFlow.js</a>.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/56d45331f9ffe7cefd15de806a3a4b99/href">https://medium.com/media/56d45331f9ffe7cefd15de806a3a4b99/href</a></iframe><p><em>My blog is getting very long </em>🤯😅<em>… So you can find the training progress </em><a href="https://drive.google.com/file/d/1xwyT1GcV4P2INzvt-tP2wH8f8eW9W5sw/view?usp=sharing"><em>here</em></a></p><p><strong>Step 8:</strong></p><p>TensorBoard is <em>a nice tool for making implementations transparent. 
</em>You can ask other <em>deep learners</em> to debug your model, or use it to demonstrate why your model performs well.<br>Run the command below in a terminal to upload the logs to TensorBoard.dev and get a link to the TensorBoard visualization.</p><p>PS: ‘logs’ is the directory where the logs are stored during training.</p><blockquote>An interesting thing about TensorBoard is that you can track how your model is performing both during and after training. Cool!</blockquote><pre>tensorboard dev upload --logdir logs \<br>    --name &quot;AlexNet TensorFlow 2.1.0&quot; \<br>    --description &quot;AlexNet Architecture Implementation in TensorFlow 2.1.0 from scratch with list of callbacks for stopping training when the required metrics are met. Callbacks are also used for Tensorboard Visuals.&quot;</pre><p>You can see the TensorBoard visualization <a href="https://tensorboard.dev/experiment/xh8yDX2kR2SvPZgIVqIqNg/"><em>here</em></a><em>.</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*AWzCt5iTICD5QJ50sHLbXA.png" /><figcaption>Plots of <strong>Accuracy</strong> (y-axis) vs <strong>epochs</strong> (x-axis) and <strong>Loss</strong> (y-axis) vs <strong>epochs</strong> (x-axis)</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BOp-7uNBpHGV9L9lyck_3w.png" /><figcaption>Model Graph at TensorBoard</figcaption></figure><p><strong>Step 9:</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/280/1*SqpTVIO5ZCNxj6JESE0tnw.gif" /><figcaption>Example: Neural Network Recognizing <a href="http://yann.lecun.com/exdb/mnist/">Handwritten Digits</a></figcaption></figure><p>In this section, we will evaluate the model performance. 
Even though I could do it in the training file itself, I’m doing it in a separate file <strong>just to show that we can use the saved model later for inference or evaluation with real-time data.<br></strong>I randomly downloaded 10 images from <a href="https://www.google.com/imghp?hl=en">Google Images</a> for each of the 5 classes (50 in total) and stored them in the same directory structure as the training dataset, so that ImageDataGenerator can be used for the evaluation.</p><pre>Test_set<br>|__daisy<br>|__dandelion<br>|__roses<br>|__sunflowers<br>|__tulips</pre><p>The code is the same as what was already explained for training the model. The difference is that we load a saved model and then run inference on the test data we acquired from the web, to find out how our model performs on <strong>unseen data.</strong></p><p>The accuracy is then printed.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/bd190946d02693405720ef9120d405ad/href">https://medium.com/media/bd190946d02693405720ef9120d405ad/href</a></iframe><pre>48<br>[&#39;daisy&#39; &#39;dandelion&#39; &#39;roses&#39; &#39;sunflowers&#39; &#39;tulips&#39;]<br>5<br>Found 50 images belonging to 5 classes.</pre><pre>Model: &quot;sequential&quot;<br>_________________________________________________________________<br>Layer (type)                 Output Shape              Param #   <br>=================================================================<br>conv2d (Conv2D)              (None, 55, 55, 96)        34944     <br>_________________________________________________________________<br>batch_normalization (BatchNo (None, 55, 55, 96)        384       <br>_________________________________________________________________<br>max_pooling2d (MaxPooling2D) (None, 27, 27, 96)        0         <br>_________________________________________________________________<br>conv2d_1 (Conv2D)            (None, 27, 27, 256)       2973952   
<br>_________________________________________________________________<br>batch_normalization_1 (Batch (None, 27, 27, 256)       1024      <br>_________________________________________________________________<br>conv2d_2 (Conv2D)            (None, 27, 27, 384)       885120    <br>_________________________________________________________________<br>batch_normalization_2 (Batch (None, 27, 27, 384)       1536      <br>_________________________________________________________________<br>conv2d_3 (Conv2D)            (None, 27, 27, 384)       1327488   <br>_________________________________________________________________<br>batch_normalization_3 (Batch (None, 27, 27, 384)       1536      <br>_________________________________________________________________<br>conv2d_4 (Conv2D)            (None, 27, 27, 256)       884992    <br>_________________________________________________________________<br>batch_normalization_4 (Batch (None, 27, 27, 256)       1024      <br>_________________________________________________________________<br>max_pooling2d_1 (MaxPooling2 (None, 13, 13, 256)       0         <br>_________________________________________________________________<br>flatten (Flatten)            (None, 43264)             0         <br>_________________________________________________________________<br>dense (Dense)                (None, 4096)              177213440 <br>_________________________________________________________________<br>dense_1 (Dense)              (None, 4096)              16781312  <br>_________________________________________________________________<br>dense_2 (Dense)              (None, 5)                 20485     <br>=================================================================<br>Total params: 200,127,237<br>Trainable params: 200,124,485<br>Non-trainable params: 2,752<br>_________________________________________________________________</pre><pre>1/2 [==============&gt;...............] 
- ETA: 3s - loss: 1.4212 - accuracy: 0.7188<br>2/2 [==============================] - 5s 2s/step - loss: 1.1020 - accuracy: 0.7000<br>accuracy:70.00%</pre><p>Hurray! <strong>70% ACCURACY</strong>. That’s fair for a model that didn’t even use a validation set while training. Hmm! The model could perform even better on unseen data if we made it generalize well to data it was not exposed to.</p><blockquote><strong><em>However, we did a great job getting the model to classify flower images from 5 classes that were downloaded randomly from the web.</em></strong></blockquote><p><strong>Step 10:</strong></p><blockquote>If our model does not perform well on the test data, it might be overfitting a little to the training data. So we would need to use validation data during training itself, so that we can debug our model easily. We should also consider tuning the parameters and hyperparameters of the network.</blockquote><p>validation_split can be specified in the ImageDataGenerator to hold out a portion of the available data as the validation set.</p><h4>References:</h4><blockquote>💭“Winter is here.”</blockquote><p>Link to the flower dataset is <a href="https://storage.googleapis.com/download.tensorflow.org/example_images/flower_photos.tgz">here</a>.<br>Link to the randomly downloaded test image dataset is <a href="https://drive.google.com/file/d/1r1wMFZ6khj11a5WppqD7U-UsHyuL5EQW/view?usp=sharing">here</a>.<br>Link to the saved model is <a href="https://drive.google.com/file/d/1qecUu7ptWbPvJo6NM_OyVBbyTeVnAVs0/view?usp=sharing">here</a>.<br>Link to the repository is <a href="https://github.com/PraveenKumar-Rajendran/AlexNet_TF2.1.0">here</a>.<br>Link to the AlexNet paper is <a href="https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf">here</a>.</p><h3>Finishing Things off</h3><blockquote>Most readers don’t make it to the end of the blog, but you did, because you are someone special who just 
doesn’t give up reading.</blockquote><p>I’m hoping that I’ve taught you something. If you’ve found this post useful, then <strong>do clap</strong> and hold it for a while, so that my blog reaches those who need it.<br>If you have any doubts, clarifications, or suggestions for improvement, contact me on LinkedIn or raise an issue on GitHub.</p><p>Haa! I almost forgot to answer the question you had in mind at the start. No problem, I gotcha! Well, I’m a Software Tester by profession, but that does not stop me from doing what I love.</p><blockquote>“When something is important enough, you do it even if the odds are not in your favor.” — Elon Musk</blockquote><ul><li><a href="https://praveenkumar-rajendran.github.io/">Praveen Portfolio</a></li><li><a href="https://www.linkedin.com/in/praveenkumar-rajendran/">Praveenkumar Rajendran - Automotive Embedded Software Test Engineer - SL Corporation | LinkedIn</a></li><li><a href="https://github.com/PraveenKumar-Rajendran">PraveenKumar-Rajendran - Overview</a></li></ul><hr><p><a href="https://medium.com/analytics-vidhya/alexnet-tensorflow-2-1-0-d398b7c76cf">AlexNet TensorFlow 2.1.0</a> was originally published in <a href="https://medium.com/analytics-vidhya">Analytics Vidhya</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>