<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Bibek Chaudhary on Medium]]></title>
        <description><![CDATA[Stories by Bibek Chaudhary on Medium]]></description>
        <link>https://medium.com/@bibekchaudhary?source=rss-84a0db77e00e------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*7TdjpjQwDNMvcXOqyNX5gg.jpeg</url>
            <title>Stories by Bibek Chaudhary on Medium</title>
            <link>https://medium.com/@bibekchaudhary?source=rss-84a0db77e00e------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sun, 17 May 2026 19:15:19 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@bibekchaudhary/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Pelee: Real-time Object Detection System on Mobile Devices]]></title>
            <link>https://medium.com/@bibekchaudhary/pelee-real-time-object-detection-system-on-mobile-devices-f565947c04c4?source=rss-84a0db77e00e------2</link>
            <guid isPermaLink="false">https://medium.com/p/f565947c04c4</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[ai-on-device]]></category>
            <category><![CDATA[object-detection]]></category>
            <category><![CDATA[edge-computing]]></category>
            <category><![CDATA[realtime]]></category>
            <dc:creator><![CDATA[Bibek Chaudhary]]></dc:creator>
            <pubDate>Mon, 19 Aug 2019 06:31:32 GMT</pubDate>
            <atom:updated>2019-08-19T16:41:44.263Z</atom:updated>
<content:encoded><![CDATA[<p>The rise of deep learning in the past decade has been astronomical, especially after the introduction of the <a href="http://cs231n.github.io/convolutional-networks/">CNN (Convolutional Neural Network)</a>. But this rise has been accompanied by ever bigger models and a growing need for compute power. These large (and compute-heavy) models are often hard to deploy in real-life applications, especially on edge devices. This is why <a href="https://venturebeat.com/2019/03/21/the-rise-of-on-device-ai-and-why-its-so-important-vb-live/">on-device AI</a> is gaining popularity and has become an active area of research. On-device AI requires deep learning models to be lightweight, power-efficient and accurate.</p><p>One such model is <a href="https://arxiv.org/abs/1804.06882">Pelee: A Real-Time Object Detection System on Mobile Devices</a>, which I will review in this post. The post is divided into three parts:</p><ol><li>SSD: Single Shot MultiBox Detector</li><li>PeleeNet for Classification</li><li>Pelee: Real-time Object Detection for tiny devices</li></ol><h3>SSD: Single Shot MultiBox Detector</h3><p>Pelee is based on <a href="https://arxiv.org/abs/1512.02325">SSD</a>, but targets resource-constrained devices. So in order to fully understand Pelee, we first have to understand the architecture and working mechanism of SSD.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*7ydKUHXvFKz1RrjAzRwGqw.png" /><figcaption>SSD architecture (pic taken from <a href="https://lilianweng.github.io/lil-log/2018/12/27/object-detection-part-4.html">here</a>)</figcaption></figure><p>SSD uses VGG-16 as its base network to extract high-level feature maps (38x38 and 19x19) from the input image. There are many variants of SSD that use other architectures, such as MobileNet (v1 and v2) or SqueezeNet, as the base network. SSD performs detection (classification + localization) on multi-scale feature maps: it uses the 38x38, 19x19, 10x10, 5x5, 3x3 and 1x1 feature maps to predict object classes and bounding-box coordinates. If you want to learn more about SSD and its implementation, please go through this excellent <a href="https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection">tutorial</a>.</p><p>Pelee uses PeleeNet, a variant of <a href="https://arxiv.org/abs/1608.06993">DenseNet</a> designed for mobile devices, as its base network to extract high-level feature maps.</p><h3>PeleeNet for Classification</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/981/1*NeEI1nA11P-HELxVvtrz8A.png" /><figcaption>Overall architecture of PeleeNet</figcaption></figure><p>The architecture of PeleeNet is designed around the limited computing power and memory of mobile devices. It contains three main parts: the stem block, the dense block, and the transition layer. Let’s discuss these blocks one by one; a combined code sketch follows the transition-layer description below.</p><p><strong>Stem Block</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/500/1*TEMU_QD9InRwOWlPJstScg.png" /><figcaption>Structure of the stem block</figcaption></figure><p>The stem block is designed to increase the expressive power of the features without adding much computational cost.</p><p><strong>Dense Block</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/764/1*Dp6maVHvBH4ekCLePhBq2g.png" /><figcaption>Structure of the two-way dense block</figcaption></figure><p>Inspired by GoogLeNet, the original dense layer of DenseNet is modified into a two-way dense layer, so that the network also learns visual patterns for large objects.</p><p><strong>Transition Layer</strong></p><p>It contains a 1x1 convolution along with a max-pooling layer; in PeleeNet’s transition layers the number of output channels is kept equal to the number of input channels.</p>
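<p><em>Below is a minimal PyTorch sketch of these three building blocks, written from the figures above. The channel widths, growth rate and bottleneck sizes are illustrative assumptions, not the authors’ exact settings; see their repo, linked below, for the reference implementation.</em></p><pre>import torch
import torch.nn as nn


class ConvBNReLU(nn.Module):
    """Convolution followed by batch norm and ReLU, the basic unit used here."""
    def __init__(self, in_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.bn(self.conv(x)))


class StemBlock(nn.Module):
    """Strided 3x3 conv, then a conv branch and a pooling branch, then a 1x1 fuse."""
    def __init__(self, out_ch=32):
        super().__init__()
        self.stem = ConvBNReLU(3, out_ch, 3, stride=2, padding=1)
        self.branch_conv = nn.Sequential(             # 1x1 bottleneck + strided 3x3
            ConvBNReLU(out_ch, out_ch // 2, 1),
            ConvBNReLU(out_ch // 2, out_ch, 3, stride=2, padding=1),
        )
        self.branch_pool = nn.MaxPool2d(2, stride=2)  # cheap parallel path
        self.fuse = ConvBNReLU(2 * out_ch, out_ch, 1)

    def forward(self, x):
        x = self.stem(x)
        x = torch.cat([self.branch_conv(x), self.branch_pool(x)], dim=1)
        return self.fuse(x)


class TwoWayDenseLayer(nn.Module):
    """Each branch adds growth_rate // 2 channels; the second branch stacks two
    3x3 convs to enlarge the receptive field for bigger objects."""
    def __init__(self, in_ch, growth_rate=32):
        super().__init__()
        half = growth_rate // 2
        self.branch1 = nn.Sequential(
            ConvBNReLU(in_ch, 2 * half, 1),
            ConvBNReLU(2 * half, half, 3, padding=1),
        )
        self.branch2 = nn.Sequential(
            ConvBNReLU(in_ch, 2 * half, 1),
            ConvBNReLU(2 * half, half, 3, padding=1),
            ConvBNReLU(half, half, 3, padding=1),
        )

    def forward(self, x):
        # dense connectivity: concatenate the input with both new branches
        return torch.cat([x, self.branch1(x), self.branch2(x)], dim=1)


class TransitionLayer(nn.Module):
    """1x1 conv keeping the channel count, followed by 2x2 pooling."""
    def __init__(self, channels):
        super().__init__()
        self.conv = ConvBNReLU(channels, channels, 1)
        self.pool = nn.MaxPool2d(2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(x))</pre>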
<p>These are the three main building blocks of PeleeNet; the authors have open-sourced their PyTorch code, so please refer to their <a href="https://github.com/Robert-JunWang/PeleeNet">repo</a> to learn more. For Keras lovers, please refer to this <a href="https://gist.github.com/imbibekk/fde87f0a57f351d28e182828d118c98b">implementation</a>.</p><p>Now that we have an overview of how SSD and PeleeNet work, we can discuss Pelee in the next section.</p><h3>Pelee: Real-time Object Detection for tiny devices</h3><p>The architecture of Pelee is similar to that of SSD, except that Pelee uses PeleeNet as its base network whereas SSD uses VGG-16. Another main difference is that Pelee uses only 5 scales of feature maps for prediction, whereas SSD uses 6.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/532/1*FOofHfrufqPosHVuT3JsOA.png" /><figcaption>5 scales of feature maps used in Pelee for prediction</figcaption></figure><p>The 38x38 feature map is dropped to balance speed against accuracy on edge devices; each of the five remaining feature maps passes through a residual block before classification and regression.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HmYnXS_jepw_uJ7JH-xF7Q.png" /><figcaption>Residual Block prediction</figcaption></figure><p>The residual block helps extract better features from each feature map, and can be sketched as follows:</p>
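<p><em>Again a hedged sketch, reusing the imports and the ConvBNReLU helper from the PeleeNet sketch above; the 128/256 channel widths follow the paper’s figure:</em></p><pre>class ResBlock(nn.Module):
    """Residual prediction block applied to each feature map before the heads."""
    def __init__(self, in_ch, mid_ch=128, out_ch=256):
        super().__init__()
        self.body = nn.Sequential(                    # 1x1 -> 3x3 -> 1x1 main path
            ConvBNReLU(in_ch, mid_ch, 1),
            ConvBNReLU(mid_ch, mid_ch, 3, padding=1),
            ConvBNReLU(mid_ch, out_ch, 1),
        )
        self.shortcut = ConvBNReLU(in_ch, out_ch, 1)  # 1x1 projection shortcut

    def forward(self, x):
        # element-wise sum of the main path and the shortcut
        return self.body(x) + self.shortcut(x)</pre>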
<p>Pelee outperformed other approaches, including <a href="https://arxiv.org/abs/1612.08242">YOLOv2</a> and SSD+MobileNet, in every metric: speed, model size and accuracy.</p><p>The following table shows its performance on <a href="http://host.robots.ox.ac.uk/pascal/VOC/">PASCAL VOC</a> 2007.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1013/1*UXquRISjyVyiyiXuxULrBA.png" /><figcaption>Pelee performance on PASCAL VOC 2007</figcaption></figure><p>In terms of speed, Pelee is significantly faster than SSD+MobileNet on iPhone and on the <a href="https://developer.nvidia.com/embedded/jetson-tx2">Jetson TX2</a> in <a href="https://www.reddit.com/r/NintendoSwitch/comments/5urua3/explanation_of_flops_and_fp32_and_fp16/">FP32</a> mode.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/922/1*VYWx12x3bVurL_j8Hw4ZwA.png" /><figcaption>Speed on real devices</figcaption></figure><p>With its efficient architecture design, Pelee achieved state-of-the-art performance for object detection on mobile devices, but was later surpassed by <a href="https://arxiv.org/abs/1807.11013">Tiny-DSOD</a>, which I will review in my next post.</p><p>References:</p><ol><li><a href="https://arxiv.org/abs/1804.06882"><em>Pelee: A Real-Time Object Detection System on Mobile Devices</em></a></li><li><a href="https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Object-Detection"><em>PyTorch SSD tutorial</em></a></li></ol><p><strong>PS: I wrote this post based on my understanding of </strong><a href="https://arxiv.org/abs/1804.06882"><strong>Pelee</strong></a><strong>. Any suggestion/improvement about the content and/or style of writing will be appreciated.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=f565947c04c4" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Segmentation in Robotic Surgery]]></title>
            <link>https://medium.com/@bibekchaudhary/segmentation-in-robotic-surgery-7e806bbc38dd?source=rss-84a0db77e00e------2</link>
            <guid isPermaLink="false">https://medium.com/p/7e806bbc38dd</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[robotics]]></category>
            <category><![CDATA[segmentation]]></category>
            <category><![CDATA[surgery]]></category>
            <category><![CDATA[deep-learning]]></category>
            <dc:creator><![CDATA[Bibek Chaudhary]]></dc:creator>
            <pubDate>Sun, 11 Nov 2018 17:42:22 GMT</pubDate>
            <atom:updated>2018-11-11T17:44:30.035Z</atom:updated>
<content:encoded><![CDATA[<p><em>Application of Deep Learning in Robotic Surgery</em></p><p>This is my second blog post based on the <a href="https://www.fast.ai/">fastai</a> lessons. I wanted to apply what I learned to a different dataset with binary labels, so I decided to use the dataset from the <a href="https://endovissub2017-roboticinstrumentsegmentation.grand-challenge.org/">Endoscopic Vision Challenge</a> and focused only on segmenting binary classes.</p><p>After downloading the dataset, the black borders were cropped from the images and masks using this <a href="https://github.com/ternaus/robot-surgery-segmentation/blob/master/prepare_data.py">data-preparation</a> code.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/648/1*8mQK00X6LIaEcC8BYI8jQg.png" /><figcaption>Sample image (left) and corresponding mask (right)</figcaption></figure><p>The popular <a href="https://arxiv.org/abs/1505.04597">U-Net</a> architecture was used for this segmentation task, with a slight tweak: the left side (the encoder) of the U-Net was a pre-trained ResNet34. fastai v1 lets you build a U-Net with a pre-trained model as the encoder.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/531/1*CxxAZ_pKZqEV2b0LMXD5nA.png" /><figcaption>Create a learner: U-Net architecture with a ResNet34 encoder</figcaption></figure><p>The <strong>dice</strong> metric, which measures the similarity between two sets (here, the real mask and the predicted mask), also comes with the library. The fastai folks have implemented pretty much everything we need to get state-of-the-art results.</p><p>The first stage of training was done with images at <strong>one-eighth of their original size</strong>, due to computational constraints and the large original resolution of 1024*1280 pixels.</p><p>The training followed this pipeline (sketched in code below):</p><p><strong>train (with frozen encoder) → find learning rate → unfreeze layers → train</strong></p>
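<p><em>In fastai v1 this whole loop is only a few lines. A minimal sketch, assuming the v1 API (<code>unet_learner</code>) and illustrative epoch counts; <code>data</code> is the segmentation data bunch built above:</em></p><pre>from fastai.vision import *   # provides unet_learner, models and the dice metric

learn = unet_learner(data, models.resnet34, metrics=dice)

learn.fit_one_cycle(8)        # stage 1: encoder frozen, only the decoder trains
learn.lr_find()               # sweep learning rates to pick a sensible range
learn.unfreeze()              # unfreeze the pre-trained ResNet34 encoder
learn.fit_one_cycle(8, max_lr=slice(1e-5, 1e-3))  # train the whole U-Net</pre>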
<p>This resulted in a <strong>dice coefficient of 0.85</strong>, with nice predictions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/525/1*cCYxP5FiZB2zvpBrJIKPgA.png" /><figcaption>Results from first-stage training with one-eighth image size</figcaption></figure><p>In the second stage, the image size was <strong>increased to 512*640</strong>, half of the original size. Using the full 1024*1280 resolution resulted in repeated CUDA runtime errors, so I decided to continue with 512*640 and updated the data block with the new size and batch size.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/561/1*PhDsSFbEVpUcfSXUrfTOBQ.png" /><figcaption>Data block for the updated size and batch size</figcaption></figure><p>Training resumed from where it stopped in the first stage and followed the same pipeline. This resulted in a <strong>dice coefficient of 0.941522</strong>, with sharp-edged masks on the validation set.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/549/1*4RvBV0ELEEAaI2YLIOyJ1w.png" /><figcaption>Results from second-stage training with 512*640 image size</figcaption></figure><p>The predicted masks are very similar to the ground truths, but I believe the results can be improved further by using the full image size in the second stage.</p><p>This was a nice learning experience; one thing that amazed me is that the model did well even though the <strong>input was rectangular, not square</strong>.</p><p>This is fastai for you!!!</p><p><em>Code for this post can be found </em><a href="https://nbviewer.jupyter.org/gist/imbibekk/a2ffb143086ef3d161278f340d2e2b2e"><em>here</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7e806bbc38dd" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Are you Chinese, Japanese or Korean?]]></title>
            <link>https://medium.com/@bibekchaudhary/are-you-chinese-japanese-or-korean-93e4bf270a5?source=rss-84a0db77e00e------2</link>
            <guid isPermaLink="false">https://medium.com/p/93e4bf270a5</guid>
            <category><![CDATA[chinese]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[fastai]]></category>
            <category><![CDATA[korean]]></category>
            <category><![CDATA[japanese]]></category>
            <dc:creator><![CDATA[Bibek Chaudhary]]></dc:creator>
            <pubDate>Wed, 31 Oct 2018 19:38:17 GMT</pubDate>
            <atom:updated>2018-11-01T19:35:40.183Z</atom:updated>
<content:encoded><![CDATA[<p><em>Image classifier based on a fast.ai lecture</em></p><p>One of the challenges I face living in S. Korea is telling the difference between Chinese, Japanese and Korean people. The similarity in their appearance has led to many awkward moments during my stay.</p><p>I wish to avoid those awkward moments by building a classifier that will differentiate between them for me.</p><p>First, we need a dataset to train an image classifier. Since I did not find any public dataset for this task, I created my own: images of Chinese, Japanese and Korean people of both genders (and of all ages) were scraped from the internet. I ended up with a <strong>dataset of 171 images of Chinese, 168 images of Japanese, and 167 images of Korean people</strong>. Samples from the dataset are shown below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/652/1*djJMayjDK4un1Uq6kcgZog.png" /><figcaption>Female samples from the dataset</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/669/1*IWyjNePIjeNsbUC3Tifp0Q.png" /><figcaption>Male samples from the dataset</figcaption></figure><p>Now that we have the dataset, we can build a model to train the image classifier. The architecture used was <strong>ResNet50, pre-trained on the </strong><a href="http://www.image-net.org/"><strong>ImageNet</strong></a><strong> dataset</strong>. You can learn more about this architecture in this <a href="http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006">post</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/483/1*hvmKMZt2Sv40mcaIZKL3-w.png" /><figcaption>Learner for the image classifier</figcaption></figure><p>At first, only the last layer of the ResNet50 was trained, freezing the weights of all other layers. The accuracy after 20 epochs was around 70%.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/466/1*ggHnH9rzjtLGd61GCIKhgw.png" /><figcaption>Training details for the frozen ResNet50</figcaption></figure><p>70%? Not bad, huh?</p><p>Now let’s fine-tune to see if it makes the model better.</p><p>We unfreeze and train all the layers. We will be using a learning rate of 1e-6 for the first layer and 1e-5 for the last layer; all the other layers will be trained with learning rates in the range [1e-6, 1e-5]. <strong>The first and other early layers are trained with smaller learning rates because they learn generic features, whereas deeper layers learn task-specific features.</strong> To learn more about this, read <a href="https://arxiv.org/abs/1311.2901">this</a>.</p>
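<p><em>A minimal sketch of this fine-tuning step, assuming the fastai v1 API of the time (<code>create_cnn</code>); <code>data</code> stands for the image data bunch built from the scraped dataset:</em></p><pre>from fastai.vision import *   # create_cnn, models, accuracy

learn = create_cnn(data, models.resnet50, metrics=accuracy)
learn.fit_one_cycle(20)       # head only: the pre-trained layers stay frozen

learn.unfreeze()              # now every layer is trainable
# discriminative learning rates: 1e-6 for the earliest layers, 1e-5 for the
# head, and the layers in between spread across that range
learn.fit_one_cycle(20, max_lr=slice(1e-6, 1e-5))</pre>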
<p>So after unfreezing and training for 20 more epochs, the accuracy was still around 70%; the performance did not improve with fine-tuning. One reason could be that our dataset is similar to ImageNet, which contains human images as well.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/492/1*nKmaJKEl_83J7ukJzZDjXA.png" /><figcaption>Training details for the unfrozen ResNet50</figcaption></figure><p>Now, let’s interpret the results. We will start by plotting a confusion matrix, which compares the classifier’s predictions with the actual labels.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/807/1*M9I3rS5xHZiaSmuepX-jYw.png" /><figcaption>Confusion matrix</figcaption></figure><p>Let’s take the first column of the matrix and interpret its meaning.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/291/1*cPfj7vla3jB8FTGANDR3vg.png" /></figure><p><strong>17 images of Chinese people were correctly predicted as Chinese, but 5 images of Japanese people were incorrectly predicted as Chinese, and 8 Korean images were also misclassified as Chinese.</strong></p><p>This is clearer if we look at samples of the confused cases.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/909/1*FLvw_M2ffDxyRnPvsVWmhQ.png" /><figcaption>Samples of confused cases</figcaption></figure><p>We can also see the cases in which the classifier was most confused.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/353/1*-6WoPTU9AfiWMJk1ovBmJw.png" /><figcaption>Most confused cases</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/193/1*CS10RfpxHT-c8igpBituIQ.png" /><figcaption>The classifier was most confused between Korean and Chinese: it misclassified Koreans as Chinese 8 times.</figcaption></figure><p><strong>Some thoughts:</strong></p><ol><li>Telling Chinese, Japanese and Korean people apart is not an easy task, even for a trained classifier.</li><li>The accuracy of the classifier might improve if it is trained longer.</li><li><a href="http://www.fast.ai/">fastai</a> and <a href="https://twitter.com/jeremyphoward?lang=en">@jeremy</a> will help me create things that make my (and hopefully others’) life easier.</li></ol><p><em>Code and data for this post can be found in this </em><a href="https://github.com/imbibekk/ethnicClassifier"><em>repo</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=93e4bf270a5" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Solving Markov Decision Process]]></title>
            <link>https://medium.com/@bibekchaudhary/solving-markov-decision-process-917e233a2985?source=rss-84a0db77e00e------2</link>
            <guid isPermaLink="false">https://medium.com/p/917e233a2985</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[dynamic-programming]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[markov-chains]]></category>
            <category><![CDATA[reinforcement-learning]]></category>
            <dc:creator><![CDATA[Bibek Chaudhary]]></dc:creator>
            <pubDate>Thu, 27 Sep 2018 21:31:24 GMT</pubDate>
            <atom:updated>2018-09-27T21:31:24.188Z</atom:updated>
<content:encoded><![CDATA[<p>Policy Iteration + Value Iteration</p><p>In the <a href="https://medium.com/@bibekchaudhary/markov-decision-process-mdp-simplified-1ae44cf53cc1">last post</a>, I wrote about the Markov Decision Process (MDP); this time I will summarize my understanding of how to solve an MDP by policy iteration and value iteration.</p><p>So what are policy iteration and value iteration?</p><p>They are Dynamic Programming algorithms used to solve finite MDPs. Dynamic Programming lets you solve complex problems by breaking them into simpler sub-problems; solving those sub-problems gives you the solution to the main problem. It relies on two properties:</p><ol><li><strong>Divide and conquer:</strong> dividing a big, complex problem into smaller and simpler sub-problems. Intuition: 4 is the sum of two 2s (4 = 2 + 2).</li><li><strong>Information reuse:</strong> using information that is already available to solve recurring sub-problems. Intuition: the solutions of the simpler sub-problems can be reused while solving the complex problem.</li></ol><p>Policy iteration and value iteration exploit these properties of MDPs to find the optimal policy.</p><p><strong>Policy Iteration:</strong> it has two parts, policy evaluation and policy improvement.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/262/1*xyZ3mZ-_FQjWfk1HRO_Irw.png" /></figure><p>In policy evaluation, given a policy π, we evaluate how much future reward we can get by following this policy starting from state s. This is done by evaluating the state-value function:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/340/1*gn4B_Kl1hq7ZrNqIlw-9yA.png" /><figcaption>State-value function</figcaption></figure><p>In the policy improvement step, the policy is improved by taking greedy actions with respect to the state-value function (shown in the figure above).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/154/1*-15eEaZ_rVxV2gF3fR8Usw.png" /><figcaption>Policy improvement via greedy action</figcaption></figure><p>So, what does being greedy mean?</p><p>Here it means selecting the action that maximizes the future reward we can get if we take action a in state s and follow policy π thereafter.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/245/1*o13oYAx3C0A2I6qogUoMNA.png" /><figcaption>Policy improvement via greedy action</figcaption></figure><p>Now we want to know whether following this new, greedified policy from state s will give us more or less future reward than just following the previous policy π from that state. It turns out that starting from state s and taking action a according to the new greedified policy is at least as good as, or better than, following the previous policy π for one step.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/470/1*zdWcknUt72qHaJSmZmESjw.png" /></figure><p>So we can say that the new greedified policy improves our chances of getting more future reward starting from state s.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/146/1*yPlILuhLlF2i3Rz3Kh0nSA.png" /></figure><p>If we apply this notion to all the successive steps, we can show that the new policy is at least as good as, if not better than, the previous policy for the whole trajectory.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/678/1*qrXIJ8H2GRsS5XwQap86Bg.png" /></figure><p>Policy evaluation and improvement are done iteratively until the optimal policy is obtained; the optimal policy is reached when the policy stops improving, and thus the Bellman optimality equation is satisfied.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/561/1*AU08s-bTrDnNnEn8rFy7oA.png" /></figure><p>Let’s take an example to apply these concepts and make our understanding more concrete.</p><p>Given a 4x4 grid-world, we need to find, via policy iteration, the optimal policy to reach the goal. It is an undiscounted episodic MDP where, under the initial policy, every action is equally likely. The states are {1, 2, …, 14}.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/534/1*0EihhiCeQUNQjd87No-gnQ.png" /></figure><p>Policy iteration starts with a random policy and then improves it by taking greedy actions.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/551/1*oixkWHUFlOEjxdLWv3rGNg.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/562/1*TconHgoNK0uOj9j-jxT9lg.png" /></figure><p>After a few iterations (k=3 in this case), the policy stops improving, and the optimal policy is obtained.</p>
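<p><em>A minimal NumPy sketch of policy iteration on this grid-world, assuming Sutton and Barto’s setup: the two shaded corners are terminal and every move gets a reward of -1. The details are illustrative rather than read off the figures above:</em></p><pre>import numpy as np

N = 4
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
TERMINAL = {(0, 0), (N - 1, N - 1)}            # the two shaded corner states

def step(s, a):
    """Deterministic move; stepping off the grid leaves the state unchanged."""
    return (min(max(s[0] + a[0], 0), N - 1), min(max(s[1] + a[1], 0), N - 1))

def evaluate(policy, theta=1e-4):
    """Iterative policy evaluation: sweep until the values stop changing."""
    V = np.zeros((N, N))
    delta = theta + 1.0
    while delta > theta:
        delta = 0.0
        for s in np.ndindex(N, N):
            if s in TERMINAL:
                continue
            v = sum(p * (V[step(s, a)] - 1.0) for a, p in policy[s].items())
            delta = max(delta, abs(v - V[s]))
            V[s] = v
    return V

def greedy(V):
    """Policy improvement: act greedily with respect to V."""
    return {s: {max(ACTIONS, key=lambda a: V[step(s, a)]): 1.0}
            for s in np.ndindex(N, N) if s not in TERMINAL}

# start from the equiprobable random policy and iterate
policy = {s: {a: 0.25 for a in ACTIONS}
          for s in np.ndindex(N, N) if s not in TERMINAL}
for _ in range(3):               # k=3 sweeps suffice on this grid
    policy = greedy(evaluate(policy))</pre>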
<p>One major drawback of policy iteration is the computational cost involved in evaluating policies. This cost is reduced in value iteration, which stops policy evaluation after k=1 and updates the policy at every step thereafter.</p><p><strong>Value Iteration:</strong> unlike policy iteration, it merges the policy evaluation and improvement steps into one and performs an iterative update using the Bellman optimality equation for the value function.</p><p>In value iteration, we evaluate the value function once per sweep and continuously improve it via the iterative Bellman optimality update.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/272/1*1ELESflpfZpQV-fY1oPTVA.png" /></figure><p>The iterative version of the Bellman optimality equation can be written as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/470/1*s3YSpcLBtRBaJade3_hzww.png" /></figure><p>The corresponding backup diagram is shown below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/364/1*h3I9aPNCqyhB3YWt562z3g.png" /></figure><p>This is based on the intuition that if the value function of the successor states is known, then the value function of the current state can be found by a one-step lookahead.</p>
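<p><em>A sketch of value iteration, reusing the grid helpers from the policy iteration sketch above; note the max over actions in place of the expectation under a fixed policy:</em></p><pre>def value_iteration(theta=1e-4):
    """Merged evaluation and improvement: back up with a max over actions."""
    V = np.zeros((N, N))
    delta = theta + 1.0
    while delta > theta:
        delta = 0.0
        for s in np.ndindex(N, N):
            if s in TERMINAL:
                continue
            v = max(V[step(s, a)] - 1.0 for a in ACTIONS)   # one-step lookahead
            delta = max(delta, abs(v - V[s]))
            V[s] = v
    return V

V_star = value_iteration()       # optimal values; greedy(V_star) is optimal</pre>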
<p>References:</p><ol><li><a href="http://incompleteideas.net/book/bookdraft2017nov5.pdf">An Introduction to Reinforcement Learning, Sutton and Barto</a></li><li><a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html">David Silver’s course on Reinforcement Learning</a></li></ol><p><strong>PS: I wrote this post based on my understanding of Reinforcement Learning. Any suggestion/improvement about the content and/or style of writing will be appreciated.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=917e233a2985" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Gradient Descent: Stochastic vs Batch]]></title>
            <link>https://medium.com/@bibekchaudhary/gradient-descent-stochastic-vs-batch-517e092b083f?source=rss-84a0db77e00e------2</link>
            <guid isPermaLink="false">https://medium.com/p/517e092b083f</guid>
            <category><![CDATA[gradient-descent]]></category>
            <category><![CDATA[epoch]]></category>
            <category><![CDATA[batch-processing]]></category>
            <category><![CDATA[optimization]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Bibek Chaudhary]]></dc:creator>
            <pubDate>Mon, 24 Sep 2018 15:55:13 GMT</pubDate>
            <atom:updated>2018-09-24T15:57:04.559Z</atom:updated>
<content:encoded><![CDATA[<p><em>The difference is in the weight-update pattern</em></p><p>This is my first post of the <a href="https://twitter.com/sirajraval/status/1014758160572141568?lang=en">#100DaysofMLCode</a> challenge; every day I plan to read (and blog to check my understanding) and/or code.</p><p>In this post, I will talk about Stochastic Gradient Descent and how it differs from Batch Gradient Descent. I will also explain mini-batches and epochs. I am assuming that you are familiar with gradient descent; if you are not, read this <a href="https://hackernoon.com/gradient-descent-aynk-7cbe95a778da">blog</a> to get the intuition behind it.</p><p>Machine Learning is all about finding meaningful patterns and relations in data. As a result of learning, parameters, the weights (plus biases), are obtained and then used for prediction.</p><p>Gradient descent is a popular optimization algorithm that minimizes the loss function and updates the parameters (weights + biases); if an update takes place for every sample, it is called Stochastic Gradient Descent.</p><p><strong><em>For 100 samples, SGD updates the weights 100 times per epoch.</em></strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/520/1*LgO5vUevQqCnDMdEOSdVOQ.png" /><figcaption>SGD algorithm</figcaption></figure><p>In Batch Gradient Descent (BGD), the update takes place only once for the whole set of training samples.</p><p><strong><em>For 100 samples, BGD updates the weights only once per epoch.</em></strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/489/1*BwY7oLCTrRUiONm8p6ff8g.png" /><figcaption>BGD algorithm</figcaption></figure><p>Training on the whole set at once is rare; mini-batches of samples are commonly used to train machine learning models.</p><p><strong><em>For 100 samples and a batch size of 20, mini-batch gradient descent updates the weights five times per epoch.</em></strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/507/1*MVcd6jzZ-jJ59-jyBUi7jg.png" /><figcaption>Mini-BGD algorithm</figcaption></figure><p>In the Machine Learning literature you will often hear the word “epoch”; it means one full pass of training over the entire set of training samples. With mini-BGD, one epoch of training is done when the update for the fifth mini-batch of 20 samples is completed; with SGD, it happens when the update for the last sample (the 100th in this case) is completed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/633/1*hz27NLPFQrcM0tyi7WOdyQ.png" /><figcaption>One epoch using mini-BGD</figcaption></figure>
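<p><em>A toy NumPy sketch to make the update-count arithmetic concrete; the linear-regression loss here is just a stand-in:</em></p><pre>import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)

def grad(w, idx):
    """Gradient of the mean squared error on the selected samples."""
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

def train(batch_size, lr=0.1, epochs=1):
    w, updates = np.zeros(3), 0
    for _ in range(epochs):
        for start in range(0, len(X), batch_size):
            idx = np.arange(start, start + batch_size)
            w -= lr * grad(w, idx)           # one parameter update per batch
            updates += 1
    return updates

print(train(batch_size=1))     # SGD: 100 updates per epoch
print(train(batch_size=100))   # BGD: 1 update per epoch
print(train(batch_size=20))    # mini-batch: 5 updates per epoch</pre><p><strong>PS: I wrote this post based on my understanding of Stochastic and Batch Gradient Descent. Any suggestion/improvement about the content and/or style of writing will be appreciated.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=517e092b083f" width="1" height="1" alt="">]]></content:encoded>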
        </item>
        <item>
            <title><![CDATA[Markov Decision Process(MDP) Simplified]]></title>
            <link>https://medium.com/@bibekchaudhary/markov-decision-process-mdp-simplified-1ae44cf53cc1?source=rss-84a0db77e00e------2</link>
            <guid isPermaLink="false">https://medium.com/p/1ae44cf53cc1</guid>
            <category><![CDATA[markov-chains]]></category>
            <category><![CDATA[reinforcement-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Bibek Chaudhary]]></dc:creator>
            <pubDate>Tue, 18 Sep 2018 17:38:45 GMT</pubDate>
            <atom:updated>2018-09-27T21:25:20.623Z</atom:updated>
<content:encoded><![CDATA[<p><em>MDP gives the mathematical formulation of the Reinforcement Learning problem</em></p><p>A Markov Decision Process (MDP) is an environment with <strong>Markov</strong> states; Markov states satisfy the <strong>Markov property</strong>: the state contains all the relevant information from the past needed to predict the future. Mathematically,</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/623/1*tormRXstV4DIKuDX8Cr0VA.png" /><figcaption>pic taken from <a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf">David Silver’s lecture slides</a></figcaption></figure><p>So if I say that the state <strong>S&lt;t&gt;</strong> is Markov, that means it carries all the important information about the environment from the previous states (which means you can throw the previous states away). Think of it this way: once you have your boarding pass, you do not need your ticket anymore to board the plane; the boarding pass already contains all the necessary boarding information.</p><p>An MDP is formally defined as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/509/1*ArjVZTDhGWoaubzh68jMuA.png" /><figcaption>MDP tuple</figcaption></figure><p>Let’s take an example to develop intuition about MDPs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/751/1*ioi8f7jSzmpWzkSu-XwsJw.png" /><figcaption>Student MDP example from <a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf">David Silver’s lecture slides</a></figcaption></figure><p>Suppose that you are a student and the figure above portrays one of your days at school. The circles and the square represent the states you can be in, and the words in red are the actions you can take depending on the state you are in; for example, in the state Class 1 you can choose whether to study or to check Facebook, and depending on the action you take, a numerical reward is given. There is also an action node (the black dot in the figure) from which you can end up in different states depending on the transition probabilities; for example, after you decide to go to the Pub from Class 3, you have a 0.2 probability of ending up in Class 1. This node shows the randomness of the environment, over which you have no control. In all other cases the transition probability is 1, and if the discount factor is 1, then the MDP can be defined as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/491/1*FJea_1T2Ysu_ZyJSgwurCQ.png" /><figcaption>MDP example</figcaption></figure><p>Now that we have an MDP, we need to solve it to find the best path that maximizes the sum of rewards, which is the goal of solving reinforcement learning problems. Formally, we need to find an optimal policy that maximizes the overall reward the agent can get.</p><p>To solve an MDP, we first have to know about the policy and the value function.</p><p>In simple terms, the policy tells you which actions to take. It is defined as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/837/1*XgaunP0ogQLPGL33SWBu9g.png" /><figcaption>Policy definition taken from <a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf">David Silver’s lecture slides</a></figcaption></figure><p>For MDPs, the policy depends only on the current state.</p><p>The value function can be defined in two ways: as the state-value function or as the action-value function. 
The state-value function tells you “how good” the state you are in is, whereas the action-value function tells you “how good” it is to take a particular action in a particular state. The “how good” of a state (or state-action pair) is defined in terms of expected future rewards.</p><p>The state-value function is defined as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/835/1*FxIoBsuZTOOuw4p9csqhhQ.png" /><figcaption>State-value function definition taken from <a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf">David Silver’s lecture slides</a></figcaption></figure><p>Similarly, the action-value function is defined as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/835/1*qDB3RleLRTe2NKmIPQJtGg.png" /><figcaption>Action-value function taken from <a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/MDP.pdf">David Silver’s lecture slides</a></figcaption></figure><p>If we take the maximum of the value function over all policies, we get the optimal value function. Once we know the optimal value function, we can solve the MDP to find the best policy.</p><p>The value functions defined above satisfy the <strong>Bellman equation</strong>, which states: “the value of the start state must equal the (discounted) value of the expected next state, plus the reward expected along the way.”</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/253/1*1Tr17ZqADuXvJBwyrEPq0Q.png" /></figure><p>For example, if we take the path from Class 1 to Class 2, we can write the Bellman equation in the following way:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/332/1*HPwUhHl_n5hA-AHyjhVwFA.png" /><figcaption>Bellman equation for the value function</figcaption></figure><p>The Bellman optimality equation can be written in a similar way:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/336/1*Pq3d9xAL47SWj01weffK8w.png" /><figcaption>Bellman optimality equation for the value function</figcaption></figure><p>These concepts extend easily to multiple paths, with different actions leading to different states. In that case, the Bellman optimality equation is:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/451/1*cG8DP0vr-mTlGdCaiKzYGw.png" /><figcaption>Optimal state-value function</figcaption></figure><p>Using the above equation, we can find the optimal value function for each state in our student MDP example.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/579/1*VxpjQjKe43sqRAAY_UpUwQ.png" /><figcaption>Optimal state-value function</figcaption></figure><p>The optimal action-value function can be expressed in similar fashion as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/407/1*fFRhDWp_BeWNuFhwjHDQ9Q.png" /><figcaption>Optimal action-value function</figcaption></figure><p>This equation gives the following result in our student MDP example.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/701/1*VbI8ssdx6VaZbs4IvWWKow.png" /><figcaption>Optimal action-value function</figcaption></figure>
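<p><em>To make the one-step lookahead concrete, here is a tiny sketch of the Pub-vs-Study decision at Class 3; the optimal state values (6, 8 and 10 for the three class states) are read off David Silver’s slides and taken as given here:</em></p><pre># one-step lookahead at state Class 3 of the student MDP (discount factor 1)
v_star = {"class1": 6.0, "class2": 8.0, "class3": 10.0}

# q*(Class 3, Study): reward +10, then the episode ends (terminal value 0)
q_study = 10.0

# q*(Class 3, Pub): reward +1, then land in Class 1/2/3 with p = 0.2/0.4/0.4
q_pub = 1.0 + 0.2 * v_star["class1"] + 0.4 * v_star["class2"] + 0.4 * v_star["class3"]

print(q_study, q_pub)   # 10.0 and 9.4, so the optimal policy is to study</pre>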
<p>Once we have the optimal action-value function, we can find the optimal policy by taking its maximum. Formally:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/502/1*Ae3H-_VUhTZUdQsqD9Fjcw.png" /><figcaption>Optimal policy</figcaption></figure><p>The optimal policy, which maximizes the reward for our student, is shown by the red arcs in the figure below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/707/1*YuH2hGJlJODqUrelYCRmsg.png" /><figcaption>Optimal policy</figcaption></figure><p>Summary:</p><p>An MDP represents the reinforcement learning problem mathematically, and the goal of solving an MDP is to find an optimal policy that maximizes the sum of expected rewards. Finding an optimal policy becomes easy once we have the optimal action-value function, and the intuition behind the Bellman equation simplifies the process of finding that action-value function.</p><p>References:</p><ol><li><a href="http://incompleteideas.net/book/bookdraft2017nov5.pdf">An Introduction to Reinforcement Learning, Sutton and Barto</a></li><li><a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html">David Silver’s course on Reinforcement Learning</a></li></ol><p><strong>PS: I wrote this post based on my understanding of Reinforcement Learning. Any suggestion/improvement about the content and/or style of writing will be appreciated.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1ae44cf53cc1" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Spine Segmentation using U-Net]]></title>
            <link>https://medium.com/@bibekchaudhary/spine-segmentation-using-u-net-14bc5ab22b78?source=rss-84a0db77e00e------2</link>
            <guid isPermaLink="false">https://medium.com/p/14bc5ab22b78</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[segmentation]]></category>
            <category><![CDATA[biomedical]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[spine-surgery]]></category>
            <dc:creator><![CDATA[Bibek Chaudhary]]></dc:creator>
            <pubDate>Wed, 12 Sep 2018 07:04:51 GMT</pubDate>
            <atom:updated>2018-09-12T07:07:14.923Z</atom:updated>
<content:encoded><![CDATA[<p>This post is based on my internship experience, where I worked on segmentation of the spine using the <a href="https://arxiv.org/abs/1505.04597">U-Net</a> architecture.</p><p><strong>Dataset:</strong> <a href="https://www.emedicinehealth.com/ct_scan/article_em.htm">CT</a> scans of 11 patients collected from the institution-affiliated hospital. The data were in <a href="https://en.wikipedia.org/wiki/DICOM">DICOM</a> format with no labels.</p><p><strong>Image pre-processing:</strong> since the data had no labels, I had to generate labels manually. I used <a href="https://www.slicer.org/">3D Slicer</a>’s automatic segmentation feature to generate labels and save them as DICOM files.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rHt_wOA_qwZMnk0LKWnmuQ.png" /><figcaption>Automatic segmentation in 3D Slicer</figcaption></figure><p>The figure above shows how automatic segmentation can be used to generate labels (masks). You can move the slider to adjust the noise in the mask.</p><p>To save the mask as DICOM files:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*afkrMBokP2-yQieieCecbA.png" /><figcaption>Creating a DICOM series of masks</figcaption></figure><p>The DICOM files can then be read, cropped and saved as .png files using the Python package <a href="https://pydicom.github.io/pydicom/stable/getting_started.html">pydicom</a>.</p><p>An image and label obtained after pre-processing are shown below:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/690/1*QFnYIxOs6AVImW-KkdiGGQ.png" /><figcaption>Image (left) and label (right)</figcaption></figure><p>There is still noise (the white dots) in the label; this was acceptable in my case. You can manually reduce the noise in 3D Slicer if you want.</p><p><strong>Training:</strong> the images and labels were split into train and test sets and trained with the U-Net architecture. Training ran for 100 epochs using the Adam optimizer with a learning rate of 0.001.</p><p><strong>Evaluation metric:</strong> the Jaccard index, also known as Intersection over Union (IoU), was used as the evaluation metric during training. For two sets A and B, the Jaccard index is defined as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/352/1*xpwWkZsoyMWzG-RtqZK6_w.gif" /><figcaption>Jaccard index (IoU)</figcaption></figure><p><strong>Loss function:</strong> the loss function used for optimization can be defined as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/316/1*ZT-AQnFCTpNlWu6CRdGHIg.png" /><figcaption>Loss function: L</figcaption></figure>
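<p><em>For reference, a PyTorch sketch of the soft Jaccard metric and a common loss built on it (binary cross-entropy minus the log of the soft Jaccard index); treat it as illustrative rather than the project’s exact code:</em></p><pre>import torch
import torch.nn.functional as F

def jaccard(probs, target, eps=1e-7):
    """Soft Jaccard index (IoU), averaged over a batch of (N, H, W) masks."""
    inter = (probs * target).sum(dim=(1, 2))
    union = probs.sum(dim=(1, 2)) + target.sum(dim=(1, 2)) - inter
    return ((inter + eps) / (union + eps)).mean()

def loss(probs, target):
    """BCE pushes per-pixel accuracy; -log(jaccard) pushes mask overlap."""
    return F.binary_cross_entropy(probs, target) - torch.log(jaccard(probs, target))</pre>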
<p><strong>Result:</strong> the Jaccard index obtained after training the U-Net for 100 epochs was 0.7; this could be improved by training longer and using data augmentation, neither of which was used in this project.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/664/1*G3SsRnq4_YMZKeNUqYu-Fg.png" /><figcaption>Image with real label (left) and image with predicted label (right)</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/663/1*RIm4bGiBHziVpsvh2cMNpA.png" /><figcaption>Image with real label (left) and image with predicted label (right)</figcaption></figure><p><strong>PS: I cannot publicly share the dataset and code for this project, as it was not a personal project. However, I edited and improved on </strong><a href="https://github.com/jocicmarko/ultrasound-nerve-segmentation"><strong>this</strong></a><strong> code.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=14bc5ab22b78" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Reinforcement Learning Simplified]]></title>
            <link>https://medium.com/@bibekchaudhary/reinforcement-learning-simplified-1cf40285f05d?source=rss-84a0db77e00e------2</link>
            <guid isPermaLink="false">https://medium.com/p/1cf40285f05d</guid>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[reinforcement-learning]]></category>
            <category><![CDATA[robotics]]></category>
            <dc:creator><![CDATA[Bibek Chaudhary]]></dc:creator>
            <pubDate>Tue, 11 Sep 2018 21:50:11 GMT</pubDate>
            <atom:updated>2018-09-11T21:59:14.455Z</atom:updated>
<content:encoded><![CDATA[<p><em>In simple terms, Reinforcement Learning is learning from experience</em></p><p>Just like humans, machines can also learn from their interaction with the environment; Reinforcement Learning is how they do it. It is the branch of Machine Learning in which the learner is not explicitly trained (as in other Machine Learning domains); rather, it is supposed to learn from experience by interacting with the environment. The interaction includes taking actions through trial-and-error search and getting feedback (positive or negative) from the environment. It has the following elements:</p><ol><li><strong>Agent:</strong> it learns and makes decisions by interacting with its environment.</li><li><strong>Environment:</strong> everything that is outside of the agent and cannot be directly controlled by the agent. It responds to the agent’s actions by giving feedback and presenting a new state to the agent.</li><li><strong>Reward function:</strong> it defines the reward of the agent depending on its action. It tells the agent what kind of reward it will get if it takes a particular action.</li><li><strong>Policy:</strong> the behavior of the agent is defined by the policy. It tells the agent which actions to take, and which to avoid, to achieve its goal.</li><li><strong>Value function:</strong> it evaluates the action the agent takes in a particular state, considering future rewards. It gives the agent information about the long-term consequences of its actions.</li><li><strong>Model of the environment (optional):</strong> a representation of the environment, based on which the environment gives feedback and presents new states to the agent.</li></ol><p>I will illustrate the idea behind each element through the popular childhood game of tic-tac-toe.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/195/1*JJBmGqvcAPr_QKNoG_jsEg.png" /><figcaption>Tic-tac-toe game</figcaption></figure><p>Tic-tac-toe is a 3x3 board game for two players, and the player who places three Os or Xs in consecutive places, horizontally, vertically or diagonally, wins the game. Otherwise the game is a draw. The figure above shows Xs in three consecutive places diagonally.</p><p>Now consider two players, Player A and Player B, playing against each other; Player A is an imperfect player (semi-skilled, and can make mistakes at times) and Player B is the one who can learn from experience. In this case the elements are:</p><p><strong>Agent:</strong> Player B, because it learns and makes decisions based on its interaction with the environment.</p><p><strong>Environment:</strong> everything else (including Player A), as it gives feedback and presents new states to Player B.</p><p><strong>Reward signal:</strong> the goal of Player B; in this case, to win the game.</p><p><strong>Policy:</strong> what move to make when going from one state to another.</p><p><strong>Value function:</strong> which moves are good or bad for Player B in the long term.</p><p><strong>Model of the environment:</strong> the representation of the environment that is used to give rewards to Player B.</p><p>Now that we have an overview of the elements of reinforcement learning, let me explain the interaction between them.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/769/1*rAi2PI4oELUqGjcj9MzOmA.png" /><figcaption>Agent-Environment Interaction</figcaption></figure><p>At each time step t, the environment sends the agent some information about its state s&lt;t&gt;; in the example above, s&lt;t&gt; is the column/row configuration of the board. The agent then takes an action a&lt;t&gt; depending on s&lt;t&gt;. In the tic-tac-toe game, a&lt;t&gt; is the move Player B makes after learning its state. As a consequence of the agent’s action, the environment then sends a numerical reward r&lt;t+1&gt; at time step t+1. This interaction continues until the agent achieves its goal.</p>
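<p><em>A minimal sketch of this loop in code; <code>env</code> and <code>policy</code> are hypothetical stand-ins for a real environment and a learned policy:</em></p><pre>def run_episode(env, policy):
    """Run one agent-environment episode and return the total reward."""
    state = env.reset()                 # environment presents the initial state
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)                   # agent picks a_t based on s_t
        state, reward, done = env.step(action)   # env returns r_(t+1), s_(t+1)
        total_reward += reward
    return total_reward                 # the loop ends when the goal is reached</pre>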
<p>References:</p><ol><li><a href="http://incompleteideas.net/book/bookdraft2017nov5.pdf">An Introduction to Reinforcement Learning, Sutton and Barto</a></li><li><a href="http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching.html">David Silver’s course on Reinforcement Learning</a></li></ol><p><strong>PS: This is my first online post. I wrote it based on my understanding of Reinforcement Learning. Any suggestion/improvement about the content and/or style of writing will be appreciated.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1cf40285f05d" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>