<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Debarko De 🦁 on Medium]]></title>
        <description><![CDATA[Stories by Debarko De 🦁 on Medium]]></description>
        <link>https://medium.com/@debarko?source=rss-6a4320201780------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*qqKeigIdHoP7sg5fiNPDiQ.jpeg</url>
            <title>Stories by Debarko De 🦁 on Medium</title>
            <link>https://medium.com/@debarko?source=rss-6a4320201780------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Mon, 18 May 2026 04:34:38 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@debarko/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Interviews for Engineering Managers]]></title>
            <link>https://medium.com/@debarko/interviews-for-engineeribng-16f294710c8c?source=rss-6a4320201780------2</link>
            <guid isPermaLink="false">https://medium.com/p/16f294710c8c</guid>
            <category><![CDATA[engineering-management]]></category>
            <category><![CDATA[interview]]></category>
            <category><![CDATA[developer]]></category>
            <category><![CDATA[promotion]]></category>
            <dc:creator><![CDATA[Debarko De ]]></dc:creator>
            <pubDate>Fri, 09 Feb 2024 13:10:33 GMT</pubDate>
            <atom:updated>2024-02-09T13:11:04.253Z</atom:updated>
            <content:encoded><![CDATA[<h3>Interviews for Engineering Managers</h3><p>Transitioning into an Engineering Manager role can be both exciting and challenging. Often, interviews for this position encompass various aspects of team management, leadership, and problem-solving. While some argue that focusing on becoming a better manager overall is key, there’s merit in understanding the common scenarios and topics that interviewers often explore.</p><p>As someone who has been a manager for over seven years, overseeing teams of various sizes and functions, I’ve encountered a wide range of interview questions across different companies. In this guide, I aim to compile a comprehensive list of common scenarios and topics that candidates might encounter during Engineering Manager interviews. While the list may not be exhaustive, it draws from my personal experience and interactions in the field.</p><p>Understanding the Categories: The interview rounds for Engineering Manager positions may go by different names like team management rounds, people rounds, or process rounds. Despite the varied terminology, the underlying themes often overlap. 
In this guide, I’ve categorized the topics based on the key areas of focus:</p><ol><li>Team Management and Leadership: Questions in this category assess your ability to lead and inspire teams, handle conflicts, and foster a positive work culture.</li><li>Problem-Solving and Decision-Making: These questions evaluate your approach to solving complex problems, making tough decisions, and prioritizing tasks effectively.</li><li>Technical Proficiency: While less common, some interviews may include technical questions to gauge your understanding of the technical aspects relevant to your role.</li><li>Communication and Collaboration: This category encompasses questions related to communication skills, stakeholder management, and collaboration within and across teams.</li><li>Project and Process Management: Questions here explore your experience in project management methodologies, process optimization, and ensuring efficient workflows.</li></ol><p>Approach to Answering Questions: Rather than providing specific answers, I’ll guide you on how to approach crafting responses that align with your experiences and strengths. Remember, authenticity and relevance are key in showcasing your capabilities effectively.</p><blockquote><strong>I talk about each of these points in an article on my blog. Here’s a link to that article. Please continue reading the rest of the article there.</strong></blockquote><p><a href="https://debarko.de/engineering-manager-interviews/">Engineering Manager Interviews</a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eSi28x75iWedxm-YCRFOyA.png" /><figcaption><a href="https://debarko.de/engineering-manager-interviews/">https://debarko.de/engineering-manager-interviews/</a></figcaption></figure>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[RNN or Recurrent Neural Network for Noobs]]></title>
            <link>https://medium.com/hackernoon/rnn-or-recurrent-neural-network-for-noobs-a9afbb00e860?source=rss-6a4320201780------2</link>
            <guid isPermaLink="false">https://medium.com/p/a9afbb00e860</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[recurrent-neural-network]]></category>
            <category><![CDATA[rnn]]></category>
            <category><![CDATA[neural-networks]]></category>
            <dc:creator><![CDATA[Debarko De ]]></dc:creator>
            <pubDate>Tue, 19 Jun 2018 18:36:02 GMT</pubDate>
            <atom:updated>2018-06-19T21:56:00.055Z</atom:updated>
            <content:encoded><![CDATA[<p>What is a Recurrent Neural Network (RNN), how does it work, and where can it be used? This article tries to answer these questions. It also shows a demo implementation of an RNN built for a specific purpose, which you should be able to generalise to your own needs.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/651/1*6xj691fPWf3S-mWUCbxSJg.jpeg" /><figcaption>Recurrent Neural Network Architecture</figcaption></figure><p>Prerequisites: Python and a basic knowledge of CNNs. The CNN background is needed to see why and where an RNN performs better than a CNN. You don’t need to understand the math. If you want a refresher, go back to my earlier article on what a CNN is.</p><p>We will begin with the word “Recurrent”. Why is it called Recurrent? In English the word recurrent means:</p><blockquote>occurring often or repeatedly</blockquote><p>This type of neural network is called recurrent because it performs the same operation over and over on sequential input. We will discuss what that <em>operation</em> is later in the article.</p><h3>Why do we need RNN?</h3><p>You might be wondering by now: we already have networks like convolutional ones which perform very well, so why do we need another type of network? There is a very specific use case where RNNs are required. To explain RNNs you first need to understand something called a sequence. Let&#39;s talk about <strong>sequences </strong>first.</p><p>A sequence is a stream of data (finite or infinite) whose elements are interdependent. Examples are time series data, meaningful strings of text, conversations etc. In a conversation a single sentence means something, but the entire flow of the conversation often means something completely different. 
Similarly, in time series data like stock market data, a single tick gives the current price, but a full day’s data shows movement and lets us decide whether to buy or sell.</p><p>CNNs generally don’t perform well when the input data is interdependent in a sequential pattern. A CNN carries no information over from one input to the next, so every output is independent of the others. A CNN takes an input and produces an output based on the trained model. If you run 100 different inputs, none of the outputs is biased by a previous one. But imagine a scenario like sentence generation or text translation. Every word generated depends on the words generated before it (in certain cases it depends on the words coming after as well, but we will discuss that later). So you need some bias based on your previous output. This is where RNNs shine. RNNs have, in a sense, some memory of what happened earlier in the sequence of data. This helps the system gain context. Theoretically RNNs have infinite memory, meaning they can look back indefinitely. By look back I mean at all previous inputs. But in practice they can only look back over the last few steps. <em>(we will discuss this later)</em></p><blockquote>Just to draw a correlation with humans in general, we also don’t make decisions in isolation. We also base our decisions on previous knowledge of the subject. (**over simplified, hard to say I understand even 0.1% of human brains**)</blockquote><h3>Where to use an RNN?</h3><p>RNNs can be used in a lot of different places. The following are a few examples where RNNs are widely used.</p><h4>1. Language Modelling and Generating Text</h4><p>Given a sequence of words, here we try to predict the likelihood of the next word. This is useful for translation since the most likely sentence would be the one that is correct.</p><h4>2. Machine Translation</h4><p>Translating text from one language to another uses one form or another of RNN. 
All practical present-day systems use some advanced version of an RNN.</p><h4>3. Speech Recognition</h4><p>Predicting phonetic segments based on input sound waves, thus formulating a word.</p><h4>4. Generating Image Descriptions</h4><p>A very big use case is understanding what is happening inside an image, so that we get a good description of it. This works as a combination of a CNN and an RNN. The CNN does the segmentation and the RNN then uses the segmented data to create the description. It’s rudimentary but the possibilities are limitless.</p><h4>5. Video Tagging</h4><p>This can be used for video search, where we describe a video frame by frame.</p><h3>Let’s Dig Deep!</h3><p>We will follow the sequence of topics below to finish the document. Each section builds on the previous one, so don’t treat this as a reference to dip into.</p><ol><li>Feed-forward Networks</li><li>Recurrent Networks</li><li>Recurrent Neuron</li><li>Backpropagation Through Time (BPTT)</li><li>RNN Implementation</li></ol><h3>Feed-forward Networks Primer</h3><p>Feed-forward networks channel information through a series of operations which take place in each node of the network. They pass the information straight through each layer exactly once. This is different from recurrent networks, which we will talk about in a later section. Generally feed-forward nets take an input and produce an output from it. This is also mostly a supervised learning step, and the outcome will most likely be a classification. It behaves similarly to a CNN. The outputs can be expected to be class labels like cats or dogs.</p><p>A feed-forward network is trained on a set of pre-labelled data. The objective of the training phase is to reduce the error the feed-forward network makes when it tries to guess the class. 
Once training is done, the weights are used to classify new batches of data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SL8FESMwzSy6QTrcIzcRYw.png" /><figcaption>A typical feed-forward network architecture</figcaption></figure><p>One important thing to note here: in a feed-forward network, whatever image is shown to the classifier during the test phase doesn’t alter the weights, so the second decision is not affected by the first. This is one very important difference between feed-forward networks and recurrent nets.</p><blockquote>Feed-forward nets don’t remember historic input data at test time unlike recurrent networks.</blockquote><p>It’s always a point-in-time decision. They only remember things that were shown to them during the training phase.</p><h3>Recurrent Networks</h3><p>Recurrent networks, on the other hand, take as their input not just the current input example they see, but also what they have perceived previously in time.</p><p>Let’s try to build a <a href="https://en.wikipedia.org/wiki/Multilayer_perceptron">multi layer perceptron</a> to start with the explanation. In simple terms there is an input layer, a hidden layer with certain activations, and finally an output.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/376/1*UXDlYTeJFlbq2an7MH1HbA.jpeg" /><figcaption>A sample multi layer perceptron architecture</figcaption></figure><p>If we increase the number of layers in the above example, the input layer takes the input. Then the first hidden layer applies its activation and passes the result on to the next hidden layer, and so on. Finally it reaches the output layer, which gives the output. Each hidden layer has its own weights and biases. Now the question is: can we feed input to the hidden layers?</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/706/1*8m2AxT3aH7bHfnnaMj8Ptw.jpeg" /></figure><p>Each layer has its own weights (W), biases (B) and activation functions (F). 
These layers behave differently, so merging them would technically be challenging. To be able to merge them, let’s give all the layers the same weights and biases. It will look something like this.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/706/1*sL5dNMry95B_u4NLbuBNKg.jpeg" /></figure><p>Now we can merge all the layers together. All the hidden layers can be combined into a single recurrent layer. So they start looking somewhat like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/364/1*EHd7wwjnogvNvH9vHLp3Uw.jpeg" /></figure><p>We will provide input to the hidden layer at each step. A recurrent neuron now stores the input of the previous steps and merges that information with the current step’s input. Thus it also captures some information about the correlation between the current data step and the previous steps. The decision at time step t-1 affects the decision taken at time t. This is very much like how we as humans take decisions in our life. We combine the present data with the recent past to take a call on a particular problem at hand. This example is excessively rudimentary but in principle it aligns with our decision making capability. <em>This really intrigues me as to whether we as humans are intelligent or whether we have a very advanced neural network model. Our decisions are just the training data that we have been collecting throughout our life. So can we digitise our brains once we have a fairly advanced model and systems capable of storing and computing them in reasonable time periods? 
So what happens when we have models better and faster than our brains training on data from millions of people?</em></p><blockquote>Funny anecdote from another <a href="https://deeplearning4j.org/lstm.html">article</a>: <strong>a person is haunted by their deeds</strong></blockquote><p>Let’s come back to the problem at hand and rephrase the above explanation with an example: predicting the next letter after a sequence of letters. Take the word <strong>namaskar</strong>, which has 8 letters.</p><blockquote><strong>namaskar</strong>: a traditional Indian greeting or gesture of respect, made by bringing the palms together before the face or chest and bowing.</blockquote><p>Suppose we are trying to figure out the 8th letter after the first 7 letters have been fed to the network. What happens? The hidden layer would have gone through 8 iterations. If we were to unfold the network, it would be an 8 layer network, one layer for each letter. So you can imagine that a normal neural network is repeated multiple times. The number of times you unroll has a direct correlation with how far in the past it can remember. But more on this later.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_mM83sFLjzKt8cRB439Y3Q.gif" /><figcaption>how recurrent neural networks work #deeplearning4j #dl4j</figcaption></figure><h3>Recurrent Neuron</h3><p>Here we will look in more depth at the actual neuron that is responsible for the decision making. We will use the <strong>namaskar </strong>example described above. We will try to figure out the 8th letter given the previous 7 letters. The total vocabulary of the input data is {n,a,m,s,k,r}. In the real world you will have more complex words or sentences. 
For simplicity we will use this simple vocabulary.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/364/1*EHd7wwjnogvNvH9vHLp3Uw.jpeg" /></figure><p>In the above diagram, the hidden layer or the RNN block applies a formula to the current input as well as the previous state. In this case the letter n from namaskar has nothing preceding it, so we will move on to the next letter, which is a. At the time step of the letter a, the hidden layer applies the formula to a and to the previous state, which came from the letter n. We will go through the formula in a bit. Each pass of an input through the network is a time step or a step. So if at time t the input is a, then at time t-1 the input is n. After applying the formula to both n and a, we get a new state.</p><p>The formula for the current state can be written like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/192/1*cL2HAU5Q9qcwD-LKjgPdWw.png" /></figure><p>ht is the new state and ht-1 is the previous state. xt is the input at time t. The new state now carries a sense of the previous inputs, since it has passed through the same formula at each of the previous time steps. We will feed 7 such inputs to the network, each passing through the same weights and the same function at every step.</p><p>Now let’s try to define f() in a simple fashion. We will take tanh as the activation function. The recurrent weights are given by the matrix Whh and the input weights by the matrix Wxh. So the formula looks like:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/346/1*rZCv_pub_2Kdzb7sqsXEsg.png" /></figure><p>The above example takes only the last step as memory and thus merges with the data of the last step. To increase the memory capacity of the network, and hold longer sequences in memory, we have to add more states to the equation, like ht-2, ht-3 etc. 
Finally, at test time, the output can be calculated as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/158/1*kBJUiDmobt-ZbzoXwUSAgw.png" /></figure><p>where yt is the output. The output is compared to the actual output and then an error value is computed. The network learns by back propagating the error through the network to update the weights. We will talk about backpropagation in the next section.</p><h3>Backpropagation Through Time (BPTT)</h3><p>This section assumes that you are aware of backpropagation as a concept. If you need to understand backpropagation, please visit this <a href="http://cs231n.github.io/optimization-2/">link </a>to read more.</p><p>So now we understand how an RNN actually works, but how does the training actually work? How do we decide the weights for each connection? And how do we initialise the weights for these hidden units? The purpose of recurrent nets is to accurately classify sequential input. We rely on the backpropagation of error and gradient descent to do so. But standard backpropagation as used in feed-forward networks can’t be used here.</p><p>The problem with RNNs is that they are cyclic graphs, unlike feed-forward networks which are directed acyclic graphs. In feed-forward networks we could calculate the error derivatives from the layer above. In an RNN we don’t have such layering.</p><p>The answer lies in what we discussed above. We need to unroll the network. We will unroll it and make it look like a feed-forward network.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*T1_uXU6oW4Bt5UFoaqvAiw.jpeg" /><figcaption>Unrolling a RNN</figcaption></figure><p>We take an RNN’s hidden units and replicate them for every time step. Each replication through a time step is like a layer in a feed-forward network. Each time step t layer connects to all possible layers in the time step t+1. 
Thus we randomly initialise the weights, unroll the network and then use backpropagation to optimise the weights in the hidden layer. Initialisation is done by passing parameters to the lowest layer. These parameters are also optimised as a part of backpropagation.</p><p>An outcome of the unrolling is that each layer now starts maintaining different weights and thus ends up getting optimised differently. The errors calculated w.r.t. the weights are not guaranteed to be equal. So each layer can have different weights at the end of a single run. We definitely don’t want that to happen. The easy way out is to aggregate the errors across all the layers in some fashion. We can average the errors or even sum them up. This way a single layer can maintain the same weights across all time steps.</p><h3>RNN Implementation</h3><p>Here is some sample code where we have tried to implement an RNN using Keras models. Here is the direct link to the <a href="https://gist.github.com/09aefc5231972618d2c13ccedb0e22cc.git">gist</a>. We are trying to predict the next character in a sequence given a set of text.</p><p>This model was built by <a href="https://github.com/yashk2810/Predicting-Next-Character-using-RNN">Yash Katariya</a>. I have updated the code slightly to fit the requirements of this article. The code is commented as you go along, so it’s pretty self explanatory.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HwRiD82qcfUfMJiGiEcBeA.png" /></figure><h3>Conclusion</h3><p>Well, we have come to the end of this article. What we have discussed so far is just a basic implementation of an RNN. There are many more things we need to cover to get a full understanding of this topic. I’ll be writing a second article within a week. 
I will try to cover the following topics.</p><ol><li>Vanishing and Exploding Gradients</li><li>The Problem of Long-Term Dependencies</li><li>Long Short Term Memory networks (LSTM)</li><li>LSTM Gate</li><li>Bidirectional RNNs</li><li>Deep (Bidirectional) RNNs</li><li>GRU (Gated Recurrent Unit) Cells</li></ol><p>If you want me to cover things apart from these, please drop a message in the comments section. RNNs are really powerful, and they are very close to how a human brain seems to work. I will be looking out for more developments in this area and am also personally working on this. Any improvements I’ll surely share here. So please follow me either here on <a href="https://medium.com/@debarko">Medium</a> or on <a href="https://twitter.com/debarko">Twitter</a> to stay updated.</p><h4><strong><em>If you liked this article, please hit the 👏 button to support it. This will help other Medium users find it. </em></strong><a href="http://twitter.com/intent/tweet?text=%40debarko%20just%20released%20an%20article%20on%20%23RNN.%20It%20talks%20about%20how%20you%20can%20build%20a%20%23RecurrentNeuralNetwork%20%F0%9F%9A%80and%20its%20workings.%20%23AI%20%23ML%20%23NeuralNetworks%20%23MachineLearning%20https%3A%2F%2Fgoo.gl%2FFPPwYN"><strong><em>Share this on Twitter</em></strong></a><strong><em> to help it reach as many readers as possible.</em></strong></h4><hr><p><a href="https://medium.com/hackernoon/rnn-or-recurrent-neural-network-for-noobs-a9afbb00e860">RNN or Recurrent Neural Network for Noobs</a> was originally published in <a href="https://medium.com/hackernoon">HackerNoon.com</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[What is a CapsNet or Capsule Network?]]></title>
            <link>https://medium.com/hackernoon/what-is-a-capsnet-or-capsule-network-2bfbe48769cc?source=rss-6a4320201780------2</link>
            <guid isPermaLink="false">https://medium.com/p/2bfbe48769cc</guid>
            <category><![CDATA[capsule]]></category>
            <category><![CDATA[capsnet]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Debarko De ]]></dc:creator>
            <pubDate>Wed, 01 Nov 2017 01:04:20 GMT</pubDate>
            <atom:updated>2017-11-04T08:20:12.617Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JcpMJty2A0x1ryY-ZQe2VA.png" /></figure><p><em>What is a Capsule Network? What is a Capsule? Is CapsNet better than a Convolutional Neural Network (CNN)? In this article I will address all of the above questions about CapsNet, the Capsule Network released by Hinton.</em></p><blockquote>Note: This article is not about pharmaceutical capsules. It is about Capsules in the Neural Network or Machine Learning world.</blockquote><p>There is an expectation of you as a reader: you need to be aware of CNNs. If not, I would like you to go through <a href="https://hackernoon.com/supervised-deep-learning-in-image-classification-for-noobs-part-1-9f831b6d430d">this article</a> on <a href="https://hackernoon.com/">Hackernoon</a>. Next I will run through a small recap of the relevant points of CNNs. That way you can easily follow the comparison done below. So without further ado let’s dive in.</p><p>A CNN is essentially a system where we stack a lot of neurons together. These networks have proven to be exceptionally good at handling image classification problems. It would be hard to have a neural network map out all the pixels of an image since it’s computationally really expensive. Convolution is a method which helps you simplify the computation to a great extent without losing the essence of the data. Convolution is basically a lot of element-wise matrix multiplications and summation of those results.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*qd_T1j_0dqGzsN8X-H89Pw.jpeg" /><figcaption>image 1.0: Convolutional Neural Network</figcaption></figure><p>After an image is fed to the network, a set of kernels or filters scan it and perform the convolution operation. This leads to the creation of feature maps inside the network. 
These features next pass through activation and pooling layers in succession, and this continues based on the number of layers in the network. Activation layers are required to induce a sense of <a href="https://stackoverflow.com/a/9783865/2235170">non linearity</a> in the network (eg: <a href="https://github.com/Kulbear/deep-learning-nano-foundation/wiki/ReLU-and-Softmax-Activation-Functions">ReLU</a>). Pooling (eg: max pooling) helps in reducing the training time. The idea of pooling is that it creates “summaries” of each sub-region. It also gives you a little bit of positional and translational invariance in object detection. At the end of the network the data passes through a classifier, such as a softmax classifier, which gives us a class. Training happens based on back propagation of error matched against some labelled data. Non linearity also helps with the vanishing gradient problem in this step.</p><h3><strong>What is the problem with CNNs?</strong></h3><p>CNNs perform exceptionally well when they are classifying images which are very close to the data set. If the images have rotation, tilt or any other different orientation then CNNs perform poorly. This problem has typically been worked around by adding different variations of the same image during training. In a CNN each layer understands an image at a much more granular level. Let’s understand this with an example. Say you are trying to classify ships and horses. The innermost layer or the 1st layer understands the small curves and edges. The 2nd layer might understand the straight lines or the smaller shapes, like the mast of a ship or the curvature of the entire tail. Higher up layers start understanding more complex shapes like the entire tail or the ship hull. Final layers try to see a more holistic picture like the entire ship or the entire horse. We use pooling after each layer to make it compute in reasonable time frames. 
But in the process it also loses positional data.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/812/1*0RGB8Eql5j27ujkt--yB_Q.png" /><figcaption>image 2.0: Disfiguration transformation</figcaption></figure><p>Pooling helps in creating positional invariance. Otherwise CNNs would fit only images or data which are very close to the training set. This invariance also leads to false positives for images which have the components of a ship but not in the correct order. So the system can match the right image with the left one in the above image. You as an observer clearly see the difference. The pooling layer also adds this sort of invariance.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/766/1*k7cUF8V3BdiD3k7e4sHgHw.png" /><figcaption>image 2.1 Proportional transformation</figcaption></figure><p>This was never the intention of the pooling layer. What pooling was supposed to do is introduce positional, orientational and proportional invariances. But the method we use to get this is very crude. In reality it adds all sorts of positional invariance, thus leading to the dilemma of detecting the right ship in image 2.0 as a correct ship. What we needed was not invariance but equivariance. Invariance makes a CNN tolerant to small changes in the viewpoint. <strong>Equivariance</strong> makes a CNN understand the rotation or proportion change and adapt itself accordingly so that the spatial positioning inside an image is not lost. A smaller ship will still be recognised as a ship, with the network adjusting for its size to detect it. This leads us to the recent advancement of Capsule Networks.</p><h3>What is a Capsule Network?</h3><p>Every few days there is an advancement in the field of Neural Networks. Some brilliant minds are working in this field. You can pretty much assume every paper on this topic is almost ground breaking or path changing. 
Sara Sabour, Nicholas Frosst and Geoffrey Hinton released a paper titled <strong><em>“</em></strong><a href="https://arxiv.org/abs/1710.09829"><strong><em>Dynamic Routing Between Capsules</em></strong></a><strong><em>”</em></strong> 4 days back. Now when one of the Godfathers of Deep Learning, “<a href="https://en.wikipedia.org/wiki/Geoffrey_Hinton">Geoffrey Hinton</a>”, releases a paper it is bound to be ground breaking. The entire Deep Learning community is going crazy over this paper as you read this article. So this paper talks about Capsules, CapsNet and a run on MNIST. MNIST is a database of tagged handwritten digit images. The results show a significant increase in performance in the case of overlapping digits. The paper compares against the current state-of-the-art CNNs. In this paper the authors propose that the human brain has modules called “capsules”. These capsules are particularly good at handling different types of visual stimulus and encoding things like pose (position, size, orientation), deformation, velocity, albedo, hue, texture etc. The brain must have a mechanism for “routing” low level visual information to what it believes is the best capsule for handling it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/961/1*P1y-bAF1Wv9-EtdQcsErhA.png" /><figcaption>image 3.0: CapsNet Architecture</figcaption></figure><p>A capsule is a nested set of neural layers. In a regular neural network you keep on adding more layers. In CapsNet you would add more layers inside a single layer. Or in other words, nest a neural layer inside another. The state of the neurons inside a capsule captures the above properties of one entity inside an image. A capsule outputs a vector to represent the existence of the entity. The orientation of the vector represents the properties of the entity. The vector is sent to all possible parents in the neural network. For each possible parent a capsule can find a prediction vector. 
The prediction vector is calculated by multiplying the capsule’s own output by a weight matrix. The parent whose output has the largest scalar product with the prediction vector strengthens its bond with the capsule. The rest of the parents weaken their bond. This <strong>routing by agreement </strong>method is superior to current mechanisms like max-pooling. Max pooling routes based on the strongest feature detected in the lower layer. Apart from dynamic routing, CapsNet talks about adding squashing to a capsule. Squashing is a non-linearity. So instead of adding squashing to each layer like you do in a CNN, you add the squashing to a nested set of layers. The squashing function gets applied to the vector output of each capsule.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/628/1*CCxgEjhlsXyui4PaUJQxpw.png" /><figcaption>image 3.1: Novel Squashing Function</figcaption></figure><p>The paper introduces a new squashing function. You can see it in image 3.1. ReLU or similar non-linearity functions work well with single neurons. But the paper found that this squashing function works best with capsules. It squashes the length of the output vector of a capsule. It shrinks the vector towards 0 if it is short and limits its length to just under 1 if it is long. The dynamic routing adds some extra computation cost. But it definitely gives an added advantage.</p><p>Now we need to realise that this paper is almost brand new and the concept of capsules is not thoroughly tested. It works on MNIST data but it still needs to be proven against much larger datasets across a variety of classes. There are already (within 4 days) updates on this paper that raise the following concerns:</p><blockquote>1. It uses the length of the pose vector to represent the probability that the entity represented by a capsule is present. 
To keep the length less than 1 requires an unprincipled non-linearity that prevents there from being any sensible objective function that is minimized by the iterative routing procedure.</blockquote><blockquote>2. It uses the cosine of the angle between two pose vectors to measure their agreement for routing. Unlike the log variance of a Gaussian cluster, the cosine is not good at distinguishing between quite good agreement and very good agreement.</blockquote><blockquote>3. It uses a vector of length n rather than a matrix with n elements to represent a pose, so its transformation matrices have n<sup>2</sup> parameters rather than just n.</blockquote><p>The current implementation of capsules has scope for improvement. But we should also keep in mind that the Hinton paper in the first place only says:</p><blockquote>The aim of this paper is not to explore this whole space but to simply show that one fairly straightforward implementation works well and that dynamic routing helps.</blockquote><p>So that’s a lot of theory. Let’s have some fun and build a CapsNet. I will take you through some code to set up a basic CapsNet for MNIST data. I have commented the code so you can follow it line by line and get an understanding of how it works. I will take you through two important pieces of the code. For the rest you can go to the repo, fork it and start working on it:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a6a6fd093feada41922956aa8673ebc2/href">https://medium.com/media/a6a6fd093feada41922956aa8673ebc2/href</a></iframe><p>The above is the entire Capsule layer. This is now stacked to create a Capsule Network. 
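</p><p>To make the squashing step concrete, here is a minimal sketch in plain Python of the function shown in image 3.1. It is an illustration written for this article and not the code from the repo; the names <em>squash</em> and <em>length</em> and the small <em>eps</em> guard against division by zero are assumptions of mine.</p>
<pre>
```python
import math

def squash(s, eps=1e-9):
    """Rescale vector s so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    # short vectors get a scale near 0, long vectors a scale near 1
    scale = norm_sq / (1.0 + norm_sq)
    # eps is an illustrative guard so a zero vector does not divide by zero
    return [scale * x / (norm + eps) for x in s]

def length(v):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))

print(length(squash([0.01, 0.0])))   # a short vector is squashed towards 0
print(length(squash([30.0, 40.0])))  # a long vector ends up just under 1
```
</pre>
<p>Because the output length always stays between 0 and 1, it can be read as the probability that the entity is present.</p><p>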
Code for CapsNet is below:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f16430ad59165a58c4e2cb62cb4c2b55/href">https://medium.com/media/f16430ad59165a58c4e2cb62cb4c2b55/href</a></iframe><p>The entire code along with the training and evaluation module is present <a href="https://github.com/debarko/CapsNet-Tensorflow">here</a>. It’s under the <a href="https://github.com/debarko/CapsNet-Tensorflow/blob/master/LICENSE">Apache 2.0 License</a>. You can use it freely. I want to give credit for the code to <a href="https://github.com/naturomics">naturomics</a>.</p><h3>Summary</h3><p>So we went through what a CapsNet is and how it is built. We tried to understand that capsules are, at a high level, nothing but nested neural layers. We also looked at how a CapsNet delivers rotational and other invariances. It does that by being equivariant to the spatial setup of each entity inside an image. I am sure there are still questions to be answered. The best implementation of capsules is probably the biggest one. But this post is an initial push in trying to throw some light on the topic. If you have any queries please do share them. I will answer them to the best of my knowledge.</p><blockquote><a href="https://twitter.com/sirajraval">Siraj Raval</a> and his talks greatly influenced this article. Share this article on <a href="http://bit.ly/2z1z6RU">Twitter</a>. Do follow me on <a href="http://twitter.com/debarko">twitter</a> for future updates. If you liked this article, please hit the 👏 button to support it. This will help other Medium users find it. 
</blockquote><hr><p><a href="https://medium.com/hackernoon/what-is-a-capsnet-or-capsule-network-2bfbe48769cc">What is a CapsNet or Capsule Network?</a> was originally published in <a href="https://medium.com/hackernoon">HackerNoon.com</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Deep Learning for Noobs [Part 2]]]></title>
            <link>https://medium.com/hackernoon/deep-learning-for-noobs-part-2-43d5098e61f6?source=rss-6a4320201780------2</link>
            <guid isPermaLink="false">https://medium.com/p/43d5098e61f6</guid>
            <category><![CDATA[tensorflow]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[tutorial]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <dc:creator><![CDATA[Debarko De ]]></dc:creator>
            <pubDate>Tue, 14 Feb 2017 22:15:59 GMT</pubDate>
            <atom:updated>2017-07-16T19:42:22.063Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rL1yoYgk66hhHJmdwTVAog.jpeg" /></figure><p>How can you set up your own Convolutional Neural Network? Let’s try to solve that in this article. We will be working on the image classification problem which I discussed in the <a href="https://hackernoon.com/supervised-deep-learning-in-image-classification-for-noobs-part-1-9f831b6d430d#.9mmzimdgf">first part of this series</a>.</p><p>There are a lot of libraries available for creating a Convolutional Neural Network. We will be choosing <a href="https://keras.io/">Keras</a> and <a href="https://www.tensorflow.org/">Tensorflow</a>. The first question that comes to mind is:</p><blockquote>Why these two specifically? Why not just Tensorflow?</blockquote><p>In the Machine Learning library space there are a lot of libraries. Tensorflow, Theano, PyTorch, Caffe and Torch are a few of the notable ones. A big shoutout to <a href="http://pytorch.org/">PyTorch</a> by <a href="https://medium.com/u/45e1eae66802">Soumith Chintala</a> and team. You guys created an awesome library. Hopefully you guys will take over the world (**evil grin**).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*kok8tc3A2s8UP2QTrZ7RlA.png" /><figcaption>PyTorch planning to take over the world :P <a href="https://medium.com/u/ac9d9a35533e">Andrej Karpathy</a> has high hopes for Tensorflow</figcaption></figure><p>All of these are low level libraries. They deal with GPU or CPU acceleration and optimisation of matrix computations. So building networks using them directly can become challenging. Keras is a high level library. It helps you create neuron layers. It abstracts away all the complexities of implementing the calculations. Keras works with either Theano or Tensorflow as a backend. 
I chose Tensorflow as the backend since it has better community support.</p><blockquote>KEras &amp; TEnsorflow (KETE) combo rocks.</blockquote><h3>Installation</h3><p>Let’s get our hands dirty. Don’t plan on doing this on your regular system; it will die while training on the datasets. So let’s get an AWS server. If you have an insane gaming rig then feel free to go ahead and set it up locally. We will be using a g2.2xlarge instance from AWS. It has a single NVIDIA GPU, is rated at 26 ECUs and costs USD 0.65 / hr. Why did we choose this? Because it is the cheapest GPU system available in the cloud at the time of writing. It will perform better than most of the hardware we have at home. Next up is which OS to use. It would definitely make sense to use Ubuntu 16.04 LTS, but wait a minute. We will be using a pre-baked AMI which has a lot of tools built in. This way we can do away with most of the setup. Search for the Deep Learning AMI from AWS. There are other good Deep Learning AMIs as well, so feel free to explore. At a minimum, the AMI needs to have Python 2.7 and Tensorflow installed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mc1KOPU7j5eUowBdvg2iPw.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vVfo8m7-cKD8sUbuz65-_Q.png" /><figcaption>GPU instance and Deep Learning AMI on AWS</figcaption></figure><p>After you have selected the instance type and AMI, go ahead and create a key. You can use an existing key if you already have one. For this article, we will be creating one. Assume the name of the key file is deepkey.pem. Download the key and keep it somewhere safe. Launch the instance. It will take about 5-10 minutes to create the instance. In the meantime, change the permission of the key to 400. Otherwise ssh will not let you log in.</p><pre>chmod 400 ~/deepkey.pem</pre><p>Next go to the list view of EC2 instances. From there select the instance that got created. Copy the AWS instance Public DNS. 
It will look something like this: <strong>ec2-52-24-183-62.us-west-2.compute.amazonaws.com</strong></p><pre><strong># Next let&#39;s log in to the system<br></strong>ssh ec2-user@ec2-52-24-183-62.us-west-2.compute.amazonaws.com -i ~/deepkey.pem</pre><pre><strong># The AMI might be a bit out of date, so it&#39;s always better to update</strong><br>sudo yum update</pre><pre><strong># Install pip to get Keras</strong><br>sudo yum install python-pip</pre><pre><strong># Upgrade the pip that got installed<br></strong>sudo /usr/local/bin/pip install --upgrade pip</pre><pre><strong># Install Keras</strong><br>sudo /usr/local/bin/pip install keras</pre><p>By default Keras gets installed with Theano as the base config. We are going to use Tensorflow. So let’s change that. Open <strong>~/.keras/keras.json </strong>and update it so that the file looks like the section below.</p><pre><strong>{<br>&quot;image_dim_ordering&quot;: &quot;tf&quot;,<br>&quot;epsilon&quot;: 1e-07,<br>&quot;floatx&quot;: &quot;float32&quot;,<br>&quot;backend&quot;: &quot;tensorflow&quot;<br>}</strong></pre><p>I hope you have followed all the steps without errors. Let’s test our installation. Open python and then import keras to test it out. The output should look something like below.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SHNwlXz8I-Up6W9pDwXz4w.png" /><figcaption>Test Keras installation</figcaption></figure><p>So now you have Python, Tensorflow and Keras installed. The AMI also comes with Theano and other stuff pre-installed but we are not going to use them. Don’t bother uninstalling them since they won’t interfere. Enough of installation, let’s dig into code.</p><blockquote>Don’t waste time installing, spend time on learning and implementing.</blockquote><p>We are going to train a network which we can use to classify the dogs and cats from Kaggle. Before that we will start by writing a simple model. This will help you get an understanding of how Keras works. I’ll start with the code. 
As you will notice, there are comments before each step in the code. These comments explain to some extent what is happening in that particular part of the code. To run this code, you can either use your own set of cats and dogs or you can download the sample data from Kaggle. You will have to sign up and join the Kaggle competition to be able to download the sample data. Here is the <a href="https://www.kaggle.com/c/dogs-vs-cats/data">Kaggle link</a>.</p><pre>from keras.preprocessing.image import ImageDataGenerator<br>from keras.models import Sequential<br>from keras.layers import Convolution2D, MaxPooling2D<br>from keras.layers import Activation, Dropout, Flatten, Dense<br><br><br># expected image size<br>img_width, img_height = 150, 150</pre><pre># folder containing the images on which<br># the network will train. The train folder <br># has two sub folders, dogs and cats.<br>train_data_dir = &#39;data/train&#39;</pre><pre># folder containing the validation samples<br># folder structure is same as the training folder<br>validation_data_dir = &#39;data/validation&#39;</pre><pre># how many images to be considered for training<br>train_samples = 2000</pre><pre># how many images to be used for validation<br>validation_samples = 800</pre><pre># how many passes (epochs) the network makes<br># over the training set; validation runs<br># at the end of each pass<br>epoch = 50</pre><pre># ** Model Begins **<br>model = Sequential()<br># channels go last because keras.json sets image_dim_ordering to &quot;tf&quot;<br>model.add(Convolution2D(32, 3, 3, input_shape=(img_width, img_height, 3)))<br>model.add(Activation(&#39;relu&#39;))<br>model.add(MaxPooling2D(pool_size=(2, 2)))<br><br>model.add(Convolution2D(32, 3, 3))<br>model.add(Activation(&#39;relu&#39;))<br>model.add(MaxPooling2D(pool_size=(2, 2)))<br><br>model.add(Convolution2D(64, 3, 3))<br>model.add(Activation(&#39;relu&#39;))<br>model.add(MaxPooling2D(pool_size=(2, 
2)))<br><br>model.add(Flatten())<br>model.add(Dense(64))<br>model.add(Activation(&#39;relu&#39;))<br>model.add(Dropout(0.5))<br>model.add(Dense(1))<br>model.add(Activation(&#39;sigmoid&#39;))<br># ** Model Ends **</pre><pre>model.compile(loss=&#39;binary_crossentropy&#39;,<br>              optimizer=&#39;rmsprop&#39;,<br>              metrics=[&#39;accuracy&#39;])<br><br># this is the augmentation configuration we will use for training<br># we are generating a lot of transformed images so that the model<br># can handle variety in the real world scenario<br>train_datagen = ImageDataGenerator(<br>        rescale=1./255,<br>        shear_range=0.2,<br>        zoom_range=0.2,<br>        horizontal_flip=True)<br><br># this is the augmentation configuration we will use for testing:<br># only rescaling<br>test_datagen = ImageDataGenerator(rescale=1./255)</pre><pre># this section is actually taking images from the folder<br># and passing on to the ImageGenerator which then<br># creates a lot of transformed versions<br>train_generator = train_datagen.flow_from_directory(<br>        train_data_dir,<br>        target_size=(img_width, img_height),<br>        batch_size=32,<br>        class_mode=&#39;binary&#39;)<br><br>validation_generator = test_datagen.flow_from_directory(<br>        validation_data_dir,<br>        target_size=(img_width, img_height),<br>        batch_size=32,<br>        class_mode=&#39;binary&#39;)</pre><pre># this is where the actual processing happens<br># it will take some time to run this step.<br>model.fit_generator(<br>        train_generator,<br>        samples_per_epoch=train_samples,<br>        nb_epoch=epoch,<br>        validation_data=validation_generator,<br>        nb_val_samples=validation_samples)<br><br>model.save_weights(&#39;trial.h5&#39;)</pre><p>The code is pretty self explanatory. Replace the section between “Model Begins” and “Model Ends” to use other models. You will have your very own classifier code. 
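</p><p>To see what the stacked layers do to the image size, here is a rough sketch in plain Python that follows a 150 x 150 input through the three convolution and pooling blocks. It assumes the layer defaults, namely 3 x 3 convolutions with stride 1 and no padding and 2 x 2 pooling with stride 2. The helper names are made up for illustration and are not part of Keras.</p>
<pre>
```python
def conv_out(size, kernel=3):
    # convolution without padding, stride 1: the map loses kernel - 1 pixels
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping pooling: the map shrinks by the pool factor, rounded down
    return size // pool

size = 150
filters = [32, 32, 64]  # filters per convolution block in the model above
for n in filters:
    size = pool_out(conv_out(size))
    print("after block with", n, "filters:", size, "x", size)

# number of values that Flatten passes to the first Dense layer
flat = size * size * filters[-1]
print("inputs to the first Dense layer:", flat)
```
</pre>
<p>Under these assumptions the feature maps shrink from 150 to 74, then 36 and finally 17 pixels per side, so Flatten hands 17 x 17 x 64 = 18496 values to the dense layers.</p><p>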
I will walk you through the classifier code. First you import a few Keras dependencies. Then you define the dimensions of the images that will be passed to the network. After that you tell the code where the image sets are, both the training dataset and the validation dataset. Then you build the model, from the “Model Begins” marker to the “Model Ends” marker. I am not going into the depth of the model as it is a small VGG-style network. Details about the VGG architecture can be found in the following arXiv paper:</p><pre>Very Deep Convolutional Networks for Large-Scale Image Recognition<br>K. Simonyan, A. Zisserman<br>arXiv:1409.1556</pre><p>Next up in the code is generating a few transforms of the data. Here you shear, zoom and flip the images so that the network doesn’t overfit. You create generators so that the code can read images from the specified folders. After that the processing starts. The system trains and validates for the number of epochs specified. Finally we save the weights so that we can use them in future without having to train the network all over again. If you have further doubts, please highlight and ask questions. I’ll try to answer them to the best of my knowledge.</p><p>The above model is a simple one and is there only for the sake of simpler explanation. Cat and dog classification might not be that successful with the amount of data we have. So we have to go for transfer learning. In transfer learning we start from a model that was already trained on a similar problem. We take its trained weights and reuse them to solve a different problem altogether. Here we reuse a model that was pre-trained on images to classify different things. Why does this work? Because the model that we are going to use was itself trained to do image classification. Its early layers work at the level of detecting edges and curves, so they remain useful for classifying images in general. Hence the term transfer learning. 
You transfer what was learned on one problem statement to another. This might work well for us. But we can make it work better. Next we train the top layers. These layers deal with the actual elements being classified. We train them on our training dataset. We can call this dataset domain specific. This tells the network exactly what we want to classify. So the code goes as follows:</p><pre>import os<br>import h5py<br>import numpy as np<br>from keras.preprocessing.image import ImageDataGenerator<br>from keras import optimizers<br>from keras.models import Sequential<br>from keras.layers import Convolution2D, MaxPooling2D, ZeroPadding2D<br>from keras.layers import Activation, Dropout, Flatten, Dense<br><br># path to the model weights files.<br>weights_path = &#39;vgg16_weights.h5&#39;<br>top_model_weights_path = &#39;fc_model.h5&#39;<br># dimensions of our images.<br>img_width, img_height = 150, 150<br><br>train_data_dir = &#39;data/train&#39;<br>validation_data_dir = &#39;data/validation&#39;<br>nb_train_samples = 2000<br>nb_validation_samples = 800<br>nb_epoch = 50<br><br># build the VGG16 network<br>model = Sequential()<br>model.add(ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height)))<br><br>model.add(Convolution2D(64, 3, 3, activation=&#39;relu&#39;, name=&#39;conv1_1&#39;))<br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(64, 3, 3, activation=&#39;relu&#39;, name=&#39;conv1_2&#39;))<br>model.add(MaxPooling2D((2, 2), strides=(2, 2)))<br><br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(128, 3, 3, activation=&#39;relu&#39;, name=&#39;conv2_1&#39;))<br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(128, 3, 3, activation=&#39;relu&#39;, name=&#39;conv2_2&#39;))<br>model.add(MaxPooling2D((2, 2), strides=(2, 2)))<br><br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(256, 3, 3, activation=&#39;relu&#39;, name=&#39;conv3_1&#39;))<br>model.add(ZeroPadding2D((1, 
1)))<br>model.add(Convolution2D(256, 3, 3, activation=&#39;relu&#39;, name=&#39;conv3_2&#39;))<br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(256, 3, 3, activation=&#39;relu&#39;, name=&#39;conv3_3&#39;))<br>model.add(MaxPooling2D((2, 2), strides=(2, 2)))<br><br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(512, 3, 3, activation=&#39;relu&#39;, name=&#39;conv4_1&#39;))<br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(512, 3, 3, activation=&#39;relu&#39;, name=&#39;conv4_2&#39;))<br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(512, 3, 3, activation=&#39;relu&#39;, name=&#39;conv4_3&#39;))<br>model.add(MaxPooling2D((2, 2), strides=(2, 2)))<br><br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(512, 3, 3, activation=&#39;relu&#39;, name=&#39;conv5_1&#39;))<br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(512, 3, 3, activation=&#39;relu&#39;, name=&#39;conv5_2&#39;))<br>model.add(ZeroPadding2D((1, 1)))<br>model.add(Convolution2D(512, 3, 3, activation=&#39;relu&#39;, name=&#39;conv5_3&#39;))<br>model.add(MaxPooling2D((2, 2), strides=(2, 2)))<br><br># load the weights of the VGG16 networks<br># (trained on ImageNet, won the ILSVRC competition in 2014)<br># note: when there is a complete match between your model definition<br># and your weight savefile, you can simply call model.load_weights(filename)<br>assert os.path.exists(weights_path), &#39;Model weights not found (see &quot;weights_path&quot; variable in script).&#39;<br>f = h5py.File(weights_path)<br>for k in range(f.attrs[&#39;nb_layers&#39;]):<br>    if k &gt;= len(model.layers):<br>        # we don&#39;t look at the last (fully-connected) layers in the savefile<br>        break<br>    g = f[&#39;layer_{}&#39;.format(k)]<br>    weights = [g[&#39;param_{}&#39;.format(p)] for p in range(g.attrs[&#39;nb_params&#39;])]<br>    model.layers[k].set_weights(weights)<br>f.close()<br>print(&#39;Model loaded.&#39;)<br><br># build a 
classifier model to put on top of the convolutional model<br>top_model = Sequential()<br>top_model.add(Flatten(input_shape=model.output_shape[1:]))<br>top_model.add(Dense(256, activation=&#39;relu&#39;))<br>top_model.add(Dropout(0.5))<br>top_model.add(Dense(1, activation=&#39;sigmoid&#39;))<br><br># note that it is necessary to start with a fully-trained<br># classifier, including the top classifier,<br># in order to successfully do fine-tuning<br>top_model.load_weights(top_model_weights_path)<br><br># add the model on top of the convolutional base<br>model.add(top_model)<br><br># set the first 25 layers (up to the last conv block)<br># to non-trainable (weights will not be updated)<br>for layer in model.layers[:25]:<br>    layer.trainable = False<br><br># compile the model with a SGD/momentum optimizer<br># and a very slow learning rate.<br>model.compile(loss=&#39;binary_crossentropy&#39;,<br>              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),<br>              metrics=[&#39;accuracy&#39;])<br><br># prepare data augmentation configuration<br>train_datagen = ImageDataGenerator(<br>        rescale=1./255,<br>        shear_range=0.2,<br>        zoom_range=0.2,<br>        horizontal_flip=True)<br><br>test_datagen = ImageDataGenerator(rescale=1./255)<br><br>train_generator = train_datagen.flow_from_directory(<br>        train_data_dir,<br>        target_size=(img_height, img_width),<br>        batch_size=32,<br>        class_mode=&#39;binary&#39;)<br><br>validation_generator = test_datagen.flow_from_directory(<br>        validation_data_dir,<br>        target_size=(img_height, img_width),<br>        batch_size=32,<br>        class_mode=&#39;binary&#39;)<br><br># fine-tune the model<br>model.fit_generator(<br>        train_generator,<br>        samples_per_epoch=nb_train_samples,<br>        nb_epoch=nb_epoch,<br>        validation_data=validation_generator,<br>        nb_val_samples=nb_validation_samples)</pre><p>The weights for VGG16 can be acquired from my 
<a href="https://gist.github.com/debarko/6b1983ec3dd0403321082d07ddfea17c#file-readme-md">GitHub</a> gist. You can also get the fc_model weight file by running this <a href="https://gist.github.com/fchollet/f35fbc80e066a49d65f1688a7e99f069">piece of code</a> on your dataset. You can use the same set of weights from the VGG16 link shared. You can tweak the number of <a href="http://stackoverflow.com/a/31157729">epochs</a> to get better learning, but don’t go overboard as that might lead to <a href="http://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/">overfitting</a>. I have been using this technique on a lot of practical use cases at my <a href="http://www.practo.com/">workplace</a>. One use case is distinguishing between prescriptions and non-prescriptions. We use the exact same ImageNet-trained model that we used here for cats and dogs to classify prescriptions. I hope you can use it on practical cases in the real world. Do respond with any interesting case that you have solved using this method.</p><p>This article borrows heavily from a <a href="https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html">blog post</a> from Keras. Do follow me on <a href="http://twitter.com/debarko">twitter</a> and you can also sign up for a small and infrequent <a href="http://debarko.de">mailing list</a> that I maintain. If you liked this article, please hit the ❤ button to recommend it. 
This will help other Medium users find it.</p><hr><p><a href="https://medium.com/hackernoon/deep-learning-for-noobs-part-2-43d5098e61f6">Deep Learning for Noobs [Part 2]</a> was originally published in <a href="https://medium.com/hackernoon">HackerNoon.com</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Engineering hyperlocal deliveries in India]]></title>
            <link>https://medium.com/practo-engineering/engineering-hyperlocal-deliveries-in-india-8c45f717f8e6?source=rss-6a4320201780------2</link>
            <guid isPermaLink="false">https://medium.com/p/8c45f717f8e6</guid>
            <category><![CDATA[hyperlocal]]></category>
            <category><![CDATA[practo]]></category>
            <category><![CDATA[ecommerce]]></category>
            <category><![CDATA[polygon]]></category>
            <category><![CDATA[tech]]></category>
            <dc:creator><![CDATA[Debarko De ]]></dc:creator>
            <pubDate>Mon, 06 Feb 2017 20:24:16 GMT</pubDate>
            <atom:updated>2017-07-21T20:13:36.428Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SM8Z0SuFE-J_gkzznLFrCg.png" /></figure><h4>Practo has started home delivery of medicines and other health-care essentials. This is Practo’s first step into the e-commerce and hyperlocal domain. We’ve launched this service in Bangalore with a key focus on on-time delivery.</h4><p>On-time delivery is possible only if we maintain discipline from the time the order is placed till the time of delivery. We measure and optimize every step to the fullest. Our pharmacists get involved in every step of the transaction, from verification of the prescription to delivery of the drugs.</p><h3>How does it work?</h3><p>Our process begins the moment someone places the order and ends on delivery. Our system understands all the complexities of the real world. It mirrors the state of the order in its lifecycle and also tracks exactly who is handling the order at any moment. We need to understand the different parties involved in the lifecycle of an order. These entities are shops, suppliers, distributors, Family Pharmacists (FP) and Zonal Pharmacists (ZP). One more important aspect is the location tracking of any order. We implemented polygons for this reason.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HtA7xrqvE-or2378UFOmJQ.png" /><figcaption>Polygon Creation.</figcaption></figure><h3>Polygons</h3><p>Let’s take the example of Bengaluru, where we started our operations. We divide the entire city into a multitude of polygons, which are either hand drawn or loaded via KML data. So in our system, assume the whole of JP Nagar is one polygon. The image above shows how a polygon can be drawn from our interface. Each polygon needs at least 1 supplier, 1 ZP and 1 FP to be functional. Once we finish resource allocation we can switch on a polygon as serviceable. Note here that our polygons are not active as soon as we add them to the map. 
This basic check prevents our system from taking in orders where we can’t service them. Currently, we do a serviceability check from the client end before placing the order. In the serviceability check, we run a point-in-polygon algorithm on all the polygons in our system.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*78E0anhmQ7-A6lcUhh3tzg.png" /><figcaption>Currently available slots in JP Nagar 7th Phase, Bengaluru, India</figcaption></figure><h3>Slots</h3><p>Each resource (ZP, FP, etc.) has a capacity slot definition. We have divided each day into 24 slots of 1 hour each. Each resource has a working slot schedule on a weekly basis. So we can set working hours for each resource in our system for each specific day of the week. We also handle overriding slots in certain cases when we want to keep our services down (e.g. Sunday). Our system lets us declare holidays for either a full day or any specific part of a day. It also allows us to declare holidays for specific polygons or a set of polygons. The image above shows the currently available slots in an area. The struck-out ones have already been filled up with orders. We also group slots so that it becomes easy for the user to understand.</p><h3>Load</h3><p>The system assigns each resource a max capacity for a single slot. If any resource reaches its limit for a slot, that slot becomes unserviceable.</p><h3>Clients</h3><p>Front end clients can figure out whether we service a location or not. If we do, we also store a set of slots in which we can deliver the supplies. Based on the slot the user selects, we deliver the supplies. In most cases a client will follow the steps below:</p><ol><li>Fetch serviceability based on latitude and longitude</li><li>The client sends a lat and long value to the server. We run a point-in-polygon search algorithm using MySQL. It lets us know exactly which polygon this client falls into. 
The server responds with a serviceability flag, either true or false, which signifies whether we service that location or not. Along with the response we also send back a polygon id for future use.</li><li>Add drugs and documents</li><li>Add flat number and landmark</li><li>Fetch available deliverable slots</li><li>The client makes a request to the slots API, which is exposed to authenticated users only. With the request the client sends a UUID, a slot id and the polygon id which we sent back during the serviceability check. On our server, we temporarily lock a slot for that UUID for X minutes. Locking means temporarily holding one FP assigned to that polygon id for those X minutes. Let’s assume each FP has a maximum of 3 deliveries per slot; once one UUID locks that FP, the FP is left with only 2 deliveries in that slot. We implement this locking mechanism by making an entry in Memcache with a TTL of X minutes. The value of X is configurable down to the level of an individual polygon. After X minutes the lock is lifted and that slot is again available for someone else. If an order is placed for that slot during these X minutes, we persist the slot entry to our primary MySQL database. While placing the order the client needs to send the same UUID, based on which we identify whether a slot is already locked or not.</li><li>Finally, post the entire order along with the same UUID</li><li>This entire system helps us keep track of the load in a particular polygon and also the performance of our FPs (delivery agents). We also maintain a legacy system which auto-schedules an order to the first available slot in a particular polygon when orders come in without a UUID. 
This part of the system is maintained for older clients which we released in the very early stages.</li></ol><p>To close, I would like to say that this system is nowhere close to a final state, and we have already started work on the next version. We will phase this one out by the end of this month. If you have any questions, drop a DM on Twitter at <a href="http://twitter.com/debarko">@debarko</a>.</p><p>Follow us on <a href="https://twitter.com/practodev">Twitter</a> for regular updates. If you liked this article, please hit the ❤ button to recommend it. This will help other Medium users find it.</p><hr><p><a href="https://medium.com/practo-engineering/engineering-hyperlocal-deliveries-in-india-8c45f717f8e6">Engineering hyperlocal deliveries in India</a> was originally published in <a href="https://medium.com/practo-engineering">Practo Engineering</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Deep Learning for Noobs [Part 1]]]></title>
            <link>https://medium.com/hackernoon/supervised-deep-learning-in-image-classification-for-noobs-part-1-9f831b6d430d?source=rss-6a4320201780------2</link>
            <guid isPermaLink="false">https://medium.com/p/9f831b6d430d</guid>
            <category><![CDATA[tutorial]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[debarko-de]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[deep-learning]]></category>
            <dc:creator><![CDATA[Debarko De ]]></dc:creator>
            <pubDate>Sun, 05 Feb 2017 11:26:37 GMT</pubDate>
            <atom:updated>2017-07-15T22:11:10.464Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nEK-e-AlS5n3q9T1Y29VRA.jpeg" /></figure><p>Deep Learning has been on the rise for some time, and people have recently started using it in many fields. If you want to dive straight into the code, go to the <a href="https://medium.com/@debarko/deep-learning-for-noobs-part-2-43d5098e61f6#.csd7uhg52">2nd Part</a>.</p><p>In this series, you will learn to solve a simple problem: detecting a single object (like a cat or a dog) in an image. In the course of this solution, you will learn about one type of Deep Learning. You will also be able to code in Keras and Tensorflow, two of the popular libraries in this technology. I am not going to talk about the maths behind Deep Learning. The series has two parts. The first part covers the basics of Deep Learning and the gotchas. In the second part, we will look at how to create your own models in Keras.</p><p>Before we begin, I’ll introduce myself. I am a Computer Science engineer, currently working @ Practo. Earlier I worked on games on the Facebook platform <em>(when it used to be a thing)</em> and later on mobile games.</p><blockquote>So what is Deep Learning? Why is it called Deep? Is the system actually learning?</blockquote><p>Let’s start with a bit of history. Deep Learning is the latest cool word for Neural Networks, which have been around since the 1960s. If you don’t know what a Neural Network is, don’t worry, I’ll explain it later in this article. Around 2006 a brilliant guy called <em>Geoffrey Hinton</em>, along with others, came up with a paper. That paper had an interesting implementation of one type of Neural Network. In 2012 two of Hinton’s students won a competition (ILSVRC) by a wide margin over their nearest competitors. 
This showed the entire world that Hinton’s work can solve very interesting problems.</p><p>We are trying to solve <strong>Image Classification</strong> as a problem. By <em>classification</em> we mean taking an image and trying to understand what it contains. The current scope limits the solution to images which have only one type of object: the image will be either a cat or a dog. For simplicity’s sake, we are not going to classify images which have, say, a dog sitting in a car.</p><p>In a Neural Network, there are n neurons, arranged in layers that connect to one another in sequence. An input image enters at the input end and the network outputs the class it decides on. Training a network means passing a lot of images of various classes as inputs. Each of these images is already tagged with one of the classes.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*N4h1SgwbWNmtrRhszM9EJg.png" /><figcaption>Basic figure depicting cross section of a Convolutional Neural Network</figcaption></figure><p>At its simplest, a Neural Network is a mathematical formula which looks something like this:</p><blockquote>x * w = y</blockquote><p>Assume <em>x</em> is your input image and <em>y</em> is the output class that the image is tagged with. <em>x</em> is constant because there is only a fixed set of images, and the network should give <em>y</em> as the output. We can only change <em>w</em>. We call <em>w</em> the weight of a single neuron layer. The process of training consists of two parts, forward pass and backpropagation. In the forward pass we give images to the network as input (<em>x</em>) and the network generates some output class <em>y’</em>. How far <em>y’</em> is from <em>y</em> is the error of the network. In backpropagation, the network tries to diminish the error by tweaking the weight <em>w</em>. In the lingo, <em>w</em> also goes by the names parameter, kernel and filter. 
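</p><p>The forward pass and backpropagation loop described above can be sketched in a few lines of plain Python. This is only a toy illustration: the data, the learning rate and the function name <code>train_single_weight</code> are assumptions made for the example, and a real network learns many weights through a library such as Keras.</p>
<pre>
```python
def train_single_weight(xs, ys, learning_rate=0.01, epochs=200):
    """Fit y = x * w with gradient descent on the mean squared error."""
    w = 0.0  # initial guess for the weight
    n = len(xs)
    for _ in range(epochs):
        # Forward pass: compute the predictions y' = x * w.
        predictions = [x * w for x in xs]
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (p - y) * x for p, y, x in zip(predictions, ys, xs)) / n
        # Backward step: move w against the gradient to shrink the error.
        w -= learning_rate * gradient
    return w


# Toy data generated with a true weight of 3 (an assumption for this example).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w = train_single_weight(xs, ys)
print(round(w, 3))
```
</pre>
<p>Each iteration predicts <em>y’</em>, measures how far it is from <em>y</em>, and nudges <em>w</em> in the direction that reduces that error.</p><p>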
The problem with plain neural networks is that every layer passes the entire data on to the next layer. To solve this we are going to use Convolutional Neural Networks. So what is convolution? Let’s see that below.</p><h3>Convolutional Layer</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/526/1*ZCjPUFrB6eHPRi4eyP6aaA.gif" /><figcaption>Convolution Layer</figcaption></figure><p>Plain Neural Networks are fully connected, which means that one neuron layer passes the entire data to the next layer. The next layer processes the entire data, and so on. This works for small images like 8x8 or even 36x36. But when practical images are 1024x768 in size, it becomes a huge computational task. Images are generally stationary in nature. That means the statistics of one part of the image are the same as those of any other part. So a feature learnt in one zone can do similar pattern matching in another zone. In a big image, we take a small section and pass it over all the points in the big image. At each point we convolve, or combine, the two into a single value. Try to imagine that a big box of data becomes a small box of data for the next layer of neurons. This makes computation faster while keeping the information that matters. Each small section that passes over the big image acts as a small filter. The filters are later adjusted based on the back propagation data (we will come to that in a bit).</p><h3>Pooling Layer</h3><p>Next up is pooling. Pooling is nothing other than down sampling of an image. It again helps the processor to process things faster. There are many pooling techniques. One is max pooling, where we take the largest of the pixel values of a segment. Mean (average) pooling is also used, where we calculate the mean of the pixel values instead of taking the largest. Pooling makes the network more tolerant of small shifts in position, shape and scale. 
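</p><p>To make max pooling concrete, here is a minimal sketch in plain Python that keeps the largest value in each 2x2 block of a small grid. The function name <code>max_pool_2x2</code>, the 2x2 window and the sample pixel values are assumptions chosen for illustration.</p>
<pre>
```python
def max_pool_2x2(image):
    """Down-sample a 2D list by keeping the largest value in each 2x2 block."""
    pooled = []
    for row in range(0, len(image) - 1, 2):
        pooled_row = []
        for col in range(0, len(image[row]) - 1, 2):
            # Collect the four pixel values of the current 2x2 block.
            block = [
                image[row][col], image[row][col + 1],
                image[row + 1][col], image[row + 1][col + 1],
            ]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# A made-up 4x4 "image" of pixel values.
image = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool_2x2(image)
print(pooled)
```
</pre>
<p>The 4x4 grid shrinks to 2x2, so the next layer has a quarter of the values to process.</p><p>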
Max pooling is the most commonly used.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/314/1*TUiAh2gWmzdumZKFYa4Vbw.png" /><figcaption>A simple example of Max Pooling, where we are taking the largest pixel value in each coloured square.</figcaption></figure><h3>Activation Layer</h3><p>A single neuron behaves as a linear classifier. A neuron has the capacity to switch on or off based on certain consecutive sections of input data. We call this property of a neuron activation. Activation functions are mathematical functions which behave very much like valves. Think of the valve on a pressure cooker, which opens when there is a good amount of pressure. Data which makes an activation function turn true marks the neuron as active. We classify an image based on which neurons in the network got activated. There are many activation functions, but ReLU is the most popular of them. Why you would choose ReLU is out of the scope of this document. I will soon write another article which talks about different activation functions.</p><h3>Backpropagation</h3><p>Backpropagation is the process in which we try to bring the error down. By error, I mean the difference between <em>y</em> and <em>y’</em>. This helps <em>w</em> fit the data set that we gave to the network. We perform backpropagation using the gradient descent process, which tries to bring the error value close to zero.</p><h3>What’s Next?</h3><p>The material above is pretty much enough to start working on applied CNNs. As and when you get stuck in the implementation phase, you can read more about that particular topic. Leave questions in the comments section and I will address them. This brings us to the end of this part of the series. The second part of this article is finished and you can find the link below.</p><p>You can find the second part of the series at this <a href="https://medium.com/@debarko/deep-learning-for-noobs-part-2-43d5098e61f6#.csd7uhg52">link</a>. 
Do follow me on <a href="http://twitter.com/debarko">Twitter</a>; you can also sign up for a small and infrequent <a href="http://debarko.de">mailing list</a> that I maintain. If you liked this article, please hit the ❤ button to recommend it. This will help other Medium users find it.</p><figure><a href="http://bit.ly/HackernoonFB"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*0hqOaABQ7XGPT-OYNgiUBg.png" /></a></figure><figure><a href="https://goo.gl/k7XYbx"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Vgw1jkA6hgnvwzTsfMlnpg.png" /></a></figure><figure><a href="https://goo.gl/4ofytp"><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gKBpq1ruUi0FVK2UM_I4tQ.png" /></a></figure><blockquote><a href="http://bit.ly/Hackernoon">Hacker Noon</a> is how hackers start their afternoons. We’re a part of the <a href="http://bit.ly/atAMIatAMI">@AMI</a> family. We are now <a href="http://bit.ly/hackernoonsubmission">accepting submissions</a> and happy to <a href="mailto:partners@amipublications.com">discuss advertising &amp; sponsorship</a> opportunities.</blockquote><blockquote>If you enjoyed this story, we recommend reading our <a href="http://bit.ly/hackernoonlatestt">latest tech stories</a> and <a href="https://hackernoon.com/trending">trending tech stories</a>. 
Until next time, don’t take the realities of the world for granted!</blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*35tCjoPcvq6LbB3I6Wegqw.jpeg" /></figure><hr><p><a href="https://medium.com/hackernoon/supervised-deep-learning-in-image-classification-for-noobs-part-1-9f831b6d430d">Deep Learning for Noobs [Part 1]</a> was originally published in <a href="https://medium.com/hackernoon">HackerNoon.com</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Deep Learning in India]]></title>
            <link>https://medium.com/@debarko/deep-learning-in-india-289714a55d7d?source=rss-6a4320201780------2</link>
            <guid isPermaLink="false">https://medium.com/p/289714a55d7d</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[artificial-intelligence]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[google]]></category>
            <category><![CDATA[india]]></category>
            <dc:creator><![CDATA[Debarko De ]]></dc:creator>
            <pubDate>Sun, 22 Jan 2017 20:11:12 GMT</pubDate>
            <atom:updated>2017-01-22T20:11:12.831Z</atom:updated>
            <content:encoded><![CDATA[<h4>Current state of Deep Learning and its application in India</h4><p>What is Deep Learning? Well, it’s the application of various small technologies to build a magical network of machines which takes its inspiration from the human brain and can start to marginally behave like one.</p><p>In India, Deep Learning and its applications have started to crop up. A lot of new companies have started offering trained capabilities to their customers, so that things are more personal, features are more accurate and errors are fewer. If you look at an interest graph, you will notice a sharp increase from the year 2015, and Deep Learning currently has the highest level of interest it has ever seen in the subcontinent.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PfvaoiLzbGHH0Dqtp7Mu_g.png" /><figcaption>Google Search Queries over the past 5 years for the term Deep Learning</figcaption></figure><p>A city-wise breakdown shows that Bengaluru leads in the number of enthusiasts, followed by Hyderabad and other cities.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*45p3-vdTJ5T9xlCeL-lHfA.png" /><figcaption>Bengaluru, startup capital of India, leading in Deep Learning queries</figcaption></figure><p>It’s a very new field and a lot can change soon. Only time will tell if Deep Learning is going to bring a change in the lives of a billion people.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to install PyTorch on a Mac OS X]]></title>
            <link>https://medium.com/@debarko/how-to-install-pytorch-on-a-mac-os-x-97a79e28c70?source=rss-6a4320201780------2</link>
            <guid isPermaLink="false">https://medium.com/p/97a79e28c70</guid>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[pytorch-installation]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[macos]]></category>
            <dc:creator><![CDATA[Debarko De ]]></dc:creator>
            <pubDate>Sun, 22 Jan 2017 19:45:23 GMT</pubDate>
            <atom:updated>2017-01-22T19:45:23.458Z</atom:updated>
            <content:encoded><![CDATA[<h4>Tensors and Dynamic neural networks in Python with strong GPU acceleration.</h4><p><a href="https://github.com/pytorch/pytorch">PyTorch</a> was recently launched. This is a small article on how to install PyTorch on your system. It is available through Anaconda, the package manager.</p><p>First you need to install the Anaconda package manager for Mac. You can download the latest version from <a href="https://www.continuum.io/downloads#osx">https://www.continuum.io/downloads#osx</a>. Download the command line installer, as it is around 60 MB lighter and shows installation errors more clearly. Depending on the Python version you have on your system, download the 2.7 or 3.x version. Run the following command to start the installation:</p><pre>bash Anaconda2-4.2.0-MacOSX-x86_64.sh</pre><p>It will first ask you to read the license and will then install at the default location of [/Users/&lt;username&gt;/anaconda2], which is generally fine. You can customise it if required. The installation takes some time, so go have some coffee meanwhile. At the end it will also add the command conda to the bash command list.</p><p>Once conda is installed you can start installing PyTorch on your system. Run the command below:</p><pre>conda install pytorch torchvision -c soumith</pre><p>The output will look something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*C0DRdJG_ZzAkYNBm0f0KVQ.png" /><figcaption>PyTorch installation screen</figcaption></figure><p>The above screen can change with future releases. 
Since the software is freshly baked, expect changes to happen frequently.</p><p>You can follow the link below to start building your next Deep Learning and Neural Networks project on a dynamically generated graph with PyTorch:</p><p><a href="https://github.com/pytorch/tutorials">pytorch/tutorials</a></p><p>If you have any doubts, drop a comment below.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How is PyTorch different from Tensorflow?]]></title>
            <link>https://medium.com/hackernoon/how-is-pytorch-different-from-tensorflow-2c90f44747d6?source=rss-6a4320201780------2</link>
            <guid isPermaLink="false">https://medium.com/p/2c90f44747d6</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[tensorflow]]></category>
            <category><![CDATA[torch]]></category>
            <category><![CDATA[pytorch]]></category>
            <dc:creator><![CDATA[Debarko De ]]></dc:creator>
            <pubDate>Thu, 19 Jan 2017 04:12:21 GMT</pubDate>
            <atom:updated>2017-07-14T21:57:20.818Z</atom:updated>
            <content:encoded><![CDATA[<p>The early release version of PyTorch was announced yesterday, 1/19. PyTorch is currently maintained by <a href="https://apaszke.github.io/">Adam Paszke</a>, <a href="https://github.com/colesbury">Sam Gross</a> and <a href="http://soumith.ch/">Soumith Chintala</a>. The first question that comes to mind is <strong>What exactly is PyTorch?</strong> Well, to put it in the words of the makers, PyTorch gives</p><blockquote>GPU Tensors, Dynamic Neural Networks and deep Python integration.</blockquote><p>It’s a Python-first library: unlike others, it doesn’t work like C extensions. It has minimal framework overhead and integrates with acceleration libraries such as Intel MKL and NVIDIA’s cuDNN and NCCL to maximise speed.</p><p>Let’s pause here and realise that until a few months ago, people assumed that the deep learning library ecosystem was stabilising, but that was far from the ground reality. The cutting edge in that ecosystem is efficient support for dynamic computation graphs, and PyTorch aces that in all aspects.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*5PLIVNA5fIqEC8-kZ260KQ.gif" /></figure><blockquote>Dynamic computation graphs arise whenever the amount of work that needs to be done is variable. This may be when we’re processing text, one example being a few words while another being paragraphs of text, or when we are performing operations against a tree structure of variable size. This problem is particularly prominent in particular subfields, such as natural language processing, where I spend most of my time.</blockquote><p>PyTorch is heavily influenced by <a href="http://chainer.org/">Chainer</a> and <a href="https://github.com/clab/dynet">DyNet</a>. In Chainer’s words, it is the difference between “Define-and-Run” frameworks and “Define-by-Run” frameworks. 
TensorFlow is a “Define-and-Run” framework, where one defines conditions and iterations in the graph structure, whereas Chainer, DyNet and PyTorch are all “Define-by-Run” frameworks. In these, the system generates the graph structure at runtime. This is closer to writing code in any language, as a for loop in code behaves as a for loop inside the graph structure as well. TensorFlow doesn’t handle dynamic graphs very well, though it has some primitive dynamic constructs that are not very flexible and frankly quite limiting.</p><p>Do follow me on <a href="http://twitter.com/debarko">Twitter</a>; you can also sign up for a small and infrequent <a href="http://debarko.de">mailing list</a> that I maintain. If you want to understand Deep Learning, go through this <a href="https://medium.com/@debarko/supervised-deep-learning-in-image-classification-for-noobs-part-1-9f831b6d430d#.rvo9n9os5">Medium post</a>.</p><blockquote><a href="http://bit.ly/Hackernoon">Hacker Noon</a> is how hackers start their afternoons. We’re a part of the <a href="http://bit.ly/atAMIatAMI">@AMI</a> family. We are now <a href="http://bit.ly/hackernoonsubmission">accepting submissions</a> and happy to <a href="mailto:partners@amipublications.com">discuss advertising &amp; sponsorship</a> opportunities.</blockquote><blockquote>To learn more, <a href="https://goo.gl/4ofytp">read our about page</a>, <a href="http://bit.ly/HackernoonFB">like/message us on Facebook</a>, or simply, <a href="https://goo.gl/k7XYbx">tweet/DM @HackerNoon.</a></blockquote><blockquote>If you enjoyed this story, we recommend reading our <a href="http://bit.ly/hackernoonlatestt">latest tech stories</a> and <a href="https://hackernoon.com/trending">trending tech stories</a>. 
Until next time, don’t take the realities of the world for granted!</blockquote><hr><p><a href="https://medium.com/hackernoon/how-is-pytorch-different-from-tensorflow-2c90f44747d6">How is PyTorch different from Tensorflow?</a> was originally published in <a href="https://medium.com/hackernoon">HackerNoon.com</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>