<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Linh Tran on Medium]]></title>
        <description><![CDATA[Stories by Linh Tran on Medium]]></description>
        <link>https://medium.com/@linh-ktran?source=rss-cdba55a434e5------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*nA2Mn8PpdxICaAyuPxYPMg.png</url>
            <title>Stories by Linh Tran on Medium</title>
            <link>https://medium.com/@linh-ktran?source=rss-cdba55a434e5------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 30 May 2026 17:32:07 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@linh-ktran/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Demystifying Deep Learning, Part 2: Learning and Backpropagation]]></title>
            <link>https://medium.com/@linh-ktran/demystifying-deep-learning-part-2-learning-and-backpropagation-179aad6b09df?source=rss-cdba55a434e5------2</link>
            <guid isPermaLink="false">https://medium.com/p/179aad6b09df</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Linh Tran]]></dc:creator>
            <pubDate>Tue, 03 Sep 2024 12:56:14 GMT</pubDate>
            <atom:updated>2024-09-09T12:00:45.875Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*AAu3_Lt-icU4hbsDLdi6bw.png" /></figure><p>In <a href="https://medium.com/@linhha53/demystifying-deep-learning-part-1-single-and-multilayer-perceptrons-4bd445da6c0d">our previous post</a>, we discussed how to compute the output of a network using a given set of parameters (weights and biases) along with input data. This process is known as the forward pass or forward propagation.</p><p>To determine the set of parameters that best models the data, we must solve an optimization problem. This post will explain how to approach this problem and apply the method to a spiral data classification case.</p><h3>The learning problem</h3><p>Our previous post observed the network’s capability once it was trained. Let’s revisit the multi-layer perceptron (MLP) model, which has an input/output relationship utilizing, for example, two hidden layers:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/496/1*xDeRO-fNoQ5hESKN3K6CEw.png" /></figure><p>For a given input <strong>x</strong>, the output provides the model prediction, which could correspond to one of the three colors in the spiral dataset. To ensure the model predicts well, we need to evaluate its accuracy using a <strong>cost function</strong>. The cost function measures the error between the model’s prediction and the actual training data. Let us say we have N training data.</p><p>A simple choice for the cost function is the sum of the squared errors for each training data, labeled by 1 ≤ i ≤ N</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/478/1*VL2xDahE0YPc259HkaL2Uw.png" /><figcaption>Example of a cost function that takes the least-squares distance between the training and model data.</figcaption></figure><p>This cost function calculates the least-squares distance between the predicted and true values, emphasizing the dependency on all the model’s parameters — weights and biases — that need to be optimized.</p><p>A high value of the cost function indicates poor model predictions, while a low value suggests that the model is likely predicting well. The goal is to find the set of weights and biases that minimize this cost.</p><p>Note that for the spiral dataset, we will use a different cost function, which we will discuss later in the post.</p><h3>A recipe to minimize the cost function</h3><p>Minimizing the cost function is a numerical optimization problem and is a complex topic. We describe a simple way to do so, which is standard in most deep learning problems.</p><p>The idea is to follow an iterative process: we randomly choose a set of starting parameters (weights and biases) of the model, and update them at each iteration to gradually reduce the cost function. The key question is: how can we update these parameters to decrease the cost?</p><p>To answer this, we need to understand how sensitive the cost function is to changes in the parameters. The gradient of the cost function captures this sensitivity. The gradient is a vector that provides the direction and rate of the steepest increase in the cost function. Each component of this vector indicates how much the cost would change if a particular parameter were adjusted.</p><p>For a model with parameters (w, b), the gradient provides the information needed to update these parameters in a way that reduces the cost.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/233/1*g1vTplOoIuG1C_sAK6-WrA.png" /><figcaption>Notation for the gradient vector of the cost with respect to the weight and bias parameters.</figcaption></figure><p>We now have a recipe to minimize the cost function as an iterative process:</p><ol><li>At each iteration, start from an initial guess for the parameters.</li><li>Compute the gradient of the cost function with respect to each parameter.</li><li>Update the parameters in the direction that decreases the cost.</li></ol><p>This method is known as <strong>gradient descent</strong>. At each iteration, the idea is to update the next bias and weights as follows</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/127/1*xpHoUmiLjxEm40vJkSrehw.png" /><figcaption>Gradient descent update.</figcaption></figure><p>where η is a positive number known as the <strong>learning rate</strong> or <strong>step size</strong>, which controls the size of each update. While we fix this quantity for simplicity, more sophisticated methods adjust the learning rate dynamically.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YrQJj00PXEmLgY-Bpg2IYg.jpeg" /><figcaption><a href="https://en.wikipedia.org/wiki/Gradient_descent">Illustration of the gradient descent in 2D</a>: for a grid of initial guesses (dots on the left), the dots move toward the minima of the function (blue areas). The steps η are tracked by the tails of the dots.</figcaption></figure><p><strong>Analogy:</strong> Imagine you’re in the dark on a steep mountain and want to descend to a village below. You would use a stick to probe the ground and step in the direction where the slope decreases the most. You can interpret the picture above as being with a bunch of friends, starting from different places, looking for a way down to the village. Similarly, in gradient descent, you update parameters by following the direction of the steepest descent to minimize the cost function.</p><p>In deep learning, the cost function landscape can be extremely complex due to the model’s complexity and the large number of parameters. As you can see in the illustration there is no guarantee of finding the global minimum, gradient descent is quite effective in practice and forms the basis for more advanced optimization techniques. Because computing the gradient can be computationally expensive, deep learning methods often use <strong>stochastic gradient descent (SGD)</strong>, where the gradient is estimated using a subset of the training data.</p><h3>Computing the gradient — Automatic differentiation</h3><p>The gradient of the cost function is crucial for solving deep learning problems. The next challenge consists in how to compute the gradient.</p><p>For simple models, like the spiral classification case, we can manually compute the gradient. However, for more complex models, this manual approach is impractical. Fortunately, <strong>automatic differentiation</strong> <strong>(AD)</strong> provides a powerful solution. In this post, we’ll give a brief overview of how automatic differentiation works.</p><p>Automatic differentiation follows a divide-and-conquer strategy by breaking the mathematical expression of the cost function into elementary operations (+, -, *, /, sqrt, log, exp, etc.) through a <strong>computational graph</strong>. The two key components of automatic differentiation are:</p><ol><li><strong>Differentiation Rules</strong> for Elementary Operations: These rules define how to compute derivatives for basic operations.</li><li>Application of the<strong> Chain Rule:</strong> The chain rule allows us to compute the derivative of a composite function by combining the derivatives of its constituent tasks.</li></ol><p>Once we have this in mind, we can break any mathematical expression (such as given by a neural network) into a computational graph, and compute the derivatives between connected nodes for each operation.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mGqs_ruH9OlXP6L-h_Q50Q.png" /><figcaption>Computational graph for the simple expression e = c*d = (a+b)*(b+1). Example of values with a=2 and b=1. The derivatives are interpreted as connections between two nodes.</figcaption></figure><p>Here’s how automatic differentiation works:</p><ol><li><strong>Construct a Computational Graph (Divide):</strong> Decompose the mathematical expression of the cost function into a graph where nodes represent operations and edges represent dependencies. For example, for the expression e = c*d=(a + b)*(b + 1), you create a graph showing how <em>a</em> and <em>b</em> contribute to <em>e</em> through the intermediate quantities <em>c</em> and <em>d</em>.</li><li><strong>Apply Differentiation Rules:</strong> Use predefined differentiation rules for each operation between connected nodes in the graph. In our example, we need the differentiation rule for multiplication and addition. The first connection gives d<em>e</em>/d<em>c</em> = d(<em>c</em>*<em>d</em>)/d<em>c</em> = <em>d</em>. The next one gives d<em>c</em>/d<em>a</em>=d(<em>a</em>+<em>b</em>)/d<em>a</em> = 1, etc. Do so and fill each edge of the graph.</li><li><strong>Gradient Accumulation (Conquer):</strong> At the end of the day, we want to know how changes in <em>a </em>and <em>b</em> affect <em>e, </em>that is <em>de/da</em> and <em>de/db</em>. Thanks to the <strong>chain rule</strong>, we can accumulate the results from step 2 to obtain the global changes. Applied to <em>a</em> the chain rule gives d<em>e</em>/d<em>a</em> = d<em>e</em>/d<em>c</em>*d<em>c</em>/d<em>a</em>, which from step 2 gives <em>d*1=b+1</em>. Applied to<em> b </em>the chain rule sums from both edges of the graph and we get d<em>e</em>/d<em>b = </em>d<em>e</em>/d<em>c</em>*d<em>c</em>/d<em>b</em> + d<em>e</em>/d<em>d</em>*d<em>d</em>/d<em>b</em> = <em>d</em>*1<em> + c</em>*1 = <em>d+c = 2b</em> + <em>a</em> + 1.</li></ol><p>What we have described is known as <strong>reverse-gradient accumulation</strong>, because we start from the output <em>e</em> and go back to the inputs <em>a</em> and <em>b</em>. Put shortly, the process accumulates gradients through each edge by multiplying derivatives along the path, and finally add all possible path together.</p><p>This is exactly the <strong>backpropagation </strong>algorithm, it computes the gradient of the cost through reverse-gradient accumulation. The strength of the approach is that the gradient computation is <em>exact</em>, there is no approximation.</p><p>Automatic differentiation is implemented in most deep learning software such as Pytorch, TensorFlow, JAX, etc. With one of these tools, you can define any custom forward model and directly hit the “train” or “fit” function: this will launch an optimization algorithm such as the gradient descent, including the automatic computation of the gradient.</p><h3>Training the spiral classification problem</h3><p>Now that we have discussed some core concepts of deep learning, let us apply them to classify the spiral dataset.</p><p>We’ll first focus on the single-layer perceptron (SLP) and then extend the approach to a multi-layer perceptron (MLP) with one hidden layer.</p><p>Before that, we need to use a suitable <em>cost function</em>. For each training data, the output of the SLP or MLP is y_i, a [3x1] vector with 3 arbitrary numbers. However we would like the output to represent one of the three colors classes {red, purple, green} to compare with the exact data. This conversion is done via a <a href="https://en.wikipedia.org/wiki/Softmax_function">softmax function</a>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/488/1*sMd9oT6H41Z-vsDxR-E-dw.png" /><figcaption>The softmax converts numbers into probabilities. The cost takes the natural log of the probability vector, which values a probability close to 1; that is likely to represent the correct color.</figcaption></figure><p>To compute the cost, we take the natural log of the probability vector, and select the component associated to the color of the training data. Finally we average over all the training points. The mathematical expression is cumbersome, but translates as follows</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/559/1*3WKMdZt8LnIbRl_Y0PHImQ.png" /><figcaption>The cost function for the spiral classification learning case. Here we have K=3 classes, indexed by 1 ≤ k ≤ K. (C_i)_k and (y_i)_k are the k-th component of the vectors C_i and y_i.</figcaption></figure><p>This is known as the <em>cross-entropy cost. </em>In short, the choice of the cost function is not obvious and depends on the problem.</p><ol><li><strong>Single-Layer Perceptron (SLP)</strong></li></ol><p>To compute the gradient of this cost, we follow the logic of automatic differentiation. We do it by hand for the model with no hidden layer to understand how things work. We decompose the gradient of the cost according to the chain rule, compute the elementary gradients thanks to differentiation rules and multiply the results together. The output of the SLP network is y_i = W*x + b, and computations give, for the weights and biases,</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/350/1*TyaD0Oh4RhyEDpHCepD96g.png" /><figcaption>Partial derivatives of the cost C_i with respect to the weights and biases for the single-layer perceptron model. The gradient of the total cost C follows by linearity.</figcaption></figure><p>Now that we have the gradient of the cost , we can implement the gradient descent iterations inside a loop to update the weights and biases. Inside the loop, we i) compute the output of the network and the cost function, ii) the derivatives of the cost, and iii) finally update the weights and biases. Here is a draft of the code</p><pre># Train a Linear Classifier<br><br># initialize parameters randomly, <br># with here D=2 (dimension) and K=3 (number of classes)<br>W = 0.01 * np.random.randn(D,K) # matrix of weights<br>b = np.zeros((1,K)) # vector of biases<br><br>num_examples = X.shape[0] # X are our ~200 data points<br>for n in range(300): # gradient descent loop, 300 iterations<br>  # 1 - output of the single-layer perceptron given the data X<br>  y_i = np.dot(X, W) + b <br>  # compute the class probabilities<br>  probs = np.exp(y_i) / np.sum(np.exp(y_i), axis=1, keepdims=True)<br>  # 1 - compute the average cross-entropy cost<br>  C_ik = -np.log(probs[range(num_examples),k]) # k gives the color of the data<br>  cost = np.sum(C_ik)/num_examples<br>  <br>  if n % 10 == 0:<br>    print (&quot;iteration %d: loss %f&quot; % (n, cost))<br>  <br>  # 2 - gradient computation <br>  # dC/dyi<br>  dscores = probs<br>  dscores[range(num_examples),k] -= 1 # p_i - 1<br>  dscores /= num_examples # average over training points<br>  # backpropagate the gradient to parameters (W,b) + sum over training data<br>  dW = np.dot(X.T, dscores) # dC/dw = (p_i-1)*x^T<br>  db = np.sum(dscores, axis=0, keepdims=True) # dC/db = p_i - 1<br>  <br>  # 4 - gradient descent update, with learning rate eta=1<br>  W -= dW <br>  b -= db</pre><p>2. <strong>Multi-Layer Perceptron (MLP)</strong></p><p>For the MLP model with one hidden layer, we must compute the intermediate partial derivatives for the weights and biases corresponding to the first and hidden layers (W⁰, W¹, b⁰, b¹), and also account for the derivative of the activation function in the hidden layer. Even though we can take advantage of the chain rule and reuse some computations, the derivation becomes quite cumbersome. We can now see why automatic differentiation is so useful !</p><p>We have added the contribution of the hidden layer in the forward model and the gradient computation in the code that is available <a href="https://github.com/linh-ktran/demystifying-deep-learning/blob/main/minimal_net.ipynb">here.</a> The principle is exactly the same as for the SLP.</p><p>We let the gradient descent run to obtain our final set of parameters (w, b). We obtain a training accuracy of 50% for the single-layer model SLP and 98% for the multi-layer model MLP, and also get the boundary classification from the first part of the article :-)</p><p>That’s it! I hope you understand the basics of training a neural network. it relies on a forward model, a gradient descent algorithm and an accurate computation of the gradients of the cost function. This paves the way to study more advanced forward propagation models, such as convolutional or recurrent neural networks.</p><h4>Code:</h4><p><a href="https://github.com/linh-ktran/demystifying-deep-learning">GitHub - linh-ktran/demystifying-deep-learning</a></p><h4>Resources:</h4><ul><li><a href="https://cs231n.github.io/neural-networks-case-study/">CS231n Convolutional Neural Networks for Visual Recognition</a></li><li><a href="https://colah.github.io/posts/2015-08-Backprop/">Calculus on Computational Graphs: Backpropagation</a></li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=179aad6b09df" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Demystifying Deep Learning, Part 1: Single and Multilayer Perceptrons]]></title>
            <link>https://medium.com/@linh-ktran/demystifying-deep-learning-part-1-single-and-multilayer-perceptrons-4bd445da6c0d?source=rss-cdba55a434e5------2</link>
            <guid isPermaLink="false">https://medium.com/p/4bd445da6c0d</guid>
            <category><![CDATA[python]]></category>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Linh Tran]]></dc:creator>
            <pubDate>Fri, 24 May 2024 13:36:57 GMT</pubDate>
            <atom:updated>2024-09-06T14:41:01.677Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*C7d559pdqi-EdDISABAl3Q.jpeg" /></figure><p>Deep Learning has become of utmost importance in the past years, impacting almost every engineering and scientific discipline. In this post, we aim to clarify what is hidden behind deep learning and review its core underlying principles. This post only assumes knowledge of basic calculus.</p><h4><strong>What do we mean by Deep Learning?</strong></h4><blockquote>If we look at Wikipedia, Deep Learning is defined as <em>a subset of machine learning methods based on neural networks with representation learning</em>.</blockquote><p>It is fair to think of deep learning as a specific category of machine learning models. Deep learning uses representation learning, which means these models are designed to learn features from data. A feature is simply a measurable property (a numeric value) that characterizes a dataset: it can be an age, a color, an animal type, a movie gender, etc.</p><p>So what do we mean by learning? Learning refers to finding an accurate representation, or an accurate model of the data. For example, a famous learning task consists of recognizing handwritten digits from a series of images.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/566/1*-KrFuhtoT9Rr-o525mgL3w.png" /><figcaption>Given a handwritten image, can a mathematical model distinguish between digits from 0 to 9?</figcaption></figure><p>In this task, we want to find a mathematical model that can distinguish between 10 digits. This is usually achieved in two steps :</p><ol><li><strong>Training</strong>: We fit a model to a given dataset. The model is a mathematical function described by a number of unknown parameters.</li><li><strong>Evaluation</strong>: We test our model on unknown data.</li></ol><p>If the model can evaluate the correct output, we can say that the machine has learned a representation of handwritten digits.</p><p>Deep learning is a subset of machine learning and uses a specific mathematical model for this task called artificial neural networks. The rest of this post is devoted to how to train/fit a basic artificial neural network to a given dataset, such as handwritten digits.</p><h4><strong>Artificial neural networks</strong></h4><p>Artificial neural networks are a mathematical model inspired by the neurons of the human brain. They relate <strong>input</strong> data <strong><em>x</em></strong> (e.g. the pixel grayscale value of a handwritten digit image) to <strong>output</strong> data <strong><em>y</em></strong> (e.g. an integer from 0 to 9) through a function <em>f</em> such that: <strong><em>y </em></strong><em>= f(</em><strong><em>x</em></strong><em>)</em>.</p><p>An artificial neural network, or neural net, refers to a specific structure of this function. We explain the concept of a multi-layer network in two building blocks.</p><p><strong>1. The Artificial Neuron Model (Perceptron)</strong>: An output value <em>y</em> is related to an input vector <strong><em>x </em></strong><em>= (x_1, x_2, …, x_n)</em> by a weighted sum, parametrized by a vector of weights <strong><em>w </em></strong><em>= (w_1, w_2, …, w_n)</em> and a bias value <em>b</em>. The weighted sum is then passed through an activation function <em>g</em>. The activation function, which adds nonlinear behavior, is a choice of the model. For this post, we do not explain how to choose it.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/462/1*bDvZpfjE3_Dt5ihpK33k9A.png" /><figcaption>A simple perceptron model with three inputs (x_1, x_2, x_3) and one output y.</figcaption></figure><p>The perceptron encodes a relationship between the inputs and the output through a set of parameters: the weights and the bias. By choosing the bias and weights appropriately, the model can represent elementary relationships between data.</p><p>2. <strong>Multi-Layer Networks</strong>: Relations between data are often more complex. The perceptron can be interpreted as one layer of a neural network, sometimes called a <em>single-layer</em> perceptron. An essential concept of deep learning is to build a large (or deep) network of perceptrons arranged in <em>multiple layers</em>, which leads to the <strong>multi-layer perceptron</strong> model.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/967/1*NZnOBKrSNWoPAQP_fUYTPw.png" /><figcaption>Structure of a multi-layer perceptron model with two hidden layers.</figcaption></figure><p>Each hidden/output node is now related to all the nodes of the previous layer: we say that the network is fully connected. For each layer, we can write the output/input relation between the layer <em>l + 1</em> and the layer <em>l:</em></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/496/1*xDeRO-fNoQ5hESKN3K6CEw.png" /></figure><p>We have used vector notations <strong>in bold </strong>to encode the values of <strong>x</strong>, <strong>h,</strong> and <strong>b </strong>at the l-th layer, and denoted <em>W </em>the matrix of weights. Following the figure we have for example <strong>h</strong>¹ = (h¹₁, h¹₂, h¹₃, h¹₄). The weighted sum from the perceptron is extended to be a matrix-vector product for the multi-layer perceptron, for example</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/488/1*DIjNID2Xe3gCycMV59Pqzg.png" /></figure><p>In short, we can think about the <em>deep</em> structure of the multi-layer perceptron as a multiple composition of perceptrons. This gives more unknown weights and bias parameters to be found, but more freedom for the model to adapt to complex data relationships. The action of obtaining the output <strong><em>y</em></strong> from the input <strong><em>x</em></strong> is called a <strong>forward pass</strong>.</p><h4><strong>Implementing the forward pass in Python</strong></h4><p>We can implement the forward pass of a multi-layer perceptron in a few lines. We use random weights and bias for the example:</p><pre>import numpy as np<br><br>sizes = [3, 4, 3, 1]  # number of neurons in each layer<br>num_layers = len(sizes)  # one input layer, 2 hidden layers and one output layer<br><br># the first layer does not have bias<br>biases = [np.random.randn(sizes[i], 1) for i in range(1, num_layers)]<br># we use the transpose of the weight matrices<br>weights = [np.random.randn(sizes[i], sizes[i-1]) for i in range(1, num_layers)]</pre><p>Once we have initialized the network size and its parameters, we are ready to implement the forward pass. We can then feed the network with any arbitrary input.</p><pre># Define an activation function<br>def g(z):<br>    &quot;&quot;&quot;Sigmoid function.&quot;&quot;&quot;<br>    return 1.0/(1.0 + np.exp(-z)) <br><br>def network_forward(h): # define output/input relation for a MLP<br>    &quot;&quot;&quot;Return the output of the network given input h.&quot;&quot;&quot;<br>    for b, w in zip(biases, weights):<br>        z = np.dot(w, h) + b # mat-vect product + bias<br>        h = g(z) # apply activation function<br>    return h<br><br># Example input<br>input = np.array([[1.],[2.],[3.]]) # a column vector (1,2,3)^T<br>output = network_forward(input)<br>print(output)</pre><h4><strong>Comparison between a single and multi-layer perceptron</strong></h4><p>We compare the capabilities of the single and multi-layer perceptron models on a toy classification example. The input dataset consists of points in the two-dimensional plane arranged in a spiral fashion. The output is converted into class probabilities that represent three colors: red, green, and purple. We use two models</p><ul><li><strong>Single-layer Perceptron</strong>: 2 inputs (x-y coordinates) and 3 outputs (3 colors)</li><li><strong>Multi-layer Perceptron</strong>: 2 inputs (x-y coordinates), one hidden layer with 50 connected neurons, and 3 outputs (3 colors)</li></ul><p>The networks are trained to find/learn the “good” values for the weights and biases. Thanks to its hidden layer, the multi-layer perceptron can draw a more complex boundary decision compared to the single-layer perceptron. We gain classification accuracy, but we need to pay the price of training all the parameters introduced by the hidden layer.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*IvJzHtMyZhYAzmM5d0rMig.png" /><figcaption>Training spiral data and its classification with a single-layer perceptron (left) and multi-layer perceptron (right).</figcaption></figure><p>We have seen a simple example of a deep neural net architecture and its modeling capability. The same model structure can be used for recognizing handwritten digits. To complete our introduction to deep learning, we need to shed light on <em>how to train the network</em>, that is how to find the best parameters that fit the data. This will be the topic of the second part of this post.</p><p>Stay tuned!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4bd445da6c0d" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Analyzing MINLP Solver Performance: A Case Study of Cooling Production Systems]]></title>
            <link>https://medium.com/@linh-ktran/analyzing-minlp-solver-performance-a-case-study-of-cooling-production-systems-9ead558c4619?source=rss-cdba55a434e5------2</link>
            <guid isPermaLink="false">https://medium.com/p/9ead558c4619</guid>
            <category><![CDATA[solver]]></category>
            <category><![CDATA[optimization]]></category>
            <category><![CDATA[optimization-algorithms]]></category>
            <category><![CDATA[algorithms]]></category>
            <category><![CDATA[operations-research]]></category>
            <dc:creator><![CDATA[Linh Tran]]></dc:creator>
            <pubDate>Thu, 16 May 2024 14:16:27 GMT</pubDate>
            <atom:updated>2024-05-16T14:16:27.407Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/700/0*QvfFZakrvOCarSML.jpeg" /><figcaption>Figure 1. A pictorial representation of non-linear optimization [1]</figcaption></figure><p>In mathematical optimization, solving Mixed-Integer Nonlinear Programming (MINLP) problems pose unique challenges, especially when dealing with complex systems like cooling production. In this analysis, we delve into the performance of three prominent MINLP solvers — Baron, Bonmin, and Couenne — using a concrete example of a cooling production system. Through the use of the optimization framework <a href="https://www.pyomo.org/">Pyomo</a>, I propose in this post to explore their speed, reliability, and suitability for convex and non-convex problem sets.</p><h3>What is mixed-integer nonlinear programming?</h3><p>Before we dive into the heart of the topic, let’s define briefly what we mean by MINLP. MINLP seeks to minimize a function that depends on both continuous and discrete variables, making it a versatile tool for optimization. Convex MINLP problems enjoy certain advantages, while non-convex ones introduce complexities such as local optima, challenging the solver’s efficiency and accuracy. Here is how it looks in mathematical terms:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1012/1*9XngQoCQovHnv7p64UW7fQ.png" /></figure><h3>Strategy: Algorithms Employed by MINLP Solvers</h3><p>MINLP solvers leverage a blend of Linear Programming, Integer Programming, and Nonlinear Programming algorithms. Two classical methods, Branch-and-Bound (BB) and Outer Approximation (OA), are commonly used to address MINLPs.</p><p><a href="https://en.wikipedia.org/wiki/Branch_and_bound"><strong>Branch-and-Bound</strong></a><strong> (BB)</strong>: A foundational technique that iteratively partitions the problem space into smaller sub-problems, converging towards an optimal solution.</p><p>In the illustration, the partition of the problem into sub-problems is represented with a tree structure where each of these new constraints represents a branching decision. The tree is terminated until the solution of the continuous relaxation of the sub-problem is either (i) infeasible, (ii) integer feasible or (iii) lower bound value is not anymore smaller than the value of the incumbent solution.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/375/1*-HF8OE8gAJGxhnbBEqnBtw.png" /><figcaption>Figure 2: Illustration of the Branch-and-Bound algorithm.</figcaption></figure><p><strong>Outer Approximation (OA)</strong>: Utilizes decomposition and relaxation to solve a sequence of nonlinear programming subproblems and mixed-integer linear master programs alternately. This method adds cuts at solutions of NLP subproblems and uses the solutions of NLP subproblems to generate cuts which are added to the new relaxations. In the end, there should be enough cuts so that the solution of the MIP relaxation will be also valid to the nonlinear constraints.</p><p>The flow chart below shows us the approach of the outer approximation in solving the MINLP problem. The idea is to iteratively build a sequence y0, y1 … yk from an initial solution guess y0, and update the solution if the MIP is still feasible.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/912/1*DCZr__bycKtrJwjEK_-gfA.png" /><figcaption>Figure 3: Outer Approximation algorithm flowchart.</figcaption></figure><h3>Convex vs. Non-Convex MINLP</h3><p>Convex MINLP problems benefit from well-defined properties, while non-convex ones present greater computational challenges due to the presence of local optima. Techniques like Convexification, Convex Envelopes, and Spatial Branch-and-Bound are employed to tackle these complexities effectively.</p><h3>MINLP Solvers: A Closer Look</h3><h4>Major (Local) MINLP Solvers</h4><p><a href="https://www.coin-or.org/Bonmin/"><strong>Bonmin</strong></a>: This open-source solver offers a range of algorithms catering to both convex and non-convex problems. Notably, its Branch-and-Bound algorithm serves as a heuristic for non-convex scenarios.</p><h4>Global Solvers</h4><p><a href="https://www.coin-or.org/Couenne/"><strong>Couenne</strong></a>: Developed on top of Bonmin, Couenne specializes in finding global optima for non-convex MINLPs. Based on a spatial branch-and-bound framework, this solver obtains a lower bound through an LP relaxation using the reformulation techniques. The algorithm is enhanced by several bound-tightening procedures, a separator of disjunctive cuts, as well as a recently introduced feasibility pump heuristic, and symmetry handling.</p><p><a href="https://www.minlp.com/baron-solver"><strong>Baron</strong></a>: A commercial solver renowned for its efficiency in solving non-convex optimization problems to global optimality. Its spatial branch-and-bound technique, coupled with domain reduction strategies, makes it a top choice for complex scenarios.</p><p>The table below presents the main features of each solver:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*b6nt1lmTbevXHxTemI6grQ.png" /><figcaption>Table 1: Comparison of the main characteristics of the Bonmin, Couenne and Baron solvers [2, 3, 4].</figcaption></figure><h3>Computational Study:</h3><h4>Test Problem Formulation</h4><p>An example of a cooling production system with 3 chillers A, B and C.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*n8c5oTI2jmxvKMvomuQ1dw.png" /><figcaption>Figure 4: A cooling production system.</figcaption></figure><p>The key decision for this problem is to find the optimal combination of chiller status (on/off) together with the operational parameters for the chillers: water flows and return water temperature in the primary loop.</p><h4>Cost function:</h4><p>The cost function to be optimized represents the total thermal energy consumption, here the sum of the 3 thermal chillers</p><p>The cost function takes the form :</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/960/1*FFjNx4BPAu6KeSWP7ZCOqA.png" /></figure><p>Sets:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/854/1*4_CxJQxfcy4b2eZ5gBvJlg.png" /></figure><p>Variables:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/852/1*eG0WcUHA4vD23DlHQ1-h4g.png" /></figure><p>Constraints:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/850/1*nTnMYDQ4d6PAhI59toLA-w.png" /></figure><h4>Results</h4><p>In our computational study of a cooling production system, we evaluate the performance of Baron, Bonmin, and Couenne across varying problem sizes, focusing on resolution time and objective values. While all three solvers employ a branch-and-bound approach, their effectiveness varies across convex and non-convex problem sets.</p><h4>Observations:</h4><p>For Convex Problems: Bonmin or Baron are preferable options, with Bonmin being open source and Baron offering commercial-grade accuracy.</p><p>For Non-Convex Problems: While Bonmin operates as a heuristic, both Couenne and Baron are exact solvers. However, Baron’s performance on resolution time may exhibit instability, especially in specific cases like the chiller optimization use case.</p><p><strong>Resolution Time</strong>: Despite being global solvers, the stability of resolution time for Baron may raise concerns, indicating potential variability in performance.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/662/1*DKh2K0AIgrigPOPYryfytA.png" /><figcaption>Figure 5: Resolution time using Baron, Bonmin, and Couenne solvers.</figcaption></figure><p>In conclusion, the choice of MINLP solver hinges on the problem’s convexity and the balance between accuracy and computational efficiency. While global solvers like Baron and Couenne offer comprehensive solutions, local solvers like Bonmin provide viable alternatives for convex optimization. By understanding the nuances of each solver, practitioners can make informed decisions to optimize their problem-solving workflows effectively.</p><h3>References</h3><p>[1] Amini, Alexander, et al. “Spatial uncertainty sampling for end-to-end control.” <em>arXiv preprint arXiv:1805.04829</em> (2018).</p><p>[2] Pietro Belotti. Couenne: a user’s manual. Technical report, Technical report, Lehigh University, 2009.</p><p>[3] Pierre Bonami and Jon Lee. Bonmin user’s manual. Numer Math, 4:1–32, 2007.</p><p>[4] Nikolaos V Sahinidis. Baron — branch and reduce optimization navigator. User’s Manual Version, 4, 2000.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9ead558c4619" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Analyzing CHIRPS rainfall data using Google Earth Engine]]></title>
            <link>https://medium.com/@linh-ktran/analyzing-chirps-rainfall-data-using-google-earth-engine-bb4901ca29b7?source=rss-cdba55a434e5------2</link>
            <guid isPermaLink="false">https://medium.com/p/bb4901ca29b7</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[python]]></category>
            <category><![CDATA[google-earth-engine]]></category>
            <dc:creator><![CDATA[Linh Tran]]></dc:creator>
            <pubDate>Sat, 30 Mar 2024 16:47:33 GMT</pubDate>
            <atom:updated>2024-03-30T16:47:33.740Z</atom:updated>
            <content:encoded><![CDATA[<p><strong>Introduction:</strong> Recently, I had the opportunity to delve into a project that involved analyzing rainfall data for a specific area of interest. The task was to compute the average number of rainy days per month over several years, utilizing the <a href="https://www.chc.ucsb.edu/data/chirps">CHIRPS dataset</a>. In this post, I’ll walk you through the process I followed, leveraging the powerful capabilities of Google Earth Engine.</p><p><strong>Objective:</strong> The task was clear: for each month, calculate the average number of rainy days based on the CHIRPS dataset. To accomplish this, I needed to pre-process the data using Google Earth Engine. For me, I decided to use Google Earth Engine’s Python API as I’m more familiar with Python, but there are also JavaScript options available for those who prefer that language.</p><p><strong>Environment Setup:</strong> Before diving into the implementation, let’s set up our environment. We’ll use Google Colab for this project. First, we need to install the required packages and authenticate with Google Earth Engine.</p><pre># Environment settings<br>!pip install earthengine-api --upgrade<br>!pip install pycrs<br>!pip install rasterio<br> <br># Authentication and initialization<br>import ee<br>ee.Authenticate()<br>ee.Initialize(project=&#39;ee-linhha53&#39;)</pre><p><strong>Using Google Earth Engine:</strong> Now that our environment is set up, let’s proceed with utilizing Google Earth Engine to pre-process the CHIRPS data.</p><ul><li>Objective: The average number of rainy days per month for the area of interest</li><li>Dataset: CHIRPS DAILY The daily rain data</li><li>Tool: Using Google Earth Engine through Python API</li></ul><p>STEP 1: Create an ImageCollection with CHIRPS daily precipitation data.</p><pre># Define start and end years, dates (1981 - 2023)<br>start_year = 1981<br>end_year = 2023<br>start_date = ee.Date.fromYMD(start_year, 1, 1)<br>end_date = ee.Date.fromYMD(end_year, 12, 31)<br> <br># Initialize the library.<br>chirps = ee.ImageCollection(&#39;UCSB-CHG/CHIRPS/DAILY&#39;)<br>           .filterDate(start_date, end_date)</pre><p>STEP 2: Create a binary mask for rainy days.</p><p>Rainy day: precipitation &gt; 0 mm/day. threshold=0</p><pre>def create_rainy_mask(image, threshold=0):<br>    &quot;&quot;&quot;Function to create a binary rainy day mask. &quot;&quot;&quot;<br>    rainy_mask = image.select([&#39;precipitation&#39;]).gt(threshold)<br>    return image.addBands(rainy_mask.rename(&#39;rainy_mask&#39;))<br> <br># Map the function over the ImageCollection<br>rainy_mask = chirps.map(create_rainy_mask).select(&#39;rainy_mask&#39;)</pre><p>STEP 3: Calculate the sum of rainy days for a given month in a given year</p><p>Create years and months sequences:</p><pre># Create an ee.List object with a sequence of integer numbers between a start year and an end year<br>years = ee.List.sequence(start_year, end_year)<br># Create an ee.List object with a sequence of integer numbers between 1 and 12 (months)<br>months = ee.List.sequence(1, 12)<br> <br>def yearly_monthly_sum(year):<br>    &quot;&quot;&quot;Function to map the monthly_sum() to every month in the months sequences.&quot;&quot;&quot;<br>    def monthly_sum(month):<br>        &quot;&quot;&quot;Function to calculate the sum of rainy days for a given month in a given year.&quot;&quot;&quot;<br>        w = rainy_mask.filter(ee.Filter.calendarRange(year, year, &#39;year&#39;)) \<br>                      .filter(ee.Filter.calendarRange(month, month, &#39;month&#39;)) \<br>                      .sum()<br>        return w.set(&#39;year&#39;, year) \<br>                .set(&#39;month&#39;, month) \<br>                .set(&#39;system:time_start&#39;, ee.Date.fromYMD(year, month, 1))<br>    return months.map(monthly_sum)<br> <br># Map the yearly_monthly_sum() function to every year in the years sequence to get an ImageCollection with total rainy days in each month in each year<br>monthly_rainy_days = ee.ImageCollection.fromImages(<br>    years.map(yearly_monthly_sum).flatten()<br>)</pre><p>STEP 4: Calculate the average rainy days for a given month across all years</p><pre>def monthly_mean(month):<br>    &quot;&quot;&quot;Function to calculate the average rainy days for a given month across all years.&quot;&quot;&quot;<br>    # Round to the nearest integer.<br>    w = monthly_rainy_days.filter(ee.Filter.eq(&#39;month&#39;, month)).mean().round()<br>    return w.set(&#39;month&#39;, month) \<br>            .set(&#39;system:time_start&#39;, ee.Date.fromYMD(1, month, 1))<br> <br># Map the monthly_mean() function to every month in the months sequence and make an ImageCollection from the obtained images<br>mean_monthly_rainy_days = ee.ImageCollection.fromImages(<br>    months.map(monthly_mean).flatten()<br>)</pre><p>STEP 5: Clip to the area of interest</p><pre># Add the area of interest<br>aoi = geemap.shp_to_ee(&quot;/content/drive/MyDrive/data/aoi.shp&quot;)<br> <br>def clip_image_collection(image):<br>    &quot;&quot;&quot;Function to clip each image to the area of interest shapefile.&quot;&quot;&quot;<br>    return image.clip(aoi)<br> <br># Map the clip_image_collection() function to the mean_monthly_rainy_days collection <br>mean_monthly_rainy_days_clipped = mean_monthly_rainy_days.map(clip_image_collection)</pre><p><strong>Displaying the Results:</strong> Now that we have processed the data, let’s visualize the results.</p><ol><li>Interactive map:</li></ol><p>The geemap library can be used to display ee.Image objects on an interactive ipyleaflet map.</p><p>2. Visualizing results:</p><ul><li>Export the raster image to Drive in format Tif File</li><li>Visualizing Tif File Using Matplotlib and GDAL</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1023/0*GSys4V6GmQ5h3tSO" /></figure><p><strong>Conclusion:</strong> In conclusion, this project was a rewarding experience that showcased the power and versatility of Google Earth Engine in analyzing geospatial data. By leveraging this platform, I was able to efficiently process the CHIRPS dataset and derive meaningful insights into rainfall patterns for the specified area of interest. Moving forward, I’m excited to explore more projects that harness the potential of geospatial analysis tools for addressing real-world challenges.</p><p><strong>Additional Resources:</strong></p><ul><li><a href="https://github.com/linh-ktran/GEE_CHIRPS_project">Github Codes</a></li><li><a href="https://earthengine.google.com/">Google Earth Engine</a></li><li><a href="https://www.chc.ucsb.edu/data/chirps">CHIRPS Dataset</a></li></ul><p>Feel free to reach out if you have any questions or insights to share about this project!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=bb4901ca29b7" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>