<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Jim Fleming - Medium]]></title>
        <description><![CDATA[What I’m working on. - Medium]]></description>
        <link>https://medium.com/jim-fleming?source=rss----82978fcc2cbc---4</link>
        <image>
            <url>https://cdn-images-1.medium.com/proxy/1*TGH72Nnw24QL3iV9IOm4VA.png</url>
            <title>Jim Fleming - Medium</title>
            <link>https://medium.com/jim-fleming?source=rss----82978fcc2cbc---4</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Wed, 20 May 2026 13:45:19 GMT</lastBuildDate>
        <atom:link href="https://medium.com/feed/jim-fleming" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Running TensorFlow (with GPU) on Kubernetes]]></title>
            <link>https://medium.com/jim-fleming/running-tensorflow-on-kubernetes-ca00d0e67539?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/ca00d0e67539</guid>
            <category><![CDATA[tensorflow]]></category>
            <category><![CDATA[docker]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[kubernetes]]></category>
            <category><![CDATA[gpu]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Fri, 24 Mar 2017 22:50:02 GMT</pubDate>
            <atom:updated>2019-01-09T19:10:48.442Z</atom:updated>
            <content:encoded><![CDATA[<p>While GPUs are a staple of deep learning, deploying on GPUs makes everything more complicated, including your Kubernetes cluster. This quick guide will walk through adding basic single-GPU support to Kubernetes.</p><p>The guide assumes that <a href="http://kubernetes.io/">Kubernetes</a> is already running on Ubuntu. An LTS release is preferable, with 14.04 preferred because of NVIDIA recommendations for driver hosts. <strong><em>Warning: Ubuntu 14.04 is not well supported by Kubernetes. Feel free to use a different distro. </em></strong>This guide also assumes that the proper GPU drivers and CUDA version have been installed. Plenty of other guides cover those topics.</p><p><strong><em>TL;DR:</em></strong><em> start with nvidia-docker, then whittle away its functionality so that just plain docker remains. Then add that functionality to Kubernetes.</em></p><h3>Working without nvidia-docker</h3><p>A common way to run containerized GPU applications is to use <a href="https://github.com/NVIDIA/nvidia-docker">nvidia-docker</a>. Here is an example of running TensorFlow with full GPU support inside a container.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e00e0a7edbb9cc6e01dc73929dfb4f25/href">https://medium.com/media/e00e0a7edbb9cc6e01dc73929dfb4f25/href</a></iframe><p>Simple! If all goes well the output should look something like this:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/74ea3bbe84e4f164815b9e55c650e789/href">https://medium.com/media/74ea3bbe84e4f164815b9e55c650e789/href</a></iframe><p>Unfortunately it’s not currently possible to use nvidia-docker directly from Kubernetes. 
Additionally, Kubernetes does not support the <a href="https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker-plugin">nvidia-docker-plugin</a> since Kubernetes does not use Docker’s volume mechanism.</p><p>The goal is to manually replicate the functionality provided by nvidia-docker (and its plugin). For demonstration, query the nvidia-docker-plugin REST API to retrieve the command line arguments:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/fd936ae4933d3f8b769c724fc0cd5321/href">https://medium.com/media/fd936ae4933d3f8b769c724fc0cd5321/href</a></iframe><p>These arguments feed into docker, running the same python command:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d14ed8674a73ef2b3fee62a2c45aee68/href">https://medium.com/media/d14ed8674a73ef2b3fee62a2c45aee68/href</a></iframe><p>If all goes well, TensorFlow should find everything correctly and you should see the same output as before.</p><p>Finally, remove the dependency on nvidia-docker-plugin by manually specifying the driver path and manually mounting the devices and CUDA volumes.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b9a25c0d5c8fb2e23244e5158ad596ba/href">https://medium.com/media/b9a25c0d5c8fb2e23244e5158ad596ba/href</a></iframe><p>Note that this still uses nvidia-docker’s driver volume for discovery. While Kubernetes cannot call the plugin directly, we can use the filesystem.</p><h3>Enabling GPU devices</h3><p>With the knowledge of what Docker needs to run a GPU-enabled container, it is straightforward to add this to Kubernetes. The first step is to enable an experimental flag on all of the GPU nodes. In the Kubelet options (found in /etc/default/kubelet if you use upstart for services), add --experimental-nvidia-gpus=1. This does two things. First, it makes GPU resources on the node available to the scheduler. 
Second, when a GPU resource is requested, it will add the appropriate device flags to the docker command. This post describes in a little more detail what this flag does and why it exists:</p><p><a href="http://blog.clarifai.com/how-to-scale-your-gpu-cloud-infrastructure-with-kubernetes">http://blog.clarifai.com/how-to-scale-your-gpu-cloud-infrastructure-with-kubernetes</a></p><p>The full GPU proposal, including the existing flag and future steps, can be found here:</p><p><a href="https://github.com/kubernetes/community/blob/master/contributors/design-proposals/gpu-support.md">https://github.com/kubernetes/community/blob/master/contributors/design-proposals/gpu-support.md</a></p><h3>Pod Spec</h3><p>With the device flags handled by the experimental GPU flag, the final step is to add the necessary volumes to the pod spec. A sample pod spec is provided below:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/8b5a3131123caae80e6d75acb7f863f4/href">https://medium.com/media/8b5a3131123caae80e6d75acb7f863f4/href</a></iframe><p>If set up correctly, the output should match the output from running the nvidia-docker container at the beginning:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/74ea3bbe84e4f164815b9e55c650e789/href">https://medium.com/media/74ea3bbe84e4f164815b9e55c650e789/href</a></iframe><h3>Conclusion</h3><p>Hopefully this guide helps someone wade through these undocumented features to make use of GPUs in their cluster.</p><p>Follow me on <a href="https://twitter.com/jimmfleming">Twitter</a> for more posts like these. 
We also do <a href="https://fomoro.com">applied research</a> to solve machine learning challenges.</p><hr><p><a href="https://medium.com/jim-fleming/running-tensorflow-on-kubernetes-ca00d0e67539">Running TensorFlow (with GPU) on Kubernetes</a> was originally published in <a href="https://medium.com/jim-fleming">Jim Fleming</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Notes on Hierarchical Multiscale Recurrent Neural Networks]]></title>
            <link>https://medium.com/jim-fleming/notes-on-hierarchical-multiscale-recurrent-neural-networks-7362532f3b64?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/7362532f3b64</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Fri, 24 Mar 2017 22:49:53 GMT</pubDate>
            <atom:updated>2019-01-09T19:10:44.386Z</atom:updated>
            <content:encoded><![CDATA[<h4>Introduces a novel update mechanism to learn latent hierarchical representations from data.</h4><h4>Introduction</h4><p>State-of-the-art on <a href="https://www.cis.upenn.edu/~treebank/">PTB</a>, <a href="http://www.mattmahoney.net/dc/textdata">Text8</a> and <a href="http://www.fki.inf.unibe.ch/databases/iam-on-line-handwriting-database">IAM On-Line Handwriting DB</a>. Tied for SotA on <a href="http://prize.hutter1.net/">Hutter Wikipedia</a>.</p><p>Lots of prior work with hierarchy (<a href="https://papers.nips.cc/paper/1102-hierarchical-recurrent-neural-networks-for-long-term-dependencies.pdf">hierarchical RNN</a> / <a href="http://www.cs.toronto.edu/~graves/ijcai_2007.pdf">stacked RNN</a>) and multi-scale (<a href="http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf">LSTM</a>, <a href="https://arxiv.org/abs/1402.3511">clockwork RNN</a>) but they all rely on pre-defined boundaries, pre-defined scales, or soft non-hierarchical boundaries.</p><p>Two benefits of discrete hierarchical representations:</p><ul><li>Mitigates vanishing gradients since information is held at higher levels for more steps.</li><li>More computationally efficient in the discrete case since higher layers update less frequently.</li></ul><h4>Model</h4><p>Uses parameterized binary boundary detectors at each layer. Avoids “soft” gating, which leads to the “curse of updating every timestep”.</p><p>Boundary detectors determine operations for modifying RNN state: <strong><em>COPY</em></strong>, <strong><em>FLUSH</em></strong>, <strong><em>UPDATE</em></strong>:</p><ul><li><strong><em>UPDATE</em></strong>: similar to LSTM but sparse, according to boundary detector.</li><li><strong><em>COPY</em></strong>: copies cell and hidden states from the previous timestep to the current timestep. 
Similar to <a href="http://arxiv.org/abs/1606.01305">Zoneout</a> (recurrent generalization of stochastic depth) which uses a Bernoulli distribution to copy hidden state across timesteps.</li><li><strong><em>FLUSH</em></strong>: sends summary to next layer and re-initializes current layer’s state.</li></ul><p>Discrete (binary) decisions are difficult to optimize due to non-smooth gradients. Uses <a href="https://arxiv.org/abs/1308.3432">straight-through estimator</a> (as an alternative to <a href="http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf">REINFORCE</a>) to learn discrete variables. The simplest variant uses a step function on the forward pass and a hard sigmoid on the backward pass for gradient estimation.</p><p>The slope annealing trick on the hard sigmoid compensates for the biased estimator but yields minimal improvement in the experimental results. Also introduces more hyperparameters.</p><p>Implemented as a variant of LSTM (HM-LSTM) with custom operations above. No experimental results for variant with regular RNN (HM-RNN).</p><h4>Results</h4><p>Learns useful boundary detectors, visualized in the paper.</p><p>Latent representations possibly imperfect, or at least, not human: spaces, tree breaks, some bigrams, some prefix delineation (“dur”: during, duration, durable).</p><p>Only results on character-level compression tasks and handwriting, no explicit NLP tasks, e.g. machine translation, question-answering, or named entity recognition.</p><h4>Conclusion</h4><p>Thanks to <a href="http://www.meetup.com/deep-learning-sf/events/234124741/">those who attended</a> the reading group session for their discussion of this paper! Lots of good insights from everyone.</p><p>Follow me <a href="https://twitter.com/jimmfleming">on Twitter</a> for more posts like these. 
We also do <a href="https://fomoro.com">applied research</a> to solve machine learning challenges.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Notes on the Numerai ML Competition]]></title>
            <link>https://medium.com/jim-fleming/notes-on-the-numerai-ml-competition-14e3d42c19f3?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/14e3d42c19f3</guid>
            <category><![CDATA[data-science]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Mon, 19 Sep 2016 16:24:44 GMT</pubDate>
            <atom:updated>2019-01-09T19:10:40.834Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eLKP8uRAF44zEWzEL8t4fg.jpeg" /><figcaption>Photo courtesy <a href="https://unsplash.com/">Unsplash</a></figcaption></figure><p>Last week I spent some time diving into the Numerai machine learning competition. Below are my notes on the competition: things I tried, what worked and what didn’t. First, an introduction to Numerai and the competition…</p><p>Numerai is a hedge fund which uses the competition to source predictions for a large ensemble that they use internally to make trades. Another detail that makes the competition unique is that the provided data has been <a href="https://en.wikipedia.org/wiki/Homomorphic_encryption">encrypted</a> in a way that still allows it to be used for predictions. Each week, Numerai releases a new dataset and the competition resets. After briefly controlling 1st-2nd place in both score and originality, by the end of the week I was still “controlling capital” with a log loss of 0.68714. In all, this earned about $8.17 USD worth of Bitcoin.</p><p>Here’s a sample of the training data:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6dae8db6b89a14004443bfb44201e62d/href">https://medium.com/media/6dae8db6b89a14004443bfb44201e62d/href</a></iframe><h3>Validation</h3><p>My first step in the competition was to generate a validation set so that I could run models locally and get a sense of how the models would do on the leaderboard. Using a simple stratified split that maintains the target distribution turned out not to be representative of the leaderboard, so I turned to “adversarial validation”. This clever idea was introduced by <a href="https://twitter.com/fastml">@fastml</a> in a blog post <a href="http://fastml.com/adversarial-validation-part-one/">here</a>. 
Basically:</p><ol><li>Train a classifier to identify whether data comes from the train or test set.</li><li>Sort the training data by its probability of being in the test set.</li><li>Select the training data most similar to the test data as your validation set.</li></ol><p>This was much more representative, with the validation loss agreeing with the public leaderboard to within ~0.001 log loss. Interestingly, the only reason this works is that the test data is dissimilar from much of the training data, which violates the <a href="https://en.wikipedia.org/wiki/Independent_and_identically_distributed_random_variables">IID</a> assumption.</p><h3>Baseline Model</h3><p>Now that I had a good validation set I wanted to get a baseline model trained, validated and uploaded. As a starting point I used logistic regression with default settings and no feature engineering. This gets about 0.69290 validation loss and 0.69162 on the public leaderboard. It’s not great but now I know what a simple model can do. For comparison, first place is currently 0.64669, so the baseline is only about 6.5% off. This means any improvements are going to be really small. We can push this a little further with L2 regularization at 1e-2 which gets to 0.69286 (-0.006% from baseline).</p><h3>Neural Networks</h3><p>I took a quick detour into neural networks before beginning feature engineering. Ideally, the networks would learn their own features with enough data. Unfortunately, none of the architectures I tried showed much improvement over simple logistic regression. Additionally, deep neural networks can have far more learned parameters than logistic regression so I needed to regularize the parameters heavily with L2 and <a href="https://arxiv.org/abs/1502.03167">batch normalization</a> (which can act as a regularizer per the paper). 
Dropout sometimes helped too depending on the architecture.</p><p>One interesting architecture that worked okay was using a single very wide hidden layer (2048 parameters) with very high dropout (0.9) and then leaving its initialized parameters fixed during training. This creates an ensemble of many random discriminators. While this worked pretty well (with a log loss around 0.689) the model hurt the final ensemble so it was removed. In the end neural networks did not yield enough improvement to continue their use here and would still rely on feature engineering, which defeated the purpose of trying them.</p><h3>Data Analysis &amp; Feature Engineering</h3><p>Now I need to dig into the data, starting with a simple plot of each of the feature distributions:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/709/1*wb7H05Ajdt9RoKauoe012A@2x.png" /><figcaption>Violin plot of the distributions for each feature.</figcaption></figure><p>The distributions are pretty similar for each feature and target. How about correlations between features:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/691/1*U5q21kgjhCTt3vrjtMSWPQ@2x.png" /><figcaption>Correlation matrix showing feature interactions.</figcaption></figure><p>Okay, so many of the features are strongly correlated. We can make use of this in our model by including polynomial features (e.g. <em>PolynomialFeatures(degree=2)</em> from scikit-learn). Adding these brings our validation loss down to 0.69256 (-0.05% from baseline).</p><p>Now dimensionality reduction. 
I take the features and run <a href="https://en.wikipedia.org/wiki/Principal_component_analysis">principal component analysis</a> (a linear method) to reduce the original features down to two dimensions for visualization:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/708/1*rznNVMcz1OyCy5KeBtw76w@2x.png" /><figcaption>PCA dimensionality reduction over original features.</figcaption></figure><p>This does not contain much useful information. How about with the polynomial features:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/698/1*DJTfZ763--x_kyyXW52V2w@2x.png" /><figcaption>PCA dimensionality reduction over polynomial features.</figcaption></figure><p>The polynomial PCA produces a slightly better result by pulling many of the target “1” values towards the edges and many of the target “0” values towards the center. Still not great so I opted to omit PCA for now.</p><p>Instead I’ll use a fancier dimensionality reduction method called <a href="https://lvdmaaten.github.io/tsne/">t-SNE</a> or “t-Distributed Stochastic Neighbor Embedding”. t-SNE is often used for visualization of high-dimensional data but it has a useful property not found in PCA: t-SNE is non-linear and works on the probability of two points being selected as neighbors.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/708/1*sAa81iio2bkFgRKIIhJNLA@2x.png" /><figcaption><a href="https://lvdmaaten.github.io/tsne/">t-SNE</a> embedding over the features; clusters colored using <a href="http://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html">DBSCAN</a>.</figcaption></figure><p>Here t-SNE captured really good features for visualization (e.g. local clusters), and incidentally for classification too! I add in these 2D features to the model to get the best validation loss so far: 0.68947 (-0.5% from baseline). 
I suspect the reason this helps is that there are actually many local features that logistic regression cannot pull out but that are useful in classifying for the target. By running an unsupervised method specifically designed to align the data by pairwise similarities the model is able to use that information.</p><p>Since t-SNE is stochastic, multiple runs will produce different embeddings. To exploit this I’ll run t-SNE 5 or 6 times at different perplexities and dimensions (2D and 3D) then incorporate these extra features. Now the validation loss is 0.68839 (-0.65% from baseline).</p><p>Note, some implementations of t-SNE do not work correctly in 3D. Plot them to make sure you’re seeing a blob, not a pyramid shape.</p><h4>Additional Embeddings</h4><p>Since t-SNE worked so well, I implemented several other embedding methods including autoencoders, denoising autoencoders, and generative adversarial networks. The autoencoders learned excellent reconstructions with &gt;95% accuracy, even with noise, but their learned embeddings did not improve the model. The GAN, including a semi-supervised variant, did not outperform logistic regression. I also briefly experimented with <a href="https://en.wikipedia.org/wiki/Kernel_principal_component_analysis">kernel PCA</a> and <a href="https://en.wikipedia.org/wiki/Isomap">Isomap</a> (also non-linear dimensionality reduction methods). Both improved the validation loss slightly but took significantly longer to run, reducing my ability to iterate quickly, so they were ultimately discarded. I never tried LargeVis or parametric t-SNE but they might be worth exploring. 
Parametric t-SNE would be particularly interesting since it allows fitting on a test holdout, rather than learning an embedding of all of the samples at once.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/698/1*sfIwEb_GaAijXOZUMWtZmw@2x.png" /><figcaption>Isomap embedding of the original features.</figcaption></figure><h3>Pairwise Interactions</h3><p>One of the models that made it into the final ensemble explicitly modeled pairwise interactions. Basically, given features from two samples, predict which of the two has a greater probability of being classified as “1”. This provides significantly more data since you’re modeling interactions between samples, rather than individual samples. It also hopefully learns useful features for classifying by the intended target. To make predictions for the target classification I take the average of each sample’s prediction against all other samples. (It’s probably worth exploring more sophisticated averaging techniques.) This performed similarly to logistic regression and produced different enough results to add to the ensemble.</p><h3>Hyperparameter Search</h3><p>Now that I had useful features and a few models that performed well, I wanted to run a hyperparameter search and see if it could outperform the existing models. Since scikit-learn’s <em>GridSearchCV</em> and <em>RandomizedSearchCV</em> only explore hyperparameters, not entire architectures, I opted to use <a href="https://github.com/rhiever/tpot">tpot</a> which searches over both. This discovered that using randomized PCA would outperform PCA and that L1 regularization (sparsity) slightly outperformed L2 regularization (smoothing), especially when paired with randomized PCA. Unfortunately neither of the discovered interactions made it into the final ensemble: hand engineering won out.</p><h3>Ensemble</h3><p>With a few models complete it’s time to ensemble their predictions. 
There are a number of methods for doing this covered <a href="http://mlwave.com/kaggle-ensembling-guide/">here</a> but I opted for a simple average using the <a href="https://en.wikipedia.org/wiki/Geometric_mean">geometric mean</a>.</p><p>The final ensemble consisted of 4 models: logistic regression, gradient boosted trees, factorization machines and the pairwise model described above. I used the same features for each model, consisting of the original 21 features, five runs of t-SNE in 2D at perplexities of 5.0, 10.0, 15.0, 30.0, and 50.0, and one run of t-SNE in 3D at a perplexity of 30 (I only included a single run because it takes significantly longer in 3D). These features were combined with polynomial interactions and run through the models to produce the final log loss of 0.68714 on the leaderboard.</p><h3>Conclusion</h3><p>Overall it was an interesting competition, very different from something like Kaggle. I especially enjoyed experimenting with the encrypted data which was a first for me. While the payouts and “originality” bonuses are interesting mechanics, it’s often better to look at the rewards as points, more than currency, as this made the competition overall more fun. On the other hand, now I have my first bitcoin… :)</p><blockquote><strong>Code:</strong> <a href="https://github.com/jimfleming/numerai">https://github.com/jimfleming/numerai</a></blockquote><p>Follow me <a href="https://twitter.com/jimmfleming">on Twitter</a> for more posts like these. 
We also do <a href="https://fomoro.com">applied research</a> to solve machine learning challenges.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Before AlphaGo there was TD-Gammon]]></title>
            <link>https://medium.com/jim-fleming/before-alphago-there-was-td-gammon-13deff866197?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/13deff866197</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[neural-networks]]></category>
            <category><![CDATA[deep-learning]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Mon, 04 Apr 2016 15:35:05 GMT</pubDate>
            <atom:updated>2019-01-09T19:10:37.556Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/512/1*Iecxhb2D7v99kWS9KGjaHQ@2x.jpeg" /><figcaption>Théodore Rombouts, The Backgammon Players</figcaption></figure><p><strong>TL;DR</strong> Introduces temporal difference learning, TD-Lambda / TD-Gammon, and eligibility traces. Check out the <a href="https://github.com/fomorians/td-gammon">GitHub repo</a> for an implementation of TD-Gammon with TensorFlow.</p><p>A few weeks ago <a href="https://en.wikipedia.org/wiki/AlphaGo#Match_against_Lee_Sedol">AlphaGo won</a> a historic tournament playing the <a href="https://en.wikipedia.org/wiki/Go_%28game%29">game of Go</a> against Lee Sedol, one of the top Go players in the world. Many people have compared AlphaGo to <a href="https://en.wikipedia.org/wiki/Deep_Blue_versus_Garry_Kasparov">Deep Blue</a>, which won a series of famous chess matches against Garry Kasparov, but a different comparison may be made for the <a href="https://en.wikipedia.org/wiki/Backgammon">game of backgammon</a>.</p><p>Before DeepMind tackled playing <a href="https://deepmind.com/dqn.html">Atari</a> games or built <a href="https://deepmind.com/alpha-go.html">AlphaGo</a> there was <a href="http://scholar.googleusercontent.com/scholar?q=cache:JKitnBbxekUJ:scholar.google.com/+td-gammon&amp;hl=en&amp;as_sdt=0,5">TD-Gammon</a>, the first algorithm to reach an expert level of play in backgammon. Gerald Tesauro published his paper in 1992 describing TD-Gammon as a neural network trained with reinforcement learning. 
It is referenced in both Atari and AlphaGo research papers and helped set the groundwork for many of the advancements made in the last few years.</p><h4>Temporal-Difference Learning</h4><p>TD-Gammon consists of a simple three-layer neural network trained using a reinforcement learning technique known as <a href="https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node74.html">TD-Lambda</a> or <a href="https://webdocs.cs.ualberta.ca/~sutton/book/ebook/node60.html">temporal-difference learning</a> with a trace decay parameter lambda (<strong><em>λ</em></strong>). The neural network acts as a “value function” which predicts the value, or <em>reward</em>, of a particular state of the game for the current player.</p><p>During training, the neural network evaluates each valid move for the current player, and the move with the highest value is selected. Because the network evaluates moves for both players, it’s effectively playing against itself. Using TD-Lambda we want to improve the neural network so that it can reasonably predict the most likely outcome of a game from a given board state. It does this by learning to reduce the difference between the value for the next state and the current state.</p><p>Let’s start with a loss function, which describes how well the network is performing for any state at time <strong><em>t</em></strong>:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/481/1*TocSaBh93EXb5h8VTlLitA@2x.png" /><figcaption>Loss function: mean squared error of the difference between our neural network’s output for the next state and the output for the current state. The variable <strong><em>α</em></strong> is a small scalar to control the learning rate.</figcaption></figure><p>Here we want to minimize the mean squared error of the difference between the next prediction and the current prediction. Basically, we want our predictions about the present to match our predictions about the future. 
This in itself isn’t very useful until we know how the game ends, so for the final step of the game we modify the loss function:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/355/1*1VuNNmi7bmQV8PnLda9ocQ@2x.png" /><figcaption>Same as above, but <strong><em>z</em></strong> represents the true outcome of the game.</figcaption></figure><p>Where <strong><em>z</em></strong> is the actual outcome of the game. Together these two loss functions work <em>okay</em> but the network will converge slowly and never reach a strong level of play.</p><h4>Temporal Credit Assignment</h4><p>To make our predictions more useful we need to solve the problem of temporal credit assignment. Basically, which actions did the player take in the past that resulted in the desired outcome in the future? Right now the loss only incorporates two consecutive steps and we want to stretch that out.</p><p>With the loss function above our parameter updates will look something like this:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/343/1*XISfhAm294yZvTe5VLZnaQ@2x.png" /><figcaption>Parameter updates for the loss function <strong><em>L</em></strong>.</figcaption></figure><p>Where <strong><em>θ</em></strong> are the network’s parameters (weights), <strong><em>α</em></strong> is the learning rate and <strong><em>δ</em></strong> is the difference we defined above:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/610/1*D5fCz19fsfEaJFt2jTgJRg@2x.png" /><figcaption>Definition of <strong><em>δ</em></strong> for intermediate and end-game states where <strong>f</strong> is the final time-step of the game.</figcaption></figure><p>Now rather than include a single gradient we want to include all past gradients while paying more attention to the most recent. 
This is accomplished by keeping a history of gradients, then decaying each by increasing powers of <strong><em>λ</em></strong> that reflect how old the gradient has become:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/506/1*wLAY5oDslR4ZQYQbxJJBOQ@2x.png" /><figcaption>The full definition for TD-Lambda includes a sum over all previous gradients, decayed by <strong><em>λ</em></strong>.</figcaption></figure><h4>Eligibility Traces</h4><p>Keeping a running history of gradients can become memory-intensive depending on the size of the network and the length of the game. An elegant solution to this problem is to use something called an “eligibility trace”. Eligibility traces replace the gradient sum of the parameter update with a single moving gradient. The eligibility trace is defined as:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/308/1*lejMKiPTHrU7lK1u-LrdfA@2x.png" /><figcaption>Definition of an eligibility trace decayed by <strong><em>λ</em></strong>.</figcaption></figure><p>Basically, we decay our eligibility trace by <strong><em>λ</em></strong> then add the new gradient. With this, our parameter update becomes:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/305/1*K_rZi2WOF5DFkBouEbzpmg@2x.png" /><figcaption>New parameter update for TD-Lambda, using an eligibility trace in place of the gradient.</figcaption></figure><p>This effectively allows our parameter updates to take into account decisions made in the past. Now when we backpropagate the end game state, we take into account the gradients from earlier states in the game while we avoid keeping a complete history of gradients.</p><h4>Results</h4><p>At the start of training, each game can take hundreds or thousands of turns to complete, since the network is effectively following a random strategy. 
As the network learns, games require only around 50–100 turns and will outperform an opponent making random moves after around 1000 games (about an hour of training).</p><p>The average loss for a game can never really reach zero because there’s more uncertainty at the beginning of a game but it can be useful to visualize convergence:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/868/1*Cx1AZa_Vbu_EefKlkarVdQ@2x.png" /><figcaption>Average loss for each of 5,000 games.</figcaption></figure><h4>Conclusion</h4><p>Hopefully, this post shed some light on a small part of the history of recent deep reinforcement learning papers and the temporal-difference learning algorithm. If you’re interested in learning more about reinforcement learning definitely check out <a href="https://webdocs.cs.ualberta.ca/~sutton/book/ebook/">Richard Sutton’s book</a> on the topic. You can also <a href="https://github.com/fomorians/td-gammon">download the code</a> for this implementation of TD-Gammon and play against the pre-trained network included in the repo.</p><p>Follow me <a href="https://twitter.com/jimmfleming">on Twitter</a> for more posts like these. We also do <a href="https://fomoro.com">applied research</a> to solve machine learning challenges.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=13deff866197" width="1" height="1" alt=""><hr><p><a href="https://medium.com/jim-fleming/before-alphago-there-was-td-gammon-13deff866197">Before AlphaGo there was TD-Gammon</a> was originally published in <a href="https://medium.com/jim-fleming">Jim Fleming</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[An LSTM Odyssey]]></title>
            <link>https://medium.com/jim-fleming/implementing-lstm-a-search-space-odyssey-7d50c3bacf93?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/7d50c3bacf93</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[tensorflow]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Tue, 26 Jan 2016 16:31:19 GMT</pubDate>
            <atom:updated>2019-05-23T21:30:36.098Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MUVK7IiQ3E_FAZlkTJxB1g@2x.jpeg" /><figcaption>Photo from <a href="http://unsplash.com/">Unsplash</a></figcaption></figure><p>This week I read <a href="http://arxiv.org/abs/1503.04069">LSTM: A Search Space Odyssey</a>. It’s an excellent paper that systematically evaluates the different internal mechanisms of an <a href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">LSTM</a> (long short-term memory) block by disabling each mechanism in turn and comparing the resulting performance. We’re going to implement each of the variants in <a href="http://tensorflow.org/">TensorFlow</a> and evaluate their performance on the <a href="https://www.cis.upenn.edu/~treebank/">Penn Treebank</a> (PTB) dataset. This will obviously not be as thorough as the original paper but it allows us to see, and try out, the impact of each variant for ourselves.</p><p><strong>TL;DR</strong> Check out the <a href="https://github.com/fomorians/lstm-odyssey">GitHub repo</a> for results and variant definitions.</p><h4>Vanilla LSTM</h4><p>We’ll start with a setup similar to TensorFlow’s <a href="https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html">RNN tutorial</a>. The primary difference is that we’re going to use a very simple re-implementation for the LSTM cell defined as follows:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/416/1*2laRoPsYEvMFTBJ0TKlSfg@2x.png" /><figcaption>LSTM equations from section 2.</figcaption></figure><p>This corresponds to the “vanilla” LSTM from the paper. Each equation defines a particular component of the block: block input (<strong>z</strong>), input gate (<strong>i</strong>), forget gate (<strong>f</strong>), cell state (<strong>c</strong>), output gate (<strong>o</strong>) and block output (<strong>y</strong>). 
Both <em>g</em> and <em>h</em> represent the hyperbolic tangent function and sigma represents the sigmoid activation function. The circle dot represents element-wise multiplication.</p><p>Here’s the same thing in code:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/4ab196d3a391dbb882f9fce7e3507248/href">https://medium.com/media/4ab196d3a391dbb882f9fce7e3507248/href</a></iframe><p>Be sure to check out the <a href="https://github.com/jimfleming/lstm_search/blob/master/variants/vanilla.py#L4">full source</a> for the rest of the cell definition. Mostly we create a new class inheriting from <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell.py#L38"><strong><em>RNNCell</em></strong></a> and use the above code as the body of <strong><em>__call__</em></strong>. The nice part about this setup is that we can utilize <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell.py#L595"><strong><em>MultiRNNCell</em></strong></a> to stack the LSTMs into multiple layers.</p><p>Notice that we initialize all of our parameters using <a href="https://www.tensorflow.org/versions/master/api_docs/python/state_ops.html#get_variable"><strong><em>get_variable</em></strong></a>. This is necessary so that we can reuse these variables for each time step rather than creating new parameters at each step. Also, all parameters are transposed from the paper’s definitions to avoid additional graph operations.</p><p>Then we define each equation as operations in the graph. Many of the operations have reversed inputs from the equations so that the matrix multiplications produce the correct dimensionality. Other than these details we’re directly translating the equations.</p><p>Note that from a performance perspective, this is a naïve implementation. 
If you look at the source for TensorFlow’s <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell.py#L219">LSTMCell</a> you’ll see that all of the cell inputs and states are concatenated together before doing any matrix multiplication. This is to improve performance. However, since we’re more interested in taking the LSTM apart, we’ll keep things simple.</p><p>Running this vanilla LSTM in the included notebook, we obtain a test perplexity (<strong>e^cost</strong>) of less than 100. So far so good. This will serve as our baseline to compare to the other variants. Below is the cost (average negative log probability of the target words) on the validation set after each epoch:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/489/1*f5J_8yT5FMKstZsykjY0qA@2x.png" /><figcaption>Vanilla cost on the validation set</figcaption></figure><h4>Variants</h4><p>The most helpful bits for implementing each of the variants can be found in appendix A3 of the paper. 
The gate omission variants such as <strong>no input gate (NIG)</strong>, <strong>no forget gate (NFG)</strong>, and <strong>no output gate (NOG) </strong>simply set their respective gates to <strong><em>1</em></strong> (be sure to use floats, not integers, here):</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/59/1*nke_UOhqBs2pN3lXkTipIw@2x.png" /><figcaption><strong>NIG</strong> sets <strong>i</strong> to <strong>1</strong>, <strong>NFG</strong> sets <strong>f</strong> to <strong>1</strong> and <strong>NOG</strong> sets <strong>o</strong> to <strong>1</strong>.</figcaption></figure><p>The <strong>no input activation function (NIAF) </strong>and <strong>no output activation function (NOAF) </strong>variants remove their input or output activation functions, respectively:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/261/1*i0_pFMGoMiRup3qUH4CTIw@2x.png" /><figcaption><strong>NIAF</strong> removes the <strong><em>g(x)</em></strong><em> activation function, while </em><strong><em>NOAF</em></strong><em> removes the </em><strong><em>h(x)</em></strong><em> activation function.</em></figcaption></figure><p>The <strong>no peepholes (NP) </strong>variant removes peepholes from all three gates:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/295/1*hIygvvg-dsJaTexVgXk61Q@2x.png" /><figcaption>For all three gates remove the peepholes.</figcaption></figure><p>The <strong>coupled input-forget gate (CIFG) </strong>variant sets the forget gate like so:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/96/1*8uhjNPQh1xp08WABjp2bvA@2x.png" /></figure><p>The final variant, <strong>full gate recurrence (FGR)</strong>, is the most complex, essentially allowing each gate’s previous state to interact with each gate’s next state:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/728/1*srLgjNbYD_rRdSTkmakgUA@2x.png" /><figcaption>Recurrent connections are added for each of the gates.</figcaption></figure><p>In 
many of the variants, we can remove parameters no longer needed to compute the cell. The <strong>FGR</strong> variant, however, adds significantly more parameters (9 additional square matrices) which also increases training time.</p><p>To implement each, we’ll simply duplicate our vanilla LSTM cell implementation and make the necessary modifications for the variant. There are too many to show here but you can view the full source for each variant on <a href="https://github.com/jimfleming/lstm_search/tree/master/variants">GitHub</a>. To train each, we’ll use the same hyperparameters from the vanilla LSTM trial. This probably isn’t fair and a more thorough analysis (as performed in the paper) would try to find the best hyperparameters for each variant.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/512/1*_Y7dgUK81plW_MhcFeyC5g@2x.jpeg" /><figcaption>Training progress of model variants.</figcaption></figure><h4>Results</h4><p>The <strong>NFG</strong> and <strong>NOG</strong> variants fail to converge to anything useful while the <strong>NIAF</strong> variant diverges significantly after around the 8th epoch. (This divergence could probably be fixed with learning rate decay, which I omitted for simplicity.)</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/474/1*uN8-BcWT4HYv9LgI3xQZ6w@2x.png" /><figcaption>Diverging variants</figcaption></figure><p>In contrast, the <strong>NIG</strong>, <strong>CIFG</strong>, <strong>NP</strong> and <strong>FGR </strong>variants all converge. The <strong>NIG</strong> and <strong>FGR</strong> variants do not produce great results while the <strong>NP</strong> and <strong>CIFG</strong> variants perform similarly to the vanilla LSTM.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/477/1*OukAVY37D7-Suk77jHCOoA@2x.png" /><figcaption>Converging variants</figcaption></figure><p>Finally the <strong>NOAF</strong> variant. 
Its poor performance is likely due to the lack of clamping from the output activation function so its cost explodes:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/497/1*grdseU-_OxiNxWwmqX0AWA@2x.png" /></figure><p>Here are the test perplexities for each variant:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/492/1*FigArJdlO6B01lmscGo4RQ@2x.png" /></figure><h4>Conclusion</h4><p>Overall it’s been fun dissecting the LSTM. Feel free to <a href="https://github.com/fomorians/lstm-odyssey">try out the code</a> yourself and if you’re interested in taking this further I recommend running comparisons with GRUs, looking at <a href="http://fanova.readthedocs.org/en/latest/">fANOVA</a> or extending what’s here with more thorough analysis.</p><p>Follow me <a href="https://twitter.com/jimmfleming">on Twitter</a> for more posts like these. We also do <a href="https://fomoro.com">applied research</a> to solve machine learning challenges.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7d50c3bacf93" width="1" height="1" alt=""><hr><p><a href="https://medium.com/jim-fleming/implementing-lstm-a-search-space-odyssey-7d50c3bacf93">An LSTM Odyssey</a> was originally published in <a href="https://medium.com/jim-fleming">Jim Fleming</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Highway Networks with TensorFlow]]></title>
            <link>https://medium.com/jim-fleming/highway-networks-with-tensorflow-1e6dfa667daa?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/1e6dfa667daa</guid>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[tensorflow]]></category>
            <category><![CDATA[machine-learning]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Tue, 29 Dec 2015 17:58:31 GMT</pubDate>
            <atom:updated>2019-01-09T19:10:29.364Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/640/1*Z1DlaJC54ZrgIIyNijtjNQ@2x.jpeg" /></figure><p>This week I implemented <a href="http://arxiv.org/abs/1507.06228">highway networks</a> to get an intuition for how they work. Highway networks, inspired by <a href="https://en.wikipedia.org/wiki/Long_short-term_memory">LSTMs</a>, are a method of constructing networks with hundreds, even thousands, of layers. Let’s see how we construct them using TensorFlow.</p><p><strong>TL;DR </strong><a href="https://github.com/fomorians/highway-fcn">Fully-connected highway repo</a> and <a href="https://github.com/fomorians/highway-cnn">convolutional highway repo</a>.</p><h4>Implementation</h4><p>For comparison, let’s start with a standard fully-connected (or “dense”) layer. We need a weight matrix and a bias vector then we’ll compute the following for the layer output:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/217/1*ubW7KAjiyQN9MQKDeBn9og.png" /><figcaption>Computing the output of a dense layer. (Bias omitted for simplicity and to match the paper.)</figcaption></figure><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6a7f893e3428ecdeae5ec8558f617d16/href">https://medium.com/media/6a7f893e3428ecdeae5ec8558f617d16/href</a></iframe><p>Here’s what a dense layer looks like as a graph in TensorBoard:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*UUvCzRKpo0_V1bHuF6T2pQ.jpeg" /><figcaption>A dense layer in TensorBoard.</figcaption></figure><p>For the highway layer what we want are two “gates” that control the flow of information. The “transform” gate controls how much of the activation we pass through and the “carry” gate controls how much of the unmodified input we pass through. 
Otherwise, the layer largely resembles a dense layer with a few additions:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/622/1*R2Efhfe3zChnRcavAspQbA.png" /><figcaption>Computing the highway layer output. (Bias omitted for simplicity and to match the paper.)</figcaption></figure><ul><li>An extra set of weights and biases to be learned for the gates.</li><li>The transform gate operation (<strong><em>T</em></strong>).</li><li>The carry gate operation (<strong><em>C</em></strong> or just <strong><em>1 - T</em></strong>).</li><li>The layer output (<strong><em>y</em></strong>) with the new gates.</li></ul><p>What happens is that when the transform gate is <strong><em>1</em></strong>, we pass through our activation (<strong><em>H</em></strong>) and suppress the carry gate (since it will be <strong><em>0</em></strong>). When the carry gate is <strong><em>1</em></strong>, we pass through the unmodified input (<strong><em>x</em></strong>), while the activation is suppressed.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/69b56f2c29f3742ffecb589ca6e81cfd/href">https://medium.com/media/69b56f2c29f3742ffecb589ca6e81cfd/href</a></iframe><p>Here’s what the highway layer graph looks like in TensorBoard:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*_eLXwz5E-LCXTkyPtxm8GQ.jpeg" /><figcaption>A highway layer in TensorBoard.</figcaption></figure><p>Using a highway layer in a network is also straightforward. One detail to keep in mind is that consecutive highway layers must be the same size but you can use fully-connected layers to change dimensionality. This becomes especially complicated in convolutional layers where each layer can change the output dimensions. We can use padding (<strong><em>‘SAME’</em></strong>) to maintain each layer’s dimensionality.</p><p>Otherwise, by simply using hyperparameters from the TensorFlow docs (i.e. 
no hyperparameter search), the fully-connected highway network performed much better than a fully-connected network. Using MNIST as my simple trial:</p><ul><li>20 fully-connected layers fail to achieve more than 15% accuracy.</li><li>18 highway layers (with two fully-connected layers to transform the input and output) achieve ~95% accuracy, which is also much better than a shallow network, which only reaches 91%.</li></ul><p>Now that we have a highway network, I wanted to answer a few questions that came up for me while reading the paper. For instance, how deep can the network be and still converge? The paper briefly mentions 1000 layers:</p><blockquote>In pilot experiments, SGD did not stall for networks with more than 1000 layers. (2.2)</blockquote><p><strong>Can we train with 1000 layers on MNIST?</strong></p><p>Yes, also reaching around 95% accuracy. Try it out with a carry bias around <strong><em>-20.0</em></strong> for MNIST (from the paper the network will only utilize ~15 layers anyway). The network can probably even go deeper since it’s just learning to carry the last 980 layers or so. We can’t do much useful at or past 1000 layers so that seems sufficient for now.</p><p><strong>What happens if you set very low or very high carry biases?</strong></p><p>In either extreme the network simply fails to converge in a reasonable amount of time. In the case of low biases (more positive), the network starts as if the carry gates aren’t present at all. In the case of high biases (more negative), we’re putting more emphasis on carrying and the network can take a long time to overcome that. Otherwise, the biases don’t seem to need to be exact, at least on this simple example. When in doubt start with high biases (more negative) since it’s easier to learn to overcome carrying than to learn without carry gates (which is just a plain network).</p><h4>Conclusion</h4><p>Overall I was happy with how easy highway networks were to implement. 
They’re fully differentiable with only a single additional hyperparameter for the initial carry bias. One downside is that highway layers do require additional parameters for the transform weights and biases. However, since we can go deeper, the layers do not need to be as wide, which can compensate.</p><p>Here are the complete notebooks if you want to play with the code: <a href="https://github.com/fomorians/highway-fcn">fully-connected highway repo</a> and <a href="https://github.com/fomorians/highway-cnn">convolutional highway repo</a>.</p><p>Follow me <a href="https://twitter.com/jimmfleming">on Twitter</a> for more posts like these. We also do <a href="https://fomoro.com">applied research</a> to solve machine learning challenges.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=1e6dfa667daa" width="1" height="1" alt=""><hr><p><a href="https://medium.com/jim-fleming/highway-networks-with-tensorflow-1e6dfa667daa">Highway Networks with TensorFlow</a> was originally published in <a href="https://medium.com/jim-fleming">Jim Fleming</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Loading TensorFlow graphs from Node.js]]></title>
            <link>https://medium.com/jim-fleming/loading-tensorflow-graphs-via-host-languages-be10fd81876f?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/be10fd81876f</guid>
            <category><![CDATA[tensorflow]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Fri, 04 Dec 2015 18:04:36 GMT</pubDate>
            <atom:updated>2019-01-09T19:10:26.490Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ngvDss8Q0iIHREPzsjcIKQ@2x.png" /></figure><p><strong><em>Check out the related post: </em></strong><a href="https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f#.z4qeoyfb0"><strong><em>Loading a TensorFlow graph with the C++ API</em></strong></a><strong><em>.</em></strong></p><p>Even though the full C API for TensorFlow is not yet available, we can still use it to load TensorFlow graphs and evaluate them from other languages. This is incredibly useful for embedding pre-trained models in other applications. Embedding is one of the most interesting use cases for TensorFlow as it cannot be accomplished as easily with Theano.</p><p>Note that while all of the examples here will use Node.js, the steps are nearly identical in any language with C FFI support (e.g. Rust, Go, C#, etc.).</p><h4>Requirements</h4><ul><li><a href="http://bazel.io/docs/install.html">Install Bazel</a>: Google’s build tool used to compile things for TensorFlow.</li><li>Clone the <a href="https://github.com/tensorflow/tensorflow.git">TensorFlow repo</a>.</li></ul><pre>git clone --recursive <a href="https://github.com/tensorflow/tensorflow">https://github.com/tensorflow/tensorflow</a></pre><h4>Compiling a shared library</h4><p>We’ll start by compiling a shared library from TensorFlow using Bazel.</p><p><strong>UPDATE</strong>: The following build rule for creating a shared library is now part of TensorFlow: <a href="https://github.com/tensorflow/tensorflow/pull/695">https://github.com/tensorflow/tensorflow/pull/695</a></p><ol><li>Create a new folder in the TensorFlow repo at <strong><em>tensorflow/tensorflow/libtensorflow/</em></strong>.</li><li>Inside this folder we’re going to create a new <strong><em>BUILD</em></strong> file which will contain a single call to <a href="http://bazel.io/docs/be/c-cpp.html#cc_binary"><strong><em>cc_binary</em></strong></a> with the <a 
href="http://bazel.io/docs/be/c-cpp.html#cc_binary.linkshared"><strong><em>linkshared</em></strong></a> option set to <strong><em>1 </em></strong>so that we get a <strong><em>.so</em></strong> from the build. <em>The name of the binary must end in .so or it will not work.</em></li></ol><p>Here’s the final directory structure:</p><ul><li>tensorflow/tensorflow/libtensorflow/</li><li>tensorflow/tensorflow/libtensorflow/BUILD</li></ul><p>Below is the complete <strong><em>BUILD</em></strong> file:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1b4a46704d499d219f682b5dca1d164f/href">https://medium.com/media/1b4a46704d499d219f682b5dca1d164f/href</a></iframe><ol><li>From the root of the repository, run <strong><em>./configure</em></strong>.</li><li>Compile the shared library with <strong><em>bazel build :libtensorflow.so</em></strong> and locate the generated file from the repo’s root: <strong><em>bazel-bin/tensorflow/libtensorflow/libtensorflow.so</em></strong></li></ol><p>Now that we have our shared library, create a new folder for the host language. Since this is for Node.js I’ll name it <strong><em>tensorflowjs/</em></strong>. This folder can exist outside of the TensorFlow repo since we now have everything needed in the shared library. Copy <strong><em>libtensorflow.so</em></strong> into the new folder.</p><p><em>If you’re on OS X and using Node.js you’ll need to rename the shared library from </em><strong><em>libtensorflow.so</em></strong><em> to </em><strong><em>libtensorflow.dylib</em></strong><em>. TensorFlow produces an .so however the standard on OS X is dylib. 
The Node FFI library doesn’t look for .so, only .dylib; however it can read both formats, so we just rename it.</em></p><h4>Creating the graph</h4><p>Just like with the <a href="https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f#.fpum2o8kb">previous C++ tutorial</a> we’re going to create a minimal graph and write it to a protobuf file. (Be sure to name your variables and operations.)</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/256c36ac384aa51eebef5844af26b285/href">https://medium.com/media/256c36ac384aa51eebef5844af26b285/href</a></iframe><h4>Creating the bindings</h4><p>Now we can go through the <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/public/tensor_c_api.h">TensorFlow C API header</a>, almost line by line, and write the appropriate binding. Most of the time this is fairly direct, simply copying the signature of the function. I also created variables for many of the common types so they were more legible. For example, any structs which map to <strong><em>void*</em></strong> I declared as variables named after the struct. We can also use the <strong><em>ref-array</em></strong> Node module which provides helpers for types like <strong><em>long long*</em></strong> (essentially an array of <a href="https://en.wikipedia.org/wiki/C_data_types#Basic_types"><strong><em>long long</em></strong></a> types) so we’ll define a <strong><em>LongLongArray</em></strong> type to correspond. Otherwise, we just copy the signature:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b9f40ea7881902040d9b007e4bab1cac/href">https://medium.com/media/b9f40ea7881902040d9b007e4bab1cac/href</a></iframe><p>I also defined a few helper functions to eliminate some of the boilerplate when working with the TensorFlow interface. 
The first is <strong><em>TF_Destructor</em></strong>, a default tensor destructor for <strong><em>TF_NewTensor</em></strong>. This comment in the TensorFlow source makes it sound like it’s optional but it’s not:</p><blockquote><em>Clients can provide a custom deallocator function so they can pass in memory managed by something like numpy.</em></blockquote><p>Additionally, many TensorFlow functions return a <strong><em>TF_Status</em></strong> struct and checking the status can get tedious. So I defined a function called <strong><em>TF_CheckOK</em></strong> that simply checks if the status code is <strong><em>TF_OK</em></strong> using <strong><em>TF_GetCode</em></strong>. If it’s not, we throw an error using <strong><em>TF_Message</em></strong> to hopefully get a useful error message. (This function loosely corresponds to <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/public/status.h#L106"><strong><em>TF_CHECK_OK</em></strong></a> in the TensorFlow source.)</p><p>And finally, reading a tensor with <strong><em>TF_TensorData</em></strong> only returns a pointer, but to actually read the data we need to extend the returned <strong><em>Buffer</em></strong> to the appropriate length. Creating a <strong><em>Buffer</em></strong> with the correct size is a few lines of boilerplate so I wrapped <strong><em>TF_TensorData</em></strong> to create <strong><em>TF_ReadTensorData</em></strong> which handles that boilerplate for us. 
Here are the helpers:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0b1969f862d7e06d2d6954743cb07521/href">https://medium.com/media/0b1969f862d7e06d2d6954743cb07521/href</a></iframe><p>Now that we’ve defined our interface the steps for loading the graph are the same as with C++:</p><ol><li>Initialize a TensorFlow session.</li><li>Read in the graph we exported above.</li><li>Add the graph to the session.</li><li>Set up our inputs and outputs.</li><li>Run the graph, populating the outputs.</li><li>Read values from the outputs.</li><li>Close the session to release resources.</li></ol><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/3d0e0c01aa5564c2a835c3fdcd972a41/href">https://medium.com/media/3d0e0c01aa5564c2a835c3fdcd972a41/href</a></iframe><p>We can load and execute TensorFlow graphs from Node.js! I’ve put the whole thing together into a repo here (you’ll need to provide <strong><em>graph.pb</em></strong> and <strong><em>libtensorflow.dylib</em></strong> since they’re kinda large): <a href="https://github.com/jimfleming/tensorflowjs">https://github.com/jimfleming/tensorflowjs</a></p><p>Follow me <a href="https://twitter.com/jimmfleming">on Twitter</a> for more posts like these. We also do <a href="https://fomoro.com">applied research</a> to solve machine learning challenges.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=be10fd81876f" width="1" height="1" alt=""><hr><p><a href="https://medium.com/jim-fleming/loading-tensorflow-graphs-via-host-languages-be10fd81876f">Loading TensorFlow graphs from Node.js</a> was originally published in <a href="https://medium.com/jim-fleming">Jim Fleming</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Loading a TensorFlow graph with the C++ API]]></title>
            <link>https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/4caaff88463f</guid>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[deep-learning]]></category>
            <category><![CDATA[tensorflow]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Sat, 21 Nov 2015 20:07:02 GMT</pubDate>
            <atom:updated>2019-01-09T19:10:18.778Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ngvDss8Q0iIHREPzsjcIKQ@2x.png" /></figure><p><strong><em>Check out the related post: </em></strong><a href="https://medium.com/jim-fleming/loading-tensorflow-graphs-via-host-languages-be10fd81876f#.wbjr0rbnh"><strong><em>Loading TensorFlow graphs from Node.js</em></strong></a><strong><em> (using the C API).</em></strong></p><p>The current documentation around loading a graph with C++ is <a href="http://tensorflow.org/api_docs/cc/index.html">pretty</a> <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/core/public#c-api-example">sparse</a>, so I spent some time setting up a barebones example. In the TensorFlow repo there are more involved examples, such as <a href="https://github.com/tensorflow/tensorflow/blob/master/tensorflow/cc/tutorials/example_trainer.cc">building a graph in C++</a>. However, the C++ API for <em>constructing</em> graphs is not as complete as the Python API. Many features (including automatic gradient computation) are not available from C++ yet. Another example in the repo demonstrates <a href="https://github.com/tensorflow/tensorflow/tree/master/tensorflow/cc/ops">defining your own operations</a> but most users will never need this. I imagine the most common use case for the C++ API is loading pre-trained graphs to run standalone or embedded in other applications.</p><p>Be aware that there are some caveats to this approach, which I’ll cover at the end.</p><h4>Requirements</h4><ul><li><a href="http://bazel.io/docs/install.html">Install Bazel</a>: Google’s build tool used to compile things for TensorFlow.</li><li>Clone the <a href="https://github.com/tensorflow/tensorflow.git">TensorFlow repo</a>. 
Be sure to include submodules using the recursive flag (thanks to <a href="http://twitter.com/kristophergiesing">@kristophergiesing</a> for catching this):</li></ul><pre>git clone --recursive <a href="https://github.com/tensorflow/tensorflow">https://github.com/tensorflow/tensorflow</a></pre><h4>Creating the graph</h4><p>Let’s start by creating a minimal TensorFlow graph and writing it out as a protobuf file. Make sure to assign names to your inputs and operations so they’re easier to reference when we execute the graph later. The nodes do have default names, but they aren’t very useful (e.g. <strong><em>Variable_1</em></strong> or <strong><em>Mul_3</em></strong>). Here’s an example created with Jupyter:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/256c36ac384aa51eebef5844af26b285/href">https://medium.com/media/256c36ac384aa51eebef5844af26b285/href</a></iframe><h4>Creating a simple binary or shared library</h4><p>Let’s create a new folder like <strong><em>tensorflow/tensorflow/&lt;my project name&gt;</em></strong> for your binary or library to live in. I’m going to call the project <strong><em>loader</em></strong> since it will be loading a graph.</p><p>Inside this project folder we’ll create a new file called <strong><em>&lt;my project name&gt;.cc</em></strong> (e.g. <strong><em>loader.cc</em></strong>). 
If you’re curious, the .cc extension is essentially the same as .cpp but is preferred by Google’s code guidelines.</p><p>Inside <strong><em>loader.cc</em></strong> we’re going to do a few things:</p><ol><li>Initialize a TensorFlow session.</li><li>Read in the graph we exported above.</li><li>Add the graph to the session.</li><li>Set up our inputs and outputs.</li><li>Run the graph, populating the outputs.</li><li>Read values from the outputs.</li><li>Close the session to release resources.</li></ol><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/363f120546fba9e1303772b43cfde666/href">https://medium.com/media/363f120546fba9e1303772b43cfde666/href</a></iframe><p>Now we create a <a href="http://bazel.io/docs/build-ref.html#BUILD_files"><strong><em>BUILD</em></strong></a> file for our project. This tells Bazel what to compile. Inside, we want to define a <a href="http://bazel.io/docs/be/c-cpp.html#cc_binary"><strong><em>cc_binary</em></strong></a> for our program. 
You can also use the <a href="http://bazel.io/docs/be/c-cpp.html#cc_binary.linkshared"><strong><em>linkshared</em></strong></a> option on the binary to produce a shared library or the <a href="http://bazel.io/docs/be/c-cpp.html#cc_library"><strong><em>cc_library</em></strong></a> rule if you’re going to link it using Bazel.</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/a2e2ec31a200fea40d00fa601a204d47/href">https://medium.com/media/a2e2ec31a200fea40d00fa601a204d47/href</a></iframe><p>Here’s the final directory structure:</p><ul><li>tensorflow/tensorflow/loader/</li><li>tensorflow/tensorflow/loader/loader.cc</li><li>tensorflow/tensorflow/loader/BUILD</li></ul><h4>Compile &amp; Run</h4><ul><li>From the root of the tensorflow repo, run <strong><em>./configure</em></strong></li><li>From inside the project folder call <strong><em>bazel build :loader</em></strong></li><li>From the repository root, go into <strong><em>bazel-bin/tensorflow/loader</em></strong></li><li>Copy the graph protobuf to <strong><em>models/graph.pb</em></strong></li><li>Then run <strong><em>./loader</em></strong> and check the output!</li></ul><p>You could also call <strong><em>bazel run :loader</em></strong> to run the executable directly; however, the working directory for <strong><em>bazel run</em></strong> is buried in a temporary folder and <strong><em>ReadBinaryProto</em></strong> looks in the current working directory for relative paths.</p><p>And that should be all we need to do to compile and run C++ code for TensorFlow.</p><p>The last things to cover are the caveats I mentioned:</p><ol><li>The build is huge, coming in at 103MB, even for this simple example. Much of this is for TensorFlow, CUDA support and numerous dependencies we never use. This is especially true since the C++ API doesn’t support much functionality right now, as a large portion of the TensorFlow API is Python-only. 
There is probably a better way of linking to TensorFlow (e.g. shared library) but I haven’t gotten it working yet.</li><li>There doesn’t seem to be a straightforward way of building this <em>outside</em> of the TensorFlow repo because of Bazel (many of the modules needed to link to are marked as internal). Again, there is probably a solution to this; it’s just non-obvious.</li></ol><h3>Conclusion</h3><p>Hopefully someone can shed some light on these last points so we can begin to embed TensorFlow graphs in applications. If you are that person, message me on <a href="https://twitter.com/jimmfleming">Twitter</a> or <a href="mailto:jim@fomoro.com">email</a>. We also do <a href="https://fomoro.com">applied research</a> to solve machine learning challenges.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4caaff88463f" width="1" height="1" alt=""><hr><p><a href="https://medium.com/jim-fleming/loading-a-tensorflow-graph-with-the-c-api-4caaff88463f">Loading a TensorFlow graph with the C++ API</a> was originally published in <a href="https://medium.com/jim-fleming">Jim Fleming</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Complex types with Rust’s FFI]]></title>
            <link>https://medium.com/jim-fleming/complex-types-with-rust-s-ffi-315d14619479?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/315d14619479</guid>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[programming]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Thu, 09 Jul 2015 21:25:20 GMT</pubDate>
            <atom:updated>2019-01-09T17:06:31.383Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uFINNsYNbYuPIC3sUMyy6w.jpeg" /></figure><h4>Interop with object methods, structs, and arrays</h4><p>When I wrote about <a href="https://medium.com/@jimfleming/rust-lang-in-unity3d-eeaeb47f3a77">calling Rust functions from Unity3D</a>, it was my first time working with a foreign function interface (<a href="http://en.wikipedia.org/wiki/Foreign_function_interface">FFI</a>) and there was a lot I didn’t understand beyond calling simple functions with primitives.</p><p>How do I call methods? How do I pass arrays? How do I pass structs back and forth? Here’s what I’ve come up with…</p><p><em>Note that all of the examples below use Node.js. The principles are the same in Unity3D, C#, and other languages.</em></p><h3>A quick note about usize</h3><p>Often, marshaling between types is pretty straightforward: <strong><em>f64</em></strong> to <strong><em>double</em></strong>, <strong><em>u64</em></strong> to <strong><em>ulong</em></strong>, or simply <strong><em>i32</em></strong> to <strong><em>int</em></strong>. Rust’s <strong><em>usize</em></strong>, however, turned out to be the most varied, and most ambiguous, type-mapping amongst host languages. The <strong><em>usize</em></strong> type represents an unsigned number the width of a pointer (like 32-bit or 64-bit). This width varies by target platform, so while you could use a <strong><em>ulong</em></strong> or <strong><em>uint32</em></strong> on your machine, it might break elsewhere. Since Rust uses <strong><em>usize</em></strong> quite often for ranges and indices, always make sure to use a type that represents a platform-specific width. 
In Node.js you’ll want <strong><em>size_t</em></strong> and in C# (or Unity3D) <strong><em>UIntPtr</em></strong> seems to do the trick.</p><h3>Working with methods</h3><p>Since we’re effectively passing memory references around, the notion of an object with methods doesn’t really exist across the FFI boundary. To work around this limitation, we can define static functions that operate on pointers that we reinterpret as the original object. The host then holds this pointer and uses it when calling these functions.</p><p>Here’s a simple counter struct with increment and decrement methods that we’ll use as the basis for our examples:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/501ff3d905716ac9fd26e22435829af3/href">https://medium.com/media/501ff3d905716ac9fd26e22435829af3/href</a></iframe><p>Now let’s add our FFI. At a minimum we need to provide:</p><ol><li>A constructor, which instantiates an object in memory and returns a pointer to it.</li><li>A destructor for the instantiated objects. We’re responsible for cleaning up memory allocated by the foreign language.</li><li>A function to act as a proxy for each method on the object that we want to call.</li></ol><p>Here’s what that looks like:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/6ec27d545c15d715fb571517da78d0ae/href">https://medium.com/media/6ec27d545c15d715fb571517da78d0ae/href</a></iframe><p>We utilize Rust for memory allocation to create our counter on the heap, using <a href="https://doc.rust-lang.org/std/boxed/struct.Box.html"><strong><em>Box</em></strong></a>, then <a href="https://doc.rust-lang.org/std/mem/fn.transmute.html"><strong><em>transmute</em></strong></a> this box into a raw pointer. 
This trickery avoids having to manually allocate the memory and seems to be <a href="http://www.reddit.com/r/rust/comments/2fmvcy/rust_ffi_and_opaque_pointer_idiom/">the most canonical way</a> to allocate the counter. Our destructor works similarly by transmuting the counter’s pointer back into a <strong><em>Box</em></strong> then letting it automatically <a href="https://doc.rust-lang.org/std/ops/trait.Drop.html">drop</a>.</p><p>Finally, each function acting as a proxy takes a pointer as its first argument. The function converts this pointer to the original type, calls the desired method while passing through any arguments, and returns the result (if any). Unlike in our destructor, we don’t want to transmute these pointers back into a box until we’re ready to destroy the object.</p><p>Calling the FFI is pretty straightforward, relying on the host language’s pointer type:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/524799f8b75f765039a92808d9d602d5/href">https://medium.com/media/524799f8b75f765039a92808d9d602d5/href</a></iframe><h3>Working with structs</h3><p>Sometimes functions may require a number of arguments. To avoid a complicated function signature, we can use configuration structs to group related arguments. Structs work well for this task because they can be described linearly in memory with a flat structure (matching the <a href="http://en.wikipedia.org/wiki/Struct_(C_programming_language)">C struct definition</a>) so passing a struct in and out of Rust is pretty straightforward. Classes, on the other hand, involve more indirection and, therefore, cannot be easily passed.</p><p>The main concern for the host language is the memory layout of the struct properties. Dynamic environments like Node.js <a href="https://github.com/TooTallNate/ref-struct">provide tools</a> for defining structs with the appropriate layout. 
In C# you can use the <strong><em>StructLayout</em></strong> attribute with <strong><em>LayoutKind.Sequential</em></strong>.</p><p>In this example, the counter is modified to accept a configuration struct containing the initial value and the amount to increment and decrement by:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/05d180992dc7c78db077d6405bd5e8c7/href">https://medium.com/media/05d180992dc7c78db077d6405bd5e8c7/href</a></iframe><p>With the FFI, Rust handles the struct conversion directly so we don’t need to do anything special:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/e8ff6aa31e896cb9fce0f2e10f384ea2/href">https://medium.com/media/e8ff6aa31e896cb9fce0f2e10f384ea2/href</a></iframe><p>In Node, we define a matching struct type for <strong><em>Args</em></strong> and use it in our interface specification:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d4ce1636b2ac19359fe7eae35fe12f0d/href">https://medium.com/media/d4ce1636b2ac19359fe7eae35fe12f0d/href</a></iframe><h3>Working with arrays</h3><p>Passing an array turns out to be the least straightforward of the three techniques since we cannot simply pass the array back and forth like we can with pointers or structs. An array can most generally be represented by a pointer to the first element in the array and a length so that’s what we’ll use.</p><p>Another issue is ownership: who owns the array’s memory? The safest option is to let the host be responsible for the memory since it has the most information about how the memory should be freed. 
You pass an array in, manipulate it in place and then, instead of returning the array, the caller can simply read its contents when the function is complete.</p><p>The <a href="https://doc.rust-lang.org/book/primitive-types.html#arrays">array</a> type in Rust must have a known length at compile time so we need to use a <a href="https://doc.rust-lang.org/book/primitive-types.html#slices">slice</a>, or a “view” into an array, which we’ll sum into our counter:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b740b44e73af4eead7f2d77e258bd753/href">https://medium.com/media/b740b44e73af4eead7f2d77e258bd753/href</a></iframe><p>In the FFI we need a pointer to the first value in the slice and its length. Then we can use <strong><em>std::slice::from_raw_parts</em></strong> to reassemble the slice (or <strong><em>std::vec::Vec::from_raw_parts</em></strong> to create a vector, though that takes ownership of the buffer and is only sound for memory that Rust itself allocated).</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/f811477cabfd6867e2bd195969abc6c3/href">https://medium.com/media/f811477cabfd6867e2bd195969abc6c3/href</a></iframe><p>From the host language we can simply specify an array type as the argument:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/d809de5717a73791e27a4ed7088e2cfd/href">https://medium.com/media/d809de5717a73791e27a4ed7088e2cfd/href</a></iframe><h3>A better interface</h3><p>To make things even cleaner, let’s wrap up our host FFI into a class that exposes a more natural interface. 
Most importantly, we can hide the use of the pointer since the caller should not need to worry about it (and misuse of the pointer can cause errors or unexpected behavior).</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/0468d0437d7d7bb8efc6c3befb03bf46/href">https://medium.com/media/0468d0437d7d7bb8efc6c3befb03bf46/href</a></iframe><h3>Conclusion</h3><p>And that’s it! You can play around with the code samples <a href="https://github.com/jimfleming/rust-ffi-complex-types">on GitHub</a>.</p><p>Below are some of the resources I used when researching how to do the things in this post.</p><p>If I got anything wrong or if you have any questions please let me know via <a href="https://twitter.com/jimmfleming">Twitter</a> or <a href="mailto:jim@fomoro.com">email</a>.</p><h4>References</h4><ol><li><a href="https://doc.rust-lang.org/book/ffi.html">https://doc.rust-lang.org/book/ffi.html</a></li><li><a href="https://doc.rust-lang.org/reference.html#ffi-attributes">https://doc.rust-lang.org/reference.html#ffi-attributes</a></li><li><a href="http://oppenlander.me/articles/rust-ffi">http://oppenlander.me/articles/rust-ffi</a></li><li><a href="http://www.aimlesslygoingforward.com/2014/09/18/safe-rust-callback-bindings/">http://www.aimlesslygoingforward.com/2014/09/18/safe-rust-callback-bindings/</a></li></ol><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=315d14619479" width="1" height="1" alt=""><hr><p><a href="https://medium.com/jim-fleming/complex-types-with-rust-s-ffi-315d14619479">Complex types with Rust’s FFI</a> was originally published in <a href="https://medium.com/jim-fleming">Jim Fleming</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Rust(lang) in Unity3D]]></title>
            <link>https://medium.com/jim-fleming/rust-lang-in-unity3d-eeaeb47f3a77?source=rss----82978fcc2cbc---4</link>
            <guid isPermaLink="false">https://medium.com/p/eeaeb47f3a77</guid>
            <category><![CDATA[rust]]></category>
            <category><![CDATA[programming]]></category>
            <category><![CDATA[unity]]></category>
            <dc:creator><![CDATA[Jim Fleming]]></dc:creator>
            <pubDate>Mon, 08 Jun 2015 23:52:58 GMT</pubDate>
            <atom:updated>2015-07-12T20:20:31.822Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*E6ssxQBG0NKZfSfrbEUcAQ.jpeg" /></figure><h4>How to use Unity’s Native Plugin interface to call fast, safe code in Rust</h4><p>Lots of people are excited about Rust for its applications to game development. Writing native plugins in Unity3D usually means C, C++ or Objective-C and no real memory safety within the underlying code. Now that Rust has hit 1.0 I looked into calling Rust from Unity3D and it turns out to be surprisingly simple.</p><p><em>I should note that this guide targets OS X, not Windows. The process should be similar, likely substituting “dll” for each “dylib”. Check the referenced </em><a href="http://docs.unity3d.com/Manual/NativePlugins.html"><em>Unity Native Plugin guide</em></a><em> for Windows specifics.</em></p><h4>From Rust</h4><p>Let’s start with two simple Rust functions that double and triple their integer inputs, respectively:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/73f4fe4b56c94f85b2d547138266c339/href">https://medium.com/media/73f4fe4b56c94f85b2d547138266c339/href</a></iframe><p>Rust uses a simple config file to define the build output and a command-line tool called Cargo to perform the actual builds. Since Unity loads libraries dynamically we want to specify “dylib” as our “crate-type” in our Cargo manifest:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/76382b410828be7e68e8e7c236ebb022/href">https://medium.com/media/76382b410828be7e68e8e7c236ebb022/href</a></iframe><p>Next, we run <strong><em>cargo build</em></strong> and we’re done. 
Pretty much the most straightforward compile process I can imagine.</p><p>Inside our <strong>target/debug/</strong> (or <strong>target/release/</strong> for release builds) folder should be a file named <strong>lib&lt;name&gt;.dylib</strong> where name corresponds to the lib name property in the config file above.</p><h4>From Unity3D</h4><p>We want to copy this library into our Unity project’s <strong>Assets/Plugins</strong> folder. On OS X, Unity expects native plugins to have a “.bundle” extension. We can simply rename our compiled lib’s extension from “.dylib” to “.bundle”. This works because the underlying command that loads the library understands both formats.</p><p>In Unity, we create a <strong><em>MonoBehaviour</em></strong> (a regular class works too), add a <strong><em>static extern</em></strong> function interface and tag it with the <strong><em>DllImport</em></strong> attribute pointing to the name of our library in <strong>Assets/Plugins</strong> (omitting the extension):</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/1df8acc2761db4194a9165bb44ed3acc/href">https://medium.com/media/1df8acc2761db4194a9165bb44ed3acc/href</a></iframe><p>If all goes well, pressing Play in Unity3D should produce the following output:</p><iframe src="" width="0" height="0" frameborder="0" scrolling="no"><a href="https://medium.com/media/b514e1c65f556dfcd31f4fa212d09de5/href">https://medium.com/media/b514e1c65f556dfcd31f4fa212d09de5/href</a></iframe><h4>Conclusion</h4><p>I’ve avoided writing native plugins for my games in the past due to the inherent complications around memory handling in production code. 
Rust makes it easier to write fast, safe code to be run within Unity, giving us an alternative to C, C++ or Objective-C for areas of high-performance code.</p><p><strong><em>If you’re interested in working with more complex types via Rust’s FFI then I’ve written a </em></strong><a href="https://medium.com/jim-fleming/complex-types-with-rust-s-ffi-315d14619479"><strong><em>follow up on my experiences</em></strong></a><strong><em> that covers object methods, structs and arrays.</em></strong></p><p>I’m happy to answer any questions on Twitter: <a href="https://twitter.com/jimmfleming">@jimmfleming</a></p><h4>References</h4><ol><li>Complete project for Unity5: <a href="https://github.com/jimfleming/unity-to-rust">https://github.com/jimfleming/unity-to-rust</a></li><li>Rust Once, Run Everywhere: <a href="http://blog.rust-lang.org/2015/04/24/Rust-Once-Run-Everywhere.html">http://blog.rust-lang.org/2015/04/24/Rust-Once-Run-Everywhere.html</a></li><li>Rust FFI documentation: <a href="https://doc.rust-lang.org/book/ffi.html">https://doc.rust-lang.org/book/ffi.html</a></li><li>Rust FFI examples for other languages (Node, Python, Ruby, C, etc.): <a href="https://github.com/alexcrichton/rust-ffi-examples">https://github.com/alexcrichton/rust-ffi-examples</a></li><li>Cargo manifest documentation for dynamic and static libraries: <a href="http://doc.crates.io/manifest.html#building-dynamic-or-static-libraries">http://doc.crates.io/manifest.html#building-dynamic-or-static-libraries</a></li><li>Unity Native Plugins documentation: <a href="http://docs.unity3d.com/Manual/NativePlugins.html">http://docs.unity3d.com/Manual/NativePlugins.html</a></li></ol><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=eeaeb47f3a77" width="1" height="1" alt=""><hr><p><a href="https://medium.com/jim-fleming/rust-lang-in-unity3d-eeaeb47f3a77">Rust(lang) in Unity3D</a> was originally published in <a href="https://medium.com/jim-fleming">Jim Fleming</a> on 
Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>