
machine learning engineer; lover of cats, languages, and elegant systems; professional curious person.

It’s an oft-lamented fact that the capabilities of modern machine learning tend to be narrow and brittle: while a given technique can be applied to a number of tasks, an individual learned model specializes in only one and needs a lot of data to acquire that specialized competence.

Meta Learning asks: instead of starting from scratch on each new task, is there a way to train a model across tasks so that the acquisition of specific new tasks is faster and more data-efficient? Approaches in meta learning and the related discipline of few-shot learning have taken many shapes — from…


Batch Normalization, and the zoo of related normalization strategies that have grown up around it, have played an interesting array of roles in recent deep learning research: as a wunderkind optimization trick, as a focal point for discussions about theoretical rigor, and, importantly but somewhat more on the sidelines, as a flexible and broadly successful avenue for injecting conditioning information into models.

Conditional renormalization started humbly enough, as a clever trick for training more flexible style transfer models, but over the years this originally simple idea has grown in complexity and conceptual scope. I kept seeing new variants of this strategy pop…
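To make the conditioning idea concrete, here is a minimal sketch, in PyTorch, of a normalization layer whose per-channel scale and shift are predicted from a conditioning vector rather than learned as free parameters (the class and argument names are mine, purely for illustration):

```python
import torch
import torch.nn as nn

class ConditionalBatchNorm2d(nn.Module):
    """Illustrative sketch: batch-normalize activations, then apply a scale
    (gamma) and shift (beta) computed from a conditioning vector, e.g. a
    class embedding or a style representation."""
    def __init__(self, num_features: int, cond_dim: int):
        super().__init__()
        # affine=False: the layer only normalizes; gamma/beta come from `cond`
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        self.to_gamma = nn.Linear(cond_dim, num_features)
        self.to_beta = nn.Linear(cond_dim, num_features)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        normed = self.bn(x)                                       # (B, C, H, W)
        gamma = self.to_gamma(cond).unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        beta = self.to_beta(cond).unsqueeze(-1).unsqueeze(-1)     # (B, C, 1, 1)
        return gamma * normed + beta
```

Swapping in instance normalization and deriving the scale and shift from a style representation gives roughly the flavor used in the style transfer work mentioned above; the details differ from paper to paper.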


I’m going to tell a story: one you’ve almost certainly heard before, but with a different emphasis than you’re used to.

To a first (order) approximation, all modern deep learning models are trained using gradient descent. At each step of gradient descent, your parameter values begin at some starting point, and you move them in the direction of greatest loss reduction. You do this by taking the derivative of your loss with respect to your whole vector of parameters, otherwise known as the gradient. However, this is just the first derivative of your loss, and it doesn’t tell you anything about…
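As a quick illustration of what that step looks like in code, here is a toy numpy sketch of a single first-order gradient descent update; the quadratic loss and learning rate are made up for the example:

```python
import numpy as np

def gradient_descent_step(params, grad_fn, lr=0.1):
    """One step of plain gradient descent: move parameters against the
    first derivative of the loss. No curvature information is used."""
    grad = grad_fn(params)       # dL/dparams at the current point
    return params - lr * grad    # step in the direction of steepest descent

# Toy loss L(w) = ||w||^2, whose gradient is 2w.
params = np.array([3.0, -2.0])
for _ in range(5):
    params = gradient_descent_step(params, grad_fn=lambda w: 2 * w)
print(params)  # shrinks toward the minimum at the origin
```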


If you followed any machine learning news towards the end of last year, you probably saw images like this circulating, the results of a new Generative Adversarial Network (GAN) architecture described in a recent paper from a team at NVIDIA:

An apparently randomly selected set of faces produced by NVIDIA’s “Style-Based” GAN

Hearing that jaw-dropping results are being produced by some novel flavor of GAN is hardly a new experience if you follow the field, but even by recently heightened standards, these images are stunning. For the first time, I’m confident I personally wouldn’t be able to tell these apart from real images. Reading between the lines of the paper’s framing, it seems…


“Graph” is one of those terms that’s fallen prey to natural language’s tendency to be less precise than its mathematical counterpart: in everyday parlance, a graph can mean a plot, a chart, or a data visualization more generally. However, from a strictly quantitative perspective, “graph” refers to a quite particular data structure, composed (at minimum) of some N-sized set of nodes, some set of edges, and properties that can be attached to either nodes or edges. …
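For concreteness, that definition fits in a few lines of Python; the class below is purely illustrative rather than taken from any graph library:

```python
from dataclasses import dataclass, field

@dataclass
class Graph:
    """A minimal graph: a set of nodes, a set of edges, and optional
    property dictionaries attached to either nodes or edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)         # each edge is a (u, v) tuple
    node_props: dict = field(default_factory=dict)  # node -> {property: value}
    edge_props: dict = field(default_factory=dict)  # (u, v) -> {property: value}

    def add_edge(self, u, v, **props):
        self.nodes.update((u, v))
        self.edges.add((u, v))
        if props:
            self.edge_props[(u, v)] = props

g = Graph()
g.add_edge("Alice", "Bob", relation="follows")
g.node_props["Alice"] = {"account_age_days": 412}
```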


In the world of modern machine learning, the convolution operator occupies a strange position: it’s trivially familiar to anyone who’s read a neural network paper since 2012, and simultaneously an object whose deeper mathematical foundations are often poorly understood. In audio processing contexts, you might hear it described as a signal smoother. In convolutional nets, it aggregates information from nearby pixels and is also described as a pattern-matching filter, activating more strongly in response to specific local pixel patterns. …
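Both readings show up in a few lines of numpy; the signal and kernels below are invented purely for illustration:

```python
import numpy as np

# A noisy 1D signal and an averaging kernel: convolving them "smooths"
# the signal, which is the audio-processing reading of convolution.
signal = np.array([0.0, 1.0, 0.2, 0.9, 0.1, 1.1, 0.0])
smoother = np.ones(3) / 3.0
smoothed = np.convolve(signal, smoother, mode="same")

# The pattern-matching reading: a step-shaped filter responds most strongly
# where the signal actually steps upward. (Convolutional nets technically
# compute cross-correlation, hence np.correlate rather than np.convolve here.)
step_detector = np.array([-1.0, 0.0, 1.0])
response = np.correlate(signal, step_detector, mode="same")

print(smoothed)
print(response)
```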


Reinforcement Learning strikes me as the wild west of machine learning right now: a place full of drama and progress, with dreams of grand futures hovering on everyone’s horizons. But it’s also a place a bit removed from the rules and assumptions governing the worlds of supervised and unsupervised learning. One of the most salient departures from those assumptions is that, in policy gradients in particular, you can’t reliably know whether the direction you’re moving in will actually improve your reward. …
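To make that concrete, here is a minimal REINFORCE-style sketch in PyTorch (the function and argument names are mine): the update direction is estimated from sampled actions and their returns, so any individual step may or may not actually improve expected reward.

```python
import torch

def reinforce_loss(log_probs, rewards):
    """Surrogate loss whose gradient is a *sample-based estimate* of the
    policy gradient. `log_probs` are the log-probabilities (with gradients)
    of the actions that were actually sampled; `rewards` are their returns."""
    returns = torch.as_tensor(rewards, dtype=torch.float32)
    # Standardizing returns is a common variance-reduction trick.
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    # Minimizing this loss ascends the estimated reward gradient; because the
    # estimate is built from samples, it is noisy and can point the wrong way.
    return -(torch.stack(log_probs) * returns).sum()
```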


(If you haven’t done so yet, I recommend going back and reading Part 1 of this series on VAE failure modes; I spent more time there explaining the basics of generative models in general and VAEs in particular, and, since this post pretty much jumps in where that one left off, it will provide useful background.)

A machine learning idea I find particularly compelling is that of embeddings, representations, encodings: all of these vector spaces that can seem nigh-on magical when you zoom in and see the ways that a web of concepts can be beautifully mapped into…


It’s a truth universally acknowledged: that data not in possession of labels must be in want of unsupervised learning.

Glibness aside, it’s commonly understood that supervised learning has meaningful downsides: labels are costly, noisy, and direct your problem towards the achievement of a somewhat artificial goal, rather than towards simply learning the meaningful contours of the data in a more neutral way. However, labels do give us something very valuable in the context of learning: a straightforward objective to maximize. All modern neural network systems are built off of gradient descent, which modifies parameter values in order to optimize an outcome…


The premise of meta learning was an intoxicating one to me when I first heard of it: the project of building machines that are not only able to learn, but are able to learn how to learn. The dreamed-of aspiration of meta learning is algorithms able to modify fundamental aspects of their architecture and parameter space in response to signals of performance, algorithms able to leverage accumulated experience when they confront new environments. In short: when futurists weave us dreams of generally competent AI, components that fit this description are integral to those visions.

The goal of this blog post is…
