# NIPS Day 1: Deep Queues

It was 8 am, and what appeared to be the population of a small midwestern town was curled back and forth in front of the Long Beach Convention Center, waiting to get their passes for NIPS, the truly massive ML conference I’ve come here to attend. Inside the tutorial I was attending, on “Reinforcement Learning By and For the People,” the presenter, Emma Brunskill, asked the room who was familiar with the mathematical foundations of RL; 95% of the room raised their hands. “Well,” she said, after a beat, “I suppose if you made it here this early, you’re motivated.”

### Reinforcement Learning By and For the People

The first tutorial of my day (tutorials are a special kind of session for this initial day of NIPS; each one lasts about two hours) dealt with the different ways Reinforcement Learning systems interact with people: either as the objects of the algorithm, who introduce special concerns, or as potential aids to the algorithm’s learning and performance.

Brunskill opened with a broad overview of Reinforcement Learning, which I won’t repeat here; I’ve covered most of it in my RL summary series.

Starting with the first framing, “RL for the people,” Brunskill highlighted a core problem with building reinforcement learning systems (say, a system that learns a dynamic curriculum policy for a student based on their progress): they’re hard to simulate at useful timescales. When you’re playing Go, the rules are fixed and easy for humans to codify, so we can write a digital simulator that perfectly replicates the reward system the AI would face if it played a human. It’s much less obvious how to do this when the system whose rewards (e.g. educational outcomes) you want to predict is a person. If you want to do any kind of online or on-policy learning (that is, any learning that relies on observations generated by the policy you’re currently optimizing), you’d need to build a simulation of student behavior and use it to generate rewards the way a game engine does. Alternatively, you need to learn from batch data, which is often observational or historical.

That motivates Brunskill’s main points. The first is the necessity of **sample efficiency**, since observations derived from humans are likely to be costly and thus rare. Another reason to care about data efficiency, and about the related problem of **transfer learning**, is that humans are not a monolithic group. To be principled about things, we might want to learn a separate policy for each individual. But given that samples are costly to begin with, it’s undesirable to learn each individual’s policy from scratch, without being informed by what we’ve learned from others.

A few approaches suggested for handling this problem were:

- Assuming that people fall into one of a small number of groups; a model could then be learned per group (a balance between specialization and parameter reuse)
- Assuming that individuals differ from each other according to the value of a latent variable parameterized in continuous space (so, instead of learning multiple functions, we learn one function and condition it on each person’s value of the latent variable)
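To make the latent-variable idea concrete, here is a minimal Python sketch. It assumes a hypothetical linear reward model in which a person with latent value `z` has expected reward `w·x + z·(v·x)` for a feature vector `x`, where the shared weights `w` and `v` have already been learned from the whole population. Under that assumption, personalizing for a new individual means estimating a single scalar from a handful of their observations, which is where the sample efficiency comes from. The model, the helper names, and the numbers are illustrative, not taken from the tutorial.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def estimate_latent(observations: List[Tuple[Sequence[float], float]],
                    w: Sequence[float], v: Sequence[float]) -> float:
    """Least-squares estimate of one person's scalar latent z, with w and v held fixed."""
    num, den = 0.0, 0.0
    for x, y in observations:
        a = dot(v, x)       # how much z moves this observation's reward
        b = y - dot(w, x)   # the part of y the shared population model doesn't explain
        num += a * b
        den += a * a
    # Assumption: z = 0 is the population average, used when the data says nothing about z.
    return num / den if den > 0 else 0.0


def predict(x: Sequence[float], z: float, w: Sequence[float], v: Sequence[float]) -> float:
    return dot(w, x) + z * dot(v, x)


# Shared weights, assumed already learned from many other people.
w = [1.0, 0.0]
v = [0.0, 1.0]
# Two observations from a new person whose (unknown) latent value is 2.0.
new_person = [([1.0, 1.0], 3.0), ([0.0, 2.0], 4.0)]

z_hat = estimate_latent(new_person, w, v)
print(z_hat)                              # 2.0
print(predict([1.0, 3.0], z_hat, w, v))   # 7.0
```

Two data points pin down the new person’s parameter here only because the shared structure (`w` and `v`) was learned elsewhere; fitting a full per-person model from scratch would need far more of that person’s data.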

As mentioned, when we don’t have an a priori simulator of our reward-generating system (like we do with a physics engine), we can either build a simulated model (model-based), or else learn from data that was collected in a single batch. The catch with batch data is that, when you’re evaluating a new candidate policy, your data was generated by a different policy. That issue motivates the use of importance sampling: when we evaluate a policy on the old batch, we reweight each observation by the ratio of how likely its action is under the policy we’re evaluating to how likely it was under the policy that actually generated the data. Each of these two approaches has trade-offs:

- The model-based approach can fall prey to bias, particularly if the model class used to estimate the rewards is wrong (if your model is misspecified, e.g. you try to learn the parameters of a Gaussian when what you really need is an Exponential, no amount of new data can fix that). But it has low variance.
- The importance sampling approach has low bias but, because the weights can vary widely, high variance.

A frequent theme in Brunskill’s work is apparently finding ways to combine these two approaches to get a good bias/variance trade-off.
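Here is a minimal Python sketch of both estimators in the off-policy evaluation setting, plus the doubly robust estimator, which is one standard way of combining them (I’m not claiming it is the specific estimator from Brunskill’s work). It assumes logged tuples `(context, action, reward, behavior_prob)` and a target policy given as a function returning the probability of an action in a context; all names and the toy log are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# One logged step: (context, action taken, observed reward, behavior policy's prob of that action)
Log = Tuple[int, int, float, float]
Policy = Callable[[int, int], float]   # pi(context, action) -> probability
Model = Callable[[int, int], float]    # q_hat(context, action) -> predicted reward


def importance_sampling(logs: List[Log], target: Policy) -> float:
    """Reweight each observed reward by target prob / behavior prob."""
    return sum(target(x, a) / p * r for x, a, r, p in logs) / len(logs)


def direct_method(logs: List[Log], target: Policy, q_hat: Model, actions: Sequence[int]) -> float:
    """Model-based estimate: average the model's predicted reward under the target policy."""
    return sum(
        sum(target(x, b) * q_hat(x, b) for b in actions) for x, _, _, _ in logs
    ) / len(logs)


def doubly_robust(logs: List[Log], target: Policy, q_hat: Model, actions: Sequence[int]) -> float:
    """Model-based baseline plus an importance-weighted correction of the model's error."""
    total = 0.0
    for x, a, r, p in logs:
        baseline = sum(target(x, b) * q_hat(x, b) for b in actions)
        total += baseline + target(x, a) / p * (r - q_hat(x, a))
    return total / len(logs)


def target(x: int, a: int) -> float:
    return 1.0 if a == x else 0.0   # always pick the action matching the context


def bad_model(x: int, a: int) -> float:
    return 0.5                      # misspecified: ignores the context entirely


actions = [0, 1]
# Uniform behavior policy; reward is 1 when the action matches the context.
logs = [(0, 0, 1.0, 0.5), (0, 1, 0.0, 0.5), (1, 0, 0.0, 0.5), (1, 1, 1.0, 0.5)]

print(importance_sampling(logs, target))                 # 1.0
print(direct_method(logs, target, bad_model, actions))   # 0.5
print(doubly_robust(logs, target, bad_model, actions))   # 1.0
```

The true value of the target policy in this toy log is 1. The direct method with the misspecified constant model returns 0.5 however much data you add, while importance sampling and the doubly robust estimator both recover 1 here; in general, though, importance sampling’s variance grows with the size of the weights, which is what the model-based baseline in the doubly robust form helps temper.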

Some ideas that came out of the “by the people” section, which focused on how to facilitate people giving feedback to RL systems, were:

- That machines could perform “Inverse Reward Design” to deal with the fact that the reward a human specifies may be only an incomplete representation of their true reward. This approach places a Bayesian prior over what the model believes the human’s reward function is, and uses data to narrow the posterior while keeping track of its uncertainty.
- Another approach, even more fundamental, would be to have machines learn an underlying human reward function directly by watching demonstrations (the inverse reinforcement learning setting).
- Work that tries to get the RL model to identify where we might want a new action, not in the original action set, to exist. This would give the machine even more room to diverge from past human behavior in order to find an optimum.
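The common thread in the first two ideas is maintaining a posterior over what the human actually wants. Here is a toy Python sketch of that Bayesian update, under assumptions of my own: a small, hypothetical set of candidate reward functions over two made-up features, and a human who picks among options with probability proportional to exp(beta × reward), a “Boltzmann-rational” choice model. It is not the Inverse Reward Design algorithm itself, just the prior-to-posterior narrowing it relies on.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Option = Tuple[float, float]   # hypothetical features: (speed, safety)
Reward = Callable[[Option], float]


def choice_likelihood(reward: Reward, options: Sequence[Option], chosen: Option, beta: float) -> float:
    """P(human picks `chosen` | reward), assuming Boltzmann-rational choices."""
    scores = [beta * reward(o) for o in options]
    shift = max(scores)   # subtract the max for numerical stability
    normalizer = sum(math.exp(s - shift) for s in scores)
    return math.exp(beta * reward(chosen) - shift) / normalizer


def update_posterior(prior: Dict[str, float], candidates: Dict[str, Reward],
                     observations: List[Tuple[Sequence[Option], Option]],
                     beta: float = 2.0) -> Dict[str, float]:
    """Multiply in the likelihood of each observed choice, renormalizing as we go."""
    post = dict(prior)
    for options, chosen in observations:
        for name, reward in candidates.items():
            post[name] *= choice_likelihood(reward, options, chosen, beta)
        total = sum(post.values())
        post = {name: p / total for name, p in post.items()}
    return post


candidates: Dict[str, Reward] = {
    "speed": lambda o: o[0],
    "safety": lambda o: o[1],
    "balanced": lambda o: 0.5 * o[0] + 0.5 * o[1],
}
prior = {name: 1 / 3 for name in candidates}
fast, safe = (1.0, 0.0), (0.0, 1.0)
# The human picks the safe option three times in a row.
observations = [([fast, safe], safe)] * 3

posterior = update_posterior(prior, candidates, observations)
print(max(posterior, key=posterior.get))   # safety
```

The posterior never collapses to certainty after three choices, which is the point: the system keeps a calibrated sense of how sure it is about the human’s reward rather than committing to a single guess.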

### Fairness in ML

After copious caffeination, it was time for the second tutorial, this one focusing on the problem of fairness in ML applications. While a lot of these ideas were familiar to me through previous work on applying ML in the lending space, I thought the presenters did a really stellar job of outlining the conceptual frames in which we should think about fairness in ML.

One point the presenters emphasized, which I think is worth highlighting within the technical community, is the difference between asserting that a feature (for instance, group membership) has no relevance to predicting the target, and asserting that the feature **should** not, from a normative perspective, be allowed to influence our decisions, because of a history of that feature being used to discriminate unjustly. It’s a subtle distinction, but an important one, particularly when technical ML experts communicate with legal experts.

The speakers then addressed two legal paradigms of fairness:

- Disparate Treatment, which forbids explicit use of group-membership information, and is mostly motivated by a concern with procedural fairness and equal opportunity
- Disparate Impact, which requires that outcomes between groups be the same, unless a strong business justification can be found to explain those differing outcomes. This frame puts more emphasis on distributive justice, and promoting equality of outcome.

The rest of the talk focused on providing three broad categories of what someone might want when they say they want an algorithm to be fair:

- Independence: that the scores of your classifier be entirely independent of group membership (basically: equivalent score distributions between groups)
- Separation: that the scores of your classifier be independent of group membership, conditional on the target variable (basically: equivalent score distributions between groups when you compare only within target=1 or within target=0)
- Sufficiency: that the outcome Y be independent of group membership (e.g. race), given the scores of the classifier. This one is admittedly a bit weird to me, and seems to promote capturing *more* group-based information in the score, rather than less
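To see what checking these criteria looks like in practice, here is a minimal Python sketch that thresholds a classifier’s scores and compares simple per-group statistics. Working with binarized decisions is my simplification of the score-based definitions above: the positive rate is the independence check, the true and false positive rates are the separation checks, and precision (P(Y=1 | predicted positive)) is a partial sufficiency check. The threshold, the helper names, and the toy records are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def _mean(values: List[int]) -> float:
    return sum(values) / len(values) if values else float("nan")


def group_stats(records: Iterable[Tuple[str, float, int]],
                threshold: float = 0.5) -> Dict[str, Dict[str, float]]:
    """records: (group, score, label) triples with label in {0, 1}."""
    rows = defaultdict(list)
    for group, score, label in records:
        rows[group].append((int(score >= threshold), label))
    stats = {}
    for group, pairs in rows.items():
        stats[group] = {
            "positive_rate": _mean([d for d, _ in pairs]),         # independence
            "tpr": _mean([d for d, y in pairs if y == 1]),         # separation, within Y = 1
            "fpr": _mean([d for d, y in pairs if y == 0]),         # separation, within Y = 0
            "precision": _mean([y for d, y in pairs if d == 1]),   # sufficiency, given decision = 1
        }
    return stats


def largest_gaps(stats: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Max minus min of each statistic across groups; 0 means the criterion holds exactly."""
    metrics = next(iter(stats.values())).keys()
    return {m: max(s[m] for s in stats.values()) - min(s[m] for s in stats.values())
            for m in metrics}


records = [
    ("A", 0.9, 1), ("A", 0.8, 1), ("A", 0.3, 0), ("A", 0.6, 0),
    ("B", 0.7, 1), ("B", 0.2, 1), ("B", 0.1, 0), ("B", 0.4, 0),
]
print(largest_gaps(group_stats(records)))
```

In this toy data, group A is flagged positive three times as often as group B (0.75 versus 0.25), so independence fails, and the true positive rate, false positive rate, and precision all differ between the groups as well.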

If you want to build some intuition about the trade-offs between criteria like these, I recommend playing around with this tool Google built, which lets you try different ways of solving the problem and see how each one does on the other criteria.

### Geometric Deep Learning

By this point in the day, my caffeine levels were at neither a local nor a global maximum, so I took a break during the second half of the talk. The first half focused on how we could apply ideas from conv nets to input data in the form of a graph. The main difficulty, as one of the presenters framed it very cogently, is that graphs vary in fundamental structure, as well as in scale, from one graph to the next. This is not the case with images, which have a fixed grid structure. So the goal became creating a graph convolution operator that is insensitive to permutations of the vertex ordering and that doesn’t break when more vertices are added to the graph.

A simple approach, and the one their methods built on, is to have two sets of weights: one to multiply by the vector corresponding to the “current” node, and one to multiply by the mean of the vectors of its neighboring nodes. This works because taking the mean of a collection of values isn’t order-dependent, and won’t fundamentally break if you add another value. A downside is that all the filters learned this way are radial: they treat every direction the same and don’t know the meaning of things like “up,” “down,” “left,” and “right,” since those ideas don’t make much sense on a graph.
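Here is a minimal pure-Python sketch of that two-weight, mean-aggregation layer, with a ReLU nonlinearity added as an assumption. The node names, weights, and the `graph_conv` helper are hypothetical; the point is that listing the nodes and neighbor lists in a different order leaves each node’s output unchanged, because the neighbor mean doesn’t depend on order.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def graph_conv(features: Dict[str, Vector], neighbors: Dict[str, List[str]],
               w_self: Matrix, w_neigh: Matrix) -> Dict[str, Vector]:
    """One layer: ReLU(W_self @ h_node + W_neigh @ mean(h_neighbors))."""
    out = {}
    for node, h in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(h))]
        else:
            mean = [0.0] * len(h)   # isolated node: no neighbor signal
        pre = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        out[node] = [max(0.0, z) for z in pre]
    return out


w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.5], [0.0, 1.0]]
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}

# Same graph, with nodes and neighbor lists listed in a different order.
features_shuffled = {"c": [1.0, 1.0], "a": [1.0, 0.0], "b": [0.0, 1.0]}
neighbors_shuffled = {"b": ["a"], "c": ["b", "a"], "a": ["c", "b"]}

out = graph_conv(features, neighbors, w_self, w_neigh)
out_shuffled = graph_conv(features_shuffled, neighbors_shuffled, w_self, w_neigh)
print(out["a"])                                      # [1.75, 1.0]
print(all(out[n] == out_shuffled[n] for n in out))   # True
```

Adding a fourth node would only change the lengths of some neighbor lists; the same two weight matrices still apply, which is the insensitivity to scale the presenters were after. The radial downside is visible too: every neighbor enters the mean with equal weight, so the layer has no notion of direction.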

### Powering the Future

In the last talk of the day, and the first invited talk, we heard from John Platt, who is working with Applied Science at Google on something typically Google-esque and cool: providing machine learning support to fusion research. The premise of the project is that fusion is a Great White Hope for meeting humanity’s future energy needs, and that it’s a high-value research opportunity right now. The team is working with a physics laboratory that has a machine which induces a plasma by superheating the deuterium inside.

The overall goal of the project is to use Variational Inference to help suggest which regions of parameter space are reasonable versus which are likely to make the machine break. This matters because, to take this technology past Proof of Concept, the physicists need to show that they can get the plasma to very high temperatures and keep it stable there. Giving the physicists a better-informed way to explore parameter space, with a reduced probability of engineering error, will hopefully bring that Proof of Concept into existence sooner rather than later.

### Quote of the Day:

“When they called to ask me to be program chair, I asked what I always ask in these cases: ‘are you sure you didn’t mean to call my brother’?” — Samy Bengio

### Collective Mood of the Day:

“Ooh, did you get an invite to the Tesla party?”