Meet Professor Bryon Aragam

Professor Aragam is a Lecturer of Statistics at UCLA. He started his career in pure math and currently specializes in high-dimensional statistics and machine learning. This is our Q&A.

What’s your background in statistics?

Long story short, I started off doing a lot of pure math, and then I wanted to do some applications. I worked for a marketing company and did analytics and programming for a year. That drove me towards statistics.

What areas do you specialize in?

My dissertation was about high-dimensional statistics. The motivation for high-dimensional statistics is computational biology. Usually your dataset has a fixed number of variables, and mathematically you have many more samples than you have columns. But in biology, that’s very rarely the case. You find yourself in situations where you want to collect data on twenty thousand genes, but it’s very costly to do that. So you’re only able to collect samples from ten or twenty people. Then you have lots of columns and very few samples.
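
As an illustration of why having more columns than samples is a problem, here is a minimal NumPy sketch. The sizes (20 samples, 200 columns) and the random data are made up for the example and are not from the interview. It shows that an ordinary least-squares fit is no longer unique: two very different coefficient vectors reproduce the responses exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: far fewer samples (rows) than variables (columns).
n_samples, n_genes = 20, 200
X = rng.normal(size=(n_samples, n_genes))
y = rng.normal(size=n_samples)

# The rank of X can never exceed the number of samples.
rank = np.linalg.matrix_rank(X)

# lstsq returns the minimum-norm solution, which fits y exactly here.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = np.linalg.norm(X @ beta - y)

# Any direction orthogonal to every row of X can be added without
# changing the fit, so the data cannot tell these solutions apart.
_, _, vt = np.linalg.svd(X, full_matrices=True)
null_direction = vt[-1]
beta_alt = beta + 5.0 * null_direction
residual_alt = np.linalg.norm(X @ beta_alt - y)

print(f"rank of X: {rank} (columns: {n_genes})")
print(f"residual of min-norm fit:    {residual:.2e}")
print(f"residual of alternative fit: {residual_alt:.2e}")
```

Both residuals are zero up to rounding error, which is why high-dimensional methods need extra assumptions, such as sparsity, to pick out one answer.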

The other part of my work is on graphical models. The main idea of graphical models is trying to figure out how variables are related to or independent from one another. My dissertation was about a specific class of graphical models that involved directed graphs called Bayesian networks.

How are graphical models used?

One main application of Bayesian networks is in computational biology. We may be interested in figuring out how different biomolecules interact with each other.

Another application is in deep learning and neural networks. Neural networks are a special class of graphical models, and you’re performing a very special kind of inference on these models.

There’s a key difference between deep learning and what I do. In deep learning you specify what you think the connections are and you attempt to find the model that best fits that structure. In other words, you infer the parameters given a known graphical structure. In my work in computational biology, we don’t know what the structure is. My task is to figure out what the graphical structure is and how arrows connect in the graph. These are two different problems in graphical models.
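
The two problems can be contrasted on a toy example. The sketch below is an illustration only and is not Professor Aragam's method: it assumes a hypothetical three-variable chain X -> Y -> Z with linear Gaussian relationships and made-up edge weights (0.8 and -0.5). With the structure given, estimating parameters reduces to regressing each variable on its parent. With the structure unknown, one common ingredient is checking which pairs of variables stay dependent after conditioning on a third, done here with a partial correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical ground truth: a chain X -> Y -> Z with linear Gaussian edges.
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)
z = -0.5 * y + rng.normal(size=n)


def ols_slope(parent, child):
    """Least-squares slope of child on one parent (variables have mean ~0)."""
    return float(parent @ child / (parent @ parent))


def residual(target, given):
    """Part of target that is not linearly explained by given."""
    return target - ols_slope(given, target) * given


# Problem 1: structure known, so only the edge weights are estimated.
w_xy = ols_slope(x, y)
w_yz = ols_slope(y, z)

# Problem 2: structure unknown. X and Z are correlated overall, but the
# correlation vanishes once Y is accounted for, which suggests that no
# direct X-Z edge is needed.
corr_xz = float(np.corrcoef(x, z)[0, 1])
partial_xz_given_y = float(np.corrcoef(residual(x, y), residual(z, y))[0, 1])

print(f"estimated weights: X->Y {w_xy:.2f}, Y->Z {w_yz:.2f}")
print(f"corr(X, Z) = {corr_xz:.2f}, partial corr given Y = {partial_xz_given_y:.2f}")
```

Note that a test like this only indicates which edges are unnecessary. It does not by itself determine which way the arrows point, which is part of what makes structure learning the harder of the two problems.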

What’s your favorite distribution?

I don’t have one. A lot of my work involves not thinking about specific distributions.

Are distributions not important for your work?

Specific distributions are not that important. People in machine learning assume the data is generated from some probability distribution, but they can’t guess in advance what it’s going to be. So you have to think about all the crazy possibilities. It’s different from setting up a problem mathematically and supposing the data is normally distributed.

If you had to pick a distribution from the list of distributions on Wikipedia?

I’m going with the Rademacher distribution. Most people don’t know about it but it’s incredibly important. It’s similar to Bernoulli but takes the values +1 and -1 with equal probability. It’s very useful for measuring the complexity of sets, which is an important concept in machine learning.
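
For readers who want to see this in action, here is a minimal sketch of how random signs can measure the complexity of a set. It uses the standard definition of empirical Rademacher complexity for a finite set A of vectors of length n, namely the expected value over random signs sigma of max over a in A of (1/n) * sum_i sigma_i * a_i, estimated by Monte Carlo. The function names, the two example sets, and the number of draws are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)


def rademacher(size):
    """Draw independent signs: +1 or -1, each with probability 1/2."""
    return rng.choice([-1, 1], size=size)


def empirical_rademacher_complexity(A, n_draws=2000):
    """Monte Carlo estimate of E_sigma[max_{a in A} (1/n) sum_i sigma_i a_i].

    A has shape (num_vectors, n); each row is one element of the set.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[1]
    sigma = rademacher((n_draws, n))      # one random sign vector per draw
    correlations = sigma @ A.T / n        # shape (n_draws, num_vectors)
    return float(correlations.max(axis=1).mean())


n = 50
single = np.ones((1, n))                  # a set containing one vector
rich = rng.choice([-1, 1], size=(500, n)) # a set with many sign patterns

small = empirical_rademacher_complexity(single)
large = empirical_rademacher_complexity(rich)

print(f"one vector:  {small:.3f}")
print(f"500 vectors: {large:.3f}")
```

A set with a single vector cannot line up with random signs, so its estimate is close to zero. A set with many different sign patterns usually contains something that correlates with the noise, so its estimate is clearly larger. That ability to fit random signs is what the complexity measure captures.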

This interview was conducted on April 15, 2016, and was edited for length and clarity. Stay tuned for more Q&As like this, with other professors!