Nick Dexter: Advancing Machine Learning with Math

Nick Dexter, a Pacific Institute for the Mathematical Sciences (PIMS) postdoctoral fellow, is at the forefront of machine learning in Canada.

By Jimmy Fryers

MRI scans have revolutionized medicine: doctors can get a detailed look at what’s going on inside a patient without resorting to invasive, not to mention expensive, exploratory operations.

Dr. Nick Dexter

However, while MRIs save time in the short term, doctors are still required to analyze the scans, potentially leading to delays and introducing human error into the diagnosis.

The rise of sophisticated computing power has led to dramatic advances in engineering, but self-driving cars aren't (yet) turning up at our homes and driving us to work.

Even though technology has improved significantly with the advances in AI and machine learning over the last decade, we’re not yet at a point where computers can be let loose diagnosing patients in hospitals or navigating inner-city traffic.

But that future is coming, and we will undoubtedly reach the point relatively soon where computers play a greater role in assisting doctors with diagnoses and driving us to the office.

Sparse recovery techniques such as compressed sensing can often provide high-accuracy approximations from under-sampled data. Left: Photo of Nick’s dog Benji taken in Haw Ridge Park, Oak Ridge, Tennessee. Right: Image obtained after setting 99.75% of the coefficients in the bi-orthogonal wavelet transform to 0, preserving 99.23% of the energy of the original image.
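For readers curious about the experiment in the caption, here is a minimal sketch in Python using the open-source PyWavelets library, with a synthetic piecewise-constant image standing in for the photo; the 0.25% figure mirrors the caption, while the wavelet choice and test image are illustrative assumptions rather than the exact setup behind the figure.

    import numpy as np
    import pywt  # PyWavelets

    # Synthetic test image: piecewise constant, hence highly compressible
    # in a wavelet basis (a stand-in for the photo of Benji).
    img = np.zeros((256, 256))
    img[64:192, 64:192] = 1.0
    img[96:160, 96:160] = 0.5

    # Biorthogonal wavelet transform, flattened to one coefficient array.
    coeffs = pywt.wavedec2(img, "bior4.4", level=4)
    arr, slices = pywt.coeffs_to_array(coeffs)

    # Keep only the largest 0.25% of coefficients; zero out the rest.
    k = int(0.0025 * arr.size)
    threshold = np.sort(np.abs(arr).ravel())[-k]
    arr_sparse = np.where(np.abs(arr) >= threshold, arr, 0.0)

    # Reconstruct and measure how much of the image energy survives.
    rec = pywt.waverec2(
        pywt.array_to_coeffs(arr_sparse, slices, output_format="wavedec2"),
        "bior4.4")
    rec = rec[: img.shape[0], : img.shape[1]]
    print("fraction of energy preserved:",
          np.linalg.norm(rec) ** 2 / np.linalg.norm(img) ** 2)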

It may come as a surprise to many, but much of the fundamental work required to make this future become a reality will be done by mathematicians like Nick and his colleague, PIMS SFU Site Director, Prof. Ben Adcock.

PIMS sat down with Nick to discuss his work in computational mathematics, deep neural networks and machine learning, as well as his love of travel, the outdoors and cooking.

What are you working on for your PIMS Postdoctoral Fellowship?

I am working on a few different areas during my PIMS Postdoctoral Fellowship. The first area is testing the performance of machine learning (ML) with deep neural networks (DNNs).

A DNN is an artificial neural network originally inspired by the workings of the human brain. The basic building blocks of these models are artificial neurons, which activate in specific patterns to achieve a desired result. The connections between these neurons and the intensity of their activations determine how data, e.g., images or text, pass through the network. These connections and intensities are trained by observing many examples of the data, and learning occurs through a mathematical optimization procedure. Many experts define deep neural networks as networks that have an input layer, an output layer, and at least one hidden layer in between (though modern models often have tens or hundreds of layers).
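As a purely illustrative sketch of these building blocks, the snippet below uses Keras (part of the TensorFlow software mentioned later in this interview) to assemble a small network with an input layer, two hidden ReLU layers, and an output layer, then trains its weights through a mathematical optimization on toy data. The layer sizes, optimizer, and task are arbitrary choices for illustration, not the architectures studied in Nick’s work.

    import numpy as np
    import tensorflow as tf

    # A small feed-forward network: input -> two hidden ReLU layers -> output.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(16,)),                   # 16 input features (arbitrary)
        tf.keras.layers.Dense(32, activation="relu"),  # hidden layer 1
        tf.keras.layers.Dense(32, activation="relu"),  # hidden layer 2
        tf.keras.layers.Dense(1),                      # scalar output
    ])

    # "Learning" is a mathematical optimization of the connection weights,
    # driven by many examples of the data.
    model.compile(optimizer="adam", loss="mse")

    # Toy training data: learn to predict the sum of the 16 inputs.
    x = np.random.rand(1000, 16).astype("float32")
    y = x.sum(axis=1, keepdims=True)
    model.fit(x, y, epochs=5, batch_size=32, verbose=0)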

DNNs have become very popular recently due to breakthrough-level improvements in results achieved on a great number of historically-challenging problem areas.

A computational graph associated with a deep neural network composed of L layers. Information flows forward from the input layer, through the intermediate hidden layers, and finally to the output layer.

These successes are largely driven by advances in specialized hardware and software for training DNNs, along with massive amounts of data now available for feature detection. DNNs have been the driving force behind many of the impressive results in image classification¹, speech recognition², and autonomous vehicles achieved by companies like Apple, Google, and Tesla Motors in recent years.³ ⁴

Deep learning (DL) techniques have seen the most success when applied in data-rich environments, e.g., the image classification networks developed by Google and Facebook, which draw from their vast stores of user-uploaded content. These methods are also increasingly being applied to solve problems in science, medicine, and engineering, where data is often limited and may possess noise or corruption.

However, numerous recent studies, including work by my postdoctoral supervisor, Professor Ben Adcock of SFU, have shown that standard training procedures produce DNNs that are not robust to this noise and can fail to detect small structural changes to data.⁵ ⁶ ⁷ ⁸ ⁹ This lack of robustness often means that trained DNNs, which achieve superhuman-level classification of noiseless images on standardized tests, can produce wildly inaccurate predictions for images with small, imperceptible amounts of noise, e.g., labelling a noisy image of a panda as a gibbon (see below).

An imperceptible amount of noise is added to a picture of a panda, which the neural network originally labels “panda” with a 57% confidence score; after the noise is added, the image is labelled “gibbon” with 99.3% confidence.¹⁰ Examples like these highlight why it’s important to study the stability properties of neural networks.
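The panda example above is produced with the fast gradient sign method of Goodfellow et al.¹⁰ A minimal sketch of that idea follows, assuming a hypothetical trained Keras classifier model, an input image with a leading batch dimension and pixel values in [0, 1], and its correct integer label of shape (1,); the perturbation nudges every pixel by a tiny amount eps in whichever direction most increases the network’s loss.

    import tensorflow as tf

    def fgsm_perturbation(model, image, label, eps=0.01):
        """Fast gradient sign method: an imperceptibly perturbed copy of
        `image` that pushes `model` away from the correct `label`."""
        image = tf.convert_to_tensor(image)
        with tf.GradientTape() as tape:
            tape.watch(image)
            prediction = model(image)
            loss = tf.keras.losses.sparse_categorical_crossentropy(label, prediction)
        # Step each pixel by +/- eps in the direction that increases the loss,
        # then keep the result in the valid pixel range.
        gradient = tape.gradient(loss, image)
        return tf.clip_by_value(image + eps * tf.sign(gradient), 0.0, 1.0)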

In order for DL to be embraced in critically important areas such as assisting in medical diagnosis, it is important to understand: 1) how much data is needed to train DNNs to yield accurate predictions, and 2) how robust these trained DNN architectures are to unknown sources of noise.

My work seeks to answer both of these questions by developing a more rigorous testing and analysis framework for approximation with DNNs, and proposing changes to DNN architectures and training processes that improve stability and robustness.

Are there any real-world implications/uses for your research? If so, what are they?

It is hard to overstate how large an effect the DL revolution is already having on the world economy, and will continue to have on society as a whole. PricewaterhouseCoopers estimates that DL and artificial intelligence will contribute up to $15.7 trillion to the global economy in 2030.¹¹

We’re already seeing entire industries transformed through DL, as complicated decision-making processes and routine tasks are increasingly being automated with specialized DNN architectures trained on real-world data.

Another problem with neural networks: trained networks can often fail to detect structural changes in the data (such as the text “can u see it” added to a brain scan), which leads to poor reconstruction performance. The exact reasons for this phenomenon are still under investigation.¹²

Improving the reliability of these methods or reducing the complexity of training, even a small amount, can lead to very large cost-savings in industrial applications.

You’re partway through the PIMS Collaborative Research Group (CRG) work. What point are you hoping to reach by the time it finishes?

The questions posed by the PIMS Collaborative Research Group (CRG) on “High Dimensional Data Analysis” have opened up a whole new direction of research for me.

A key goal in the CRG is to better understand the relationship between sample complexity and the generalization performance of DL, i.e., how well trained DNNs are able to predict new values on unseen data. We’re approaching this problem both through novel theoretical ideas and rigorous numerical experiments.

So far we’ve shown a new theoretical result which implies the existence of a DNN architecture and training procedure that performs as well (up to a constant) as compressed sensing (CS) on certain tasks.

CS is a sparse signal recovery technique that achieves state-of-the-art results in reconstructing MRI images and has been incorporated into software running on MRI machines produced by companies like Siemens and Philips. CS theory also offers rigorous guarantees on the stability and robustness of the reconstruction process with respect to noise.
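As a rough illustration of the sparse recovery idea behind CS (and not of the proprietary algorithms running on commercial MRI scanners), the sketch below recovers a sparse vector from far fewer random measurements than unknowns by solving an ℓ1-regularized least-squares problem with iterative soft thresholding; the problem sizes and parameters are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)

    # A sparse signal: n unknowns, only s of them nonzero.
    n, m, s = 400, 120, 10
    x_true = np.zeros(n)
    x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)

    # Under-sampled measurements y = A x with m << n, as in compressed sensing.
    A = rng.standard_normal((m, n)) / np.sqrt(m)
    y = A @ x_true

    # Iterative soft thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1.
    lam = 0.01
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(n)
    for _ in range(2000):
        z = x - step * A.T @ (A @ x - y)                          # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold

    print("relative recovery error:",
          np.linalg.norm(x - x_true) / np.linalg.norm(x_true))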

The existence of a DNN architecture that achieves the same sample complexity bounds as CS suggests that DL should be competitive with CS on the same problems. However, our recent numerical experiments show that trained DNNs often don’t perform as well as CS when the target function is sparse, highlighting a key disconnect between the theory and practice of DL.

Approximating the indicator function of a square with a shallow neural network (left) and a deeper and wider neural network (right). Deeper networks are theorized to have superior approximation capabilities over shallow networks, but training deeper networks on simple problems often results in overfitting.
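A toy version of this experiment can be reproduced in a few lines. The sketch below, with arbitrary layer widths rather than the exact architectures behind the figure, fits a ReLU network to the indicator function of a square; deleting hidden layers gives the “shallow” variant for comparison.

    import numpy as np
    import tensorflow as tf

    # Target: the indicator function of the square [-0.5, 0.5]^2 inside [-1, 1]^2.
    x = np.random.uniform(-1.0, 1.0, size=(20000, 2)).astype("float32")
    y = (np.max(np.abs(x), axis=1) <= 0.5).astype("float32")[:, None]

    # A deeper/wider ReLU network; remove hidden layers for a shallow comparison.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(x, y, epochs=10, batch_size=128, verbose=0)

    # Held-out points reveal how well the fit generalizes (and whether it overfits).
    x_test = np.random.uniform(-1.0, 1.0, size=(5000, 2)).astype("float32")
    y_test = (np.max(np.abs(x_test), axis=1) <= 0.5).astype("float32")[:, None]
    print(model.evaluate(x_test, y_test, verbose=0))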

To facilitate the technology transfer goals of the CRG, we plan on releasing our open-source testing framework written in Google’s TensorFlow software for DL¹³ on GitHub.

You’re from the USA. How and why did you end up at PIMS-SFU and Vancouver?

I met Professor Adcock in April 2016 at the SIAM Conference on Uncertainty Quantification in Lausanne, Switzerland, and I remembered the good conversations we had on compressed sensing and approximation theory. We kept in touch over the following couple of years, and when I was getting close to graduation, I knew I wanted to apply for the PIMS Postdoctoral Fellowship to work with Ben.

Nick at Kinkaku-ji, a Zen temple in Kyoto, Japan

If you were given an unlimited research budget, what would you like to work on and why?

One of the huge challenges of doing work in machine learning is the cost of access to hardware. The deep neural network models I’ve described are typically trained on Graphics Processing Units (GPUs) or specialized Tensor Processing Units (TPUs), and a single piece of hardware can often cost between $2,000 and $15,000. Models may require tens or hundreds of these specialized cards for training.

Also, the training process itself is very energy-intensive and produces a large amount of excess CO2 emissions. For example, a recent study estimated that training a single natural language processing model for 274,000 GPU hours costs somewhere between $942,000 and $3,300,000 and produces over 626,000 lbs of CO2.¹⁴

In addition to these costs, there’s also development time, hyperparameter tuning and scaling runs that must be performed, and maintenance costs for the facilities.

Here in Vancouver, we are lucky to have access to Compute Canada resources, which offer small allocations for principal investigators at Canadian universities. As a result of this and working with Ben, I’ve had access to the Cedar system at SFU with its vast resources of GPUs for training.

Without this, I would not be able to complete work on this project, due to the significant amount of computing time necessary to train the DNN models. In the process of investigating the stability and robustness properties of the DNNs, I’ve run over 100,000 independent simulations and generated terabytes of data, taking well over 1.5 years of single-GPU compute time on Cedar!

If given an unlimited budget for research, I would design a plan for research and development that accounted for all of these considerations (hardware, software development, and training), in addition to supporting other researchers working on these key challenges in machine learning.

What is your favourite aspect of teaching?

My favourite aspect of teaching is when I can connect something that I’m passionate about to what my students are learning in class. I enjoy showing the students my research and the work that I’m doing, and explaining to them how to apply the concepts they’ve learned to real-world problems.

Nick teaching during the ‘PIMS CRG Summer School: Deep Learning for Computational Mathematics’ event

I also enjoy revisiting the material and gaining a deeper understanding through teaching the students.

You enjoy hiking and have taken some great photos of your travels. Where is your favourite location that you’ve visited?

I visited Japan with my brother in 2015. We met Japanese relatives overseas and explored Tokyo, Osaka, Kyoto, Hiroshima, and the Japanese countryside.

Nick at Sensō-ji, a Buddhist temple in Asakusa, Japan

My favourite part of that trip was biking the Shimanami Kaido, a set of bridges over beautiful islands in the Seto Inland Sea. The views were amazing and there were many interesting sites to stop and see, including Ōkunoshima, an island that was once used for chemical weapons testing but is now overrun with thousands of feral rabbits!

If you could wake up to one view on your next camping trip, what would it be?

When I would go camping in the past, it was usually with a small group of people and we would generally car camp. In Tennessee, I liked to camp in the Tellico Plains area southwest of Knoxville, near streams in the forest.

Taking a well-deserved camping break

I’d love to try camping near a lake where I could fish, or hike to a primitive campsite with a nice view. Someday it would be nice to open my tent and see the view down a mountain slope.

I heard you’re a good cook. What food do you like to cook?

I love to cook a lot of different types of food. I grew up around a lot of Japanese food, and I was never afraid to try new things as a kid.

My favourite food to cook is ramen, though since moving to Vancouver we have so many good ramen shops that I’ve been mostly eating out for ramen instead of cooking it. Cooking ramen takes a long time and is very labour-intensive, but it can be very rewarding, especially when cooking for friends.

Contact Nick

Nick Dexter, Department of Mathematics, Simon Fraser University
Email: nicholas_dexter@sfu.ca
Webpage
Curriculum Vitae

Publications

B. Adcock, N. Dexter. High-dimensional function approximation with ReLU deep neural networks. In preparation (2019).

N. Dexter, H. Tran, C. Webster. A mixed ℓ1 regularization approach for sparse simultaneous approximation of parameterized PDEs. ESAIM: Mathematical Modelling and Numerical Analysis (2019).

N. Dexter, H. Tran, C. Webster. On the strong convergence of forward-backward splitting in reconstructing jointly sparse signals. Preprint available on arXiv:1711.02591 (2017).

A. Chkifa, N. Dexter, H. Tran, C. Webster. Polynomial approximation via compressed sensing of high-dimensional functions on lower sets. Mathematics of Computation (2016).

N. Dexter, C. Webster, G. Zhang. Explicit cost bounds of stochastic Galerkin approximations for parameterized PDEs with random coefficients. Computers & Mathematics with Applications (2016).

Bibliography

  1. A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (2012).
  2. G. E. Dahl, D. Yu, L. Deng, and A. Acero, Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition, IEEE Transactions on Audio, Speech, and Language Processing, 20 (2012), pp. 30–42.
  3. R. Hadsell, P. Sermanet, J. Ben, A. Erkan, M. Scoffier, K. Kavukcuoglu, U. Muller, and Y. Lecun, Learning long-range vision for autonomous off-road driving, Journal of Field Robotics, 26 (2009), pp. 120–144.
  4. C. Farabet, C. Couprie, L. Najman, and Y. LeCun, Scene parsing with multiscale feature learning, purity trees, and optimal covers, Proceedings of the 29th International Conference on Machine Learning (2012).
  5. A. Fawzi, S.-M. Moosavi-Dezfooli, and P. Frossard, The robustness of deep networks: A geometrical perspective, IEEE Signal Processing Magazine, 34 (2017), pp. 50–62.
  6. C. Kanbak, S.-M. Moosavi-Dezfooli, and P. Frossard, Geometric robustness of deep networks: analysis and improvement, (2017).
  7. S.-M. Moosavi-Dezfooli, A. Fawzi, O. Fawzi, and P. Frossard, Universal adversarial perturbations, (2016).
  8. C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, Intriguing properties of neural networks, (2013).
  9. V. Antun, F. Renna, C. Poon, B. Adcock, and A. C. Hansen, On instabilities of deep learning in image reconstruction — Does AI come at a cost? (2019).
  10. I. Goodfellow, J. Shlens, C. Szegedy, Explaining and Harnessing Adversarial Examples, (2014).
  11. A. Rao, G. Verweij, E. Cameron, Sizing the prize, PricewaterhouseCoopers Analysis, url: https://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prize-report.pdf
  12. V. Antun, F. Renna, C. Poon, B. Adcock, and A. C. Hansen, On instabilities of deep learning in image reconstruction — Does AI come at a cost? (2019).
  13. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng: TensorFlow: Large-scale machine learning on heterogeneous systems (2015). url: https://www.tensorflow.org/.
  14. E. Strubell, A. Ganesh, and A. McCallum, Energy and Policy Considerations for Deep Learning in NLP, arXiv preprint arXiv:1906.02243, (2019)
  15. I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016

Pacific Institute for the Mathematical Sciences

PIMS — A consortium of 10 universities promoting research in and application of the mathematical sciences of the highest international calibre.