High-quality math resources that helped me become an Amazon ML Scientist

Abhishek Divekar
May 7, 2022


Photo by Ashin K Suresh on Unsplash

A previous post walked through my journey from Amazon Software Engineer to ML Scientist. Digging into the math behind various ML approaches is one of the biggest differentiators between my previous and current skillset. In this post, I am sharing the resources I used to help me build this depth.

As I’ve mentioned before:

“The math you need for ML is Probability, Statistics, Linear Algebra and Calculus” — Internet

While this is not untrue, it’s much more accurate to say:

“The math you need for Machine Learning is 50% Probability & Statistics, 35% Linear Algebra and 15% Calculus”

Probability and Statistics really is the mathematical language of Machine Learning (or Statistical Learning, as it used to be called).

Personally, I am a big believer in finding high-quality resources. My reasoning is that for most well-established topics (i.e. topics you would find in a textbook rather than only in research papers) there is always a resource somewhere that is simple enough for me to understand. I hoard these PDFs and weblinks quite greedily, and now I am sharing them. Please add your favorite resources in the comments.

Note that I do not necessarily read each book cover-to-cover. Usually the first 5–6 chapters lay a lot of groundwork and are worth understanding in depth and taking notes on. Later chapters cover advanced topics, and can be read individually when they are relevant.

With that, let’s dive into the list.

Updates:
May 2022: first version of this article
Dec 2022: added Geoff Cumming’s book in “Statistics”

Probability

[Blitzstein and Hwang, 2019] Introduction to Probability, 2nd edition

By far my favorite book on Probability is “Introduction to Probability 2nd edition” by Harvard professors Joe Blitzstein and Jessica Hwang.

  • The PDF is available for free at http://probabilitybook.net, and the authors have created an animated edX course which follows the book.
  • Great visualizations of the most important concepts, and clear explanations of the others, with several examples.
  • What I love about this book is that it avoids the heavier theoretical machinery of probability: σ-algebras, Borel sets, and measure theory. Yet it is able to cover a fairly extensive syllabus: Counting, Conditional Probability, Discrete and Continuous Random Variables, Expectations, Variance and Moments, about a dozen popular distributions, the Central Limit Theorem, various probability inequalities, and Markov chain Monte Carlo. This book truly is an “introduction”, and each topic is explained in beautifully simple language.

[Pishro-Nik] Introduction to Probability, Statistics, and Random Processes

This is another easy-to-follow introduction to probability which does not go into measure theory. The author is a Professor of Electrical and Computer Engineering at UMass Amherst.

  • The entire book is available for free at: https://www.probabilitycourse.com. You can otherwise buy it on Amazon. There are also a limited number of lecture videos available.
  • I particularly like Chapter 8 (Statistical Inference I: Classical methods) and Chapter 9 (Statistical Inference II: Bayesian Inference), as they go into Random Sampling, Confidence Intervals and Hypothesis Testing, in both the Frequentist and Bayesian settings.
  • This book does not have too many visualizations, unfortunately. However, it does have a lot of examples to walk through the concepts.

[Guy Lebanon] The Analysis of Data, Volume 1: Probability

TAOD (http://theanalysisofdata.com/probability) is a website which I was surprised I kept visiting. To date, TAOD Chapter 4 is one of the only places I could find a compendium of formulae for multivariate random variables (i.e. random vectors), which are crucial in Machine Learning. The website may not be beautifully formatted, but the notation is very explicit and brings a lot of clarity.

Statistics

Probability is not the same as Statistics, but they are very intertwined. I like to think of Statistics as “applied” Probability. Statistics is also closer to Machine Learning as a discipline.

[Geoff Cumming and Robert Calin-Jageman, 2016] Introduction to the New Statistics: Estimation, Open Science, and Beyond.

An excellent introductory book on Statistics, perhaps the best I have found after extensive searching. The first author is a retired academic, whose research area was psychology, not statistics. The book is likewise written for budding scientific practitioners, not theorists.

I wish I had read this book back when I decided I really wanted to be an ML Scientist. In addition to showing how to do statistics-driven research, the authors explain why statistics has become so important in scientific research, and share some wisdom about Science itself (including the bad parts, such as publication bias and reproducibility issues).

There’s barely any math, so this can’t be your final book on statistics. But it is a great book if you have not been formally educated in statistics/research methods, or want an intuitive re-introduction. Topics covered are statistics fundamentals (normal distribution, sampling, confidence intervals, hypothesis tests, linear regression) and research fundamentals (reasoning about uncertainty, experimental design, meta-analysis, open science). Each topic is driven home by (i) explaining it from multiple angles and (ii) helpful visuals (like the one below). The exercises at the end of each chapter really helped solidify my understanding: each question only requires thinking for a minute, and the answers can be found on the next page.

The “dance of the means”, depicted in Cumming’s book, is used to illustrate sampling variability.
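To get a feel for the idea without the figure, here is a minimal sketch of sampling variability in Python. It is my own illustration, not code from the book: the function name `dance_of_the_means` and the population parameters (mean 50, standard deviation 20, samples of size 15) are assumptions chosen only for the demo. It draws many samples from the same population and shows that the sample means “dance” around the true mean, with a spread close to the standard error σ/√n.

```python
import random
import statistics


def dance_of_the_means(pop_mean=50.0, pop_sd=20.0, n=15, num_samples=25, seed=0):
    """Draw repeated samples from one normal population; return each sample's mean.

    All parameter values are illustrative assumptions, not taken from the book.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


means = dance_of_the_means()

# Each sample comes from the same population, yet the means differ every time.
for m in means[:5]:
    print(f"sample mean = {m:6.2f}")

# The spread of the sample means should be roughly sigma / sqrt(n).
print(f"SD of the sample means     = {statistics.stdev(means):.2f}")
print(f"theoretical standard error = {20.0 / 15 ** 0.5:.2f}")
```

Increasing `n` makes the dance calmer (smaller standard error), which is the intuition behind narrower confidence intervals for larger samples.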

[Wasserman 2004] All of Statistics

A classic text which I’ve only been introduced to recently. It does not go into measure theory, but it also does not hold your hand. It’s a rapid-fire revision of a wide variety of probability and statistics definitions (including applications such as regression, classification, maximum likelihood, causal inference, graphical models, etc).

Linear Algebra

[van de Geijn and Myers] LAFF at UTexas Austin

While I have heard Gilbert Strang’s book is great, my own in-depth exposure to Linear Algebra came from http://www.ulaff.net, a site maintained by UTexas Austin professors Robert van de Geijn and Maggie Myers. The content here is also available for free.

  • Undergraduate book (ULAFF): can be downloaded from here. It introduces a very intuitive notation for computing with vectors and matrices, which blew my mind when I first saw it. There is also an accompanying edX course, taught by the professors, which has very high-quality visualizations. You learn MATLAB through this course. The professors are super responsive on edX.
  • Advanced Linear Algebra (ALAFF): this is a graduate course on Linear Algebra used for computing: matrix decompositions, least squares, etc. However, the important content is re-introduced from scratch, so you will not need to take ULAFF. The course website is a new way of presenting content, integrating videos and notes into an overall great experience. You also learn MATLAB through this course. I recommend solving all the practice problems.
  • I personally found the explanation of eigenvectors, least squares (i.e. Linear Regression) and SVD super useful for my Machine Learning work.
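As a small illustration of why least squares and Linear Regression are the same thing, here is a minimal sketch in plain Python. It is my own example, not material from LAFF: the helper name `fit_line` and the data points are made up for the demo. It fits a line by solving the 2×2 normal equations (AᵀA)w = Aᵀy directly, and then checks the key geometric fact that the residual is orthogonal to the columns of A. In practice you would use a QR or SVD-based solver rather than the normal equations, which can be numerically fragile.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ≈ intercept + slope * x via the normal equations.

    Hypothetical helper for illustration; A is the matrix with rows [1, x].
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations (A^T A) w = A^T y:
    #   [ n   sx  ] [intercept]   [ sy  ]
    #   [ sx  sxx ] [slope    ] = [ sxy ]
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("all x values are identical; the fit is not unique")
    # Solve the 2x2 system with Cramer's rule.
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Made-up data that roughly follows y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
# The residual is orthogonal to both columns of A (the ones and the xs).
print(f"sum of residuals     = {sum(residuals):.2e}")
print(f"sum of x * residuals = {sum(x * r for x, r in zip(xs, residuals)):.2e}")
```

The two sums printed at the end are zero up to floating-point error: that orthogonality is exactly what “projecting y onto the column space of A” means.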

I was super lucky to have taken their Advanced Linear Algebra course as part of my UT curriculum. Fun note: the professors are married in real life!

Calculus

[Guichard 2022] Single and Multivariable Calculus

One of the simplest books I have found on calculus. This is a community project hosted by the Mathematics and Statistics department at Whitman College, where Dr. Guichard works.

  • The PDF is available for free at the Whitman College website.
  • Great visualizations of various calculus topics. I personally found Chapter 14 “Partial Differentiation” to be very useful for understanding gradients and Hessians. Chapters 2–6 (regarding single-variable derivatives) and Chapters 7–9 (integration) were also a good refresher when I was studying Convex Optimization.

That’s all for now! I will update this list with more resources as I come across them. Please let me know in the comments if these resources helped you, or if I have missed your favorite resources!

P.S.! You probably don’t know this, but I don’t make money from Medium posts. And I don’t need it, either. However, you’d bring a smile to my face if you make a donation in my name to https://15outof10.org, the non-profit founded by the WeRateDogs® Twitter account to help dogs with medical issues become more adoptable. Seeing happy dogs on the internet motivates me to write more high-quality posts :)

Photo by Jamie Street on Unsplash
