The Math Required for Machine Learning

For the past year, I’ve been implementing well-known model architectures and building web applications, so I have a fair amount of refreshing to do as I come back to theoretical machine learning. Much of that means understanding machine learning’s underlying mathematics rigorously enough to reason about the field and to validate radically new architectures. To that end, I’ve put together a short syllabus that I’ll be working through myself to review the math.

Keep in mind that there are a lot of excellent resources out there. I’ll no doubt update this with a better guide as I work through the material over the next few weeks.

Resources to Study Math

A solid grasp of the underlying mathematics is necessary for reasoning about ML productively.

That being said, I think you can learn what you need as you go along, so I’d recommend getting a basic familiarity through the Mathematics for Machine Learning Specialization on Coursera. It’s pleasantly tough, and it gets you where you need to go quickly.

If you’re starting from scratch, these are the topics you should cover, in any order:

  1. Linear Algebra — Professor Strang’s textbook and MIT OpenCourseWare course are recommended for good reason. Khan Academy also has some great resources, and there is a helpful set of review notes from Stanford.
  2. Multivariate Calculus — Again, MIT OpenCourseWare has good courses, and so does Khan Academy.
  3. Probability — Stanford’s CS 229, a course I mention again below, has an awesome probability review worth checking out.

Once you’ve finished the resources above, I’d say you’re in a great place to tackle Andrew Ng’s Coursera course or its more mature, mathematically rigorous older brother, CS 229.


Recently, I’ve been trying to figure out how to answer a nebulous question that formed in my mind while working on my thesis: Are modular neural networks with adaptive topology capable of representing large, complex, hierarchical problem spaces effectively?

I’d be hard-pressed to call it a research question, since it’s such a broad topic, but my intuition keeps serving up the answer yes. I’ll write more about this in a later post and explain why I’m particularly excited about it.