

[Short Review] MIT OCW 6.041 Probabilistic Systems Analysis and Applied Probability

Learn on YouTube and do some of the recitations


Why should we learn probability theory and its applications when we now have instant tools like machine learning (ML) and artificial intelligence (AI), or even platforms that help us build ML models without a single line of code? According to Causal Inference for the Brave and True, we should aim for a deeper understanding, the true value, instead of just the tip of the iceberg. To clarify, I don't mean that we should ignore the practical point of view and focus solely on theoretical concepts; instead, we should know both.

“Last but not least, remember that there are no shortcuts. Knowledge in Math and Statistics are valuable precisely because they are hard to acquire. If everyone could do it, excess supply would drive its price down. So toughen up! Learn them as well as you can. And heck, why not? have fun along the way as we embark on this quest only for the Brave and True.” — Introduction to causality

Who’s this course for?

Any beginner who wants to build a foundation in probability, or anyone who wants a reminder of how it works. It also suits any ML or AI enthusiast, or anyone who just wants to tap into this field.

For me, this subject used to be a pain. I studied the theory but lost the connection between it and real-world problems. All I could remember was a big mess. Now I come back to this subject the way you return to a scary memory, facing the undeniable fact that it's invaluable.

Photo by Ana M. on Unsplash

Course Content Overview

This course consists of 2 main parts: Probabilistic Model and Application of Probability. Across these 2 parts, Professor John Tsitsiklis explains everything in a very simple yet rigorous manner.

In the first part, the professor guides you through the foundations of probability by asking why and how. He derives the theory from simple examples, which helps you build solid intuition and a step-by-step approach in your head. Once you complete the first part, you will be curious about applications of what you've studied; he then introduces them through inference, important theorems, and random processes. I have listed the course outline below.

Probabilistic Model

  1. Probability Models & Axioms
  2. Conditional Probability & Bayes’ Rule
  3. Independence
  4. Counting
  5. Discrete Random Variables
  6. Continuous Random Variables
  7. Continuous Bayes’ Rule
  8. Derived Distribution
  9. Covariance
  10. Iterated Expectation

Applied Probability

  1. Bernoulli process
  2. Poisson process
  3. Markov Chain
  4. Weak Law of Large Numbers (WLLN)
  5. Central Limit Theorem (CLT)
  6. Bayesian Statistical Inference
  7. Classical Statistical Inference

Additional resource

In addition to the course videos on YouTube, you can dive deeper into the probabilistic framework with the textbook. In the book, you can practice with both advanced theoretical problems and applied problems related to real applications. As your understanding develops, you will see the wide range of ways to apply it.

A probabilistic model, or framework, is a pretty flexible tool; that's why it's hard to grasp at first but benefits you in the long run. Each industry views probability under a different name: risk, reliability, opportunity, score, etc. I think it is up to you to explore this yourself and decide how to frame a problem so that this framework answers your most valuable question.

Impact & Comments

I finished this course in Q3 2021, and I feel I can use the probabilistic framework more in my day-to-day work. When I approach a problem, I think of conditional probability more often and have a probability mechanism in my head. I have started to ask questions like these more often: Is it because of luck, or does this program really help? When I observe something happening, does it come from pure luck? How likely is this event to happen? Is this event independent of another?

I can better read probability-related documentation, e.g. the skew-normal distribution in Python's SciPy, hypothesis testing, etc. But as with any other profession or skill set, you need to keep practicing until it becomes one of the tools you are proficient with. This skill takes time to build, and it's worth remembering that most people are very bad at probability, according to The Drunkard's Walk by Leonard Mlodinow. For example, two characteristics can never be more likely to be present at the same time than either one alone, e.g. a Data Scientist with both a strong mathematical background and proficient coding skills. Yet we feel otherwise all the time, which suggests that our brains are not particularly good at this.
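The point about two characteristics can be checked by simple counting. Here is a minimal sketch, assuming a made-up pool of 100 data scientists; every count below is invented for illustration and is not real data.

```python
from fractions import Fraction

# Hypothetical counts, invented purely for illustration.
total = 100        # data scientists in the pool
strong_math = 30   # have a strong mathematical background
strong_code = 40   # have proficient coding skills
both = 15          # have both traits (cannot exceed either count above)

p_math = Fraction(strong_math, total)
p_code = Fraction(strong_code, total)
p_both = Fraction(both, total)

# People with both traits are a subset of the people with each single trait,
# so the joint probability can never exceed either individual probability.
print("P(math) =", p_math)
print("P(code) =", p_code)
print("P(math and code) =", p_both)
print("Joint <= each single one:", p_both <= min(p_math, p_code))
```

Whatever numbers you plug in, the last line stays true as long as `both` is a valid count, because the joint event is always contained in each single event.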

Nevertheless, I admit that you might not get the chance to use it very often if most of your day is all about ETL, dashboards, or QA of data mismatches. But to truly derive value from data, this is one possible way you could consider. It gives you a very simple yet powerful way to frame and solve problems. Take conditional probability: combined with Bayes' theorem, it gives deeper insight because you know how likely one thing is to happen when you observe another thing happening. You can borrow the idea of partitioning from conditional probability and apply it to any categorical variable in your data, for example a boolean like 'customer activates product A' vs. 'customer doesn't activate product A'. Then you can calculate probabilities on top of each subset, which gives you information to back your decision-making process.
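As a rough illustration of that partitioning idea, here is a small sketch in plain Python. The customer records, the second variable `churned`, and the helper `prob` are all assumptions made up for this example; they do not come from the course or from any real dataset.

```python
from fractions import Fraction

# Hypothetical records: (activated_product_a, churned). Invented data.
customers = (
    [(True, True)] * 5 + [(True, False)] * 35
    + [(False, True)] * 20 + [(False, False)] * 40
)

def activated(c):
    return c[0]

def not_activated(c):
    return not c[0]

def churned(c):
    return c[1]

def prob(event, given=lambda c: True):
    """P(event | given), estimated by counting matching records."""
    conditioned = [c for c in customers if given(c)]
    return Fraction(sum(1 for c in conditioned if event(c)), len(conditioned))

# Partition customers by the boolean variable and condition on each part.
p_churn_given_a = prob(churned, given=activated)
p_churn_given_not_a = prob(churned, given=not_activated)

# Total probability theorem: combine the parts, weighted by their size.
p_churn = (p_churn_given_a * prob(activated)
           + p_churn_given_not_a * prob(not_activated))

# Bayes' rule: flip the conditioning to get P(activated | churned).
p_a_given_churn = p_churn_given_a * prob(activated) / p_churn

print("P(churn | A) =", p_churn_given_a)
print("P(churn | not A) =", p_churn_given_not_a)
print("P(churn) =", p_churn)
print("P(A | churn) =", p_a_given_churn)
```

Using `Fraction` keeps the arithmetic exact, so the Bayes result can be compared directly against the value obtained by counting churned customers.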

Knowledge of this kind can help you improve your coding too. When you don't know how to solve a problem in Python, you might code it in the simplest form you can think of. With the theory or a neat method in hand, you can pick up a solution quickly, and your code can turn out cleaner, more concise, and faster. I haven't had the chance to test this out myself yet, but a recursive probability function based on the total probability theorem seems promising.
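One possible reading of "recursive probability function with total probability theorem" is sketched below; it is my own illustration, not something taken from the course. The function name `prob_heads` is hypothetical. It computes the probability of exactly k heads in n coin flips by conditioning on the outcome of the first flip, then checks the result against the binomial closed form.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def prob_heads(n, k, p):
    """P(exactly k heads in n independent flips), with P(heads) = p."""
    if k < 0 or k > n:
        return Fraction(0)   # impossible outcome
    if n == 0:
        return Fraction(1)   # no flips and k == 0: certain
    # Total probability theorem, conditioning on the first flip:
    # heads (prob p) leaves k - 1 heads to get from n - 1 flips,
    # tails (prob 1 - p) leaves k heads to get from n - 1 flips.
    return p * prob_heads(n - 1, k - 1, p) + (1 - p) * prob_heads(n - 1, k, p)

p = Fraction(1, 2)
result = prob_heads(10, 3, p)
closed_form = comb(10, 3) * p**3 * (1 - p)**7

print("Recursive:", result)
print("Matches binomial formula:", result == closed_form)
```

The `lru_cache` decorator memoizes repeated subproblems, which is what keeps the recursion from blowing up as n grows.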

Photo by Clemens van Lay on Unsplash


What you need

  1. Tons of willpower: especially if you have not gotten along with this subject well in the past.
  2. Time: 25 hours of video if watched nonstop. It took me roughly 60–75 hours to complete while taking notes.
  3. Time: recitations + textbook. This depends on how you want to approach it; you may skip this part.


Pros

  1. Rigorous teaching style with visual understanding
  2. Highly intuition-driven
  3. Clear and concise, with bite-sized knowledge
  4. Moves easily from simple examples to the general case
  5. Some useful examples related to real-world problems


Cons

  1. Low video quality, and the sound is too quiet
  2. No code and no code examples


Resources

  1. Full playlist on YouTube
  2. Full course content on the MIT OCW website, including recitations, assignments, and solutions. To get a full understanding or challenge yourself more, please check it out!
  3. Book website

If you love this course, please consider donating to MIT OCW :)


