Test yourself! How many of these core statistical concepts are you able to explain?
CLT, CDF, Distribution, Estimate, Expected Value, Histogram, Kurtosis, MAD, Mean, Median, MGF, Mode, Moment, Parameter, Probability, PDF, Random Variable, Random Variate, Skewness, Standard Deviation, Tails, Variance
Got some gaps in your knowledge? Read on!
Note: If you see an unfamiliar term below, follow the link for an explanation.
A random variable (R.V.) is a mathematical function that turns reality into numbers. Think of it as a rule to decide what number you should record in your dataset after a real-world event happens.
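To make the "rule that turns reality into numbers" idea concrete, here is a minimal sketch in Python. The coin-flip setting and the names `coin_flip_rv`, `outcomes`, and `dataset` are illustrative assumptions, not something taken from the article.

```python
import random

def coin_flip_rv(outcome: str) -> int:
    """A random variable as a plain rule: map a real-world outcome to a number."""
    return 1 if outcome == "heads" else 0

# Simulate ten real-world events (hypothetical coin flips, seeded for repeatability).
rng = random.Random(0)
outcomes = [rng.choice(["heads", "tails"]) for _ in range(10)]

# Apply the rule to decide which number goes into the dataset for each event.
dataset = [coin_flip_rv(outcome) for outcome in outcomes]

print(outcomes)
print(dataset)
```

The function itself contains no randomness. The randomness lives in which outcome happens, and the random variable only decides what gets recorded.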
A random variable is…
When we analyze machine learning model performance, we often focus on a single quality metric. With regression problems, this can be MAE, MAPE, RMSE, or whatever fits the problem domain best.
Optimizing for a single metric absolutely makes sense during training experiments. This way, we can compare different model runs and can choose the best one.
But when it comes to solving a real business problem and putting the model into production, we might need to know a bit more. How well does the model perform on different user groups? What types of errors does it make?
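As a rough illustration of the difference between one aggregate metric and a per-group view, the sketch below computes MAE, RMSE, and MAPE on a tiny made-up dataset and then breaks MAE down by user group. The data, the group labels, and the helper names (`mae`, `rmse`, `mape`, `mae_by_group`) are all assumptions for the sake of the example.

```python
import math
from collections import defaultdict

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no true value is zero)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def mae_by_group(y_true, y_pred, groups):
    """MAE computed separately for each group label."""
    errors = defaultdict(list)
    for t, p, g in zip(y_true, y_pred, groups):
        errors[g].append(abs(t - p))
    return {g: sum(errs) / len(errs) for g, errs in errors.items()}

# Hypothetical targets, predictions, and user segments.
y_true = [100, 200, 300, 400]
y_pred = [110, 190, 330, 360]
groups = ["new", "new", "returning", "returning"]

print("MAE:", mae(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("MAPE (%):", mape(y_true, y_pred))
print("MAE by group:", mae_by_group(y_true, y_pred, groups))
```

In this toy data the overall MAE hides the fact that the absolute error is several times larger for one group than for the other, which is the kind of detail a single aggregate metric cannot show.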
In this post…
Editor’s note: This episode is part of our podcast series on emerging problems in data science and machine learning, hosted by Jeremie Harris. Apart from hosting the podcast, Jeremie helps run a data science mentorship startup called SharpestMinds.
There’s a minor mystery in economics that may suggest that things are about to get really, really weird for humanity.
And that mystery is this: many economic models predict that, at some point, human economic output will become infinite.
As machine learning infrastructure has matured, the need for model monitoring has surged. Unfortunately, this growing demand has not led to a foolproof playbook that explains to teams how to measure their models’ performance.
Performance analysis of production models can be complex, and every situation comes with its own set of challenges. Unfortunately, not every model application scenario has an obvious path to measuring performance like the toy problems that are taught in school.
In this piece, we will cover a number of challenges connected to the availability of ground truth and discuss the performance metrics that are available to measure…
Learning data science is hard. Figuring out which resources you should be using to learn data science is even harder.
In the last four years, the internet has become inundated with resources and tools to help people learn data science, so much so that it can be intimidating to look at the wall of resources available to you and try to decide which one will help you the most.
After some experimentation and research, I’ve found that these ten tools and resources have been the most instrumental in streamlining my learning process. The best part is that they’re free!
You may have heard that Chartio is being acquired by Atlassian. Following the acquisition, it looks like they will be sunsetting the analytics platform that many companies have come to rely on.
Current customers will have one year to transition to a new vendor to get their analytics needs met. Here’s what they said in their announcement/migration guide:
“If you haven’t seen our announcement, Chartio has joined Atlassian and the product will be shutting down on March 1, 2022. …
You often hear horror stories that point to ageism:
“They won’t hire me because I have no experience” (too young)
“They won’t hire me because I have too much experience” (too old)
“They hired some younger cheaper person to replace me” (too expensive)
There is a lot of published research on this (some listed at the end of this article), and it all points to an inconvenient truth: old people aren’t in demand.
This guide describes one common path for engineers. There are many other paths to consider, but most go from engineer, to tech lead…
This article takes you on a tour of the most popular parameters in statistics! If you’re not sure what a statistical parameter is or you’re foggy on how probability distributions work, I recommend scooting over to my beginner-friendly intro in Part 1 before continuing.
Note: If a concept is new to you, follow the link for my explanation. If the early stuff feels too technical, feel free to skip to the cuddly critter memes lower down.
Ready for the list of favorites? Let’s dive right in!
This word is pronounced “average.”
Before we touch on the main topic, let me first introduce you to bioinformatics. Bioinformatics is a discipline that bridges computational studies (computer science, statistics, data engineering) and biology. Bioinformaticians help biologists store very large biological datasets, perform computational analyses, and transform biological queries into understandable results.
If you are a bioinformatician or have worked with one before, you probably realize one thing. For most of their work, bioinformaticians do not use Windows.
The reason is quite simple, really: most bioinformatics work can't be done in Windows. And even when it is possible, there are a…
Hypothesis testing is one of the most fundamental elements of inferential statistics. In modern languages like Python and R, these tests are easy to conduct, often with a single line of code. But it never fails to puzzle me how few people use them or understand how they work. In this article, I want to use an example to show three common hypothesis tests and how they work under the hood, as well as how to run them in R and Python and how to interpret the results.
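The excerpt does not name the three tests it covers, so the sketch below does not try to reproduce them. Instead, it uses a two-sample permutation test, a different and deliberately simple technique, to illustrate the under-the-hood logic that most hypothesis tests share: compute a statistic on the observed data, then ask how often a statistic at least that extreme would arise if the null hypothesis were true. It relies only on the Python standard library, and the sample values and the function name `permutation_test` are assumptions made for illustration.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns an approximate p-value: the share of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled data and split it back into two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)

# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [6.9, 7.2, 6.5, 7.0, 7.4, 6.8]

p_value = permutation_test(group_a, group_b)
print("approximate p-value:", p_value)
```

A small p-value here means that random relabeling rarely produces a gap as large as the observed one, which is evidence against the hypothesis that both groups share the same mean.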
Hypothesis testing exists because it is almost never the case that…