Towards Data Science
A Medium publication sharing concepts, ideas and code.

Back-to-basics on data science fundamentals

Test yourself! How many of these core statistical concepts are you able to explain?

CLT, CDF, Distribution, Estimate, Expected Value, Histogram, Kurtosis, MAD, Mean, Median, MGF, Mode, Moment, Parameter, Probability, PDF, Random Variable, Random Variate, Skewness, Standard Deviation, Tails, Variance

Got some gaps in your knowledge? Read on!

Note: If you see an unfamiliar term below, follow the link for an explanation.

Random variable

A random variable (R.V.) is a mathematical function that turns reality into numbers. Think of it as a rule to decide what number you should record in your dataset after a real-world event happens.
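As a rough illustration of the "rule" idea, the sketch below treats a coin flip as the real-world event and maps each outcome to a number. The coin example and the name `heads_indicator` are assumptions made for illustration; they do not come from the article.

```python
import random


def heads_indicator(outcome: str) -> int:
    """Hypothetical random variable: record 1 for heads, 0 for tails."""
    return 1 if outcome == "heads" else 0


# Simulate a few real-world events, then apply the rule to each one
# to get the numbers that would be recorded in a dataset.
rng = random.Random(0)  # seeded so the simulation is repeatable
events = [rng.choice(["heads", "tails"]) for _ in range(5)]
dataset = [heads_indicator(event) for event in events]
```

The event itself ("heads" or "tails") is not a number; the function is what decides which number gets written down.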

A random variable is…


Getting Started

How to find weak spots of a regression model

Image by Author

When we analyze machine learning model performance, we often focus on a single quality metric. With regression problems, this can be MAE, MAPE, RMSE, or whatever fits the problem domain best.
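For readers who want the three metrics spelled out, here is a minimal sketch in plain Python. The function names and the toy numbers are assumptions chosen for illustration, and the MAPE version assumes that no true value is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zero targets)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy data, invented for this example.
y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]

print(mae(y_true, y_pred), mape(y_true, y_pred), rmse(y_true, y_pred))
```

On this toy data the absolute errors are 10, 20 and 0, so the MAE is 10, while the RMSE comes out higher because the error of 20 is squared before averaging.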

Optimizing for a single metric absolutely makes sense during training experiments. This way, we can compare different model runs and can choose the best one.

But when it comes to solving a real business problem and putting the model into production, we might need to know a bit more. How well does the model perform on different user groups? What types of errors does it make?
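One simple way to start answering the user-group question is to compute the same metric separately for each segment. The sketch below does this for MAE; the group labels, the data, and the helper name `mae_by_group` are all hypothetical and are not the method used in the post.

```python
from collections import defaultdict


def mae_by_group(groups, y_true, y_pred):
    """Return the mean absolute error for each group label."""
    abs_errors = defaultdict(list)
    for group, t, p in zip(groups, y_true, y_pred):
        abs_errors[group].append(abs(t - p))
    return {group: sum(errs) / len(errs) for group, errs in abs_errors.items()}


# Invented example: two user segments with very different error levels.
groups = ["new", "new", "returning", "returning"]
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 45.0, 25.0]

per_group = mae_by_group(groups, y_true, y_pred)
print(per_group)
```

Here the overall MAE is 8.5, which hides the fact that the model is off by 2 on average for "new" users and by 15 for "returning" users.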

In this post…


Podcast

David Roodman on what happens when AI pushes us off the edge of the map

To select chapters, visit the YouTube video here.

Editor’s note: This episode is part of our podcast series on emerging problems in data science and machine learning, hosted by Jeremie Harris. Apart from hosting the podcast, Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below:

APPLE | GOOGLE | SPOTIFY | OTHERS

There’s a minor mystery in economics that may suggest that things are about to get really, really weird for humanity.

And that mystery is this: many economic models predict that, at some point, human economic output will become infinite.

Now…


ML OBSERVABILITY SERIES

Performance Monitoring of ML Models

As machine learning infrastructure has matured, the need for model monitoring has surged. Unfortunately, this growing demand has not led to a foolproof playbook that explains to teams how to measure their models' performance.

Performance analysis of production models can be complex, and every situation comes with its own set of challenges. Unfortunately, not every model application scenario has an obvious path to measuring performance like the toy problems that are taught in school.

In this piece we will cover a number of challenges connected to availability of ground truth and discuss the performance metrics that are available to measure…


These 10 tools are helping me study data science in 2021

Photo by Roberto Catarinicchia on Unsplash

Learning data science is hard. Figuring out which resources you should be using to learn data science is even harder.

In the last four years, the internet has become inundated with resources and tools to help people learn data science — so much so, that it can be intimidating to look at the wall of resources available to you and try to decide which one will help you the most.

After some experimentation and research, I’ve found that these ten tools and resources have been the most instrumental in streamlining my learning process. The best part is that they’re free!

1. freeCodeCamp


Here are 3 options for analytics platforms to switch to

Chartio has joined Atlassian and will be sunsetting the Chartio analytics platform (image by author)

You may have heard that Chartio is being acquired by Atlassian. Following the acquisition, it looks like they will be sunsetting the analytics platform that many companies have come to rely on.

Current customers will have one year to transition to a new vendor to get their analytics needs met. Here’s what they said in their announcement/migration guide:

“If you haven’t seen our announcement, Chartio has joined Atlassian and the product will be shutting down on March 1, 2022. …


What to know and how to fight ageism in technical roles

Photo by Alessio Ferretti on Unsplash

You often hear the horror stories which point to ageism:

“They won’t hire me because I have no experience” (too young)

“They won’t hire me because I have too much experience” (too old)

“They hired some younger cheaper person to replace me” (too expensive)

There is a lot of published research on this (some listed at the end of this article) — all point to an inconvenient truth — old people aren’t in demand.

This guide is to help describe one common path for engineers. There are many other paths to consider but most go from engineer, to tech lead…


Take a “moment” to explore some fundamentals

This article takes you on a tour of the most popular parameters in statistics! If you’re not sure what a statistical parameter is or you’re foggy on how probability distributions work, I recommend scooting over to my beginner-friendly intro here in Part 1 before continuing here.

Get your distribution basics in Part 1 if you’re new to this space. Image: SOURCE.

Note: If a concept is new to you, follow the link for my explanation. If the early stuff feels too technical, feel free to skip to the cuddly critter memes lower down.

Ready for the list of favorites? Let’s dive right in!

Mean

This word is pronounced “average.”

Expected value

An expected value, written as E(X) or…


Bioinformatics

List of reasons why almost every bioinformatician uses Linux instead of Windows

Image by Open Clip Art from Pixabay

Introduction

Before we touch on the main topic, let me first introduce you to bioinformatics. Bioinformatics is a discipline that bridges computational studies (computer science, statistics, data engineering) and biology. Bioinformaticians help biologists store very large biological datasets, perform computational analyses, and transform biological queries into understandable results.

If you are a bioinformatician or have worked with one before, you probably realize one thing. For most of their work, bioinformaticians do not use Windows.

The reason is quite simple: most bioinformatics work can't be done on Windows. And even when it is possible, there are a…


With examples in R and Python

Hypothesis testing is one of the most fundamental elements of inferential statistics. In modern languages like Python and R, these tests are easy to conduct — often with a single line of code. But it never fails to puzzle me how few people use them or understand how they work. In this article I want to use an example to show three common hypothesis tests and how they work under the hood, as well as showing how to run them in R and Python and to understand the results.

The general principles and process of hypothesis testing

Hypothesis testing exists because it is almost never the case that…
