# Optimizing Hyperparameters the Right Way

## Efficiently exploring the hyperparameter search space through Bayesian Optimization with skopt in Python. TL;DR: my hyperparameters are always better than yours.

In this post, we will build a machine learning pipeline using multiple optimizers and use the power of Bayesian Optimization to arrive at the optimal configuration for all our parameters. All we need is the sklearn Pipeline and Skopt.
You can use your favorite ML models, as long as they have a sklearn wrapper (looking at you, XGBoost and NGBoost).

The critical point in finding the best model for a problem is not just the model itself. We also need to find the parameters that make our model perform optimally on the given dataset. This is called hyperparameter search or tuning. …
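
To make the idea concrete, here is a minimal, numpy-only sketch of the loop that skopt automates: fit a Gaussian-process surrogate to the points evaluated so far, pick the next candidate by expected improvement, evaluate it, and repeat. The toy objective `validation_loss` is a hypothetical stand-in for a cross-validated pipeline score, and the kernel length scale, search range and budget are arbitrary assumptions; in practice you would hand this to skopt (for example `gp_minimize` or `BayesSearchCV`) rather than write it yourself.

```python
import math
import numpy as np

def validation_loss(log_c):
    # Hypothetical stand-in for a cross-validated pipeline loss as a function
    # of one hyperparameter (e.g. log10 of a regularization strength).
    return (log_c - 0.7) ** 2 + 0.1 * math.sin(3.0 * log_c)

def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential covariance between two sets of 1-D points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    # Exact GP regression with a zero-mean prior on standardized targets.
    mu_y, sd_y = y_train.mean(), y_train.std() + 1e-12
    y = (y_train - mu_y) / sd_y
    k = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_s = rbf_kernel(x_train, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    mean = k_s.T @ alpha
    v = np.linalg.solve(chol, k_s)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)
    return mean * sd_y + mu_y, np.sqrt(var) * sd_y

def expected_improvement(mean, std, best):
    # EI for minimization: E[max(best - f, 0)] under the GP posterior.
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf

def bayes_opt(objective, low, high, n_init=3, n_iter=12, seed=0):
    rng = np.random.default_rng(seed)
    grid = np.linspace(low, high, 500)            # candidate points for the acquisition
    x = rng.uniform(low, high, size=n_init)       # a few random starting evaluations
    y = np.array([objective(v) for v in x])
    for _ in range(n_iter):
        mean, std = gp_posterior(x, y, grid)
        x_next = grid[np.argmax(expected_improvement(mean, std, y.min()))]
        x = np.append(x, x_next)
        y = np.append(y, objective(x_next))
    best = np.argmin(y)
    return x[best], y[best]

best_x, best_y = bayes_opt(validation_loss, -3.0, 3.0)
print(f"best log_c = {best_x:.3f}, loss = {best_y:.4f}")
```

The surrogate is what makes this cheaper than a grid search: each expensive evaluation updates the GP, and the acquisition function trades off exploring uncertain regions against exploiting regions that already look good.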

# Millennial Suicides | A Probabilistic Change-Point Analysis

## A simple, yet meaningful probabilistic Pyro model to uncover change-points over time.

One prominent claim made by the media is that the suicide rate among younger people in the UK rose from the 1980s to the 2000s. You may come across it in the news or in publications, or simply as a truth accepted by the population. But how can you make this measurable?

# Making an assumption tangible

To make this claim testable, we look for data and find an overview of suicide rates, specifically for England and Wales, at the UK Office for National Statistics (ONS), together with an overall visualization.

https://www.ons.gov.uk/visualisations/dvc661/suicides/index.html

Generally, one essential type of question to ask, whether in everyday life, business or academia, is when changes have occurred. Suppose you assume that something has fundamentally changed over time. To prove it, you have to quantify it: you collect data on the subject matter and build a model that identifies the points at which values changed, as well as the magnitude of those changes. We are going to look at exactly how to do that. …
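
As a minimal sketch of the idea, assume yearly counts follow a Poisson distribution whose rate switches once, at an unknown year index tau, with a Gamma prior on the rate before and after the switch. Unlike the Pyro model in the post, this version simply enumerates every candidate change-point and computes its exact posterior through the conjugate Gamma-Poisson marginal likelihood. The prior values `A`, `B` and the counts are assumptions; the data are synthetic, not ONS figures.

```python
import math
import numpy as np

A, B = 1.0, 0.1   # assumed Gamma(shape=A, rate=B) prior on each Poisson rate

def log_marginal(counts, a=A, b=B):
    # log p(counts) for i.i.d. Poisson(lam) counts with lam ~ Gamma(a, b) integrated out.
    n, s = len(counts), float(np.sum(counts))
    return (a * math.log(b) - math.lgamma(a)
            + math.lgamma(a + s) - (a + s) * math.log(b + n)
            - sum(math.lgamma(c + 1.0) for c in counts))

def changepoint_posterior(counts):
    # Uniform prior over split index tau = 1..n-1: the rate changes
    # between counts[tau - 1] and counts[tau].
    taus = np.arange(1, len(counts))
    log_post = np.array([log_marginal(counts[:t]) + log_marginal(counts[t:]) for t in taus])
    log_post -= log_post.max()            # stabilize before exponentiating
    post = np.exp(log_post)
    return taus, post / post.sum()

rng = np.random.default_rng(1)
# Synthetic yearly counts: rate 40 for 15 years, then rate 60 for 15 years.
counts = np.concatenate([rng.poisson(40, 15), rng.poisson(60, 15)])
taus, post = changepoint_posterior(counts)
tau_map = taus[np.argmax(post)]

# Magnitude of the change: posterior mean of each rate, (A + sum) / (B + n), at the MAP split.
rate_before = (A + counts[:tau_map].sum()) / (B + tau_map)
rate_after = (A + counts[tau_map:].sum()) / (B + len(counts) - tau_map)
print(tau_map, round(rate_before, 1), round(rate_after, 1))
```

The full vector `post` is the answer to "when did it change?", including how uncertain that answer is; the two rate estimates give the magnitude.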

# The Ugly Data

How do you handle missing data, gaps in your data-frames or noisy parameters?
You have spent hours at work, in the lab or in the wild to generate or curate a dataset for an interesting research question or hypothesis. Unfortunately, you find that some of the measurements for a parameter are missing!
Another case that might throw you off is unexpected noise, introduced at some point during the experiment, that has turned some of your measurements into extreme outliers. …
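
A minimal sketch of two common first steps, assuming a 1-D measurement array in which missing values are encoded as NaN: flag extreme outliers with a robust median/MAD ("modified") z-score and treat them as missing, then fill the gaps by linear interpolation between neighbouring valid measurements. The threshold of 3.5 is a conventional choice rather than a value from the post, and the post itself may handle these cases differently (for example with a probabilistic model).

```python
import numpy as np

def robust_outlier_mask(values, threshold=3.5):
    # Modified z-score from the median and the median absolute deviation (MAD),
    # which, unlike mean and standard deviation, are barely moved by the outliers.
    median = np.nanmedian(values)
    mad = np.nanmedian(np.abs(values - median))
    if mad == 0:
        return np.zeros(values.shape, dtype=bool)
    z = 0.6745 * (values - median) / mad
    return np.abs(z) > threshold            # NaN compares as False, so gaps stay unflagged

def clean_series(values, threshold=3.5):
    cleaned = values.astype(float).copy()
    cleaned[robust_outlier_mask(cleaned, threshold)] = np.nan   # treat outliers as missing
    missing = np.isnan(cleaned)
    idx = np.arange(len(cleaned))
    # Linear interpolation over the gaps; values beyond the ends are held constant.
    cleaned[missing] = np.interp(idx[missing], idx[~missing], cleaned[~missing])
    return cleaned, missing

measurements = np.array([1.0, 1.2, np.nan, 1.1, 25.0, 0.9, np.nan, 1.3])
cleaned, was_filled = clean_series(measurements)
print(cleaned)
print(was_filled)
```

Keeping the `was_filled` mask matters: downstream analyses should know which values were measured and which were reconstructed.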

# 3 Probabilistic Frameworks You Should Know | The Bayesian Toolkit

## Build better Data Science workflows with probabilistic programming languages and counter the shortcomings of classical ML.

We should always aim to create better Data Science workflows.
But in order to achieve that we should find out what is lacking.

# Classical ML workflows are missing something

Classical Machine Learning pipelines work great. The usual workflow looks like this:

1. Have a use-case or research question with a potential hypothesis,
2. build and curate a dataset that relates to the use-case or research question,
3. build a model,
4. train and validate the model,
5. maybe even cross-validate, while grid-searching hyper-parameters,
6. test the fitted model,
7. deploy the model for the use-case,
8. answer the research question or hypothesis you posed.

As you might have noticed, one severe shortcoming is that this workflow does not account for the model's uncertainty or for our confidence in its output. …
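
To illustrate what a single fitted model hides, here is a small sketch that refits the same least-squares line on bootstrap resamples of synthetic data and reports the spread of predictions at one input. The bootstrap is a swapped-in, frequentist stand-in for the Bayesian posteriors discussed in the post; the data, the query point `x_new` and the number of resamples are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=30)
y = 2.0 * x + 1.0 + rng.normal(0, 3.0, size=30)      # synthetic noisy data

# Classical workflow: one fit, one number.
slope, intercept = np.polyfit(x, y, deg=1)
x_new = 12.0                                          # query point outside the training range
point_prediction = slope * x_new + intercept

# Bootstrap: refit on resampled data to see how much that number could move.
boot_preds = []
for _ in range(2000):
    idx = rng.integers(0, len(x), size=len(x))
    s, b = np.polyfit(x[idx], y[idx], deg=1)
    boot_preds.append(s * x_new + b)
low, high = np.percentile(boot_preds, [2.5, 97.5])
print(f"point: {point_prediction:.2f}, 95% bootstrap interval: [{low:.2f}, {high:.2f}]")
```

Note that this interval only reflects uncertainty in the fitted line, not the noise of a new observation; a probabilistic model can express both.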

# Single-Parameter Models | Pyro vs. STAN

## Modeling U.S. cancer-death rates with two Bayesian approaches: MCMC in STAN and SVI in Pyro.

Single-parameter models are an excellent way to get started with probabilistic modeling. These models consist of one parameter that influences our observations and that we can infer from the given data. In this article, we compare the performance of two well-established frameworks: the statistical language STAN and the Pyro Probabilistic Programming Language (PPL).

# Kidney Cancer Data

One old, well-established dataset covers the cases of kidney cancer in the U.S. from 1980–1989, which is available here (see [1]). It lists U.S. counties, their total population and the number of reported cancer deaths.
Our task is to infer the death rate from the given data in a Bayesian way.
A detailed walk-through of the task can be found in section 2.8 of “Bayesian Data Analysis 3” [1].
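
A minimal sketch of the conjugate single-parameter model for a county's death rate theta, assuming deaths ~ Poisson(exposure * theta) with exposure = 10 * population (ten years of data) and a Gamma(alpha, beta) prior in the shape/rate parameterization. With this conjugate pair the posterior is available in closed form, Gamma(alpha + deaths, beta + exposure), which gives a reference point for what the MCMC (STAN) and SVI (Pyro) fits should recover. The prior parameters and the two counties below are illustrative placeholders, not values taken from the dataset.

```python
import numpy as np

def posterior_rate(deaths, population, alpha, beta, years=10):
    # Gamma(alpha, beta) prior (shape, rate) + Poisson(years * population * theta)
    # likelihood  ->  Gamma(alpha + deaths, beta + years * population) posterior.
    exposure = years * np.asarray(population, dtype=float)
    shape = alpha + np.asarray(deaths, dtype=float)
    rate = beta + exposure
    return shape, rate

# Hypothetical counties: a tiny one with 2 deaths and a large one with 50 deaths.
deaths = np.array([2, 50])
population = np.array([2_000, 1_000_000])
alpha, beta = 20.0, 430_000.0          # illustrative prior with mean alpha / beta of about 4.7e-5

shape, rate = posterior_rate(deaths, population, alpha, beta)
raw_rate = deaths / (10 * population)
post_mean = shape / rate
post_sd = np.sqrt(shape) / rate

# Monte Carlo draws from the posterior; numpy's gamma takes scale = 1 / rate.
rng = np.random.default_rng(0)
draws = rng.gamma(shape[:, None], 1.0 / rate[:, None], size=(2, 5000))
for i in range(len(deaths)):
    print(f"county {i}: raw {raw_rate[i]:.2e}, posterior mean {post_mean[i]:.2e} (sd {post_sd[i]:.2e})")
```

The small county's extreme raw rate is pulled strongly toward the prior mean, while the large county's estimate is dominated by its own data, which is the behavior the Bayesian treatment is meant to capture.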

# 10,000 Hours in Data Science | Gaining Proficiency

## The way from data novice to professional

There is an idea that practicing something for over 10,000 hours lets you acquire enough proficiency in a subject. The concept was popularized by the book Outliers by M. Gladwell. The 10,000 hours are the time you spend practicing or studying a subject until you have a firm grasp of it and can be called proficient. Though this number of hours is somewhat arbitrary, we will take a look at how those hours can be spent to gain proficiency in the field of Data Science.

Imagine this as a learning budget in your Data-apprenticeship journey. If I were to start from scratch, this is how I would spend those 10 thousand hours to become a proficient Data Scientist. …

# Compute the Incomputable | How SVI and ELBO work

## One reason why Bayesian Modeling works with real-world data. The approximate lighthouse in the sea of randomness.

When you want to gain more insight into your data, you rely on programming frameworks that let you work with probabilities. All you have in the beginning is a collection of data points, which is just a glimpse into the underlying distribution your data comes from. In the end, however, you want more than simple data points: you want rich, informative probability distributions with which you can perform tests. For this, you use probabilistic frameworks like TensorFlow Probability, Pyro or STAN to compute posterior distributions.
As we will see, computing these posteriors exactly is not always feasible, so we rely on Markov Chain Monte Carlo (MCMC) methods or Stochastic Variational Inference (SVI) to solve those problems. Especially on large datasets, but even on everyday, medium-sized ones, we have to perform the magic of sampling and inference to compute values and fit models. If these methods did not exist, we would be stuck with a neat model that we had thought up, but no way to know whether it actually makes sense. …
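
A small numerical sketch of the ELBO, assuming a toy conjugate model z ~ N(0, 1), x | z ~ N(z, 1), where the evidence is known in closed form (marginally, x ~ N(0, 2)). The Monte Carlo estimate of ELBO(q) = E_q[log p(x, z) - log q(z)] stays below log p(x) for any Gaussian q, and the gap is exactly KL(q || p(z | x)); it closes when q equals the exact posterior N(x/2, 1/2). SVI maximizes this quantity by stochastic gradient ascent on the parameters of q. The model and the observed value `x_obs = 1.5` are illustrative choices.

```python
import numpy as np

def log_normal(v, mean, var):
    # Log density of N(mean, var) evaluated at v.
    return -0.5 * (np.log(2.0 * np.pi * var) + (v - mean) ** 2 / var)

def elbo(x_obs, q_mean, q_var, n_samples=20_000, seed=0):
    # Monte Carlo ELBO using samples drawn from q(z) = N(q_mean, q_var).
    rng = np.random.default_rng(seed)
    z = q_mean + np.sqrt(q_var) * rng.standard_normal(n_samples)
    log_joint = log_normal(z, 0.0, 1.0) + log_normal(x_obs, z, 1.0)   # log p(z) + log p(x | z)
    return np.mean(log_joint - log_normal(z, q_mean, q_var))

x_obs = 1.5
log_evidence = log_normal(x_obs, 0.0, 2.0)            # exact log p(x)
print("log p(x)         :", log_evidence)
print("ELBO, poor q     :", elbo(x_obs, q_mean=-1.0, q_var=2.0))
print("ELBO, posterior q:", elbo(x_obs, q_mean=x_obs / 2, q_var=0.5))
```

In real models log p(x) is the intractable part, but the ELBO only needs samples from q and evaluations of the joint density, which is why it can be optimized at all.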

# How your model is optimized | Know your Optimization

## The answer to: “Why is my model running forever?” or the classic: “I think it might have converged?”

Many of the models that the average Data Scientist or ML engineer uses daily are driven by numerical optimization methods. Studying how optimizers perform on different functions helps to gain a better understanding of how the process works.
The challenge we face on a daily basis is that someone gives us a model of how they think the world or their problem works. Now, you as a Data Scientist have to find the optimal solution to the problem. For example, you look at an energy-function and want to find the absolute, global minimum for your tool to work or your protein to be stable. Maybe you have modeled user-data and you want to find the ideal customer given all your input-features — hopefully, continuous ones. …
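
To make the convergence question concrete, here is a sketch of plain gradient descent on a hypothetical one-dimensional energy, `f(x) = x**4 - 3*x**2 + x`, which has a global minimum near x = -1.30 and a local one near x = 1.13. The loop stops either when the gradient magnitude drops below a tolerance or when the iteration budget runs out, and it reports which of the two happened. The learning rate, tolerance and starting points are arbitrary choices for this toy function.

```python
def energy(x):
    # Hypothetical double-well energy: global minimum near -1.30, local minimum near 1.13.
    return x**4 - 3 * x**2 + x

def grad_energy(x):
    # Analytical derivative of energy(x).
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, tol=1e-8, max_iter=10_000):
    x = x0
    for step in range(1, max_iter + 1):
        g = grad_energy(x)
        if abs(g) < tol:                 # converged: the gradient has (numerically) vanished
            return x, step, True
        x -= lr * g
    return x, max_iter, False            # out of budget: "I think it might have converged?"

for start in (-2.0, 2.0):
    x_min, steps, converged = gradient_descent(start)
    print(f"start {start:+.1f} -> x = {x_min:.4f}, f = {energy(x_min):.4f}, "
          f"steps = {steps}, converged = {converged}")
```

Both runs converge, but only one reaches the global minimum: a vanishing gradient tells you that you found a stationary point, not that you found the best one.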

# Pyro Top-Down Forecasting | Application-case

## Connect the dots over time and forecast with confidence(-intervals).

Have you ever wondered how to account for uncertainties in time-series forecasts?
Have you ever thought there should be a way to generate data-points from previously seen data and make judgement calls about certainties? I know I have.
If you want to build models that capture probabilities and express confidence, we recommend using a probabilistic programming framework like Pyro.
In a previous article, we looked at NGBoost and applied it to the M5 forecasting challenge on Kaggle. As a quick recap: the M5 forecasting challenge asks us to predict how the sales of Walmart items will develop over time. It provides around 4–5 years of data for items of different categories from different stores across different states and asks us to forecast 28 days for which we have no information. For an overview of the challenge and dataset, we still recommend this amazing notebook. …
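
A minimal sketch of top-down forecasting, assuming "top-down" in the usual hierarchical-forecasting sense: forecast the aggregate series, then split the forecast and its interval among items by their historical shares. It leaves out the probabilistic model itself (the post uses Pyro for that) and uses a naive recent-mean forecast with a normal interval as a placeholder; the function name, window length and synthetic sales are assumptions.

```python
import numpy as np

def top_down_forecast(item_sales, horizon=28, window=28, z=1.96):
    # item_sales: array of shape (n_items, n_days) with historical daily sales.
    total = item_sales.sum(axis=0)                     # top level: all items combined
    recent = total[-window:]
    # Placeholder total forecast: recent mean, with a normal interval from the recent spread.
    mean_total = np.full(horizon, recent.mean())
    sd_total = recent.std(ddof=1)
    lower_total = np.maximum(mean_total - z * sd_total, 0.0)
    upper_total = mean_total + z * sd_total
    # Disaggregate using each item's share of total sales over the same window.
    shares = item_sales[:, -window:].sum(axis=1) / recent.sum()
    return (shares[:, None] * mean_total,
            shares[:, None] * lower_total,
            shares[:, None] * upper_total)

rng = np.random.default_rng(7)
# Hypothetical daily sales for three items with different base rates.
item_sales = rng.poisson(lam=[[5.0], [12.0], [30.0]], size=(3, 365))
mean, lower, upper = top_down_forecast(item_sales)
print(mean[:, 0].round(2), lower[:, 0].round(2), upper[:, 0].round(2))
```

Splitting by fixed shares keeps the item forecasts consistent with the total by construction; the price is that item-level intervals simply inherit the shape of the aggregate one.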

# NGBoost Incremental Forecasting

## Or how to predict something you don’t know with confidence intervals.

Currently, there is a prominent forecasting challenge happening on Kaggle, the M5 Forecasting Challenge. It provides the sales of product items over a long time range (1913 days, or around 5 years), calendar information and price information. An excellent in-depth analysis by M. Henze can be found here. The goal of the challenge is to make informed predictions on how the sales of the individual items will continue over the upcoming 28 days.
One can approach this challenge with different methods: LSTMs, MCMC methods and others come to mind. …
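
A minimal sketch of incremental (recursive) forecasting with a distributional model, assuming "incremental" means predicting one day at a time and feeding each prediction back in as input for the next day. The `predict_distribution` function is a hypothetical placeholder (a moving-window mean and standard deviation of a Normal), not NGBoost; in the post that role is played by an NGBoost model's predicted distribution. Uncertainty is propagated by simulating many sample paths, and the history is synthetic.

```python
import numpy as np

def predict_distribution(windows):
    # Hypothetical stand-in for a distributional model such as NGBoost.
    # windows: (n_paths, window) array of the most recent values on each sample path.
    loc = windows.mean(axis=1)
    scale = windows.std(axis=1, ddof=1) + 1e-6
    return loc, scale

def incremental_forecast(history, horizon=28, window=7, n_paths=2000, seed=0):
    rng = np.random.default_rng(seed)
    recent = np.tile(np.asarray(history[-window:], dtype=float), (n_paths, 1))
    paths = np.empty((n_paths, horizon))
    for t in range(horizon):
        loc, scale = predict_distribution(recent)
        draw = np.maximum(rng.normal(loc, scale), 0.0)     # sales cannot be negative
        paths[:, t] = draw
        recent = np.column_stack([recent[:, 1:], draw])    # feed each path's draw back in
    lower, upper = np.percentile(paths, [5, 95], axis=0)
    return paths.mean(axis=0), lower, upper

rng = np.random.default_rng(3)
history = rng.poisson(20, size=1913).astype(float)   # synthetic daily sales, 1913 days as in M5
mean, lower, upper = incremental_forecast(history)
print(np.round(mean[:3], 2), np.round(lower[:3], 2), np.round(upper[:3], 2))
```

Sampling paths instead of feeding back only the predicted mean is the design choice that keeps the intervals honest: errors made on early days propagate into later days rather than being silently dropped.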