How to obtain advanced probabilistic predictions for your data science use case

Many data science use cases involve predicting a continuous quantity. For instance, a grid operator might want to predict the energy consumption of a group of households for next week. To deliver these predictions, the Big Data Scientist applies machine learning algorithms to a large collection of features, such as family size, weather forecasts, property value and last week's consumption levels. There are many use cases of this type, for example predicting sales numbers, hotel rooms booked, money transfers or the time-to-failure of critical components. But what number do we actually want our algorithm to output?

A single number representing the expected energy consumption per household? Or perhaps a lower and upper bound around this point prediction? Inevitably, the algorithm will be uncertain about its predictions to a greater or lesser extent. The business value of a data science use case can improve greatly when we gain information on this range and on the algorithm's uncertainty. For instance, when we predict that a particular household will likely have either a very low or a very high consumption, we can ignore the unlikely possibility of a medium consumption level in our risk simulations. Below, I illustrate the difference between the various algorithm outputs.

The figure shows an example prediction for a single household, which depends on (i.e. is conditioned on) its features. By default, many algorithms (e.g. least squares regression) output the conditional mean (A). The conditional mean tells us that a group of similar households will, on average, have a medium-high consumption level, and that we should therefore expect this consumption level for this particular household. In a previous blog post, my colleague Benoit Descamps described how to augment this point estimate with prediction intervals (B) for a complex nonlinear algorithm such as xgboost, using quantile regression. Conditional density estimation (C), however, provides a more complete picture: a predicted probability for each potential target value (note that we can also use the conditional density to easily compute the mean and the upper/lower bounds). A recently published paper by some of my former colleagues at Radboud University presents Kernel Mixture Networks, a class of (deep) neural network algorithms that perform conditional density estimation. Together with fellow Data Scientist Jan van der Vegt, I implemented and explored a version of this algorithm. The reader is referred to Jan's excellent blog for the technical details, animations and the source code of our TensorFlow implementation.
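
To make the idea concrete, here is a minimal sketch of a Kernel Mixture Network in TensorFlow/Keras. This is not the implementation from Jan's blog: the Gaussian kernel centers are fixed on a uniform grid, the bandwidth is a constant, and `n_features`, the layer sizes and the target range are illustrative placeholders.

```python
import numpy as np
import tensorflow as tf

# Placeholder setup: in the KMN paper the kernel centers can be placed at
# (a subset of) the training target values; a uniform grid over a plausible
# consumption range is used here for simplicity.
n_features = 10                     # illustrative number of input features
n_centers = 50
centers = np.linspace(0.0, 10.0, n_centers).astype("float32")  # e.g. kWh
bandwidth = 0.3                     # fixed Gaussian kernel bandwidth

# The network maps a household's features to mixture weights over the
# fixed kernels; the predicted density is the weighted sum of Gaussians.
inputs = tf.keras.Input(shape=(n_features,))
h = tf.keras.layers.Dense(64, activation="relu")(inputs)
h = tf.keras.layers.Dense(64, activation="relu")(h)
weights = tf.keras.layers.Dense(n_centers, activation="softmax")(h)
model = tf.keras.Model(inputs, weights)

def nll(y_true, w):
    """Negative log-likelihood of y_true under the predicted mixture."""
    # Gaussian kernel density of y_true evaluated at each fixed center;
    # y_true has shape (batch, 1) and broadcasts against the centers.
    sq_dist = tf.square(y_true - centers[None, :])
    kernel = tf.exp(-sq_dist / (2.0 * bandwidth**2))
    kernel /= bandwidth * np.sqrt(2.0 * np.pi)
    density = tf.reduce_sum(w * kernel, axis=-1)
    return -tf.math.log(density + 1e-12)

model.compile(optimizer="adam", loss=nll)
# model.fit(X_train, y_train, ...) with y_train as a column vector
```

Maximizing the likelihood of the observed targets under this mixture is what turns a point predictor into a density estimator: the softmax layer simply learns how to distribute probability mass over the fixed kernels, conditioned on the features.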

In sum, conditional density estimation with Kernel Mixture Networks provides an elegant way to gain valuable probabilistic information about predicted continuous quantities. This knowledge can be used to derive prediction intervals for planning, or fed into large-scale simulations for resource optimization, ultimately delivering additional business value.
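
To illustrate that last point, the sketch below continues the hypothetical model above and derives the conditional mean, a 90% prediction interval and Monte Carlo samples from the predicted mixture density (`X_new` is an assumed feature matrix for new households).

```python
import numpy as np
from scipy.stats import norm

# Predicted mixture weights for new households (placeholder input X_new).
w = model.predict(X_new)                          # shape (n, n_centers)

# Conditional mean: weighted average of the kernel centers.
cond_mean = w @ centers                           # shape (n,)

# 90% prediction interval via the mixture CDF on a fine grid of targets.
grid = np.linspace(-2.0, 12.0, 1000)
cdf_per_center = np.stack(
    [norm.cdf(grid, loc=c, scale=bandwidth) for c in centers], axis=0
)                                                 # (n_centers, len(grid))
mix_cdf = w @ cdf_per_center                      # (n, len(grid))
lower = grid[np.argmax(mix_cdf >= 0.05, axis=1)]  # 5th percentile
upper = grid[np.argmax(mix_cdf >= 0.95, axis=1)]  # 95th percentile

# Samples for downstream risk simulations: pick a kernel per household
# (renormalizing to guard against float32 rounding), then draw from the
# corresponding Gaussian.
idx = np.array([np.random.choice(n_centers, p=wi / wi.sum()) for wi in w])
samples = np.random.normal(loc=centers[idx], scale=bandwidth)
```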

BigData Republic provides Data Science as a Service. Our Big Data Scientists have experience with advanced predictive modeling. Interested in what we can do for you? Feel free to contact us.
