An application of machine learning interpretability and model selection with h2o and DALEX
In this post I estimate a number of models and assess their performance and fit to the data using a model-agnostic methodology that makes it possible to compare traditional “glass-box” models and “black-box” models on an equal footing.
There are many libraries that support machine learning interpretability, feature explanation and general performance assessment, and they have all gained popularity in recent years, but for this study I’ve chosen DALEX.
Given that different performance measures may reflect different aspects of a model’s predictive performance, it is important to evaluate and…
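The core idea of a model-agnostic comparison is to fit very different model families and score them all with the same held-out metric. The post itself works in R with h2o and DALEX; purely as an illustration, here is a minimal Python sketch on synthetic data, comparing a “glass-box” logistic regression against a “black-box” random forest on AUC:

```python
# Model-agnostic comparison sketch: fit a "glass-box" and a "black-box"
# model and score both with the same held-out metric (AUC).
# Illustrative only: the post uses R (h2o + DALEX); data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

models = {
    "glass-box (logistic regression)": LogisticRegression(max_iter=1000),
    "black-box (random forest)": RandomForestClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```

Because both models are scored on the same held-out data with the same metric, the comparison does not depend on either model’s internals.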
Or how important it is NOT to stick with the default threshold @ Max F1
In this post I take a random forest model and run a multi-customer profit optimisation that reveals a potential additional expected profit of nearly £1.7 per customer (or £850k across a 500,000-customer base).
Furthermore, I introduce key concepts like the cut-off, the F1 score and the precision-recall trade-off, and show how important it is NOT to stick with the threshold @ Max F1 that many machine learning modelling platforms select by default.
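The gap between the Max-F1 cut-off and the profit-maximising cut-off can be sketched in a few lines. The post itself works in R; here is an illustrative Python sketch on synthetic data, with made-up per-customer revenue and cost figures (the post derives its own from the case study data):

```python
# Sketch: why the profit-maximising cut-off can differ from the Max-F1
# cut-off. Revenue/cost figures are hypothetical, data is synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, weights=[0.8], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
probs = (RandomForestClassifier(random_state=1)
         .fit(X_tr, y_tr).predict_proba(X_te)[:, 1])

PROFIT_TP, COST_FP = 30.0, 5.0  # hypothetical per-customer figures

def profit(threshold):
    # expected profit per customer at a given cut-off
    pred = probs >= threshold
    tp = np.sum(pred & (y_te == 1))
    fp = np.sum(pred & (y_te == 0))
    return (tp * PROFIT_TP - fp * COST_FP) / len(y_te)

thresholds = np.linspace(0.05, 0.95, 91)
t_f1 = thresholds[np.argmax(
    [f1_score(y_te, (probs >= t).astype(int), zero_division=0)
     for t in thresholds])]
t_profit = thresholds[np.argmax([profit(t) for t in thresholds])]
print(f"Max-F1 cut-off {t_f1:.2f}: £{profit(t_f1):.2f}/customer")
print(f"Profit-max cut-off {t_profit:.2f}: £{profit(t_profit):.2f}/customer")
```

Because false positives and false negatives rarely cost the same amount of money, the threshold that balances precision and recall is in general not the one that maximises expected profit.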
In this day and age, a business that leverages…
With the new year, I started to look for new employment opportunities and even managed to land a handful of final-stage interviews before it all ground to a halt following the coronavirus pandemic. Invariably, as part of the selection process I was asked to analyse a set of data and compile a number of data-driven recommendations to present in my final meeting.
In this post I retrace the steps I took for one of the take-home analyses I was tasked with and revisit clustering, one of my favourite analytic methods. Only this time the setup is a…
In this day and age, a business that leverages data to understand the drivers of its customers’ behaviour has a true competitive advantage. Organisations can dramatically improve their performance in the market by analysing customer-level data effectively and focusing their efforts on those customers who are more likely to engage.
One tried and tested approach to tease this type of insight out of data is Propensity Modelling, which combines information such as a customer’s demographics (age, race, religion, gender, family size, ethnicity, income, education level), psychographics (social class, lifestyle and personality characteristics), engagement (emails opened, emails clicked, searches…
Traditional approaches to time series analysis and forecasting, like Linear Regression, Holt-Winters Exponential Smoothing, ARMA/ARIMA/SARIMA and ARCH/GARCH, have been well-established for decades and find applications in fields as varied as business and finance (e.g. predict stock prices and analyse trends in financial markets), the energy sector (e.g. forecast electricity consumption) and academia (e.g. measure socio-political phenomena).
In more recent times, the popularisation and wider availability of open source frameworks like Keras, TensorFlow and scikit-learn helped machine learning approaches like Random Forest, Extreme Gradient Boosting, Time Delay Neural Network and Recurrent Neural Network to gain momentum in time series applications. …
This year has been rather rewarding for me! After completing some of the excellent Business Science University courses, I have worked on a number of Customer Analytics & Business Intelligence projects and summarised them in technical articles that I published in various Medium publications. This opened up an entirely new world to me and generated many new connections within the analytics and data science community the world over!
The idea to create my own website has been at the back of my mind for a few months. So far I’ve used RPubs.com (the free web publishing service from RStudio) as…
Statistical segmentation is one of my favourite analytic methods: it resonates well with clients, as I’ve found from my consulting experience, and is a relatively straightforward concept to explain to non-technical audiences.
Earlier this year I used the popular K-Means clustering algorithm to segment customers based on their response to a series of marketing campaigns. For that analysis I’d deliberately chosen a basic dataset to show that not only is it a relatively easy analysis to carry out, but it can also help unearth interesting patterns of behaviour in your customer base even when using few customer attributes.
Recently I completed the Business Analysis With R online course focused on applied data and business science with R, which introduced me to a couple of new modelling concepts and approaches. One that especially captured my attention is parsnip and its attempt to implement a unified modelling and analysis interface (similar to Python’s scikit-learn) to seamlessly access several modelling platforms in R.
parsnip is the brainchild of RStudio’s Max Kuhn (of caret fame) and Davis Vaughan and forms part of tidymodels, a growing ensemble of tools to explore and iterate modelling tasks that share a common philosophy (and a…
The aim of this post is to show that you do not always need highly complex and sophisticated machine learning models to get meaningful insights from your data.
For this mini-project I am using the popular K-Means clustering algorithm to segment customers based on their response to a series of marketing campaigns. This technique is relatively easy to implement, and yet it allowed me to gather tons of information from my data and unearth interesting patterns of behaviour in my customer base.
Market segmentation refers to the process of dividing a consumer market of existing and/or potential…
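The segmentation workflow above (scale the customer attributes, cluster, then profile each segment) can be sketched briefly. The post itself uses R on a real marketing dataset; this is a hedged Python illustration on synthetic data with hypothetical feature names:

```python
# Minimal K-Means segmentation sketch on synthetic campaign-response data.
# Illustrative only: the post uses R; feature names here are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# hypothetical attributes: spend, emails clicked, campaigns responded to
customers = rng.normal(loc=[50, 3, 1], scale=[20, 2, 1], size=(500, 3))

# scale first so no single attribute dominates the distance metric
scaled = StandardScaler().fit_transform(customers)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaled)

# profile each segment by its average raw attribute values
for k in range(4):
    seg = customers[kmeans.labels_ == k]
    print(f"segment {k}: {len(seg)} customers, "
          f"means = {seg.mean(axis=0).round(1)}")
```

Profiling the segments on the original (unscaled) attributes is what turns the cluster labels into patterns of behaviour you can describe to a non-technical audience.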
Recently I wanted to learn something new and challenged myself to carry out an end-to-end Market Basket Analysis. To continue to challenge myself, I’ve decided to put the results of my efforts before the eyes of the data science community.
This is the third and final post of the series:
Part 1 (which can be found here): explore and cleanse a dataset suitable for modelling with recommendation algorithms
Part 2 (which can be found here): apply various Product Recommendation models with the recommenderlab R package
Part 3: implement the best performing model in a Shiny Web Application