An application of machine learning interpretability and model selection with h2o and DALEX

Photo by Bruna Branco on Unsplash

In this post I estimate a number of models and assess their performance and fit to the data using a model-agnostic methodology that makes it possible to compare traditional “glass-box” models with “black-box” models.

There are many libraries that help with Machine Learning Interpretability, feature explanation and general performance assessment, and they have all gained popularity in recent years, but for this study I’ve chosen DALEX.

Given that each performance measure may reflect a different aspect of the predictive performance of a model, it is important to evaluate and…


Or how important it is NOT to stick with the default threshold @ Max F1

Photo by David Sury on Unsplash

In this post I take a random forest model and run a multi-customer profit optimisation that reveals a potential additional expected profit of nearly £1.7 per customer (or £850k if you had a 500,000 customer base).

Furthermore, I introduce key concepts like the cut-off, the F1 score and the precision-recall trade-off, and show how important it is NOT to stick with the threshold @ Max F1 that many machine learning modelling platforms select by default.
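
To make the idea concrete, here is a minimal Python sketch, not the analysis from the post. The scores, labels, the payoff of 10 per converted customer and the cost of 1 per contacted customer are all made-up assumptions, and the helper names are hypothetical. It scans a grid of cut-offs and compares the one that maximises F1 with the one that maximises expected profit.

```python
# Illustrative sketch only: every number below is invented for the example.

def confusion(scores, labels, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def expected_profit(tp, fp, gain_per_conversion, cost_per_contact):
    """Every predicted positive is contacted; only true positives convert."""
    return tp * gain_per_conversion - (tp + fp) * cost_per_contact


# Hypothetical model scores and true outcomes for ten customers.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

# Candidate cut-offs: 0.1, 0.2, ..., 0.9.
thresholds = [i / 10 for i in range(1, 10)]

results = {}
for t in thresholds:
    tp, fp, fn, tn = confusion(scores, labels, t)
    results[t] = {
        "f1": f1_score(tp, fp, fn),
        "profit": expected_profit(tp, fp, gain_per_conversion=10,
                                  cost_per_contact=1),
    }

best_f1_threshold = max(results, key=lambda t: results[t]["f1"])
best_profit_threshold = max(results, key=lambda t: results[t]["profit"])

print("Threshold @ Max F1:    ", best_f1_threshold)
print("Threshold @ Max profit:", best_profit_threshold)
```

With these assumed payoffs, a missed buyer costs far more than a wasted contact, so the profit-maximising cut-off sits lower than the one that maximises F1. A different cost structure would move it elsewhere, which is why the default is worth questioning.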

Overview

In this day and age, a business that leverages…


A Case Study Using K-Medoids on Subscription Data

Photo by Joyce McCown on Unsplash

With the new year, I started to look for new employment opportunities and even managed to land a handful of final-stage interviews before it all ground to a halt following the coronavirus pandemic. Invariably, as part of the selection process I was asked to analyse a set of data and compile a number of data-driven recommendations to present in my final meeting.

In this post I retrace the steps I took for one of the take-home analyses I was tasked with and revisit clustering, one of my favourite analytic methods. Only this time the setup is a…


A case study on estimating the likelihood to purchase a financial product with h2o and DALEX

Photo by Dan Meyers on Unsplash

In this day and age, a business that leverages data to understand the drivers of customers’ behaviour has a true competitive advantage. Organisations can dramatically improve their performance in the market by analysing customer-level data in an effective way and focusing their efforts on those that are more likely to engage.

One tried and tested approach to tease this type of insight out of data is Propensity Modelling, which combines information such as a customer’s demographics (age, race, religion, gender, family size, ethnicity, income, education level), psychographics (social class, lifestyle and personality characteristics), engagement (emails opened, emails clicked, searches…


How I used machine learning to implement a time series forecast of weekly revenue

Photo by Ben Elwood on Unsplash

Traditional approaches to time series analysis and forecasting, like Linear Regression, Holt-Winters Exponential Smoothing, ARMA/ARIMA/SARIMA and ARCH/GARCH, have been well-established for decades and find applications in fields as varied as business and finance (e.g. predict stock prices and analyse trends in financial markets), the energy sector (e.g. forecast electricity consumption) and academia (e.g. measure socio-political phenomena).

In more recent times, the popularisation and wider availability of open source frameworks like Keras, TensorFlow and scikit-learn have helped machine learning approaches like Random Forest, Extreme Gradient Boosting, Time Delay Neural Network and Recurrent Neural Network gain momentum in time series applications. …


Work with RStudio, GitHub and Netlify to create and deploy your own webpage

Photo by Li Yang on Unsplash

This year has been rather rewarding for me! After completing some of the excellent Business Science University courses, I have worked on a number of Customer Analytics & Business Intelligence projects and summarised them into technical articles that I published in various Medium publications. This opened up an entirely new world to me and generated many new connections within the analytics and data science community the world over!

The idea to create my own website has been at the back of my mind for a few months. So far I’ve used RPubs.com (the free web publishing service from RStudio) as…


How to run an effective statistical segmentation with K-means Clustering, Principal Components Analysis and Bootstrap Cluster Evaluation using a feature-rich dataset

Photo by Toa Heftiba on Unsplash

Overview

Statistical segmentation is one of my favourite analytic methods: it resonates well with clients, as I’ve found from my consulting experience, and is a relatively straightforward concept to explain to non-technical audiences.

Earlier this year I used the popular K-Means clustering algorithm to segment customers based on their response to a series of marketing campaigns. For that analysis I’d deliberately chosen a basic dataset to show that it is not only a relatively easy analysis to carry out but can also help unearth interesting patterns of behaviour in your customer base even when using few customer attributes.

In this…


A Tidy Approach to a Classification Problem

Photo by Karim Ghantous on Unsplash

Overview

I recently completed the Business Analysis With R online course, focused on applied data and business science with R, which introduced me to a couple of new modelling concepts and approaches. One that especially captured my attention is parsnip and its attempt to implement a unified modelling and analysis interface (similar to Python’s scikit-learn) to seamlessly access several modelling platforms in R.

parsnip is the brainchild of RStudio’s Max Kuhn (of caret fame) and Davis Vaughan and forms part of tidymodels, a growing ensemble of tools to explore and iterate modelling tasks that shares a common philosophy (and a…


Using K-Means Clustering to Understand Marketing Response

Photo by Nick Karvounis on Unsplash

Overview

The aim of this post is to show that you do not always need super complex and sophisticated machine learning models to get meaningful insights from your data.

For this mini-project I am using the popular K-Means clustering algorithm to segment customers based on their response to a series of marketing campaigns. This technique is relatively easy to implement and yet it allowed me to gather tons of information from my data and unearth interesting patterns of behaviour in my customer base.

What is Market Segmentation?

Market segmentation refers to the process of dividing a consumer market of existing and/or potential…


My take on Market Basket Analysis — Part 3 of 3

Photo by NeONBRAND on Unsplash

Overview

Recently I wanted to learn something new and challenged myself to carry out an end-to-end Market Basket Analysis. To continue to challenge myself, I’ve decided to put the results of my efforts before the eyes of the data science community.

This is the third and final post:

Part 1: (which can be found here) explore and cleanse a dataset suitable for modelling with recommendation algorithms
Part 2: (which can be found here) apply various Product Recommendation models with the recommenderlab R package
Part 3: implement the best performing model in a Shiny Web Application

Introduction

In the course of the research…

Diego Usai

Customer Insight | Business Intelligence | Marketing Analytics | www.linkedin.com/in/diegousaiuk/ | CURRENTLY SEEKING NEW JOB OPPORTUNITIES
