Have you heard about “Tweedie” Loss?

Using “Tweedie” Loss to improve LTV predictions

Predicting the “Lifetime Value” of customers is a very common task in many industries, including e-commerce, insurance, media and more.

Assessing the LTV of a customer is no simple task *1
Example from our data
Tweedie distribution examples
  • In a car insurance company, for example, most policyholders will (hopefully) never file a claim, so the target variable is heavily zero-inflated.

How does it work?

A “Compound Poisson distribution” results from composing multiple random variables: a Poisson RV (random variable) determines the number of events, and i.i.d. Gamma RVs determine a value for each event. The resulting RV is the sum of the obtained Gamma values. The “zero-inflation” comes from the high probability of drawing 0 events from the Poisson distribution.
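The composition above can be sketched with a small simulation (the Poisson rate and Gamma parameters here are illustrative, not fitted to any real data):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_compound_poisson(n, lam=0.5, shape=2.0, scale=10.0):
    """Draw n samples from a compound Poisson-Gamma (Tweedie) distribution."""
    counts = rng.poisson(lam, size=n)  # number of events per sample
    # each sample is the sum of `count` i.i.d. Gamma values (0 events -> 0.0)
    return np.array([rng.gamma(shape, scale, k).sum() for k in counts])

samples = sample_compound_poisson(100_000)
# zero-inflation: P(0 events) = exp(-lam) = exp(-0.5) ~ 0.61
print((samples == 0).mean())
```

Note how the point mass at exactly zero falls straight out of the Poisson part, while the Gamma part shapes the continuous positive tail.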

And in our case at Roundforest?

We use Mean-Squared-Log-Error as our evaluation metric because it is robust to high-value outliers: the logarithm dampens errors on larger values, so the metric emphasizes relative rather than absolute error.
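To see that dampening concretely, here is a minimal sketch (the numbers are illustrative): the same absolute error of 10 is penalized heavily on a small true value and barely at all on a large one.

```python
import numpy as np

def msle(y_true, y_pred):
    # mean of squared differences between log(1 + y) values
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)

small = msle(np.array([1.0]), np.array([11.0]))        # predict 11 when truth is 1
large = msle(np.array([1000.0]), np.array([1010.0]))   # predict 1010 when truth is 1000
print(small, large)  # the first penalty is orders of magnitude larger
```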

import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_squared_log_error
data = pd.read_csv("./data_example.csv")
print(data.shape)
>>> (803013, 50) # 49 features and a label column
mask = np.random.rand(data.shape[0]) < 0.8 # split train-test
train = data[mask].copy()
test = data[~mask].copy()
amount_of_zeros = data.loc[data["label"] == 0.0].shape[0]
print(round(amount_of_zeros/data.shape[0], 2))
>>> 0.63
model = lgb.LGBMRegressor() # default L2 (MSE) objective
model.fit(train.drop("label", axis=1), train["label"])
test["preds"] = model.predict(test.drop("label", axis=1))
print(mean_squared_log_error(test["label"], test["preds"])*100)
>>> 0.02186369461257779
model = lgb.LGBMRegressor(objective="tweedie") # switch to Tweedie loss
model.fit(train.drop("label", axis=1), train["label"])
test["preds"] = model.predict(test.drop("label", axis=1))
print(mean_squared_log_error(test["label"], test["preds"])*100)
>>> 0.017428740123056317

What’s My Point?

A model can do much better by exploiting the distributional properties of its inputs and, as in our case, of the target variable. The target’s distribution should also guide our choice of loss function.

Sources:

  1. https://clevertap.com/blog/customer-lifetime-value/
  2. https://appliedmachinelearning.blog/2018/08/31/lets-talk-about-numeric-distributions-python/
  3. https://www.casact.org/education/annual/2009/handouts/c25-meyers.pdf
  4. https://lightgbm.readthedocs.io/en/latest/Parameters.html

Data Scientist at Roundforest
