Have you heard about “Tweedie” Loss?

Using “Tweedie” Loss in order to improve LTV predictions

Roy Ravid
Jul 7 · 4 min read

Predicting the “Lifetime Value” of customers is a very common task in many industries, including e-commerce, insurance, media, and more.

Assessing the LTV of a customer is no simple task

“Lifetime Value” (LTV) is a measurement that represents the total value of events a given user/customer/person will generate over their whole business relationship. For example, Amazon might define LTV as the value of all the purchases a person will make on the site, while a car insurance company would measure LTV as the value of all the insurance claims a person will make over their lifetime.

At Roundforest, we try to assess the LTV of customers who come to our website in order to direct our marketing efforts. Analyzing our target variable, i.e. our users’ LTV, yields the following distribution:

[Figure: the LTV distribution from our data, alongside examples of Tweedie distributions]

This is an example of a “Tweedie” distribution. “Tweedie” is a family of distributions to which the “Compound Poisson distribution” belongs.

In some cases, a Compound Poisson distribution can contain many instances of an event taking place 0 times (no purchases), and a long tail of positive values whose frequency decreases roughly exponentially (a zero-inflated distribution).

This behavior can also be expected in our previous examples:

  • In a large e-commerce company like Amazon, most users entering the site will likely not make a purchase, and if they do, the value ranges between cents and thousands of dollars.
  • In our car insurance company, most people will (hopefully) not make insurance claims.

How does this observation help us?

We can use it to construct our loss function!

How does it work?

A “Compound Poisson distribution” is a result of the composition of multiple random variables: a Poisson RV (random variable), which determines the number of events, and i.i.d Gamma RVs, which determine a value for each event. The resulting RV is the sum of the obtained Gamma values. The “zero-inflation” is caused by a high probability of getting 0 events from the Poisson distribution.
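This composition is easy to see in a simulation. The sketch below (with illustrative parameter values, not our real data) draws a Poisson number of events per sample and sums that many Gamma-distributed values; the large spike at exactly zero falls out of the Poisson mass at 0:

```python
import numpy as np

rng = np.random.default_rng(42)

def compound_poisson_gamma(lam, shape, scale, size):
    """Sample a compound Poisson-Gamma RV: N ~ Poisson(lam) events,
    each event's value ~ Gamma(shape, scale); the sample is their sum."""
    n_events = rng.poisson(lam, size=size)
    # The sum of n i.i.d. Gamma(shape, scale) values is Gamma(n * shape, scale);
    # guard the shape with max(n, 1) and zero out the no-event samples after.
    totals = rng.gamma(np.maximum(n_events, 1) * shape, scale)
    return np.where(n_events > 0, totals, 0.0)

samples = compound_poisson_gamma(lam=0.5, shape=2.0, scale=10.0, size=100_000)
print((samples == 0).mean())  # roughly exp(-0.5) ≈ 0.61 of samples are exactly zero
```

With `lam=0.5`, about 61% of the samples are exact zeros — the same kind of spike we see in the LTV plot above.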

There is a single constant in the loss, p — the “Tweedie variance power”. p ranges from 1 to 2 (in our case) and indicates where our source of uncertainty lies. If our uncertainty is only about the value of each event, then p tends toward 2 and we assume our data is closer to gamma distributed, while if our dataset has fewer unique values and only the number of events is uncertain, then p tends toward 1 and we expect a more Poisson-like distribution.
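For intuition, here is a sketch of the Tweedie negative log-likelihood commonly used as the loss for 1 < p < 2 (dropping terms that don't depend on the prediction; this is the quantity gradient-boosting libraries minimize for a Tweedie objective, up to constants):

```python
import numpy as np

def tweedie_loss(y_true, y_pred, p=1.5):
    """Tweedie negative log-likelihood for 1 < p < 2, dropping terms
    constant in y_pred. Predictions must be strictly positive;
    zeros in y_true are handled naturally."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(-y_true * y_pred ** (1 - p) / (1 - p)
                   + y_pred ** (2 - p) / (2 - p))

y = [0.0, 0.0, 0.0, 10.0]           # zero-inflated toy labels, mean 2.5
print(tweedie_loss(y, [2.5] * 4))   # predicting the mean of the labels
print(tweedie_loss(y, [10.0] * 4))  # over-predicting scores worse
print(tweedie_loss(y, [0.1] * 4))   # under-predicting scores worse still
```

Per-sample, this loss is minimized when the prediction equals the label, but unlike squared error it stays finite and well-behaved when most labels are exactly zero.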

In less technical terms, a “Compound Poisson distribution” distinguishes between cases where no events occur (no purchases or insurance claims) and cases where events do take place, assigning a high probability to the “no event” scenario.

For a mathematical deep dive, see the sources below.

And in our case at Roundforest?

We use Mean-Squared-Log-Error (MSLE) as our evaluation metric, because it is robust to high-value outliers: errors on larger values are penalized far less severely than errors of the same absolute size on small values.
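A quick illustration of that property (toy numbers, not our data) — the same absolute error of 100 costs very little on a large label and a great deal on a small one:

```python
from sklearn.metrics import mean_squared_log_error

# Same absolute error (100), very different MSLE depending on the label's scale:
big = mean_squared_log_error([1000.0], [1100.0])  # ≈ 0.009
small = mean_squared_log_error([1.0], [101.0])    # ≈ 15.5
print(big, small)
```

This is exactly the behavior we want for LTV, where a handful of very large customers would otherwise dominate a squared-error metric.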

Let's load our data:

import pandas as pd
import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_squared_log_error
data = pd.read_csv("./data_example.csv")
print(data.shape)
>>> (803013, 50) # 49 features and a label column
mask = np.random.rand(data.shape[0]) < 0.8 # split train-test
train = data[mask].copy()
test = data[~mask].copy()
amount_of_zeros = data.loc[data["label"] == 0.0].shape[0]
print(round(amount_of_zeros/data.shape[0], 2))
>>> 0.63

As seen in the distribution plot above, our label contains a very large portion of “0” values.

Now, we’ll train an “out of the box” LightGBM model:

model = lgb.LGBMRegressor()
model.fit(train.drop("label", axis=1), train["label"])
test["preds"] = model.predict(test.drop("label", axis=1))
print(mean_squared_log_error(test["label"], test["preds"])*100)
>>> 0.02186369461257779

Let’s try this again, now with our “Tweedie” loss function, even without tweaking its parameters (default value of p is 1.5):

model = lgb.LGBMRegressor(objective="tweedie")
model.fit(train.drop("label", axis=1), train["label"])
test["preds"] = model.predict(test.drop("label", axis=1))
print(mean_squared_log_error(test["label"], test["preds"])*100)
>>> 0.017428740123056317

An improvement of 20.3% (!!!) in our evaluation metric.

What’s My Point?

A model can do much better when we exploit the distributional properties of its inputs and, as in our case, of the target variable. The target's distribution should also guide our choice of loss function.

And so, if you are trying to predict LTV values for your customers, or just have a dataset that contains many 0 values, you should definitely give “Tweedie” a try.


Written by Roy Ravid, Data Scientist at Roundforest
