Uplift Modeling for Targeted Marketing Campaign Management — A Multiclass Classification Approach in Python with LightGBM Classifier
Sending text messages, e-mails, or push notifications about a new or existing product to millions of customers is a costly way for a business to run a marketing campaign.
Instead of sending bulk campaign messages, targeting and narrowing down the number of customers is a better choice for marketing budget management and cost efficiency.
One approach to targeting is filtering customers manually by the properties of the campaign, e.g. by age, occupation, or product usage, so that the audience size shrinks accordingly.
A better approach is setting up an uplift model; this article briefly explains the data preparation, preprocessing, modeling, and performance evaluation steps of such a model.
Since they are beyond the purpose and scope of this article, the differences between terms such as supervised vs. unsupervised learning, binary vs. multiclass classification, and training vs. test datasets will not be explained here.
First, we need a training dataset that labels customers into four segments, which will later be the target the model predicts. Needless to say, these labels are based on how customers responded to the marketing campaigns of previous months (a labeling sketch follows the list below).
- Class 1: Customers who purchased the product after receiving the marketing campaign (1/1)
- Class 2: Customers who received the marketing campaign but did not respond (1/0)
- Class 3: Customers who purchased the product without receiving the marketing campaign (0/1)
- Class 4: Customers who were not exposed to the marketing campaign and did not purchase the product (0/0)
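As an illustration, the labels can be derived from two historical flags per customer: whether they were treated with the campaign and whether they purchased the product. The column names 'treated' and 'purchased' below are hypothetical placeholders, not columns from the actual dataset.
import pandas as pd

# Hypothetical flags: 'treated' = received the campaign, 'purchased' = bought the product
def label_customer(treated, purchased):
    if treated == 1 and purchased == 1:
        return 1  # Class 1 (1/1): purchased with campaign
    if treated == 1 and purchased == 0:
        return 2  # Class 2 (1/0): treated but did not respond
    if treated == 0 and purchased == 1:
        return 3  # Class 3 (0/1): purchased without campaign
    return 4      # Class 4 (0/0): not exposed, did not purchase

df['TARGET'] = [label_customer(t, p) for t, p in zip(df['treated'], df['purchased'])]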

At the end of the modeling phase, customers labeled as Class 1 (1/1) by the model will be the targets of the marketing campaign.
Note: In real-time use, the model described in this article not only achieved a campaign response rate about 3 times higher than the regular campaign, but also significantly reduced SMS costs and elapsed time by shrinking the number of customers to be SMS texted or push-notified by roughly a factor of 100.
Data Preparation Phase
This phase can be adapted to the business and marketing campaign needs. More than 400 customer features were gathered for the model described in this article.
The main idea in this phase is to gather the features that matter for the marketing campaign and business needs, and to label customers with the classes explained above.
With the help of these classes we will eliminate:
- Class 2: Customers who do not respond/purchase even when they are treated with the campaign (Do Not Disturbs)
- Class 3: Customers who respond/purchase even when they are not treated with the campaign (Sure Things)
- Class 4: Lost Causes
Thus we will focus on Class 1 only (Persuadables).
Data Preprocessing Phase
As with many machine learning projects, the most challenging part of this project was the data preprocessing phase.
First, the null-value ratio of every feature is checked:
import pandas as pd

# Percentage of missing values per feature, highest first (top 100 kept)
df_na = (df.isnull().sum() / len(df)) * 100
df_na = df_na.drop(df_na[df_na == 0].index).sort_values(ascending=False)[:100]
missing_data = pd.DataFrame({'Missing Ratio': df_na})
missing_data.head(50)
We can plot the missing-value ratios as a horizontal bar chart:
import matplotlib.pyplot as plt

# Horizontal bar chart of the missing-value ratios
missing_data.plot(kind='barh', figsize=(12, 14), zorder=2, width=0.85)
plt.suptitle('Missing Value Ratio', fontsize=20)
txt = "Figure 3 - Missing Value Ratio of Features after dropping"
plt.figtext(0.5, 0.05, txt, wrap=True, horizontalalignment='center', fontsize=12)
plt.show()
How missing values are handled differs according to the missingness mechanism:
- MCAR — Missing Completely At Random
- MAR — Missing At Random
- MNAR — Missing Not At Random
There are also several techniques to handle missing values (a minimal sketch follows the list):
- Deleting/dropping
- Filling with a constant, the mean, or the mode
- Backward/forward fill
- Imputation (KNN, iterative, etc.)
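A minimal sketch of a few of these options with pandas and scikit-learn follows; the column names are hypothetical and only meant to show the pattern.
from sklearn.impute import KNNImputer

# Filling with a constant, the mean, or the mode (hypothetical column names)
df['tenure_months'] = df['tenure_months'].fillna(0)
df['monthly_spend'] = df['monthly_spend'].fillna(df['monthly_spend'].mean())
df['occupation'] = df['occupation'].fillna(df['occupation'].mode()[0])

# Backward / forward fill
df['last_login_days'] = df['last_login_days'].ffill().bfill()

# KNN imputation on the numeric columns
num_cols = df.select_dtypes(include='number').columns
df[num_cols] = KNNImputer(n_neighbors=5).fit_transform(df[num_cols])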
Handling missing values will be explained in detail in another article. For simplicity, we can drop the columns that have more than 40% null values:
# Keep only columns whose null ratio is below 40%
mask = df.isna().sum() / len(df) < 0.4
reduced_df = df.loc[:, mask]
print(df.shape)
print(reduced_df.shape)
Another issue in machine learning modeling is highly correlated features. Multicollinearity inflates the variances of the estimated feature effects and makes them unreliable.
Since LightGBM is a gradient-boosting algorithm, highly correlated features might not be a problem, but it is a best practice to remove unnecessary features before training the model.
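If you want to quantify multicollinearity directly instead of only eyeballing pairwise correlations, the variance inflation factor (VIF) is one option. The sketch below uses statsmodels and is an optional extra rather than part of the pipeline in this article; it runs on the numeric columns after dropping rows with nulls.
from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF per numeric feature; values well above ~5-10 indicate strong collinearity
numeric_df = reduced_df.select_dtypes(include='number').dropna()
vif = pd.DataFrame({
    'feature': numeric_df.columns,
    'VIF': [variance_inflation_factor(numeric_df.values, i)
            for i in range(numeric_df.shape[1])]
}).sort_values('VIF', ascending=False)
print(vif.head(20))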
A triangle heatmap is one of the best choices for visualizing feature correlation values.
import numpy as np
import seaborn as sns

plt.figure(figsize=(16, 6))
# Mask the upper triangle so each feature pair is shown only once
mask = np.triu(np.ones_like(df.corr(), dtype=bool))
heatmap = sns.heatmap(df.corr(), mask=mask, vmin=-1, vmax=1, annot=True, cmap='BrBG')
heatmap.set_title('Triangle Correlation Heatmap', fontdict={'fontsize': 18}, pad=16)
We can set a correlation threshold (0.70 in this case) and drop one feature from each pair whose absolute correlation exceeds it.
# Absolute pairwise correlations, lower triangle only
corr_matrix = reduced_df.corr().abs()
mask = np.triu(np.ones_like(corr_matrix, dtype=bool))
tri_df = corr_matrix.mask(mask)

# Drop features with any correlation above the 0.70 threshold
to_drop = [c for c in tri_df.columns if any(tri_df[c] > 0.70)]
reduced_df1 = reduced_df.drop(to_drop, axis=1)

print("The reduced_df dataframe has {} columns".format(reduced_df.shape[1]))
print("The reduced_df1 dataframe has {} columns".format(reduced_df1.shape[1]))
Modeling Phase
Important note: This article focuses on uplift modeling. The code blocks shared here cover only the very basics of the whole process; another article will explain feature selection and classifier comparisons in detail.
- y — Target of our model, consisting of the 4 class values described above.
- X — Feature set after eliminating null-heavy and highly correlated features.
from sklearn.model_selection import train_test_split

# 'TARGET' is the 4-class label column created in the data preparation phase;
# reduced_df1 is the frame after null and correlation elimination
y = reduced_df1['TARGET']
X = reduced_df1.drop('TARGET', axis=1)
Fitting LightGBM Classifier:
import lightgbm as lgb

# Stratified split to preserve class proportions across train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.20, random_state=0)

# Fit a multiclass LightGBM classifier with default parameters
model = lgb.LGBMClassifier()
model.fit(X_train, y_train)
Results & Evaluation
Note that, as a rule of thumb against overfitting, the difference between training and test accuracy should not exceed about 5 percentage points.
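A quick way to check that gap with the classifier fitted above (called model in the code):
# Compare training and test accuracy to spot overfitting
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print('Train accuracy: {:.3f}'.format(train_acc))
print('Test accuracy: {:.3f}'.format(test_acc))
print('Gap: {:.1f} percentage points'.format((train_acc - test_acc) * 100))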
from sklearn.metrics import classification_report

predicted = model.predict(X_test)  # predictions on the held-out test set
report = classification_report(y_test, predicted)
print(report)
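The confusion matrix, referenced again in the key takeaways, complements the classification report; a minimal sketch with scikit-learn:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, predicted)
ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=model.classes_).plot(cmap='Blues')
plt.title('Confusion Matrix')
plt.show()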
Plotting the feature importances
# Feature importances of the fitted LightGBM model
importances = model.feature_importances_

# Sort importances and create matching labels
sorted_index = np.argsort(importances)
labels = X.columns[sorted_index]

# Clear the current plot and draw the importances as a horizontal bar chart
plt.clf()
plt.figure(figsize=(25, 25))
plt.barh(range(X.shape[1]), importances[sorted_index], tick_label=labels)
plt.suptitle('Predictor Importance', fontsize=20)
txt = "Figure 9 - Feature Importance for Light GBM Model"
plt.figtext(0.5, 0.05, txt, wrap=True, horizontalalignment='center', fontsize=12)
plt.show()
KEY TAKEAWAYS
- Prepare the data for modeling with features that seem important for the business/campaign needs, and label customers as explained above
- Inspect the data thoroughly and visualize it (Exploratory Data Analysis)
- Preprocess the data (null values, highly correlated features, variance threshold, etc.)
- Choose an algorithm (a classifier in this case), fit the model, and predict
- Inspect the confusion matrix and classification report
- Score new data the model has never seen and send the campaign to customers labeled as Class 1; we do not send the campaign to customers labeled as Class 2, 3, or 4 (see the sketch after this list)
- Inspect the real-time results
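As a sketch of that last scoring step, assuming X_new is a new, already preprocessed customer DataFrame with the same columns as X, and that Class 1 is encoded as the integer 1 (both are assumptions on my part):
# Score customers the model has never seen and keep only predicted Persuadables (Class 1)
new_predictions = model.predict(X_new)
persuadables = X_new[new_predictions == 1]
print('Customers to target with the campaign: {}'.format(len(persuadables)))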
The next articles will cover:
- Handling missing values
- Feature selection in detail (RFE, Boruta, etc.)
- Handling imbalanced classes (SMOTE, Tomek Links, etc.)
- Comparison of classifier algorithms