Artificial Intelligence-driven search enhancement using look-alike zip codes

Shreyas SD
MiQ Tech and Analytics
8 min read · Jul 13, 2022

Shreyas SD, Team Lead, R&D/Data Science, MiQ

Third-party cookies are being deprecated across web browsers, which means we need new datasets and new AI- and data-science-driven ways of delivering the same pre-campaign experience to marketers. Let's explore one new AI-powered feature focused on the use of postal code data.

The AI/DL-driven pre-campaign search expansion of users, sometimes known as look-alike modeling, works by suggesting "N-dimensionally similar" zip codes, based on demographics, TV viewing patterns and OSM data (which captures accurate information about individuals, commercial establishments, amenities and social class), in addition to the actual search. For example, a search resulting in just 75 zip codes can be broadened to 1,250 or more by finding users with similar traits or behaviors.

This can be used at every stage of the marketing funnel.

How do we know the model is successful?

First, it must be able to recommend "N-dimensionally similar" zip codes that enhance users' search results, with similarity based on a wide range of datasets.

We then seek internal and external validation from our commercial pods and their clients that the search results are as expected and are driving the performance of ad campaigns.

The existing architecture

As part of our proofs of concept to provide advanced insights for future-proofing, we have been using traditional ML techniques (PCA/LDA) and correlation techniques (Pearson's r/Spearman's rho) to identify linear relationships between the verticals/features of a given dataset. Until now, our study was limited to linear correlation among features.

The performance of these models tends to decline as input dimensionality increases, hence our interest in capturing non-linear and higher-order relationships between features, which these existing methods overlook.
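
For context, the linear baseline we had been using can be sketched with pandas; the CSV path and feature table below are hypothetical stand-ins for our datasets.

```python
# Minimal sketch of the linear-correlation baseline; the CSV path is a
# hypothetical stand-in for our joined feature table.
import pandas as pd

df = pd.read_csv("zip_code_features.csv")

# Pearson captures linear dependence; Spearman captures monotonic (rank)
# dependence. Neither detects more general non-linear relationships.
pearson_corr = df.corr(method="pearson")
spearman_corr = df.corr(method="spearman")
```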

HEADS UP: TECHNICAL STUFF AHEAD

The following techniques will be used:

  1. Identification, extraction and projection of non-linear and higher-dimensional relationships among zip codes, based on various datasets.
  2. Identification of a set of look-alike postal codes for a given postal code (e.g. A123 → trained model → {C234 (98% similar), D654 (97% similar), …}), sorted by similarity.
  3. Representation of each zip code as an N-dimensional (N = 500) embedding/vector in space.
  4. Generation of a bottleneck layer holding a compressed learned representation of each sample, used for similarity calculation and identification of look-alikes for a given set of zip codes.

AUTOENCODERS FOR NON-LINEAR FEATURE EXTRACTION, BOTTLENECK LEARNED REPRESENTATION AND LOOK-ALIKE GENERATION:

Autoencoders (AEs) have emerged as an alternative to manifold learning for non-linear feature extraction and non-linear feature fusion.

  • An autoencoder with non-linear activation functions can extract non-linear features, unlike PCA, LDA, Pearson's r or Spearman's rho, which only perform linear feature extraction.
  • This makes autoencoders well suited to highly non-linear spaces such as the Samba, census and OSM data.
  • Autoencoders such as the variational autoencoder (VAE) with KL loss and the stacked denoising autoencoder do well at dimensionality reduction and non-linear feature extraction, and at representing each data sample as an N-dimensional embedding (useful for finding look-alike zip codes, for example).
  • Autoencoders are a strong way to learn non-linear feature representations in an unsupervised way.

The image above shows that the input layer is the census and Samba datasets joined, with N = 1,800 features, and that the bottleneck layer contains the compressed learned representation of these features.

SELF-ORGANIZING MAPS (SOMs) FOR UNSUPERVISED CLUSTERING OF DEMOGRAPHIC, CENSUS AND OSM DATA:

  • SOMs provide a way of representing multidimensional data (often more than 250-300 dimensions) in much lower-dimensional spaces, usually one, two or a few dimensions. This process of reducing the dimensionality of vectors is essentially a data-compression technique known as vector quantization.
  • In addition, the Kohonen technique creates a network that stores information in such a way that any topological relationships within the dataset are maintained.
  • SOM has an obvious advantage in terms of topology preserving order. By using a self-organizing map network as the main framework for unsupervised clustering, semantic knowledge can also be easily incorporated so as to enhance the clustering effect.
  • Training a SOM requires no target vector. A SOM learns to classify the training data without any external supervision whatsoever.

The example below shows how our entire dataset (M×N) acts as the input layer and is mapped to a much lower-dimensional lattice of neurons in the output layer (X×Y), where M×N >> X×Y, with each node in the output layer competing for each sample to be assigned to it.
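
To make the idea concrete, here is a minimal sketch using the open-source MiniSom library; the lattice size, training settings and random data are illustrative assumptions, not our production configuration.

```python
# Illustrative SOM: an (M x N) dataset is mapped onto a small (X x Y)
# lattice of neurons. All sizes and settings here are assumptions.
import numpy as np
from minisom import MiniSom

data = np.random.rand(1000, 300)  # stand-in for high-dimensional features

som = MiniSom(x=20, y=20, input_len=data.shape[1],
              sigma=1.0, learning_rate=0.5)
som.random_weights_init(data)
som.train_random(data, num_iteration=10_000)  # unsupervised: no labels

# Each sample is assigned to the lattice node (neuron) that "wins" it.
winning_nodes = [som.winner(x) for x in data]
```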

Why use OSM for enhancing look-alikes?

  • Suggesting look-alike zip codes using OSM data provides a much more accurate measure of similarity of zip codes since OSM captures key information — individuals, commercial establishments, amenities, social class, and their behavior at the zip code level.
  • OSM data helps to segment and identify similar zip codes at the very niche level, greatly increasing the accuracy of the look-alike zip codes recommended.
  • It helps us identify geographic, economic and sociological similarities and differences among zip codes.

We went beyond demographics and TV viewing patterns to determine zip code similarity (for example, the model suggested Toowoomba and Rockhampton as similar, and we found that both are towns with military bases).
An example of the features the model was engineering was one combining fuel amenity, industrial building and metal construction craft; in this case, it was looking for similar industrial towns.

Autoencoders for unsupervised learning.

Unsupervised learning relies on data only — no labels.

  • E.g. CBOW and skip-gram word embeddings: the output is determined implicitly by the input data.
  • These autoencoders learn a representation of the input itself.
  • This produces a dense embedding/vector representation of the input data.
  • We still need to define the loss (MAE/MSE/RMSE).
  • AEs are designed to reproduce their input.

Feature in focus: Search result enhancement using look-alikes (SPD)

Objective: Generate "look-alike" postal codes for the search being made.

Identify similar zip codes at the very niche level.

Data: LG, Vizio, Census (US), Experian (US and AU), Skyrise (UK), OSM (OpenStreetMap)

Concepts: Deep learning (autoencoders and self-organizing maps), unsupervised clustering.

Datasets and pre-processing.

Datasets used:

TV data: Vizio, Samba

Demography data: census, Experian (US & AU)

Open Street Maps data for US/AU

Note: After the addition of OSM, the number of features increased to roughly 2,000.

AUTO-ENCODER ARCHITECTURE

The latent learned space is the output of a hidden layer with 500 neurons.
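
A minimal Keras sketch of this architecture is below. The 1,761-feature input and 500-neuron bottleneck come from the figures in this post; the intermediate layer size, SELU activations and learning rate are illustrative assumptions (the initializer and momentum follow the parameters listed in the methodology below).

```python
# Sketch of the autoencoder: 1,761 input features, 500-neuron bottleneck.
# Intermediate sizes, activations and the learning rate are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models, initializers, optimizers

n_features = 1761  # pre-processed census + Vizio-Samba + OSM features
bottleneck = 500   # latent learned space

init = initializers.LecunNormal()

inputs = layers.Input(shape=(n_features,))
x = layers.Dense(1024, activation="selu", kernel_initializer=init)(inputs)
latent = layers.Dense(bottleneck, activation="selu",
                      kernel_initializer=init, name="latent")(x)
x = layers.Dense(1024, activation="selu", kernel_initializer=init)(latent)
outputs = layers.Dense(n_features, kernel_initializer=init)(x)

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer=optimizers.SGD(learning_rate=0.01, momentum=0.9),
                    loss="mse")  # X and Y are the same: reconstruct the input

# After training, keep only the encoder (up to the latent layer):
encoder = models.Model(inputs, latent)
```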

METHODOLOGY

  • We trained the autoencoder described above on df-combined-preprocessed (the pre-processed, joined census, Vizio-Samba and OSM datasets). Both X and Y are the same, since an autoencoder reconstructs its input.
  • Parameters:

Initializer = LecunNormal()  # initializes the network weights from a normal (Gaussian) distribution

Optimizer = optimizers.SGD(learning_rate=20, momentum=0.9)

  • Once training converged, we generated a new model containing the encoder part only (up to the latent-space layer).
  • Next, we obtained the predictions/learned-space representation, of shape (33120, 500), from the encoder by passing df-combined-preprocessed, of shape (33120, 1761), as input.
  • This output from the encoder is the 500-dimensional embedding/learned representation of all the zip codes, taking into account both linear and non-linear factors.
  • We then used this output to generate a 33k × 33k similarity matrix, with each row sorted by similarity (see the sketch after this list).
  • This is finally used to enhance the user's search.
  • It helps users expand the size of their search, incentivizing usage of the cookie-less HUB over the existing cookie-based framework.
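
Sketched below is how the similarity matrix and look-alike lookup could be built from the encoder above; cosine similarity is an assumption, since the exact metric is not named here.

```python
# Continuing the encoder sketch above. df_combined_preprocessed is the
# pre-processed feature table; cosine similarity is an assumed metric.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

embeddings = encoder.predict(df_combined_preprocessed)  # (33120, 500)
sim = cosine_similarity(embeddings)                     # (33120, 33120)

# Rank every zip code against every other by similarity score.
ranked = np.argsort(-sim, axis=1)

def lookalikes(zip_index: int, k: int = 10) -> list:
    """Indices of the k most similar zip codes, excluding the query."""
    return [j for j in ranked[zip_index] if j != zip_index][:k]
```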

t-SNE visualization of the learned-space representation. It shows how vectorially similar zip codes are distributed across the space (e.g. the purple areas).
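
A projection like this can be produced from the learned embeddings with scikit-learn; the t-SNE settings below are illustrative.

```python
# t-SNE projection of the embeddings from the sketch above.
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

coords = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=42).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1], s=2)
plt.title("t-SNE of zip-code embeddings")
plt.show()
```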

Hyper-parameter tuning, optimization and validation

We performed hyper-parameter tuning for:

  • Number of layers
  • Size of each layer
  • Initializers
  • Optimizers
  • Cost function

We arrived at the set of parameters that resulted in the lowest RMSE.

Internal Validation

Get the model's outputs/predictions from the encoder.

Get the N × N similarity matrix:

Number of rows = number of input zip codes

Number of columns = all zip codes

Values in the matrix = the similarity score given by the model for the (i, j) zip codes

Next, perform PCA or LDA on the original dataset's features and visualize it using t-SNE. Ideally, for a given set of zip codes, the look-alikes/recommendations from the model should correlate or align with the t-SNE scatter plot, which is based on the features alone.
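
One hypothetical way to run this check, continuing the earlier sketches (the neighbour-overlap measure here is an assumption, not the exact validation we used):

```python
# Project the *original* features with PCA, then check that the model's
# look-alikes are also near neighbours in that feature-only space.
import numpy as np
from sklearn.decomposition import PCA

pca_coords = PCA(n_components=50).fit_transform(df_combined_preprocessed)

def neighbour_agreement(zip_index: int, k: int = 10) -> float:
    """Fraction of the model's top-k look-alikes that are also among the
    k nearest PCA-space neighbours of the query zip code."""
    dists = np.linalg.norm(pca_coords - pca_coords[zip_index], axis=1)
    pca_nn = set(np.argsort(dists)[1:k + 1])
    return len(pca_nn & set(lookalikes(zip_index, k))) / k
```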

Back-Testing summary:

  • Increase (averaged across all tests) in the number of targetable zip codes: 15.5x for a medium-sized initial audience, and 1.8x for a larger initial audience (>25k).
  • Expected increase in the revenue generated or media spend from the tested advertisers (averaged across them) as a result of targeting an increased number of zip codes: 15.5x for medium-size advertisers (American 15.5x, Equifax 15x) and 1.8x on average for large advertisers.
  • Percentage of recommended zip codes actually found to be similar when back-tested: for a range of initial audiences, at least 70 to 95% of the recommended zip codes were found to be similar when validated against any of the three datasets: Experian, OSM, or the TV datasets (Samba + Vizio).

User interviews and validation:

We ran two rounds of internal validation: analytic outputs in early Q3, and UX validation in mid-to-late Q3.

Internal users loved the new design. They also loved the new concept: search result enhancement.

We received positive feedback on the brief response builder, as it allows users to save their insights and share them with clients directly via Hub.

Feedback from user interviews

"At first sight, searching for TV personas from Audience got slightly confusing (TV Insights vs TV persona)." - Sales director, Atlanta

"TV and demography insights are extremely valuable." - Senior account manager, South

"I like having the cookieless insights on the dashboard, it's going to be super helpful." - Senior director, account management, US

"Fullscreen add-to-plan is intuitive. The look-alike feature is awesome and the layout looks good. I really like the plan concept." - Sales director
