Clustering Customer Interactions with D3 Quad-Tree

Clustering of customer interaction points over a 5 second period
Reducing the visualization complexity with quad-tree clustering

At Kabbage, we provide access to small business loans for thousands of customers and, subsequently, need to provide exceptional customer service. Many times this service comes from a customer navigating our user-friendly website, but more often than not an on-the-go small business owner will want to call in and talk to a human. For our customer success agents to provide the best service, it is imperative for them to understand every interaction between our customers and our product.

Currently, our agents tab back and forth between applications — all while being on the phone with the customer. This is necessary because of the amount of interaction touch points we generate in this digital age — emails, phone calls, web and mobile visits, mailings, surveys, and more. If we were to try and visualize all these interactions on a timeline, it would quickly get not just cluttered but unreadable.

TLDR; The Angular and React code for the below examples can be found on GitHub.

Many customer interaction points overlapping each other on a scatter plot
Initial view of interactions for a customer

One possible solution to this problem could be using d3-force to separate those interactions (so they don’t overlap). While this will work in some situations, too many overlapping points will push some of the interactions too far from their actual x- and y-positions, which can distort an agent’s understanding, as shown below. In addition, the forceSimulation function provides an almost infinite number of tweaks with respect to repulsion, spacing, and collision, which makes it difficult to create a set of parameters that will always work.

Using d3-force simulation to place points
Clustering of points on a chart with a force directed simulation
Using d3-force to cluster the points ultimately distorting the x- and y- positions

Another approach, commonly used to cluster/group geographic markers on a map at higher zoom levels, is d3-quadtree. Per Mike Bostock, the creator of D3:

“A quadtree recursively partitions two-dimensional space into squares, dividing each square into four equally-sized squares. Each distinct point exists in a unique leaf node; coincident points are represented by a linked list. Quadtrees can accelerate various spatial operations, such as the Barnes–Hut approximation for computing many-body forces, collision detection, and searching for nearby points.”

The first step in the process, before starting the search, is identifying an optimal grid size. The size of the grid is something left to the developer (smaller grids = less overlap amongst points and therefore less clusters).

Breaking a chart canvas into equally sized grids
Breaking apart the canvas into equal area grids for clustering

Once the grid is defined, the quad-tree search algorithm will traverse all the data points and find similar points that are in a particular grid item. Once all the points are found, the average position of the points is kept, along with an array of all the points (and their attached object properties).

d3-quadtree search functionality
A chart of clustering points based on a quad-tree algorithm
Visualizing the quad-tree algorithm

When the algorithm finishes searching through all points, the output is an array of reduced points. The algorithm does not “throw away” all the original points, but instead nests the original points with their positions and any identifying properties within the reduced array, as shown below.

const clusteredPoints = [
[cluster_x, cluster_y, [
[original_x, original_y, {id: 1, name: 'node-1'},
[original_x, original_y, {id: 2, name: 'node-2'}
Clustering of customer interaction points over a 5 second period
Result of quad-tree clustering

At this point, we could apply a mouseover or click action to each clustered point, so that the agent can break the cluster apart and see the points in detail. But I will leave that up to the imagination/designer!




Recommended from Medium

Which factors determine your grief duration?

Non-Parametric Tests in Hypothesis Testing

Google Data Studio vs Tableau: A Comparative Analysis of Visualization Tools — Digital analytics…

Predicting Short-Term Stock Movements with Quantitative Finance and Machine Learning in Python

Calculating a Portfolio of Securities : Finding Best Rate of Return for selection of Best…

A Study of Travelling Salesperson Problem

Travelling Salesperson Problem

GenerasiGigih Journey — Week 2

Putting Three Meals on the Table

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Christopher Lanoue

Christopher Lanoue

Engineering and Innovation Director, Graphicacy

More from Medium

Word2Vec -CBOW vs SkipGram

Evaluation of Natural Language Processing Tasks

Shallow Question Answering using NLP

Semantic Search Introduction