Improving the digital experience with graph-based traffic models

Vanguard Tech
11 min read · Jun 18, 2024

How Vanguard’s graph-based digital session model empowers client and user experience teams to make informed improvement decisions.

In today’s digital landscape, organizations face ongoing challenges to deliver exceptional client and user experiences (CX/UX) across digital platforms. Successful organizations effectively listen to their users and deploy systems to identify, prioritize, and resolve platform issues to improve user experience. Traditional methods, such as surveys and focus groups, while valuable, often fall short of providing comprehensive and real-time insights into user behavior.

McKinsey Quarterly Report — Predictions on the future of CX

The challenges of using traditional surveys to evaluate CX systems can be summarized as four primary flaws: surveys are limited, reactive, ambiguous, and unfocused. These flaws make surveys poorly suited to capturing the insights leaders need to make informed decisions about their CX platforms. Vanguard addresses these hurdles by augmenting survey data with client behavior data using our graph-based digital session model, TIGER (Traffic Insights with Graph Encoding and Rendering), which empowers our CX/UX leaders to make better decisions that improve our digital platforms.

With TIGER, CX/UX teams analyze the behavior of their user populations in terms of users’ navigation patterns. Layering this data with survey responses mitigates the survey flaws mentioned above: results are analyzed at a population level, which makes insight share-outs more compelling to CX leaders, and the real-time nature of traffic data lets leaders make decisions about their platforms quickly. Product teams can combine the two sources in root cause analyses and measure the ROI of platform changes at a population level. With the graph approach to user traffic, we can also approximate how significant certain nodes, or issues within nodes, are to the quality of a session. This allows leaders to sort issues by their significance to the platform population and act on the most salient and troublesome ones.

Fundamentally, TIGER represents sessions as directed graphs, with information about the experience embedded in nodes and edges. Nodes are the distinct webpages visited in the session, and directed edges represent the traffic flow between pages. Nodes and edges are tagged with attribute vectors, such as time on page for nodes and bounce frequency for edges. Traditionally, traffic metrics are reported at the individual page level (think bounce rates or journey completion). While these are neat measures for reporting high-level progress, a lot of information about the experience is lost in this summarization. For example, take a user who launches a support page in their session, as visualized below. This event, and those like it, are important to surface to product teams as an indicator of user uncertainty. The graph-based approach enables research teams to identify the nodes closest to the support page in either direction within the session, and so to approximate the issues the user may have been facing before trying to resolve them through the support page. These session representations can be likened to user feedback: users cast votes on the product, rather than on a survey, by how they navigate the digital space.
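To make the representation concrete, here is a minimal sketch of a session graph built with plain Python dictionaries. The page names, the `(page, seconds_on_page)` event format, the use of transition counts as the edge attribute, and the helper names `build_session_graph` and `pages_near` are all assumptions made for illustration; the article does not describe TIGER’s actual data structures at this level.

```python
from collections import defaultdict, deque


def build_session_graph(events):
    """Build a directed session graph from (page, seconds_on_page) events in visit order."""
    nodes = defaultdict(lambda: {"visits": 0, "time_on_page": 0.0})
    edges = defaultdict(int)  # (source, target) -> number of transitions
    previous = None
    for page, seconds in events:
        nodes[page]["visits"] += 1
        nodes[page]["time_on_page"] += seconds
        # Skip self-transitions such as page refreshes (an illustrative choice).
        if previous is not None and previous != page:
            edges[(previous, page)] += 1
        previous = page
    return dict(nodes), dict(edges)


def pages_near(edges, target, max_hops=1):
    """Return {page: hops} for pages within max_hops of target, in either direction."""
    neighbours = defaultdict(set)
    for source, dest in edges:
        neighbours[source].add(dest)
        neighbours[dest].add(source)
    distance = {target: 0}
    queue = deque([target])
    while queue:
        page = queue.popleft()
        if distance[page] == max_hops:
            continue
        for other in neighbours[page]:
            if other not in distance:
                distance[other] = distance[page] + 1
                queue.append(other)
    distance.pop(target)
    return distance


# Hypothetical session in which the user opens a support page mid-transaction.
session = [
    ("home", 12.0), ("balances", 30.5), ("transact", 48.0),
    ("support", 95.0), ("transact", 20.0), ("logout", 2.0),
]
nodes, edges = build_session_graph(session)
near_support = pages_near(edges, "support")
```

In this toy session the only page one hop from `support` is `transact`, which is the kind of signal a research team might use to narrow down where the uncertainty arose. A graph library such as NetworkX could hold the same nodes, edges, and attributes.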

Graphical representation of an example session of a user who launches a support page.

Using critical patterns in traffic to approximate intent

Defining and measuring client intent is challenging, yet forming accurate approximations enables product teams to create more relevant and client-centric features that enhance satisfaction. By examining sequences of events, or flows, we contextualize how clients navigate our digital platforms and anticipate their future activity. Significant sub-sequences have long been used to help differentiate signal from noise in complex network data and have been shown to be strong indicators of user intent in e-commerce clickstream data.

To identify significant flows within session data, we employ sequence pattern mining (SPM), a data mining method that aims to discover interesting sub-sequences in a set of sequences. In SPM, how interesting a sub-sequence is can be measured by various criteria, such as frequency, length, and profit. The goal is to efficiently discover such sub-sequences (i.e., sequential relationships between items that are interesting to the user) within a sequence database, where ‘interesting’ patterns can be defined by constraints and other concise representations of candidate patterns. Many SPM algorithms have been developed to find those sequences efficiently.

For identifying flows, we focused on two SPM algorithms, SPADE and Gap-BIDE. The SPADE algorithm was the first SPM algorithm to utilize a depth-first search for identifying frequent sub-sequences within the dataset. The Gap-BIDE algorithm utilizes a pattern-growth search methodology, which we used to identify closed and contiguous frequent sub-sequences in our data.
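The idea of closed, contiguous, frequent sub-sequences can be illustrated with a brute-force sketch. This is not SPADE or Gap-BIDE: it simply enumerates every contiguous run of pages up to a maximum length, counts how many sessions contain each run, and then keeps the closed ones. The session data, the thresholds, and the function names are hypothetical, and closedness is only checked against patterns within `max_length`.

```python
from collections import Counter


def frequent_contiguous_patterns(sessions, min_support, max_length=4):
    """Map each contiguous page sub-sequence to the number of sessions containing it."""
    support = Counter()
    for pages in sessions:
        seen = set()
        for length in range(2, max_length + 1):
            for start in range(len(pages) - length + 1):
                seen.add(tuple(pages[start:start + length]))
        support.update(seen)  # each session counts at most once per pattern
    return {p: s for p, s in support.items() if s >= min_support}


def is_contiguous_subpattern(small, big):
    """True if the tuple `small` appears as an unbroken run inside the tuple `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def closed_patterns(frequent):
    """Drop patterns that sit inside a longer frequent pattern with the same support."""
    return {
        p: s for p, s in frequent.items()
        if not any(
            len(q) > len(p) and t == s and is_contiguous_subpattern(p, q)
            for q, t in frequent.items()
        )
    }


# Hypothetical sessions, each an ordered list of page names.
sessions = [
    ["home", "transact", "dropdown", "cancel"],
    ["home", "transact", "dropdown", "confirm"],
    ["home", "balances", "transact", "dropdown", "cancel"],
    ["home", "balances", "logout"],
]
frequent = frequent_contiguous_patterns(sessions, min_support=2, max_length=3)
closed = closed_patterns(frequent)
```

Enumeration like this grows quickly with session length and pattern length, which is why dedicated algorithms are used on real traffic volumes.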

With these SPM algorithms, we identify the most critical traffic patterns on a given platform and treat them as subgraphs. For each session graph, we compute which of these subgraphs occur. By concatenating the subgraph calculations, we generate a session vector in which each feature can be compared with experience measures individually, or the features can be treated collectively to describe the activity in the session and predict future sessions. We demonstrate how both treatments are valuable for CX use cases.
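One way to picture the session vector is as a fixed-order list with one entry per critical pattern. The sketch below counts how often each pattern occurs in a session; whether TIGER records counts, presence flags, or something richer is not stated in the article, so the counting rule, the pattern list, and the function names are assumptions.

```python
def count_occurrences(pages, pattern):
    """Count how many times `pattern` appears as an unbroken run in `pages`."""
    n = len(pattern)
    return sum(
        1 for i in range(len(pages) - n + 1)
        if tuple(pages[i:i + n]) == tuple(pattern)
    )


def session_vector(pages, patterns):
    """Concatenate per-pattern counts into one feature vector, in pattern order."""
    return [count_occurrences(pages, pattern) for pattern in patterns]


# Hypothetical critical patterns, e.g. the output of a pattern mining step.
CRITICAL_PATTERNS = [
    ("transact", "dropdown"),
    ("dropdown", "transact"),
    ("transact", "dropdown", "cancel"),
]

# A session in which the user taps into the dropdown repeatedly, then cancels.
session = [
    "home", "transact", "dropdown", "transact", "dropdown",
    "transact", "dropdown", "cancel",
]
vector = session_vector(session, CRITICAL_PATTERNS)
```

Because every session is scored against the same ordered pattern list, the resulting vectors line up as rows of a table, which is what makes them usable as inputs to tabular models.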

Measuring and improving client services with TIGER

TIGER has been successfully applied in three key use cases to improve our clients’ digital experience.

1. Prioritizing feature improvement

Session subgraphs of user traffic allow labs to proactively evaluate the quality of user pathways and isolate problematic patterns. By correlating subgraph activity with experience measures, like journey completion rate, call center fallout, and in-session survey scores, we can prioritize feature improvements for the next development sprint. For example, we focused on the TIGER subgraph around the Transact tab in the mobile app.

We discovered that users frequently abandoned their transaction flow after encountering a dropdown to select their desired product. These users would cyclically tap into the dropdown, in some cases up to eight times, before cancelling the transaction. Rarely did they come back to complete the transaction they had started. Pairing these traffic data points with survey comment insights, research from analysts, and UX expertise, we concluded that the current product formats may have been challenging to interpret, and that more education on the funds listed in the dropdown could resolve the issues encountered by those repeatedly tapping into it. In short order, these insights turned into new UX wireframes and a better layout for transacting in the mobile app.

Not only does the tool extend our vision into issues on digital platforms, but the insights are drawn much faster than traditional flow analysis. In an Agile environment where the labs are moving quickly, product owners are eager to get insights to drive the direction of their product. Tools like TIGER allow them to lead with data and make optimal bets on product improvement.

2. Predicting user behavior

Predictions of users’ next actions can be a rich data source for improving the client experience by serving personalized content and near-real-time support. We take inspiration from the active body of research on forming representations of prior digital activity to predict the user’s next interaction.

We find that the subgraph vectors contain information relevant to predicting future user activity, as shown by training common tabular models with the session vectors as inputs. We’ve tested this on two tasks to date:

A) Predicting whether clients will return to the mobile app in the next month.

B) Predicting whether clients will contact our call center shortly after their session.

Without fine-tuning or adding complementary data sources, we observe that subgraph vectors of clients’ past activity can effectively predict those future interactions.

We start with predicting users’ return to the mobile app. This prediction serves as a convenient engagement proxy, one that depends on the user’s recent traffic flows in the mobile app. Encouraged by the performance on the holdout test set (F1 = 0.70), we used feature importance methods to learn which flows precede users leaving the app. By calculating Shapley values from gradient boosted tree models trained on session vectors, we identified flows with step-up interruptions. A step-up interruption occurs when a user who previously logged in with biometrics requests a feature that requires additional layers of security and is prompted to enter their credentials. The insight sharpened the focus of product and security teams on secure solutions with lower costs to the client experience.
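For readers unfamiliar with Shapley values, the sketch below computes them exactly by enumerating feature subsets. It is a stand-in for illustration only: the article’s analysis uses gradient boosted tree models, whereas this example swaps in a hand-written scoring function, `leave_risk`, with three made-up features. Brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values explaining predict(instance) relative to predict(baseline)."""
    n = len(instance)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                # Features outside the coalition are held at their baseline value.
                without = [instance[j] if j in subset else baseline[j] for j in range(n)]
                with_i = list(without)
                with_i[i] = instance[i]
                values[i] += weight * (predict(with_i) - predict(without))
    return values


def leave_risk(x):
    """Hypothetical score; x = [step_up_prompts, balance_views, transfers]."""
    step_up, balance_views, transfers = x
    return 0.1 + 0.4 * step_up - 0.05 * balance_views + 0.1 * step_up * transfers


instance = [1, 2, 1]   # a session with one step-up prompt
baseline = [0, 0, 0]   # a reference session with none of these flows
contributions = shapley_values(leave_risk, instance, baseline)
```

The contributions sum to the difference between the score for the session and the score for the baseline, so each flow receives a share of the prediction. In this toy function the step-up feature receives the largest positive share, and the interaction term is split evenly between the two features involved.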

In our second exercise, we trained another gradient boosted tree model on clients’ recent session vectors to predict who would ring our call center shortly after their session. Analysis revealed a positive correlation between mobile check deposits (MCD) and clients calling for support. Armed with this insight, we collaborated with the technical delivery teams to improve success rates and launch experiments around communication and education of the MCD process to improve the client experience. Since our collaboration, MCD capture rates have improved by 14%, mobile CSAT is up by 8%, and the rate of successful deposits per call on MCD issues has increased by 50%.

3. Informing the client experience through graph properties

TIGER allows us to extract valuable insights into the client experience by analyzing various graph properties. For instance, node reciprocity measures the share of a node’s connections that are bidirectional. High reciprocity can be understood as users clicking many different links from a page and coming back, a potential indicator of missed expectations in their session. We find this behavior is characteristic of users soon to ring our call center, according to the call-prediction model mentioned earlier.
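Here is a minimal sketch of one reading of that definition: the number of neighbouring pages visited in both directions, divided by the number of distinct neighbouring pages. The normalization is an assumption, and graph libraries may define reciprocity slightly differently. The page names are hypothetical.

```python
def node_reciprocity(edges, node):
    """Fraction of a node's distinct neighbours that are linked in both directions."""
    successors = {t for s, t in edges if s == node and t != node}
    predecessors = {s for s, t in edges if t == node and s != node}
    connected = successors | predecessors
    if not connected:
        return 0.0
    return len(successors & predecessors) / len(connected)


# Hypothetical session: the user opens two fund pages from Transact and backs out of each.
edges = {
    ("home", "transact"),
    ("transact", "fund_a"), ("fund_a", "transact"),
    ("transact", "fund_b"), ("fund_b", "transact"),
    ("transact", "confirm"),
}
transact_reciprocity = node_reciprocity(edges, "transact")
home_reciprocity = node_reciprocity(edges, "home")
```

In this example `transact` has four neighbours, two of which are visited back and forth, so half of its connections are reciprocal, while `home` has none.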

Other graph properties can be helpful constructs for thinking about the client experience. Consider closeness centrality, which is based on the average shortest-path distance between a target node and all other nodes. We find that users whose sessions have higher average distances from the home landing page leave us higher survey scores. We interpret this as users who are able to go deeper in the app having better experiences. Because users rarely give feedback about a landing page in the survey comment field, this graph property, which measures the depth of the session, is a helpful proxy for evaluating the success of landing pages.
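Session depth in this sense can be sketched with a breadth-first search from the home page. Closeness centrality is conventionally the reciprocal of the average shortest-path distance, so a higher average distance corresponds to lower closeness for the home node. The edge lists and function names below are hypothetical.

```python
from collections import deque


def distances_from(edges, source):
    """Shortest hop counts from `source` to every page reachable along directed edges."""
    successors = {}
    for s, t in edges:
        successors.setdefault(s, set()).add(t)
    distance = {source: 0}
    queue = deque([source])
    while queue:
        page = queue.popleft()
        for nxt in successors.get(page, ()):
            if nxt not in distance:
                distance[nxt] = distance[page] + 1
                queue.append(nxt)
    return distance


def average_depth(edges, home="home"):
    """Average shortest-path distance from the home page to the other reachable pages."""
    hops = [d for page, d in distances_from(edges, home).items() if page != home]
    return sum(hops) / len(hops) if hops else 0.0


# Two hypothetical sessions: one stays near the landing page, one goes deeper.
shallow = [("home", "balances"), ("home", "transact")]
deep = [("home", "transact"), ("transact", "fund"), ("fund", "confirm")]
shallow_depth = average_depth(shallow)
deep_depth = average_depth(deep)
```

The shallow session averages one hop from home and the deep session averages two, which is the kind of difference the survey-score comparison above relies on.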

Leveraging graphs for advanced CX analytics at scale

In our last section, we’ll discuss hosting artifacts from TIGER, namely the session graphs and subgraph vectors. Considering the signal captured in these artifacts for downstream modeling objectives, this is a critical step to encourage wider adoption within the ML community. We’ll note some of the challenges with storing and serving session graphs at scale and discuss potential solutions.

Graphs have proven to be invaluable in representing complex data relationships. When hosting graphs for analytic consumption, they can provide a powerful foundation for understanding, analyzing, and making predictions based on intricate connections. Making session graphs available in close-to-real-time fashion not only unlocks methods of implementing call center intent prediction — and reducing transfer or hold times — but also drives personalized recommendations in sessions or dynamic search capabilities. However, storing and serving session graphs at scale can pose several critical challenges.

Considerations at scale:

  • Scalability: As the volume of data and the complexity of relationships grow, traditional storage and serving solutions may struggle to keep up. Graphs can become prohibitively large and hard to manage.
  • Latency: In real-time analytics and dynamic applications, high latency in serving session graphs can hinder user experiences and model performance, making timely insights difficult to obtain.
  • Resource constraints: Allocating resources to store, maintain, and serve large graphs can be costly, both in terms of infrastructure and operational overhead.
  • Data consistency: Ensuring that the data in session graphs remains consistent and up to date across various models and applications can be a significant challenge as well.

To address these challenges, we approach the problem of serving representations of traffic from a few different angles. First, we acknowledge that NetworkX graph objects may not be familiar to every practitioner, so we offer a more familiar tabular representation as features they can work with directly. The session subgraph vector covers 95% of all pages in the graph space and 99% of the critical patterns identified in traffic data. Each feature is interpretable and can offer insights that analysts want to bring back to their partners. Like the raw traffic data, it is stored in distributed fashion in AWS S3 tables and can be queried with common engines like Athena. For more ambitious practitioners, we offer on-the-fly computation that translates tabular logs into session graphs using common graph processing packages. These approaches serve most of our clients’ needs and use cases.
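The translation from tabular logs to session graphs can be sketched in a few lines. The row layout `(session_id, timestamp, page)` and the function name are assumptions for illustration; the real log schema is not described in the article, and a graph package would typically wrap the resulting edge counts in its own graph type.

```python
from collections import Counter, defaultdict


def logs_to_session_graphs(rows):
    """Group (session_id, timestamp, page) rows into per-session directed edge counts."""
    by_session = defaultdict(list)
    for session_id, timestamp, page in rows:
        by_session[session_id].append((timestamp, page))
    graphs = {}
    for session_id, hits in by_session.items():
        # Rows may arrive out of order, so sort each session by timestamp first.
        ordered = [page for _, page in sorted(hits)]
        graphs[session_id] = Counter(zip(ordered, ordered[1:]))
    return graphs


# Hypothetical log rows, deliberately unordered.
rows = [
    ("s1", 3, "transact"),
    ("s1", 1, "home"),
    ("s2", 1, "home"),
    ("s1", 2, "balances"),
    ("s2", 2, "support"),
    ("s1", 4, "balances"),
]
graphs = logs_to_session_graphs(rows)
```

Each session becomes a mapping from `(source, target)` page pairs to transition counts, which can be computed on demand from the same tables that hold the raw traffic data.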

To serve in-session recommendations and dynamic search capabilities informed by prior session activity, we aim to create client embeddings using historical session graphs as training objects for graph neural networks. Now that our partners are making use of the navigation features, we can confidently point them to the client embeddings. Though not as easy to interpret directly, these embeddings compress a large amount of historical session activity into a form that is easier to use for modeling client activities and preferences than the session graph objects. We’re exploring different ways of storing these embeddings, as both node and graph embeddings, and the optimal frequency of embedding updates for the personalization use cases served.

Shaping the future of digital platforms

TIGER represents a significant step forward in our ability to understand and improve user experiences. By leveraging graph-based session modeling, we can effectively capture user intent, prioritize feature improvements, predict user behavior, and inform CX decisions with data-driven insights. As we continue to explore the potential of graph-based modeling, we are confident that TIGER will play an increasingly significant role in shaping our digital platforms’ future. With this tool, we evolve our means of understanding client experience from systems that are limited, reactive, and ambiguous to systems that are representative of all digital clients and describe those experiences in terms of behavior and digital interactions.

Building a digital session backbone model required close collaboration from diverse teams and access to state-of-the-art technology. At Vanguard, we had the invaluable support of leadership who empowered us with resources to contribute AI solutions to hard CX problems the business faces. We had subject matter experts help us laser in on critical use cases, and their feedback refined the product to one that can generate actionable client experience insights. We also had dedicated focus from our data engineering partners who built web and mobile data blocks from which our model can extract common sequential patterns. Because of the sheer volume of session data, access to cloud resources has enabled rapid development of the digital session model that has led to faster user insights and ultimately a better client experience.

Come work with us!
Vanguard’s technologists design, architect, and build modernized cloud-based applications to deliver world-class experiences to 50+ million investors worldwide. Hear more about our tech — and the crew behind it — at vanguardjobs.com.

©2024 The Vanguard Group, Inc. All rights reserved.
