At the crossroads of digital CX, cost, and scale

Global Technology
McDonald’s Technical Blog
6 min read · Feb 7, 2023

An unconventional use of an observability platform is helping Global Technology balance reliability with affordability.

by Scott Farnum, Senior Manager, Architecture Services and Chris Gundersen, Senior Manager, Architecture Services

With each day that passes, customer use of McDonald’s Global Mobile App (GMA) increases, which means there’s more data available to improve the customer experience. GMA allows customers to interact with us digitally across three fundamental use cases: placing orders (for pickup or delivery), receiving and redeeming offers, and participating in the MyMcDonald’s Rewards program. This digital aspect of our business is responsible for millions of customer transactions every day.

Falling under the umbrella of our Global Technology Architecture & Data function, our team is responsible for observability and affordability of McDonald’s digital tech stack. Every day, we use the New Relic observability platform to closely monitor the front-end customer experience along with back-end reliability, performance, and run cost.*

Most of our New Relic use is fairly conventional; however, two less conventional uses of the platform, monitoring run cost and monitoring customer experience, proved especially interesting to us. We had the privilege of presenting these use cases at New Relic’s annual FutureStack conference in May.

Technology within a restaurant business
Before I get into the FutureStack event, I’d like to provide some context on why we positioned our team to oversee both reliability and run cost. As a restaurant business, McDonald’s has multiple daily peaks. Additionally, national promotions, such as Egg McMuffin Day and Ice Cream Day, drive significant spikes. From a tech standpoint, our back-end stack, hosted on Amazon Web Services (AWS), is regularly scaling out and in to meet daily business demand, which presents some unique scalability challenges.

Keeping our back end scaled out for peak demand at all times would be cost prohibitive. On the other hand, customer experience and sales revenue are at risk if the back end fails to scale out properly at peak times. As the team responsible for monitoring availability, reliability, and performance, we are best positioned to balance those needs against the competing requirement of affordability.
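To make this trade-off concrete, here is a minimal sketch with purely hypothetical numbers: the hourly request rates, per-instance capacity, and hourly price below are illustrative assumptions, not McDonald’s figures. It compares provisioning for the daily peak around the clock with scaling capacity to match each hour’s demand.

```python
import math

# Hypothetical requests per second for each hour of one day, with breakfast,
# lunch, and dinner peaks. These are illustrative values, not real traffic.
HOURLY_RPS = [200, 150, 100, 100, 150, 400, 1200, 1800, 1500, 900, 1100, 2400,
              2600, 1700, 900, 800, 1200, 2000, 2200, 1500, 900, 600, 400, 300]
RPS_PER_INSTANCE = 250          # assumed capacity of one instance
PRICE_PER_INSTANCE_HOUR = 0.20  # assumed hourly price per instance, in USD


def instances_needed(rps: float) -> int:
    """Smallest instance count that can serve the given load (at least one)."""
    return max(1, math.ceil(rps / RPS_PER_INSTANCE))


def always_peak_cost(hourly_rps: list[float]) -> float:
    """Cost of keeping peak capacity running for every hour of the day."""
    peak = instances_needed(max(hourly_rps))
    return peak * len(hourly_rps) * PRICE_PER_INSTANCE_HOUR


def autoscaled_cost(hourly_rps: list[float]) -> float:
    """Cost when capacity scales out and in to match each hour's demand."""
    return sum(instances_needed(r) for r in hourly_rps) * PRICE_PER_INSTANCE_HOUR


if __name__ == "__main__":
    static = always_peak_cost(HOURLY_RPS)
    scaled = autoscaled_cost(HOURLY_RPS)
    print(f"always at peak: ${static:.2f}/day, autoscaled: ${scaled:.2f}/day")
```

The gap between the two totals is what scaling in saves; the risk on the other side is that if scale-out lags a real peak, customers feel it before the capacity arrives.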

A day in the life of the McDonald’s digital tech stack
Normal, everyday use of GMA and other front-end channels is enabled by APIs and their related back-end applications and data. Underneath our proprietary technology is AWS infrastructure, whose costs are driven almost entirely by GMA consumption. In a basic mobile use case, a customer will launch the app, log in, browse rewards and deals, select an offer to redeem, place an order, and earn loyalty points. Throughout this interaction, our back-end services allow the customer to securely log in, browse customer-specific offers, place an order for customizable menu items from the nearest restaurant, and be charged the same prices they would be charged if they placed the order in person at that exact restaurant.

Behind the scenes, numerous AWS services are consumed: API requests pass through Amazon API Gateway, data is read from Amazon RDS, messages pass through Amazon SQS queues, data is written to and read from Amazon DynamoDB, and a combination of containerized microservices, non-containerized apps, and AWS Lambda functions respond to API calls. Data flows to and from external channels via Amazon MSK, while changing workloads cause pods and worker nodes to scale in or out.

McDonald’s @ FutureStack: Measuring cost and CX at McDonald’s scale
McDonald’s unique scale presents the potential for significant cost and widespread UX issues as technology scales to our growing customer base. To address these risks, we uncovered unique uses of New Relic data to:

  1. Measure cost in context
  2. Monitor and replay user sessions

Measuring cost in context
As adoption of our digital platform grew, we quickly realized the need to effectively measure the cost of cloud and third-party consumption-based services. Our goals were to:

  1. Automate measurement of digital run costs
  2. Enable predictability of run costs as the tech stack evolves and digital business grows
  3. Provide product-level cost visibility to product owners
  4. Formally introduce “Affordability” as a non-functional requirement for all new system enhancements

Given our role in using New Relic to monitor our systems, I realized that nearly all the data needed to achieve our goals was already there. As users of New Relic’s Infrastructure, application performance monitoring (APM), and mobile monitoring capabilities, we could see AWS infrastructure consumption data in combination with APM and mobile metrics, such as throughput, error rates, and bytes received. This collection of data points provides granular, meaningful insights into how front-end customer touchpoints drive back-end consumption.

The only piece missing from the cost equation was AWS pricing, which we knew was available through the publicly accessible AWS Price List API. I described my goals and the opportunity to our New Relic account team, and they were keen on the idea, given its broad applicability to other New Relic customers who use cloud hosting. A few short months later, the Cloud Optimize feature was born.
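At its simplest, the cost equation multiplies consumption by unit price and, for product-level visibility, divides by the transactions that drove that consumption. The sketch below assumes usage and prices have already been gathered into plain Python structures; the service labels, pricing units, prices, and request count are hypothetical, and it does not call the AWS Price List API or New Relic.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """One line of consumption for a workload during the measured timeframe."""
    service: str     # hypothetical service label, e.g. "EC2"
    unit: str        # hypothetical pricing unit, e.g. "instance-hour"
    quantity: float  # amount consumed in the timeframe


def run_cost(usage: list[UsageRecord], unit_prices: dict[tuple[str, str], float]) -> float:
    """Sum quantity x unit price over every usage record.

    unit_prices is keyed by (service, unit) and stands in for values looked up
    from a price list. A missing key raises KeyError, so unpriced consumption
    is never silently counted as free.
    """
    return sum(r.quantity * unit_prices[(r.service, r.unit)] for r in usage)


def cost_per_thousand_requests(total_cost: float, requests: int) -> float:
    """Normalize run cost by APM throughput for the same timeframe."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    return total_cost / requests * 1000


# Hypothetical one-hour window for a single API workload.
usage = [
    UsageRecord("EC2", "instance-hour", 12),
    UsageRecord("DynamoDB", "million-read-units", 3),
    UsageRecord("SQS", "million-requests", 2),
]
prices = {
    ("EC2", "instance-hour"): 0.20,
    ("DynamoDB", "million-read-units"): 0.25,
    ("SQS", "million-requests"): 0.40,
}
total = run_cost(usage, prices)
print(f"run cost: ${total:.2f}, per 1k requests: ${cost_per_thousand_requests(total, 400_000):.4f}")
```

Multiplying the per-request figure by an expected request volume gives a rough cost projection, under the simplifying assumption that cost scales linearly with traffic.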

Cloud Optimize measures the cloud run cost of a specific workload during a specified timeframe. Numerous cloud services are covered, along with the ability to calculate container-level cost based on a container’s CPU and memory consumption. This feature provides value to multiple stakeholders:

  • Developers can shift cost measurement left in the dev cycle and measure cost impact of code changes.
  • McDonald’s Markets can predict and measure the cost of running a promotion in their restaurants.
  • Technology leaders can make decisions that balance cost with performance, reliability, and availability of services.
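Container-level cost can be derived by splitting a node’s cost across the containers running on it. The sketch below shows one common allocation approach, not necessarily the method Cloud Optimize uses: a fixed, assumed share of the node’s cost is attributed to CPU and the rest to memory, and each container is charged for the cores and GiB it consumed. The node size, cost, CPU weight, and container names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContainerUsage:
    name: str
    cpu_cores: float   # average CPU cores consumed over the window
    memory_gib: float  # average memory consumed over the window, in GiB


def allocate_node_cost(node_cost: float, node_cpu_cores: float, node_memory_gib: float,
                       containers: list[ContainerUsage], cpu_weight: float = 0.5) -> dict[str, float]:
    """Split a node's cost across containers by their CPU and memory consumption.

    cpu_weight is an assumed fraction of the node's cost attributed to CPU;
    the remainder is attributed to memory. Capacity that no container consumes
    stays unallocated (idle cost) rather than being spread across containers.
    """
    if not 0.0 <= cpu_weight <= 1.0:
        raise ValueError("cpu_weight must be between 0 and 1")
    cpu_cost_per_core = node_cost * cpu_weight / node_cpu_cores
    mem_cost_per_gib = node_cost * (1 - cpu_weight) / node_memory_gib
    return {
        c.name: c.cpu_cores * cpu_cost_per_core + c.memory_gib * mem_cost_per_gib
        for c in containers
    }


# Hypothetical worker node: 8 vCPU, 32 GiB, costing $0.40 for the window.
costs = allocate_node_cost(
    0.40, 8, 32,
    [ContainerUsage("menu-api", 2.0, 4.0), ContainerUsage("offers-api", 1.0, 12.0)],
)
for name, cost in costs.items():
    print(f"{name}: ${cost:.4f}")
```

Leaving idle capacity unallocated keeps it visible as its own line item, which is useful when deciding whether a node pool is oversized.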

Monitor and replay user sessions
Ensuring a positive customer experience is at the center of everything we do. The combination of back-end APM, infrastructure, and log monitoring and analytics provides insights and alerting that allow for both proactive and reactive responses to back-end issues that may impact customer experience.

Mobile monitoring provides meaningful insights into our customers’ actual mobile experience with metrics such as launch times, crash rates, and crash locations, all of which can be queried and visualized with contextual data such as device type, OS, and OS version. While all of this is helpful, the customer-centric, high-transaction nature of our business demanded that we find ways to use New Relic data more effectively to reproduce issues reported by customers. We again realized the required telemetry data was already there and saw an opportunity to enhance it by associating New Relic data with front-end user interactions, all while visualizing the customer experience with real user screen views of GMA.
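As a simple illustration of slicing mobile metrics by context, the sketch below computes crash rate (the fraction of sessions that crashed) grouped by OS and OS version. The session records are hypothetical, and in practice this kind of aggregation would typically run as a query in the observability platform rather than in application code.

```python
from collections import defaultdict
from typing import NamedTuple


class Session(NamedTuple):
    os: str
    os_version: str
    crashed: bool


def crash_rate_by_os(sessions: list[Session]) -> dict[tuple[str, str], float]:
    """Return the fraction of sessions that crashed, keyed by (OS, OS version)."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    crashes: dict[tuple[str, str], int] = defaultdict(int)
    for s in sessions:
        key = (s.os, s.os_version)
        totals[key] += 1
        if s.crashed:
            crashes[key] += 1
    return {key: crashes[key] / totals[key] for key in totals}


# Hypothetical session records.
sessions = [
    Session("iOS", "16.2", False),
    Session("iOS", "16.2", True),
    Session("iOS", "16.3", False),
    Session("Android", "13", False),
    Session("Android", "13", False),
    Session("Android", "13", True),
    Session("Android", "13", False),
]
for (os_name, version), rate in sorted(crash_rate_by_os(sessions).items()):
    print(f"{os_name} {version}: {rate:.0%}")
```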

This was achieved through a custom development effort we pursued with an outside partner. Using mobile breadcrumbs and custom events, we built a framework that allows for precise telemetry delivered at build time, with over-the-air updates that can be pulled by our millions of customers. This eliminates the need for continuous development enhancements while providing the flexibility required to observe our customers’ actions and experiences in a secure, privacy-oriented way. We are currently evaluating an automatic telemetry framework for deployment into additional channels.

Key benefits:

  1. By stitching the breadcrumbs together with a variety of metrics and business-specific markers, we can reconstruct the customer’s path through the app and effectively replay it, identifying the precise issue, bug, or customer frustration.
  2. This has helped solve some of our most pressing business challenges, some of which are very obscure and hard to reproduce using typical testing approaches.
  3. Despite adding observability, the custom solution cost very little, because it reuses metrics we were already collecting but had not yet stitched together.
  4. All of this is possible without the customer reporting an issue, although we can also investigate if the customer does reach out.
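The stitching described in the first benefit can be sketched in a few lines. This is a minimal illustration, not the framework we built with our partner: it assumes each breadcrumb carries a session ID, a timestamp, a screen name, and an optional error marker (all hypothetical field names), groups breadcrumbs by session, orders them by time, and returns the screens a customer visited up to the first error.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Breadcrumb:
    session_id: str
    timestamp_ms: int
    screen: str
    error: Optional[str] = None  # hypothetical business or error marker


def stitch_sessions(events: list[Breadcrumb]) -> dict[str, list[Breadcrumb]]:
    """Group breadcrumbs by session and order each session's steps by time."""
    sessions: dict[str, list[Breadcrumb]] = defaultdict(list)
    for e in events:
        sessions[e.session_id].append(e)
    return {sid: sorted(steps, key=lambda e: e.timestamp_ms) for sid, steps in sessions.items()}


def path_to_first_error(steps: list[Breadcrumb]) -> Optional[list[str]]:
    """Screens visited up to and including the first error, or None if no error."""
    for i, step in enumerate(steps):
        if step.error is not None:
            return [s.screen for s in steps[: i + 1]]
    return None


# Hypothetical breadcrumbs from two sessions, arriving out of order.
events = [
    Breadcrumb("s1", 3000, "checkout", error="payment_declined"),
    Breadcrumb("s2", 1000, "home"),
    Breadcrumb("s1", 1000, "home"),
    Breadcrumb("s1", 2000, "deals"),
    Breadcrumb("s2", 1500, "menu"),
]
for sid, steps in sorted(stitch_sessions(events).items()):
    print(sid, path_to_first_error(steps))
```

Sorting on the client timestamp assumes device clocks are reasonably consistent within a session; a sequence number assigned by the app would be a sturdier ordering key.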

* Please note that all monitoring and telemetry data is collected by New Relic’s platform in accordance with the McDonald’s Privacy Policy.

An overview of the McDonald’s Privacy Policy can be found here — https://www.mcdonalds.com/us/en-us/privacy-overview.html, and the full policy can be found here — https://www.mcdonalds.com/us/en-us/privacy.html.
