Experimentation in the Modern Digital Firm
A shallow dive into the deep waters of digital experimentation and testing technology
By: Ryan Mason, Lead Growth Architect at BCGDV
Picture this: two co-workers are waiting at the airport and need to get to a client office for a meeting. Each opens their phone, submits a ride request from the same ride-sharing company, and plans to take whichever driver has the shortest wait time. One gets a pop-up for a limited-time promotion of up to $5 off the ride and the other does not — even though they are standing in the same location, at the same time, requesting a ride to the same destination.
This likely isn’t an error — it’s deliberate design.
Modern digital companies conduct dynamic “experiments” like this every day to better understand customer behavior during key moments in the user experience. This happens to customers when shopping online, using their favorite social media app, booking a flight, or even just reading an article online.
In this blog feature, we’ll explore why digital experimentation is critical to driving growth. We’ll also share a proven framework for implementing the tools, processes, and culture needed to experiment successfully in your organization.
The digital experiments boom
Today, it is safe to assume that almost every digital experience is monitored, tested, and optimized based on aggregated user behavior in response to changing stimuli. Recently, the use of these tactics has sharply increased due to improved technology and lower cost. This has resulted in a swarm of digital firms — startup and mature alike — running thousands of micro-experiments each year in the search for continuous product improvement. But why?
One reason behind the rise of digital experimentation is that digital firms can run experiments far more easily than other firms, such as physical retailers. In a digital environment, the marginal cost of running an experiment approaches zero. Booking.com, for example, is able to conduct over 25,000 digital experiments annually, and at any given moment is running 1,000 concurrent experiments.
Another reason is that even minor changes can have a significant impact. Famously, Microsoft’s Bing improved revenue by $10 million annually — just by changing the color of its hyperlinks from light blue to a slightly darker shade of blue. Not all experiments are as bite-sized as color changes — experiments often focus on demand elasticity, willingness to pay, advanced incentive design, pricing models, checkout options, and other complex business challenges.
Digital businesses like ride-sharing apps or travel booking websites rely heavily on continuous experimentation to understand individual willingness to pay based on hundreds of signals and data points like time of day, previous history, weather, and other factors. Online merchants do the same when seeking to improve metrics, showing different variations of the website experience to see which version enhances conversion rate.
Often, these experiments are conducted in “A/B” form, meaning one minor change is isolated so that the difference in user behavior between a test group and a control group can be attributed to that change. For example, a company might change the color scheme or remove an extra checkout step and compare the results against the original version. Social media giants constantly tinker with UX or content algorithms to find what makes users spend more time on the feed. The list goes on: there are countless examples of experimentation to be found in the digital economy today.
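To make the test-versus-control comparison concrete, here is a minimal sketch of a two-proportion z-test, one common way to compare conversion rates between two groups. The function name and the visitor and conversion counts are hypothetical, chosen only for illustration, and commercial testing platforms may use different statistical methods.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare variant B against control A.

    Returns (absolute lift, z statistic, two-sided p-value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of "no difference"
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b - rate_a, z, p_value


# Hypothetical data: 10,000 users per group, 5.0% vs. 5.8% conversion
lift, z, p_value = two_proportion_z_test(conv_a=500, n_a=10_000,
                                         conv_b=580, n_b=10_000)
print(f"lift={lift:.4f}, z={z:.2f}, p={p_value:.4f}")
```

A small p-value suggests the difference between the groups is unlikely to be random noise. Whether the lift is large enough to act on is a separate business judgment.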
Once an experiment confirms a winning variant, the changes can be released to all users or bigger batches of user groups to reap the economic benefits at scale. Increasingly, rigorous experimentation is becoming the norm. Modern digital firms persistently seek optimization, using experimentation to answer questions like:
- When is the best time of day to send this email to this user group to maximize revenue?
- How much can we stretch our pricing to maximize margin without losing demand?
- How many more new users will complete account onboarding if we remove this step?
- How much should we offer our refer-a-friend incentive?
- Should we build this new feature?
Today, these are literally million-dollar questions, and modern digital firms are dedicating an increasing amount of resources towards enabling their teams to answer them.
The history of digital experimentation
Sometimes referred to as growth hacking, agile experimentation, online controlled experiments (OCEs), or simply A/B testing, this type of experimentation has been around for some time. Rooted in agile product development, it is now an integral part of the product development lifecycle and is considered table stakes for serious players.
Drawing from the randomized controlled trial (RCT) widely used in the medical sciences, the online experiment uses the same foundational statistical principles of sample size, control groups, and confidence levels.
An early pioneer in this area, Google is reported to have run its first digital A/B test on search results in 2000. Nine years ago, it launched Google Optimize, a SaaS product dedicated solely to helping other companies run experiments on their own websites and apps.
Today there are dozens of similar tools in the market that offer some form of advanced testing services. Some larger companies build their own solutions in-house. Leading firms use data from experiments — A/B, multivariate, and more — to make capital allocation decisions and embed a culture of experimentation into their teams.
Those who have learned to use these tactics have been able to unlock major growth trajectories. But conducting a successful growth hacking experiment requires sophisticated technology tools, scientific processes, and a commitment to a culture of experimentation.
Getting started with agile experimentation
Most experiments are limited to a small subset of the user base, are completely reversible, and run just long enough to ensure the results from the sample are representative and thus the learnings can be assumed to apply to other customers. By limiting exposure for each experiment to a small fraction of users, firms can run hundreds of experiments simultaneously — without contamination — and can quickly reverse decisions that negatively impact user behavior.
In digital products with many end users (like apps or websites), experiments can usually be conducted within days or even hours. Decisions are swiftly made to either apply the change for all customers or kill the initiative.
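One common way to limit each experiment to a small, stable fraction of users is deterministic hash-based bucketing. The sketch below is an illustration under stated assumptions, not how any particular vendor does it: the function names, the experiment name, and the 5 percent exposure are all hypothetical.

```python
import hashlib
from typing import Optional


def bucket(user_id: str, experiment: str) -> float:
    """Map a (user, experiment) pair to a stable number in [0, 1)."""
    # Salting with the experiment name gives each experiment its own,
    # effectively independent split of the user base.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest[:8], 16) / 2**32


def assign(user_id: str, experiment: str, exposure: float = 0.05) -> Optional[str]:
    """Return 'treatment', 'control', or None if the user is not enrolled."""
    position = bucket(user_id, experiment)
    if position >= exposure:
        return None  # most users never see the experiment
    # Split the enrolled slice evenly between the two groups
    return "treatment" if position < exposure / 2 else "control"


# Hypothetical user IDs and experiment name
users = [f"user-{i}" for i in range(20_000)]
enrolled = [u for u in users if assign(u, "checkout-step-removal") is not None]
print(f"enrolled share: {len(enrolled) / len(users):.3f}")
```

Because the assignment depends only on the user ID and the experiment name, a user sees the same variant on every visit, and reversing the experiment is as simple as setting the exposure to zero. Guaranteeing that concurrent experiments never overlap takes additional machinery, which this sketch does not attempt.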
Key to scalable experimentation is assembling the right technology that combines product, marketing, engineering, and statistics into a manageable system. Without it, engineering and analytics resources will be consumed with running these processes manually, taking time away from the development of the core product offering.
A series of connected technologies can be assembled to enable modern experimentation.
Some large firms like Microsoft and Amazon have embraced the benefits of experimentation, building massive in-house systems and technology solely dedicated to experimentation. But for most firms — start-up or mature — building the necessary technology in-house is too expensive and demands too many engineering or data science resources. As such, many opt to “buy” the capability from a wellspring of vendors that have emerged in the last decade.
These providers offer turnkey solutions that help digital businesses enable rapid experimentation and A/B testing capabilities at scale with minimal effort. Each tool can be connected to a business’s digital app, website, or other user-facing system through an API or software development kit (SDK), which requires a moderate amount of implementation work. The connected series of tools offers individual teams a “self-service” experimentation capability, helping them correctly design and expose experimental treatments with the appropriate statistical constraints on sample size and confidence levels.
Below is a quick overview of the tools in a connected tech stack that can deliver an end-to-end growth experimentation engine for digital firms. Many of these solutions fall into the broader category of “digital optimization systems,” but each has a unique role to play. Depending on the nature of your business, you may also need tools not listed here, such as promotions management or pricing science software.
- Attribution: Track the direct, deterministic results of marketing, experiments, promotional variants, and other incentivized user activities in great detail. Tools like this are Step 1 in the process. After all, if you cannot measure unique user journeys appropriately, conclusions are difficult to reach. Examples include: AppsFlyer, Kochava, Branch.
- User analytics: Record the actions of the entire customer base of your business, generate descriptive statistics, and understand the average performance of any particular action (conversion rate, order value, time spent on page, etc.). Without baseline levels of analytics, measuring incremental lift will not be possible. Examples include: Google Analytics, Mixpanel, Amplitude.
- Customer data platforms: A common tool for most businesses, customer data platforms (CDPs) and user segmentation tools help stitch together the string of data sources and data destinations to generate one comprehensive view of each customer. Tools like this can enable businesses to slice and dice user groups based on behavior (observed or predicted) and plan unique experiences based on that information. This usually happens in near-real time. Examples include: Segment, Amperity, Blueshift, Treasure Data.
- A/B testing platforms: Push the actual experiment into production, conducting tests without heavily relying on engineering capacity to release changes. A/B testing platforms provide “feature flagging” capabilities and allow firms to correctly design and expose experimental treatments to users with the appropriate statistical constraints on sample size and confidence levels. Once your test reaches significance, it can be automatically stopped or scaled up. Examples include: Optimizely, LaunchDarkly, Amplitude Experiment, Google Optimize.
- User engagement and marketing automation: Deliver the right message to user groups across multiple communication channels (SMS, email, push notifications, etc.). This can also be done in real time by using automated signals collected from the other tech stack components. It is often here, or in-app, that your customer will engage with the experimentation content that you wish to measure. Examples include: HubSpot, Braze, Iterable, Klaviyo.
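The “statistical constraints on sample size and confidence levels” that testing platforms apply can be illustrated with a small calculation. The sketch below uses the standard normal-approximation formula for comparing two conversion rates. The function name, the 5 percent baseline, and the one-point minimum detectable effect are hypothetical inputs, and real platforms may use different methods.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline, mde, alpha=0.05, power=0.80):
    """Approximate users needed in EACH group to detect an absolute
    change of `mde` in a conversion rate (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # confidence level
    z_beta = NormalDist().inv_cdf(power)           # statistical power
    p1, p2 = baseline, baseline + mde
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde ** 2)


# Hypothetical inputs: 5% baseline conversion, detect a change to 6%
n = sample_size_per_group(baseline=0.05, mde=0.01)
print(f"users needed per group: {n}")
```

The smaller the effect a team wants to detect, the more users the test needs, which is why subtle changes are easier to test on high-traffic products.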
Are the tools worth the cost?
The technology that enables this type of experimentation has become cheaper and more accessible in recent years, but still tends to be a modest and mostly fixed expense.
Anyone running a startup or growth-stage venture will know that more fixed costs (and operating leverage) are undesirable. Additional costs that are not critical to the user experience can make or break unit economics — the marginal benefit per user must outweigh the marginal cost per user.
Licensing these types of systems has become commonplace, and when used correctly they can deliver incredible value. A simple rule is that the tools must be expected to increase contribution margin per user by more than their total cost per user. Depending on the size of the customer base, this can often be a relatively low hurdle: an increase in revenue per user (RPU) of a few dozen basis points may be enough to recover the cost of the technology.
The software offered in this sector today seems to deliver on the value-creation promise well beyond total cost of ownership. Firms must conduct their own business case assessment for each type of technology using a per-user cost-benefit analysis to understand whether the investment is worthwhile. If, say, a tool can increase website conversions by 0.5 percent or average customer lifetime value by 1.25 percent, most businesses would find the investment works in their favor.
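The per-user break-even rule can be worked through as simple arithmetic. In the sketch below, every figure (tool cost, user count, revenue per user, contribution margin rate) is a made-up assumption for illustration, and the calculation assumes the same margin rate applies to incremental revenue.

```python
def breakeven_rpu_lift(annual_tool_cost, active_users,
                       revenue_per_user, contribution_margin_rate):
    """Fractional increase in revenue per user (RPU) needed for the
    added contribution margin to cover the tool's cost per user."""
    cost_per_user = annual_tool_cost / active_users
    # Incremental revenue per user whose margin equals the cost per user
    required_revenue = cost_per_user / contribution_margin_rate
    return required_revenue / revenue_per_user


# Hypothetical business: $120,000 per year in tooling, 500,000 users,
# $200 annual revenue per user, 40% contribution margin
lift = breakeven_rpu_lift(120_000, 500_000, 200.0, 0.40)
print(f"break-even RPU lift: {lift * 10_000:.0f} basis points")
```

With these assumed inputs the tool costs $0.24 per user, so it pays for itself once RPU rises by about 30 basis points. A smaller user base or thinner margins would raise that hurdle.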
Not only must the right technology be in place, but your people, operations, culture, and leadership must also be in sync. Running over a thousand experiments per year, or more than 80 each month, is no easy feat and requires a disciplined process. In many ways, the biggest challenge of rapid experimentation is making the cultural commitment to it.
Developing and operationalizing a culture of experimentation
Once the economics are verified, firms need to operationalize a rigorous process and methodology. A key component of this is autonomy — enabling teams to run independently at their own speeds.
In modern agile product development, a typical sprint cycle can last between two and four weeks, which is faster and more iterative than a traditional monthly, quarterly, or annual business review cadence. Teams composed of cross-functional participants (such as product managers, engineers, designers, analysts, and marketers) manage this process largely by committee. Experimentation should be tightly woven into the sprint process, operating firmly within the “inner circle” of the two-speed model.
In a two-speed operating model, the democratized approach to experimentation allows for faster learnings and a higher volume of experiments.
In addition to autonomy, these groups need dedicated testing headspace and “backlog real estate” to ensure that experiments are treated with proper importance and see the light of day.
Since this is often hard to secure within product-led teams, it is important to have leadership that fosters a structure supportive of the “democratization” of experiments, granting everyone the ability to submit ideas and generate hypotheses. While easier said than done, this process can start with a simple hypothesis intake form that is visible to all team members in the form of a publicly available log. From here, it is up to the team to commit to the experimentation mindset.
Each experiment should have a goal, hypothesis, acceptance criteria, statistical design, measurement, and behavior change.
Once an experiment reaches its planned sample size and shows a statistically significant effect that meets the acceptance criteria, the change can automatically scale or, based on pre-established guidelines, be sent to a committee for review.
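One way to make this discipline tangible is to record each experiment as a structured entry with a pre-agreed decision rule. The sketch below is only one possible shape: the field names, thresholds, and the example experiment are hypothetical and would need to reflect your own guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSpec:
    goal: str
    hypothesis: str
    primary_metric: str            # measurement
    acceptance_lift: float         # acceptance criteria (relative lift)
    alpha: float                   # statistical design: significance level
    planned_sample_per_group: int  # statistical design: sample size
    behavior_change: str


def decide(spec, observed_lift, p_value, sample_per_group, auto_scale=False):
    """Apply the pre-established guidelines to an experiment's results."""
    if sample_per_group < spec.planned_sample_per_group:
        return "keep running"
    if p_value < spec.alpha and observed_lift >= spec.acceptance_lift:
        return "scale" if auto_scale else "committee review"
    return "stop"


# Hypothetical experiment definition
spec = ExperimentSpec(
    goal="Increase completed onboarding",
    hypothesis="Removing the optional profile step raises completion",
    primary_metric="onboarding_completion_rate",
    acceptance_lift=0.02,
    alpha=0.05,
    planned_sample_per_group=8_000,
    behavior_change="More new users finish account setup",
)
print(decide(spec, observed_lift=0.03, p_value=0.01, sample_per_group=8_500))
```

Writing the acceptance criteria down before the test starts makes it harder to rationalize an ambiguous result after the fact.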
As one example, a BCGDV fintech venture recently conducted an experiment that resulted in a 30 percent lift in a core metric compared to the control group. The goal of this experiment was to understand which communication method (e.g. SMS or push notification) was most effective at driving new users to make incremental transactions (in this case, cashing in with a payments app). After the results reached significance, the experiment was scaled up into the evergreen marketing automation program to realize the benefit at scale.
Another BCGDV client, a home energy provider, tested the font size on its website. The result? Increasing the font size improved the website conversion rate by 5 percent. Next, they included a countdown timer on the website to add a sense of urgency to the conversion flow. The timer improved conversion rate by another 2 percent. These are simple experimentation examples that can build significant impact over time. With a marginal cost of close to zero, the benefits of these experiments are virtually all upside.
Setting the guiding principles
Democratizing experimentation isn’t without its challenges — teams could inadvertently release a breaking change to production, or each selfishly optimize toward a certain set of goals. That’s why — as with any other initiative — experimentation needs guiding principles.
Most importantly, all teams need to march toward one North Star metric.
A North Star metric is the single measurement that is most predictive of long-term success. Leading digital and technology firms have popularized the North Star metric to guide prioritization decisions and measure the performance of their products.
When defining and aligning on your firm’s North Star metric, here are a few things to keep in mind:
- It is a leading indicator of success: It predicts future results, rather than reflecting past results.
- It is actionable: Teams can “pull levers” or run experiments to influence it.
- It is (usually) not revenue: Some companies use a revenue measure, such as annual recurring revenue (ARR), as their North Star. Other companies purposely avoid it because it is hard to operationalize and can be disheartening to team members.
All experiments should be tied to the business goal and North Star metric in some form. The ability to measure the impact of any given experiment on the North Star metric allows firms to separate the incremental effect of the experiment from any change in metric performance that would have happened anyway.
And keep in mind that not all experiments will work. In fact, most may fail. Fostering an “institutional memory” of success and failure is paramount. In firms where thousands of experiments are conducted every year, it’s critical to keep a running ledger of experiment outcomes. Firms should aim to be better at writing post-mortems about experiments that did not work and why. This way, knowledge is decentralized and accessible by all teams, facilitating a culture of continuous learning and improvement.
A sound approach to democratizing experimentation
While big tech firms like Facebook and Google use advanced machine learning and large-scale algorithmic experimentation to perfect their feeds by the minute, a guideline like this can help any digital business begin on the road to rapid experimentation.
The concept of experimentation may be simple, but there can be complex practical issues in designing an experiment to test a particular feature and in analyzing its results. The right process and team design are also essential to overcome the tactical and political challenges that may exist within a firm undertaking an initiative like this for the first time.
Leaders spend a lot of time thinking about how to optimally allocate resources and set teams up for success in organizational design. And while many frameworks exist within the broader umbrella of Agile, democratizing experimentation is what truly enables a culture of experimentation to take root. By mastering the science of experimentation, firms can build and maintain a true competitive advantage — now and into the future.