Bluecore 2020: A Development Year in Review

Published in Bluecore Engineering, Jan 29, 2021

A month-by-month recount of Bluecore’s engineering accomplishments

The moment we’ve all been waiting for, a new year, a new start, a chance to hit the reset button… 2021!

Although it feels like we get to hit the reset button, many of the things that we’ve come to accept as normal throughout the last year are still here. From the Zoom phrases “can you see my screen?”, “can you hear me?”, and “you’re muted” to virtual happy hours and days jammed with meetings, we are still learning to operate in this virtual world.

As we reflect on the rubble that 2020 has left behind, here are some of our highlights that kept us feeling positive.

January: Link Tracking

Indu Subbaraj, Senior Software Engineer — Core Infrastructure

One of Bluecore’s core workflows is enabling customers to easily send engaging, personalized emails. Link tracking is a feature that gives customers greater insight into how their own customers engage with items in email templates. It tracks user clicks within the template and displays aggregate click counts and percentages. Customers can use this feature to measure engagement within their emails and test personalization features with A/B tests.

The technical work of implementing link tracking fell into three main buckets: generating a unique “link context” per link to embed into the HTML of emails, parsing that link context from click events we receive from our sending infrastructure and storing it in a database, and creating the front-end to display the desired click data. We generate a Bluecore-specific unique identifier per link that allows us to tie a click to the original template/widget/link. We store this data in BigQuery tables that are sharded along two dimensions. When data about a specific campaign is needed, we use an endpoint that triggers a BigQuery query to retrieve desired click aggregates and percentages. Because this query can take several seconds to scan through historical data, we implemented a cache to speed up requests. The email template HTML is sent to the front-end, and the click data is overlaid via JavaScript. After almost a year of this feature being generally available, it is now a core design iteration workflow at some of the largest brands and retailers.
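The first two buckets can be illustrated with a minimal Python sketch. It is not Bluecore's actual code: the `lc` query parameter, the JSON-plus-base64 token format, and all function names are assumptions made for illustration. It packs the template/widget/link identifiers into a token, appends the token to a link, recovers it from a clicked URL, and computes per-link click percentages in memory (standing in for the BigQuery aggregation).

```python
import base64
import json
from collections import Counter
from typing import Dict, List
from urllib.parse import parse_qs, urlencode, urlparse


def encode_link_context(template_id: str, widget_id: str, link_id: str) -> str:
    """Pack the identifiers into one URL-safe token (format is hypothetical)."""
    payload = json.dumps({"t": template_id, "w": widget_id, "l": link_id}, sort_keys=True)
    return base64.urlsafe_b64encode(payload.encode("utf-8")).decode("ascii")


def add_link_context(url: str, context: str) -> str:
    """Append the token to a link as a query parameter named 'lc' (hypothetical name)."""
    separator = "&" if urlparse(url).query else "?"
    return f"{url}{separator}{urlencode({'lc': context})}"


def parse_link_context(clicked_url: str) -> Dict[str, str]:
    """Recover the identifiers from the URL carried by a click event."""
    token = parse_qs(urlparse(clicked_url).query)["lc"][0]
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def click_percentages(clicked_urls: List[str]) -> Dict[str, float]:
    """Count clicks per link and express each count as a share of all clicks."""
    counts = Counter(parse_link_context(url)["l"] for url in clicked_urls)
    total = sum(counts.values())
    return {link: 100.0 * n / total for link, n in counts.items()}
```

Because the token travels inside the URL, the click handler needs no lookup to tie a click back to its template, widget, and link. A production version would likely sign or otherwise protect the token, which this sketch omits.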

February: Bluecore Hackathon

Alexa Griffith, Software Engineer — Analytics Infrastructure

Hackathons are an important part of Bluecore’s culture and take place twice a year. Throughout this two-day event, Bluecorians work together in teams to develop and explore new ideas and tooling. Everyone looks forward to participating, dining together, working with others, and learning new things. The winning projects from this hackathon were a migration of our feature gating service into Kubernetes, an automation tool to set up new services, and a transparency audit log UI tool.

Switching to a remote-first work environment didn’t throw a wrench in our plans for the second hackathon of the year! During Remoticon 2020, we adapted to our remote environment, had fun, and got creative. The focus of the hackathon was around creating a better remote experience and creating more observability into our systems. A scalable URL shortening service, account health dashboard, and an encouraging, joke-telling BaharBot (Bahar is one of our favorite directors) for Slack were the winning projects. Check out our detailed blog post on Remoticon 2020 here.

Typically, we have a fun post-hackathon barbeque after the awards ceremony. In the second hackathon of the year, we couldn’t, but we look forward to being able to see each other — in person — soon! This was a great opportunity to continue a Bluecore tradition, even while we all worked remotely.

The coveted people’s choice banana trophy was given to a team of engineers

March: Performance Improvements

Jason Deng, Senior Software Engineer — Campaigns

An intuitive and fast platform with an easy-to-use UI is key to unleashing the creativity of our customers. Previously, we observed performance bottlenecks, with API latency failing to meet our Service Level Objectives (SLOs). Throughout March, we worked to significantly speed up browser rendering by:

Reducing the time complexity of computationally heavy functions from O(n²) to O(n), where n is the number of items shown on screen. These functions were being executed millions of times across thousands of items.
Removing the “reduce” usages in our Redux reducers that copied the entire object in memory for every item, and mutating the existing object instead. Only the newly created object is mutated, so this won’t cause problems in the React components.
Adding memoization for some selectors and utility functions to reduce React re-renders and recomputation.

This reduced worst-case latency from minutes to seconds. Then, we rewrote our React components to reduce the number of document object model (DOM) nodes appearing on screen by:

Optimizing pages with CSS grid instead of large numbers of <div /> elements for ordering and spacing.
Adding virtualization to our tables, which only renders the rows when they are in the viewport.
Replacing our tooltip component with a plain “title” attribute where possible.

All of these improvements on the rendering side enabled us to significantly reduce browser page rendering latency to mere milliseconds.
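Three of the techniques listed above are language-independent, so here is a rough Python sketch of them rather than the team's React/Redux code. The data shapes and function names are invented for illustration. It contrasts a quadratic join with a linear one built on a lookup table, memoizes a small formatting helper, and computes which table rows fall inside the viewport, which is the core calculation behind virtualization.

```python
from functools import lru_cache


def attach_prices_quadratic(items, prices):
    """O(n^2): scans the whole price list once for every item."""
    return [
        {**item, "price": next(p["price"] for p in prices if p["id"] == item["id"])}
        for item in items
    ]


def attach_prices_linear(items, prices):
    """O(n): build a lookup table once, then do constant-time lookups."""
    price_by_id = {p["id"]: p["price"] for p in prices}
    return [{**item, "price": price_by_id[item["id"]]} for item in items]


@lru_cache(maxsize=1024)
def format_label(name: str, price_cents: int) -> str:
    """Memoized helper: repeated calls with the same arguments reuse the cached result."""
    return f"{name} (${price_cents / 100:.2f})"


def visible_rows(scroll_top: int, viewport_height: int, row_height: int, total_rows: int) -> range:
    """Indices of the rows that are at least partly inside the viewport."""
    first = max(0, scroll_top // row_height)
    bottom = scroll_top + viewport_height
    last = min(total_rows, (bottom + row_height - 1) // row_height)  # ceiling division
    return range(first, last)
```

With a 300-pixel viewport and 40-pixel rows, `visible_rows` selects at most nine rows no matter how many rows the table holds, which is why virtualization keeps the DOM node count roughly constant.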

April: One-Click Holdout Groups

Sunny Shapir, Software Engineer — Campaigns

The Bluecore platform enables customers to maximize lift, but how exactly is that measured? One way is to run a holdout group test: randomly split a marketing campaign audience into a test group and a control (or holdout) group, and intentionally exclude the latter from exposure to campaign messaging. Comparing the behaviors of the two groups provides valuable, statistically grounded insights into the effectiveness of a marketing campaign.

This feature was important to our customers, but it was hard to configure. Users had to construct complex audiences to mimic a holdout group, and withholding happened at the very last steps of our processing pipelines. The complexity, infrastructure cost, and usability of this implementation were suboptimal.

We rethought this feature and implemented control group withholding in the earliest steps of the processing pipeline. The benefits were twofold: customers could enable a holdout group test with one click, and the size of the control group was available immediately, before campaign execution. Moreover, the implementation was an order of magnitude more efficient with compute resources, and we enabled the new capability for all of our customers. This feature is now being used by many of the top 100 global retailers.
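One common way to split an audience early in a pipeline is to hash each customer into a bucket, which makes the assignment repeatable without storing it. The sketch below assumes that approach; the post does not say how Bluecore performs the random split, and the function names and the choice of SHA-256 are illustrative only.

```python
import hashlib


def in_holdout(customer_id: str, campaign_id: str, holdout_fraction: float) -> bool:
    """Map (campaign, customer) to a stable pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(f"{campaign_id}:{customer_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < holdout_fraction


def split_audience(customer_ids, campaign_id: str, holdout_fraction: float):
    """Return (test, holdout) lists; holdout customers never receive the campaign."""
    test, holdout = [], []
    for customer_id in customer_ids:
        if in_holdout(customer_id, campaign_id, holdout_fraction):
            holdout.append(customer_id)
        else:
            test.append(customer_id)
    return test, holdout
```

Including the campaign identifier in the hash means a customer held out of one campaign is not systematically held out of every other one, and the holdout size is known as soon as the audience is.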

May: Enhanced Reporting Powered by Looker

Dan Mantica, Data Insights Analyst — Data Science

First-party retail data drives Bluecore’s marketing engine and AI models. This same data is extremely valuable for our customers to understand the performance of their marketing emails and their businesses. To best serve our customers, we built an enhanced analytics product that allows our analysts to rapidly prototype, test, and push dashboards to our customers without engineering support, thus minimizing time to value.

This enhanced analytics product offers access to a set of embedded Looker dashboards within the Bluecore UI. We began with three dashboards at launch, which gave new views into email performance and deliverability. We have since added several more analytical dashboards that give customers more detailed views of their email sends and their profit margins from those sends.

This effort was a collaboration between product, engineering, and our insights team to build a platform for rapid delivery of new dashboards. Our engineering team set up the back-end, our front-end team set up the UI page, and our analytics team set up the infrastructure within Looker. The script on the back-end generates an iframe and an embed URL that allow us to display the embedded dashboard. We also set up commands that allow us to update the list of dashboards and permissions through datastore, without requiring a code change. On the Looker side, we had to set up two things: filters so that a user from a given customer can only access that customer’s data, and a workflow for converting internal dashboards into embedded dashboards. Using Looker has simplified the process of releasing new dashboards and democratized their development.
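The idea of keeping the dashboard list and permissions as data can be sketched as follows. Everything here is hypothetical: the record layout, the URL shape, and the `account` parameter are placeholders rather than Looker's or Bluecore's real interfaces, and a production embed URL would typically also be signed on the server, a step omitted here.

```python
from html import escape
from urllib.parse import urlencode

# Stand-in for records kept in a datastore, so the list can change without a deploy.
# "allowed" is None when every account may see the dashboard.
DASHBOARDS = [
    {"id": "email-performance", "title": "Email Performance", "allowed": None},
    {"id": "margin", "title": "Profit Margin", "allowed": {"acme"}},
]


def dashboards_for(account: str):
    """Dashboards the given account is permitted to view."""
    return [d for d in DASHBOARDS if d["allowed"] is None or account in d["allowed"]]


def embed_iframe(base_url: str, dashboard_id: str, account: str) -> str:
    """Build an iframe whose URL pins the dashboard to one account's data."""
    query = urlencode({"account": account})
    src = f"{base_url}/embed/dashboards/{dashboard_id}?{query}"
    return f'<iframe src="{escape(src, quote=True)}" title="{escape(dashboard_id)}"></iframe>'
```

The account value is chosen by the back-end from the logged-in user, never by the browser, which is what keeps one customer from requesting another customer's data.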

June: Email Subscription Settings

Amir Lavi, Software Engineer — Track Team
Tim Freeman, Manager of Forward Deployed Engineering — Customer Success

Email eligibility helps control the various levels of email subscription status for any given customer in Bluecore. These statuses are used to determine which customers meet email eligibility criteria for certain campaign types in Bluecore and to ensure compliance with retailer policy and consumer privacy laws. The engineering team started by creating a unified framework for email eligibility permissions that standardizes opt-in/opt-out semantics for all types of campaigns, enabling a scalable and maintainable infrastructure for email permissions.

We implemented a more cohesive pipeline that makes the eligibility process more efficient while investing in reliability. We built new monitoring and alerting for our email eligibility pipeline by creating data processing jobs that periodically provide us with an eligibility status update.

The end result was additional email eligibility options for our customers, available through a simple account setup with no additional customization.

To ensure the success of the rollout, the Forward Deployed Engineering (FDE) team was responsible for rolling out this feature to replace previous custom implementations. First, we bucketed our hundreds of customers into three groups: accounts that would be unaffected and for which we could just flip the switch, accounts that would be affected but for which we could make some behind-the-scenes implementation changes before flipping the switch, and accounts with which we would need to actively collaborate to modify their implementation. We created runbooks and communications for the two groups needing intervention from us and/or the customer. Finally, we executed the rollout as efficiently as possible. We continue to work with the Customer Success Managers (CSMs) to ensure that each account can access this feature reliably.

July: Recommendations Service Caching

Mike Hurwitz, Principal Software Engineer — Data Science Infrastructure

In a high-growth environment, all services either die or live long enough to become victims of their own success. Bluecore’s recommendation service was no exception. The complexity of product data, having scaled to hundreds of large retailers, combined with the request volume we were seeing, was just too much for our caching layer, especially when combined with some expensive deserialization. This is a case where adding hardware wouldn’t be cost-effective — we could get some improvement by spreading out the deserialization, but the load on the cache wasn’t going to get any better.

Sometimes the solution is to redesign the system. In this case, we got to take the easy way out by adding a local layer to the cache. While Bluecore tracks hundreds of millions of products for our customers, they are usually accessed in per-partner bursts. That creates a situation where even a fairly simple in-memory cache with a short, fixed expiration is really effective. Since we don’t control who calls us or when, we needed a cache replacement policy to ensure we weren’t caching so many products that we exceeded our memory limits.

There are many LRU (least-recently used) cache implementations online that we could have plugged in. Some were not thread-safe and would have required a top-level lock. Others had per-request locking. While there are many algorithms known to approximate LRU behavior, there weren’t any open-source implementations that we could just integrate. Instead, we were able to come up with a reasonably straightforward approximate LRU cache that allows for lots of concurrency. It turns most of our multi-millisecond remote requests into multi-microsecond local requests. We were able to increase bandwidth and reduce our instance count, a big win-win!
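The post does not describe the algorithm the team settled on, so the sketch below shows one well-known way to approximate LRU with good concurrency, not Bluecore's implementation: split the cache into independent shards, each with its own lock and its own exact LRU order. Eviction is then only approximately least-recently-used across the whole cache, but threads touching different shards never contend. The class name and parameters are assumptions, and entries also carry the short, fixed expiration mentioned above.

```python
import threading
import time
from collections import OrderedDict


class ShardedTTLCache:
    """Approximate LRU: exact LRU inside each shard, one lock per shard."""

    def __init__(self, shards=16, capacity_per_shard=1024, ttl_seconds=30.0,
                 clock=time.monotonic):
        self._shards = [(threading.Lock(), OrderedDict()) for _ in range(shards)]
        self._capacity = capacity_per_shard
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def _shard(self, key):
        return self._shards[hash(key) % len(self._shards)]

    def get(self, key, default=None):
        lock, entries = self._shard(key)
        with lock:
            item = entries.get(key)
            if item is None:
                return default
            value, expires_at = item
            if expires_at <= self._clock():
                del entries[key]  # expired: drop it and report a miss
                return default
            entries.move_to_end(key)  # mark as most recently used
            return value

    def put(self, key, value):
        lock, entries = self._shard(key)
        with lock:
            entries[key] = (value, self._clock() + self._ttl)
            entries.move_to_end(key)
            while len(entries) > self._capacity:
                entries.popitem(last=False)  # evict this shard's oldest entry
```

A caller would check `get` first and fall back to the remote cache on a miss, then `put` the result. Memory stays bounded at roughly `shards * capacity_per_shard` entries regardless of who calls or when.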

August: Highest Category Preference

Joe Guzzardo, Engineering Manager — Customer Team
Zahi Karam, Vice President of Data Science

Personalization at scale is Bluecore’s core differentiator. Category affinity, one of our most popular models, allows marketers to find, in real time, customers interested in a chosen set of products or categories. This model has been crucial in allowing marketers to be nimble and react to changes in inventory, discounts, or even current events. One of our customers was able to capitalize on and generate significant revenue from their product being worn in a massively televised event by sending emails to their customers with an affinity for this product.

Bluecore’s retail focus allows us to be in sync with our customers’ needs and react quickly to them. Our customers wanted to leverage this model to bucket their customers into different categories based on each individual’s highest preference. For example, a sporting goods store may have specific creative and messaging to push for each of their top categories (running, basketball, crossfit), and while a customer may have an affinity to multiple categories, they only get the creative for their highest ranked category.

The resulting product not only allowed marketers to split their list into different categories but also let them do so in real time. This gave them a better understanding of how their customer base’s affinities are distributed and helped them maximize ROI on effort by focusing on the categories with the largest reach.

Delivering on this project was an R&D-wide effort. Design and product iterated heavily to integrate this model seamlessly into our product with a UI that is clean, simple, and most importantly encourages exploration. Data Science adapted the model to handle a set of categories and choose the “best” one for each customer rather than handling a single category. The engineering team leveraged our core platform and flexible model-serving infrastructure to serve the model in real-time.
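The change from scoring a single category to choosing the best of several can be shown with a small Python sketch. The input shape (a per-customer dictionary of affinity scores) and the function names are assumptions for illustration; the real model and its serving infrastructure are not described in this post.

```python
from collections import Counter


def highest_preference(affinities, categories):
    """Assign each customer to the chosen category where their affinity is highest.

    affinities: {customer_id: {category: score}}
    categories: the set of categories the marketer selected
    """
    chosen = set(categories)
    assignments = {}
    for customer, scores in affinities.items():
        candidates = {c: s for c, s in scores.items() if c in chosen}
        if candidates:  # customers with no affinity for any chosen category are skipped
            assignments[customer] = max(candidates, key=candidates.get)
    return assignments


def segment_sizes(assignments):
    """How many customers land in each category, i.e. each segment's reach."""
    return Counter(assignments.values())
```

In the sporting goods example, a customer with affinities for both running and basketball ends up in exactly one segment, and `segment_sizes` shows the marketer which categories have the largest reach.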

September: Game Day Testing for Black Friday Cyber Monday

Cesar Rizo, Engineering Operations

The holiday season is eCommerce’s busiest and most important time of the year. Our retail customers leverage our platform to make a large portion of their eCommerce revenue during Black Friday, Cyber Monday, and the December shopping season. Starting the week of Thanksgiving, our engineering infrastructure experiences peak loads of up to 5x the daily median seen throughout the year. To prepare for this important time, we undergo a holiday readiness process that brings together the engineering, product, and customer success teams.

In June, each engineering squad did an audit of their features to identify which components needed to be strengthened and tested. Next, teams created projections for expected traffic along with a detailed timeline for the work required to meet the projections. Our goal was to finish all the work by September so that we could load test our infrastructure and make any needed changes before the holidays.

Load testing gives us and our customers confidence that our infrastructure is reliable and ready. Our engineers go to great lengths to ensure the simulated traffic is as realistic as possible while taking precautions not to disturb the production environment. We tested event processing, email sends, and our recommendation engine multiple times each with sustained higher loads. We successfully load tested over 5x our projections without any major issues and outlined a process for scaling to even higher loads if necessary.

The results during the holidays were a resounding success and Bluecore had a record-breaking season. During the period from Thanksgiving to Cyber Monday, Bluecore sent 72% more emails than last year and processed many billions of shopping behaviors. We experienced no production incidents during this time. When one of our infrastructure vendors did run into a problem, our team quickly jumped into action and took the steps necessary to mitigate the impact. I’m proud of the months of hard work put in by the engineering, product, and customer success teams to ensure we had a successful holiday season for the company and our customers.

October: Bluecore Site™ Architecture Improvements for Performance, Scalability, and Cost-Effectiveness

Arjun Maheshwari, Director of Product — Bluecore Site™

Bluecore Site™ is an eCommerce customer acquisition and revenue-driving platform that customers can use to grow their email lists and increase conversion on-site. The product was released into general availability in May.

Because this is a new product for Bluecore, we are continuously shipping updates and improvements to the existing version. One of the bigger themes is improving the architecture of Bluecore Site™ to speed up iteration and increase reliability. High traffic on a customer’s website can be a challenge because campaigns must be served as customers interact with the site. To address this, we worked with the JavaScript team to divert traffic to our GKE back-end architecture, which helps us serve more than 1 billion campaigns a month with high geo-availability and low latency. This also helped increase the compute and storage efficiency of pre-loaded campaigns.

In addition to enhancing the customer’s experience on-site, we also improved the way that personalized coupon codes are delivered from Bluecore Site™ campaigns. These personalized coupons need to be served in less than 10 ms and with high availability. To achieve this, we stored our coupons in Google CloudSQL and ran parallel workers on GKE containers to process large files containing millions of coupons. This allows us to import coupons at a rate of 1 million a minute and distribute them in less than 10 ms, which lets us meet our SLA for serving the campaign once the targeting qualifications are matched.
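The import pattern described above, splitting a large file into batches that parallel workers load independently, can be sketched in Python. This is an illustration under stated assumptions, not the production design: an in-memory `CouponStore` stands in for the CloudSQL table, threads stand in for GKE workers, and all names and batch sizes are invented. For scale, 1 million coupons a minute works out to roughly 16,700 inserts per second.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


class CouponStore:
    """In-memory stand-in for a database table of unused coupon codes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._codes = set()

    def insert_batch(self, codes):
        with self._lock:
            self._codes.update(codes)
        return len(codes)

    def claim(self):
        """Hand out one unused code, or None if the pool is empty."""
        with self._lock:
            return self._codes.pop() if self._codes else None

    def __len__(self):
        with self._lock:
            return len(self._codes)


def import_coupons(lines, store, workers=8, batch_size=1000):
    """Clean the input lines and insert them in parallel batches; returns rows inserted."""
    cleaned = (line.strip() for line in lines if line.strip())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(store.insert_batch, chunks(cleaned, batch_size)))
```

Batching keeps each write large enough to amortize per-request overhead, while `claim` removes a code under a lock so that the same coupon is never handed to two shoppers.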

November: Improved Next Best Purchase Recommendations

Gino Knodel, Senior Data Scientist — Data Science

Bluecore is unique amongst email service providers (ESPs) in that our product is priced on value: we charge for emails clicked rather than emails sent. This means our success is directly tied to the success of our customers. We therefore constantly revisit our recommendation models to ensure they drive the highest engagement possible.

Our most used recommendation type is our Next Best Purchase model, which leverages a customer’s full history to recommend the products they are most likely to buy. We set out to research an improved model using our in-house AI platform, which allows us to rapidly iterate on and test new models. The resulting model leverages recent interactions that users had with products (e.g. clicking on a product in an email, viewing a product on the website, making a purchase) and is based on a method called “collaborative filtering,” which means that recommendations are generated according to the principle “Other people with similar interests to yours also liked these products…”.

Because of the massive scale of users and products that the model needed to handle (up to tens of millions of users per customer, interacting with millions of products), we decided to deviate from our usual design pattern of using BigQuery in combination with Python/Sklearn models to serve recommendations in batch. Instead, we created a Dataflow pipeline that can process a large number (up to 1,000) of micro-batches of users in parallel, which allows us to generate recommendations in a fast and cost-effective way, even for some of our largest customers.

Overall, the new model led to an 8% relative increase in clicks on recommended products in personalized emails. This not only means a much better email experience for customers but also directly leads to higher revenue for our clients.
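To make the “other people with similar interests also liked” principle concrete, here is a deliberately simple item co-occurrence recommender in Python. It is one basic form of collaborative filtering chosen for illustration; the post does not specify which variant the production model uses, and this sketch does not use Dataflow. The `micro_batches` helper only shows how a user list could be partitioned into independent units of work.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_occurrence(interactions):
    """Count how often each pair of products was interacted with by the same user.

    interactions: {user_id: set of product ids}
    """
    counts = defaultdict(Counter)
    for products in interactions.values():
        for a, b in combinations(sorted(products), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(user_products, counts, k=3):
    """Score unseen products by how often they co-occur with the user's products."""
    scores = Counter()
    for product in user_products:
        for other, n in counts.get(product, {}).items():
            if other not in user_products:
                scores[other] += n
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))  # ties broken by name
    return [product for product, _ in ranked[:k]]


def micro_batches(users, size):
    """Split users into small batches that can be scored independently, in parallel."""
    users = list(users)
    for start in range(0, len(users), size):
        yield users[start:start + size]
```

Once the co-occurrence counts are built, scoring one user depends only on that user's own history, which is what makes it possible to process many micro-batches of users at the same time.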

December: Bluecore Margin Optimization

Bryan Estes, Principal Product Manager

Bluecore Margin Optimization enables retailers to evaluate and improve profit margin across their Bluecore programs without sacrificing performance or personalization.

Given our industry expertise in retail, we were able to start with product recommendations since they appear in almost every Bluecore email and are used to drive traffic to the website, influencing the shopper to purchase. While typical personalized product recommendations are customer-centric and focused on creating the best personalized experience for every shopper, they don’t consider other areas of business impact, like profit margin, so this was an opportunity that Bluecore was uniquely positioned to drive value from.

We were able to quickly pull in margin-related data in a form that the platform can understand and act on immediately. This allowed us to quickly validate our approach through a customer-facing pilot, since our personalization engine is flexible enough to let us modify our recommendations with no major technical overhead. Through our extensible enhanced analytics framework, we were also able to leverage Looker to easily monitor and measure performance impact.

Overall, Margin Optimized Recommendations have been shown to improve email program profit margin by up to 5% and have the added benefit of aligning business and merchandising goals — all without sacrificing performance or personalization.
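The post does not say how margin is factored into recommendations. One simple possibility, shown here purely as an assumption, is to blend each product's relevance score with its profit margin and re-rank. The field names, the 0 to 1 scaling of both values, and the `margin_weight` parameter are all hypothetical.

```python
def rerank_by_margin(recommendations, margin_weight=0.3):
    """Re-rank recommendations by a weighted blend of relevance and profit margin.

    recommendations: list of dicts with 'product', 'relevance', and 'margin',
    where relevance and margin are both scaled to the range 0..1.
    margin_weight=0 reproduces the purely customer-centric ranking.
    """
    def blended(rec):
        return (1 - margin_weight) * rec["relevance"] + margin_weight * rec["margin"]

    return sorted(recommendations, key=blended, reverse=True)
```

A small weight keeps the ranking close to the personalized order and only reorders products whose relevance is nearly tied, which is one way to pursue margin without giving up much performance.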

Conclusion

Despite all of the twists and turns that 2020 brought, it was a year full of collaboration and innovation at Bluecore. We’re excited to continue this momentum in 2021!

If you’re interested in tackling similar technical challenges and building exciting products, check out our careers page!