How Data Analytics Revolutionized a National Retailer: A Case Study

Harry @DataSmiles
9 min readSep 4, 2023

--

Executive summary

We, Data Smiles, partnered with GGV, a national retailer with approximately 40 stores across the U.S. and 8-digit annual revenue, to address their fragmented data infrastructure. Previously, GGV struggled with disparate reporting systems from multiple software vendors, leading to inefficiencies and decision-making bottlenecks.

Our comprehensive solution involved building a centralized data warehouse, complete with ETL data pipelines, and deploying Power BI reports for unified analytics. This new ecosystem covers a wide range of operational metrics, including sales, customer behavior, products, inventory, and store operations.

As a result, GGV now has a single, centralized hub for all their reporting needs. This streamlined approach has not only expedited decision-making but also unlocked analytical capabilities that were previously unattainable. The project has effectively transformed GGV into a data-driven organization, well-equipped for both current challenges and future growth.

Background

The company’s founder and CEO is very tech-savvy and data-driven. He wants his company to make decisions based on data. Our relationship with their Head of Operations was more extensive. As a skillful and obsessed operation leader, he observes their stores on a daily basis and makes all types of adjustments in order to improve their efficiency and effectiveness, and he tries new tactics or twisting their processes just to see if they work.

Before we start the project, their mind is full of questions, from understanding their customers’ behaviors to analyzing their product performance. They also want to optimize their operations through data analysis. Here are some questions they would like to answer.

  1. “What are the optimal opening and closing times for each store, especially on Saturdays?”
  2. “How does severe weather, like blizzards or storms, impact store performance?”
  3. “How effective are vendor promotions in driving new product sales?”
  4. “Which products, brands, and categories are top performers? What’s the best-selling brand within each category and product within each brand?”
  5. “Are there popular products that underperform in specific stores?”
  6. “Where should the next store location be?”
  7. “Who are the top customers, and what criteria should be used to define them?”
  8. “How can lapsed customers be re-engaged through promotions and gifts?”

Initial Problem Statement

When we started working with GGV, the company had no centralized data infrastructure. This meant they couldn’t effectively track important metrics like sales, inventory, or customer behavior. Reports were isolated and couldn’t be shared across departments, making it difficult for the team to act on any insights.

In short, GGV was operating without a cohesive data system. Our task was to build a comprehensive data ecosystem from scratch, enabling GGV to monitor key operational metrics and make informed decisions.

Data Strategy Framework

To catalyze GGV’s evolution into a data-centric enterprise, we’ve architected a comprehensive data strategy, executed in meticulously planned phases. Our initial focus was on gaining an in-depth understanding of GGV’s unique challenges and current operational landscape. This foundational knowledge enabled us to articulate a clear objective: empowering both executive leadership and operational teams to make decisions informed by high-quality, actionable data.

With this objective serving as our North Star, we delineated a strategic roadmap designed to transition GGV from its current state to a future state of data-driven excellence. This roadmap is segmented into four pivotal stages:

Data Collection: The cornerstone of any data strategy, this phase involves the meticulous gathering of raw data from diverse sources, including customer interactions, transactional records, and supply chain metrics. We employ advanced data scraping techniques and API integrations to ensure a rich and varied dataset.

Data Centralization: In this phase, we consolidate the collected data into a centralized data warehouse. This not only enhances data accessibility but also ensures data integrity and security. We leverage cloud-based solutions like AWS or Azure to provide scalable and secure data storage options.

Data Processing: Here, we apply a series of data transformations and pre-aggregations to prepare the data for analysis. This involves data cleansing, normalization, and the application of machine learning algorithms for predictive analytics. Our data models are tailored to align with GGV’s specific business questions and KPIs.

Reporting and Analysis: The final stage involves the deployment of advanced analytics tools like Power BI or Tableau for data visualization and reporting. Custom dashboards are designed to provide real-time insights, enabling agile decision-making across all levels of the organization.

By adhering to this structured approach, we not only aim to solve GGV’s immediate data challenges but also lay the foundation for a sustainable, long-term data strategy. This ensures that GGV remains agile and competitive in an increasingly data-driven marketplace.

Solution

Data collection and data quality

To effectively capture customer data, we implemented a loyalty program that encourages shoppers to identify themselves during in-store transactions. This strategy is aligned with current retail best practices, where customer behavior is tracked through rewards systems. In exchange, customers accumulate points that can be redeemed for exclusive products or even elevated to VIP status.

The identification process is integrated into the checkout workflow, ensuring that each transaction is uniquely associated with an individual customer. Our comprehensive solution involved the development of a centralized data warehouse, utilizing Snowflake for backend database architecture and Power BI for advanced reporting.

The primary advantage of this approach is the ability to closely monitor customer shopping behavior. By participating in the rewards program, customers willingly provide their contact information, thereby enriching the data pool. The program is designed to reflect the store’s unique branding, offering a personalized touch.

Building the data warehouse and ETL pipelines

Data Flow and System Architecture

We architected a comprehensive data and reporting infrastructure tailored to the client’s needs. Data is sourced directly from their Point-of-Sale (POS) system via its API, capturing essential metrics such as customer profiles, transaction histories, store details, product inventories, and vendor information.

To optimize performance and reporting accuracy, the data is stored in a centralized database where it undergoes a series of pre-aggregation processes before being utilized for reporting. Given the client’s familiarity with Microsoft technologies, we strategically selected data models and table designs that align with their expertise.

Our end-to-end solution comprises a centralized data warehouse, specifically engineered for advanced reporting. We employed Snowflake for the backend database architecture and leveraged Power BI for in-depth reporting capabilities.

The system now serves as a unified repository, consolidating inventory and sales data. This integration enables the client to perform comprehensive valuations, thereby enhancing their decision-making processes.

Customer Segmentation using RFM

One of the foundational queries driving our analytics is identifying who constitutes the ‘best’ customers. Understanding this demographic enables targeted strategies for customer retention and conversion, transforming good customers into great ones.

However, defining ‘best’ is multifaceted. Is it gauged by monetary expenditure, longevity of shopping history, or frequency of transactions? To address this complexity, we implemented a Recency, Frequency, Monetary (RFM) scoring model for each customer, encapsulating these three critical dimensions.

While we didn’t originate the RFM concept, our expertise lies in optimizing the metrics that align with both data quality and business value. Ensuring data availability and quality across the customer base is paramount. From a business perspective, the metrics chosen must not only be robust but also strategically meaningful.

  • Recency (score): we sort the customers by how many days passed from the last transaction date, and divide them into 9 buckets. Score 9 has the best recency as it is closest to today; while score 1 is the worst, meaning the last transaction is a long time ago.
  • Frequency (score): we sort the customers by how many times the customers have shopped within the last 12 months, and divide them into 9 buckets. Score 9 has the highest shopping frequency, while score 1 probably only comes in once. Because there are about 30% of the customers who come in only once, the lowest score is actually 3.
  • Monetary (score): we sort the customers by the total revenue that the customer has spent, divide them into 9 buckets. Score 9 has the highest spending, while score s has spent the least.

The FRM Score is the concatenation of the three individual scores. For example, a score of 859 should be read as 8–5–9, not eight hundred fifty nine. It means R=8, F=5, M=9.

Introducing Customer360

In addition to common customer identifiers, like First Name, Last Name, Email Address, we have the following metrics to illustrate the customer.

  • RFM_score: Total RFM score = Recency + Frequency + Monetary. In theory, the range will be from 3 to 27, because the range of each score is between 1 and 9. The higher the score, the better the customer.
  • Recency (score): described in the above section
  • Frequency (score): described in the above section
  • Monetary (score): described in the above section
  • Last Transaction Date: The last transaction date
  • Store Name: The primary store name, the most visited store in the last 12 months.
  • 3m Spent Amount: Total revenue that has been spent in the last 3 months.
  • 3mo Avg per Transact: Average revenue per transaction in the last 3 months.
  • 3mo transacts: The total number of transactions in the last 3 months.
  • Total Spent Amount: The total revenue that the customer spent.
  • Average per transact: The average revenue per transaction of this customer.
  • 12mo transacts: The total number of transactions in the last 12 months.
  • Secondary Store: the second most visited store in the last 12 months.

Product 360

This table measures the metrics of each product. Below are the definitions.

  • Product Name: the name of the product.
  • Variant Name: the name of the product variant.
  • Store Name: to show the information in the store level, such as units being sold, or last sold date, or inventory level.
  • Inventory Level: how units are being stored in the inventory currently.
  • Sold in 1mo/3mo/6mo: How many units have been sold within the last 1mo, 3mo, or 6mo.
  • Sell thru Rate 1mo/3mo/6mo: Sell through rate of the given time period. It is the number of units being sold divided by the initial inventory level.
  • Last sale date: when was the last transaction happening.

Location heat map for new stores

We also introduced the notion of a ‘Primary Store’ for each customer, defined as the location they frequent most. Additionally, we identified a ‘Secondary Store,’ which is the second most-visited location for each customer. Leveraging our robust data collection methods, we enabled GGV to pinpoint customer origins and assess the impact of store cannibalization. Approximately 30% of customer zip codes have been captured and visualized on a geospatial heatmap.

This heatmap serves as a strategic tool for visualizing customer distribution. By analyzing this data, we can identify the regions from which customers are originating and assess the presence or absence of a store in those areas. Should a gap be identified, it provides a data-driven rationale for new store openings.

Results

The implementation of our comprehensive data strategy has yielded transformative results for GGV, both quantitatively and qualitatively:

Operational Efficiency: One of the most immediate impacts was a significant reduction in the time spent generating reports. By centralizing all reporting into a single hub, GGV has streamlined its decision-making processes, enabling quicker and more informed choices.

Customer Engagement: The loyalty program has been a resounding success, driving repeat business and enhancing customer satisfaction. The rewards system has not only incentivized customer loyalty but also created a positive feedback loop that encourages ongoing engagement.

Cost Savings and ROI: GGV has realized tangible efficiencies in its operations, particularly in inventory management. The data-driven approach has enabled them to optimize their product and brand selections, thereby maximizing profitability.

Risk Mitigation: An unexpected benefit was the ability to assess the impact of external factors, such as severe weather conditions. With the new data insights, GGV confidently made the decision to close stores when necessary, without adversely affecting revenue.

In summary, our data strategy has empowered GGV to become a more agile, efficient, and customer-centric organization. The project has not only solved immediate operational challenges but also laid the groundwork for sustainable growth and competitiveness.

About Data Smiles
Data Smiles LLC is a leading consultancy specializing in data analytics, strategy, and engineering. We empower businesses to become data-driven organizations. Our tailored solutions range from building centralized data ecosystems to implementing customer engagement strategies, all designed to optimize operations and drive ROI. If you’re looking to transform your business through the power of data, we’re the partner you’ve been searching for. Visit us at datasmiles.com to learn more and take the first step towards a smarter, more efficient future.

--

--

Harry @DataSmiles

Chief Solution Architect @ Data Smiles. Former Engineering Manager of Data team at ActiveCampaign.