Building the Seller Analytics Dashboard

From pipe dreams to pipelines

Whatnot Engineering
8 min read · May 9, 2023


Kendall Barber-Mayo | Data Engineer & Edelita Valdez | Growth Engineer

As one of the fastest-growing marketplaces, Whatnot is committed to empowering its sellers to turn their passions into successful businesses. That’s why we created the Seller Analytics Dashboard, a tool that gives sellers a detailed look at their performance on the platform. By tracking key metrics such as revenue and average order value, sellers can make informed decisions about their businesses and improve their shows, which in turn creates a better experience for buyers.

When initially scoping this project, we wanted to show sellers many different metrics, but we also wanted to ship the product ASAP. To balance the two, we decided the following metrics were key for an initial version of the dashboard:

Business metrics:

  • GMV/Sales
  • Revenue/Seller Payout
  • Order Count
  • AOV
  • Buyer Count

Livestream metrics:

  • Stream Hours
  • Number of lives
  • Total number of views
  • Unique views

Buyer metrics:

  • Top N buyers by Revenue
  • Top N buyers by Order count

The metrics above allow sellers to get a sense of their business and how their livestreams and/or active buyers may affect sales. Now let’s look into how these metrics are created.

Transform Pipeline

Our data pipeline had to:

  • Aggregate and serve metrics daily
  • Ensure data correctness
  • Create a repeatable, maintainable pipeline

Since all of the necessary data lives in Snowflake and the dashboard refreshes on a daily cadence, defining the transformations in dbt made the most sense, as it is already part of our stack. We created a separate directory within our dbt Snowflake project dedicated to the seller dashboard. Within it, we have two model layers:

  • Base models: Reusable models that gather required data from source/application datasets
  • Metrics models: Structured models that aggregate/calculate metrics by seller id and date

Below is a simplified diagram of this in action for our business and buyer metrics.

We use the base model “seller_dash__orders” to gather order-related data from our source datasets, and then reference it to derive the business and buyer models in our metrics layer (a simplified sketch of one such metrics model follows the list below). This pattern promotes:

  • Consistent data shared across metric models
  • Minimal points of failure: if something looks off, we know to check the base layer and/or reach out to data producers
  • Readability, reusability, and maintainability: any engineer outside of the project can understand and contribute to the pipeline
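
To make the layering concrete, here is a rough sketch of what a business-metrics model built on that base model could look like. The column names are illustrative only, and the real models aggregate more metrics than shown here.

-- models/seller_dash/seller_dash__business_metrics.sql (illustrative sketch)
SELECT
    seller_id,
    order_date,                               -- daily grain
    SUM(gmv_cents)           AS gmv,          -- GMV/Sales
    SUM(seller_payout_cents) AS revenue,      -- Revenue/Seller Payout
    COUNT(order_id)          AS order_count,
    COUNT(DISTINCT buyer_id) AS buyer_count
FROM {{ ref('seller_dash__orders') }}
GROUP BY seller_id, order_date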

Of course, you also want ongoing programmatic checks, so we currently run simple dbt tests like `not_null` and `unique`, which alert us if unexpected data pops into our base layer.
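
These generic tests are declared in a dbt schema file alongside the models. A minimal sketch, with a hypothetical column name, looks something like this:

# models/seller_dash/schema.yml (illustrative; the column name is hypothetical)
version: 2

models:
  - name: seller_dash__orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique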

Data Activation

Once new data hits the metrics layer each day, it’s ready to move to our data-serving application (Rockset). To do this programmatically, we use:

  • Snowflake’s External Stage (connected to an S3 bucket)
  • A custom dbt unload materialization that runs Snowflake’s COPY INTO statement and drops data into S3
  • An unload model that references the materialization and is downstream of a given metric model

This setup allows us to trigger unloads within dbt and use Snowflake’s compute to move data into S3, rather than taking on the maintenance overhead of managing ETL code in an orchestrator.

The diagram below shows how data moves from the metrics layer to storage.

Two key parts of this diagram:

  1. Each metric model has its own associated unload model, built with the custom dbt unload materialization
  2. Each unload model points to a specific s3_uri prefix

Since Rockset can auto-ingest new files from S3 by URI prefix, it made sense for us to build that prefix logic into our unload materialization.

{% materialization unload, adapter='snowflake' %}

  {% set unload_stage_fqn = config.get('unload_stage_fqn') %}
  {% set s3uri_prefix = config.get('s3uri_prefix') %}
  {% set run_datetime = run_started_at.strftime("%Y/%m/%d/%H/%M/%S") %}
  {% set max_file_size = config.get('max_file_size', 16777216)|int %}
  {% set file_format_type = config.get('file_format_type', 'json') %}
  {% set compression = config.get('compression', 'NONE') %}
  {% set header = config.get('header', false) %}

  -- Execute any sql required to implement the desired materialization
  {%- call statement('main') -%}
    COPY INTO @{{ unload_stage_fqn }}/{{ s3uri_prefix }}/{{ this.database|lower }}/{{ this.schema }}/{{ this.identifier }}/{{ run_datetime }}/{{ env_var("DBT_CLOUD_RUN_ID", "local_run") }}/data
    {{ sql }}
    file_format = (type={{ file_format_type }}, COMPRESSION={{ compression }})
    header = {{ header }}
    MAX_FILE_SIZE = {{ max_file_size }};
    commit;
  {%- endcall -%}

  {{ return({'relations': []}) }}

{% endmaterialization %}

In the dbt materialization above, we generate the COPY INTO statement from model configs and metadata associated with each dbt project run. To reference the custom materialization within each unload model, we do something like the following.

{{
  config(
    materialized='unload',
    unload_stage_fqn='dbt.example.seller_dash_prod',
    s3uri_prefix='business_metrics',
    max_file_size=16777216,
    file_format_type='json',
    compression='none',
  )
}}

FROM (
  SELECT object_construct(*)
  FROM {{ ref('seller_dash__business_metrics') }}
)

And that’s it! By using this pattern, we’re able to drop data into S3 with minimal overhead. Now that you know how we move data from our sources to our data-serving application, let’s look at how sellers view this data in the front-end client.

Serving data to clients

When designing the API layer and front end, the following were important to ensure we could continue to extend our set of metrics with ease:

  1. Reusable front-end components
  2. Consistent data shape for all time series metrics
  3. Minimizing external service calls (for example, to Rockset)

To achieve this, we have an intermediate backend layer that fetches, validates, and transforms the data before it’s returned to the client. (pydantic is a helpful library for this.)
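
As a rough illustration of that validation step (these are not our production models; the field names simply mirror the GraphQL schema shown below):

from datetime import datetime
from typing import List

from pydantic import BaseModel


class TimeSeriesItem(BaseModel):
    timestamp: datetime
    value: int


class TimeSeriesMetric(BaseModel):
    items: List[TimeSeriesItem]
    summary_value: int
    unit: str


# Validate raw rows from the serving layer into the shared shape before they
# reach the resolvers. How summary_value is computed depends on the metric
# (a sum for GMV, an average for AOV, and so on), so it is passed in here.
def to_metric(rows: List[dict], summary_value: int, unit: str) -> TimeSeriesMetric:
    return TimeSeriesMetric(
        items=[TimeSeriesItem(**row) for row in rows],
        summary_value=summary_value,
        unit=unit,
    )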

By conforming to a consistent data shape, as illustrated by the GraphQL schema below, we ensure compatibility with standardized front-end components.

type TimeSeriesMetrics {
  ...
  startAt: String!
  endAt: String!
  granularity: MetricGranularity
  aov: TimeSeriesMetric
  aovAsync: TimeSeriesMetric
  aovLivestream: TimeSeriesMetric
  gmv: TimeSeriesMetric
  # and so on…
  ...
}

type TimeSeriesMetric {
  items: [TimeSeriesItem]
  summaryValue: Int
  unit: TimeSeriesUnit
}

type TimeSeriesItem {
  timestamp: String
  value: Int
}

Each TimeSeriesMetric maps to its own resolver. In practice, many of the metrics fetched together have a shared data source, which we cache to prevent additional calls to Rockset as resolvers are evaluated. This results in a more efficient, scalable, and reusable system for serving time series metrics to clients.
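
As a framework-agnostic sketch of that caching idea (the cache key and fetch callable below are stand-ins, not our actual resolver code), a small request-scoped memo is enough:

from typing import Callable, List


# Memoize a fetch against a per-request context dict so that sibling
# resolvers (e.g. gmv and aov for the same seller and date range) share one
# query instead of each calling Rockset. The key and fetch_rows callable are
# illustrative stand-ins for the real query parameters and client call.
def cached_rows(context: dict, key: tuple, fetch_rows: Callable[[], List[dict]]) -> List[dict]:
    cache = context.setdefault("rockset_cache", {})
    if key not in cache:
        cache[key] = fetch_rows()
    return cache[key]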

Rendering data

Given that we wanted to rapidly expand the set of available metrics, making the necessary changes on our web client also needed to be quick and painless.

We picked Chart.js as our charting library for its ease of use, great documentation, performance, and overall popularity in the dev community. Our front end is built in React, and while there are a few different React wrappers available for Chart.js, we wrote our own to avoid additional dependencies; this lets us ship our own bug fixes if needed and upgrade Chart.js without waiting on external libraries.
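
A bare-bones version of the wrapper idea, shown here as a sketch rather than our actual implementation, is a hook that owns the Chart.js instance’s lifecycle:

import { useEffect, useRef } from "react";
import Chart from "chart.js/auto";

// Minimal sketch: the caller renders a <canvas> with the returned ref, and the
// hook creates the Chart.js instance against it, destroying the chart when the
// config changes or the component unmounts.
function useChart(config) {
  const canvasRef = useRef(null);

  useEffect(() => {
    const chart = new Chart(canvasRef.current, config);
    return () => chart.destroy();
  }, [config]);

  return canvasRef;
}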

To abstract away the details of working directly with Chart.js, we created a pattern for constructing new charts that removes the complexity of creating tooltips, styling, and displaying other elements. The bulk of the work is done by a <ConfiguredChart />, which contains the logic for rendering the chart. A developer just needs to supply data points and a “chartConfig”: a simple object that defines how the metric should be displayed, e.g. title, type of chart, segments, etc.

The following annotated example corresponds to the chart above.

// SalesChartConfig.js

// The 'legend items' we want to display on the chart.
// Each defines its display name, color, and 'TimeSeriesMetric' name.
const LIVE_LEGEND_ITEM = {
  label: "Livestream",
  color: colors.HYPERLINK,
  yAxisKey: "gmvLivestream",
};

const ASYNC_LEGEND_ITEM = {
  label: "Marketplace",
  color: colors.WN_PURPLE,
  yAxisKey: "gmvAsync",
};

// Enables the dropdown selector for choosing a 'segment' to view.
// Each segment defines its display name, which legend items to show, and, if
// we want to display a total value in a tooltip, the name of the
// corresponding 'TimeSeriesMetric' to use as the value.
const SEGMENTS = {
  All: {
    label: "All",
    legendItems: [LIVE_LEGEND_ITEM, ASYNC_LEGEND_ITEM],
    totalKey: "gmv",
  },
  Livestream: {
    label: "Livestream",
    legendItems: [LIVE_LEGEND_ITEM],
  },
  Marketplace: {
    label: "Marketplace",
    legendItems: [ASYNC_LEGEND_ITEM],
  },
};

const SHOW_TOTAL_SEGMENTS = new Set([SEGMENTS.All]);

// The chartConfig that determines how the chart should be rendered, including
// name, type, optional notes, units (with optional currency to handle number
// formatting), and segments.
export const SalesChartConfig = (currency) => ({
  title: "Sales",
  chartType: ChartType.BAR,
  unit: "CENTS",
  currency,
  isStacked: true,
  note: "Reflects completed orders only",
  segments: [SEGMENTS.All, SEGMENTS.Livestream, SEGMENTS.Marketplace],
  showTotalSegments: SHOW_TOTAL_SEGMENTS,
  // Reusable helpers to format x- and y-axis tick values
  xTicksCallback,
  yTicksCallback: yTicksCallbackDisplayCents(currency),
});
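
With a config like the one above, rendering comes down to handing it and the fetched data points to the shared component. The prop names below are our shorthand for the description above, not the component’s exact interface.

// Illustrative usage; prop names are assumptions based on the description above.
<ConfiguredChart
  chartConfig={SalesChartConfig(currency)}
  data={chartData}
/>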

When updating GQL queries on the client, our consistent schema also lets us make heavy use of Fragments and the @include directive to fetch only what needs to be rendered. Adding a new metric then comes down to a few lines in the existing query and the corresponding logic in the component fetching the data.

// GQL example:
export const GET_SELLER_ANALYTICS_QUERY_CHART = gql`
  ${TIME_SERIES_METRICS_CACHE_KEY_FRAGMENT}
  ${TIME_SERIES_METRIC_FRAGMENT}
  query GetSellerAnalyticsChart(
    $startAt: String!
    $endAt: String!
    $granularity: MetricGranularity!
    $includeAov: Boolean!
    $includeCountBuyers: Boolean!
    ...
  ) {
    # $params is a GetSellerAnalyticsParams input (declared among the elided variables above)
    getSellerAnalytics(params: $params) {
      ...TimeSeriesMetricsCacheKeyFragment
      currency
      aov @include(if: $includeAov) {
        ...TimeSeriesMetricFragment
      }
      aovAsync @include(if: $includeAov) {
        ...TimeSeriesMetricFragment
      }
      aovLivestream @include(if: $includeAov) {
        ...TimeSeriesMetricFragment
      }
      countBuyerReferralAccountCreated @include(if: $includeCountBuyerReferrals) {
        ...TimeSeriesMetricFragment
      }
      ...
    }
  }
`;

// Logic in component:
const { metricType } = props;
const { loading, data: chartData } = useQuery(GET_SELLER_ANALYTICS_QUERY_CHART, {
  variables: {
    params: { ... },
    includeAov: MetricType.AOV === metricType,
    includeCountBuyers: MetricType.COUNT_BUYERS_TOTAL === metricType,
    ...
  },
  ...
});

Similar to the backend caching, we also cache results on the client as the viewer pages through different date ranges and metrics, reducing calls to fetch data. We do this with Apollo’s custom cache IDs, where the key for a `TimeSeriesMetrics` object is determined by the combination of its `startAt`, `endAt`, and `granularity` values. As a viewer toggles between metrics, those objects build up in the cache, allowing the client to skip requests to the server when we’ve already fetched the necessary data.
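
Declaring that key in Apollo Client looks roughly like the following; this is a minimal sketch, and our real cache configuration covers more types than this.

import { InMemoryCache } from "@apollo/client";

// Key cached TimeSeriesMetrics objects by their date range and granularity so
// that toggling back to an already-fetched window is served from the cache.
const cache = new InMemoryCache({
  typePolicies: {
    TimeSeriesMetrics: {
      keyFields: ["startAt", "endAt", "granularity"],
    },
  },
});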

How’s it going?

Since launching the initial version of Seller Analytics, we’ve heard a lot of positive feedback from our sellers, along with many ideas for data they want to see! In terms of usage, 30% of sellers who were active in the last 7 days checked Seller Analytics during that week. When we segment the seller base by sales, sellers with higher sales browse Seller Analytics at higher rates.

That’s not to say that skimming the dashboard alone will make a seller successful overnight, but it is being used by our successful sellers as a tool to better understand their businesses.

Additional audience metrics have since been added so that sellers can easily see how their follower growth is trending over time and whether their buyers are returning or purchasing for the first time.

A new tab has also been added for Ads-related metrics, so a seller who promotes a show on our platform can see how effective that promotion was.

This shows us that Seller Analytics is an important tool for our sellers, and we’re excited to keep investing in this surface.

What’s next?

So much more!

A sneak peek of what’s being worked on:

  1. Shortening the feedback loop: new data will be available more frequently than once per day
  2. Adding a ‘Livestream’ metrics tab, with better linking to/from the daily view
  3. More metrics around viewership and livestream engagement

If building tools to help entrepreneurs grow their businesses sounds fulfilling to you, join us!
