Granular cost attribution and chargeback for warehouse costs on Snowflake
Imagine a Snowflake warehouse as a virtual processing facility. While Snowflake has a separate storage layer for data, the virtual warehouse is primarily responsible for the compute tasks. The size and activity level of the warehouse determine the costs, similar to how a busier processing facility would incur higher expenses.
A common request by Snowflake customers is how to attribute costs to individual queries at a finer grain than a whole warehouse. This blog post will explore various methods to allocate compute costs and introduce the per query cost attribution feature for more granular cost attribution and analysis.
Warehouse-Based Cost Allocation
When warehouses are dedicated to a specific business unit, cost allocation is straightforward. There are two common scenarios:
Scenario 1: Object Tagging (recommended) Warehouses are tagged using object tagging to logically group cost centers. The cost attribution to a business unit for a given month can be computed using the following query. Note that this requires compliance, and a unique tag attributed to every warehouse. For reconciliation, the following example query also groups the untagged warehouses into the untagged bucket.
SELECT COALESCE(tag_references.tag_value, 'untagged') AS tag_value,
SUM(warehouse_metering_history.credits_used_compute) AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
LEFT JOIN snowflake.account_usage.tag_references
ON warehouse_metering_history.warehouse_id = tag_references.object_id
WHERE warehouse_metering_history.start_time >= DATE_TRUNC('MONTH', DATEADD(MONTH, -1, CURRENT_DATE))
AND warehouse_metering_history.start_time < DATE_TRUNC('MONTH', CURRENT_DATE)
GROUP BY COALESCE(tag_references.tag_value, 'untagged')
ORDER BY total_credits DESC;
Scenario 2: Warehouse Naming Convention Warehouses follow a naming convention such as teamname_details or workload_details. This allows for direct mapping of costs to the respective business units or workloads. Here is an example query that gives a split of the costs across teams assuming they follow a naming convention. The others bucket captures the ones not following the convention for reconciliation.
SELECT
CASE
WHEN POSITION('_' IN warehouse_metering_history.warehouse_name) > 0
THEN SPLIT_PART(warehouse_metering_history.warehouse_name, '_', 1)
ELSE 'others'
END AS team_name,
SUM(warehouse_metering_history.credits_used_compute) AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE warehouse_metering_history.start_time >= DATE_TRUNC('MONTH', DATEADD(MONTH, -1, CURRENT_DATE))
AND warehouse_metering_history.start_time < DATE_TRUNC('MONTH', CURRENT_DATE)
GROUP BY team_name
ORDER BY total_credits DESC;
Introducing Per Query Cost Attribution
When multiple business units or workloads share the same warehouse, cost allocation becomes more complex as the warehouse-level granularity is not sufficient enough. Even in cases where a warehouse may be tied to a given business unit, the unit may want to understand the costs in a more granular manner. The per query cost attribution feature, now generally available, helps in these scenarios. This feature allows for warehouse cost attribution at the query level.
The per query cost attribution feature provides the portion of the warehouse cost that can be attributed to a given query. If there is only a single query running in a warehouse, the entire cost of the warehouse is attributed to that query. If multiple simultaneous queries are running, the cost of the warehouse during the overlapping periods is attributed to the queries based on their relative resource consumption. During the times when there is no query running and the warehouse is idle (not suspended), the costs are not attributed to any particular query but can be easily determined and spread across the queries as needed for reconciliation purposes.
Common Methods to Allocate Costs Using Per Query Cost Attribution
1. User-Based Costs Understanding costs associated with a given user can be useful for accountability, budgeting and more granular forecasting. It can also provide hints on which users are most effective in their use of Snowflake. Chargeback of Snowflake costs at a user level can be achieved using the USER_NAME column in the QUERY_ATTRIBUTION_HISTORY view. The following sample query provides a way to attribute monthly warehouse costs across users. The idle time costs are proportionately attributed based on the relative spend by users.
WITH wh_bill AS (
SELECT SUM(credits_used_compute) AS compute_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATE_TRUNC('MONTH', DATEADD(MONTH, -1, CURRENT_DATE))
AND start_time < DATE_TRUNC('MONTH', CURRENT_DATE)
),
user_credits AS (
SELECT user_name, SUM(credits_attributed_compute) AS credits
FROM snowflake.account_usage.query_attribution_history
WHERE start_time >= DATE_TRUNC('MONTH', DATEADD(MONTH, -1, CURRENT_DATE))
AND start_time < DATE_TRUNC('MONTH', CURRENT_DATE)
GROUP BY user_name
),
total_credit AS (
SELECT SUM(credits) AS sum_all_credits
FROM user_credits
)
SELECT u.user_name, u.credits / t.sum_all_credits * w.compute_credits AS attributed_credits
FROM user_credits u, total_credit t, wh_bill w;
Users can be mapped to business units / cost centers in a custom view, which can be used along with the user-based costs to roll up the spend across business units. Alternatively, the ROLE_NAME (available in QUERY_HISTORY view) can be used to allocate costs for a given project, if several developers use a given role for a specific project. This method does not require manual tagging, thereby reducing the operational overhead to enforce tagging policies.
2. Query Tag-Based Costs Query tags can be used to allocate costs across different workloads. Customers can assign query tags to specific workloads or a set of queries to track them together. These can be set at a session level (Account->User->Session) and can also be configured when issuing snowflake queries from tools such as dbt, airflow, etc. The following example query helps attribute the warehouse compute costs from the previous month to different workloads, assuming that workloads are tagged, and also allocates the idle costs proportionately across workloads for reconciliation purposes.
WITH wh_bill AS (
SELECT SUM(credits_used_compute) AS compute_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATE_TRUNC('MONTH', DATEADD(MONTH, -1, CURRENT_DATE))
AND start_time < DATE_TRUNC('MONTH', CURRENT_DATE)
),
tag_credits AS (
SELECT COALESCE(NULLIF(query_tag, ''), 'untagged') AS tag, SUM(credits_attributed_compute) AS credits
FROM snowflake.account_usage.query_attribution_history
WHERE start_time >= DATE_TRUNC('MONTH', DATEADD(MONTH, -1, CURRENT_DATE))
AND start_time < DATE_TRUNC('MONTH', CURRENT_DATE)
GROUP BY tag
),
total_credit AS (
SELECT SUM(credits) AS sum_all_credits
FROM tag_credits
)
SELECT tc.tag, tc.credits / t.sum_all_credits * w.compute_credits AS attributed_credits
FROM tag_credits tc, total_credit t, wh_bill w;
3. Recurrent Query Costs For recurrent or similar queries, costs can be attributed using QUERY_HASH and QUERY_PARAMETERIZED_HASH. If there are several recurrent / similar queries and want to understand costs attributable to such queries over a period, you can use the following example query to group the costs based on the recurrent queries and identify the top drivers of cost for the past month (most expensive recurrent queries). Note that the attributed costs of a query changes depending on the environment (size of a warehouse, utilization of warehouse, other queries running during that time etc.).
SELECT query_parameterized_hash,
COUNT(*) AS query_count,
SUM(credits_attributed_compute) AS total_credits
FROM snowflake.account_usage.query_attribution_history
WHERE start_time >= DATE_TRUNC('MONTH', DATEADD(MONTH, -1, CURRENT_DATE))
AND start_time < DATE_TRUNC('MONTH', CURRENT_DATE)
GROUP BY query_parameterized_hash
ORDER BY total_credits DESC
LIMIT 20;
Additionally, for stored procedures that often issue multiple hierarchical queries, you can compute the attributed costs for the entire procedure using the following example query, as the parent and the root query ids are available for such procedures in the view.
SET query_id = '<query_id>'; // root query id for the stored procedure
SELECT SUM(credits_attributed_compute) AS total_attributed_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_ATTRIBUTION_HISTORY
WHERE (root_query_id = $query_id
OR query_id = $query_id);
Conclusion
Snowflake’s per query cost attribution feature, along with existing cost allocation mechanisms, provides powerful tools for granular cost management. By leveraging these features, organizations can enhance transparency, optimize resource usage, and ensure fair cost allocation. For more details, visit Snowflake’s per query cost attribution page.
Maximize your cost efficiency with Snowflake and take control of your data platform expenses today!