A Summary of TPC-DS

The state-of-the-art decision support benchmark.

Julian Menzler
Hyrise
9 min read · Jun 19, 2019


Welcome back to our blog series!

As announced, we want to give you an extensive summary of the industry-standard TPC-DS benchmark. We have read quite a bit about it, so this post gives you a comprehensive overview of the benchmark's context, history, and characteristics.

Context

Industry Standard Benchmarks

As Jim Gray and others stated in a paper from 1985¹, “computer performance is difficult to quantify”. The only “reasonable metrics” are cost (price/performance) and throughput.

Industry-standard benchmarks try to provide such metrics by bundling different micro-benchmarks and workload simulations². Their metrics allow for system performance comparisons that take both hardware and software characteristics into account. Vendors may use these benchmarks to improve their products. They are also encouraged to publish their results online, usually for a fee, to enable competitive comparisons. However, they must provide accurate information about their test environment, which a third party may verify afterwards.

Decision Support Systems

According to its specification, TPC-DS is an industry standard benchmark for “general purpose decision support systems”³. As it turns out, the spectrum of decision support systems (DSS) is broad, and multiple definitions are available in the literature.

First of all, we should define Decision Support, the common purpose of all DSS: It refers to helping a person or group in

  • “assessing and evaluating alternative options“ as well as
  • “evaluating the possible consequences of taking action in a situation“ ⁴.

We are going to classify TPC-DS using a DSS classification by Daniel J. Power. He differentiates DSS by their mode of assistance, which has become the most widely used definition of DSS in the literature⁵.

TPC-DS belongs to the online analytical processing (OLAP) benchmarks and models a data warehouse⁶, which is “a copy of transaction data specifically structured for query and analysis”⁴. Following the DSS definition of Daniel J. Power, data warehouses belong to the category of data-driven DSS, which strongly rely on OLAP and data warehousing methods⁵. Consequently, TPC-DS benchmarks the database component of data-driven DSS.

To keep the focus on TPC-DS, we do not discuss document-, communication-, model- and knowledge-driven DSS, which are also part of Power’s DSS classification.

History of TPC-DS

The Transaction Processing Performance Council (TPC) was founded in 1988 as a non-profit organization. Since then, it has released a number of transaction processing and database benchmarks.

The first decision support benchmark was TPC-D, which came out in 1994. It modeled a data warehouse and pushed “IO subsystems to their limit” by storing raw data scaling from 1 gigabyte to 3 terabytes⁷. At that time, query optimizers struggled with the 17 complex, long-running queries included in the benchmark. This changed quickly: The development of aggregate/summary structures (e.g., materialized views or join indices) vastly decreased query runtimes. As a consequence, TPC-D “was effectively broken”⁷.

In April 1999, the TPC released its TPC-H and TPC-R benchmarks to fix the issue with TPC-D. Both were pretty similar, but only TPC-H prevailed. Compared to TPC-D, an additional scale factor (10 terabytes of raw data) and six more queries were introduced. Apart from that, TPC-H was nearly identical to TPC-D. Later versions only added scale factors of up to 30 and 100 terabytes of initial data to extend the lifetime of the benchmark a little further.⁷

In 2000, however, the TPC tasked its Decision Support Working Group with developing a new DS benchmark specification, because the TPC-D-based benchmarks were no longer representative of elaborate, state-of-the-art DSS implementations. Although scheduled for the end of 2001, the development of the TPC-DS specification took more than ten years; it was finally approved in January 2012. Only three years later, in 2015, the TPC released version 2.0 of the benchmark to support big data systems like Apache Hive/Hadoop.⁷

Characteristics of TPC-DS

As we have already learned, TPC-DS models a data warehouse and focuses on online analytical processing (OLAP) tasks. In practice, it models the decision support functions of a retail product supplier so that users can relate intuitively to the components of the benchmark⁷. The imaginary retailer sells its goods via three distribution channels:

  • Store
  • Catalog
  • Internet

Schema

TPC-D and its successors TPC-H & -R were built using a 3rd Normal Form (3NF) schema. Over the years, however, the industry has “expanded towards star schema approaches“ ⁷. As a consequence, TPC-DS implements a multiple snowflake schema, which is a “hybrid approach between a 3NF and a pure star schema“ ⁷.

The following entity-relationship-diagram shows an excerpt of the TPC-DS schema. It contains the two fact tables Store_Sales and Store_Returns as well as the associated dimension tables such as Customer and Store.

The green and blue colored relations visualize the classic star schema approach. The orange-colored relations, on the other hand, represent the normalization of dimension tables introduced with the snowflake schema approach. Normalized dimension tables can not only have relationships with each other; they can also be decoupled from fact tables, as the Income_Band table shows.

ER-Diagram of Store_Sales and Store_Returns in TPC-DS
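To make the multiple snowflake idea more tangible, here is a minimal, hand-written sketch of the normalization chain from the diagram, expressed in Python/SQLite. The column lists are drastically shortened and the types simplified; the full table definitions are given in the specification.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Snowflake chain from the diagram: the fact table (store_sales) references a
# dimension (customer), which references a further normalized dimension
# (household_demographics), which in turn references income_band.
# Columns are drastically shortened compared to the real schema.
con.executescript("""
    CREATE TABLE income_band (
        ib_income_band_sk INTEGER PRIMARY KEY,
        ib_lower_bound    INTEGER,
        ib_upper_bound    INTEGER
    );
    CREATE TABLE household_demographics (
        hd_demo_sk        INTEGER PRIMARY KEY,
        hd_income_band_sk INTEGER REFERENCES income_band (ib_income_band_sk)
    );
    CREATE TABLE customer (
        c_customer_sk      INTEGER PRIMARY KEY,
        c_current_hdemo_sk INTEGER REFERENCES household_demographics (hd_demo_sk)
    );
    CREATE TABLE store_sales (
        ss_ticket_number INTEGER,
        ss_item_sk       INTEGER,
        ss_customer_sk   INTEGER REFERENCES customer (c_customer_sk),
        ss_net_paid      REAL,
        PRIMARY KEY (ss_ticket_number, ss_item_sk)
    );
""")
```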

Compared to TPC-H, the schema of TPC-DS is significantly more complex, as the following table shows:

Data Set

Once the schema for the test database is set up, tables must be populated with data. Similar to TPC-H, TPC-DS also provides the concept of scale factors to control the size of the database. Scale factors are formulated as discrete values and correspond to the approximate data volume in gigabytes. In this context, it is worth noting that benchmark results are bound to scale factors and cannot be compared with results based on other scale factors.

So how does TPC-DS implement scaling? First of all, a distinction between domain and tuple scaling must be made. The latter refers to the number of tuples in fact tables. In TPC-DS, these tuples scale linearly with the scale factor.

Domain scaling, however, refers to the dimension tables, which usually do not scale linearly. Therefore, TPC-DS scales domains sub-linearly and avoids the unrealistic table ratios that exist under TPC-H ⁷. Even with high scale factors, the ratio of customers, items, stores, and other dimensions remains realistic.
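A small sketch may help to illustrate the difference. The base sizes and the sub-linear growth function below are invented for illustration only; the actual row counts per scale factor are defined by the specification and its data generator.

```python
# Illustrative only: the base sizes and the sub-linear growth function are made
# up and are NOT the row counts mandated by the TPC-DS specification.
FACT_ROWS_AT_SF1 = 2_000_000   # hypothetical fact table size at scale factor 1
DIM_ROWS_AT_SF1 = 100_000      # hypothetical dimension table size at scale factor 1

def fact_rows(sf: int) -> int:
    """Tuple scaling: fact tables grow linearly with the scale factor."""
    return FACT_ROWS_AT_SF1 * sf

def dimension_rows(sf: int) -> int:
    """Domain scaling: dimension tables grow sub-linearly (here: square root)."""
    return int(DIM_ROWS_AT_SF1 * sf ** 0.5)

for sf in (1, 100, 10_000):
    print(f"SF {sf:>6}: {fact_rows(sf):>12} fact rows, {dimension_rows(sf):>10} dimension rows")
```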

Another interesting point is how values are generated in TPC-DS. For this purpose, mathematical models based on well-studied distributions, such as the Normal or Poisson distribution, are typically used. These distributions lead to synthetic datasets, which have many advantages in decision support benchmarks, but also a significant disadvantage: They are incompatible with a technique employed in TPC-DS “to make the benchmark less predictable” (bind variables)⁷.

Consequently, TPC-DS synthesizes real-world data for a number of crucial distributions to circumvent this issue. Consider reading the paper The making of TPC-DS ⁷ for more details on how this works.
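As a toy example of distribution-driven value generation (this is not the algorithm the TPC-DS data generator actually uses, just an illustration of the idea), one could populate a quantity and a price column like this:

```python
import numpy as np

rng = np.random.default_rng(seed=2019)  # fixed seed for reproducible data

# Purely illustrative distributions and parameters; the real data generator's
# models are defined by the benchmark, not chosen freely as done here.
quantities = rng.poisson(lam=3, size=10) + 1                            # items per sale
list_prices = rng.normal(loc=50.0, scale=15.0, size=10).clip(min=0.5)   # dollars

print(list(zip(quantities.tolist(), list_prices.round(2).tolist())))
```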

Workload

The simulated workload is the core of any benchmark. In TPC-DS it is split into two components:

  • Query workload
  • Data maintenance workload

Both aspects are vital to a DSS: Data maintenance operations synchronize data between operational systems and the data warehouse, whereas queries convert operational facts into business intelligence.

Query Workload

TPC-DS specifies a set of 99 distinct SQL-99 queries (including OLAP extensions), which are designed to “cover the entire dataset”⁷. Because the schema is divided into

  • a reporting part (Catalog sales channel, 40% of the entire dataset) and
  • an ad-hoc part (Internet & Store sales channels),

the query workload consists of the following four query classes:

  • pure reporting queries
  • pure ad-hoc queries
  • iterative OLAP queries
  • extraction or data mining queries

Reporting queries are “very well known in advance“ so that “clever data placement methods“ and “auxiliary data structures (e.g., materialized views, indexes)“ can be used for optimization ³.
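As a small illustration (SQLite syntax with a hypothetical reporting query; a real system would more likely use a materialized view, which SQLite does not support), a query that is known in advance and targets the catalog part of the schema can be pre-optimized like this:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE catalog_sales (cs_sold_date_sk INTEGER, cs_net_paid REAL)")

# Because the reporting query below is known in advance, an auxiliary structure
# tailored to it may be created up front (here a plain index; materialized views
# or special data placement are other options named in the specification).
con.execute("CREATE INDEX idx_cs_sold_date ON catalog_sales (cs_sold_date_sk)")

# Hypothetical recurring report: catalog revenue per day.
report_query = """
    SELECT cs_sold_date_sk, SUM(cs_net_paid) AS revenue
    FROM catalog_sales
    GROUP BY cs_sold_date_sk
    ORDER BY cs_sold_date_sk
"""
print(con.execute(report_query).fetchall())
```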

Ad-hoc queries are the opposite: They simulate user queries that are not known in advance. For this purpose, explicit optimizations via auxiliary data structures (configured, for example, through DDL, session options, or global configuration parameters) are prohibited for the ad-hoc part of the schema. This restriction was introduced with TPC-H to fix the performance-related issues of TPC-D⁷.

Iterative OLAP and data mining queries reference both parts of the schema, the ad-hoc and the reporting part, and could thus be classified as either ad-hoc or reporting. Therefore, they are also considered “hybrid queries”⁷.

Interestingly, none of the queries are purely static, not even the reporting ones. Instead, all queries are generated from query templates with pseudo-random substitutions to model the dynamic use of DSS. The template model is tightly coupled with the data generator, though, to “guarantee query comparability across substitutions”⁷.
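The following sketch mimics that mechanism with a single, made-up template and Python's random module; the real templates, substitution ranges, and rules are part of the benchmark kit and are processed by its own query generator.

```python
import random

# Made-up template and substitution ranges in the spirit of the TPC-DS query
# templates; the real templates and their allowed values are defined by the
# benchmark kit and kept consistent with the data generator.
TEMPLATE = """
SELECT i_category, SUM(cs_net_paid) AS revenue
FROM catalog_sales
JOIN item     ON cs_item_sk = i_item_sk
JOIN date_dim ON cs_sold_date_sk = d_date_sk
WHERE d_year = {year}
  AND cs_quantity BETWEEN {qty} AND {qty} + 10
GROUP BY i_category
"""

def generate_query(stream_seed: int) -> str:
    """Substitute the placeholders pseudo-randomly, reproducibly per stream."""
    rng = random.Random(stream_seed)
    return TEMPLATE.format(year=rng.randint(1999, 2002), qty=rng.randint(1, 90))

print(generate_query(stream_seed=42))
```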

Data Maintenance Workload

The workload models of TPC's DSS benchmarks before TPC-DS focused almost exclusively on data analysis. Although TPC-H includes some refresh functions, their purpose is limited: They are used to “demonstrate the update functionality for the DBMS” and to assess “an appropriate performance cost to the maintenance of auxiliary data structures”⁸.

In contrast, TPC-DS recognizes the periodic data refresh process as an integral part of the data warehouse lifecycle. Its refresh process is modeled after ETL, a general process for copying data from one or more source systems into a target system. It consists of the following three steps, which are usually supported by most data warehouse systems:

  1. Data Extraction (E)
  2. Data Transformation (T)
  3. Data Load (L)

Prior to benchmarking, flat files are generated to represent data extraction results from potential external systems. The Data Transformation and Data Load steps, however, are part of the Data Maintenance workload, which incorporates update, insert, and delete operations into the test database.
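A minimal sketch of the Transform and Load steps might look as follows. The in-memory “flat file” stands in for the extraction output; the real refresh data additionally drives update and delete operations, which are omitted here.

```python
import csv
import io
import sqlite3

# Tiny stand-in for a flat file produced during the extraction step (E);
# the real files come from the benchmark's data generator.
refresh_file = io.StringIO("1|101|19.99\n2|102|5.49\n")

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE store_sales (ss_ticket_number INTEGER, ss_item_sk INTEGER, ss_net_paid REAL)"
)

# Transform (T): parse the pipe-delimited rows and convert them to typed tuples.
rows = [(int(t), int(i), float(p))
        for t, i, p in csv.reader(refresh_file, delimiter="|")]

# Load (L): incorporate the new facts into the test database.
con.executemany("INSERT INTO store_sales VALUES (?, ?, ?)", rows)
con.commit()
print(con.execute("SELECT COUNT(*) FROM store_sales").fetchone())
```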

Benchmark Execution and Metric

TPC benchmarks provide simple metrics for system performance comparisons. In order to calculate them, execution times have to be measured. In the case of TPC-DS, four time intervals are of interest. They can be understood quickly when looking at the execution order of the benchmark:

During the Load Test phase, everything required for the performance test is set up: creating the tables, populating them with data, and setting up auxiliary data structures, among other things. Afterwards, in the course of the Performance Test, all 99 queries are run twice. The data maintenance operations take place between the two query runs.

Once the required times have been measured, the performance metric, Queries per hour for Decision Support (QphDS@SF), can be calculated as follows:

TPC-DS Performance Metric
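The formula is shown as an image in the original post. Paraphrased from the 2.x specification (the exact definitions of the individual terms, e.g. the weighting of the load time, are spelled out there), it combines the amount of query work with the geometric mean of the four measured intervals:

```latex
\[
\text{QphDS@SF} = \left\lfloor \frac{SF \cdot Q}{\sqrt[4]{T_{PT} \cdot T_{TT} \cdot T_{DM} \cdot T_{LD}}} \right\rfloor
\]
```

Here Q is the total number of queries executed across all streams (Q = S · 99), T_PT and T_TT are the elapsed times of the query runs, T_DM the time of the data maintenance runs, and T_LD a weighted share of the load time, all expressed in hours.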

Worth mentioning: SF stands for the scale factor and S for the number of concurrent query streams per query run. TPC-DS uses these query streams to simulate multiple users operating in parallel. To that end, each query stream executes all 99 TPC-DS queries sequentially, but in a different order. The minimum number of query streams depends on the selected scale factor and is documented in the official specification. In general, the higher the scale factor, the more query streams must be processed.

Conclusion

To sum up, TPC-DS is a sophisticated and comprehensive benchmark for database and big-data systems supporting SQL. It was in development for a very long time and considers decision support systems as a whole: Unlike TPC-H or TPC-D, it takes different workloads as well as vital data maintenance operations into account when calculating its performance metric. That sets TPC-DS far apart from its predecessors.

References

  1. Anon et al. (1985). A measure of transaction processing power.
    Datamation Volume 31 Issue 7 (pp. 112–118).
  2. Gregg, B. (2013). Systems Performance: Enterprise and the Cloud.
    Prentice Hall (p. 601).
  3. Transaction Processing Performance Council. (April 2019).
TPC Benchmark™ DS Standard Specification, Version 2.11.0, tpc.org
  4. Power, D. J. Decision Support Systems Glossary. (2014). http://DSSResources.COM/glossary/
  5. Nižetić, I., Fertalj, K., & Milašinović, B. (2007, January).
    An overview of decision support system concepts.
    In 18th International Conference on Information and Intelligent Systems (pp. 251–256).
  6. Barata M., Bernardino J., Furtado P. (2015)
    An Overview of Decision Support Benchmarks: TPC-DS, TPC-H and SSB.
    In: Rocha A., Correia A., Costanzo S., Reis L. (eds) New Contributions in Information Systems and Technologies. Advances in Intelligent Systems and Computing, vol 353. Springer, Cham
  7. Raghunath Othayoth Nambiar and Meikel Poess. 2006.
    The making of TPC-DS.
In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB '06), Umeshwar Dayal, Kyu-Young Whang, David Lomet, Gustavo Alonso, Guy Lohman, Martin Kersten, Sang K. Cha, and Young-Kuk Kim (Eds.). VLDB Endowment, 1049–1058.
  8. Transaction Processing Performance Council. (December 2019).
TPC Benchmark™ H Standard Specification, Version 2.18.0, tpc.org
