How to use Excel to query big data? MDX for Kylin! (Part-I)

Xiaoxiang Yu
Kyligence
Published in
6 min readApr 7, 2022

Abstract

During the Kylin community discussion at the beginning of this year, we talked about the positioning of multidimensional databases and the idea of building a Kylin-based business semantic layer. After some development efforts, we are delighted to announce the beta release of the MDX for Kylin , an MDX query engine for Apache Kylin to allow Kylin users to use Excel for data analysis.

Target audiences

  • Kylin users who are not familiar with MDX
  • Data engineers who are interested in building a metrics platform based on Kylin
  • Data analysts who are interested in massive data analysis with Excel

What you will learn

  • Basic concepts of MDX and MDX for Kylin
  • Quickstart tutorial for MDX for Kylin
  • Demonstration of how to use MDX for Kylin to define complex business metrics

Why Kylin need MDX?

Multidimensional database and business semantic layer

The primary difference between multidimensional databases and relational databases lies in business semantics. As the must-have skill of data analysts, SQL (Structured Query Language) is extremely expressive, but if we are talking in the context of “every professional will be an analyst”, it is still too complex for non-technical users. For them, data lakes and data warehouses are like dark rooms that hold a huge amount of data; they cannot see, understand, or use the data for lack of the fundamental knowledge of databases and SQL syntax.

How to make data lakes and data warehouses “easy” for a non-technical user to use? One solution is to introduce a more user-friendly “relational data model — multidimensional data model”. If relational models are to provide a technique-oriented description of the data, multidimensional models intend to provide a business-oriented description of the data. In multidimensional databases, measures correspond to the business metrics that everyone is familiar with. Measures provide the analytic perspective to check and compare these business metrics. For example, it is like comparing the KPIs between this month and last month, or the performance of different business departments. By mapping the relational model to a multidimensional model, we add a business semantic layer on top of the technical data, thus helping non-technical users understand, explore, and use data.

In Kylin Roadmap, support to multidimensional query languages (such as MDX and DAX) is an important part, as we aim to enhance the business semantic capability of Kylin as a multi-dimensional database. Users can use MDX to convert the Kylin data model into business-friendly language, so they can perform multidimensional analysis with Excel, Tableau and other BI tools and understand the business values from their data.

Build a business metrics platform with MDX

When building complex business metrics, MDX provides the following advantages if compared to SQL:

  1. Better support for complex analysis scenarios, such as semi-accumulation, many-to-many, and time window analysis;
  2. More BI support: “Kylin + MDX” can be exposed as relational database tables through the SQL interface, or XMLA-compliant data source with business semantics. It allows MDX queries and integration with Excel and other BI tools through the XMLA protocol;
  3. Flexible defining of MDX semantic model based on Kylin data model, it will convert the underlying data structure into a business-friendly language and add business value to data. With MDX model, we offer users a unified business semantic layer, they no longer need to worry about the underlying technology or implementation complexity when analyzing data. For more information, see The future of Apache Kylin, SSAS Disadvantages: Opportunities for SSAS in the Cloud Era, and Semantic Layer: The BI Trend You Don’t Want to Miss.

MDX Overview

What is MDX?

MDX (Multi Dimensional eXpression) is a query language for OLAP Cube. It was first introduced by Microsoft in 1997 as part of the OLEDB for OLAP specification and later integrated into SSAS. Since then, it has been widely adopted by OLAP databases.

MDX is similar to SQL in many ways and also offers some SQL features though maybe not as intuitive or effective as SQL. For example, you can include SELECT, FROM, or WHERE clause in your MDX queries. But it is not an extension of SQL. You can use these keywords to dig into specific parts of the Cube.

MDX query syntax are as follows:

select <axis_specification>[, <axis_specification>]from <cube_specification>where <slicer_specification>

Key concepts of MDX

Please learn some basic MDX concepts before we continue.

  1. Dimensions, Levels, Members, and Measures
  2. Cell, Tuple, and Set
  3. Query Axis and Slicer Axis

For detailed information about these concepts, see MDX Syntax Elements (MDX).

Comparison of MDX and SQL

The query objects are different. MDX is to query the cube, with data already joined and aggregated, so users needn’t specify the join relation when querying. SQL is to query a table with detailed records. Users need to specify the join relation among the tables when querying.

Another difference is the query result. SQL returns a 2d data subset, while MDX returns the cubes.

MDX for Kylin Overview

What is MDX for Kylin?

MDX for Kylin is an MDX query engine which developed based on Mondrian, contributed by Kyligence, and with Apache Kylin as data source. Like Microsoft SSAS, MDX for Kylin can also integrate many data analysis tools, including Microsoft Excel and Tableau, to provide a better user experience for big data analysis.

How to create business metrics

Atomic metrics and business metrics

In Kylin Cube, we will perform certain aggregate calculations (such as Sum/Max/Min/Count/Count Distinct, exclude TopN) on a single column when creating measures, and the measures created are called atomic metrics.

In actual business scenarios, we can run complex calculations based on these atomic metrics to create composite metrics with business implications, and these metrics are called business metrics.

Hierarchy, Calculated Measure, and NamedSet

Hierarchy: Hierarchies are collections of dimension-based hierarchies that can empower data analysts with advanced analytical capabilities. For example, you can create a time hierarchy with year, quarter, month, week, and day as its hierarchy. Then data analysts can do a YOY analysis on the sales volume, or dig into the “Quarter > Month > Week > Day” hierarchy for more detailed analysis.

Calculated Measure: Calculated Measure are metrics/indexes acquired by running composite computing on the atomic metrics with MDX expressions. We mainly use calculated measures to create business metrics.

NamedSet: NamedSet is for the scenario when you need to reuse a set of members in MDX for Kylin. A NamedSet uses specified expressions to get the set members. It can be placed directly on the axis or used in expressions of Calculated Measure for or other Namedset.

Dataset as semantic model

In Kylin 4, we create a data model based on the relationship among tables, and define different dimensions and measures on the Cube. These measures are atomic metrics.

In MDX for Kylin, we join related Kylin Cubes to create datasets and create business metrics based on atomic metrics.

Process of create a business metrics by Apache Kylin

Process of calculating

The client(BI/Excel) sends an MDX query to MDX for Kylin, which will then be parsed into SQL and sent to Kylin. After that, Kylin will answer the SQL query based on the pre-computed Cuboid and return the result to MDX for Kylin. Then, MDX for Kylin will do some derived metrics calculation, and return the multidimensional data results to the client.

Process of answering a MDX query

Summary

MDX for Kylin supports MDX interface enhancing the semantic capability and creates a unified data analysis and management user experience. Now users can better leverage the value of data. The figure below shows the process of how raw data is processed into business metrics.

How to create a metrics platform by Apache Kylin

Technical advantages of MDX for Kylin

If compared with other open-source MDX query engines, MDX for Kylin has the following advantages:

  • Better support to BIs (Excel/Tableau/Power BI, etc.)and compliance with XMLA protocol
  • Optimize the MDX Query for BIs
  • Accelerate MDX queries with Kylin’s pre-computing capability
  • Easy-to-use interface for metrics definition and management

Contact us

If you want to see the detail of how to use MDX for Kylin? Please check next article.

If you want to check the official documentation of MDX for Kylin, please check the User Manual . For developers who want to contribute, please check Github.

Feel free to leave your suggestion, ask a question or report a bug by referring link .

This blog was published at Kylin website .

--

--

Xiaoxiang Yu
Kyligence

PMC member of Apache Kylin. Software engineer who interested in big data, algorithms etc. Engineering Bachelor of HIT (graduated at 2017/07).