Applying a non-standard setup of attribute element values in Power BI through three methods: (1) calculation groups, (2) Power Query generated tables, and (3) a maintainable master file with a Databricks script

Mateusz Mossakowski
Published in Microsoft Power BI
Jun 15, 2024

A few weeks ago, my colleagues approached me with a request to customize and restructure a report in a specific way. They wanted to display only a subset of attribute elements while still including the hidden elements’ values in the totals. To illustrate this, let’s consider the example below.

Suppose we have a report that includes the following attributes: Product Brand, Product Category, and Product ID. However, my colleagues wanted to see (1) the Clone Furniture product brand total without it being filtered by the Benches product category, (2) the Benches product category total without it being filtered by the 32263 product ID, and (3) the 32263 product ID value itself.

Relying on out-of-the-box filtering alone will not get you to the desired outcome; instead, it will likely end in a disappointing, frustrating facepalm moment.

To address this challenge, I developed a few solutions. Let me guide you through them; once you are done reading, you can choose your preferred one. However, I kindly request that you stay focused until the end 😉

Calculation group

One solution is to encode the logic in a calculation group. While this has the benefit of preserving the original look and feel of the Power BI matrix, including the ability to expand and collapse rows, it is also the most difficult to maintain and the most likely to result in mistakes. Additionally, using the ISINSCOPE function multiple times within a calculation item may hurt performance quite badly.

------------------------------------
-- Calculation Group: 'non_standard'
------------------------------------
CALCULATIONGROUP non_standard[Name]

    CALCULATIONITEM "product_dim_non_standard" =
        IF(
            -- (1) Clone Furniture brand total, not sliced by category
            ( SELECTEDVALUE( product_dim[brand] ) = "Clone Furniture"
              && NOT ISINSCOPE( product_dim[category] ) )

            -- (2) Benches category total, not sliced by product ID
            || ( SELECTEDVALUE( product_dim[brand] ) = "Clone Furniture"
                 && SELECTEDVALUE( product_dim[category] ) = "Benches"
                 && NOT ISINSCOPE( product_dim[product_id] ) )

            -- (3) the 32263 product ID row itself
            || ( SELECTEDVALUE( product_dim[product_id] ) = 32263 ),
            SELECTEDMEASURE( )
        )

        Ordinal = 0

Additionally, you would need to manually disable top-level subtotals to avoid having an empty Total row in your matrix.

Power Query table

An alternative approach is to create a non-standard setup table and establish a many-to-one relationship with bidirectional cross-filtering between this table and the original dimension table in your semantic model. The setup table holds three columns: element name, element sort, and the original dimension key column (the product_id column in our case).

By utilizing this non-standard setup table and establishing the appropriate relationship, you can leverage a standard Power BI table visual to display the desired information.

A first option is to build this custom table manually in Power Query. This approach is highly flexible, but it still leaves something to be desired in terms of maintainability.

semantic model diagram
let
    EmptySpace = "‎ ",
    Indentation = Text.Repeat(EmptySpace, 5),
    VisualIndicator = "⇢" & EmptySpace,

    Source = product_dim,

    // element1
    Filtering1 = Table.SelectRows(
        Source,
        each ([brand] = "Clone Furniture")
    ),
    ColumnCleanup1 = Table.SelectColumns(Filtering1, {"brand", "product_id"}),
    Rename1 = Table.RenameColumns(ColumnCleanup1, {{"brand", "element_name"}}),
    AddSort1 = Table.AddColumn(Rename1, "element_sort", each 100),
    ChangeType1 = Table.TransformColumnTypes(AddSort1, {{"element_sort", Int64.Type}}),
    Final1 = ChangeType1,

    // element2
    Filtering2 = Table.SelectRows(
        Source,
        each ([brand] = "Clone Furniture") and ([category] = "Benches")
    ),
    ColumnCleanup2 = Table.SelectColumns(Filtering2, {"category", "product_id"}),
    Rename2 = Table.RenameColumns(ColumnCleanup2, {{"category", "element_name"}}),
    AddSort2 = Table.AddColumn(Rename2, "element_sort", each 101),
    ChangeType2 = Table.TransformColumnTypes(AddSort2, {{"element_sort", Int64.Type}}),
    Final2 = Table.TransformColumns(ChangeType2, {{"element_name", each Indentation & VisualIndicator & _, type text}}),

    // element3
    Filtering3 = Table.SelectRows(
        Source,
        each ([brand] = "Clone Furniture") and ([category] = "Benches") and ([product_id] = 32263)
    ),
    DuplicateColumn3 = Table.DuplicateColumn(Filtering3, "product_id", "element_name"),
    ColumnCleanup3 = Table.SelectColumns(DuplicateColumn3, {"element_name", "product_id"}),
    AddSort3 = Table.AddColumn(ColumnCleanup3, "element_sort", each 102),
    ChangeType3 = Table.TransformColumnTypes(AddSort3, {{"element_sort", Int64.Type}}),
    Final3 = Table.TransformColumns(ChangeType3, {{"element_name", each Text.Repeat(Indentation, 2) & VisualIndicator & Number.ToText(_), type text}}),

    // union of all elements
    Result = Table.Combine({Final1, Final2, Final3})
in
    Result
product_dim_non_standard table

The concept behind this approach is to define each element’s name (such as Clone Furniture brand, Benches category, and 32263 product ID), assign the corresponding set of product IDs to establish a correct relationship with the dimension table, and include a proper element sort column to present the data in a meaningful and hierarchical way.

To enhance readability, we also add indentation and a visual indicator (⇢) to mark the different levels within the hierarchy.
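For illustration, here is roughly what the resulting product_dim_non_standard table contains; apart from 32263, the product IDs below are hypothetical, and the indentation is approximated with plain spaces:

# Illustrative rows of product_dim_non_standard (product IDs other than 32263
# are hypothetical). Each element repeats its name for every product_id it
# covers, which is what makes the many-to-one relationship to product_dim work.
sample_rows = [
    # element_name            element_sort  product_id
    ("Clone Furniture",        100,          10001),   # every Clone Furniture product
    ("Clone Furniture",        100,          32263),
    ("     ⇢ Benches",         101,          20002),   # every Clone Furniture bench
    ("     ⇢ Benches",         101,          32263),
    ("          ⇢ 32263",      102,          32263),   # the single product itself
]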

Master file and Databricks script

Lastly, I would like to describe a more maintainable approach that relies on a master file. In this approach, stakeholders can define custom setups for different dimensions, such as the product and store dimensions, by specifying each custom item as an intersection of attributes and their corresponding values. This makes the solution flexible and easy to maintain, since stakeholders simply update the master file whenever the custom setups need to change.

non_standard_setup master table
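Since everything downstream is driven by this master table, here is a minimal sketch of how it might be populated for the product example, assuming the column names used by the scripts below (schema, table, item_nbr, primary_key, attribute, attribute_value); the my_schema value and the exact rows are illustrative assumptions only:

# Hypothetical population of the non_standard_setup master table for the
# product example. One item_nbr per element; each row pins one attribute
# to a value, and the deepest intersection defines the displayed row.
master_rows = [
    # schema       table          item_nbr  primary_key   attribute     attribute_value
    ("my_schema", "product_dim",  1,        "product_id", "brand",      "Clone Furniture"),
    ("my_schema", "product_dim",  2,        "product_id", "brand",      "Clone Furniture"),
    ("my_schema", "product_dim",  2,        "product_id", "category",   "Benches"),
    ("my_schema", "product_dim",  3,        "product_id", "brand",      "Clone Furniture"),
    ("my_schema", "product_dim",  3,        "product_id", "category",   "Benches"),
    ("my_schema", "product_dim",  3,        "product_id", "product_id", "32263"),
]
columns = ["schema", "table", "item_nbr", "primary_key", "attribute", "attribute_value"]
spark.createDataFrame(master_rows, columns).show(truncate=False)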

Step 1️⃣

Firstly, we determine the hierarchy of the columns of each dimension table referenced in the master file by analyzing the distinct count of elements within those columns. The assumption here is that the smaller the distinct count of elements within a column, the higher that attribute sits in the hierarchy.

# Collect the distinct (schema, table, attribute) combinations
# from the setup table as a list of Row objects
cfg_schema = "z_mm"
cfg_table = "non_standard_setup"

tables_and_attributes = (
    spark.table(f"{cfg_schema}.{cfg_table}")
    .select("schema", "table", "attribute")
    .distinct()
)

rows = tables_and_attributes.collect()

df_attributes = None

for row in rows:
    schema = row["schema"]
    table = row["table"]
    table_name = f"{schema}.{table}"
    attribute = row["attribute"]

    # Count the distinct values of this attribute in its dimension table
    query = f"""
        select
            '{schema}' as `schema`
            , '{table}' as `table`
            , '{attribute}' as column_name
            , count(distinct `{attribute}`) as distinct_count
        from
            {table_name}
    """
    df_output = spark.sql(query)

    # Append df_output to df_attributes
    if df_attributes is None:
        df_attributes = df_output
    else:
        df_attributes = df_attributes.union(df_output)

df_attributes.createOrReplaceTempView("attributes_w_distinct_count")

Step 2️⃣

Next, we go through the master table entries to generate element names with appropriate indentation and visual indicators for the lower-level attributes. We also derive a proper element sort and a flag indicating which elements should be displayed in the final visual (the is_displayed column).

empty_character = '\u00A0'
visual_indicator = '⇢'

df_attributes_expanded = spark.sql(f"""
    with distinct_count_added as
    (
        select
            nss.*
            , awdc.distinct_count
        from
            {cfg_schema}.{cfg_table} nss
        join
            attributes_w_distinct_count awdc
        on
            nss.`schema` = awdc.`schema`
            and nss.`table` = awdc.`table`
            and nss.attribute = awdc.column_name
    )
    , rank_column_added as
    (
        select
            `schema`
            , `table`
            , item_nbr
            , primary_key
            , attribute
            , attribute_value
            , rank() over
              (
                  partition by `schema`, `table`, item_nbr
                  order by distinct_count
              ) as rank
        from
            distinct_count_added
    )
    select
        `schema`
        , `table`
        , item_nbr
        , primary_key
        , attribute
        , attribute_value
        , case
              when rank = max(rank) over (partition by `schema`, `table`, item_nbr) then 1
              else 0
          end as is_displayed
        , item_nbr * 1000 + rank as element_sort
        , concat(
              repeat('{empty_character}', (rank - 1) * 5)
              , case when rank > 1 then concat('{visual_indicator}', '{empty_character}') else '' end
              , attribute_value
          ) as element_name
    from
        rank_column_added
""")

Step 3️⃣

The next step is to build the WHERE conditions that will be used to retrieve the right set of dimension key column values from each dimension table.

from pyspark.sql import Window
from pyspark.sql.functions import collect_list, concat, concat_ws, expr, lit

windowSpec = Window.partitionBy("schema", "table", "item_nbr").orderBy("attribute")

# Build a single "attribute = 'value'" condition per row
df_attributes_w_condition = df_attributes_expanded.withColumn(
    "condition",
    concat_ws(
        " and ",
        concat(expr("attribute"), lit(" = '"), expr("attribute_value"), lit("'")),
    ),
)

# Collect all conditions of a given item into one combined WHERE clause
df_attributes_w_condition = df_attributes_w_condition.withColumn(
    "conditions",
    concat_ws(
        " and ",
        collect_list("condition").over(
            windowSpec.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
        ),
    ),
)

df_attributes_w_condition = df_attributes_w_condition.withColumn(
    "table_full_name",
    concat(expr("`schema`"), lit("."), expr("`table`")),
)

# Keep only the displayed rows and drop the helper columns
df_attributes_w_condition = df_attributes_w_condition.where("is_displayed = 1").drop(
    "condition",
    "item_nbr",
    "attribute",
    "attribute_value",
)

df_attributes_w_condition.createOrReplaceTempView("attributes_w_condition")
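For the product element from our example, the resulting conditions value would read something like brand = 'Clone Furniture' and category = 'Benches' and product_id = '32263', ready to be plugged into a WHERE clause in the next step.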

Step 4️⃣

In the final step, a SQL query is generated for each element of the custom hierarchy setup. The queries incorporate the appropriate WHERE clauses and are unioned per dimension table defined in the master file. The output is written to one delta table per dimension table (e.g., product_dim_non_standard and store_dim_non_standard).

The resulting output is the same as with the Power Query approach, but it is driven entirely by the master file, which eliminates manual adjustments and reduces the potential for human error.

product_dim_non_standard table
# Collect the rows of the condition table as a list of Row objects
rows = df_attributes_w_condition.collect()

df_attributes = None

for row in rows:
    table_full_name = row["table_full_name"]
    element_name = row["element_name"]
    conditions = row["conditions"]
    primary_key = row["primary_key"]
    element_sort = row["element_sort"]

    # Retrieve the dimension key values behind each element
    query = f"""
        select
            '{table_full_name}' as table_full_name
            , '{element_name}' as element_name
            , {element_sort} as element_sort
            , {primary_key} as `id`
        from
            {table_full_name}
        where
            {conditions}
    """
    df_output = spark.sql(query)

    # Append df_output to df_attributes
    if df_attributes is None:
        df_attributes = df_output
    else:
        df_attributes = df_attributes.union(df_output)

# Split the DataFrame into multiple DataFrames based on table_full_name values
distinct_table_full_name_values = (
    df_attributes.select("table_full_name").distinct().rdd.flatMap(lambda x: x).collect()
)

dataframes = {}
for value in distinct_table_full_name_values:
    dataframes[value] = df_attributes.filter(df_attributes.table_full_name == value)

# Write the individual DataFrames to delta tables
for key, dataframe in dataframes.items():
    non_standard_table = f"{key}_non_standard"
    dataframe = dataframe.drop("table_full_name")
    write_dataframe_to_delta_table(dataframe, non_standard_table)

And here is the definition of the write_dataframe_to_delta_table function:

def write_dataframe_to_delta_table(dataframe, table_full_name):
    # Drop the table if it exists
    drop_query = f"DROP TABLE IF EXISTS {table_full_name}"
    spark.sql(drop_query)
    # Write the DataFrame to the table using Delta format
    dataframe.write.format("delta").saveAsTable(table_full_name)

Don’t forget to subscribe to

👉 Power BI Publication

👉 Power BI Newsletter

and join our Power BI community

👉 Power BI Masterclass
