The Many-to-Many Relationship and its Many Problems Part 1

How to Avoid and Solve these Problems

DOFEM (Oluwadamisi Olujobi)
5 min readJul 15, 2024

Unlike some of the other topics I have written coining up with a topic for this week's post was relatively easy as it naturally came with a good sense of humour, only that this time around, it is data-related.

The many-to-many relationship refers to a scenario where multiple records in one table are associated with multiple records in another table. A characteristic of a many-to-many relationship is that it involves duplications; records in both tables can appear multiple times, and there are no unique identifiers that prevent these repetitions.

Model and Relationship

Let’s dive into our model. The model includes three dimension tables (Products, Customers, and Territories) and two fact tables (Sales and Budget). Each dimension table is connected to the Sales table, creating a one-to-many relationship between each dimension table and the Sales table. These relationships are all active.

Model and Relationship

Now, let us create a matrix to compare Actual Sales with Budgeted Sales.

Actual vs Budgeted Sales

We observe that there are repeated numbers for Budgeted Sales, indicating an issue with the relationships. Unfortunately, there is no relationship between the Budget table and the Sales table (or any other table in the model).

No relationship between Sales and Budget table

At this point, one might ask if it is possible to connect the Budget and Sales tables directly. While it is technically possible, this approach is generally not recommended because it would create a many-to-many relationship, which can lead to significant complications and inefficiencies in the model.

Many-to-Many Relationship and its Many Problems

Problem 1- Repitions / multiple records between tables

Connecting the Sales and Budget table via Sales[Order Date] and Budget[Date] creates a very obvious problem; many-to-many relationship

Budget connected to Sales

We get a warning from Power BI informing us about the relationship. The many-to-many relationship is due to the dates, as both tables have repeated dates (neither table has unique dates; both tables have dates repeated multiple times in the rows). Cross-filter direction is “Both” which means that both tables can filter each other.

Warning from Power BI

Now we have different values for Budgeted sales but we still have problems because Budgeted Sales has missing records. Since “OrderDate” (from the Sales table) is filtering the Budget table, it has also filtered out the records for budget sales, hence we have some months missing.

If I choose to switch the filter to Sales by using the “Date” field (from Budget ) to filter Sales, we would still have missing records because the result only shows the Actual Sales for the 1st of each month and ignores all the other days in the sales table.

Missing Values for Actual Sales

Problem 2- Indirect Relationship between Tables

When creating relationships, the rule of thumb is that you can only have one active relationship between tables at a time. I want to get Budgeted Sales by Category and I add Category to a slicer. Unfortunately, all the figures are the same and some months are missing. Having similar values means there is no relationship between the Product table and the Budget table

This would force me to create a separate table for Category because we cannot create a relationship between Product table and Budget table. A separate table is created using the DAX expression below which creates a unique list for the category field in the Product table.

Category = ALL(Products[Category])

Our model will be updated as below;

Category table added to the model

Creating an active direct relationship between the Budget and Products tables would cause conflicts with other active relationships. We cannot create a direct active relationship between the Budget and Product tables because Budget already has an active relationship with Sales. Adding another active relationship between Budget and Product would lead to multiple indirect relationships in our model.

Since Sales is connected to Product, Customer, and Territories, and is also connected to Budget, this setup creates an indirect connection between Budget and these dimension tables. Hence, Product, Customer, Territories and Category indirectly affect Budget because Budget is connected to Sales. What can be more complicated than this?

Direct and indirect relationship

Problem 3- Temporary Solutions that don't solve the problem

To solve the other two problems explained above we can either set the cross filter direction for Budget and Sales to “Single”, delete the relationship between Budget and Sales, or deactivate the indirect relationships. However, these solutions are only temporary.

If we choose to adjust the existing connection between Budget and Sales by setting the cross-filter direction to “Single,” data filters will flow in only one direction between Budget and Sales. For example, if you set the direction from Budget to Sales, filtering Budget will affect Sales, but filtering Sales will not affect Budget. Consequently, the tables directly connected to Sales would no longer be connected to Budget. This takes off the indirect relationship however, this solution is not ideal for best practice.

If we choose to delete the relationship between Budget and Sales we would be back to problem 1.

If we choose to deactivate any of the indirect relationships, we would be removing key and important relationships genuinely needed for the model. What if you need the deactivated relationship for some other calculations later? Then activating that relationship would only bring us back to problem 1

To avoid all these problems, the rule of thumb is not to connect or create a relationship between fact tables. But then, how do we include the Budget table in the model without going through the many-to-many relationship and its related problems?

Let’s connect LinkedIn Youtube

--

--

DOFEM (Oluwadamisi Olujobi)

Welcome to DOFEM! Join me to dive into data analysis with Excel and Power BI. It goes beyond skills; it's about turning raw data into meaningful insights.