Power Query, Power BI, DAX and Relationships

After years of abusing VBA, array formulae, structured references in tables and generally wringing excel’s neck, I discovered Power BI. And after a struggle, it was good.

OK, where do I start?

I found it very difficult to find entry points in the documentation to gain the fundamental understanding of the concepts, the various languages, the array of products and their various and layered user interfaces.

Get & Transform in Excel

Yes, there is excel of course and it has Data Analysis ToolPak, Tables, Pivot Tables, Power Maps, Power Pivot, Manage Data Model, Relationships, Get & Transform, Connections… and probably other stuff that I can’t bother to try to remember or don’t know about. How do I make sense of all these options?

I had used the Tables in excel (and ran into functional and performance bugs in the past, hopefully all fixed now) but I wanted to build data structures and queries with them like you can in Access for example, so I gravitated towards Get & Transform. And it was exactly what I was hoping for, the way I always thought Tables should be. It took a while to get the concept though, because I had to disentangle it from all of the other elements mentioned above.

For example, do I set up relationships with joins? Do I add the tables to the data model and then create relationships? Apparently not. Are they connections or queries that I’m working with? Not sure. So, OK, where do I start? What’s the metaphor, what’s it based on? Absolutely no help from MS on this that I could find. What search terms do I use? Is it Power Query? is it Get & Transform? No idea. So, yeh, it took a while but eventually I got there and realized that I could create a table and then From Table in the first instance and Show Queries thereafter, in the Get & Transform section of the Data tab, is the magic portal I was after. When you create a query from an existing Table, and save it, you can then see that query in a panel to the right after you click on Show Queries and there we have it: the UI for building and managing your data structure. Great.

The Query Editor

The only problem is that none of the terminology was familiar and the metaphor is completely different from Access or SQL. I couldn’t see options to do familiar things like create joins for example, but OK, it makes sense after a while as you start cobbling things together with the UI. You kind of realize that a Merge Query, followed by clicking on an icon on the resulting column that is added to select which fields to bring in from the table on the right side of the join is the method here.

This is what you see after Merge Queries

As you go along, building stuff with the UI, you start to see the Applied Steps area filling up with “transformation steps” that you are applying to the table. This makes sense.

You can also see things popping into a formula bar much like the one in excel. Trying to figure out what the formula syntax for the formula bar is a confusing experience when you are starting out (one of those many things that is simple in hindsight but difficult when you are struggling with the cognitive load of unfamiliarity) but turn out the formula language is like clauses from MS’ mashup language called M and we can learn about it at the Power Query M function reference.

It turns out that the formula bar in the UI along with the icons you click on are acting like a VBA macro recorder and creating code behind the scenes that defines the data structure you are creating and you can see this by clicking on the Advanced Editor option.

The M code

If your doing fairly simple mashups, you probably don’t need to know about this but I found that it helped me a lot to understand what is going on with the transformation steps. But, syntax is really badly documented and the idiom, to me at least, frustrating and strange, with a implicit structures that deliver a thing called row context.

Now, I was very resistant to learning another language that had no basis in the wider CS space, but then I discovered that it does have lineage from F#.NET so, that calmed me down. Turns out this is Functional Programing.

I have been doing a lot of programing lately in JavaScript and C#.NET and I thought I had a bit of an idea about “pure functions” but I hadn’t worked with F# and I had no idea it was so radically different. I was used to objects and functions and mutability in fact. I had no idea how evil I was and I was confused by statements like let and in and also each. Eventually I figured out that the place to go to understand the basics was Expressions, values, and let expression

Expressions, values, and let expression

The way to read the structure

let
Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
Value = Source{0}
in
Value

is let Source and Value be defined as immutable values in the expression after the in statement. If this is the entire code then Value is available in the global scope and can be used by other queries in the workbook.

You can use the various Power Query M functions on the RHS of the assignments in the let clause, but you can only set their value once and then you can use the these values in a single expression in the in clause.

There are no loops, because a counter variable would need to be mutable and there is no conditional flow control as such with If and switch statements for example. You can however have conditional structures for determining value like…

Value = if boolean_expression then #”Value 1” else #”Value 2”,

This also demonstrates the convention for referencing variables with spaces in them which is nice for making them more readable in the UI.

Power Query M functions

The functions are grouped as methods on objects like Table, List etc. and they usually include a transformation function as one argument in their call signature. This transformation function is passed the current element of the set that is being iterated over, for example a row for a Table function, and this is often shortened using the each statement, which is syntactic sugar for a function definition. Like JavaScript, the function is a first class object and the definition can be in-lined in the argument list. This is all explained quite well in the documentation.

Separation of Concerns

When I was searching for help on the various scenarios I ran into while learning, I kept seeing recommended solutions using this thing called DAX (Data Analysis Expressions). I had the idea that this was an alternative M and that it was the old way of doing things based on the Excel Database formulae. Frustratingly, there was also never any clue as to where you physically type the fucking things. So, I avoided this and focused on doing everything in M because it has that lineage to .NET.

As it turns out, DAX is the formula language for a thing called measures which are a bit like calculated columns in Access, but not really. They are like functions that can be used to add a derived column to a table set. But where?? Let me try to explain…

Model View Controller

The query editor that I described above is creating what we would normally call the Data Model, or just Model for short. This is essentially a database cube that is comprised of a structure of shaped and linked tables. Another essential part of a modern BI system is the View. This is the visualisation of the data where it all comes to life as graphs and maps and things like that. There is also a third component commonly referred to as the Controller and this element is concerned with managing changes in the Model and View in response to user inputs.

I’ll focus here on the Model and View

Model

I’ve already covered the Model above. This is built by establishing connections to data sources (Tables in Excel files in the simplest case) and mashing the tables into the shape you need to relate, group lookup and transform, to create the BI cube that you need. This is primarily done using M, using the transformation steps that I introduced above in conjunction with the Accessing data functions that are also part of the Power Query platform.

View

The View is Where Power Maps, Power Pivot, Power View and the Report View of Power BI come in and this is where measures and DAX and Relationships live. After linking and shaping up your data Model in Power Query, you might need to add some processing that is part of the View requirements, for example summary calculations that depend on user-selected grouping or filtering and propagation of user inputs to related view elements.

Controller

This part I am not yet clear on: it could be that what I have called the Model should more appropriately be called the View Model (as in MVVM used in WPF applications). The main point though is to think about arc

Decoupling

Regardless of what they are called, coupling between these two spaces should be as loose as possible so that they can be developed and maintained separately.