Streamline Your Power BI Workflows with Version Control, Dataflows, and Deployment Pipelines

Igor Plotnikov · Published in Microsoft Power BI · 7 min read · Aug 25, 2023

In the world of business intelligence, getting version control right has always been a big deal for Power BI users. In the past, trying to handle versioning for Power BI reports was a bit of a headache, with workarounds that often weren’t as smooth as we’d hoped.

But then, on June 13, 2023, something pretty cool happened. Power BI Desktop rolled out a new way of doing things that changed how projects are made, shared, and stored. They started storing the different parts of projects — like reports and dataset definitions — as simple text files in easy-to-follow folders. This change also brought in Git versioning, a system that helps keep track of changes, making project management and teamwork way more efficient.

For folks who love bringing software engineering smarts into data work, this update was a real game-changer. The buzz around this feature’s release showed just how important smooth version control is in the world of Power BI fans. And hey, other tools like Looker from Google had already shown how Git can supercharge BI.

But even with this cool new feature, some old-school stuff still hangs around in Power BI. One example is how Power BI deals with database sources — it works, but there’s room to make it even better.

In this post, I’m going to give you some down-to-earth advice on bringing Git versioning into your Power BI setup and enabling CI/CD that is as smooth as the platform currently allows. We’ll walk through how to get started and, more importantly, how to build a way of working that lines up with software engineering best practices.

Start versioning your Power BI reports

Presently, you can save your work as a Power BI Project (PBIP). Within this project format, the definitions of report and dataset artifacts are stored as separate plain text files, organized in a simple, easy-to-follow folder structure.

Just enable the .pbip save option in the settings of the Power BI Desktop app

From now on, whether you’re starting a new project or working from an existing Power BI Desktop file (.pbix), you can save your work as a Power BI project file (.pbip). When you do, Power BI Desktop splits your report and dataset artifacts into folders, each containing plain text files that describe their contents.
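
To make that concrete, here is roughly what the layout looks like for a project saved as Sales.pbip (the project name is made up, and the exact set of files can vary between Power BI Desktop versions):

Sales.pbip                 <- pointer file you open in Power BI Desktop
Sales.Report/
    definition.pbir        <- report definition, references the dataset
    report.json            <- the report layout as plain text
Sales.Dataset/
    definition.pbidataset  <- dataset definition
    model.bim              <- the data model (TMSL) as plain text
    .pbi/                  <- local-only settings and cache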

You could open the saved files in a text editor, but I don’t see much point in studying them that way. Where I do see the value is in:

  • Simultaneous work on different parts of the project
  • Visual representation of git diff, i.e., changes that were made to the project by you or your teammate
  • The ability to easily revert to a previous version of the report
  • Easy deployment and backup
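
Before you commit anything, it’s worth keeping the local-only files out of the repository. A minimal .gitignore, based on the files Power BI Desktop treats as user- and machine-specific (double-check the list against your own project), might be:

# Power BI local settings and cached data, not meant for version control
**/.pbi/localSettings.json
**/.pbi/cache.abf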

Microsoft has so far integrated only its own Azure DevOps repositories, so if you’re a GitHub fan, I have some not-so-great news for you: there’s no way to deploy code from GitHub to the Power BI service just yet.

Why you should consider making use of Power BI dataflows

This is what you see in the Power BI service when choosing the dataflow as a source

Another feature I suggest you use alongside versioning is Power BI dataflows. A dataflow is a cloud-based data transformation service that runs independently of any Power BI dataset or solution. It leverages the Power Query engine, with Power Query Online as its UI, and it neither stores its data in a Power BI dataset nor depends on one.

The main benefit I see here is reducing data source complexity: instead of connecting to multiple data source objects and pulling all of them into the dependency chain, you can transform and prepare the data within the dataflow. This allows easier iteration and testing, as well as reusable transformation logic that can be shared by many datasets and reports inside Power BI. Moreover, multiple team members can contribute to building and refining dataflows without interfering with each other’s work.
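
To sketch what lives inside a dataflow, here is what one of its queries might look like in Power Query M (the server, database, and table names are invented for illustration):

let
    // The connection happens inside the dataflow, in the service;
    // reports that consume the dataflow never see these details
    Source = PostgreSQL.Database("dev-db.internal.example.com", "analytics"),
    Customers = Source{[Schema = "public", Item = "customers"]}[Data],
    // Reusable preparation logic shared by every dataset downstream
    ActiveOnly = Table.SelectRows(Customers, each [is_active] = true),
    Renamed = Table.RenameColumns(ActiveOnly, {{"created_at", "CreatedAt"}})
in
    Renamed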

Aside from those, there is one thing that is crucial in terms of versioning: we move connection parameters such as the server name and database name, which we don’t want to version, out of the dataset and into a dataflow. Server names often reveal critical information about the infrastructure and architecture of your data systems. If this information falls into the wrong hands, it could be exploited to target vulnerabilities and gain unauthorized access to your systems.

At its core, a dataflow is a JSON definition that is stored only in the Power BI service and is not part of your Git project. This way we separate the connection configuration from everything else, which is the preferable way of versioning your project.
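
For the curious, that JSON follows the Common Data Model folder format (model.json). A heavily trimmed sketch, with field names reproduced from memory, so treat the details as approximate:

{
  "name": "dwf-postgres",
  "entities": [
    { "$type": "LocalEntity", "name": "customers", "attributes": [ ... ] }
  ],
  "pbi:mashup": {
    "document": "section Section1; shared customers = let Source = ... in ...;"
  }
}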

Instead of a database server, we connect to a dataflow which contains all the metadata on the database behind it

So with dataflows, we connect from Power BI Desktop to a dataflow instead of a database: the dataflow acts as a proxy component between the dataset and the database. A nice bonus is that there’s no more need for SSH port forwarding and switching server names when your database instance lives behind an SSH host.

Can you deploy the same dataflows across multiple environments? Absolutely, but…

Power BI lets you create and manage dataflows in the Power BI service (they’re authored there in Power Query Online rather than in Desktop). Once your dataflow lives in a workspace, you can promote it across different environments using Power BI deployment pipelines.

Deployment pipelines are a feature in Power BI that allows you to automate the promotion of content (including dataflows) from one environment to another. This is particularly useful when you have a development environment, a testing environment, and a production environment, for example. With deployment pipelines, you can ensure that the same dataflow logic is used across all these environments, maintaining consistency and reducing the risk of errors.

The Power BI service will tell you if you have different versions of a dataflow across environments:

dwf-postgres is an example of a Power BI dataflow here

Interestingly, Microsoft has implemented its own version control in the Power BI service, but only for dataflows. Thus, any changes made to a dataflow should be reviewed in the service before deployment. Below is how those changes are represented:

The only way of reviewing and fixing the changes in Power BI dataflows

OK, everything sounds great so far, but here is another challenge you will definitely face when using multiple environments: the same dataflow is deployed with different IDs across environments, which makes it difficult to reference from within a report. One workaround will bail us out here.

There is a neat method, which I borrowed from Johnny Winter, that connects to a dataflow by name rather than by ID, so the same connection string works across all environments. You pass it three parameters: a workspace name, a dataflow name, and an entity name:

let
    fnConnectToDataflow =
        // Resolve a dataflow entity by display names instead of IDs, so the
        // same code works in every environment
        (Workspace as text, Dataflow as text, Entity as text) =>
            let
                Source = PowerPlatform.Dataflows(null),
                Workspaces = Source{[Id = "Workspaces"]}[Data],
                SelectedWorkspace = Workspaces{[workspaceName = Workspace]}[Data],
                SelectedDataflow = SelectedWorkspace{[dataflowName = Dataflow]}[Data],
                SelectedEntity = SelectedDataflow{[entity = Entity, version = ""]}[Data]
            in
                SelectedEntity
in
    fnConnectToDataflow
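
Invoking it then looks something like this (the workspace, dataflow, and entity names below are placeholders):

// In a dataset query, replacing a direct database connection
Source = fnConnectToDataflow("Analytics [Dev]", "dwf-postgres", "customers")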

Using this technique, if for whatever reason you need to delete and replace a dataflow, then provided you stick to the same naming convention, your datasets will continue to connect to the new version.

Finally, set parameter rules for different environments

For each environment, we can set up the respective parameters and switch them depending on the deployment environment (this is where we store and substitute the database server name). Keep in mind that a parameter must be of the Text data type for deployment pipelines to manage it; otherwise, it won’t work.
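
Under the hood, a Power Query parameter is just a query with metadata attached. A minimal text parameter might look like this in M (the name and value are illustrative):

// ServerHostname parameter; Type = "Text" is what lets deployment
// pipeline rules override the value per environment
"dev-db.internal.example.com" meta [
    IsParameterQuery = true,
    Type = "Text",
    IsParameterQueryRequired = true
]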

Parameters can be used to control the connections between datasets or dataflows and the Power BI items that they depend on. When a parameter controls the connection, auto-binding after deployment doesn’t take place, even when the connection includes a parameter that applies to the dataset’s or dataflow’s ID, or the workspace ID. In such cases, you’ll need to rebind the items after the deployment by changing the parameter value, or by using parameter rules.

That is how I set up server hostnames for different environments

It’s awesome that Microsoft rolled out a version control feature for Power BI Desktop. It shows they’re listening to what users need. Still, like any new thing, there’s room for improvement.

So, while working with Power BI might have its twists and turns, remember that there are solutions out there that can make it a lot less bumpy. Embracing dataflows, getting friendly with version control, and keeping an eye on future improvements can turn these challenges into real victories. With these tools in hand, Power BI becomes a real powerhouse for making sense of data and driving smart decisions.


Igor Plotnikov, BI/Data Engineer. This blog is about my day-to-day working challenges and how I approach them