Microsoft Power BI
Published in

Microsoft Power BI

Demystifying Power BI — Data Management

A summary and overview of data management and supporting components in the Power BI ecosystem

Photo by Miguel Á. Padriñán from Pexels

Data Model

Types of Data Models

  1. Power BI datasets: Power BI datasets are the native and most frequently used data models created using Power BI Desktop, Power BI service, and Power BI datamarts. Under the hood, every Power BI dataset is an Analysis Services tabular model. A tabular model is a data model consisting of one or more tables, relationships amongst those tables, hierarchies, and calculations.
  2. External-hosted models: Power BI supports SQL Server Analysis Services and Azure Analysis Services models for reporting and visualization. These models are hosted on their respective infrastructure and outside the Power BI back-end platform. Additionally, Power BI can connect to multidimensional models (cubes) and tabular models using live connection mode.
  3. Excel workbook model or CSV: Connecting to Excel workbooks and CSV files hosted in a Power BI workspace results in the automatic creation of a model. For example, excel tables and CSV data are imported to create model tables, whereas an Excel workbook data model is transposed to create a Power BI model.
  4. Streaming datasets: Real-time streaming datasets are helpful for streaming data and updating visuals and dashboards in real time. The source of streaming data can be factory sensors, social media sources, service usage metrics, etc. Power BI provides four types of real-time datasets: Push dataset, Streaming dataset, Hybrid dataset, and PubNub streaming dataset.
Power BI: Dataset limits by license type (Image by Author)

Table Storage Modes

  • Import: Tables with import mode are stored in the dataset. DAX queries submitted to the Power BI dataset could be fulfilled only from the stored data.
  • DirectQuery: Tables with DirectQuery aren't stored in the dataset. DAX queries submitted to the Power BI dataset can be fulfilled only by executing on-demand queries to the data source.
  • Dual: Tables with dual mode can act as either cached or not cached, depending on the context of the query submitted to the dataset. Power BI determines the most efficient plan, striving to use stored data whenever possible.

Dataset Modes (aka Model Framework)

  • Import mode: An import model comprises tables with their storage mode property set to Import. Import mode delivers high-speed performance thanks to in-memory caching and querying.
  • DirectQuery mode: A DirectQuery model comprises tables with their storage mode property set to DirectQuery, and all the tables belong to the same source group. DirectQuery models don't import data. Instead, they consist only of metadata defining the model structure. When the model is queried, native queries retrieve data from the underlying data source, for example, SQL. Connecting to other Power BI datasets and Analysis Services models using DirectQuery (in preview) is possible.
  • Composite mode: A composite model comprises tables from more than one source group. The composite models can mix Import and DirectQuery modes or integrate multiple DirectQuery data sources.
  • Import — Represents all import storage mode tables, including calculated tables. There can only be one import source group in a model.
  • DirectQuery — Represents all DirectQuery storage mode tables related to a specific data source. For example, SQL Server and SAP HANA represent two separate DirectQuery source groups in a model.

Dataset Connectivity Modes

  • Live Connection (or Exploration mode): A Power BI report can connect to an Analysis Services model or Power BI datasets using Live connection mode. DAX queries are issued via a live connection to the Power BI dataset or Analysis Services whenever the report is rendered. This mode supports both tabular and multidimensional models. However, no transformations, mash-ups, or modeling capabilities are available with live or exploration connectivity mode.

Data Preparation & Cleansing

  • Integrating data from multiple inconsistent data sources
  • De-normalizing the data to make the dataset ideal for reporting
  • Filtering out unnecessary columns or rows
  • Handling dirty data, incorrect data, or errors
  • Power Query Editor in Power BI Desktop: In Power BI Desktop, Power Query Editor is used to connect, shape, and transform data from one or multiple sources. The data is then loaded into the Power BI dataset, ready for further modeling and reporting.
  • Dataflows: Dataflows provide a flexible and more straightforward solution to create a simple, centralized data repository like a data warehouse, albeit one that is not as sophisticated as a full-fledged data warehouse. Dataflows are helpful in scenarios where the requirement is to query the source system as few times as possible, to reduce the amount of data preparation work being duplicated across many datasets, and to provide a single storage location for integrating data from multiple sources. In addition, tables in a dataflow can't be browsed, queried, or explored without a dataset. Dataflows are available for all users with a Power BI Pro license, with additional features available in the Power BI Premium or Premium Per User (PPU) workspaces.
  • Datamarts: Datamarts is a newer alternative to Power BI Dataflows targeted toward self-service scenarios. Datamarts enables users to collect, transform, store and explore data loaded in a relational and fully managed Azure SQL database. Additionally, Datamarts automatically generate a dataset, which can be used to create Power BI reports and dashboards. Unlike Dataflows, Datamarts provides access to tables through a SQL endpoint. Datamarts require a Power BI Premium subscription or Premium Per User (PPU) license.

Dataflows and Datamarts are created using Power Query Online, a web-based version of the same Power Query engine used in Power BI Desktop.

  • Streaming dataflows: Streaming dataflows allow authors to connect to, ingest, mash up, model, and build reports based on streaming, near real-time data directly in the Power BI service. Authors can mash streaming data with batch data if required. The final entity produced is a dataflow, which can be consumed in real-time to create a highly interactive, near-real-time reporting solution. Streaming dataflows can only be created and run in a workspace part of a Power BI Premium capacity or Premium Per User (PPU) license.

Data Refresh

Types of Data Refresh

  • Full Refresh: In full refresh, the tables are truncated and loaded fully from the data source. That means processing and loading every single row of data. Full refresh is expensive regarding resource consumption, time, and load on the Power BI infrastructure.
  • Incremental refresh: Incremental refresh loads only new and changed data from the source. Incremental refresh helps reduce the time to load larger datasets and the query load on source systems.

Methods for Data Refresh

  • Refresh now: "Refresh now" triggers an on-demand refresh of datasets, dataflows, and datamarts.
  • Scheduled refresh: Schedule refresh is helpful to set up periods at which a dataset, dataflow, or datamart will refresh automatically. The Power BI service starts scheduled refreshes at the specified times on a best-effort basis. The target is to initiate the refresh within 15 minutes of the scheduled time, subject to the load on the capacity used to execute the refresh.
  • Re-uploading a .pbix file: Re-uploading a .pbix file replaces the existing dataset in the service, thus effectively changing the data in the underlying tables. As a result, the refreshed data is not necessarily the latest compared to the sources.
  • REST API: Power BI REST APIs provide a flexible approach to automate the refresh operations for datasets and dataflows. The REST APIs can be executed via external apps or Power BI cmdlets for PowerShell.
  • Refresh using XMLA: Datasets hosted in a Power BI Premium, Premium Per User (PPU), and Power BI Embedded workspace support asynchronous refresh with XMLA endpoints. In addition, XMLA endpoints offer fine-grained control over the dataset refreshes. For example, refresh operations through the XMLA endpoint are not limited to 48 refreshes per day, and the scheduled refresh timeout (five hours) is not imposed.
  • Power Automate: Power Automate provides an out-of-box automated refresh action for datasets allowing datasets to be refreshed based on external conditions and requirements. Power Automate is the perfect low-code alternative for REST APIs.

Supporting Components

  • Data source connectors: Connectors enable Power BI to connect and source data from many different sources. Connectors provide a solution for authenticating and querying data sources. In the case of Microsoft data sources, connectors integrate user authentication and data governance with Azure Active Directory.
  • Data lineage: Data lineage is an administrative tool in the Power BI service that shows visual relationships between all the artifacts in a workspace and all its external dependencies. The lineage can help data authors to debug problems and tune performance by showing the links between reports, dashboards, datasets, and dataflows. Users need a Power BI Pro license and appropriate workspace roles to switch to lineage view.
  • Gateways: The on-premises data gateway acts as a bridge to provide quick and secure data transfer between on-premises data and several Microsoft cloud services such as Power BI, Power Apps, etc. The gateway enables the Power BI service to query locally hosted databases and other data sources and handles encryption, compression, and data transmission. Gateways can be configured for real-time access or scheduled refreshes. Data gateways are available in two different modes, each for use in a different scenario: Standard (aka enterprise) and Personal mode.
  • Deployment pipelines: Deployment pipelines is an administrative tool that provides life-cycle control for managing content. Authors can use the pipelines to control the movement of reports, paginated reports, dashboards, datasets, and dataflows through the development, testing, and production stages. Deployment pipelines are only available with a Power BI Premium subscription.

Conclusion

Next in series

Like this post? Connect with Dhyan

--

--

Microsoft Power BI blog publishing news, tips and tutorials carefully chosen by our editors team. If you would like to contribute to our publication contact me on email powerbi.medium@gmail.com or https://linktr.ee/tomas.kutac

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Dhyanendra Singh Rathore

Analytics Expert. Data and BI Professional. Owner of Everyday BI. Private consultation - dhyan.singh@everydaybi.com