Introducing Fast Copy: Speed Up Data Management with Fabric Dataflows Gen2

Discover the new Fast Copy feature in Fabric Dataflows Gen2, designed to accelerate your data processing within Azure. Learn how Fast Copy simplifies data ingestion and transformation, supports large datasets, and integrates with Azure’s data ecosystem.

Rui Carvalho
The Data Therapy
5 min read · May 9, 2024


Improving the efficiency and speed of data processing has been one of the big priorities in Microsoft Fabric. One of the most recent features to reach Preview is Fast Copy in Dataflows Gen2, which is all about those qualities. Designed to enhance data ingestion capabilities, it lets Dataflows Gen2 transform and load data at high scale. In this story, let's dive into Fast Copy's features, benefits, and configuration.

What is Fast Copy?

Fast Copy is a new public-preview feature in Dataflows Gen2, aimed at making the data ingestion process more efficient. Traditionally, loading large volumes of data with Dataflows required multiple steps: data pipelines transferred the data into a staging area, and Dataflows Gen2 then processed it. Fast Copy simplifies this by consolidating data ingestion and transformation, eliminating the need for separate data pipelines for these tasks. It's like having the Copy Activity from Data Pipelines inside Dataflows… but with some limitations.

Capabilities and Advantages

Fast Copy allows direct ingestion of data from various Azure data sources such as ADLS Gen2, Blob Storage, Azure SQL Database, Lakehouse, PostgreSQL, and on-premises sources.

For files, it currently supports types such as CSV and Parquet. One of the key benefits of Fast Copy is its ability to handle large datasets efficiently; it requires a minimum of 100 MB for files and one million rows for Azure SQL databases.

You don't need to use Fast Copy for every single data copy, since the problematic loads were always the large datasets.
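As a quick mental model of those minimums, here is a small sketch that decides whether a given source is likely to trigger Fast Copy. The thresholds are the ones mentioned above (100 MB for files, one million rows for Azure SQL), which are preview values and may change; the function name is made up for illustration.

```python
# Sketch: would Fast Copy likely engage for this source?
# Thresholds taken from the preview minimums described above.

FILE_MIN_BYTES = 100 * 1024 * 1024   # 100 MB minimum for CSV/Parquet files
SQL_MIN_ROWS = 1_000_000             # 1 million rows minimum for Azure SQL

def likely_fast_copy(source_type: str, size_bytes: int = 0, row_count: int = 0) -> bool:
    """Return True if the source meets the documented Fast Copy minimums."""
    if source_type in ("csv", "parquet"):
        return size_bytes >= FILE_MIN_BYTES
    if source_type == "azure_sql":
        return row_count >= SQL_MIN_ROWS
    return False  # other sources: check the current docs

print(likely_fast_copy("csv", size_bytes=250 * 1024 * 1024))  # True
print(likely_fast_copy("azure_sql", row_count=500_000))       # False
```

Below the minimums, the dataflow simply falls back to the regular evaluation engine, so there is no harm in leaving the setting enabled.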

Configuration and Use

Configuring Fast Copy is straightforward. The first thing you need to do is create a Dataflow Gen2 in Fabric by following the steps below after entering app.fabric.microsoft.com.

1. Click on the Data Factory experience and choose Dataflow Gen2 from the item options.
Microsoft Fabric — Data Factory

2. Then let's create a connection to an Azure SQL Database and import a table.

Microsoft Fabric — DataflowGen2 — new connection
Microsoft Fabric — DataflowGen2 — Load table

3. Enable Fast Copy by opening the Options menu and selecting Scale -> Allow use of fast copy connectors.

Microsoft Fabric — DataflowGen2 — Allow fast copy connector

4. Let's apply this setting to a specific query to ensure that Fast Copy is used.

Microsoft Fabric — DataflowGen2 — Set Fast Copy

5. Now set a destination, publish, and run the dataflow.

Microsoft Fabric — DataflowGen2

6. Finally, the Refresh History interface lets you verify whether Fast Copy was used by examining the processing logs.

Microsoft Fabric — DataflowGen2 — Fast Copy

Comparing both methods: is it really faster?

I have a table in this same database with 1 million rows.

Let's run a Dataflow to import this data into a Lakehouse, with and without Fast Copy.

Hmm… here Fast Copy was slower than the traditional way. Let's run another test with 2 million rows.

Now Fast Copy is faster, but only by a small margin. I'm still not convinced, so let's test it with 10 million rows.

It looks like the advantages of Fast Copy grow as the volume of data increases.

Test it in your environment!
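If you don't have a large table handy, you can generate one. The sketch below writes a synthetic CSV you can land in storage and load with and without Fast Copy, comparing refresh durations. The file name and columns are hypothetical; scale `rows` up until the printed size passes the 100 MB file minimum.

```python
# Sketch for reproducing the benchmark above: generate a synthetic CSV,
# then load it via Dataflows Gen2 with and without Fast Copy enabled.
import csv
import os

def make_test_csv(path: str, rows: int) -> int:
    """Write `rows` synthetic rows and return the file size in bytes."""
    pad = "x" * 80  # fixed-width filler so each row carries some bulk
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "amount"])
        for i in range(rows):
            writer.writerow([i, f"name_{i}_{pad}", round(i * 0.37 % 1000, 2)])
    return os.path.getsize(path)

size = make_test_csv("fastcopy_test.csv", 10_000)
print(f"{size / (1024 * 1024):.2f} MB")
```

Upload the file to ADLS Gen2 or Blob Storage, point a Dataflow Gen2 at it, and check the Refresh History to see whether Fast Copy kicked in.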

Limitations

While Fast Copy is a great tool, it is currently limited to a few data source types and supports only basic transformations such as:

  • File combining.
  • Column selection.
  • Renaming.
  • Data type changes.

Wrapping up

Fast Copy in Dataflows Gen2 represents a significant leap towards more integrated and efficient data management solutions. It offers a promising future for businesses looking to leverage big data for analytics and insights with Fabric by reducing the complexity and steps involved in data ingestion and transformation.

For those managing large volumes of data within the Azure ecosystem, using Fast Copy could be a game-changer.

Did you enjoy it? For just $5 a month, become a Medium Member and enjoy limitless access to every masterpiece on Medium. By subscribing via my page, you not only contribute to my work but also play a crucial role in enhancing its quality. Your support means the world! 😊
