Extract Zip Files from Azure Blob Storage to Power BI Efficiently

Kamilesh Efron
BI3 Technologies
Published in
6 min readJul 1, 2024

Introduction

In the realm of data analysis, the journey from raw data to insightful visualizations is often a complex process. This guide explains this process by exploring how to integrate Power BI with blob storage and demonstrating how to unzip the file using the Power Query Editor. For a hands-on experience, just follow these easy instructions.

Step 1 - Establishing a Secure Connection with Azure Blob Storage:

To initiate a connection with Blob Storage, the ‘Get Data’ action is typically selected first.

Home page of Power BI Desktop

Choose Azure and ‘Azure Blob Storage’ is chosen from the list of options. The connection with the desired Blob is then established using the Blob URL and access token. This procedure facilitates a successful connection with Blob Storage.

Get data page.

Upon successful establishment of the connection, the below image will appear. to proceed with the extraction of files, the ‘Transform Data’ option is typically chosen for further processing.

Data Preview page

Step 2 - Data extraction from zip files:

The Power Query Editor possesses the capability to handle the data unzipping process.

The below image displays the transform data page. On the left corner, there is a section for queries to create a new query, right-click on this section, hover over ‘New Query’, and then select ‘Blank Query’.

Transform data page

The below image displays a successful creation of blank quey.

Blank Query page

Now, copy the code provided at the end, paste it into the blank query, and choose done.

Step 3 - Creation of a Custom Column:

In the current data set, choose add column and then select Invoke Custom Column Function.

Add Column page

When configuring this function, need to set the parameters. For this parameter, select the content column from the exported data. This content column should contain the zip file that needs to be unzip.

Invoke Custom Function page

A new column will be populated by the results of a custom function, In the below image, there is a column named “Files” which was created by the custom function.

Custom column created

In the above image, there is a column named “Files” at the end. When the “list” is clicked, it displays all the files contained in the zip folder.

The image below displays the record of all files, which appears when the “list” in the previous image got selected.

Unzipped files

The below image presents the data that contained within the zip file.

Data from the zip file

Step 4 - Transforming Extracted Data into Insightful Visualizations:

Proceed to load the data into the model, and subsequently generate a visualization to represent the data effectively. This procedure will assist in extracting valuable metrics and key insights from the data.

Load Data
Sample Dashboard

Construct a dashboard similar to the one mentioned above to visualize your data effectively.

CODE:

(ZIPFile) => 
let
ushort = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
uint = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
EDOCfn = BinaryFormat.Record([
ZipContent = BinaryFormat.Binary(Binary.Length(ZIPFile) - 22),
Magic = uint,
DiskNum = ushort,
CDirectoryDiskId = ushort,
CDirectoryRecordCountOnDisk = ushort,
CDirectoryRecordCount = ushort,
SizeOfCentralDirectory = uint,
CentralDirectoryOffset = uint,
CommendLength = ushort
]),
EDOC = EDOCfn(ZIPFile),
BeforeCentralDirectory = BinaryFormat.Binary(EDOC[CentralDirectoryOffset]),
CentralDirectory = BinaryFormat.Length(BinaryFormat.Record(
[
ZipContent = BeforeCentralDirectory,
Items = BinaryFormat.List(BinaryFormat.Record(
[
Magic = uint,
CurrentVersion = ushort,
MinVersion = ushort,
Flags = ushort,
CompressionMethod = ushort,
FileModificationTime = ushort,
FileModificationDate = ushort,
CRC32 = uint,
BinarySize = uint,
FileSize = uint,
FileInfo = BinaryFormat.Choice(
BinaryFormat.Record(
[
Len = ushort,
FieldsLen = ushort,
FileCommentLength = ushort,
Disk = ushort,
InternalFileAttr = ushort,
ExternalAttr = uint,
PosOfFileHeader = uint
]),
(fileInfo) => BinaryFormat.Record(
[
FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
Fields = BinaryFormat.Binary(fileInfo[FieldsLen]),
FileComment = BinaryFormat.Text(fileInfo[FileCommentLength], TextEncoding.Ascii),
Disk = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
InternalFileAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
ExternalAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[InternalFileAttr]),
PosOfFileHeader = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[PosOfFileHeader])
])
)
]),
EDOC[CDirectoryRecordCount]
)
]),
EDOC[CentralDirectoryOffset] + EDOC[SizeOfCentralDirectory]),
Contents = List.Transform(
CentralDirectory(ZIPFile)[Items],
(cdEntry) =>
let
ZipEntry = BinaryFormat.Record(
[
PreviousData = BinaryFormat.Binary(cdEntry[FileInfo][PosOfFileHeader]),
Magic = uint,
ZipVersion = ushort,
ZipFlags = ushort,
CompressionMethod = ushort,
FileModificationTime = ushort,
FileModificationDate = ushort,
CRC32 = uint,
BinarySize = uint,
FileSize = uint,
FileName = BinaryFormat.Choice(
BinaryFormat.Record(
[
Len = ushort,
FieldsLen = ushort
]),
(fileInfo) => BinaryFormat.Record(
[
FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
Fields = BinaryFormat.Binary(fileInfo[FieldsLen])
])
),
FileContent = BinaryFormat.Transform(
BinaryFormat.Binary(cdEntry[BinarySize]),
each Binary.Decompress(_, Compression.Deflate)
)
])(ZIPFile)
in
[FileName=ZipEntry[FileName][FileName], Content=ZipEntry[FileContent]]
)
in
Contents

Code Description:

  1. ZIP File Extraction: The code is designed to extract the contents of a ZIP file.
  2. Binary Formats Definition: It defines binary formats for 16-bit and 32-bit unsigned integers.
  3. End of Central Directory (EDOC) Record: The code constructs a record of the EDOC of the ZIP file.
  4. Central Directory Reading: It reads the central directory of the ZIP file, which contains metadata for each file in the ZIP.
  5. File Header Reading: For each file, the code reads the file header and extracts the file name and compressed content.
  6. Content Decompression: The compressed content is then decompressed using the Deflate algorithm.
  7. List of Records Return: The function ultimately returns a list of records, each containing the file name and decompressed content of each file in the ZIP file.
  8. Efficient Data Extraction: This process allows for efficient extraction and conversion of data from zipped files

Conclusion:

Using the Power Query Editor to extract and unzip files from Blob storage is like having a super-efficient data translator. It takes messy data and turns it into a neat, organized format that Power BI can easily understand. This helps us to create beautiful charts and graphs that tell a clear story. So, by using this approach, we’re basically making our data work smarter, not harder!

About Us

Bi3 has been recognized for being one of the fastest-growing companies in Australia. Our team has delivered substantial and complex projects for some of the largest organizations around the globe, and we’re quickly building a brand that is well-known for superior delivery.

Website: https://bi3technologies.com/

Follow us on,
LinkedIn: https://www.linkedin.com/company/bi3technologies
Instagram: https://www.instagram.com/bi3technologies/
Twitter: https://twitter.com/Bi3Technologies

--

--