Web Scraping with Power Query: A Quick Guide

RxDono
5 min read · Oct 31, 2023

Web scraping is the process of extracting data from websites through automated means, rather than manually copying and pasting it. This can be achieved using several tools, with Python being a popular choice.

The choice of web scraping tool can be influenced by the amount of data needing extraction. For large-scale scraping needs, Python libraries like BeautifulSoup and Selenium may be more robust options. But Power Query is well-suited for limited, focused scraping tasks.

Power Query

Power Query, a data transformation tool built into Power BI and Excel, provides a user-friendly way to extract data through web scraping. For smaller datasets, Power Query is an accessible web scraping option for Excel and Power BI users compared to programming-based approaches like Python.

Why and When To Use Power Query

  • Power Query’s main advantage is its tight integration into Excel and Power BI. This makes it readily accessible to analysts already using those Microsoft platforms.
  • Additionally, Power Query provides a user-friendly graphical interface and M code language to build scrapers. This allows web scraping without deep coding expertise.

Some ideal use cases for Power Query web scraping include:

  • Extracting tables or datasets under 1,000 rows from websites
  • Occasional scraping needs, such as monthly/quarterly data updates
  • Scrape tasks focused on a single site or section of a site
  • Augmenting existing Excel or Power BI reporting and dashboards

For large-scale web scraping needs, Python libraries may be more appropriate. But for focused scraping tasks to supplement Excel or Power BI work, Power Query offers a quick and easy data extraction option.

Getting Started with Web Scraping in Power Query

To get started with web scraping in Power Query, first launch Power BI or Excel and access the Power Query editor.

We will be scraping iPhone catalogues from Jumia.

1. Open Power BI and navigate to Get Data

2. Click on Web

3. Enter the website URL

4. Click OK

5. Select the data (table) from the Navigator and preview it

6. Click on Transform Data

7. Preview the data in Power Query

This only scraped the first page of the website containing 40 products. While there are situations where scraping just a single page or table is sufficient, there are also scenarios like this one where you need to scrape data spread across multiple pages with pagination.
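At this point, the query behind the preview looks roughly like the following (a simplified sketch with only two columns; the CSS selectors such as .prd, .name and .prc are the ones Power Query detected for Jumia's product markup at the time of writing, and may change if the site is redesigned):

```m
// Simplified sketch of the auto-generated single-page query
let
    // Load the rendered HTML of the first results page
    Source = Web.BrowserContents("https://www.jumia.com.ng/catalog/?q=iphones"),
    // Extract one row per product card (.prd), pulling name and price
    Products = Html.Table(
        Source,
        {{"Name", ".name"}, {"Price", ".prc"}},
        [RowSelector = ".prd"]
    )
in
    Products
```

This returns only the products rendered on that one page, which is why pagination handling is needed next.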

Handling Pagination

Many websites split data across multiple pages. To scrape all the data, we need to iterate through each page. This requires analyzing the page URL structure to see how it handles pagination.

Page 1 URL: https://www.jumia.com.ng/catalog/?q=iphones

Page 2 URL: https://www.jumia.com.ng/catalog/?q=iphones&page=2#catalog-listing

The URL changes with the page number (&page=2#catalog-listing).
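Given that pattern, the URL for any page can be built from its page number. A hypothetical helper (the name BuildPageUrl is ours, not part of the query above) might look like:

```m
// Hypothetical helper: build the catalogue URL for a given page number,
// following the pattern observed above
BuildPageUrl = (page as number) as text =>
    "https://www.jumia.com.ng/catalog/?q=iphones&page="
        & Number.ToText(page) & "#catalog-listing"

// BuildPageUrl(2) returns
// "https://www.jumia.com.ng/catalog/?q=iphones&page=2#catalog-listing"
```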

Creating a Custom Function:

In the Power Query Advanced Editor, we can edit the M query to define a custom function that fetches a single page, then apply it to each page number in turn. The page parameter in the URL is incremented on each call to load the next page, and the data from all pages is then combined into a single dataset.

Combining Data:

To consolidate data from multiple pages into a single, coherent dataset, Power Query provides a straightforward solution: append the table returned for each page to the tables from the previous pages, merging everything into one comprehensive dataset ready for analysis.
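In M, this appending is done with Table.Combine, which takes a list of tables with matching columns and stacks their rows. A minimal illustration (the product names and prices here are made-up placeholders, not scraped values):

```m
// Appending two page tables with Table.Combine
let
    Page1 = Table.FromRecords({[Name = "iPhone 11", Price = "300000"]}),
    Page2 = Table.FromRecords({[Name = "iPhone 12", Price = "450000"]}),
    // Stacks the rows of both tables into one two-row table
    Combined = Table.Combine({Page1, Page2})
in
    Combined
```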

8. Click on Advanced Editor

9. Edit M Query Code — Define the number of pages you want to retrieve. (Note the current query remains unchanged)

10. Initialize an empty table to store data

11. Loop through the pages and append the data

12. Click on Done

let
    // Define the number of pages you want to retrieve
    PageCount = 50, // Change this to the number of pages you want to fetch

    // Initialize an empty table to store the data
    CombinedData = Table.FromRecords({}),

    // Function to fetch data from a single page; the page number is
    // appended to the URL so each call loads a different page
    GetDataFromPage = (Page) =>
        let
            Source = Web.BrowserContents("https://www.jumia.com.ng/catalog/?q=iphones&page=" & Number.ToText(Page) & "#catalog-listing"),
            #"Extracted Table From Html" = Html.Table(Source, {{"Column1", ".name"}, {"Column2", ".prc"}, {"Column3", ".img-c"}, {"Column4", ".add"}, {"Column5", ".rev"}, {"Column6", ".stars"}, {"Column7", ".old"}, {"Column8", "._dsct"}, {"Column9", ".mpg"}}, [RowSelector=".prd"]),
            #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", Currency.Type}, {"Column8", Percentage.Type}, {"Column9", type text}})
        in
            #"Changed Type",

    // Loop through the pages and append the data
    PagesData = List.Transform({1..PageCount}, each GetDataFromPage(_)),
    CombinedDataResult = Table.Combine(PagesData)
in
    CombinedDataResult

Result

13. Click on the View tab and check Column profile

14. Inspect the Applied Steps pane

15. The product count grows from 40 to 1,000 after scraping 50 pages

Video Link: https://youtu.be/8UGO74QjMVE

Note: After scraping your data, you will likely need to apply further steps to clean and standardize the extracted data before loading it into Excel or Power BI. This can involve:

  • Removing unnecessary columns
  • Changing column data types (e.g. text to date)
  • Fixing formatting inconsistencies
  • Filtering to remove unwanted rows
  • Deduplicating the data
  • Standardizing abbreviations or naming conventions

The raw scraped data often requires additional transformation to prepare it for reporting and analysis. Be sure to assess data quality and apply cleaning as needed through the user-friendly Power Query editor. This will result in higher quality data ready for insights.
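The cleaning steps above can be sketched in M as well. The snippet below assumes the combined table from the query earlier (referenced here as CombinedDataResult) with Column1 holding product names and Column2 holding prices as text; the new column names and the exact cleanup rules are illustrative and should be adapted to your data:

```m
// Sketch of typical post-scrape cleanup, assuming the combined table above
let
    Source = CombinedDataResult,
    // Give the scraped columns meaningful names
    Renamed = Table.RenameColumns(Source, {{"Column1", "Product"}, {"Column2", "Price"}}),
    // Strip currency symbols and separators, keeping only digits and dots
    CleanPrice = Table.TransformColumns(Renamed,
        {{"Price", each Text.Select(_, {"0".."9", "."}), type text}}),
    // Convert the cleaned price text to a numeric type
    Typed = Table.TransformColumnTypes(CleanPrice, {{"Price", type number}}),
    // Drop rows with no product name, then remove duplicates
    Filtered = Table.SelectRows(Typed, each [Product] <> null and [Product] <> ""),
    Deduped = Table.Distinct(Filtered)
in
    Deduped
```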

Conclusion

In summary, web scraping with Power Query offers a fast and effective way to gather data from websites for your data analysis projects. Understanding how to handle pagination, extract data, and combine it across multiple pages is the key to unlocking valuable insights. We encourage readers to explore this powerful technique and make it an essential part of their data analysis toolkit.

Additional Tips

When embarking on web scraping adventures with Power Query, consider the following tips:

  • Always respect website terms of service and robots.txt files.
  • Handle data with care and be mindful of privacy and legal considerations.
  • Keep your data extraction workflows organized for easy maintenance.
  • Stay up to date with Power Query updates and enhancements to maximize efficiency.
