Learn XPath in 5 minutes — A tutorial for beginners
What is XPath?
Suppose you want to find a document on your laptop. You can open your file explorer and type in a file path to navigate to it, for example, D://data_science/notes/learning.docx.
XPath serves a similar purpose. It uses path expressions, which resemble file paths, to search for, select, or navigate to specific element(s) in an XML document, for example, //data_science/notes[@title='learning.docx'].
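To make the analogy concrete, here is a minimal sketch that runs such a path expression with Python's standard-library xml.etree.ElementTree. The XML document is hypothetical, made up to mirror the folder layout above. ElementTree supports only a subset of XPath and evaluates paths relative to the element you call it on, so the expression starts with a dot.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document that mirrors the folder layout above.
XML = """
<drive>
  <data_science>
    <notes title="learning.docx">XPath basics</notes>
    <notes title="todo.docx">Practice more</notes>
  </data_science>
</drive>
"""

root = ET.fromstring(XML)

# ".//" searches at any depth below the root element;
# [@title='...'] keeps only the <notes> elements with that attribute value.
matches = root.findall(".//data_science/notes[@title='learning.docx']")

for node in matches:
    print(node.tag, node.attrib["title"], node.text)
```

Only the first notes element matches, because the condition in square brackets filters on its title attribute.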
Quick Start Example
One of the most common use cases for XPath is web scraping. Web pages, most of which are written in HTML, contain many elements. When crawlers extract useful information, they have to locate the correct elements in these pages. XPath is a useful element locator.
Say we want to crawl NBA team statistics from two tables on NBA.com:
Let’s assume these two tables have the following HTML structure:
Below are examples of using XPath to locate element(s) in this web page:
- //* selects all elements in the web page.
- /div/div/table selects both tables using an absolute path from the root. [line 3–16, 20–33]
- /div/div[@class='points-per-game']/table selects only the points per game table. [line 3–16]
- //div[@class='points-per-game']/table also selects the points per game table, but // matches div elements at any depth in the document that meet the selection condition, instead of walking down from the root. [line 3–16]
- //div[@class='points-per-game']/table | //div[@class='rebounds-per-game']/table selects both tables using the | (union) operator. [line 3–16, 20–33]
- //a[contains(., 'Milwaukee Bucks')] selects all the a elements whose text contains Milwaukee Bucks. [line 13, 22]
- //a[@class='team-value' and . > 90] selects all the team value elements whose value is greater than 90, combining two conditions with the and operator. [line 6, 10, 14]
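If you want to try a few of these expressions yourself, the sketch below uses Python's standard-library xml.etree.ElementTree. The markup, class names such as team-name, and all numbers are made up for illustration. It is not the real NBA.com page, so the bracketed line numbers above do not apply to it. ElementTree implements only a subset of XPath: contains(), numeric comparisons, and the | operator are not available, so those selections are approximated with plain Python here. A full XPath 1.0 engine would accept the expressions as written in the list.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup with made-up values (not the real NBA.com page).
HTML = """
<div>
  <div class="points-per-game">
    <table>
      <tr><td><a class="team-name">Milwaukee Bucks</a></td><td><a class="team-value">118.0</a></td></tr>
      <tr><td><a class="team-name">Team B</a></td><td><a class="team-value">95.0</a></td></tr>
    </table>
  </div>
  <div class="rebounds-per-game">
    <table>
      <tr><td><a class="team-name">Milwaukee Bucks</a></td><td><a class="team-value">51.0</a></td></tr>
      <tr><td><a class="team-name">Team C</a></td><td><a class="team-value">48.0</a></td></tr>
    </table>
  </div>
</div>
"""

root = ET.fromstring(HTML)  # root is the outer <div>

# Similar to //* : every element below the root (ElementTree excludes the root itself).
all_elements = root.findall(".//*")

# Equivalent of /div/div/table : walk down one level at a time from the root <div>.
both_tables = root.findall("./div/table")

# Equivalent of //div[@class='points-per-game']/table : match the div at any depth.
ppg_table = root.findall(".//div[@class='points-per-game']/table")

# contains() is outside ElementTree's XPath subset, so filter the text in Python.
bucks_links = [a for a in root.iter("a") if "Milwaukee Bucks" in (a.text or "")]

# The attribute test runs as XPath; the numeric comparison is done in Python.
high_values = [
    a for a in root.findall(".//a[@class='team-value']") if float(a.text) > 90
]

print(len(both_tables), len(ppg_table), len(bucks_links))
print([a.text for a in high_values])
```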
Summary
Based on the previous example, here are a few starter tips for practicing XPath:
- Use / to walk down from the root one level at a time, and // to match elements at any depth in the document.
- Use square brackets [ ] to apply selection conditions, and the @ sign to refer to an attribute inside them.
- Use the 'and' and 'or' operators to connect multiple selection conditions.
- Use the | operator to combine the results of several expressions into one selection of elements (or nodes, to use the proper term).
- XPath also supports many functions, such as contains.
If you are interested, refer to the XPath documentation for the full description of its syntax and functions.
Have fun exploring!
Useful resource
To practice: XPath Playground