Analytics Vidhya

Analytics Vidhya is a community of Generative AI and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Introduction to XPath.

6 min read · May 9, 2021


Photo by Lili Popper on Unsplash

Similar to regular expressions, XPath can be thought of as a language for finding information in an XML/HTML document. It has many uses, but personally I use it most for developing web crawlers and grabbing information from websites. We’re going to go over the basics of the language and how to grab the content you need from a document. To follow along with this tutorial, you can use the console in Chrome Developer Tools (any browser's developer tools will do) or your favorite web scraping framework. If you use the developer tools, navigate to the console and wrap every expression in $x(), as in $x('YOUR XPATH EXPRESSION'):

Querying XPath in Chrome Developer Tools

Different web scraping frameworks have different syntax. For this tutorial, I will use the Scrapy shell. You can install Scrapy by running:

pip install scrapy

After you install the framework, run scrapy shell 'www.website.com' to open an interactive shell that lets you query a specific page with XPath using the syntax response.xpath('XPATH EXPRESSION').extract().
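If you would like to try an expression offline before pointing a shell at a live site, here is a minimal sketch that uses Python's standard library instead of Scrapy. Note the swap: xml.etree.ElementTree supports only a limited subset of XPath and returns elements rather than strings, so this approximates what response.xpath(...).extract() gives you but is not the same thing. The sample markup and the query_text helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample document (not taken from any real site).
SAMPLE = """
<html>
  <body>
    <h1>Introduction to XPath</h1>
    <ul>
      <li class="topic">Nodes</li>
      <li class="topic">Predicates</li>
      <li class="other">Axes</li>
    </ul>
  </body>
</html>
""".strip()


def query_text(markup, expression):
    """Return the text of every element matched by an ElementTree path.

    Illustrative helper only: ElementTree understands a subset of XPath,
    so expressions must start relative to the root (for example './/li').
    """
    root = ET.fromstring(markup)
    return [element.text for element in root.findall(expression)]


# './/h1' matches every <h1> anywhere below the root element.
heading = query_text(SAMPLE, ".//h1")

# A predicate in square brackets filters the matches by attribute value.
topics = query_text(SAMPLE, ".//li[@class='topic']")

print(heading)
print(topics)
```

Keep in mind that ElementTree requires well-formed XML, and real-world HTML often is not, which is one reason a scraping framework such as Scrapy is the more practical choice for actual websites.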



Published in Analytics Vidhya



Written by Brendan Ferris

Turning over rocks and seeing what crawls out.
