CSS Selectors vs. XPath

Which is better?

Dmitry Narizhnykh
Dec 11, 2018 · 3 min read

Each year, more and more companies start using web scraping tools as part of their business intelligence and analytics. This helps them become more competitive and profitable.

You should always check if you are able to extract data from a website before scraping. Here is a checklist containing 5 things to consider before doing web scraping.

So you’ve found a website that you can scrape. Most likely, you will want to extract data from certain HTML elements, or from elements with specific classes or IDs.

Advanced locator strategies such as CSS selectors and XPath can both find almost any HTML element on a web page.

Cascading Style Sheets (CSS) is a style sheet language used for describing the look and formatting of a document written in HTML or XML.

CSS Selectors are patterns used to select the element(s) you want to style.

XPath, the XML path language, is a query language for selecting nodes from an XML document. Locating elements with XPath works well and offers a lot of flexibility.

XPath uses path expressions to navigate through elements and attributes in an XML document.

Data patterns

Let’s look at the following HTML code.

<div><p class="dataflowkit expandable">Some text in Paragraph</p></div>

To match the <p> tag with a CSS selector, we can write:

p.dataflowkit.expandable

The XPath locator looks like this:

//p[@class='dataflowkit expandable']

CSS selectors are better to use when dealing with classes, IDs and tag names. They are shorter and easier to read.
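The two locators above are not strictly interchangeable: the XPath predicate compares the whole class attribute as one string, while the CSS selector only requires that both classes are present. The sketch below illustrates the difference using Python's standard library, which supports a limited subset of XPath. The REORDERED snippet and the match_classes helper are hypothetical, added only for illustration.

```python
import xml.etree.ElementTree as ET

# The snippet from the article; it is well-formed, so the stdlib XML parser can read it.
SNIPPET = '<div><p class="dataflowkit expandable">Some text in Paragraph</p></div>'
# Hypothetical variant: the same two classes in a different order.
REORDERED = '<div><p class="expandable dataflowkit">Some text in Paragraph</p></div>'

XPATH = ".//p[@class='dataflowkit expandable']"


def match_xpath(markup):
    """Return <p> elements whose class attribute equals the exact string."""
    return ET.fromstring(markup).findall(XPATH)


def match_classes(markup, *classes):
    """Mimic p.dataflowkit.expandable: every class must be present, in any order."""
    wanted = set(classes)
    root = ET.fromstring(markup)
    return [p for p in root.iter("p") if wanted <= set(p.get("class", "").split())]


print(len(match_xpath(SNIPPET)))    # 1
print(len(match_xpath(REORDERED)))  # 0
print(len(match_classes(REORDERED, "dataflowkit", "expandable")))  # 1
```

The exact-string XPath stops matching as soon as the class order changes, which is one reason the class-based CSS form tends to be more robust for this kind of query.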

Let’s look at another HTML snippet.

<p> First </p><p> Second </p><p> Third. Some text in Paragraph </p>

The XPath locator that selects the third <p> tag by its content is:

//p[contains(text(), 'Some text in Paragraph')]

How can we achieve the same result with a CSS selector?

It is not possible to match the content inside a <p> tag with a pure CSS selector.

There are no content selectors in the CSS3 specification. We can match on an element, the name of an attribute in the element, and the value of a named attribute in an element. There is nothing for matching content within an element, though.

But what if we need a complex query that takes into account the content of the element we are trying to find? In that case there is no other way than using XPath.

Or

CSS Selectors combined with jQuery can be a good substitute for XPath.

To get the content of the third <p> tag from the last example, we can use the jQuery :contains() selector:

p:contains('Some text in Paragraph')

Alternatively, you can consider Sizzle, a pure-JavaScript CSS selector engine.
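Since content matching is not part of CSS itself, a selector engine has to implement it as an extra filtering step over the selected elements. Here is a minimal Python sketch of that idea, not the way jQuery or Sizzle is actually implemented. It uses only the standard library, whose XPath subset does not include contains(), so the text check is done in plain Python. The wrapping <div> and the select_containing helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The paragraphs from the article, wrapped in a hypothetical <div> so the
# fragment has a single root element and the stdlib XML parser accepts it.
SNIPPET = "<div><p> First </p><p> Second </p><p> Third. Some text in Paragraph </p></div>"


def select_containing(markup, tag, needle):
    """Select elements by tag name, then keep those whose text contains needle."""
    root = ET.fromstring(markup)
    # itertext() also yields text from nested elements, so descendant text counts.
    return [el for el in root.iter(tag) if needle in "".join(el.itertext())]


matches = select_containing(SNIPPET, "p", "Some text in Paragraph")
print([m.text.strip() for m in matches])  # ['Third. Some text in Paragraph']
```

The first step (selecting by tag) is what a CSS selector can express; the second step (filtering by text) is what XPath's contains() or jQuery's :contains() adds on top.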

Brief side-by-side comparison of CSS3 Selectors and XPath Expressions.

The table below is adapted from this article.

Closing Notes

Use CSS Selectors for simple queries based on the attributes of an element. CSS Selectors tend to be faster and more reliable than XPath in most browsers.

Goquery (https://github.com/PuerkitoBio/goquery) is based on an HTML(5) parser and supports CSS-style selectors. Many Go programmers use it to get functionality similar to that of its JavaScript inspiration, jQuery.

For more complex queries, such as those that depend on an element’s content, which CSS Selectors cannot express, use XPath or jQuery selectors.

You may want to check xmlpath (https://godoc.org/gopkg.in/xmlpath.v2) for a pure Go XPath engine, or gokogiri (https://github.com/moovweb/gokogiri) for a Go wrapper over the C libxml library.

Dataflow kit Selectors

Dataflow kit is a web data extraction service that requires no coding skills. We use CSS Selectors + jQuery to specify the HTML elements to scrape data from. In most cases, it is enough to point at and select the needed elements on a loaded page to scrape data.

Useful resources related to XPath and CSS Selectors.

http://cssify.appspot.com/ Online XPath to CSS translator

https://css2xpath.github.io/ CSS to XPath Online converter

ChroPath — Edit, inspect & verify XPath & CSS selectors in devtools panel
