Using SelectorLib to scrape the information you need, fast

Brendan Ferris
Apr 19 · 5 min read

My go-to tool for data collection is the SelectorLib library. It is an easy-to-use, quick alternative to setting up a scraping solution from scratch. There are many ways to use the library, and here I will share my own workflow. I encourage anyone interested to also take a look at the documentation on the SelectorLib website, which does a good job of providing tutorials and guides that spell things out clearly.

To use this module, you need to install the Python package and the Chrome extension.

pip install selectorlib

You can think of the process of using SelectorLib as applying a filter to HTML output. You are picking all the pieces of HTML you want, and discarding the rest. SelectorLib is used to easily build the filter.

The first thing we need to do is identify the information we want to obtain. For this article, let’s assume we want to scrape the BBB complaints for Facebook. Here is the layout of the complaints page:

As we can see, the page is designed using a card layout. Most modern websites are designed in this way, with “cards” that all contain similar information. In the above example, each complaint is contained within its own card. Further down the page, it becomes clear that company responses are also added to the same cards:

The first step is to use SelectorLib to select the outermost card. Right-click on the page, open the developer tools, and click the double arrow symbol.

Then click selectorlib in the dropdown menu.

Then we want to create a new template, name it, then click create template again when prompted.

Then click add:

Now you will be presented with a form, where you should name your selections appropriately. I usually name the outermost selection card. After giving your selection a name, click select element next to the type of selector you want to use. There are two options: CSS Selector and XPath. XPath can be very useful if you need to grab hard-to-reach items, but I usually try CSS selectors first because, in my experience, they break less often than XPath expressions, which can change depending on the layout of the page. You also want to select the “Multiple” radio button, because we want to grab all of the complaints on the page.

After clicking select element, the elements on the page will light up green as you hover your cursor over them. Start clicking on the cards; after you select a few, they should all be highlighted red.

After the item is selected, it will turn red and you will see all of the information within the card in the preview pane. After you select the outermost card with all of the information you want to obtain, click save.

Now that we have selected the outermost cards on the page, we want to start selecting the information within each card that we want to grab. After clicking the plus sign, we follow the same procedure, adding the elements within the card (and naming them appropriately).

After we finish, the page should look something like this:

Now that we have the information we want, we can get the YAML text by clicking on this icon:

We want to copy this YAML output for later use in our Python script. The YAML is the template that SelectorLib will use to select the correct elements on the page.

Now that we have our template, fire up your favorite code editor and import the selectorlib module. You can use Selenium, the requests module, or urllib to grab the page HTML from within your script, and then extract the information. I like to use Selenium because it lets me handle browser interactions such as pagination, or scrolling down to reveal information on a site with infinite scroll. Below is a basic outline of the process. You can wrap other logic around this code to build efficient scraping tools very quickly.

I hope you found this tutorial helpful. SelectorLib is my go-to framework for extracting data from a website, and it has worked in most of my scraping use cases. SelectorLib is not limited to text, either: you can also extract links, images, HTML, and attributes with this tool. Just remember to respect the robots.txt file for any websites you scrape.
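One possible way to check robots.txt before scraping is the standard library's `urllib.robotparser`. The sketch below parses an inline, made-up robots.txt so it runs without network access; the helper name `is_allowed` and the example rules and URLs are assumptions.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules: everything is allowed except the /private/ section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(ROBOTS_TXT, "https://example.com/complaints"))
print(is_allowed(ROBOTS_TXT, "https://example.com/private/page"))
```

For a live site, the same parser can load the file itself with `set_url(...)` followed by `read()` instead of parsing an inline string.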

Happy coding!

💻 Feel free to check out my website.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…

