Python tools for webscraping

  • Introduction to webscraping
  • Webscraping is the process of collecting or extracting data from web pages automatically. Nowadays it is a very active field that shares goals with the semantic web, natural language processing, artificial intelligence and human-computer interaction.
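As a minimal illustration of what extracting data "automatically" means, the sketch below downloads a page and collects the targets of its links. It uses only Python's standard library (`html.parser` and `urllib.request`), not any of the tools discussed in this article. The class and function names are hypothetical, and `scrape_links` needs network access, so the demo at the bottom parses an inline HTML string instead.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names;
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Parse an HTML string and return the link targets it contains."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def scrape_links(url):
    """Download a page and extract its links (requires network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html)


if __name__ == "__main__":
    sample = '<p><a href="/docs">Docs</a> and <a href="https://example.com">Example</a></p>'
    print(extract_links(sample))  # ['/docs', 'https://example.com']
```

Hand-written parsers like this one quickly become tedious for real pages, which is the gap the libraries below aim to fill.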
  • Python tools for webscraping
  • Some of the most powerful data extraction tools can be found in the Python ecosystem; among them we highlight Beautiful Soup, Webscraping, PyQuery and Scrapy.
  • Comparison between webscraping tools
  • The tools mentioned above will be compared, showing the advantages and disadvantages of each one and highlighting the mechanisms each offers for data extraction, such as regular expressions, CSS selectors and XPath expressions.
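To make the difference between extraction mechanisms concrete, the sketch below pulls the same prices out of a hypothetical product list twice: once with a regular expression and once with an XPath expression. It assumes well-formed markup, because it relies on the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath and rejects malformed HTML. CSS selectors are not available in the standard library; they come from third-party tools such as Beautiful Soup's `select()`, PyQuery, or Scrapy's `css()` selectors.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product list. ElementTree needs valid XML,
# so real-world (often malformed) HTML would need a more lenient parser.
PAGE = """
<html>
  <body>
    <ul id="products">
      <li class="item"><span class="name">Lamp</span><span class="price">19.99</span></li>
      <li class="item"><span class="name">Desk</span><span class="price">120.00</span></li>
    </ul>
  </body>
</html>
"""


def prices_with_regex(html):
    """Regular expression: no parsing step, but tied to the exact markup text."""
    return re.findall(r'<span class="price">([^<]+)</span>', html)


def prices_with_xpath(html):
    """XPath (ElementTree's subset): navigates the parsed document tree."""
    root = ET.fromstring(html.strip())
    return [span.text for span in root.findall(".//li[@class='item']/span[@class='price']")]


if __name__ == "__main__":
    print(prices_with_regex(PAGE))  # ['19.99', '120.00']
    print(prices_with_xpath(PAGE))  # ['19.99', '120.00']
```

The regex version breaks as soon as attribute order, quoting or whitespace inside the tag changes, while the XPath version depends only on the element structure and attribute values.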
  • Project example with scrapy
  • Scrapy is a framework written in Python for automated data extraction that can be used for a wide range of applications, such as data mining and information processing. When using Scrapy we have to create a project, and each project consists of:
  • 1. Items: we define the elements to be extracted.
     2. Spiders: the heart of the project; here we define the data extraction procedure.
     3. Pipelines: the processing steps applied to the extracted items, such as data validation and cleansing of HTML code.
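The division of labour between items, spiders and pipelines can be sketched in plain Python. This is not Scrapy's API: the classes and functions below are hypothetical stand-ins, the `process_item` name only echoes Scrapy's pipeline method (whose real signature also receives the spider), and the regex-based "spider" replaces Scrapy's request handling and selectors so the example runs with the standard library alone.

```python
import re
from dataclasses import dataclass


@dataclass
class ProductItem:
    """Item: declares the fields to extract (hypothetical fields)."""
    name: str
    price: str


class DropItem(Exception):
    """Raised by a pipeline to discard an item (name mirrors Scrapy's)."""


def product_spider(html):
    """Spider: the extraction procedure; yields one item per product block."""
    for block in re.findall(r'<li class="item">(.*?)</li>', html, re.S):
        name = re.search(r'<span class="name">(.*?)</span>', block, re.S)
        price = re.search(r'<span class="price">(.*?)</span>', block, re.S)
        yield ProductItem(
            name=name.group(1) if name else "",
            price=price.group(1) if price else "",
        )


class CleanupPipeline:
    """Pipeline: cleanses leftover HTML tags and surrounding whitespace."""
    TAG = re.compile(r"<[^>]+>")

    def process_item(self, item):
        item.name = self.TAG.sub("", item.name).strip()
        item.price = self.TAG.sub("", item.price).strip()
        return item


class ValidationPipeline:
    """Pipeline: validates items, dropping those without a numeric price."""

    def process_item(self, item):
        try:
            float(item.price)
        except ValueError:
            raise DropItem(f"invalid price for {item.name!r}: {item.price!r}") from None
        return item


def run(html, pipelines):
    """Send every item produced by the spider through each pipeline in order."""
    results = []
    for item in product_spider(html):
        try:
            for pipeline in pipelines:
                item = pipeline.process_item(item)
        except DropItem:
            continue  # a pipeline rejected the item; skip it
        results.append(item)
    return results


if __name__ == "__main__":
    page = """<ul>
    <li class="item"><span class="name"> <b>Lamp</b> </span><span class="price">19.99</span></li>
    <li class="item"><span class="name">Desk</span><span class="price">n/a</span></li>
    </ul>"""
    print(run(page, [CleanupPipeline(), ValidationPipeline()]))
    # [ProductItem(name='Lamp', price='19.99')]
```

Keeping cleansing and validation in separate pipelines means each step stays small and can be reordered or reused across spiders, which is the same motivation behind Scrapy's pipeline design.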
Author: José Manuel Ortega