Web Scraping vs. APIs, explained simply and briefly

Katherine Strickland
2 min read · Jun 4, 2020


Learning techy stuff can be daunting, especially with so many complex terms, acronyms, and difficult concepts out there. This explanation will hopefully help break that information barrier for at least two things: Web Scraping and APIs.

What is web scraping?

Though it may sound like a sinister term, web scraping, also known as data scraping, is simply gathering data from a website. Instead of manually looking through multiple web pages and entering data by hand, web scraping tools speed up the process by automating it. This data is often organized into a data frame, which can then be used for analysis.

Web scraping has a wide range of use cases. Companies may want to compare their own products to those of their competitors, analysts may need to compile their own datasets for a project, and scientists and researchers may want to look for trends in a website's content. All of these tasks can be completed with web scraping.

How does web scraping work?

Essentially, web scraping is all about looking for patterns in the HTML of a website. A program can be taught to recognize a certain pattern and extract the text that matches it, then organize that text into a data frame.

For example, all of the product names on an e-commerce website like Amazon will be formatted in the same way in HTML. A web scraper takes advantage of this: once the pattern has been identified, a program can be told to extract any data that fits it. The program then reads through the HTML of the website and extracts the text every time the HTML fits the pattern.
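To make the idea of "looking for a pattern" concrete, here is a minimal sketch in Python using only the standard library. The HTML snippet, the `product-name` class, and the helper names are all made up for illustration; a real site will use its own markup, and many people use dedicated scraping libraries instead of writing a parser by hand.

```python
from html.parser import HTMLParser

# Hypothetical HTML standing in for a downloaded product page.
SAMPLE_HTML = """
<ul>
  <li><span class="product-name">Blue Kettle</span> <span class="price">$24.99</span></li>
  <li><span class="product-name">Red Toaster</span> <span class="price">$39.50</span></li>
</ul>
"""


class ProductNameParser(HTMLParser):
    """Collects the text inside every <span class="product-name"> element."""

    def __init__(self):
        super().__init__()
        self.in_name = False  # True while we are inside a matching <span>
        self.names = []

    def handle_starttag(self, tag, attrs):
        # The "pattern": a span tag whose class attribute is "product-name".
        if tag == "span" and dict(attrs).get("class") == "product-name":
            self.in_name = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_name = False

    def handle_data(self, data):
        # Keep the text only when it sits inside a matching span.
        if self.in_name:
            self.names.append(data.strip())


def extract_product_names(html):
    parser = ProductNameParser()
    parser.feed(html)
    return parser.names


product_names = extract_product_names(SAMPLE_HTML)
print(product_names)  # ['Blue Kettle', 'Red Toaster']
```

The prices are skipped because their HTML does not fit the pattern, which is exactly how a scraper separates the data it wants from everything else on the page.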

What is an API?

An API, or Application Programming Interface, is used to “tell” software components how to interact with each other. It can also provide access to the data itself, which means that you can use an API to get data from somewhere else. This is how many websites display data from other websites: the data on the new site updates as it is updated on the original site. The API acts as the go-between, and therefore has access to the website’s data.
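As a rough sketch of what "getting data through an API" can look like, the Python example below requests a URL and decodes the JSON that comes back. The URL, the shape of the response, and the `fetch_json` and `fake_opener` helpers are assumptions made for illustration, not any real service's API. A stand-in "server" is used so the example runs without an internet connection.

```python
import io
import json
from urllib.request import urlopen


def fetch_json(url, opener=urlopen):
    """Request a URL and decode the JSON body that the API sends back."""
    with opener(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_opener(url):
    """Stand-in for a real server: returns a made-up JSON response."""
    body = json.dumps({"products": [{"name": "Blue Kettle", "price": 24.99}]})
    return io.BytesIO(body.encode("utf-8"))


# With a real API you would drop the opener argument and pass the real URL.
data = fetch_json("https://api.example.com/products", opener=fake_opener)
names = [item["name"] for item in data["products"]]
print(names)  # ['Blue Kettle']
```

Notice that the data arrives already labeled and structured, so there is no need to hunt for patterns in HTML. The trade-off is that you only receive what the API's owner has decided to make available.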

There are lots of types of APIs, each of which has a different use. Additionally, an API can be customized for a certain company or website to ensure a specific functionality.

So, what’s the difference between web scraping and an API?

This can be confusing, because both APIs and web scraping aim to collect data from the internet, but they go about it differently. As mentioned earlier, web scrapers take data directly from what is visible on a website, so anyone with the ability to scrape will have access to the data. An API, however, only provides the data that the company, website, or person who owns it chooses to share. APIs can sometimes be expensive or restrictive if the owner does not want to give easy access to the data.

I hope this was helpful!
