Web Scraping in R

Sree · The Startup · Sep 20, 2020 · 5 min read

Photo by Ilya Pavlov on Unsplash

Objective: To programmatically fetch the latest product prices hosted on a competitor’s website.

For the purpose of demonstration, let’s look at the websites of WeWork and Regus, two leading players in the co-working industry that compete with each other to offer hot desks, dedicated desks, and private offices across the globe. Let’s try to scrape their California listings to retrieve the latest product prices programmatically.

There were four milestones to accomplish the objective:

  1. Web scraped Regus sites using httr/rvest packages.
  2. Cleaned the dataset and incorporated geospatial coordinates.
  3. Repeated steps 1 & 2 for WeWork websites.
  4. Embedded R script in Power BI and visualized the final output.
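At their core, steps 1–3 share the same scrape-and-extract pattern: fetch a page, select the elements that hold names and prices, and pull out their text. A minimal rvest sketch of that pattern is below; it uses an inline HTML snippet and made-up CSS selectors (`.name`, `.price`), since the real Regus and WeWork page structures are not shown here.

```r
library(rvest)

# Stand-in for a fetched page; a real scrape would use read_html(url).
# The markup and selectors below are illustrative, not Regus's actual HTML.
page <- minimal_html('
  <div class="location">
    <span class="name">Downtown LA</span>
    <span class="price">$299/mo</span>
  </div>')

name  <- page %>% html_element(".name")  %>% html_text()
price <- page %>% html_element(".price") %>% html_text()
```

For a live site, `read_html("https://...")` replaces `minimal_html()`, and `html_elements()` (plural) would return every matching listing on the page.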

Phase 1: Web scraped Regus sites using httr/rvest packages

  • Step 1.1. Imported Libraries: Imported all the relevant libraries upfront.
library(tidyverse)
library(rvest)
library(revgeo)
library(opencage)
library(dplyr)
library(sqldf)
library(formattable)
library(stringr)
library(ngram)
library(httr)
library(rlist)
library(jsonlite)
library(lubridate)
library(splitstackshape)
  • Step 1.2. Regus Location API: Extracted the co-working locations in California from Regus…
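The general shape of this step, hitting a location endpoint with httr and parsing the JSON response with jsonlite, can be sketched as follows. The endpoint, query parameters, and payload fields below are hypothetical placeholders, not Regus’s actual API.

```r
library(httr)
library(jsonlite)

# A live call would look roughly like this (endpoint is a placeholder):
# resp      <- GET("https://www.example.com/api/centres",
#                  query = list(state = "CA"))
# locations <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

# Offline illustration of the same parse step on a sample payload:
payload   <- '[{"name":"Sacramento","lat":38.58,"lng":-121.49}]'
locations <- fromJSON(payload)   # data frame with one row per centre
```

`fromJSON()` flattens a JSON array of objects into a data frame, which slots directly into the cleaning and geocoding work of step 2.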
