Web Scraping Websites

Andy Estevez
Published in Strategio
Nov 7, 2022
https://deadentertainment.com/uploads/criterion-collection-sale-fqo4574j18.png

What is Criterion?

Criterion is a company that releases and distributes films on DVD, Blu-ray, and 4K every month, after licensing each title and completing its restoration process. The company describes itself as "dedicated to publishing important classics and contemporary films from around the world."

Purpose of the project

I created a website called CriterionPrices that tracks the prices of Criterion's titles, so you no longer have to open each product page to see what it costs. I built it because navigating to an individual page just to check a price became tedious, especially since prices vary depending on whether a title is a box set, a 4K release, a TV miniseries, or currently on sale.

Technologies used

  • React.js for the front end
  • MySQL on Amazon RDS to store the scraped data
  • Express.js and Node.js for the back end
  • Puppeteer for web scraping with JavaScript
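The post does not show how the scraped films reach MySQL. As a minimal sketch, assuming the `mysql2` package and a hypothetical `films` table with columns `(title, director, release_date, price, img, link)`, the function below flattens the objects the scraper returns into one array per row. `mysql2`'s `query()` expands a nested array bound to a single `?` into grouped value lists, which allows one bulk `INSERT` for the whole batch. The table, column names, and `toFilmRows` helper are illustrative assumptions, not the project's actual schema.

```javascript
// Hypothetical table: films(title, director, release_date, price, img, link).
// release_date and price are stored as the raw scraped text in this sketch.
const INSERT_FILMS_SQL =
  'INSERT INTO films (title, director, release_date, price, img, link) VALUES ?';

// Turn scraped film objects into one array per row, in column order.
// Missing fields become null so MySQL stores NULL for them.
function toFilmRows(films) {
  return films.map(f => [
    f.title ?? null,
    f.director ?? null,
    f.date ?? null,
    f.price ?? null,
    f.img ?? null,
    f.link ?? null,
  ]);
}

// Usage with mysql2 (not executed here; connection settings are placeholders):
// const mysql = require('mysql2/promise');
// const conn = await mysql.createConnection({ host, user, password, database });
// await conn.query(INSERT_FILMS_SQL, [toFilmRows(films)]);
```

Wrapping the rows in an extra array (`[toFilmRows(films)]`) is what makes the single `?` expand to `(...), (...)`; `execute()` with prepared statements does not perform this expansion, so the sketch uses `query()`.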

Visual of code and website

Here is some of the JavaScript code that scrapes the pre-order and newly released products, along with a sample screenshot of a page from the site.

const puppeteer = require('puppeteer');

// Scrape the pre-order/new-release listing at `url`, then open each film's
// product page in a second tab to read its price.
async function scrapePreorderOrNewReleases(url) {
  let browser;
  try {
    browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();
    const page2 = await browser.newPage();
    await page.goto(url);

    // Each film on the listing page is wrapped in a .filmWrap element
    const filmWraps = await page.$$('.filmWrap');
    const films = [];

    for (const ele of filmWraps) {
      const date = await ele.$eval('dt', dt => dt.innerText);
      const title = await ele.$eval('figure > img', img => img.getAttribute('alt'));
      const director = await ele.$eval('dd', dd => dd.innerText);
      const img = await ele.$eval('img', img => img.getAttribute('src'));
      const link = await ele.$eval('a', a => a.getAttribute('href'));
      films.push({ date, title, director, img, link });
    }

    for (const film of films) {
      // Resolve the href against the listing URL in case it is relative
      await page2.goto(new URL(film.link, url).href);
      film.price = await page2.$eval(
        'body > div.page-contain > main > article' +
          ' > div.content-container.product-primary-content-container > div > div.right > div > div' +
          ' > section.purchase-options.pk-c-purchase-options > form > fieldset.purchase-buttons' +
          ' > div:nth-child(1) > label > span.meta-prices > span.item-price',
        el => el.innerText
      );
    }

    await page2.close();
    await page.close();
    return films;
  } catch (err) {
    console.log('Error: ', err);
  } finally {
    // Close the browser even on failure so the Node process can exit
    if (browser) await browser.close();
  }
}
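The scraper stores each price as the raw `innerText` of the price element. Assuming that text looks like `$39.95` (possibly with a thousands separator; the post does not show sample output, so this format is an assumption), a small hypothetical helper such as `parsePrice` below converts it to integer cents so titles can be compared or sorted numerically. Integer cents avoid floating-point rounding surprises, and the helper returns the first dollar amount it finds, or `null` if there is none.

```javascript
// Hypothetical helper: convert scraped price text such as "$39.95" or
// "$1,234.50" into integer cents. Returns null when no dollar amount is found.
function parsePrice(text) {
  const match = /\$\s*(\d+(?:,\d{3})*)(?:\.(\d{2}))?/.exec(String(text ?? ''));
  if (!match) return null;
  const dollars = Number(match[1].replace(/,/g, ''));
  const cents = match[2] ? Number(match[2]) : 0;
  return dollars * 100 + cents;
}

// Sort scraped films from cheapest to most expensive without mutating the input;
// films whose price cannot be parsed are placed last.
function sortByPrice(films) {
  return [...films].sort((a, b) => {
    const pa = parsePrice(a.price);
    const pb = parsePrice(b.price);
    if (pa === null && pb === null) return 0;
    if (pa === null) return 1;
    if (pb === null) return -1;
    return pa - pb;
  });
}
```

For example, `parsePrice('$39.95')` gives `3995`, and `sortByPrice(films)` could be applied to the array returned by `scrapePreorderOrNewReleases` before it is stored or displayed.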

The full code for the project is available on GitHub: https://github.com/AndyEstevez/CriterionPrices
