Make your life easier with this library of helper functions.

Introduction

If you’re coming from languages like JavaScript or Python, Go may have some surprises in store for you. You’ll quickly notice that functions such as filter, map, or reduce are not part of this ecosystem. Even more shocking: there is no built-in function to check whether an element is part of a slice. Where you’d simply write [1, 2, 3, 4, 5].includes(5) in JavaScript, or 5 in [1, 2, 3, 4, 5] in Python, you’ll discover that it’s a whole other story in Go.

The first reflex, of course, is to Google the classic “golang check if element in slice”, hoping to find a built-in way of doing such operations. The first links on the results page will quickly put an end to your expectations: you’ll learn that Go simply doesn’t provide these functions out of the box. …
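To illustrate what such helpers look like, here is a minimal sketch of generic Contains and Filter functions in plain Go (assuming Go 1.18+ for generics; the names are mine for illustration, not necessarily the library’s actual API):

```go
package main

import "fmt"

// Contains reports whether needle is present in haystack.
// Illustrative helper; real helper libraries expose something similar.
func Contains[T comparable](haystack []T, needle T) bool {
	for _, v := range haystack {
		if v == needle {
			return true
		}
	}
	return false
}

// Filter returns a new slice holding only the elements of s
// for which keep returns true.
func Filter[T any](s []T, keep func(T) bool) []T {
	out := make([]T, 0, len(s))
	for _, v := range s {
		if keep(v) {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	fmt.Println(Contains([]int{1, 2, 3, 4, 5}, 5)) // true

	evens := Filter([]int{1, 2, 3, 4, 5}, func(n int) bool { return n%2 == 0 })
	fmt.Println(evens) // [2 4]
}
```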


Let’s unleash the power of Go and Colly to see how we can scrape Amazon’s product list.

Introduction

This post is the follow-up to my previous article. If you haven’t read it yet, I’d recommend having a look first: it will give you a better understanding of what I’m talking about here and make it easier for you to code along.

In this article, I’ll show you how to improve the project we started by adding features such as random User-Agents, a proxy switcher, pagination handling, random delays between requests, and parallel scraping.

The goal of these techniques is twofold. First, to speed up the harvesting of the information we need. Second, to avoid getting blocked by the platform we’re extracting data from: some websites will block you if they notice you’re sending too many requests. To be clear, our goal here is not to flood them with requests, but simply to avoid getting blocked while extracting the data we need at an appropriate speed. …
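As a taste of how these pieces fit together in Colly, here is a minimal sketch combining a random User-Agent, a round-robin proxy switcher, and a limit rule for parallelism and random delays. The proxy addresses and the search URL are placeholders, not working values:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/gocolly/colly"
	"github.com/gocolly/colly/extensions"
	"github.com/gocolly/colly/proxy"
)

func main() {
	// Async(true) lets Colly run requests in parallel.
	c := colly.NewCollector(colly.Async(true))

	// Rotate the User-Agent header on every request.
	extensions.RandomUserAgent(c)

	// Round-robin between proxies (placeholder addresses; use your own).
	rp, err := proxy.RoundRobinProxySwitcher(
		"socks5://127.0.0.1:1337",
		"socks5://127.0.0.1:1338",
	)
	if err != nil {
		log.Fatal(err)
	}
	c.SetProxyFunc(rp)

	// Throttle: at most 4 concurrent requests, separated by a
	// random delay of up to 5 seconds.
	if err := c.Limit(&colly.LimitRule{
		DomainGlob:  "*",
		Parallelism: 4,
		RandomDelay: 5 * time.Second,
	}); err != nil {
		log.Fatal(err)
	}

	c.OnRequest(func(r *colly.Request) {
		fmt.Println("visiting", r.URL)
	})

	c.Visit("https://www.amazon.com/s?k=laptop")
	c.Wait() // required when Async(true) is set
}
```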


Let’s scrape Amazon to see how fast this can be. But first, let’s learn about the basics.

Introduction

In this article, we’ll explore the power of Go(lang). We’ll see how to create a scraper capable of collecting basic data about products on Amazon.
The goal of this scraper is to fetch an Amazon results page, loop through the listed products, parse the data we need, go to the next page, write the results to a CSV file and… repeat.

To do this, we’ll use a library called Colly, a scraping framework written in Go. It’s lightweight but offers a lot of functionality out of the box, such as parallel scraping, a proxy switcher, and more.
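To give a feel for the framework before diving in, here is a minimal sketch of such a scraper: one callback parses each result and appends it to a CSV file, another follows the pagination link. The CSS selectors are illustrative only; Amazon’s real markup changes often and needs to be inspected first:

```go
package main

import (
	"encoding/csv"
	"log"
	"os"

	"github.com/gocolly/colly"
)

func main() {
	// Open the CSV file the results will be written to.
	file, err := os.Create("products.csv")
	if err != nil {
		log.Fatal(err)
	}
	defer file.Close()

	writer := csv.NewWriter(file)
	defer writer.Flush()
	writer.Write([]string{"name", "price"})

	c := colly.NewCollector()

	// For each product on the results page, extract name and price
	// and append them to the CSV file. Selectors are placeholders.
	c.OnHTML("div.s-result-item", func(e *colly.HTMLElement) {
		name := e.ChildText("h2")
		price := e.ChildText("span.a-price > span.a-offscreen")
		writer.Write([]string{name, price})
	})

	// Follow the "next page" link to handle pagination.
	c.OnHTML("a.s-pagination-next", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	c.Visit("https://www.amazon.com/s?k=laptop")
}
```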

This article will cover the basics of the Colly framework. …

About

Jérôme Mottet

Software Engineer - React and React-Native Developer - Web Scraping Enthusiast
