Automating Lead Generation/Email Crawling with Python

Amit Upreti
Nov 14 · 3 min read

Today we will learn to automate lead generation/email crawling with a simple Python script.

Photo by Miguel Á. Padriñán from Pexels

Want to skip the post and see the good stuff directly? Here is the GitHub repo.

Lead generation is a very lucrative business, and people earn a lot of money just by finding emails for their clients.

Let’s see what our end product will look like so that I won’t waste your time in case you don’t find this interesting.

Our crawler will visit every sub-page of the provided website, look for emails, and save them to a CSV file.

See the code

First, let’s look at the code, and then I will explain each step.

Let’s understand what is happening here.

First part: the __init__() function

We have defined the following sets:

processed_urls → holds the URLs we have already visited (so that we won’t visit the same URL twice)

unprocessed_urls → holds the URLs that are in the queue waiting to be parsed

emails → holds the parsed emails

We also store the base URL, which we will use later to make sure our crawler doesn’t visit outside URLs. For example, if the user passes the URL of a page, the base URL is that URL’s scheme and domain; we use it later to ensure the crawler only visits URLs within that domain.
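The repo’s code isn’t reproduced in this excerpt, so here is a minimal sketch of what the __init__() described above might look like; the class name `EmailCrawler` is a hypothetical placeholder:

```python
from urllib.parse import urlsplit

class EmailCrawler:
    def __init__(self, start_url):
        # The three sets described above
        self.processed_urls = set()           # URLs we have already visited
        self.unprocessed_urls = {start_url}   # queue of URLs waiting to be parsed
        self.emails = set()                   # emails found so far
        # Base URL (scheme + domain), used to keep the crawl on-site
        parts = urlsplit(start_url)
        self.base_url = f"{parts.scheme}://{parts.netloc}"
```

With this, a start URL like `https://example.com/about` yields the base URL `https://example.com`, and any link that does not start with it can be skipped.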

crawl() function

The crawl() function is the starting point of our crawler. It keeps visiting the URLs in the queue until every URL on the website has been visited.
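The loop itself isn’t shown in this excerpt; here is a minimal sketch of the queue-draining pattern described above, written as a standalone function. The `parse_url` callback and its return value (the set of new links found on a page) are assumptions for illustration:

```python
def crawl(start_url, parse_url):
    """Visit every reachable URL starting from start_url (sketch).

    parse_url(url) is assumed to fetch the page and return the set of
    new links it found there.
    """
    processed_urls = set()
    unprocessed_urls = {start_url}
    while unprocessed_urls:               # keep going until the queue is empty
        url = unprocessed_urls.pop()
        if url in processed_urls:         # never visit the same URL twice
            continue
        processed_urls.add(url)
        # add newly discovered links to the queue, minus ones already seen
        unprocessed_urls |= parse_url(url) - processed_urls
    return processed_urls
```

Because both collections are sets, duplicate links found on different pages are de-duplicated for free.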

parse_url() function

Our parse_url() function is where the extraction happens. Here we:

  • parse all the URLs found on the given page
  • filter out duplicate URLs, URLs outside the domain, and URLs we have already visited
  • make sure we don’t try to visit URLs that lead to files such as .jpg, .mp4, or .zip
  • finally parse the page for emails and write them to a CSV file
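The filtering steps above could look roughly like this sketch; the function name `filter_links`, the parameter list, and the exact set of skipped extensions are hypothetical:

```python
from urllib.parse import urljoin

# File extensions the crawler should not try to visit (illustrative list)
SKIPPED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".mp4", ".zip", ".pdf")

def filter_links(page_url, hrefs, base_url, processed_urls):
    """Resolve raw hrefs found on a page and keep only crawlable ones (sketch)."""
    links = set()
    for href in hrefs:
        url = urljoin(page_url, href)     # resolve relative links against the page
        url = url.split("#")[0]           # drop fragments to avoid duplicates
        if not url.startswith(base_url):  # simple same-domain check
            continue
        if url.lower().endswith(SKIPPED_EXTENSIONS):  # skip file downloads
            continue
        if url in processed_urls:         # skip already-visited pages
            continue
        links.add(url)
    return links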

parse_emails() function

It takes a text input, finds the emails in that text, and finally writes them to a CSV file.
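A minimal sketch of such a function, assuming a simple email regular expression (good enough for crawling, not full RFC 5322 validation) and an append-mode CSV file; the `seen` parameter and file path are illustrative:

```python
import csv
import re

# Simple email pattern -- intentionally loose, not full RFC 5322
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def parse_emails(text, seen, csv_path="emails.csv"):
    """Find emails in text and append any new ones to a CSV file (sketch)."""
    new_emails = set(EMAIL_RE.findall(text)) - seen
    if new_emails:
        with open(csv_path, "a", newline="") as f:
            writer = csv.writer(f)
            for email in sorted(new_emails):
                writer.writerow([email])
        seen |= new_emails
    return new_emails
```

Tracking already-seen emails in a set keeps the CSV free of duplicates even when the same address appears on many pages.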

How do I run this code?

To get a local copy up and running follow these simple steps.


1. Clone the Email-Crawler-Lead-Generator repository
git clone

2. Install dependencies

pip install -r requirements.txt


3. Run the crawler, simply passing the URL as an argument



➜  email_crawler python3
1 Email found
2 Email found
3 Email found
4 Email found
5 Email found
6 Email found
7 Email found

Sample Data

sample data from crawling medium

If you have suggestions or find some issues, feel free to open an issue or a pull request on GitHub.

Thank you for reading.


Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem

