Python script to fetch LinkedIn Profile URLs !!! | Daily Python #13

Ajinkya Sonawane · Published in Daily Python · Jan 17, 2020 · 4 min read

This article is a tutorial on how to fetch LinkedIn profile URLs using Python.

Searching for a job? Want to connect with recruiters in your area? LinkedIn is the best place to connect with professionals. This article will focus on developing a Python script to fetch the URLs of technical recruiters on LinkedIn.

This article is part of the Daily Python challenge that I have taken up for myself. I will be writing short Python articles daily.

Requirements:

  1. Python 3.x
  2. Pip

Install the following packages:

  1. requests — Library for making HTTP requests.
  2. bs4 — Library (BeautifulSoup) for scraping information from web pages.
pip install requests bs4
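If you want a quick sanity check that both packages installed correctly, a couple of imports will do (the version numbers printed will depend on what pip resolved):

import requests
import bs4

# Both packages expose a __version__ attribute we can print as a quick check
print("requests", requests.__version__)
print("bs4", bs4.__version__)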

Let’s first understand how Google search works

Any Google search can be made by adding a query parameter (q) to https://www.google.com/search

Let’s try searching for Technical Recruiter

The URL for this search will be as follows:

https://www.google.com/search?q=Technical%20Recruiter

Snip of the result of the above URL

The query can be expanded with more parameters to refine the search results. We can use logical operators for multiple conditions, filter the results to a specific website, and skip to any page of the Google results using the ‘start’ parameter.
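As a rough sketch of how such a query string could be put together in Python (the search phrase and city below are just placeholder values for this tutorial, and urllib.parse.quote_plus handles the URL encoding):

from urllib.parse import quote_plus

# Placeholder values for illustration only
keyword = "Technical Recruiter"
city = "Pune"
start = 0  # 0 = first page of results

# Combine a site: filter, quoted phrases, and AND operators into one query
query = 'site:linkedin.com/in AND "{}" AND "{}"'.format(keyword, city)
url = "https://www.google.com/search?q={}&start={}".format(quote_plus(query), start)
print(url)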

Let’s search for Technical Recruiter Profiles from LinkedIn

The URL for this search will be as follows:

https://www.google.com/search?q=site:linkedin.com/in+AND+"Technical+Recruiter"+AND+"Pune"

Snip of the results of the above URL

The above results show that Google returned over 123,000 results, spread across multiple pages.

Each of these pages can be accessed by modifying the URL. The ‘start’ parameter tells Google which page to return. Let’s add this parameter to our URL to fetch the 2nd page. The results are divided into sets of 10 per page, so the 1st page has start=0, the 2nd page has start=10, the 3rd page has start=20, and so on.

The URL for the 2nd page will be as follows:

https://www.google.com/search?q=site:linkedin.com/in+AND+"Technical+Recruiter"+AND+"Pune"&start=10

Note: When adding another parameter to the URL, we separate it with an ampersand (&).
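Just to make the arithmetic explicit, here is a tiny sketch that maps page numbers to their ‘start’ values:

# Each results page holds 10 entries, so page n begins at start = (n - 1) * 10
for page in range(1, 4):
    start = (page - 1) * 10
    print("Page", page, "-> &start=" + str(start))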

Let’s write a Python script to go through all the pages and save the profile URLs in a list.

import requests
from bs4 import BeautifulSoup

profile_urls = []  # To store the profile URLs
ctr = 0            # 'start' offset to traverse the Google results pages

# Loop through the first 15 pages (10 results per page, so 15 * 10 = 150)
while ctr < 150:
    query = 'https://google.com/search?q=site:linkedin.com/in AND "Technical Recruiter" AND "Pune"&start=' + str(ctr)
    response = requests.get(query)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Google wraps each result link as /url?q=<actual URL>&<tracking parameters>
    for anchor in soup.find_all('a'):
        url = anchor.get('href', '')
        if 'https://www.linkedin.com/' in url:
            # Strip the leading '/url?q=' (7 characters) and the trailing parameters
            url = url[7:url.find('&')]
            profile_urls.append(url)
            print(url)
    ctr = ctr + 10
Snip of the Output of the above code snippet

BeautifulSoup is used to fetch the anchor (a) tags, and the href attribute is then extracted from each of them. If the href contains ‘linkedin.com’, we save the cleaned URL into our profile_urls list for future use.
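To make the slicing step concrete, here is what it does to a typical Google result href (the sample href below is made up for illustration):

# Hypothetical example of a Google result href (not a real profile)
href = '/url?q=https://www.linkedin.com/in/some-recruiter&sa=U&ved=...'

# Drop the '/url?q=' prefix (7 characters) and everything from the first '&' onward
clean = href[7:href.find('&')]
print(clean)  # https://www.linkedin.com/in/some-recruiter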

If our program keeps sending requests to Google for search results, Google may detect that the requests are being made by a bot or a computer program. It will then stop returning a proper response, and we will not be able to fetch the data.

Issue of searching using a bot/computer program

We can introduce a delay of, say, one minute, fetch a few more results, and then delay again. Use time.sleep() to introduce a delay in your program.
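One way to fold such a delay into the loop above is sketched below; the 60-second pause is an arbitrary choice, not a value tested against Google’s rate limits:

import time

ctr = 0
while ctr < 150:
    # ... fetch and parse the results page as shown earlier ...
    time.sleep(60)  # pause between requests to reduce the chance of being blocked
    ctr = ctr + 10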

This article fetched the profile URLs of the recruiters and stored them for future use. Later articles will focus on scraping the data of these recruiters by visiting their profiles and extracting the information that LinkedIn makes publicly accessible.

I hope this article was helpful. Do leave some claps if you liked it.

Follow the Daily Python Challenge here:
