Python script to fetch LinkedIn Profile URLs !!! | Daily Python #13

Ajinkya Sonawane · Published in Daily Python · Jan 17, 2020 · 4 min read

This article is a tutorial on how to fetch LinkedIn profile URLs using Python.

Searching for a job? Want to connect with recruiters in your area? LinkedIn is the best place to connect with professionals. This article will focus on developing a Python script to fetch the URLs of technical recruiters on LinkedIn.

This article is part of the Daily Python challenge that I have taken up for myself. I will be writing short Python articles daily.

Requirements:

  1. Python 3.x
  2. Pip

Install the following packages:

  1. requests — Library for making HTTP requests.
  2. bs4 — Library (BeautifulSoup) for scraping information from web pages.
pip install requests bs4
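If you want a quick sanity check that both packages installed correctly, a couple of imports will do (the version numbers printed will depend on what pip resolved):

import requests
import bs4

# Both packages expose a __version__ attribute we can print as a quick check
print("requests", requests.__version__)
print("bs4", bs4.__version__)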

Let’s first understand how Google search works

Any Google search can be made by adding a query parameter (q) to https://www.google.com/search

Let’s try searching for Technical Recruiter

The URL for this search will be as follows:

https://www.google.com/search?q=Technical%20Recruiter

Snip of the result of the above URL

The query can be expanded with more parameters to refine the search results. We can use logical operators for multiple conditions, filter the results to a specific website, and skip to any page of the Google results using the ‘start’ parameter.
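As a rough sketch of how such a query string could be put together in Python (the search phrase and city below are just placeholder values for this tutorial, and urllib.parse.quote_plus handles the URL encoding):

from urllib.parse import quote_plus

# Placeholder values for illustration only
keyword = "Technical Recruiter"
city = "Pune"
start = 0  # 0 = first page of results

# Combine a site: filter, quoted phrases, and AND operators into one query
query = 'site:linkedin.com/in AND "{}" AND "{}"'.format(keyword, city)
url = "https://www.google.com/search?q={}&start={}".format(quote_plus(query), start)
print(url)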

Let’s search for Technical Recruiter Profiles from LinkedIn

The URL for this search will be as follows:

https://www.google.com/search?q=site:linkedin.com/in+AND+"Technical+Recruiter"+AND+"Pune"

Snip of the results of the above URL

The above results show that Google returned over 123,000 results, spread across multiple pages.

Each of these pages can be accessed by modifying the URL. The ‘start’ parameter tells Google which page to return. Let’s add this parameter to our URL to fetch the 2nd page. The results are divided into sets of 10 per page, so the 1st page has start=0, the 2nd page has start=10, the 3rd page has start=20, and so on.

The URL for the 2nd page will be as follows:

https://www.google.com/search?q=site:linkedin.com/in+AND+"Technical+Recruiter"+AND+"Pune"&start=10

Note: When adding another parameter to the URL, we separate it with an ampersand (&).
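Just to make the arithmetic explicit, here is a tiny sketch that maps page numbers to their ‘start’ values:

# Each results page holds 10 entries, so page n begins at start = (n - 1) * 10
for page in range(1, 4):
    start = (page - 1) * 10
    print("Page", page, "-> &start=" + str(start))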

Let’s write a Python script to go through all the pages and save the profile URLs in a list.

import requests
from bs4 import BeautifulSoup

profile_urls = []  # To store the profile URLs
ctr = 0            # 'start' offset to traverse the Google results pages

# Loop through the first 15 pages (10 results per page, so 15 * 10 = 150)
while ctr < 150:
    query = 'https://google.com/search?q=site:linkedin.com/in AND "Technical Recruiter" AND "Pune"&start=' + str(ctr)
    response = requests.get(query)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Google wraps each result link as /url?q=<actual URL>&<tracking parameters>
    for anchor in soup.find_all('a'):
        url = anchor.get('href', '')
        if 'https://www.linkedin.com/' in url:
            # Strip the leading '/url?q=' (7 characters) and the trailing parameters
            url = url[7:url.find('&')]
            profile_urls.append(url)
            print(url)
    ctr = ctr + 10
Snip of the Output of the above code snippet

BeautifulSoup is used to fetch the anchor (a) tags, and the href attribute is then extracted from each of them. If the href contains ‘linkedin.com’, we save the cleaned URL into our profile_urls list for future use.
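To make the slicing step concrete, here is what it does to a typical Google result href (the sample href below is made up for illustration):

# Hypothetical example of a Google result href (not a real profile)
href = '/url?q=https://www.linkedin.com/in/some-recruiter&sa=U&ved=...'

# Drop the '/url?q=' prefix (7 characters) and everything from the first '&' onward
clean = href[7:href.find('&')]
print(clean)  # https://www.linkedin.com/in/some-recruiter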

If our program keeps sending requests to Google for search results, Google may detect that the requests are being made by a bot or a computer program. It will then stop returning a proper response, and we will not be able to fetch the data.

Issue of searching using a bot/computer program

We can introduce a delay of, say, one minute, fetch a few more results, and then delay again. Use time.sleep() to introduce a delay in your program.
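One way to fold such a delay into the loop above is sketched below; the 60-second pause is an arbitrary choice, not a value tested against Google’s rate limits:

import time

ctr = 0
while ctr < 150:
    # ... fetch and parse the results page as shown earlier ...
    time.sleep(60)  # pause between requests to reduce the chance of being blocked
    ctr = ctr + 10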

This article fetched the profile URLs of the recruiters and stored them for future use. Later articles will focus on scraping the data of these recruiters by visiting their profiles and extracting the information that LinkedIn makes publicly accessible.

I hope this article was helpful. Do leave some claps if you liked it.

Follow the Daily Python Challenge here:
