Web Scraping LinkedIn Companies with Selenium and Beautiful Soup

When starting a project, you can use the datasets offered by sites like Kaggle, but you can also collect your own data with web scraping.

Here, the goal is to extract a certain number of posts from each company's LinkedIn page and then apply machine learning techniques to them.

The full implementation of this code is available on my GitHub page: https://github.com/anthoguille/web_scraping_linkedin

What information we want to scrape from the website

  1. Name
  2. Date
  3. Post
  4. Likes

Note: You could also record the “Number of posts” per company; in this example, we will.

Inspect the page

Press F12, or right-click on the page and select Inspect.

You need a basic knowledge of HTML. However, you can right-click any piece of information you care about on the page and choose Inspect to see exactly where it sits in the HTML.

Python Code

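Two Python libraries, Beautiful Soup and Selenium, do most of the work: Selenium drives a real browser to log in and scroll, and Beautiful Soup parses the resulting HTML. Install Selenium and a browser driver (for example, ChromeDriver) by following the Selenium documentation.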

You start by importing the libraries into Python.

We create a browser driver instance, open the browser in incognito mode, and maximize the window.

We then send our credentials (username and password) to log in.

Note: You need to add the URLs of the companies that you want to scrape to the list.

We will create a dictionary to save the data.
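A sketch of the company list and the storage dictionary. The URLs are hypothetical placeholders, and the column keys and the `add_post` helper are illustrative choices, not necessarily those used in the repository.

```python
# Hypothetical example URLs: replace them with the company pages you want to scrape.
company_urls = [
    "https://www.linkedin.com/company/example-company-a/posts/",
    "https://www.linkedin.com/company/example-company-b/posts/",
]

# One list per column; index i across all lists describes the same post.
data = {"name": [], "date": [], "post": [], "likes": []}


def add_post(data: dict, name: str, date: str, post: str, likes: str) -> None:
    """Append one scraped post so that every column list stays the same length."""
    data["name"].append(name)
    data["date"].append(date)
    data["post"].append(post)
    data["likes"].append(likes)
```

Appending through a single helper keeps the lists aligned, which matters later because a DataFrame built from a dictionary requires every column to have the same length.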

Here you will need to experiment to find how many scrolls load the number of posts you want. We also add a dynamic delay so the page has time to load new content after each scroll.

Once the driver has finished scrolling, Beautiful Soup parses the HTML and we can extract what we need, for example, the content of each post.
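A sketch of the parsing step. The CSS selector `div.post-text` is hypothetical, because LinkedIn's class names change often; copy the real one from the Inspect panel.

```python
from bs4 import BeautifulSoup

# Hypothetical selector: replace it with the class you find via Inspect.
POST_TEXT_SELECTOR = "div.post-text"


def extract_post_texts(html: str) -> list[str]:
    """Return the visible text of every element matching POST_TEXT_SELECTOR."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text(" ", strip=True) joins nested text pieces with single spaces.
    return [node.get_text(" ", strip=True) for node in soup.select(POST_TEXT_SELECTOR)]
```

After scrolling a company page, you would pass it the rendered HTML with `extract_post_texts(driver.page_source)`. The same pattern, with other selectors, extracts the name, date, and likes.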

Outcome:

Put Data into DataFrame
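A minimal sketch of this step, assuming the data was stored as a dictionary with one list per column (`name`, `date`, `post`, `likes` are hypothetical key names) and that likes were scraped as text such as "1,234". The cleaning of the likes column is an added assumption.

```python
import pandas as pd


def to_dataframe(data: dict) -> pd.DataFrame:
    """Build a DataFrame from a column -> list dictionary and make likes numeric."""
    df = pd.DataFrame(data)  # raises ValueError if the lists differ in length
    # Assumption: likes arrive as text like "1,234"; drop commas and convert,
    # turning anything unparsable into NaN.
    likes = df["likes"].astype(str).str.replace(",", "", regex=False)
    df["likes"] = pd.to_numeric(likes, errors="coerce")
    return df
```

With `df = to_dataframe(data)`, the optional “Number of posts” per company is then `df.groupby("name").size()`.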

Outcome:

Awesome! Now we have a dataset that we can use for machine learning or any other analysis.

To see the complete code, visit my GitHub repository.

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…