How to Collect Data for a Data Science Project with Python (in 3 Steps)

Start your data science project off on the right foot.

The PyCoach
Geek Culture


Every data science project needs data, so we often use web scraping tools to extract it from a website and build our dataset.

The problem is that one website doesn’t always have all the data we want, or it might have inconsistencies that leave us with only a portion of the data.

That happened to me when I scraped football matches from the World Cups from 1930 to 2022. Some of the data was extracted, but not all of it. In this guide, we’ll extract the remaining football data from scratch using Selenium, so we can use it later in a project.

Step 1: Install Selenium

To install Selenium, open up a terminal and run the following command.

pip install selenium
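
To confirm the install worked, a quick check like the one below can help. This is only a sketch: the helper name installed_version is made up for illustration, and it relies on the standard library's importlib.metadata (Python 3.8 or newer) rather than anything specific to Selenium.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; it only reads package metadata.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("selenium")
if version is None:
    print("Selenium is not installed yet; run: pip install selenium")
else:
    print(f"Selenium {version} is installed")
```

If the script reports that Selenium is missing, the terminal is probably using a different Python environment from the one where pip ran.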

Now we need to download the right version of ChromeDriver for our computer.

  1. Check your Google Chrome version (in Chrome, click the three dots, click “Help” and then select “About Google Chrome”)
  2. Download the matching ChromeDriver version here (after any Chrome update you need to download the file again)
  3. Unzip the downloaded file and copy its path
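
The three steps above can be sketched in Python as follows. This is a minimal sketch, not the only way to do it: it assumes Selenium 4, where the driver path is passed through a Service object, and it assumes that "the right version" means the major version numbers of Chrome and ChromeDriver agree. The helper names (major_version, versions_match, build_driver) and the version strings in the usage note are hypothetical.

```python
from pathlib import Path


def major_version(version: str) -> int:
    """Return the major part of a dotted version string, e.g. 114 for '114.0.5735.90'."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version: str, driver_version: str) -> bool:
    """Assumption: Chrome and ChromeDriver are compatible when their major versions agree."""
    return major_version(chrome_version) == major_version(driver_version)


def build_driver(chromedriver_path: str):
    """Start Chrome using the ChromeDriver file unzipped in step 3.

    Raises FileNotFoundError early if the copied path is wrong.
    """
    driver_file = Path(chromedriver_path)
    if not driver_file.is_file():
        raise FileNotFoundError(f"No ChromeDriver file found at {driver_file}")

    # Imported here so the helpers above still work before Selenium is installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=str(driver_file))
    return webdriver.Chrome(service=service)
```

For example, versions_match("114.0.5735.90", "114.0.5735.16") is True, while a Chrome 115 browser paired with a ChromeDriver 114 file returns False and signals that the file needs to be downloaded again. Calling build_driver with the path copied in step 3 should open a Chrome window; remember to call quit() on the returned driver when you are done.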
