Web Scraping Basics — Scraping a Betting Site in 10 Minutes

In this 10-minute tutorial, I’ll show you how to scrape websites with Python even if you don’t code at all.

Frank Andrade
Oct 28, 2020 · 12 min read
Photo by Aidan Howe on Unsplash

Before learning Python, I always had a problem when starting a new project: there wasn’t any data available! Actually, there was, but it wasn’t exactly the data I needed. One of those projects was betting on sports. I needed as much data as possible to increase my chances of winning. Now I know that you don’t need to be an expert in Python to scrape websites and get that valuable data. In this tutorial, I’ll give you a step-by-step guide that will teach you what you need to scrape your favorite website from scratch.

Keep in mind that you can even make extra money by scraping a betting site as I explained in the article How to Make Money From Web Scraping Without Selling Data, so let’s start!

What is Web Scraping?

In this tutorial, we’ll be scraping the betting site ‘Tipico’ (link in the code below). We’ll get all the betting odds available for any sport. After you learn the basics of web scraping, you’ll be able to scrape most websites you know.

Legal disclaimer: Massive scraping of websites causes high traffic and could burden them. If you are accessing websites, you should always consider their terms of service and check the ‘robots.txt’ file to know how the site should be crawled. Moreover, I do not promote gambling, no matter what kind of betting it is.

What Do You Need to Scrape a Website?

  1. Python: To follow this tutorial, you don’t need to be an expert in Python. You at least need to know how for loops, if conditions and lists work in Python. If you don’t know them yet, don’t worry, I’ll explain how they work before we use them.

Before we start, make sure you have Python 3.x installed on your computer. If so, let’s start with the tutorial by setting up Selenium!

Setting up Selenium

  1. Download the Driver. We need this to allow Selenium to interact with the browser. Check your Google Chrome version and download the matching ChromeDriver here (you need to download the ChromeDriver file again whenever the Google Chrome browser updates itself). If necessary, unzip the driver and remember the path where you leave the ChromeDriver file.

Note 1: This is a basic project that covers pre-match games. However, surebets are frequently found in live games. The tutorial to scrape live odds is available in the article below. That being said, scraping live odds is a bit harder than pre-match odds, so please make sure you understand all the code explained in this tutorial first.

Note 2: I built a profitable betting tool with Python’s Selenium and Pandas. In the article below I show the full code and explain how I did it.

Time to Code

In case you already followed this tutorial and suddenly the code stopped working, below you can find some updates I made to the code because of some changes made to the website.

Update March 10th, 2021: I added some lines of code to adapt the tutorial to changes made on the website.

Update April 22nd, 2021:

1) Apparently it’s not possible to access the website from some countries. If that’s the case, use a VPN and connect to a country in Europe (TunnelBear works well for me and it’s free).

2) The website made some major changes to the live and prematch sections, so to keep the tutorial useful, we’ll use the sections that still have the old structure.

To go to those sections, check the panel on the left side, locate the “top sports” section, and then check the league(s) you wish to scrape. After that, you’ll get a link that we’re going to use to scrape the website. In this tutorial, I’ll use the link obtained after checking the Spanish League (written in the code below). However, feel free to check the box of any league you want and put its link in the web variable shown in the code below.

Image by author

Every changed line of code is marked in the full code (end of this article), which works as of April 22nd, 2021. Let me know if the code stops working. That being said, let’s start with this tutorial!

Import Selenium

Writing our First Selenium Python Test

In this example, I’m going to scrape the Spanish League. However, you can check the box of any league you want and paste the link you obtain into the web variable.


Now we need to set a driver instance that will help us navigate through the website — we called it driver. We do this by writing the first line of the following code.

Once the driver instance is set, we open the betting website using the driver.get command. Run the code and you’ll see that the browser opens automatically:
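The two steps just described can be sketched roughly as follows. This is a minimal sketch under stated assumptions rather than the tutorial's exact code: `open_site` and its parameter names are illustrative, it assumes Selenium 4, where the ChromeDriver path is passed through a `Service` object (Selenium 3 took the path directly), and the link and path in the usage comment are placeholders for your own values.

```python
def open_site(web, driver_path):
    """Start Chrome through ChromeDriver and load the page at `web`."""
    # Imported inside the function so that defining it does not require
    # Selenium to be installed; calling it does.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Assumption: Selenium 4 style, with the ChromeDriver path in a Service.
    driver = webdriver.Chrome(service=Service(executable_path=driver_path))
    driver.get(web)  # the browser window opens and navigates to the page
    return driver


# Usage (placeholders, replace with your own link and ChromeDriver path):
# driver = open_site("<link of the league you checked>", "<path to chromedriver>")
```

Returning the driver lets the later steps (clicking buttons, finding elements) reuse the same browser session.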

Let’s do our first interaction on the website through Selenium.

Make ChromeDriver click a button

Breaking down the code:

  • time.sleep(5) pauses the script for 5 seconds so the page has time to load. Only after this is the next line of code executed. (This is a fixed pause; Selenium’s own explicit waits use WebDriverWait.)
  • accept is the variable name we created for the ‘accept’ button we want to click on
  • driver.find_element_by_xpath() helps us find an element within the website. We only need to give the XPath of that element. It’s very simple to find an XPath in Chrome. To find the XPath of the ‘accept’ button, do the following:
  1. Open Google Chrome and go to the betting site.
  2. Right-click on any space and select ‘Inspect.’ After doing this, you’ll see Chrome Developer Tools, which shows the code behind the website.
  3. To find the element’s code, click on the ‘mouse cursor icon’ on the left. Then hover over the ‘accept’ button and click on it to find the code behind it. After clicking, you’ll notice that some code is highlighted in Chrome Developer Tools. Right-click on it, click on ‘Copy,’ then select ‘Copy XPath.’

Steps 2 and 3 should look like this:

Copy and paste the XPath inside the parentheses of driver.find_element_by_xpath(). With this, Selenium knows where the ‘accept’ button is.

  • accept.click() tells Selenium that we want to click on the ‘accept’ button when we open the website
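Put together, the three pieces broken down above might look like the sketch below. The helper name `click_accept` and the XPath in the usage comment are hypothetical; paste the XPath you copied from Chrome instead. The sketch calls `driver.find_element("xpath", ...)`, which does the same job as `driver.find_element_by_xpath(...)` and also works on newer Selenium releases where the `find_element_by_*` helpers are no longer available.

```python
import time


def click_accept(driver, accept_xpath, wait_seconds=5):
    """Pause so the page can load, then click the 'accept' button."""
    time.sleep(wait_seconds)  # fixed pause before looking for the button
    # "xpath" is the string behind selenium's By.XPATH.
    accept = driver.find_element("xpath", accept_xpath)
    accept.click()
    return accept


# Usage (hypothetical XPath, use the one you copied from Chrome):
# click_accept(driver, '//*[@id="hypothetical-accept-button"]')
```

If the button has not appeared after the pause, `find_element` raises an exception, so a slow connection may need a larger `wait_seconds`.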

Now everything is ready to start scraping the betting site.

Scheme for Scraping the Website

  • Sports title: Represents the sports section. The website has many sports available, but we’ll focus only on football to make things simple. The code we’re going to write will help you scrape any sport, though.
  • Single-row event: Events with only one row. Live events may have 2 rows, but we’ll focus on upcoming games
  • odds_event: Represents the odds available within a row. Each row has 1 ‘odds_event’ and each ‘odds_event’ has 3 boxes ‘3-way,’ ‘Over/Under’ and ‘Handicap.’

That being said, let’s build our web scraper!

Building the Web Scraper

Initialize your storage

Select only upcoming matches

Looking for ‘sports titles’

Breaking down the code:

  • sport_title represents the sport name of each section.
  • driver.find_elements_by_class_name() helps us find elements within the website. We only need to give the ‘class name’ of those elements. Keep in mind that .find_element will give us a single element, but .find_elements will give us all the matching elements inside a list. We’ll loop through this list a bit later, but first, let’s find the class name of the sport title. Do this to find the code behind the ‘football’ title:

After you click on ‘Football’, check the code that is highlighted. It looks something like this: <div class="SportTitle-styles-sport". We only need to copy and paste the class name ‘SportTitle-styles-sport’ inside driver.find_elements_by_class_name().
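As a rough sketch of this step, the lookup could be wrapped like this. The function name `find_sport_titles` is illustrative, and the call uses the generic `find_elements("class name", ...)` form, which is equivalent to `find_elements_by_class_name(...)`.

```python
def find_sport_titles(driver):
    """Return every sport-title element on the page as a list."""
    # "class name" is the string behind selenium's By.CLASS_NAME.
    # find_elements (plural) always returns a list, which may be empty.
    sport_title = driver.find_elements("class name", "SportTitle-styles-sport")
    return sport_title


# Usage:
# sport_title = find_sport_titles(driver)
```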

So far, you learned how to use .find_elements_by_class_name(), .find_element_by_xpath() and .click() with Selenium. Before diving deeper into Selenium, make sure you know how to use for loops and if conditions. If you know them, you can skip this section. If not, here’s a refresher:

Refresher on ‘for’ loops, ‘if’ statements and lists

Breaking down the code:

  • teams_example is the name of the list of teams we created
  • for team in teams_example is looping through each team in the list
  • print(team) is executed once for each team in our ‘teams_example’ list
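A minimal version of the loop described in these bullets, using the three team names that appear in the output shown in this section:

```python
# A list holding three team names
teams_example = ['Barcelona', 'Madrid', 'Sevilla']

# The body of the loop runs once per element, in list order
for team in teams_example:
    print(team)
```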

If we run this code, we obtain this:

Barcelona
Madrid
Sevilla

When we use the if statement, we’re telling Python ‘only continue when this condition is True.’ To do so, we add the if condition inside the loop:
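A minimal sketch of the same loop with a condition added, reusing the `teams_example` list from the refresher:

```python
teams_example = ['Barcelona', 'Madrid', 'Sevilla']

for team in teams_example:
    # print() is reached only when the comparison is True
    if team == 'Barcelona':
        print(team)
```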

If we run this code, we only obtain ‘Barcelona’ because we told Python ‘only print when the team is Barcelona.’

Barcelona

Great! You know for loops and if statements. Let’s continue with the tutorial!

Selecting only ‘football’

Breaking down the code:

  • sport represents each sport title element in the sport_title list
  • .text gives us the text of an element. By comparing sport.text with the name of the sport (football), we make sure we’ll only get data from the football section.
  • sport.find_element_by_xpath(‘./..’) helps us find an element with XPath relative to the element we’re in (in this case, the title of the football section). The ‘.’ refers to where we are right now, which is the ‘sport’ title element. If we write ‘./..’ within parentheses, we obtain the ‘parent node’ of that title. We do this twice to get the ‘grandparent.’ We need this grandparent variable to limit the scrape to the football section only.
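The selection logic above can be sketched as a small helper. The name `find_sport_section` and the default label `"Football"` are assumptions (the label must match the text shown on the site exactly), and `find_element("xpath", ...)` stands in for `find_element_by_xpath(...)`.

```python
def find_sport_section(sport_title, sport_name="Football"):
    """Return the grandparent node of the title whose text is `sport_name`."""
    for sport in sport_title:
        if sport.text == sport_name:
            # './..' moves one level up from the current element
            parent = sport.find_element("xpath", "./..")
            grandparent = parent.find_element("xpath", "./..")
            return grandparent
    return None  # no title with that text was found


# Usage:
# grandparent = find_sport_section(sport_title)
```

Returning `None` when the sport is missing makes it easy to check the result before searching inside the section.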

Looking for single row events

Breaking down the code:

  • single_row_events represents the list of events. Usually, each event has 1 row
  • grandparent.find_elements_by_class_name() helps us find all the football matches/events within the ‘grandparent’ node (in this case, the football section). To find the code behind a match do this:

Once the code is highlighted, look for the class name. It should be named ‘EventRow-styles-event-row’
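A short sketch of this lookup, assuming `grandparent` is the football section found in the previous step. The function name is illustrative.

```python
def find_single_row_events(grandparent):
    """Return the match rows found inside one sport section."""
    # Searching from `grandparent` instead of `driver` limits the results
    # to elements nested inside that section.
    return grandparent.find_elements("class name", "EventRow-styles-event-row")


# Usage:
# single_row_events = find_single_row_events(grandparent)
# print(len(single_row_events))  # number of matches found in the section
```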

Getting data: Team names and ‘odds_events’

Breaking down the code:

  • for match in single_row_events loops through all the matches inside the ‘single_row_events’ list
  • odds_event represents the odds available for each match
  • match.find_elements_by_class_name(‘EventOddGroup-styles-odd-groups’) helps us find the ‘odds_event’ within every match. To find the code behind the ‘odds box’ do this:

Once the code is highlighted, look for the class name. It should be named ‘EventOddGroup-styles-odd-groups’

  • for team in match.find_elements_by_class_name(‘EventTeams-styles-titles’) loops through the elements with class name ‘EventTeams-styles-titles' within the ‘match’ node. Matches have 2 teams (home and away team); we’ll be looping through them. To find the code behind ‘team names’ do this:

Once the code is highlighted, look for the class name. Although the class name highlighted is ‘EventTeams-styles-event-teams EventTeams-styles-additional-margin’, you shouldn’t pick this one because it’ll give you the names of two rows (team names + ‘half time’) in case you’re scraping live games. Instead, pick the class name below that says ‘EventTeams-styles-titles’

  • team.text gets the text attribute inside the team element
  • teams.append(team.text) stores the team names in the teams list we created in the beginning
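The loop described in this section could be sketched as below. The function name is illustrative, and the two lists are created inside the function here only to keep the example self-contained; in the tutorial they are created once at the beginning.

```python
def collect_teams_and_odds_events(single_row_events):
    """Collect team names and the odds boxes of every match row."""
    teams = []
    odds_events = []
    for match in single_row_events:
        # All odds boxes that belong to this match
        odds_event = match.find_elements(
            "class name", "EventOddGroup-styles-odd-groups"
        )
        odds_events.append(odds_event)
        # Home and away team names of this match
        for team in match.find_elements("class name", "EventTeams-styles-titles"):
            teams.append(team.text)
    return teams, odds_events


# Usage:
# teams, odds_events = collect_teams_and_odds_events(single_row_events)
```

Because `odds_events` gets one entry per match, it stays aligned with the order in which the matches were visited.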

Getting data: The odds

Breaking down the code:

  • for odds_event in odds_events loops over the ’n’ matches on the website.
  • enumerate(odds_event) pairs each element of the odds_event list with its position while looping. That is, it numbers the ‘odds boxes’ from left to right starting with the number ‘0.’
  • for n, box in enumerate(odds_event) loops through all ‘odds boxes’ inside a match. As I showed you before, there are 3 boxes: ‘3-way,’ ‘Over/Under’ and ‘Handicap.’ We’re going after ‘3-way’ this time.
  • rows represents the rows within an odds box. Remember there could be 2 rows if you’re scraping live matches.
  • box.find_elements_by_xpath(‘.//*’) gives the elements nested inside each ‘odds box’ in document order, so the first one is the first row. A box has 1 row (when scraping upcoming matches) or 2 rows (when scraping live matches)
  • n==0 means ‘only take values from the first box’ which is the ‘3-way’ box (1x2)
  • rows[0] tells Python ‘only pick the first row on each odds box.’ With this, we ignore the second row in case you’re scraping live matches.
  • x12.append(rows[0].text) stores the ‘3-way’ odds in the x12 list which we created at the beginning
  • driver.quit() closes the browser
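A sketch of the nested loop broken down above. The function name `collect_3way_odds` is illustrative, `odds_events` is assumed to hold one list of odds boxes per match, and `find_elements("xpath", ...)` stands in for `find_elements_by_xpath(...)`.

```python
def collect_3way_odds(odds_events):
    """Return the text of the first row of the first odds box of each match."""
    x12 = []
    for odds_event in odds_events:
        # enumerate yields (position, element) pairs, starting at position 0
        for n, box in enumerate(odds_event):
            rows = box.find_elements("xpath", ".//*")
            if n == 0:  # first box only, the '3-way' (1x2) box
                x12.append(rows[0].text)  # first row only
    return x12


# Usage:
# x12 = collect_3way_odds(odds_events)
# driver.quit()  # close the browser once the data is collected
```

A match without any odds boxes contributes nothing to `x12`, so it is worth comparing the lengths of `x12` and the team list before pairing them up.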

Congratulations! You just scraped your first website.

Final Step

If you’d like to scrape websites without getting blocked, check this article:

Full Code


Final thoughts
