Obtaining live cricket scores from Cricbuzz via web scraping with bs4 and requests

Vasudevan Swornampillai
4 min read · Feb 18, 2024


[Image: a player playing a cricket shot]

In the realm of cricket enthusiasts, staying updated with live scores is a thrilling pursuit. Cricbuzz, a popular cricket website, offers real-time updates on matches around the world. However, extracting this data manually can be a tedious task. This is where web scraping comes into play.

Web Scraping for Live Scores:

Web scraping involves using automated tools to extract data from websites. In this case, we’ll be using BeautifulSoup and Requests libraries in Python to fetch live scores from Cricbuzz.

Features:

  • Scrapes data from the Cricbuzz live scores page.
  • Extracts tour details, team details, score, result, and commentary for each match.
  • Selects a particular team to view its results individually.
  • Removes a particular team from the results.
  • Prints the extracted data to the console.

Workflow Diagram

The workflow diagram displayed below demonstrates how to use web scraping to obtain live cricket scores from Cricbuzz.

[Flowchart: workflow of web scraping to obtain live scores from Cricbuzz]

Setting the Stage:

Before we dive into the code, let’s understand the libraries we’ll be using. BeautifulSoup is a parsing library that helps us navigate and extract data from HTML documents. Requests, on the other hand, allows us to make HTTP requests to retrieve web pages.

Importing necessary libraries and crafting the request

The first step is to import the necessary libraries and construct a request to fetch the live scores page from Cricbuzz. We’ll use the requests.get() method to retrieve the HTML content of the page.

import requests
from bs4 import BeautifulSoup

# Fetch the live scores page
url = "https://www.cricbuzz.com/cricket-match/live-scores/recent-matches"
response = requests.get(url).text

Parsing the HTML:

Once we have the HTML content, we’ll use BeautifulSoup to parse it and create a navigable object.

# Parse the HTML content
soup = BeautifulSoup(response, "lxml")

Locating the Live Scores Section:

Next, we’ll use BeautifulSoup’s find_all() method to locate the specific section of the webpage that contains the live scores.

# Finding the live scores section
matches = soup.find_all('div', class_="cb-col cb-col-100 cb-plyr-tbody cb-rank-hdr cb-lv-main")

Extracting the Data:

Now comes the fun part — extracting the live scores! We’ll iterate through each live score section and use BeautifulSoup’s methods to extract the match title, team names, and scores.

# Extracting the data for each match
for match in matches:
    tour = match.find('h2', class_="cb-lv-grn-strip text-bold cb-lv-scr-mtch-hdr").text
    team = match.find('h3', class_="cb-lv-scr-mtch-hdr inline-block").text
    score = match.find('div', class_="cb-scr-wll-chvrn cb-lv-scrs-col").text

Providing a select_match feature:

Users are given an additional feature that lets them see the details and score of a particular cricket match while excluding all other matches from the extracted data. This is accomplished by creating a get_match() function and calling it in the main program.

# Getting details of a particular match
print("Enter the match you want to see")
get_match = input('> ')
print(f'Get match: {get_match}')
if get_match in tour:
    print(f'''
Tour Details: {tour.strip()}
Team Details: {team.strip()}
Score: {score.strip()}
''')
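The article describes wrapping this filter in a get_match() function, though the snippet above checks the condition inline. A minimal sketch of such a helper, operating on records that have already been scraped (the list-of-dicts shape and the sample data below are assumptions for illustration, not Cricbuzz's actual markup):

```python
# Sketch of a get_match() helper: keeps only the matches whose tour
# name contains the user's query. The record shape (tour/team/score
# keys) is an assumed structure for illustration.
def get_match(matches, query):
    """Return only the matches whose tour name contains the query."""
    return [m for m in matches if query.lower() in m["tour"].lower()]

# Hypothetical sample of already-scraped records
matches = [
    {"tour": "India tour of England", "team": "IND vs ENG", "score": "IND 350/5"},
    {"tour": "Big Bash League", "team": "SYD vs MEL", "score": "SYD 180/4"},
]

for m in get_match(matches, "england"):
    print(f"Tour Details: {m['tour']}")
    print(f"Team Details: {m['team']}")
    print(f"Score: {m['score']}")
```

Keeping the filtering logic in a function makes it easy to test without hitting the website at all.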

Providing a filter_match feature:

Users are given an additional feature that lets them remove the details and score of a particular cricket match while retaining all other matches in the extracted data. This is accomplished by creating a remove_match() function and calling it in the main program.

# Removing a certain match from the extracted data
print("Enter the matches you don't want to see")
remove_match = input('> ')
print(f'Remove matches: {remove_match}')
if remove_match not in tour:
    print(f'''
Tour Details: {tour.strip()}
Team Details: {team.strip()}
Score: {score.strip()}
''')
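As with selection, the removal check can be lifted into the remove_match() function the article mentions. A minimal sketch, again assuming a list-of-dicts shape for the scraped records (the field names and sample data are illustrative assumptions):

```python
# Sketch of a remove_match() helper: drops records whose tour name
# contains the query and keeps everything else. The record shape is
# an assumed structure for illustration.
def remove_match(matches, query):
    """Return all matches except those whose tour name contains the query."""
    return [m for m in matches if query.lower() not in m["tour"].lower()]

# Hypothetical sample of already-scraped records
matches = [
    {"tour": "India tour of England", "score": "IND 350/5"},
    {"tour": "Big Bash League", "score": "SYD 180/4"},
]

for m in remove_match(matches, "england"):
    print(f"Tour Details: {m['tour']} | Score: {m['score']}")
```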

Providing a live commentary feature:

In addition to the features listed above, live commentary is now available. The program extracts and prints the link to each match's live commentary by locating the href attribute of the anchor tag in each match block.

# Live commentary feature
matches = soup.find_all('div', class_="cb-col cb-col-100 cb-plyr-tbody cb-rank-hdr cb-lv-main")
for match in matches:
    # Search within the current match block, not the whole page,
    # so each match yields its own commentary link
    target_div = match.find('div', class_="cb-col-100 cb-col cb-schdl")
    link_element = target_div.find('a')
    href_value = link_element['href']
    ahref_value = "https://www.cricbuzz.com" + href_value
    print("Commentary:", ahref_value)
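Plain string concatenation only works when the scraped href is a relative path; if the site ever emits an absolute URL, the result is garbled. The standard library's urljoin handles both cases, so a more robust sketch of the link-building step looks like this (the sample hrefs below are made up for illustration):

```python
from urllib.parse import urljoin

BASE_URL = "https://www.cricbuzz.com"

def commentary_url(href_value):
    """Resolve a scraped href (relative or absolute) against the site root."""
    return urljoin(BASE_URL, href_value)

# A relative href is resolved against the base URL
print(commentary_url("/live-cricket-scores/12345/sample-match"))
# An absolute href passes through unchanged
print(commentary_url("https://www.cricbuzz.com/live-cricket-scores/12345"))
```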

Screenshot of the Final Output:

The script will print the extracted data for each live match to the console. The output will be similar to the following:

For the filter match option:

[Screenshot: filter match output]

For the select match option:

[Screenshot: select match output]

Web scraping is a powerful tool of the digital age, changing how we obtain and use online information for business analytics. Through the successful implementation of web scraping techniques, we have harnessed valuable data that would otherwise have remained inaccessible. This project exemplifies the power of automation in extracting meaningful information from the vast expanse of the internet.

As we stand at the culmination of this endeavor, it is imperative to reflect on the words of the renowned computer scientist, Grace Hopper:

“The most important thing about a computer is that it can be programmed to do what you want it to do.”

This quote encapsulates the essence of our project, in which we programmed a computer to extract and process data according to our specific requirements. You can get the Python code for this project from my GitHub.
