Automating Data Extraction and Visualization of the Saudi Pro League Points Table

Shoeb Ahmed · Published in CodeX · May 28, 2024

In this article, I’ll walk you through a Python project that extracts the latest Saudi Pro League points table from the web, processes the data into a pandas DataFrame, creates a visualization using Plotly, and posts a tweet with the table image. This project showcases web scraping, data visualization, and automation using Python.

Prerequisites

To follow along, you’ll need:

  • Basic knowledge of Python programming
  • The requests, BeautifulSoup, pandas, Plotly, and Kaleido libraries installed
  • A function to post tweets (we’ll use a placeholder function post_tweet here; a minimal sketch of one possible implementation follows the install command below)

Install the required libraries:

pip install requests beautifulsoup4 pandas plotly kaleido
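The post_tweet helper is not a library function; it is simply whatever routine you use to publish a tweet with an image attached. As one possible implementation (an assumption on my part, not the exact module used in this project), here is a minimal sketch of a post_tweet_v2.py built on Tweepy (installed separately with pip install tweepy). The credential values are placeholders you would replace with your own Twitter/X API keys.

# post_tweet_v2.py: a minimal sketch of the tweet-posting helper (assumes Tweepy)
import tweepy

# Placeholder credentials (replace with your own API keys)
CONSUMER_KEY = "YOUR_CONSUMER_KEY"
CONSUMER_SECRET = "YOUR_CONSUMER_SECRET"
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"
ACCESS_TOKEN_SECRET = "YOUR_ACCESS_TOKEN_SECRET"

def post_tweet(text, image_path):
    # Media upload goes through the v1.1 API
    auth = tweepy.OAuth1UserHandler(CONSUMER_KEY, CONSUMER_SECRET,
                                    ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
    media = tweepy.API(auth).media_upload(filename=image_path)

    # The tweet itself is created through the v2 API
    client = tweepy.Client(consumer_key=CONSUMER_KEY,
                           consumer_secret=CONSUMER_SECRET,
                           access_token=ACCESS_TOKEN,
                           access_token_secret=ACCESS_TOKEN_SECRET)
    client.create_tweet(text=text, media_ids=[media.media_id])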

Step-by-Step Guide

1. Importing Required Libraries

We start by importing the necessary libraries.

import requests
from bs4 import BeautifulSoup
import pandas as pd
import plotly.graph_objects as go
from post_tweet_v2 import post_tweet  # custom helper module that posts a tweet with an attached image

2. Sending a GET Request

We send a GET request to the Saudi Pro League points table URL.

# URL of the league table
url = 'https://www.spl.com.sa/en/table'

# Send a GET request to the URL
response = requests.get(url)
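Some sites refuse requests that do not look like they come from a browser. If the plain GET request above fails, a common optional tweak is to send a User-Agent header and set a timeout; this is a sketch rather than part of the original script.

# Optional: a more defensive request with a browser-like User-Agent and a timeout
request_headers = {'User-Agent': 'Mozilla/5.0 (compatible; spl-table-bot/1.0)'}
response = requests.get(url, headers=request_headers, timeout=30)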

3. Parsing the HTML Content

If the request is successful, we parse the HTML content using BeautifulSoup.

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.content, 'html.parser')

4. Extracting Table Data

We locate the table, extract headers, and clean them.

    # Find the table
    table = soup.find('table')

    # Extract headers
    headers = [header.text.split()[0].strip() for header in table.find_all('th')]

    # Clean headers
    headers = [header.replace(' ', '') for header in headers]

Next, we extract rows of data and clean the columns.

    # Extract rows
    rows = []
    for row in table.find_all('tr')[1:]:
        cols = row.find_all('td')
        cols = [col.text.strip() for col in cols]
        # Join the whitespace-separated values in the last column with hyphens
        cols[-1] = '-'.join(cols[-1].split())
        rows.append(cols)

    # Create a DataFrame
    df = pd.DataFrame(rows, columns=headers)

    # Print the DataFrame
    print(df)
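Everything scraped from HTML arrives as strings. If you want to sort or filter the table later (for example by points), the numeric columns can be converted first. The column names below are assumptions about what the scraped headers look like, so check df.columns and adjust them to your actual DataFrame.

    # Optional: convert numeric columns so the table can be sorted or filtered
    # (column names are assumptions; inspect df.columns for the real headers)
    for col in ['P', 'W', 'D', 'L', 'GF', 'GA', 'PTS']:
        if col in df.columns:
            df[col] = pd.to_numeric(df[col], errors='coerce')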

5. Creating a Visualization

We create a Plotly table to visualize the data and save it as an image.

    # Create the table
    fig = go.Figure(data=[go.Table(
        header=dict(values=list(df.columns),
                    fill_color='paleturquoise',
                    align='left'),
        cells=dict(values=[df[col] for col in df.columns],
                   fill_color='lavender',
                   align='left'))
    ])

    # Adjust layout
    fig.update_layout(
        height=450,   # Adjust height
        width=1000,   # Adjust width
        margin=dict(l=10, r=10, t=10, b=10)  # Adjust margins to reduce white space
    )

    # Save the table as an image
    fig.write_image("df_image.png", engine="kaleido")
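A small, optional variation: the cells' fill_color also accepts one list of colours per column, so you could highlight particular positions (for example the top three rows). The colour choices and the number of highlighted rows here are arbitrary assumptions, not part of the original script.

    # Optional: highlight the top three rows in every column
    row_colors = ['lightyellow' if i < 3 else 'lavender' for i in range(len(df))]
    fig_highlight = go.Figure(data=[go.Table(
        header=dict(values=list(df.columns), fill_color='paleturquoise', align='left'),
        cells=dict(values=[df[col] for col in df.columns],
                   fill_color=[row_colors] * len(df.columns),
                   align='left'))
    ])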

6. Posting the Tweet

Finally, we compose a tweet with the table image and post it using a placeholder function post_tweet.

    twitter_post = """
🏆 Saudi Pro League Points Table 🏆
#SaudiProLeague #Football #Soccer #SaudiArabia #SPL #saudileague
"""

    print(twitter_post)
    post_tweet(twitter_post, 'df_image.png')

    # Show the table
    fig.show()
else:
    print(f"Failed to retrieve the page. Status code: {response.status_code}")
Points table on the official website (screenshot)

Here we can see the script's output and the website data below.

Output of the code (screenshot)

Data posted on Twitter:

Automated tweet (screenshot)

You can find the full code on GitHub.

Conclusion

This project demonstrates how to scrape data from a website, process it with pandas, visualize it using Plotly, and automate posting to Twitter. Such automation can save time and ensure up-to-date information sharing on social media.

Connect with me on LinkedIn and Twitter for freelance work opportunities in web scraping and Python automation. I’m open to collaborations and new projects!

If you found this article helpful and would like to support my work, consider making a donation via PayPal.

Thank you for reading!
