Automating Data Extraction and Visualization of the Saudi Pro League Points Table
In this article, I’ll walk you through a Python project that extracts the latest Saudi Pro League points table from the web, processes the data into a pandas DataFrame, creates a visualization using Plotly, and posts a tweet with the table image. This project showcases web scraping, data visualization, and automation using Python.
Prerequisites
To follow along, you’ll need:
- Basic knowledge of Python programming
- The requests, BeautifulSoup, pandas, plotly, and kaleido libraries installed
- A function to post tweets (we’ll use a placeholder function post_tweet here)
Install the required libraries:
pip install requests beautifulsoup4 pandas plotly kaleido
Step-by-Step Guide
1. Importing Required Libraries
We start by importing the necessary libraries.
import requests
from bs4 import BeautifulSoup
import pandas as pd
import plotly.graph_objects as go
from post_tweet_v2 import post_tweet
2. Sending a GET Request
We send a GET request to the Saudi Pro League points table URL.
# URL of the league table
url = 'https://www.spl.com.sa/en/table'
# Send a GET request to the URL
response = requests.get(url)
3. Parsing the HTML Content
If the request is successful, we parse the HTML content using BeautifulSoup.
# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content of the page
    soup = BeautifulSoup(response.content, 'html.parser')
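To see how BeautifulSoup turns markup into a navigable tree, here is a minimal offline sketch; the HTML snippet is invented for illustration and is much simpler than the real league page:

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the league page markup (invented for illustration).
html = """
<table>
  <tr><th>Pos</th><th>Team</th><th>Pts</th></tr>
  <tr><td>1</td><td>Al Hilal</td><td>96</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Header cells become a plain list of strings.
headers = [th.get_text(strip=True) for th in soup.find_all("th")]
print(headers)  # ['Pos', 'Team', 'Pts']
```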
4. Extracting Table Data
We locate the table, extract headers, and clean them.
    # Find the table
    table = soup.find('table')
    # Extract and clean headers: keep only the first word of each <th>
    # (split() already discards surrounding whitespace)
    headers = [header.text.split()[0] for header in table.find_all('th')]
Next, we extract rows of data and clean the columns.
    # Extract rows, skipping the header row
    rows = []
    for row in table.find_all('tr')[1:]:
        cols = row.find_all('td')
        cols = [col.text.strip() for col in cols]
        # Join multi-word values in the last column with hyphens
        cols[-1] = '-'.join(cols[-1].split())
        rows.append(cols)
    # Create a DataFrame
    df = pd.DataFrame(rows, columns=headers)
    # Print the DataFrame
    print(df)
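Everything scraped from HTML arrives as strings, so sorting or arithmetic on the points column gives text ordering unless you convert it first. Here is a small sketch using pd.to_numeric; the column names and values are invented for illustration, since the real headers depend on what the site serves:

```python
import pandas as pd

# Example rows (invented); real column names come from the scraped headers.
df = pd.DataFrame({"Team": ["Al Nassr", "Al Hilal"], "Pts": ["89", "96"]})

# Scraped cells are strings; convert before sorting or arithmetic.
df["Pts"] = pd.to_numeric(df["Pts"])
df = df.sort_values("Pts", ascending=False)
print(df["Pts"].sum())  # 185
```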
5. Creating a Visualization
We create a Plotly table to visualize the data and save it as an image.
    # Create the table
    fig = go.Figure(data=[go.Table(
        header=dict(values=list(df.columns),
                    fill_color='paleturquoise',
                    align='left'),
        cells=dict(values=[df[col] for col in df.columns],
                   fill_color='lavender',
                   align='left'))
    ])
    # Adjust layout
    fig.update_layout(
        height=450,   # Adjust height
        width=1000,   # Adjust width
        margin=dict(l=10, r=10, t=10, b=10)  # Reduce white space
    )
    # Save the table as an image (requires kaleido)
    fig.write_image("df_image.png", engine="kaleido")
6. Posting the Tweet
Finally, we compose a tweet with the table image and post it using the placeholder function post_tweet.
twitter_post = """
🏆 Saudi Pro League Points Table 🏆
#SaudiProLeague #Football #Soccer #SaudiArabia #SPL #saudileague
"""
print(twitter_post)
post_tweet(twitter_post, 'df_image.png')
# Show the table
fig.show()
else:
print(f"Failed to retrieve the page. Status code: {response.status_code}")
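The post_tweet helper is imported from post_tweet_v2, whose implementation isn’t shown here. A minimal stand-in, which only validates its inputs rather than calling the Twitter API (real posting needs API credentials, e.g. via a library like tweepy), might look like:

```python
import os

def post_tweet(text: str, image_path: str) -> str:
    """Stand-in for the real helper: validates inputs and returns a log line
    instead of actually posting to Twitter."""
    if not text.strip():
        raise ValueError("tweet text is empty")
    if not os.path.exists(image_path):
        raise FileNotFoundError(image_path)
    return f"Would tweet {len(text)} characters with image {image_path}"
```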
Below, we can see the script’s output alongside the data on the website.
Data Posted on Twitter:
Please find the code on github:
Conclusion
This project demonstrates how to scrape data from a website, process it with pandas, visualize it using Plotly, and automate posting to Twitter. Such automation can save time and ensure up-to-date information sharing on social media.
Connect with me on LinkedIn and Twitter for freelance work opportunities in web scraping and Python automation. I’m open to collaborations and new projects!
If you found this article helpful and would like to support my work, consider making a donation via PayPal.
Thank you for reading!