Scraping Football player data with Python and using it to build a small React website

We will use beautifulsoup4 for scraping, peewee as our ORM, PostgreSQL as the database, Flask for our API and React.js for the frontend.

Rahul Shrivastava
4 min read · Apr 17, 2017

I should tell you that I am a football fan, which is where the idea for this project came from. I adore these players, so I thought I could make a tutorial out of it.

First things first, we need to set up the environment in which we will run our code. Install virtualenvwrapper if you don’t have it already. Open your terminal and type:

pip install virtualenvwrapper
mkvirtualenv footballers
mkdir footballers
cd footballers
workon footballers

Above, we have created an environment named footballers where we will install our Python packages. After that we created a folder called footballers and changed into that directory. Then we switched our working environment to footballers. Now we will install the packages required for scraping data.

pip install beautifulsoup4
pip install lxml
pip install requests

We will scrape data from sofifa.com, a very reliable website for FIFA ratings of football players. Now open your favourite text editor and create a new file called app.py.

from bs4 import BeautifulSoup as bs
import requests
url = 'http://sofifa.com/players?offset=0'

Here we import the required Python packages, then assign the website’s URL to the url variable.
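A quick aside on that offset=0 query parameter: it selects which page of the listing we get, so later pages can be reached by changing it. A small sketch (the step of 60 players per page is my assumption for illustration, not something the site documents; check its pagination links for the real value):

```python
# Build URLs for the first few listing pages by varying the `offset`
# query parameter. The step of 60 per page is an assumed value for
# illustration only.
base = 'http://sofifa.com/players?offset={}'
page_urls = [base.format(offset) for offset in range(0, 180, 60)]
print(page_urls)
```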

def soup_maker(url):
    r = requests.get(url)
    markup = r.content
    soup = bs(markup, 'lxml')
    return soup

We define a function named soup_maker which accepts a url and makes a GET request to fetch the HTML content. We then pass that content to BeautifulSoup, which returns a bs object that we will use for finding player names.
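To get a feel for what soup_maker returns, we can parse a small inline HTML string the same way (a stand-in for the response body, so this runs without a network request; I use the stdlib html.parser here so the snippet works even before lxml is installed, but 'lxml' behaves the same):

```python
from bs4 import BeautifulSoup as bs

# Stand-in for the HTML that requests.get(url).content would return.
markup = '<html><head><title>Players</title></head><body><p>FIFA ratings</p></body></html>'
soup = bs(markup, 'html.parser')  # article uses 'lxml'; stdlib parser here

print(soup.title.text)  # Players
print(soup.p.text)      # FIFA ratings
```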

When you open sofifa.com you will see a list of players with their ratings and some other info, and clicking on any player takes you to that player’s page. So here we will first try to scrape the URL of each player, so that we can then get all the information on their respective pages.

def find_top_players(soup):
    table = soup.find('table', {'class': 'table-striped'})
    tbody = table.find('tbody')
    all_a = tbody.find_all('a', {'class': ''})
    return ['http://sofifa.com' + player['href'] for player in all_a]

Wow, what’s happening here? find_top_players() accepts the soup we got from soup_maker when we passed it the url. Open sofifa.com, right-click on the first player’s name and select Inspect.

In the above image, we can clearly see that the player’s name is a link which takes us to their respective page. The innermost markup is an <a> tag, which is inside a <td> tag, which in turn is inside a <tr> tag, and so on.

<table class="table-striped">
  <thead>…</thead>
  <tbody>
    <tr>
      <td>
        <figure>…</figure>
        <a href="/player/231677" class=""></a>
      </td>
    </tr>
  </tbody>
</table>

I tried to reproduce what I saw when inspecting the page. Our aim is to get to the href attribute of that <a> tag. First we need to reach the table with class="table-striped". I tried finding all tables with that class and got only one, which means there is only one table on that HTML page with that class. That is what the 1st line of the function is doing.
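You can check the “only one table with that class” claim directly with find_all (again on an inline snippet standing in for the page; the second table here is a made-up example of some other table):

```python
from bs4 import BeautifulSoup as bs

# Two tables, only one with class="table-striped" (the second is a
# hypothetical example of another table on the page).
markup = '''
<table class="table-striped"><tbody></tbody></table>
<table class="other"></table>
'''
soup = bs(markup, 'html.parser')  # article uses 'lxml'; stdlib parser here

tables = soup.find_all('table', {'class': 'table-striped'})
print(len(tables))  # 1
```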

In the 2nd line, I go for tbody, because thead contains a lot of information we don’t need, and our primary goal is to reach the anchor tag.

In the 3rd line, we need to find all anchor tags inside tbody. For that we use .find_all, which returns a list. First I tried just 'a' inside find_all, but I got lots of unwanted results.
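You can reproduce the unwanted-results problem on a small inline snippet (the team link here is a hypothetical example of the extra anchors on the real page):

```python
from bs4 import BeautifulSoup as bs

# One player link with an empty class, plus a hypothetical extra anchor
# with a non-empty class, mimicking the mixed anchors in the real tbody.
markup = '''
<table class="table-striped"><tbody><tr><td>
  <a href="/player/231677" class=""></a>
  <a href="/team/1" class="team-link">FC Barcelona</a>
</td></tr></tbody></table>
'''
soup = bs(markup, 'html.parser')  # article uses 'lxml'; stdlib parser here
tbody = soup.find('tbody')

print(len(tbody.find_all('a')))                 # 2: picks up both anchors
print(len(tbody.find_all('a', {'class': ''})))  # 1: only the player link
```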

In the image you can see 6 anchor tags, of which only two (the 1st and the last) are needed. So I looked for similarities between them: both have a class attribute which is blank, while every other anchor tag has a non-empty class attribute. So on tbody I used find_all with a and a blank class attribute. And since the href attribute only gives me the path, not the full URL, I have to prepend http://sofifa.com to it.
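Putting it together, find_top_players can be exercised against an inline snippet shaped like the markup we inspected, with no network request needed (the href is the one from the example above):

```python
from bs4 import BeautifulSoup as bs

def find_top_players(soup):
    table = soup.find('table', {'class': 'table-striped'})
    tbody = table.find('tbody')
    all_a = tbody.find_all('a', {'class': ''})
    return ['http://sofifa.com' + player['href'] for player in all_a]

# Inline stand-in for the listing page, shaped like the inspected markup.
markup = '''
<table class="table-striped">
  <thead><tr><th>Name</th></tr></thead>
  <tbody><tr><td>
    <a href="/player/231677" class=""></a>
  </td></tr></tbody>
</table>
'''
urls = find_top_players(bs(markup, 'html.parser'))  # article uses 'lxml'
print(urls)  # ['http://sofifa.com/player/231677']
```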

Code Available at: https://github.com/rahul3103/footballers-tutorial/blob/master/scrapper.py

Part 2: Scraping Every Player’s FIFA Stats
