Web Scraping: the PISA 2015 Rankings

Ali Wu
Sep 2, 2018 · 2 min read

The results of PISA 2015 (Programme for International Student Assessment) were released on December 6, 2016. This cycle focused mainly on the science domain, but it also measured students’ mathematics and reading performance. The rankings for each domain have been updated on Wikipedia, where the ranking table appears under the PISA 2015 heading. Here, I demonstrate how to extract that ranking table from the Wikipedia page using the “rvest” package in R.

The “rvest” functions used here are: (1) read_html(), (2) html_nodes(), and (3) html_table(). read_html() reads an HTML page; html_nodes() finds all nodes that match a selector (html_node() returns only the first match); html_table() parses an HTML table into a data frame. Before extracting the ranking table, I used selectorgadget to find the selector for the table. If you haven’t heard of selectorgadget, please visit its website for instructions.
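As a rough illustration of how these three functions fit together, here is a minimal sketch that runs on a small inline HTML string instead of a live page. The table contents are made-up values for illustration only and are not PISA data.

```r
library(rvest)

# A tiny inline HTML document with two tables (made-up values, not PISA data)
html <- '<html><body>
<table><tr><th>Country</th><th>Score</th></tr>
<tr><td>A</td><td>500</td></tr>
<tr><td>B</td><td>490</td></tr></table>
<table><tr><th>Country</th><th>Score</th></tr>
<tr><td>C</td><td>480</td></tr></table>
</body></html>'

page  <- read_html(html)             # parse the HTML into a document
nodes <- html_nodes(page, "table")   # every node matching the CSS selector "table"
first <- html_table(nodes[[1]])      # parse a single table node into a data frame

length(nodes)  # number of tables found
first
```

Calling html_table() on the whole node set instead of a single node should return a list with one data frame per table, which is convenient when a page holds several tables.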

In the image below, “selectorgadget” selects the tables. The selector “table” matches the table that I want, and the “Clear” button shows that 12 table nodes match (Figure 1).

Figure 1. How to use “selectorgadget” to select tables.

After using “selectorgadget” to find the CSS selector “table”, I used the “rvest” package to extract the table from the website (Figure 2). Please note that there are 12 tables, but I extracted only the rankings table, which I built by combining tables[3], tables[4], and tables[5]. These three tables appear on lines 6, 8, and 10 (Figure 2).


Figure 2. Using the “rvest” package to extract the PISA 2015 rankings
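Since Figure 2 is an image, the general idea can also be sketched in text. This is not the exact code from the figure: combine_tables() is a hypothetical helper, the side-by-side cbind() combination is an assumption about how the three tables were joined, and the Wikipedia URL in the comments is assumed. The table positions on the live page may also have changed since this was written, so the sketch is demonstrated on inline HTML with made-up values.

```r
library(rvest)

# Hypothetical helper: parse every <table> on a page, pick some by position,
# and bind them side by side (assumes the picked tables have equal row counts).
combine_tables <- function(page, idx) {
  tables <- html_table(html_nodes(page, "table"))
  do.call(cbind, tables[idx])
}

# Stand-in page with three one-row tables (made-up values, not PISA data)
demo <- read_html('<html><body>
<table><tr><th>Math country</th><th>Math score</th></tr>
<tr><td>A</td><td>500</td></tr></table>
<table><tr><th>Science country</th><th>Science score</th></tr>
<tr><td>B</td><td>510</td></tr></table>
<table><tr><th>Reading country</th><th>Reading score</th></tr>
<tr><td>C</td><td>520</td></tr></table>
</body></html>')

rankings <- combine_tables(demo, 1:3)
rankings

# On the live page (URL and indices are assumptions), the call might look like:
# url <- "https://en.wikipedia.org/wiki/Programme_for_International_Student_Assessment"
# rankings <- combine_tables(read_html(url), 3:5)
```

Selecting tables by position is fragile because the index shifts whenever a table is added to or removed from the page, so it is worth inspecting the result before relying on it.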

I took a screenshot of the table in the R environment (Figure 3). The screenshot shows rankings, countries, mathematics scores, and science scores.

Figure 3. A segment of the table