Scraping All-Party Parliamentary Groups in R with parlygroups

Noel Dempsey
Published in Analytics Vidhya
5 min read · Dec 4, 2019
Photo by Phil Dolby on Flickr

Please note: I’m writing this in a personal capacity. Any views expressed are not those of my employer.

Over the last few months, I have had some fairly simple and straightforward questions from MPs and their staff about All-Party Parliamentary Groups (APPGs).

They asked: “Who holds the largest number of officer roles on APPGs, and what is the value of funding APPGs have received from external sources?” Shortly followed by a: “Can I have this as soon as possible?”

Unfortunately, answering these questions was far from quick or simple — the register containing details of APPGs is only published by Parliament as a web page or a PDF document. Out came the pen and paper. No neat Excel file for me! A couple of hours later and I had my answer: for them, a nice clean number that they could use; for me, the realisation that this clearly wasn’t the best way for me to work in the long term.

Dreading the thought of future me having to repeat the laborious task of sifting through page after page of the APPG register, I decided to do what any inherently lazy person would do: get a machine to do it for me. Luckily, dissolution was right around the corner, which gave me the perfect opportunity to start writing some code.

A couple of weeks and a few head bangs against a wall later, I can proudly introduce parlygroups, an R package which has all the functions needed to effortlessly scrape the contents of that pesky APPG register into nice tidy data tables. Let’s have a look at just how quick and easy it now is to answer those same questions…

First things first, let’s install the parlygroups package and download the APPG register. For this walkthrough, we will use the latest available register, which is dated 5 November 2019. We need to use the download_appg() function to get the register into R, and all it needs to work is the register date in ISO 8601 format: “2019-11-05”. There are also two optional arguments: pause, which is the number of seconds to wait between scraping each APPG page, and save, which lets you save the scraped data on your machine.
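As a rough sketch of that step: the install line assumes the package lives on GitHub at dempseynoel/parlygroups (a guess based on the GitHub handle in the author bio, so check before running it), and is_iso_date() is a hypothetical helper, not part of parlygroups, that checks the date string before it is passed on. The download itself only runs if the package is already installed, and the pause value is purely illustrative.

```r
# Hypothetical helper (not part of parlygroups): TRUE when x is a valid
# calendar date written as YYYY-MM-DD with plain hyphens.
is_iso_date <- function(x) {
  grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x) &
    !is.na(as.Date(x, format = "%Y-%m-%d"))
}

register_date <- "2019-11-05"
stopifnot(is_iso_date(register_date))

# Assumed install location; uncomment once you have confirmed the repository.
# remotes::install_github("dempseynoel/parlygroups")

if (requireNamespace("parlygroups", quietly = TRUE)) {
  # The register date is the only required input; pause (seconds between
  # page requests) is optional and 1 is just an example value.
  parlygroups::download_appg(register_date, pause = 1)
}
```

A date typed with slashes or in day-first order fails the check straight away, which is cheaper than finding out halfway through a scrape.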

Once the register has been downloaded (there’s a handy little progress bar telling you how long you have left to wait) we can finally have a look at some data. Our first question was “Who holds the largest number of officer roles on APPGs?” To help answer this, all we need is the appg_officers() function, which returns a table listing the MPs and Lords who are officers of each APPG.

Now that we have a table full of data, we can write a few more lines of code to see which MP or Lord is the most prolific APPG officer. The functions within the dplyr package are great for this kind of data wrangling: just take our officers table, group it by the names of MPs and Lords in the officer_name column, then tally them up and arrange in descending order.
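Here is a minimal sketch of that group, tally and arrange pattern. It runs on a small made-up table rather than the real register: the names and groups are invented, and appg_name is an assumed column name. Only officer_name comes from the package output described above. In practice you would start from the table returned by appg_officers().

```r
library(dplyr)

# Toy stand-in for the appg_officers() table. All values are invented;
# appg_name is an assumed column, officer_name is the one named in the text.
officers <- data.frame(
  appg_name    = c("Group A", "Group A", "Group B", "Group B", "Group C"),
  officer_name = c("Member X", "Member Y", "Member X", "Member Z", "Member X")
)

officer_counts <- officers %>%
  group_by(officer_name) %>%  # one group per MP or Lord
  tally() %>%                 # count rows per group into column n
  arrange(desc(n))            # most officer roles first

print(officer_counts)
```

The first row of officer_counts is the most prolific officer. Swapping officer_name for another column is all it takes to count by something else.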

Ha! So, it turns out that Sir Peter Bottomley holds the largest number of officer roles. With very little tweaking we could work out which political party has the most officers, which MP or Lord holds the fewest officer roles, or anything in between.

Our second question was “What is the value of funding APPGs have received from external sources?” Answering this is now just as simple. Let’s use the appg_financial() function as it will give us a table showing all the APPGs which have received funding from external sources.

Here we can see that there were 198 instances where an external source, be that a company, a charity or some other organisation, gave funding to an APPG. Now, we could simply sum the financial_value column to get our answer (£1,517,997), but let’s go one step further and get the total funding received by each group, ordered from highest to lowest.
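Both calculations can be sketched on a small invented table. The figures below are made up and will not reproduce the totals above. The column names appg_name and financial_source are assumptions; only financial_value is named in the package output described here. With real data you would start from the table returned by appg_financial().

```r
library(dplyr)

# Toy stand-in for the appg_financial() table. All values are invented;
# appg_name and financial_source are assumed column names.
funding <- data.frame(
  appg_name        = c("Group A", "Group A", "Group B", "Group C"),
  financial_source = c("Donor 1", "Donor 2", "Donor 1", "Donor 3"),
  financial_value  = c(1500, 3000, 6000, 4000)
)

# Overall total across every registered benefit
total_funding <- sum(funding$financial_value, na.rm = TRUE)

# Total per group, largest first
funding_by_group <- funding %>%
  group_by(appg_name) %>%
  summarise(total_value = sum(financial_value, na.rm = TRUE)) %>%
  arrange(desc(total_value))

print(total_funding)
print(funding_by_group)
```

Setting na.rm = TRUE is a defensive choice in case some entries have no value recorded. Whether the real table contains missing values is not something this sketch assumes either way.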

It turns out that the APPG on the Fourth Industrial Revolution has received the most funding from external sources. If you now wanted to investigate who exactly was funding this group, it would be dead easy to do so: all you’d have to do is call the appg_financial() function again and supply the name of the group to the appg argument.
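If you already have the full funding table in memory, the same drill-down can be sketched with dplyr instead of calling the package again. This is an alternative route on made-up data, not the package's own method: the group and donor names are invented, and appg_name and financial_source are assumed column names.

```r
library(dplyr)

# Toy stand-in for the full funding table; all values are invented.
funding <- data.frame(
  appg_name        = c("Group A", "Group A", "Group B"),
  financial_source = c("Donor 1", "Donor 2", "Donor 3"),
  financial_value  = c(1500, 3000, 6000)
)

# Keep one group's rows and list its funders, largest contribution first
group_funders <- funding %>%
  filter(appg_name == "Group A") %>%
  arrange(desc(financial_value)) %>%
  select(financial_source, financial_value)

print(group_funders)
```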

These two simple questions took what felt like a lifetime to answer the first time round. Now they can be answered within minutes. This is definitely going to make my life a whole lot easier in the New Year!

If you have any comments or suggestions on how I could improve parlygroups, let me know by leaving a comment here or raising an issue on GitHub.

Data analytics consultant — Twitter: @yespmedleon Github: /dempseynoel