Automating the ‘big count’ and breaking down the numbers geographically
This week we’ve been working on pulling together all the different ways of counting the people we support, with the aim of creating an up-to-date dashboard showing the total number. The goal is to have this dashboard visible and easily accessible, so staff and partners can see our progress towards our 2020 strategic target.
Some of the contributing numbers are already available in regularly updated Google Sheets used within Good Things Foundation. These were the easiest to incorporate — a simple IMPORTRANGE function was all that was needed. Other sources are more complicated…
Internal data from our databases
To get up-to-date counts of registered users on our platforms, we use a PHP client for Google Sheets manipulation. We modified an existing PHP script (used to report on individual projects) to allow it to pull the required numbers from our MySQL databases, and post them to our dashboard Google Sheet automatically. This could then be run as often as we like, or automated using cron, so that we could get updated numbers every month, every week, or even every day if we wanted to.
How did we count users in our databases?
As discussed previously, we want to be careful not to count duplicate registrations where people have signed up for the same service more than once, or different services like LearnMyWay and EnglishMyWay.
First off, we chose to ignore any registrations in our LearnMyWay database who shared the same first name, surname and CentreID number as any other user. We assume these are duplicate registrations, but it is entirely possible that we could be removing genuinely unique people who happen to have the same name at the same centre. We also won’t be removing duplicate registrations where the user has registered at different centres.
We decided not to do exactly this for EnglishMyWay users, because it is quite common for several unique real people to be registered at a single EnglishMyWay centre who share the same first and second names. Instead we use users’ date of birth to de-duplicate in a more restrictive way, i.e. if a user shares the same first name, surname, CentreID and date of birth as another user, we consider them a duplicate. Unfortunately the D.O.B data aren’t entirely complete, so we may be missing further duplicates where D.O.B isn’t supplied.
Finally, we wanted to ensure we weren’t double counting people who we supported through both EnglishMyWay and LearnMyWay. These platforms work with separate databases, so if a user progressed from EnglishMyWay to LearnMyWay (for example), we’d have completely separate entries for them on each platform. To avoid this double counting, we chose to also ignore any EnglishMyWay learners who share a first name, surname and CentreID with a user on LearnMyWay. Again, this might remove genuine unique people who happen to share names and CentreIDs, and it won’t remove duplicates of the same person who sign up for EnglishMyWay and LearnMyWay at different centres (or outside of centres altogether).
Counting by geography
Good Things Foundation operates on a global scale, with a huge Online Centres Network in the UK, and the rapidly expanding Be Connected Network in Australia. We’ve also run pilot programmes of LearnMyWay delivery in Kenya, the Philippines and the Republic of Ireland. It therefore makes perfect sense that we’d like to be able to break down our big count towards 3 million people into geographical regions.
We assume all our project delivery prior to April 2015 was conducted in the UK, and we also assume that all the beneficiaries we count through funded projects outside of LearnMyWay (see last week’s notes) are also based in the UK. We also count people through the Basic Digital Skills Survey (BDSS). This is sent to Online Centres and gauges how many people they are helping with basic digital skills outside of our platforms. The BDSS is UK focussed (in fact, it’s restricted to England only), so we know where to allocate those numbers to as well. The Australian equivalent of the BDSS, the Partner Survey, obviously counts Australian beneficiaries.
Again, our internal databases is where things get a little more complicated. Anyone in the world can sign up to LearnMyWay, so how do we decide which region to count them in? We filter using the following criteria:
- If the user is attached to a centre with a UK address, or logs in with a UK IP address or an IP address with an unknown location, or is an EMW user — we assume they’re in the UK.
- If the user is attached to a centre with an Australian address, or logs in with an Australian IP address, or signs up through the BeConnectedNetwork website — we assume they’re in Australia.
- If the user is attached to a centre in Kenya, or logs in with an Kenyan IP address — we assume they are in Kenya.
- If the user is attached to a centre outside of the above regions (e.g. Ireland), or a centre with an unknown location, or logs in with an IP address which locates them somewhere outside of the UK, Australia or Kenya, we assume they are located somewhere else, and class them as “Other”.
This allows us to see a clear breakdown of where our beneficiaries come from.
How do we differentiate between spam users, genuine deleted users or duplicated users? We do get some website registrations that appear to be spam robots. Should we be looking at a comprehensive method for not counting these?
We also have some users who request their accounts be deleted from our systems. When we ‘delete’ them, we want to remember the fact that somebody registered with us, so instead of literally deleting their entry in the database, we completely anonymise their data so they become just a number, with no traceable personal information.
On top of this, we often have requests to ‘delete’ accidental duplicate accounts. Can we find a suitable method to differentiate these from the genuine deletions above, so we know not to count them?