Eight things you need to know about working with statistics if you aren’t a statistician

In August 2014, Ban Ki Moon, the Secretary General of the United Nations, asked an Independent Expert Advisory Group to make concrete recommendations on bringing about a data revolution in sustainable development. While the group has published its report and recommendations, there is still a long road to travel to ensure that we deliver the data revolution.

I urge all partners and stakeholders to work together to ensure that the necessary investments are made, adequate technical capacity is built, new data sources are explored and innovative processes are applied to give all countries the comprehensive information systems they need to achieve sustainable development. — Ban Ki Moon

This move to improve data and then galvanize action with it has helped to raise statisticians to rock-star status in the development world. However, if we ask the statisticians to step to one side for a moment, we all need to get to grips with some of the nuts and bolts so that we can make sure our work is generating good data which inform good decisions.

This may sound like a no-brainer, but not everyone is super numerate. In fact, most people aren’t that great with numbers. This isn’t because they are incapable; more likely, somewhere along the line they started telling themselves a story about not being good with numbers, and they now believe that story. It’s a self-fulfilling prophecy. You get the idea? On top of this, many people don’t have the opportunity to learn how to work with, and understand, numbers.

Numbers are not my day-job.

If, like me, you have a casual relationship with numbers but you want to respect them and treat them right, here are eight things that should help you navigate the course!

Correlation is not the same as causation

This is a no-brainer, a question of cause and effect. Just because you built a network of schools in region X and the average age of first pregnancy increased, you should not assume that you get all the kudos. Maybe the average age was going to increase anyway, or maybe it’s down to other factors which you haven’t thought of or didn’t control for.

For loads more very funny but totally spurious correlations check out this great website. The best way to avoid this trap is to ensure that the interventions that you plan and implement are based on good evidence, preferably obtained through randomized controlled trials.
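To see how easily two unrelated things can look connected, here is a minimal sketch that computes a Pearson correlation for two series that simply both rise over time. The figures and variable names are entirely made up for illustration; they are not data from any real programme.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical yearly figures for one region (invented for this example).
schools_built = [2, 5, 9, 14, 18, 23]
mobile_phone_owners_thousands = [10, 18, 30, 41, 55, 70]

r = pearson(schools_built, mobile_phone_owners_thousands)
print(f"Correlation: {r:.2f}")
```

The correlation comes out very close to 1, yet nobody would claim that building schools causes phone ownership. Both series are just growing over the same years, which is exactly the kind of hidden factor to watch out for.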

Compare apples with apples

Forget about comparing apples and oranges. When you’re making big policy decisions at the population level, you need to be absolutely sure you know what data you’re working with. Sounds obvious, right? WRONG. The work of the Department for International Development is a prime example. Expenditure by the Department is enshrined in law at 0.7% of Gross National Income (GNI). This is not the same as Gross Domestic Product (GDP), which is the much more frequently used and quoted measure you hear about on the news. So what’s the difference? Quite a lot, actually.

  • The Gross National Income (GNI) is the total domestic and foreign output claimed by residents of a country, consisting of Gross Domestic Product (GDP) plus factor incomes earned by foreign residents, minus income earned in the domestic economy by non-residents (Todaro & Smith, 2011: 44).
  • Gross Domestic Product (GDP) is the monetary value of all the finished goods and services produced within a country’s borders in a specific time period. Though Gross Domestic Product is usually calculated on an annual basis, it can be calculated on a quarterly basis as well (Investopedia 2016).
Based on 2013 data (Google accessed 2016)

The budget of the Department for International Development (DfID) is frequently misquoted in the media as being based on Gross Domestic Product rather than Gross National Income. In financial terms there is a massive difference. The figures are huge, and the Gross National Income based budget in 2013 was worth over 1.5 billion US dollars less than if the allocation had been based on Gross Domestic Product. This is just one example of why it is critical to ensure that you are comparing like with like. There are myriad other ways to assess economic performance, and you just need to make sure you are deploying them consistently. The same is true when talking about poverty (are you talking about absolute or relative poverty?) or disability (is the same definition employed to gather and calculate the statistics?). So make sure you are comparing apples with apples, and be tuned in to whether a measure is being used to cast a particularly positive or negative hue on the situation. A personal favorite is the use and abuse of differing methodologies to report on the gender pay gap (i.e. how much more men get paid compared to women). There are two standard methodologies, which employ either the mean or the median, and they can be used to dramatic effect to amplify or downplay the extent of the problem.
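The mean versus median point is easy to check for yourself. The sketch below uses invented salaries (in thousands) where one very high earner sits among the men; the numbers and the pay_gap helper are assumptions for illustration only, not an official methodology.

```python
from statistics import mean, median

# Hypothetical annual salaries in thousands (invented for this example).
men = [22, 25, 28, 30, 32, 35, 150]
women = [21, 24, 27, 29, 31, 33, 60]

def pay_gap(men_pay, women_pay, average):
    """Gap as a percentage of men's pay, using the chosen kind of average."""
    men_average = average(men_pay)
    women_average = average(women_pay)
    return (men_average - women_average) / men_average * 100

mean_gap = pay_gap(men, women, mean)
median_gap = pay_gap(men, women, median)

print(f"Gap using the mean:   {mean_gap:.1f}%")
print(f"Gap using the median: {median_gap:.1f}%")
```

Same people, same salaries, and the gap is roughly ten times bigger with the mean than with the median, because the mean is pulled up by a handful of very high earners. Whenever you see a headline figure, ask which average was used.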

What is it?

Why should you care?

When we don’t specifically measure something, we tend not to take account of it in our decision making. For too long we have not accurately measured disability in our policy and programming. This has resulted in people with disabilities being left behind. Their needs have not been taken into account properly. But take just one statistic and think about it. One in three women will experience violence in her lifetime. This is a shocking and compelling statistic which should be a call to action to all. But it doesn’t tell the full story. If we don’t break this information down (disaggregate it) we can’t see that women in South Sudan are more than twice as likely to experience violence as the global average suggests. We also can’t identify the areas where violence is less common to see what might be working well there to prevent it. So being able to break the information down gives us a much more complete picture of what is going on, so we can make better decisions.
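Here is a rough sketch of what disaggregation looks like in practice. The regions and counts are hypothetical, chosen so that the overall rate is one in three; they are not real survey results.

```python
# Hypothetical survey counts: (region, women surveyed, women reporting violence).
survey = [
    ("Region A", 1000, 700),
    ("Region B", 1000, 200),
    ("Region C", 1000, 100),
]

# The headline figure: everything lumped together.
total_surveyed = sum(surveyed for _, surveyed, _ in survey)
total_reporting = sum(reporting for _, _, reporting in survey)
overall_rate = total_reporting / total_surveyed

# The disaggregated figures: one rate per region.
rate_by_region = {
    region: reporting / surveyed for region, surveyed, reporting in survey
}

print(f"Overall: {overall_rate:.0%}")
for region, rate in rate_by_region.items():
    print(f"{region}: {rate:.0%}")
```

The headline one in three hides a region where the rate is more than double the average and another where it is far lower. Both are worth knowing about: the first tells you where the need is greatest, the second where something might be working.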

I have produced a resource library for people wanting to get to grips with data disaggregation. Why not check it out:

Find a good baseline for comparison

You will hear about staggering growth in many African economies with huge percentage terms quoted when describing the expansion. It sounds like great news but doesn’t tell the whole story.

Based on 2013 data (Google accessed 2016)

If I told you Ghana’s economy was growing 4 times as fast as that of the USA (it isn’t), you might infer that Ghana was doing a lot better economically. But it is starting from a low base! 7% of 100 is a lot less than 2% of a million. You get me? While this is a simplified explanation, beware when you see rates of growth or change, or ratios, as they may not give you the big picture. So that you can see what I mean, have a look at the graph below where I compare the GDP of Ghana to the USA for 50 years or so.
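The arithmetic is worth doing once by hand. This sketch uses the toy numbers from the paragraph above (a base of 100 growing at 7% and a base of a million growing at 2%); they are not real GDP figures, and the helper names are my own.

```python
def absolute_growth(base, rate_percent):
    """Absolute increase after one period of growth at the given rate."""
    return base * rate_percent / 100

small_economy = 100
large_economy = 1_000_000

small_gain = absolute_growth(small_economy, 7)
large_gain = absolute_growth(large_economy, 2)

print(f"7% of {small_economy:,} is {small_gain:,.0f}")
print(f"2% of {large_economy:,} is {large_gain:,.0f}")

def years_to_overtake(small, large, small_rate, large_rate):
    """Years of compound growth before the smaller base passes the larger.

    Assumes small_rate is higher than large_rate, otherwise it never happens.
    """
    years = 0
    while small <= large:
        small += absolute_growth(small, small_rate)
        large += absolute_growth(large, large_rate)
        years += 1
    return years

catch_up_years = years_to_overtake(small_economy, large_economy, 7, 2)
print(f"Years for the smaller economy to overtake: {catch_up_years}")
```

The faster grower adds 7 while the slower one adds 20,000 in the same year, and even with compounding it takes the better part of two centuries to catch up. A growth rate on its own tells you very little until you know what it is a percentage of.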

Is it statistically significant?

In statistical hypothesis testing, you get statistical significance (or a statistically significant result) when a p-value is less than the significance level (denoted α, alpha). Does that clear things up? Thought not. Basically, asking if a finding is statistically significant is much the same as asking: if there were no real effect, how likely is it that we would see results like these through a random fluke?

There is a fairly long and detailed answer and explanation here. In fact, people write books on it. Let me give you the basics. This is what you need to know to get by.

You’re going to find two technical-sounding terms when people are presenting large data sets. They are statistically significant and p-value. For the p-value, small is beautiful. It allows you to turn the volume down on the statistical noise and filter out some of the randomness that comes with trying to measure things in the real world. As a rule of thumb, you want the p-value to be less than 0.05. This is a commonly used value in journals and academic papers. So if it’s good enough for them, it’s good enough for us.

So, if you’re reviewing a paper or some results and want to seem big and clever, why not ask:

These data look compelling but are they statistically significant?
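If you want a feel for where a p-value comes from, one approach (among many, and not necessarily the one used in any given paper) is a permutation test: shuffle the group labels every possible way and count how often a difference at least as big as the real one turns up by chance. The scores below are hypothetical, and the function name is my own.

```python
from itertools import combinations
from statistics import mean

def exact_permutation_p_value(group_a, group_b):
    """Two-sided p-value for a difference in means, trying every regrouping."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    at_least_as_extreme = 0
    total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        relabelled_a = [pooled[i] for i in chosen]
        relabelled_b = [pooled[i] for i in indices if i not in chosen_set]
        difference = abs(mean(relabelled_a) - mean(relabelled_b))
        # Small tolerance so floating-point rounding cannot hide a tie.
        if difference >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total

# Hypothetical test scores (invented for this example).
control = [12, 14, 15, 16, 18]
programme = [20, 22, 23, 25, 27]

p_value = exact_permutation_p_value(control, programme)
print(f"p-value: {p_value:.4f}")
if p_value < 0.05:
    print("Statistically significant at the 0.05 level")
else:
    print("Not statistically significant at the 0.05 level")
```

With only five people per group there are 252 ways to split them, so the computer can check them all. In this made-up case, only the real split and its mirror image produce a gap this large, which is why the p-value is so small.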

If it smells fishy then approach with caution

That is to say, there should always be some scope for common sense. If statistical results look too good to be true, they may very well be, or at least the methodology used to calculate them may be weighted to deliver a particular outcome. Statistics may be scientific, but don’t park your common sense at the door. You have good instincts, so use them!

Data is a means to an end and not an end in and of itself

With such a push to increase the production of data, and so many wonderful visualisations, it is important not to lose sight of the fact that data should spur action. This isn’t to say there is no utility in production. Transparency and accountability are vital, so it is important to produce data and make it available, but don’t stop there.

Let me give you an example. We have been pushing for the production of more data which can be disaggregated on the basis of disability. I was reading some programme documents from a multilateral organisation, which shall remain nameless, which was delivering humanitarian assistance in Malawi following the floods in 2015. The documents declared the number of people with disabilities who had been reached as an outcome. That number was reported as being in the tens of thousands. When I totted up the figures for the total number of people the programme was meant to reach, I could see that the proportion of people with disabilities reached was around 1.5%. Given that roughly 15% of the world’s population is commonly estimated to live with some form of disability, this means that the programme was, in fact, grossly underserving people with disabilities.

The problem is that producing the data was seen as an end in and of itself. Clearly no one had then looked at the data closely enough to see that it meant their programme must have some pretty big problems. Always ask, ‘What does this tell me?’ and ‘What should I do differently now?’
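The check I did on those documents takes a couple of lines. The counts below are hypothetical stand-ins (the real figures are not reproduced here), and the 15% benchmark is an assumption based on the commonly quoted global estimate of disability prevalence; swap in whatever benchmark fits your context.

```python
def percentage_share(part, whole):
    """What percentage of the whole the part represents."""
    return part / whole * 100

# Hypothetical programme figures (invented for this example).
people_with_disabilities_reached = 30_000
total_people_reached = 2_000_000

# Assumed benchmark: share of the population expected to have a disability.
benchmark_percent = 15

share = percentage_share(people_with_disabilities_reached, total_people_reached)
expected_reach = total_people_reached * benchmark_percent / 100

print(f"Share reached: {share:.1f}%")
print(f"Expected at {benchmark_percent}%: {expected_reach:,.0f} people")
print(f"Actually reached: {people_with_disabilities_reached:,} people")
```

Tens of thousands sounds impressive until you put it next to the total. Turning a raw count into a share, and comparing that share with what you would expect, is the step that turns a reported number into something you can act on.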

Not everything that counts can be counted, and not everything that can be counted counts

Numbers are awesome. Indices are awesome. But don’t forget that there are people involved. How can you measure the connection of people to land, and how their knowledge of the earth beneath their feet is intrinsically linked to their sense of identity? How can you do this without resorting to horrible terms like cultural assets? How can you measure the impact of conflict on a child without talking about lost productivity, the costs of psychological interventions or something else equally mechanical? Remember that we are social beings. We work to serve others. They have voices. They have a right to be heard. Just because you haven’t worked out how to quantify or measure something, don’t lose sight of the value it could have to others and to future generations.

Further Information




We are all innately social beings. We live in an increasingly populated and mobile environment.

Fiach O'Broin-Molloy
