Uncovering Prevalence of Hay Fever and Heart Disease in the UK

Published in Doorda

Jun 27, 2018

Welcome to the 3rd article of the series in which we help all you DataTrekers uncover new ways of gaining an edge over your competitors. Today we will be diving into UK drug prescription data.

Data Used

This article features the monthly prescription data released by NHS Prescription Services. The data contains information on the drugs prescribed by local GP practices, with the corresponding number of prescriptions and associated cost. Specifically, we will be focusing on three types of drug:

Nasal Allergy Related Drugs: Found under the category `Ear, Nose and Oropharynx`, with the file name `bnf_paragraph_ear_nose_and_oropharynx_201601` in the Doorda database.
Hypertension and Heart Failure Related Drugs: Found under the category `Cardiovascular System`, with the file name `bnf_section_cardiovascular_system_201601` in the Doorda database.
Obesity and Depression Related Drugs: Found under the category `Central Nervous System`, with the file name `bnf_section_central_nervous_system_201601` in the Doorda database.

Connect to Doorda Host (BigQuery)

Before the analysis, we must first collect the data we need. As mentioned above, we will be pulling 3 datasets from Doorda’s database hosted on BigQuery.

# read_gbq requires the pandas-gbq package to be installed
import pandas as pd

projectid = 'beta-203614'

# Pulling prescription data on cardiovascular related drugs
query = 'SELECT * FROM Eval_DoordaHealth.bnf_section_cardiovascular_system_201601'
cardiovascular = pd.io.gbq.read_gbq(query, projectid)

# Pulling prescription data on central nervous system related drugs
query = 'SELECT * FROM Eval_DoordaHealth.bnf_section_central_nervous_system_201601'
central_nervous = pd.io.gbq.read_gbq(query, projectid)

# Pulling prescription data on ear, nose and oropharynx related drugs
query = 'SELECT * FROM Eval_DoordaHealth.bnf_paragraph_ear_nose_and_oropharynx_201601'
allergy = pd.io.gbq.read_gbq(query, projectid)

The following documentation provides an in-depth guide for connecting to Doorda’s database on BigQuery:
https://www.doorda.com/kb/article/connecting-to-doorda-host-with-python.html

Uncovering Hay Fever

Hay fever is one of the most common allergies in the UK; approximately 20% of the UK population suffers from it. Additionally, our data shows that nasal allergy medications account for over 50% of the ear, nose and oropharynx related drugs prescribed, out of a total of 11 drugs within the category.

Therefore, our first analysis aims to identify:

The Severity of Hay Fever in Urban and Rural Areas

To do so, we will compare urban (LS1, LS2) and rural (LS8, LS9) postcode districts in the Leeds area, using the average number of nasal allergy drug items prescribed per person in each area.

Note: The number of drugs prescribed only provides an INDICATION of the number of people suffering from the said disease, as prescription volume will vary from one person to the next.

Step 1: Geographical Segmentation

The first step of the analysis involves segmenting Leeds into urban and rural areas, which can be done using postcodes.

Urban: LS1 and LS2

Rural: LS8 and LS9

# Urban area (LS1, LS2); the trailing space keeps 'LS1 ' from also matching LS10-LS19
urban = ['LS1 ', 'LS2 ']
allergy_urban = allergy[allergy['postcode'].str.contains('|'.join(urban))]
# Rural area (LS8, LS9)
rural = ['LS8 ', 'LS9 ']
allergy_rural = allergy[allergy['postcode'].str.contains('|'.join(rural))]
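The trailing space in 'LS1 ' matters because `str.contains` matches a substring anywhere in the postcode: without it, 'LS1' would also pick up LS10 to LS19. A minimal sketch of a stricter alternative, assuming postcodes are stored in the usual "outward inward" form separated by a space (e.g. 'LS1 4AP'), is to compare the outward code exactly with `isin`. The helper `filter_districts` is hypothetical and the postcodes are made-up examples.

```python
import pandas as pd

def filter_districts(df, districts):
    """Keep rows whose outward code (the part before the space) is in `districts`.

    Assumes a 'postcode' column in 'OUTWARD INWARD' form, e.g. 'LS1 4AP'.
    """
    outward = df['postcode'].str.split().str[0]
    return df[outward.isin(districts)]

# Made-up postcodes for illustration only
sample = pd.DataFrame({'postcode': ['LS1 4AP', 'LS10 1AB', 'LS2 9JT', 'LS8 2AA', 'LS17 5QZ']})

# Substring matching without the trailing space over-selects LS10 and LS17
naive = sample[sample['postcode'].str.contains('LS1|LS2')]
exact = filter_districts(sample, ['LS1', 'LS2'])

print(naive['postcode'].tolist())  # ['LS1 4AP', 'LS10 1AB', 'LS2 9JT', 'LS17 5QZ']
print(exact['postcode'].tolist())  # ['LS1 4AP', 'LS2 9JT']
```

Exact matching on the outward code avoids relying on the formatting of the search strings, at the cost of assuming a consistent postcode format.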

Step 2: Calculate Average Prescriptions per Head

When comparing the number of drugs prescribed, we must also account for the population of the area. Therefore, for each row we divide the number of nasal allergy items prescribed by the population of the corresponding postcode, giving a per-person rate. Finally, we take the average of these rates for the urban and the rural area.

# Divide the number of nasal allergy items by the postcode population (a per-person rate for each row)
allergy_per_population_urban = allergy_urban['drugs_used_in_nasal_allergy_items'] / allergy_urban['postcode_population']
allergy_per_population_rural = allergy_rural['drugs_used_in_nasal_allergy_items'] / allergy_rural['postcode_population']

# Calculate the overall (unweighted) average for each area
avg_urban_allergy = allergy_per_population_urban.mean()
avg_rural_allergy = allergy_per_population_rural.mean()
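Note that `.mean()` gives every row the same weight, however many people live in its postcode. A population-weighted average, which works out to total items divided by total population (the approach used later for the heart disease analysis), can give a quite different answer when populations differ. The sketch below compares the two on made-up numbers; the column names follow the article's, but the values and the helper `per_head_rates` are hypothetical.

```python
import pandas as pd

def per_head_rates(df, items_col, pop_col='postcode_population'):
    """Return (unweighted mean of per-row rates, population-weighted rate)."""
    unweighted = (df[items_col] / df[pop_col]).mean()
    # Weighting each row's rate by its population equals total items / total population
    weighted = df[items_col].sum() / df[pop_col].sum()
    return unweighted, weighted

# Made-up example: a small postcode with a high rate and a large postcode with a low rate
toy = pd.DataFrame({
    'drugs_used_in_nasal_allergy_items': [10, 20],
    'postcode_population': [100, 10000],
})
unweighted, weighted = per_head_rates(toy, 'drugs_used_in_nasal_allergy_items')
print(unweighted, weighted)
```

With these numbers the unweighted mean (0.051) is far larger than the weighted rate (about 0.003), because the small postcode counts as much as the large one.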

Step 3: Results

The numbers below represent the average number of nasal allergy drug items prescribed per person:

Urban: 0.0039
Rural: 0.0088

The results show that the per-person rate of nasal allergy prescriptions in the rural area is more than double that of the urban area, which suggests that an average person in the rural area is more likely to suffer from hay fever. A possible reason is greater exposure to pollen from the larger planted areas.

Culprit of Heart Disease: Depression or Obesity?

Heart disease is one of the most common causes of death in the UK. Many studies have attempted to identify its causes, with smoking, obesity and poor mental well-being considered among the most important. Notably, the British Heart Foundation recently made the following statements:

'If you're overweight or obese you are more likely to develop coronary heart disease'
'People with severe mental health problems are two to three times more likely to suffer from heart and circulatory disease'

Hence, the second part of our analysis will test these claims in the context of Leeds. To do so, we run a correlation analysis between the number of heart disease (hypertension and heart failure) drugs prescribed per person and the number of obesity drugs and antidepressants prescribed per person.

Note: The number of drugs prescribed will only provide an INDICATION as to the number of people suffering from said disease.

Step 1: Geographical Segmentation

As before, we first filter the data by postcode district, this time for each district from LS1 to LS9.

# Filter the cardiovascular and central_nervous data by postcode district, LS1 to LS9
# (the trailing space stops 'LS1 ' from also matching LS10-LS19)
districts = ['LS{} '.format(i) for i in range(1, 10)]
cardiovascular_by_district = [cardiovascular[cardiovascular['postcode'].str.match(d)] for d in districts]
central_nervous_by_district = [central_nervous[central_nervous['postcode'].str.match(d)] for d in districts]

Step 2: Preparing Prescriptions per Head Data

In the 2nd step, we calculate the number of prescriptions per person for each district by dividing the total items prescribed by the total population. The results are stored in lists, ordered from LS1 to LS9, for the correlation analysis in the next step.

# Total items divided by total population, for each district from LS1 to LS9
def per_population(frames, column):
    return [df[column].sum() / df['postcode_population'].sum() for df in frames]

# Calculate heart disease per population
heart_per_population = per_population(cardiovascular_by_district, 'hypertension_and_heart_failure_items')
# Calculate obesity per population
obese_per_population = per_population(central_nervous_by_district, 'obesity_items')
# Calculate depression per population
depress_per_population = per_population(central_nervous_by_district, 'antidepressant_drugs_items')
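Filtering each district separately works, but the same per-district rates can also be computed in one pass with `groupby`. A minimal sketch, assuming full postcodes of the form 'LS1 4AP' so that the outward code is the text before the space; the helper `rates_by_district` and the sample rows are made up for illustration.

```python
import pandas as pd

def rates_by_district(df, item_cols, districts, pop_col='postcode_population'):
    """Total items / total population for each outward code in `districts`, in that order."""
    df = df.assign(district=df['postcode'].str.split().str[0])
    df = df[df['district'].isin(districts)]
    totals = df.groupby('district')[item_cols + [pop_col]].sum()
    # Divide every item column by the district's total population, row by row
    rates = totals[item_cols].div(totals[pop_col], axis=0)
    return rates.reindex(districts)

# Made-up rows for illustration only
toy = pd.DataFrame({
    'postcode': ['LS1 4AP', 'LS1 5BB', 'LS9 7TF', 'LS10 1AB'],
    'hypertension_and_heart_failure_items': [30, 10, 50, 99],
    'obesity_items': [1, 1, 2, 9],
    'postcode_population': [1000, 1000, 2000, 500],
})
rates = rates_by_district(toy, ['hypertension_and_heart_failure_items', 'obesity_items'], ['LS1', 'LS9'])
heart_rates = rates['hypertension_and_heart_failure_items'].tolist()
print(rates)
```

For the article's data, the district list would be `['LS{}'.format(i) for i in range(1, 10)]`, and each column of the result would play the role of one of the per-population lists above.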

Step 3: Correlation Analysis

Two correlation analyses are conducted to identify:

a) Relationship between heart disease and depression

from scipy.stats import pearsonr

# Correlation between heart disease and depression
r, p = pearsonr(heart_per_population, depress_per_population)
print(r, p)

A significant result (p = 0.0006) is found between heart disease and depression, with a strong positive correlation (r = 0.91).

Note: A positive correlation doesn’t imply that depression causes heart disease; it only means that districts with more antidepressant prescriptions per person also tend to have more heart disease prescriptions per person.
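With only nine districts, it helps to see how the p-value follows from r. For a Pearson correlation computed on n pairs, the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t-distribution with n - 2 degrees of freedom under the null hypothesis of no correlation, and the two-sided p-value is twice its upper-tail probability. The sketch below assumes n = 9 (districts LS1 to LS9) and uses the rounded r reported above; `pearson_p_value` is a hypothetical helper, checked against `pearsonr` on arbitrary random data.

```python
import numpy as np
from scipy.stats import pearsonr, t

def pearson_p_value(r, n):
    """Two-sided p-value for a Pearson correlation r computed from n pairs."""
    t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
    return 2 * t.sf(abs(t_stat), df=n - 2)

# Rounded r reported for heart disease vs depression, with n = 9 districts assumed
p_depression = pearson_p_value(0.91, 9)
print(p_depression)

# Sanity check against scipy on made-up random data
rng = np.random.default_rng(0)
x = rng.normal(size=9)
y = x + rng.normal(size=9)
r, p = pearsonr(x, y)
print(np.isclose(p, pearson_p_value(r, len(x))))
```

The result comes out at roughly 0.0007, in line with the reported p = 0.0006 given that r is rounded; the same formula also shows why a moderate r can fail to reach significance when n is this small.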

b) Relationship between heart disease and obesity

# Correlation between heart disease and obesity
r, p = pearsonr(heart_per_population, obese_per_population)
print(r, p)

The relationship between heart disease and obesity is not statistically significant at the 5% level (p = 0.08), despite a moderate positive correlation (r = 0.61).
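A p-value just above 0.05 is not evidence of no relationship; with nine data points the estimate of r is simply very uncertain. One way to show this, assuming the standard Fisher z-transformation (an approximation that is not part of the original analysis), is an approximate 95% confidence interval for r. The helper `pearson_r_ci` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def pearson_r_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via the Fisher z-transformation."""
    z = np.arctanh(r)                    # transform r to an approximately normal scale
    se = 1 / np.sqrt(n - 3)              # standard error on the z scale
    z_crit = norm.ppf(0.5 + level / 2)   # about 1.96 for a 95% interval
    return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

# Rounded r values reported in this article, with n = 9 districts assumed
print(pearson_r_ci(0.61, 9))  # heart disease vs obesity
print(pearson_r_ci(0.91, 9))  # heart disease vs depression
```

The interval for r = 0.61 runs from roughly -0.09 to 0.91 and includes zero, consistent with p = 0.08, while the interval for r = 0.91 stays well above zero (roughly 0.62 to 0.98).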

Step 4: Visualisation

Finally, we visualise the results in scatter plots.

import matplotlib.pyplot as plt

# Create scatter plot for heart disease and depression
plt.scatter(heart_per_population, depress_per_population)
plt.suptitle('Correlation between Heart Disease and Depression', fontsize=13)
plt.xlabel('Heart Disease Drugs Prescribed (per person)', fontsize=10)
plt.ylabel('Anti-depressants Prescribed (per person)', fontsize=10)
plt.show()

# Create scatter plot for heart disease and obesity
plt.scatter(heart_per_population, obese_per_population)
plt.ylim(0, 0.0008)
plt.suptitle('Correlation between Obesity and Heart Disease', fontsize=13)
plt.xlabel('Heart Disease Drugs Prescribed (per person)', fontsize=10)
plt.ylabel('Obesity Drugs Prescribed (per person)', fontsize=10)
plt.show()
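To make the scatter plots easier to read, each point can be labelled with its district and overlaid with a least-squares line. A minimal sketch using `np.polyfit` for a simple linear fit (not part of the original analysis); the function `scatter_with_fit` is hypothetical and the values are made up, standing in for the per-population lists.

```python
import numpy as np
import matplotlib.pyplot as plt

def scatter_with_fit(ax, x, y, labels):
    """Scatter y against x, label each point, and overlay a least-squares line."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ax.scatter(x, y)
    for xi, yi, label in zip(x, y, labels):
        ax.annotate(label, (xi, yi), textcoords='offset points', xytext=(4, 4), fontsize=8)
    slope, intercept = np.polyfit(x, y, 1)  # degree-1 fit returns [slope, intercept]
    xs = np.linspace(x.min(), x.max(), 50)
    ax.plot(xs, slope * xs + intercept)
    return slope, intercept

# Made-up values for illustration only, not the article's results
districts = ['LS1', 'LS2', 'LS3']
heart = [0.010, 0.020, 0.030]
depress = [0.011, 0.019, 0.031]

fig, ax = plt.subplots()
slope, intercept = scatter_with_fit(ax, heart, depress, districts)
ax.set_xlabel('Heart Disease Drugs Prescribed (per person)')
ax.set_ylabel('Anti-depressants Prescribed (per person)')
```

Calling `plt.show()` afterwards displays the figure as in the code above; labelling the points makes it easy to spot whether a single district is driving the correlation.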

Overall, a significant correlation is found between heart disease and depression, but further research would be required to establish whether the relationship is causal.

Could heart disease affect your mental well-being? Or is mental well-being a determining factor for causing heart disease?

Conclusion

This article used a range of drug prescription data to draw insights on hay fever and heart disease. However, many more insights can be obtained by combining other datasets, such as patient demographics and behavioural data (e.g. age profile and smoking habits).

At Doorda we have mapped all our statistical data to postcode level, which means you can easily add more granular insights to your geographical analysis. Our next article will be on crime data, specifically featuring the occurrences of violent crime, which has been dominating headlines.

DRUG PRESCRIPTION DATA IS ONLY THE TIP OF THE ICEBERG OF DOORDA’S HEALTH DATA, AS DOORDA HEALTH ALSO ALLOWS YOU TO DIVE INTO GP PATIENT PROFILES, ILLNESS PREVALENCE AND GP SERVICE LEVELS

Visit Us at Doorda for more details!

Written by Vincent Lao