Published in

Data Science Project Development

26 min read · Mar 2, 2021

Explore the CKD Patient Survival Data, Such as 90-Day Survival

Related Datasets: Check some of the code output to get an idea of what the dataset looks like.

I will utilize multiple related datasets.

All chapters of the 2018 USRDS ADR are accessible at this webpage:
https://www.usrds.org/2018/view/Default.aspx

Primary Dataset: Patient Characteristics, i.e., diagnostic results for Kidney/CKD/Renal patients
https://www.usrds.org/2018/ref/ESRD_Ref_C_PatientChars_2018.xlsx
From: https://www.usrds.org/reference.aspx

Other Closely Related Datasets:
Chronic Kidney Disease dataset
https://www.kaggle.com/mansoordaku/ckdisease
The data has 25 features that may predict whether a patient has chronic kidney disease.
Chronic_Kidney_Disease Data Set
https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease
Abstract: This dataset can be used to predict chronic kidney disease, and it can be collected from the hospital over a period of nearly 2 months.

Partially Related Datasets:
Heart Disease dataset: might check whether this can be used in relation to the other datasets
https://archive.ics.uci.edu/ml/datasets/Heart+Disease
Diabetes dataset: might check whether this can be used in relation to the other datasets
https://archive.ics.uci.edu/ml/datasets/Diabetes

Remotely Related Datasets:
Dialysis Facility Compare
https://catalog.data.gov/dataset/dialysis-facility-compare-aa0fa
Disease Indicators
https://data.world/datasets/chronic-kidney-disease

Reference: Code from my Data Visualization Project (and from the labs) is reused to some extent, mainly as a starting point and a language reference. That code was also my own. The final code will be significantly different and can be treated as new.

from __future__ import print_function
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
from IPython.display import display

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


#from geopy.geocoders import Nominatim

# code from xport library example or from ipwidget code examples - configuration code
# for geo maps
# !conda install basemap
import conda
import os
if 'PROJ_DIR' in os.environ:
    pyproj_datadir = os.environ['PROJ_DIR']
else:
    conda_file_dir = conda.CONDA_PACKAGE_ROOT
    conda_dir = conda_file_dir.split('lib')[0]
    pyproj_datadir = os.path.join(os.path.join(conda_dir, 'Library'),'share')
    os.environ['PROJ_LIB'] = pyproj_datadir
    
#from mpl_toolkits.basemap import Basemap

Location of the Data Files Folder

The data folder holds the data files.

import os

data_folder = './data/'
measure_files = os.listdir(data_folder)
measure_files

['2018_esrd_Mortality v2_c05_Mortality_18_web.ods',
 '2018_esrd_Mortality v2_c05_Mortality_18_web.xls',
 '2018_esrd_Mortality v2_c05_Mortality_18_web.xlsx',
 'dietician_care_esrd_rf_c_patient_chars_2018.xls',
 'ESRD_Ref_H_Mortality_2018.xls',
 'ESRD_Ref_I_Survival_2018.xls']

print('Select a health measure/aspect to visualize\n')
# create the interactive interface
def f(measure):
    return measure

print('Select a measure:')
measure_file = interactive(f, measure = measure_files);
display(measure_file)

Select a health measure/aspect to visualize

Select a measure:

interactive(children=(Dropdown(description='measure', options=('2018_esrd_Mortality v2_c05_Mortality_18_web.od…

'Selected: ' + measure_file.result

'Selected: ESRD_Ref_I_Survival_2018.xls'

# fall back to the survival workbook when nothing has been selected
if not measure_file.result:
    measure_file.result = 'ESRD_Ref_I_Survival_2018.xls'
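os.listdir returns every entry in the folder, so the dropdown above offers the same workbook in .ods, .xls, and .xlsx form. Here is a minimal sketch of narrowing the list, assuming only Excel workbooks should be offered. The helper name spreadsheet_files and the extension tuple are illustrative and not part of the original notebook.

```python
import os

def spreadsheet_files(names, extensions=('.xls', '.xlsx')):
    """Return the names whose extension marks an Excel workbook, sorted."""
    return sorted(
        name for name in names
        if os.path.splitext(name)[1].lower() in extensions
    )

# A few of the file names from the folder listing shown above
listing = [
    '2018_esrd_Mortality v2_c05_Mortality_18_web.ods',
    '2018_esrd_Mortality v2_c05_Mortality_18_web.xls',
    'ESRD_Ref_I_Survival_2018.xls',
]
print(spreadsheet_files(listing))
```

The filtered list could then be passed to interactive in place of the raw os.listdir result.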

Load Data and Display

data_file = data_folder + measure_file.result
#measure_data = pd.read_excel(data_file)
#measure_data = pd.ExcelFile('health-status.xls')
excel_file = pd.ExcelFile(data_file)
#measure_data.head()

# see all sheet names
sheet_names = excel_file.sheet_names
sheet_names[:4]

['Table of Contents',
 'Footnotes',
 'Unadjusted 90 day survival probabilities incident ESRD patients',
 'I.1.adj']

print('Select an aspect\n')
# create the interactive interface
def f(measure_sheet):
    return measure_sheet


measure_sheet = interactive(f, measure_sheet = sheet_names);
display(measure_sheet)

Select an aspect

interactive(children=(Dropdown(description='measure_sheet', options=('Table of Contents', 'Footnotes', 'Unadju…

print(measure_sheet.result)
measure_data = excel_file.parse(measure_sheet.result)
measure_data.head(20)
# measure_data[measure_data.columns[0]]
#measure_data.loc[24:37]

Unadjusted 90 day survival probabilities incident ESRD patients
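The workbook holds dozens of sheets, most with short codes such as I.1.adj, so picking one from a dropdown can be slow. Here is a small sketch of looking sheets up by a text fragment instead. The helper find_sheets is hypothetical, and the sheet names are the first four reported above.

```python
def find_sheets(sheet_names, text):
    """Return the sheet names that contain `text`, ignoring case."""
    needle = text.lower()
    return [name for name in sheet_names if needle in name.lower()]

sheet_names = [
    'Table of Contents',
    'Footnotes',
    'Unadjusted 90 day survival probabilities incident ESRD patients',
    'I.1.adj',
]
matches = find_sheets(sheet_names, '90 day')
print(matches)
```

A matching name can be handed to excel_file.parse exactly like the dropdown result.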

Find all the performance indicators under this aspect/measure

irrespective of whether we have data for Canada or not

# find details on Indicators
# find all indicators
#measure_data.set_index(['Indicator'])
indicators = sheet_names #pd.Index(measure_data['Indicator']).unique()
print(indicators)

['Table of Contents', 'Footnotes', 'Unadjusted 90 day survival probabilities incident ESRD patients', 'I.1.adj', 'I.2.unadj', 'I.2.adj', 'I.3.unadj', 'I.3.adj', 'I.4.unadj', 'I.4.adj', 'I.5.unadj', 'I.5.adj', 'I.6.unadj', 'I.6.adj', 'I.7.unadj', 'I.7.adj', 'I.8.unadj', 'I.8.adj', 'I.9.unadj', 'I.9.adj', 'I.10.unadj', 'I.10.adj', 'I.11.unadj', 'I.11.adj', 'I.12.unadj', 'I.12.adj', 'I.13.unadj', 'I.13.adj', 'I.14.unadj', 'I.14.adj', 'I.15.unadj', 'I.15.adj', 'I.16.unadj', 'I.16.adj', 'I.17.unadj', 'I.17.adj', 'I.18.unadj', 'I.18.adj', 'I.19.unadj', 'I.19.adj', 'I.20.unadj', 'I.20.adj', 'I.21.unadj', 'I.21.adj', 'I.22.unadj', 'I.22.adj', 'I.23.unadj', 'I.23.adj', 'I.24.unadj', 'I.24.adj', 'I.25.unadj', 'I.25.adj', 'I.26.unadj', 'I.26.adj', 'I.27.unadj', 'I.27.adj', 'I.28.unadj', 'I.28.adj', 'I.29.unadj', 'I.29.adj', 'I.30.unadj', 'I.30.adj', 'I.31.unadj', 'I.31.adj', 'I.32.unadj', 'I.32.adj', 'I.33.unadj', 'I.33.adj', 'I.34.unadj', 'I.34.adj', 'I.35.unadj', 'I.35.adj', 'I.36.unadj', '10 year survival probabilities incident living-donor transplant recipients, adjusted for age, sex, race, ethnicity, and primary cause of ESRD']

measure_data['Year']

Find the years that appear in the data

All years, and also separately the years for which we have data for Canada

# find all years
import numpy as np
years = [ str(aYear)[:4] for aYear in measure_data['Year'] ]
years = sorted(list(set(years)))
print('Years we have data\n', years)

# sort years for all data
years = [ int(aYear) for aYear in years if (aYear != 'Not applicable') and len( str(aYear).split(' ')) <= 1 ]
years = sorted(years)
years

all_years = [0] + years
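Year labels in these sheets can arrive as integers, as strings with a suffix such as '2012.1', or as text such as 'Not applicable'. Here is a hedged sketch of pulling out the distinct four-digit years with a regular expression instead of slicing the first four characters. The function name extract_years is an assumption and not part of the notebook.

```python
import re

def extract_years(labels):
    """Collect the distinct four-digit years that start each label."""
    years = set()
    for label in labels:
        # exactly four leading digits, not followed by a fifth digit
        match = re.match(r'(\d{4})(?!\d)', str(label).strip())
        if match:
            years.add(int(match.group(1)))
    return sorted(years)

print(extract_years([2012, '2012.1', '2013', 'Not applicable', 'Age Group', None]))
```

Labels that do not start with a year are skipped, so no separate 'Not applicable' check is needed.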

Age Groups, Gender, Race, Cause (probably comorbidity)

try:
    age_groups = measure_data['Age Group'].dropna().unique()
except KeyError:
    # no 'Age Group' column: fall back to the first column of the sheet
    age_groups = measure_data[measure_data.columns[0]]

age_groups

array(['0-4', '5-9', '10-13', '14-17', '18-21', '22-24', '25-29', '30-34',
       '35-39', '40-44', '45-49', '50-54', '55-59', '60-64', '65-69',
       '70-74', '75-79', '80-84', '85+', 'Male', 'Female', 'White',
       'Black/African American', 'American Indian or Alaska Native',
       'Asian', 'Native Hawaiian or Pacific Islander',
       'Other or Multiracial', 'Unknown', ' ', 'Hispanic', 'Non-Hispanic',
       'Non-Hispanic White', 'Non-Hispanic Black/African American',
       'Diabetes', 'Hypertension', 'Glomerulonephritis', 'Other cause',
       'All'], dtype=object)
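The first column above stacks several breakdowns (age group, gender, race, Hispanic status, cause) in one list, so each breakdown is a contiguous run of rows. Selecting a run with DataFrame.loc[first:last] on the default integer index keeps both end points, unlike a Python list slice. The sketch below mimics that on a plain list. The labels and row positions are shortened, hypothetical stand-ins, not the real sheet layout.

```python
def slice_inclusive(rows, first, last):
    """Return rows[first..last], keeping both end points like DataFrame.loc."""
    return rows[first:last + 1]

# Shortened, hypothetical stand-in for the first column of a sheet
labels = ['0-4', '5-9', '85+', 'Male', 'Female', 'White', 'Asian']

# Hypothetical (first, last) row positions of each breakdown in `labels`
row_ranges = {'Age Group': (0, 2), 'Gender': (3, 4), 'Race': (5, 6)}

for group, (first, last) in row_ranges.items():
    print(group, slice_inclusive(labels, first, last))
```

Because both ends are kept, two adjacent ranges that share a boundary row will both include that row.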

Plot to Understand the Data

# as of June 16, all_years did not work; assigning manually for now

all_years = list(range(1996, 2017))

all_years

[1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006,
 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016]

#print('Select a year\n')
# create the interactive interface
def f(select_a_year):
    return select_a_year

select_a_year = interactive( f, select_a_year = all_years );
display(select_a_year)

plot_by = {}
plot_by['Unadjusted 90 day'] = { 'Age Group' : [0, 19], 'Gender': [20, 23], 'Race':[23, 31], 'Hispanic Status': [30, 37], 'Cause':[37, 43] }
plot_by['10 year survival '] = { 'Age Group' : [0, 5], 'Gender': [6, 9], 'Race':[9, 17], 'Hispanic Status': [30, 38], 'Cause':[24, 32]  }


# print('Select a measure\n')
# create the interactive interface
def f(Plot_by):
    return Plot_by

select_a_plot = interactive( f, Plot_by = plot_by[measure_sheet.result[:17]] );
display(select_a_plot)

measures = {'Data distribution':0}
#print('Select a measure\n')
# create the interactive interface
def f(select_a_measure):
    return select_a_measure

select_a_measure = interactive( f, select_a_measure = measures );
display(select_a_measure)

interactive(children=(Dropdown(description='select_a_year', options=(1996, 1997, 1998, 1999, 2000, 2001, 2002,…

interactive(children=(Dropdown(description='Plot_by', options={'Age Group': [0, 19], 'Gender': [20, 23], 'Race…

interactive(children=(Dropdown(description='select_a_measure', options={'Data distribution': 0}, value=0), Out…

print( str(select_a_year.result) + str('.') + str(select_a_plot.result))
select_a_plot.result, select_a_year.result, type(select_a_plot) #, measures[select_a_measure.result]

1996.[0, 19]

([0, 19], 1996, ipywidgets.widgets.interaction.interactive)

# select_a_measure.result, select_a_year.result #, measures[select_a_measure.result]
# print( str(select_a_year.result) + str('.') + str(select_a_measure.result))
pd_data = measure_data[ list(measure_data.columns) ]
pd_data = pd_data.loc[select_a_plot.result[0]: select_a_plot.result[1]]
#pd_data = pd_data.loc[select_a_measure.result]
#print(list(zip(pd_data['Age Group'],  pd_data[prop_field])))

pd_data = pd_data.dropna()
#pd_data = pd_data[pd_data['Age Group'] != 'Unknown']
#pd_data = pd_data[:-1]

#print(list(zip(pd_data['Age Group'],  pd_data[prop_field])))

plt.rcParams['figure.figsize'] = [16, 6]
prop_field = select_a_year.result
pd_data

# for survival data these checks are not required
if ( int(select_a_measure.result) > 0):
    prop_field = str(select_a_year.result) + '.' + str(select_a_measure.result)
try:
    plt.rcParams['figure.figsize'] = [6, 4]
    plt.bar(pd_data['Age Group'], pd_data[ prop_field] );
except KeyError:
    # fall back to the 'Cause-Type' column when 'Age Group' is missing
    plt.rcParams['figure.figsize'] = [3, 4]
    plt.bar(pd_data['Cause-Type'], pd_data[ prop_field] );


# Hispanic Status    
# Cause/comorbidity
# Age Group
# Gender

plt.suptitle('Distribution of Patients for ESRD Survival for year %s\n %s, By  Race \n ' %(select_a_year.result, measure_sheet.result[0:30]));

plt.xticks(rotation=90)
plt.xlabel('\n Race')
plt.ylabel('% Survival')
plt.show()

############### SORT DATA #################
pd_data = pd_data.sort_values(by=[prop_field])
try:
    plt.bar(pd_data['Age Group'], pd_data[ prop_field] );
except KeyError:
    # fall back to the 'Cause-Type' column when 'Age Group' is missing
    plt.rcParams['figure.figsize'] = [20, 6]
    plt.bar(pd_data['Cause-Type'], pd_data[ prop_field] );

plt.suptitle('Distribution of Patients for ESRD Survival for year %s\n %s, By Age Groups \n ' %(select_a_year.result, measure_sheet.result[:30]));
plt.xticks(rotation=90)
plt.xlabel('\n Cause/Comorbidity')
plt.ylabel('% Survival')
prop_field #, pd_data

1996
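Sorting before plotting only works when every value is numeric, and some cells in these workbooks hold a '*' marker instead of a number. Here is a rough sketch of pairing labels with values, skipping such cells, and sorting ascending. The helper sortable_pairs and the sample numbers are made up for illustration.

```python
def sortable_pairs(labels, values, suppressed=('*', '', None)):
    """Pair labels with numeric values, skip suppressed cells, sort ascending."""
    pairs = []
    for label, value in zip(labels, values):
        if value in suppressed:
            continue  # suppressed or empty cell: nothing to plot
        pairs.append((label, float(value)))
    return sorted(pairs, key=lambda pair: pair[1])

# Made-up survival percentages, not taken from the dataset
pairs = sortable_pairs(['0-4', '5-9', '85+'], [97.2, '*', 80])
print(pairs)
```

The sorted pairs can be unzipped into the x and y arguments of plt.bar.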

# Reference: Code from Data Visualization Project is being utilized to some extent

The code below is older and no longer relevant; it will be removed.


pd_data = pd_data.replace('*', np.nan)
#pd_data = pd_data.dropna()
#pd_data[prop_field]
#pd_data['Age Group']

list(zip(pd_data['Age Group'],  pd_data[prop_field]))
#plt.bar(pd_data['Age Group'], pd_data[prop_field])

pd_2012_data_all = measure_data[ ['Age Group', 2012, '2012.1', '2012.2', '2012.3' ]  ]
pd_2012_data_all = pd_2012_data_all[1:]
#pd_2012_data_all = measure_data[ [2012, '2012.1', '2012.2', '2012.3' ]  ]

# column labels for 2012; defined before first use in the title below
index = [2012, '2012.1', '2012.2', '2012.3']

# No care
plt.rcParams['figure.figsize'] = [12, 6]
pd_2012_data_all = pd_2012_data_all.dropna()
pd_2012_data_all = pd_2012_data_all[:-1]
plt.bar(pd_2012_data_all['Age Group'], pd_2012_data_all['2012.1']);
#plt.suptitle('Distribution of Patients who received Dietician Care for year %s\nBy Age Groups' %index[0]);
plt.suptitle('Distribution of Patients for year %s\n who did receive dietician care \n By Age Groups' %index[0]);
pd_2012_data_all.set_index(index)
pd_2012_data_all = pd_2012_data_all.dropna()
pd_2012_data_all_t = pd_2012_data_all.transpose()

pd_2012_data_all_t_columns
#pd_2012_data_all_t.columns = pd_2012_data_all_t_columns
pd_2012_data_all_t = pd_2012_data_all_t.rename(columns=pd_2012_data_all_t_columns)

pd_2012_data_all_t.plot.barh(figsize=(10, 10), grid=True);
pd_2012_data_all_t

plt.yticks(range(len(pd_2012_data_all_t[0])), pd_2012_data_all_t[0]);

pd_2012_data_all_t[0]

Method to plot over years or only for a year.

#Note: Not all combinations of UI selections may work, since that would require extensive testing and adjustment against the data and real-world use.

#plt.rcParams['figure.figsize'] = [10, 15]
def plot_measure_by_years(year, indicator, bubble_scale, chart_type = '', ratios=[10, 1], animate=False, provincial_only=False, all_years=all_years, fig_size=[10, 10], sec_fig=True):
    plt.rcParams['figure.figsize'] = fig_size
    
    # redundant code to address a last-minute bug
    # unique color code for each country/region
    all_regions = measure_data['Region'].unique()
    region_colors = []
    region_colors_dict = {}
    # seed NumPy's generator (the one used below) so the colors are reproducible
    np.random.seed(0)
    for aRegion in all_regions:
        region_colors_dict[aRegion] = np.random.randint(0, 255)
            
            
    # print('year, indicator', year, indicator)   
    
    # keep these, I might need this
    
    """
    plt.ion()    
    f = plt.figure()
    ax = f.gca()
    
    #top box
    f.show()
    """
    
    # benchmark data
    oecd_data = measure_data.loc[  (measure_data['Indicator'] == indicator)   & (measure_data['Region'] == 'OECD average') ]
    #benchmark_value = oecd_data['Value']
    #print(type(benchmark_value), benchmark_value)
    benchmark_value = oecd_data['Value'].tolist()[0]
        
    
    # one improvement that can be made: usually I kept, two plots side by side. where the right one shows country and color
    # as in other cases we do not need the right one, code to hide the right one or just to create and use one subplot is more
    # appropriate
    # if (sec_fig == True):
    fig, axs = plt.subplots(1, 2, figsize=fig_size, sharey=False, gridspec_kw = {'width_ratios':ratios})
    #else:
    #axs = plt.subplots(1, 1, figsize=fig_size )
        
        
    #plt.xticks(rotation=90)
        
    # for one year    
    if ( year > 1 ):
        
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & ( (measure_data['Data year'] == year) & (measure_data['Type of region'].isin(['Country', 'Canada']) ) )   | \
                                         ( (measure_data['Indicator'] == indicator) & (measure_data['Region'] == 'OECD average')) ]
        if (provincial_only == True):
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                              (measure_data['Data year'] == year) & (measure_data['Type of region'].isin(['Province']))]
        
        #print(indicator_data)
        
        # for country color codes
        if (provincial_only == False):
            c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
            m =  [ x  for x in indicator_data['Region'] ] 
            c_code =  [ x[0:3]  for x in indicator_data['Region'] ]  
        else:
            #province_colors_dict
            c =  [ province_colors_dict[x]  for x in indicator_data['Region'] ]    
            m =  [ x  for x in indicator_data['Region'] ] 
            c_code =  [ x[0:3]  for x in indicator_data['Region'] ]
            region_colors_dict = province_colors_dict
            
        
        
        # this block might not apply to anything for one single year plot
        
        color_as_a_dimension = False
        if color_as_a_dimension == True:
            axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='^')                                
            count = list(region_colors_dict.keys())
            #print(count)
            for j in range(len(c)):
                axs[1].annotate(count[j],  (1.0001, j*5))

            axs[1].set_xlabel('Country and Colors')
            axs[1].set_ylabel('')

            axs[1].set_xticks([])
            axs[1].set_yticks([])
            
            
        
        
                
        if chart_type == 'Line': 
            # best to use only for one year
            
            
            
            axs[0].plot(indicator_data['Value'], indicator_data['Region']) #, c=c                 
            axs[0].set_xticklabels(indicator_data['Value'], rotation=90) # can be turned off                        
            # https://stackoverflow.com/questions/10998621/rotate-axis-text-in-python-matplotlib
            plt.suptitle(indicator + ' For a Year')
            axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Countries/Regions')
            
            fig.savefig('./saved_images_from_visualizations/' + 'line_' +indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
            fig, axs = plt.subplots(1, 2, figsize=fig_size, sharey=False, gridspec_kw = {'width_ratios':ratios})
            axs[0].plot(indicator_data['Region'], indicator_data['Value']) #, c=c                
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off 
            
            plt.suptitle(indicator + ' Over a Year')
            axs[0].set_xlabel('Regions/Countries')
            axs[0].set_ylabel('Values')
        
            #fig.savefig('./saved_images_from_visualizations/cancer_mortality_2017_country_region_x.png')
            fig.savefig('./saved_images_from_visualizations/' + 'line_' + indicator.replace(' ', '_')[0:12] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
            
        elif chart_type == 'Bar':
            #fig, axs = plt.subplots(1, 2, figsize=(10, 8), sharey=False, gridspec_kw = {'width_ratios':ratios})
            axs[0].bar(indicator_data['Region'], indicator_data['Value']) #, c=c                
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off 
            
            plt.suptitle(indicator + ' Over a Year')
            axs[0].set_xlabel('Regions/Countries')
            axs[0].set_ylabel('Values')
        
            #fig.savefig('./saved_images_from_visualizations/cancer_mortality_2017_country_region_x.png')
            fig.savefig('./saved_images_from_visualizations/' + 'bar_' + indicator.replace(' ', '_')[0:12] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
        elif chart_type == 'Hor Bar':
            #fig, axs = plt.subplots(1, 2, figsize=(10, 8), sharey=False, gridspec_kw = {'width_ratios':ratios})
            axs[0].barh(indicator_data['Region'], indicator_data['Value']) #, c=c                
            #axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off 
            
            plt.suptitle(indicator + ' Over a Year')
            axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Regions/Countries')
        
            #fig.savefig('./saved_images_from_visualizations/cancer_mortality_2017_country_region_x.png')
            fig.savefig('./saved_images_from_visualizations/' + 'hor_bar' + indicator.replace(' ', '_')[0:12] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
        elif chart_type == 'Pie':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.barh(indicator_data['Region'], indicator_data['Value'])                
            #ax.pie(indicator_data['Value'], labels=indicator_data['Data year'], autopct="%1.1f%%")
            
            
            axs[0].pie(indicator_data['Value'], labels=indicator_data['Region']) #, c=c                 
            axs[0].set_xticklabels(indicator_data['Value'], rotation=90) # can be turned off                        
            # https://stackoverflow.com/questions/10998621/rotate-axis-text-in-python-matplotlib
            plt.suptitle(indicator + ' For a Year')
            axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Countries/Regions')
            
            
            
            fig.savefig('./saved_images_from_visualizations/' + 'pie_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
        else:
            
            #print(indicator_data)
            # for country color codes
            if (provincial_only == False):
                c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ]  
                
            else:
                #province_colors_dict
                c =  [ province_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ] 
                #c_code =  [ x[0:3]  for x in indicator_data['Region'] ] 
                region_colors_dict = province_colors_dict
            

            #ax.scatter(indicator_data['Data year'], indicator_data['Region'], s=indicator_data['Value'] * bubble_scale, c=c )
            axs[0].scatter(indicator_data['Region'], indicator_data['Value'], s=indicator_data['Value'] * bubble_scale, c=c )
            #axs[0].set_xticks([year])
            #axs[0].set_yticks(indicator_data['Value'])

            plt.suptitle(indicator + ' Over a Year ' + str(year) )
            axs[0].set_xlabel('Regions')
            axs[0].set_ylabel('Values')
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off                        
            
            
            axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='^')                                
            count = list(region_colors_dict.keys())
            #print(count)
            for j in range(len(c)):
                axs[1].annotate(count[j],  (1.0001, j*5))

            axs[1].set_xlabel('Country and Colors')
            axs[1].set_ylabel('')

            axs[1].set_xticks([])
            axs[1].set_yticks([])
            
            fig.savefig('./saved_images_from_visualizations/' + 'bubble_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
                    
        #plt.show()        
    else:
        # for multiple years
        # though this method is primarily to use for one year, unless in some specific cases
        """
        plt.ion()    
        f = plt.figure()
        #ax = f.gca()

        #top box
        f.show()
        """
        
        
        
        # redundant code to address a last-minute bug
        # unique color code for each country/region
        all_regions = measure_data['Region'].unique()
        region_colors = []
        region_colors_dict = {}
        # seed NumPy's generator (the one used below) so the colors are reproducible
        np.random.seed(0)
        for aRegion in all_regions:
            region_colors_dict[aRegion] = np.random.randint(0, 255)

        #list(region_colors_dict.keys())[:5], list(region_colors_dict.values())[:5]


            
            
        #print(all_years)
        for aYear in all_years:
            if aYear == 0:
                continue
                
            if aYear == 'Not applicable':
                continue
                
            # older : original # 
            #indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == aYear) ]
            
            """
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & ( (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) ) )   | \
                                         ( (measure_data['Indicator'] == indicator) & (measure_data['Region'] == 'OECD average')) ]
            
            """
            
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  &  (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
            if (provincial_only == True):
                indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                              (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Province']))]
                
            #print(indicator_data)
        
        
             #print(indicator_data)
            # for country color codes
            #print(region_colors_dict)
            if (provincial_only == False):
                c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ]  
                
            else:
                #province_colors_dict
                c =  [ province_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ] 
                #c_code =  [ x[0:3]  for x in indicator_data['Region'] ] 
                region_colors_dict = province_colors_dict
                
                
        
            #c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
            #m =  [ x  for x in indicator_data['Region'] ] 
            
            #print( list(zip(m,c)))
            
            if chart_type == 'Line':                
                # plt.plot(indicator_data['Data year'], indicator_data['Value'], c=c)
                # plt.xticks(indicator_data['Value'])
                
                
                # was here axs[0].plot(indicator_data['Data year'], indicator_data['Value']) #, c=c           
                # was here axs[0].set_xticks(indicator_data['Value'])
                
                # brought from one year
                # best to use only for one year
                #plt.plot(indicator_data['Data year'], indicator_data['Value'], c=c)            
                #plt.xticks(indicator_data['Value'])

                #axs[0].plot(indicator_data['Data year'], indicator_data['Value']) #, c=c                       
                axs[0].plot(indicator_data['Data year'], indicator_data['Value']) #, c=c     
                #axs[0].plot(165, 'OECD', color='Red')
                #axs[0].set_xticklabels(indicator_data['Data year'], rotation=90) # can be turned off                        
                # https://stackoverflow.com/questions/10998621/rotate-axis-text-in-python-matplotlib
                #plt.suptitle(indicator + ' For a Year')
                #axs[0].set_xlabel('Years')
                #axs[0].set_ylabel('Values')

                fig.savefig('./saved_images_from_visualizations/cancer_mortality_years_country.png')

                #plt.show()

                """
                fig, axs = plt.subplots(1, 2, figsize=(10, 8), sharey=False, gridspec_kw = {'width_ratios':ratios})
                axs[0].plot(indicator_data['Region'], indicator_data['Value']) #, c=c                
                axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off 

                plt.suptitle(indicator + ' Over a Year')
                axs[0].set_xlabel('Regions/Countries')
                axs[0].set_ylabel('Values')

                fig.savefig('./saved_images_from_visualizations/cancer_mortality_2017_country_region_x.png')

                plt.show()  
                """
                
                # end of brought from one year
                
                #f.canvas.draw()
                

                
            
                
            elif chart_type == 'Bar':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                plt.bar(indicator_data['Data year'], indicator_data['Value'])
            elif chart_type == 'Hor Bar':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                plt.barh(indicator_data['Region'], indicator_data['Value'])
            elif chart_type == 'Pie':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                #plt.barh(indicator_data['Region'], indicator_data['Value'])                
                # 'ax' is not defined in this function; draw on the left subplot instead
                axs[0].pie(indicator_data['Value'], labels=indicator_data['Data year'], autopct="%1.1f%%")
            else:
                indicator_data['Value'] = indicator_data['Value'].div(benchmark_value)
                
                # for country color codes
                c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ] 
        
        
                axs[0].scatter(indicator_data['Data year'], indicator_data['Value'], s = indicator_data['Value'] * bubble_scale, c=c)
                
                plt.suptitle(indicator + ' Over Years \n Values are in multiples of Benchmark (' + str(benchmark_value) + ')' )
                axs[0].set_xlabel('Years')
                axs[0].set_ylabel('Values/Benchmark Value(' + str(benchmark_value) +')' )
        
                fig.savefig('./saved_images_from_visualizations/transport_mortality_over_years.png')
            
                #plt.show()
            
            
                # show country colors
                axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='o')
                
                count = list(m) #list(region_colors_dict.keys())
                #print(count)
                for j in range(len(c)):
                    axs[1].annotate(count[j],  (1, j*5))
                    
                axs[1].set_xlabel('Country and Colors')
                axs[1].set_ylabel('')

                axs[1].set_xticks([])
                axs[1].set_yticks([])
                
            if (animate):
                plt.pause(0.01)
            #f.canvas.draw()
    
    #plt.show()
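NumPy's generator and Python's random module keep separate state, so a seed set on one has no effect on draws from the other. Below is a minimal sketch of reproducible per-region color codes using a private random.Random instance. The helper assign_colors is hypothetical. Note that random.randint includes the upper bound, while np.random.randint excludes it.

```python
import random

def assign_colors(regions, seed=0):
    """Map each distinct region to a pseudo-random integer in [0, 255]."""
    rng = random.Random(seed)  # private generator, unaffected by other seeding
    # dict.fromkeys drops duplicates while keeping first-seen order
    return {region: rng.randint(0, 255) for region in dict.fromkeys(regions)}

colors = assign_colors(['Canada', 'OECD average', 'Canada'])
print(colors)
```

Calling the helper twice with the same regions and seed returns the same mapping, so the colors stay stable between runs.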

Plot over a region for different countries

Some of the plots that this method generates can also be generated with the method above. This method is mostly for averaging over years. However, the dataset might not have data for every year, so the years selected for a visualization need to make sense in real-world terms. Ideally, I would let users select multiple years individually and show them which data are available and appropriate (I am considering that out of scope for now).
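To illustrate averaging when some years are missing, here is a small sketch that averages each region over only the selected years that have a value and reports how many years went into each mean. The function average_over_years, the tuple layout, and the sample numbers are assumptions for illustration, not the notebook's own code.

```python
def average_over_years(records, years):
    """Average each region's values over the selected years that have data.

    `records` is an iterable of (region, year, value) tuples; value may be None.
    Returns {region: (mean, number_of_years_used)}.
    """
    wanted = set(years)
    totals = {}
    for region, year, value in records:
        if year not in wanted or value is None:
            continue  # year not selected, or no data for that year
        total, count = totals.get(region, (0.0, 0))
        totals[region] = (total + value, count + 1)
    return {region: (total / count, count)
            for region, (total, count) in totals.items()}

# Made-up values: region B has no data for 2011
records = [
    ('A', 2010, 10.0), ('A', 2011, 20.0), ('A', 2012, 99.0),
    ('B', 2010, 5.0), ('B', 2011, None),
]
print(average_over_years(records, [2010, 2011]))
```

Reporting the count next to each mean shows when a region's average rests on fewer years than the others.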

plt.rcParams['figure.figsize'] = [10, 10]
def plot_measure_by_regions(year, indicator, bubble_scale, chart_type = '', ratios=[3,1], provincial_only=False, all_years=all_years):    
    #print('year, indicator', year, indicator)    
    plt.ion()    
    # now f = plt.figure()
    # now ax = f.gca()
    #plt.xticks([])
    # now f.show()
    
    # indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicators[0] ) ]
    # (20, 10)
    fig, axs = plt.subplots(1, 2, figsize=(20, 10), sharey=False, gridspec_kw = {'width_ratios':ratios})
    #plt.xticks(rotation=90)
    
    if ( year > 1 ):
        #indicator_data = indicator_data [ indicator_data['Data year'] == year]
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == year) ]
        
        # for country color codes
        c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
        m =  [ x  for x in indicator_data['Region'] ] 
        c_code =  [ x[0:3]  for x in indicator_data['Region'] ]  
    
    
        #print(indicator_data)
        if chart_type == 'Line':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.plot(indicator_data['Region'], indicator_data['Value'])
            
            
            axs[0].plot(indicator_data['Region'], indicator_data['Value']) #,  s=indicator_data['Value'] * bubble_scale, c=c )
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
            
            fig.savefig('./saved_images_from_visualizations/' + 'line_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            
        elif chart_type == 'Bar':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.bar(indicator_data['Region'], indicator_data['Value'])
            
            indicator_data = indicator_data.sort_values(by=['Value'])
            axs[0].bar(indicator_data['Region'], indicator_data['Value']) #,  s=indicator_data['Value'] * bubble_scale, c=c )
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
            
            fig.savefig('./saved_images_from_visualizations/' + 'bar_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            
        elif chart_type == 'Hor Bar':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                #plt.barh(indicator_data['Region'], indicator_data['Value'])
                
                indicator_data = indicator_data.sort_values(by=['Value'])
                axs[0].barh(indicator_data['Region'], indicator_data['Value']) #,  s=indicator_data['Value'] * bubble_scale, c=c )
                plt.suptitle(indicator + ' Over Regions')
                axs[0].set_xlabel('Regions and Countries')
                axs[0].set_ylabel('Values')
                #axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
            
                fig.savefig('./saved_images_from_visualizations/' + 'hor_bar_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
        elif chart_type == 'Pie':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                #plt.barh(indicator_data['Region'], indicator_data['Value'])                
                #ax.pie(indicator_data['Value'], labels=indicator_data['Region'], autopct="%1.1f%%")
                
                
                axs[0].pie(indicator_data['Value'], labels = indicator_data['Region'], ) #,  s=indicator_data['Value'] * bubble_scale, c=c )
                plt.suptitle(indicator + ' Over Regions')
                axs[0].set_xlabel('Regions and Countries')
                axs[0].set_ylabel('Values')
                axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    

                fig.savefig('./saved_images_from_visualizations/' + 'pie_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
        else:
            axs[0].scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale, c=c )
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
                                    
            
            
            # plt.show()
            # show country colors
            axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='o')

            count = list(m) #list(region_colors_dict.keys())
            #print(count)
            for j in range(len(c)):
                axs[1].annotate(count[j],  (1, j*5))

            axs[1].set_xlabel('Country and Colors')
            axs[1].set_ylabel('')

            axs[1].set_xticks([])
            axs[1].set_yticks([])
            
            fig.savefig('./saved_images_from_visualizations/' + 'bubble_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            
            
        # now plt.show()
    
    # multiple year selected
    else:        
        #indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  &  (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if (provincial_only == True):
            # aYear is not defined inside this function; average over all years, as in the country case above
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                          (measure_data['Type of region'].isin(['Province']))]


        indicator_data = indicator_data.set_index(['Region'])
       
        # average each region over the available years (numeric columns only)
        x = indicator_data.groupby(['Region']).mean(numeric_only=True)

        # for country color codes: one entry per averaged region, so they line up with the plotted points
        c =  [ region_colors_dict[r]  for r in x.index ]    
        m =  [ r  for r in x.index ] 
        c_code =  [ r[0:3]  for r in x.index ] 
            
       
        
        #print(x.index, x['Value'])
            
            
        ### for aYear in all_years:
        # now indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == aYear) ]

        """
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  &  (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if (provincial_only == True):
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                          (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Province']))]


        indicator_data = indicator_data.set_index(['Region'])
        indicator_data['mean'] = indicator_data.groupby(['Region']).mean()
        print(indicator_data)
        """

        

        if chart_type == 'Line':
            
            axs[0].plot(x.index, x['Value']) #,  s = x['Value'] * bubble_scale, c=c )
            
            # though I am repeating this block of code - this can be just placed at the end of the else block
            # just trying to save on debug time            
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')            
            axs[0].set_xticklabels(x.index, rotation=90) # can be turned off   
            
            fig.savefig('./saved_images_from_visualizations/' + 'line_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            


        elif chart_type == 'Bar':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            # now plt.bar(indicator_data['Region'], indicator_data['Value'])
            
            
            x = x.sort_values(by=['Value'])            
            axs[0].bar(x.index, x['Value']) #,  s = x['Value'] * bubble_scale, c=c )
            
            # though I am repeating this block of code - this can be just placed at the end of the else block
            # just trying to save on debug time            
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')            
            axs[0].set_xticklabels(x.index, rotation=90) # can be turned off   
            
            fig.savefig('./saved_images_from_visualizations/' + 'bar_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
                                    
        elif chart_type == 'Hor Bar':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.barh(indicator_data['Region'], indicator_data['Value'])
            
            x = x.sort_values(by=['Value'])            
            axs[0].barh(x.index, x['Value']) #,  s = x['Value'] * bubble_scale, c=c )
            
            # though I am repeating this block of code - this can be just placed at the end of the else block
            # just trying to save on debug time            
            plt.suptitle(indicator + ' Over Regions')
            #axs[0].set_xlabel('Regions and Countries')
            axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Regions and Countries')            
            #axs[0].set_xticklabels(x.index, rotation=90) # can be turned off 
            fig.savefig('./saved_images_from_visualizations/' + 'hor_bar_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            
        elif chart_type == 'Pie':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.barh(indicator_data['Region'], indicator_data['Value'])                
            # now ax.pie(indicator_data['Value'], labels=indicator_data['Region'], autopct="%1.1f%%")
            
            
            
            
            axs[0].pie(x['Value'], labels=x.index) #, c=c                 
            axs[0].set_xticklabels(x['Value'], rotation=90) # can be turned off                        
            # https://stackoverflow.com/questions/10998621/rotate-axis-text-in-python-matplotlib
            plt.suptitle(indicator + ' Over Regions')
            #axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Countries/Regions')
            
            fig.savefig('./saved_images_from_visualizations/' + 'pie_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
            

        else:
            #axs[0].scatter(indicator_data['Region'], indicator_data['Value'],  s = indicator_data['Value'] * bubble_scale, c=c )
            axs[0].scatter(x.index, x['Value'],  s = x['Value'] * bubble_scale, c=c )
            
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')
            #axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
            axs[0].set_xticklabels(x.index, rotation=90) # can be turned off    

            #plt.show()
            # show country colors
            axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='o')

            count = list(m) #list(region_colors_dict.keys())
            #print(count)
            for j in range(len(c)):
                axs[1].annotate(count[j],  (1, j*5))

            axs[1].set_xlabel('Country and Colors')
            axs[1].set_ylabel('')

            axs[1].set_xticks([])
            axs[1].set_yticks([])

            fig.savefig('./saved_images_from_visualizations/' + 'bubble_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
                
            #now f.canvas.draw()
            
    #plt.suptitle(indicator + 'Over Regions and Years')
    #plt.xlabel('Regions and Countries')
    #plt.ylabel('Values')

def plot_map_measure_by_regions(year, indicator, bubble_scale, chart_type = '', provincial_only=False, all_years=all_years):    
    #print('year, indicator', year, indicator)    
               
    if ( year > 1 ):
        
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == year) ]          
        #print(indicator_data)
        magnitudes = indicator_data[['Region', 'Value']]
        
        
        # Make this plot larger.
        plt.figure(figsize=(16,12))


        eq_map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
                      lat_0=0, lon_0=-130)
        eq_map.drawcoastlines()
        eq_map.drawcountries()
        eq_map.fillcontinents(color = 'gray')
        eq_map.drawmapboundary()
        eq_map.drawmeridians(np.arange(0, 360, 30))
        eq_map.drawparallels(np.arange(-90, 90, 30))

        min_marker_size = 2.5 #* bubble_scale
        #for lon, lat, mag in zip(lons, lats, magnitudes):
        #for i in range(indicator_data.shape[0]):
        for reg, val in zip(magnitudes['Region'], magnitudes['Value']):
            #try:
            #reg = magnitudes[i:i+1]['Region']    
            #reg = magnitudes['Region'][i]
            #print(reg)
            lat = lats_dict[reg]
            lon = lons_dict[reg]   

            #print(lat, lon)
            x, y = eq_map(lon, lat)
            mag = val #magnitudes['Value'][i] #magnitudes[i:i+1]['Value']    
            #print(mag)
            msize = mag * min_marker_size/bubble_scale
            #print(msize, msize)
            #marker_string = get_marker_color(mag)
            #eq_map.plot(x, y, marker_string, markersize=msize)
            eq_map.plot(x, y,  marker='o', markersize=msize)
            
            #x, y = eq_map(0, 0)
            #eq_map.plot(x, y, marker='D',color='m')

            #plt.show()

            #except:
                #print('hello')
                #continue

        title_string = indicator
        title_string += ' for year ' + str(year)
        plt.title(title_string)
        #plt.show()
        
        plt.savefig('./saved_images_from_visualizations/' + 'geo_plot_for_a_year_' +indicator.replace(' ', '_')[0:5] + '_' + str(np.random.randint(0, 99999)) + '.png')
        plt.show()
        
        
        #plt.suptitle(indicator + 'Over Regions and Years')
        #plt.xlabel('Regions and Countries')
        #plt.ylabel('Values')
        #plt.show()
        
    else:
        # this is to plot over multiple years
        # average taken over multiple years will be plotted
        # I could make this more user friendly and data friendly 
        # by giving users the option to select years, countries; also by informing to what extent data are available (skipping as that will extend the work much)
        
        #indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  &  (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if (provincial_only == True):
            # aYear is not defined inside this function; average over all years, as in the country case above
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                          (measure_data['Type of region'].isin(['Province']))]

        #print(indicator_data)
        # for country color codes
        """
        c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
        m =  [ x  for x in indicator_data['Region'] ] 
        c_code =  [ x[0:3]  for x in indicator_data['Region'] ] 
        """
        
        indicator_data = indicator_data.set_index(['Region'])
        # average each region over the available years (numeric columns only)
        x = indicator_data.groupby(['Region']).mean(numeric_only=True)
        
        #print(x)
        #print(x['Value'][0])
        
        
        
        #print(indicator_data)
        # not used
        magnitudes = pd.DataFrame()
        magnitudes['Region'] = x.index
        magnitudes['Value'] = x['Value']
        
        #print(magnitudes)
        
        # Make this plot larger.
        plt.figure(figsize=(16,12))


        eq_map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
                      lat_0=0, lon_0=-130)
        eq_map.drawcoastlines()
        eq_map.drawcountries()
        eq_map.fillcontinents(color = 'gray')
        eq_map.drawmapboundary()
        eq_map.drawmeridians(np.arange(0, 360, 30))
        eq_map.drawparallels(np.arange(-90, 90, 30))

        min_marker_size = 2.5 #* bubble_scale
        #for lon, lat, mag in zip(lons, lats, magnitudes):
        #for i in range(indicator_data.shape[0]):
        #for reg, val in zip(magnitudes['Region'], magnitudes['Value']):
        for reg, val in zip(x.index, x['Value']):
            try:
                #reg = magnitudes[i:i+1]['Region']    
                #reg = magnitudes['Region'][i]
                #print(reg, val)
                lat = lats_dict[reg]
                lon = lons_dict[reg]   

                #print(lat, lon)
                px, py = eq_map(lon, lat)  # map coordinates; separate names avoid overwriting the DataFrame x
                mag = val #magnitudes['Value'][i] #magnitudes[i:i+1]['Value']    
                #print(mag)
                msize = mag * min_marker_size/bubble_scale
                #print(msize, msize)
                #marker_string = get_marker_color(mag)
                #eq_map.plot(x, y, marker_string, markersize=msize)
                eq_map.plot(px, py,  marker='o', markersize=msize)

                #x, y = eq_map(0, 0)
                #eq_map.plot(x, y, marker='D',color='m')

                #plt.show()

            except Exception:
                # skip regions that cannot be placed on the map (e.g. no entry in lats_dict or lons_dict)
                continue

        title_string = indicator
        title_string += ' Average for selected years '
        plt.title(title_string)
        #plt.show()
        
        plt.savefig('./saved_images_from_visualizations/' + 'geo_plot_average_over_multiple_years_' +indicator.replace(' ', '_')[0:5] + '_' + str(np.random.randint(0, 99999)) + '.png')
        plt.show()

Heatmap to compare across indicators and countries

Options implemented:

Heatmap for one year, one indicator for a health-aspect across countries

Heatmap for one year, all indicators for a health-aspect across countries

Heatmap for one year, one indicator for a health-aspect across Canadian provinces

Heatmap for one year, all indicators for a health-aspect across Canadian provinces

Heatmap for all years with mean values, one indicator for a health-aspect across countries

Heatmap for all years with mean values, all indicators for a health-aspect across countries

Heatmap for all years with mean values, one indicator for a health-aspect across Canadian provinces

Heatmap for all years with mean values, all indicators for a health-aspect across Canadian provinces

Is taking the mean over multiple years a pragmatic approach? It can be an effective measure provided data exist for all of those years (otherwise the years with data will dominate). In my case, data are not available for all indicators for all years. We can then either plot a single year, or I could provide an interface to select years, indicators, and countries for custom comparison, which would be a long task. Hence, I am providing a tool that can be extended in different ways.
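As a rough illustration of what the mean-over-years heatmap is built from, the sketch below pivots a small long-format table into a Region by Indicator matrix. The column names follow this notebook; the rows and the names `rows`, `means` and `counts` are invented for the example. The second pivot is an assumption of mine, not something the notebook does: it counts how many yearly values stand behind each cell.

```python
import pandas as pd

# Hypothetical long-format rows; column names follow measure_data
rows = pd.DataFrame({
    'Region': ['Canada', 'Canada', 'France', 'France', 'France'],
    'Indicator': ['A', 'A', 'A', 'B', 'B'],
    'Data year': [2016, 2017, 2017, 2016, 2017],
    'Value': [2.0, 4.0, 6.0, 1.0, 3.0],
})

# Mean per (Region, Indicator) cell: the matrix a heatmap would colour
means = pd.pivot_table(rows, index='Region', columns='Indicator',
                       values='Value', aggfunc='mean')

# How many yearly values back each cell
counts = pd.pivot_table(rows, index='Region', columns='Indicator',
                        values='Value', aggfunc='count')

print(means)
print(counts)
```

Cells with no data at all come out as NaN and are left blank by a heatmap, while cells backed by a single year look just as solid as cells backed by several. Showing the counts alongside the means is one way to expose that difference.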

# ref : https://cmdlinetips.com/2019/01/how-to-make-heatmap-with-seaborn-in-python/


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def plot_heatmap_across_indicators(year, indicator = '', ratios = [3,1], provincial_only = False, all_years = all_years, fig_size = [10, 10]):
    
    if ( year > 1 ):
        heatmap_data = measure_data.loc[  (measure_data['Data year'] == year)  & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if ( provincial_only == True ):
            heatmap_data = measure_data.loc[  (measure_data['Data year'] == year)  & (measure_data['Type of region'].isin(['Province']) )  ]
            
        if ( indicator != ''):
            heatmap_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == year) & ( measure_data['Type of region'].isin(['Country', 'Canada']) ) ]
            if ( provincial_only == True ):
                heatmap_data = measure_data.loc[ (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == year)  & (measure_data['Type of region'].isin(['Province']) )  ]

        #print(heatmap_data)
        indicator_data_heatmap = heatmap_data[ ['Region', 'Value', 'Indicator', 'Data year']  ]
        
        #print(indicator_data_heatmap)
        
        
        heatmap1_data = pd.pivot_table(indicator_data_heatmap, values='Value',  index=['Region'], columns='Indicator')
        plt.figure(figsize=fig_size)
        ax = sns.heatmap(heatmap1_data, cmap="YlGnBu")
        
        # https://stackoverflow.com/questions/48470251/move-tick-marks-at-the-top-of-the-seaborn-plot?noredirect=1&lq=1
        ax.xaxis.set_ticks_position('top')
        ax.set_xticklabels(heatmap1_data.columns, rotation=90) # label each heatmap column with its indicator name
        
        #plt.show()

                            
        title_string = measure_file.result[0:len(measure_file.result)-4]  + ':' + indicator
        title_string += ' for year ' + str(year)
        plt.title(title_string)
        #plt.show()
        
        plt.savefig('./saved_images_from_visualizations/' + 'heatmap_' + measure_file.result[0:len(measure_file.result)-4] + indicator.replace(' ', '_')[0:5] + '_' + str(np.random.randint(0, 99999)) + '.png')
        #plt.show()
        
        
        
    else:
        # this is to plot over multiple years
        # average taken over multiple years will be plotted
        # I could make this more user friendly and data friendly 
        # by giving users the option to select years, countries; also by informing to what extent data are available (skipping as that will extend the work much)
        
        
        heatmap_data = measure_data.loc[   (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if ( provincial_only == True ):
            heatmap_data = measure_data.loc[ (measure_data['Type of region'].isin(['Province']) )  ]
            
        if ( indicator != '' ):
            heatmap_data = measure_data.loc[  (measure_data['Indicator'] == indicator) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
            if ( provincial_only == True ):
                heatmap_data = measure_data.loc[ (measure_data['Indicator'] == indicator)   & (measure_data['Type of region'].isin(['Province']) )  ]

        #print(heatmap_data)
        indicator_data_heatmap = heatmap_data[ ['Region', 'Value', 'Indicator', 'Data year']  ]
        
                
        indicator_data = indicator_data_heatmap.set_index(['Region'])
        
        # x is not used, mean is calculated by seaborn
        #x = indicator_data.groupby(['Region', 'Indicator']).mean()        
        #print(x.index)
        #print(x)
        
        
        #heatmap1_data = pd.pivot_table(indicator_data_heatmap, values='Value',  index=['Region'],  columns='Indicator') 
        #heatmap1_data = pd.pivot_table(x, values='Value',  index=x.index,  columns='Indicator') 
        heatmap1_data = pd.pivot_table(indicator_data, index = indicator_data.index, columns='Indicator', values='Value', aggfunc = 'mean') 
        plt.figure(figsize=fig_size)
        #ax = 
        sns.heatmap(heatmap1_data, cmap="YlGnBu")
        
        #ax.xaxis.set_ticks_position('top')
        #ax.set_xticklabels(indicator_data_heatmap['Indicator'], rotation=90) # can be turned off                        
        
        
        #plt.show()

                            
        title_string = measure_file.result[0:len(measure_file.result)-4] + ':' + indicator
        
        all_years_str = ''
        for aYear in all_years:
            if (aYear > 0):
                all_years_str += str(aYear) + ', '
            
            
        year_str = ' for year '  + str(year) if year > 0 else ' Mean over years \n' + all_years_str
        title_string += year_str
        plt.title(title_string)
        #plt.show()
        
        plt.savefig('./saved_images_from_visualizations/' + 'heatmap_over_years_' + measure_file.result[0:len(measure_file.result)-4] + indicator.replace(' ', '_')[0:5] + '_' + str(np.random.randint(0, 99999)) + '.png')
        #plt.show()

Create the components for the UI interface

Users will interact with the system to generate custom visualizations

START-RELOAD-DATA

Creating these UI components might not be strictly necessary

# create the interactive interface
def f(indicator):
    return indicator

#print ('Measure' + measure_file.result)
#print('Select parameters\n')
#print('Select Indicator with or without Canadian data')
indicator_country = interactive(f, indicator=indicators);
#display(indicator_country)
indicator_country.result




def f(year):
    return year

#print('Select Year: 0 = all years')
year_country = interactive(f, year=all_years);
#display(year_country)
year_country.result



def f(what_to_plot):
    return what_to_plot

#print('Select what to plt:')
what_to_plot = {}
# this can come from an excel file as well
what_to_plot['dietician_care_esrd_rf_c_patient_chars_2018.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
#what_to_plot['access-to-care.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
#what_to_plot['indicator-methodology.xls'] = ['Indicator Values over years', 'Indicator Values over countries']
#what_to_plot['non-med-determinants.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
#what_to_plot['patient-safety.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
#what_to_plot['prescribing-primary.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
#what_to_plot['quality-of-care.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']


plots = interactive(f, what_to_plot=what_to_plot[measure_file.result]);
#display(plots)
plots.result


def f(chart_type):
    return chart_type

#print('Select chart type:')
chart_types = ['Bubble', 'Bar', 'Hor Bar', 'Pie', 'Line']
chart = interactive(f, chart_type=chart_types);
#display(chart)
chart.result


def f(scale):
    return scale

#print('Select bubble size')
bubble_scale_country = interactive(f, scale=(0, 100, 1));
#display(bubble_scale_country)
#bubble_scale_country.result = 10


# for heatmap, do we consider the indicator or not
def f(heatmap_consider_indicator):
    return heatmap_consider_indicator

heatmap_consider_indicator_ = interactive(f, heatmap_consider_indicator=False);
#display(heatmap_consider_indicator_)
heatmap_consider_indicator_.result

Render the UI controls

Note: use_data_with_canada indicates when Canadian data are available. Also, Canada_Indicator and Year from the right will be used

com_with_cdn and animate controls are not used so far

This cell must be run as part of the data reload

# Reference: https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20Styling.html
print('Indicator and year list on the right, represent where Canadian data exist')
from ipywidgets import Button, GridBox, Layout, ButtonStyle
GridBox(children=[
                     
                    indicator_country, 
                    year_country, 
                    plots, chart,
                    bubble_scale_country, 
                    heatmap_consider_indicator_, 
                ],
        
        layout=Layout(
            width='100%',
            grid_template_rows='auto auto',
            grid_template_columns='50% 50%',
            )
       )

END-RELOAD-DATA

# assign a default value
bubble_scale_country.result = 10

Create the Plot Based on User Selections

plt.rcParams['figure.figsize'] = [10, 8]
#print(chart.result)
#provincial_only = False
heatmap_consider_indicator = heatmap_consider_indicator_.result
# Data irrespective of whether Canada has data or not
indicator = indicator_country.result
year = year_country.result
bubble_scale = bubble_scale_country.result

    
#print(indicator, year, bubble_scale)
if plots.result == 'Indicator Values over years':
    plot_measure_by_years(year, indicator, bubble_scale, chart_type=chart.result, ratios=[3,1], 
         fig_size=[20,10], sec_fig=False)
elif plots.result == 'Geo plot':
    plot_map_measure_by_regions(year, indicator, bubble_scale, chart_type=chart.result, \
                                provincial_only=provincial_only_plot.result)
    
elif plots.result == 'Heatmap':
    if ( heatmap_consider_indicator == False ):
        indicator = ''
    plot_heatmap_across_indicators(year, indicator, provincial_only=provincial_only_plot.result, fig_size=[10, 10])   
else: # Indicator Values over countries
    plot_measure_by_regions(year, indicator, bubble_scale, chart_type=chart.result, \
                            ratios=[3,1], provincial_only=provincial_only_plot.result)
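The if/elif chain above could also be written as a lookup table from the selected plot name to a function. The sketch below only illustrates that idea: the four stub functions are hypothetical stand-ins for the notebook's plotting functions, and `PLOTTERS` and `dispatch` are names I made up.

```python
# Stub plotters standing in for the notebook's plotting functions (hypothetical)
def plot_over_years(**params):
    return 'years'

def plot_geo(**params):
    return 'geo'

def plot_heatmap(**params):
    return 'heatmap'

def plot_over_regions(**params):
    return 'regions'

# Map each UI selection to the function that draws it
PLOTTERS = {
    'Indicator Values over years': plot_over_years,
    'Geo plot': plot_geo,
    'Heatmap': plot_heatmap,
}

def dispatch(selection, **params):
    """Call the plotter registered for the selection; fall back to the region plot."""
    plotter = PLOTTERS.get(selection, plot_over_regions)
    return plotter(**params)

print(dispatch('Heatmap', year=2017))
print(dispatch('Indicator Values over countries', year=2017))
```

The fallback mirrors the final else branch above, where any other selection is treated as "Indicator Values over countries".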

Section: Research Questions and Answers

Please select the related measure and reload all data (the cells marked START-RELOAD and END-RELOAD). Otherwise, the following visualizations might not work unless they are for the currently selected measure.

A better solution would be to place the measure selection here and re-run all of the data reload code from this point.
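One possible shape for such a reload step is sketched below. It is an assumption, not the notebook's code: `summarize_measure` is a hypothetical helper that rebuilds the selection lists from an already loaded table, and the convention that 0 means "all years" is inferred from the comments in the UI cells.

```python
import pandas as pd

def summarize_measure(measure_data):
    """Rebuild the indicator and year lists from a loaded measure table (hypothetical helper)."""
    indicators = sorted(measure_data['Indicator'].dropna().unique())
    # 0 is assumed to mean "all years", as in the year selector of this notebook
    years = [0] + sorted(int(y) for y in measure_data['Data year'].dropna().unique())
    return indicators, years

# Made-up table with the same column names as measure_data
demo = pd.DataFrame({
    'Indicator': ['B', 'A', 'A'],
    'Data year': [2017, 2016, 2017],
    'Region': ['Canada', 'Canada', 'France'],
    'Value': [1.0, 2.0, 3.0],
})

demo_indicators, demo_years = summarize_measure(demo)
print(demo_indicators)
print(demo_years)
```

Calling a function like this right after the measure is chosen would keep the selection lists in step with the data without re-running the cells between the two markers by hand.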

The visualizations below are plotted independently; they are the ones used in the detailed presentation document.

All of these can be generated using the UI. I am just showing the specific cases that I plotted using the UI and included in my report.

How does Canada compare for a health status indicator such as Cancer Mortality (F) for 2017 (per 100k)? Example: Cancer Mortality, 2017:

year = 2017
indicator = 'Cancer Mortality (F)'

# does not matter
bubble_scale = 11

chart_type = 'Line'

# subplot ratios
ratios = [1000, 1]

# not implemented
animate = False

# if for Provinces - Canada
provincial_only = False

# figure size
fig_size = [5, 5]

# not important
sec_fig = True

print('The right line on the plot does not count; comes from the right subplot; that is not relevant for this case')
plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

Research Question: How has an indicator such as Transport Mortality changed over time for different countries?

year = 0
indicator = 'Transport Accident Mortality (M)'

# does not matter
bubble_scale = 100

chart_type = 'Bubble'

# subplot ratios
# as we will show countries by colors, ratios are useful
ratios = [3, 1]

# not implemented
animate = False

# if for Provinces - Canada
provincial_only = False

# figure size
fig_size = [20, 10]

# not important
sec_fig = True

print('Note: Y axis values need to be multiplied by Benchmark value to get actual values')
plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

Research Question: How do Canadian provinces perform for Transport Mortality (M) for 2017?

year = 2017
indicator = 'Transport Accident Mortality (M)'

# does not matter
bubble_scale = 100

# will plot for other chart types as well
chart_type = 'Line'

# subplot ratios
# as we will show countries by colors, ratios are useful
ratios = [10, 1]

# not implemented
animate = False

# Matters as we are plotting for Canadian provinces
provincial_only = True

# figure size
fig_size = [10, 8]

# not important
sec_fig = True

print('Note: Y axis values need to be multiplied by Benchmark value to get actual values')
plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

Research, Analysis, and Visualization Concern:

How does transport mortality compare across countries in 2017, based on the data we have? Visualize in different formats.

chart_types = ['Bubble', 'Bar', 'Hor Bar', 'Pie', 'Line']

year = 2017
indicator = 'Transport Accident Mortality (M)'

# does not matter
bubble_scale = 32

# will plot for other chart types as well
chart_type = 'Line'

# subplot ratios
# as we will show countries by colors, ratios are useful
ratios = [10, 1]

# not implemented
animate = False

# False here, as we are plotting countries rather than Canadian provinces
provincial_only = False

# figure size
fig_size = [5, 5]

# not important
sec_fig = True


for chart_type in chart_types:
    plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

How does transport mortality compare across Canadian provinces in 2017, based on the data we have? Visualize in different formats.

Note: the code from the above cell needs to be executed first, as I am reusing some variables

# Matters as we are plotting for Canadian provinces
provincial_only = True

# figure size
fig_size = [10, 8]

# not important
sec_fig = True

ratios = [3, 1]

for chart_type in chart_types:
    plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

For Research Question: What is the average alcohol consumption across countries over the last couple of years?

# you can add or remove years, to get average measures over those years
year = 0 # 0 indicates all years for the list all_years
all_years = [2013, 2014, 2015, 2016, 2017] 
indicator = 'Alcohol Consumption: Adults'
bubble_scale = 91 # NA
chart_type='Bar' # change to Line, Bubble, Pie, 'Hor Bar'
provincial_only = False # if you set true only provincial data will be plotted

# plot_measure_by_regions(year, indicator, bubble_scale, chart_type = '', ratios=[3,1], provincial_only=False, all_years=all_years):    
plot_measure_by_regions(year, indicator, bubble_scale, chart_type, ratios=[3,1], provincial_only=provincial_only, all_years=all_years)
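The cell above relies on the convention that `year = 0` means "use every year in `all_years`" and that the measure is then averaged over those years. A minimal sketch of that convention, assuming the data is available as `(region, year, value)` records. The helper names and the sample values are hypothetical and only illustrate the idea; the actual `plot_measure_by_regions` implementation may differ:

```python
from statistics import mean


def years_to_use(year, all_years):
    """Return the years to include: year == 0 means every year in all_years."""
    return list(all_years) if year == 0 else [year]


def average_by_region(records, year, all_years):
    """Average an indicator per region over the selected years.

    records: iterable of (region, year, value) tuples (hypothetical layout).
    """
    selected = set(years_to_use(year, all_years))
    values_by_region = {}
    for region, rec_year, value in records:
        if rec_year in selected:
            values_by_region.setdefault(region, []).append(value)
    return {region: mean(values) for region, values in values_by_region.items()}


# Made-up sample data, for illustration only
sample = [
    ('Canada', 2014, 8.0), ('Canada', 2015, 9.0),
    ('Country A', 2014, 10.0), ('Country A', 2015, 12.0),
]
print(average_by_region(sample, 0, [2014, 2015]))     # average over both years
print(average_by_region(sample, 2015, [2014, 2015]))  # single year only
```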

The above case, just for the year 2015

#year = 0 # 0 indicates all years for the list all_years
#all_years = [2013, 2014, 2015, 2016, 2017] 
indicator = 'Alcohol Consumption: Adults'
#bubble_scale = 91 # NA
#chart_type='Bar' # change to Line, Bubble, Pie, 'Hor Bar'
provincial_only = False # if you set true only provincial data will be plotted


print('To plot a single year only, change as follows')
print('For Research Question: What is the average alcohol consumption across countries for 2015')
# to plot a single year only, set the year explicitly
year = 2015
# other chart types, such as Bar, will work but may have issues
chart_type='Bar' 
#plot_measure_by_regions(year, indicator, bubble_scale, chart_type,  ratios=[3,1], provincial_only)      
plot_measure_by_regions(year, indicator, bubble_scale, chart_type, ratios=[3,1], provincial_only=provincial_only, all_years=all_years)

Research Question: For 2015, which country smoked the most? A geo plot is used here; however, plots like those in the sections above could also be used.

year = 2015
indicator = 'Smoking: Adults (M)'

# the plot on the presentation used 2
# note : this is inverse size of the bubble
bubble_scale = 3
chart_type = ''
# are not relevant
# chart_type = '',  all_years=all_years, also provincial_only=False

plot_map_measure_by_regions(year, indicator, bubble_scale, chart_type = '', provincial_only=False, all_years=all_years)

Research Question: Where in the world is obesity more common?

year = 0
indicator = 'Obesity Reported: Adults'
# the plot on the presentation used 2
# note : this is inverse size of the bubble
bubble_scale = 2
chart_type = ''
# are not relevant
# chart_type = '',  all_years=all_years, also provincial_only=False

plot_map_measure_by_regions(year, indicator, bubble_scale, chart_type = '', provincial_only=False, all_years=all_years)
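The comments above describe `bubble_scale` as the inverse size of the bubble, so a larger value gives smaller bubbles. A minimal sketch of that relationship, assuming the marker size is the indicator value divided by `bubble_scale`. The function `bubble_sizes` and the sample values are hypothetical; how `plot_map_measure_by_regions` actually sizes its markers is not shown in this article:

```python
def bubble_sizes(values, bubble_scale):
    """Scale marker sizes inversely with bubble_scale (assumed behavior)."""
    if bubble_scale <= 0:
        raise ValueError('bubble_scale must be positive')
    return [value / bubble_scale for value in values]


values = [12.0, 24.0, 36.0]  # made-up indicator values
print(bubble_sizes(values, 2))  # larger bubbles
print(bubble_sizes(values, 3))  # smaller bubbles
```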

Research Question: Using a heatmap, how do different countries compare on their Non Medical Determinants for 2014?

Note: current selection needs to be: Non Medical Determinants.

You can change the Health Measure; all data then have to be reloaded by executing the sections marked START-Reload and END-Reload.

year = 2014
indicator = ''
provincial_only_plot = False
plot_heatmap_across_indicators(year, indicator, provincial_only=provincial_only_plot, fig_size=[10, 10])

Research question: How do different countries compare for health status indicators over the years?

Note: current selection needs to be: Health status (the code from START-Reload-data to END-RELOAD-Data needs to be executed)

year = 0
indicator = ''
provincial_only_plot = False
plot_heatmap_across_indicators(year, indicator, provincial_only=provincial_only_plot, fig_size=[10, 10])
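A heatmap across indicators needs the data arranged as a grid with one row per region and one column per indicator. A minimal sketch of one way to build such a grid, assuming the data is available as `(region, indicator, value)` records. The function `build_heatmap_matrix` and the sample records are hypothetical and are not the project's `plot_heatmap_across_indicators` implementation:

```python
def build_heatmap_matrix(records):
    """Arrange (region, indicator, value) records as a region x indicator grid.

    Missing combinations are filled with None.
    Returns (regions, indicators, matrix).
    """
    regions = sorted({region for region, _, _ in records})
    indicators = sorted({indicator for _, indicator, _ in records})
    lookup = {(region, indicator): value for region, indicator, value in records}
    matrix = [[lookup.get((region, indicator)) for indicator in indicators]
              for region in regions]
    return regions, indicators, matrix


# Made-up sample records, for illustration only
sample = [
    ('Canada', 'Indicator X', 1.0),
    ('Canada', 'Indicator Y', 4.0),
    ('Country A', 'Indicator X', 2.0),
]
regions, indicators, matrix = build_heatmap_matrix(sample)
print(regions)
print(indicators)
print(matrix)
```

The row and column labels returned here would become the axis labels of the heatmap, and the missing cell would typically be shown as blank.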

Research question: How do Canadian provinces compare for health status indicators over the years?

Note: current selection needs to be: Health status (otherwise, plots will be using current health aspect/measure that I may or may not have tested)

year = 0
indicator = ''
provincial_only_plot = True
fig_size_=[7, 5]
plot_heatmap_across_indicators(year, indicator, provincial_only=provincial_only_plot, fig_size=fig_size_)

Research Question: How does transport mortality (Female) compare across countries for 2017?

This is just an example to show the heatmap plot when the ‘heatmap_consider_indicator’ option is selected and an indicator is chosen

year = 2017
indicator = 'Transport Accident Mortality (F)'
fig_size_=[3, 5]
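# provincial_only is False because the question compares countries;
# provincial_only_plot is a plain bool in the cells above, so it has no .result attribute
plot_heatmap_across_indicators(year, indicator, provincial_only=False, fig_size=fig_size_)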

Access to Care

Please select the Access to Care measure and reload all data

Wait time for specialists in Days

year = 0
indicator = 'Wait Time: Specialist'
bubble_scale = 10
chart_type = 'Bubble'
provincial_only=False
ratios=[3,1]

plot_measure_by_regions(year, indicator, bubble_scale, chart_type=chart_type, ratios=ratios, provincial_only=False)

Access to Care: same or next day appointment

year = 0
indicator = 'Same or Next Day Appt'
bubble_scale = 10
chart_type = 'Bubble'
provincial_only=True
ratios=[3,1]

plot_measure_by_regions(year, indicator, bubble_scale, chart_type=chart_type, ratios=ratios, provincial_only=True)

Heatmap: Access to Care Indicators

year = 0
indicator = ''
plot_heatmap_across_indicators(year, indicator, provincial_only=False, fig_size=[10, 10])

The following code is supposed to be removed in the final step

Code reused from Lab 06 for the geo plot

References:

Geopy : For Geo Plot: https://pypi.org/project/geopy/

Read excel sheets: https://stackoverflow.com/questions/17977540/pandas-looking-up-the-list-of-sheets-in-an-excel-file