Crash Course in Data: Data Humanism

Bowen Jin
Published in AI Skunks · Mar 27, 2023 · 11 min read

Connecting people and data

Data humanism is a philosophy of data processing and visualization that puts human needs and values first. Rather than relying on traditional infographics, it favors a more individual, nuanced perspective on data visualization, emphasizing context and personal experience. A prominent figure in this field is Giorgia Lupi, author of the data humanism manifesto, which gives researchers and designers a framework for approaching data in a more ethical and human-centered way.

Humanism And Dataism

Humanism originated as an intellectual movement concerned with human personality and the maintenance of human dignity. It advocates a tolerant, secular culture, opposes violence and discrimination, and pursues freedom, equality, and self-worth; over time it developed into a philosophy and a full worldview. From this description we can see one of humanism's defining features: the belief that each person is valuable in himself or herself.

For a long time, humanism was considered advanced and universal. But with the development of computer science and biology, another theory was born: dataism. Dataism holds that the universe consists of data flows, and that the value of any phenomenon or entity lies in its contribution to data processing. It is arguably the first genuinely new value system humans have created since humanism itself. For dataism, the highest value is the flow of information, so freedom of information is the highest good. By equating human experience with data patterns, dataism undermines human authority as a source of meaning.

Data Humanism

Humanism and dataism are opposed in many respects, especially in how they recognize personal value. Recently, however, a new idea has emerged from the exploration of data-related technologies: data humanism, the subject of the rest of this article. Data humanism holds that data is important, but that humans should not be pushed aside; we need a way to keep data and people closely connected, and that way is data visualization. By transforming data into visual images, people can better understand your purpose and feel how the content relates to their own lives. The result of data visualization is a combination of art and data science: it is rigorous, yet it also leaves room for the author's self-expression, making it a genuine marriage of humanism and data. More broadly, data humanism is a philosophy and approach to data analysis and decision-making that emphasizes understanding the human context and impact of data. It stresses the ethical and societal implications of data usage, insists that data be used in ways that benefit individuals and society, calls for transparency and accountability, and encourages the active participation of individuals and communities in data-related decisions.

Data Humanism and Data Visualization

So far, this article has explained data humanism mostly in terms of its history and central ideas. Those are abstract interpretations; for data science, data humanism is also a concrete way of working. In my mind, it is essentially an advanced form of data visualization, one that adds more humanistic consideration and artistic ambition than traditional charts. To understand data humanism, then, we first need to understand data visualization. Below, I will use some pictures and two examples to walk through it.

Data Visualization and examples

Consider the following matrix, and try to find every 9 in it.

This is not a difficult task, but it still takes a fair amount of time. So what if the matrix becomes like this?

I marked all the nines with green squares so that you can complete the task in a second.

This is the power of vision. In this case, the matrix is equivalent to the raw data we received, and the green squares are a simple way to visualize the key points. Of course, in practice we rarely use such a simple trick; instead, we use various charts to visualize data.
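As a rough sketch of this idea (the article shows the matrices as images), the snippet below generates a random digit matrix and shades every 9 in green; the matrix values here are made up purely for illustration.

import numpy as np
import matplotlib.pyplot as plt

# Sketch of the example above: a random digit matrix with every 9 shaded green
rng = np.random.default_rng(0)
matrix = rng.integers(0, 10, size=(10, 15))

fig, ax = plt.subplots(figsize=(9, 6))
ax.set_xlim(-0.5, matrix.shape[1] - 0.5)
ax.set_ylim(matrix.shape[0] - 0.5, -0.5)   # put row 0 at the top
ax.axis('off')

for (row, col), value in np.ndenumerate(matrix):
    if value == 9:
        # draw a green square behind each 9 so it stands out at a glance
        ax.add_patch(plt.Rectangle((col - 0.5, row - 0.5), 1, 1, color='lightgreen'))
    ax.text(col, row, str(value), ha='center', va='center')

plt.show()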

Basic Charts

First, let's take a look at the first dataset we will use: Covid Deaths and Cases Worldwide.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Covid dataset, using the first column as the index,
# and show summary statistics
df = pd.read_csv('../content/sample_data/covid_worldwide.csv', index_col=0)
df.describe()

The values in this dataset are stored as strings with thousands separators, which is not convenient for analysis, so we convert the numeric columns to floats.

# Strip the thousands separators and convert each numeric column to float
numeric_cols = ['Total Deaths', 'Total Cases', 'Total Recovered',
                'Active Cases', 'Total Test', 'Population']
for col in numeric_cols:
    df[col] = df[col].astype(str).str.replace(',', '').astype(float)
df.head(10)

Here, if we want to look at the number of deaths in the ten countries with the most deaths, we can use a bar chart. Before drawing, we first need to sort the data.

# Sort by total deaths and keep the ten countries with the most deaths
topDeath = df.sort_values(by="Total Deaths", ascending=False)
topDeath = topDeath.head(10)
topDeath

plt.bar(range(10), topDeath['Total Deaths'], label="Deaths", width=0.4)
plt.xticks(range(10), topDeath['Country'], rotation=330)
plt.xlabel("Country")
plt.ylabel("Numbers")
plt.title("Top 10 Countries by Deaths")
plt.legend()
plt.show()

This is the simplest bar chart; we can see the number of deaths in each country at a glance.

But sometimes we want to know more. For example, we may want to explore the relationship between the number of tests and the number of cases in the ten countries with the most deaths.

In this case, we can use a grouped bar chart.

totalWidth = 0.8          # total width allotted to each country's group of bars
labelNums = 2             # number of series per group
barWidth = totalWidth / labelNums
seriesNums = 10

plt.bar([x for x in range(seriesNums)], topDeath['Total Test'], label="Test", width=barWidth)
plt.bar([x + barWidth for x in range(seriesNums)], topDeath['Total Cases'], label="Cases", width=barWidth)
# Centre the country labels under each group of bars
plt.xticks([x + barWidth / 2 * (labelNums - 1) for x in range(seriesNums)], topDeath['Country'], rotation=330)
plt.xlabel("Country")
plt.ylabel("Numbers")
plt.title("Tests and Cases of the Top 10 Countries by Deaths")
plt.legend()
plt.show()

In the figure above, we compare the number of tests with the number of cases in the ten countries with the most deaths.

In a similar way, we can draw a grouped bar chart with three or more bars per group, as sketched below.
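As a sketch of what that looks like, the block below adds the 'Total Recovered' column (converted earlier) as a third bar next to tests and cases; countries that do not report recoveries simply get no third bar.

# Three bars per group: tests, cases, and recoveries for the same ten countries
totalWidth = 0.8
labelNums = 3
barWidth = totalWidth / labelNums
seriesNums = 10

plt.bar([x for x in range(seriesNums)], topDeath['Total Test'], label="Test", width=barWidth)
plt.bar([x + barWidth for x in range(seriesNums)], topDeath['Total Cases'], label="Cases", width=barWidth)
plt.bar([x + 2 * barWidth for x in range(seriesNums)], topDeath['Total Recovered'], label="Recovered", width=barWidth)
plt.xticks([x + barWidth for x in range(seriesNums)], topDeath['Country'], rotation=330)
plt.xlabel("Country")
plt.ylabel("Numbers")
plt.title("Tests, Cases and Recoveries of the Top 10 Countries by Deaths")
plt.legend()
plt.show()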

We also have another way to represent the same comparison: a stacked bar chart.

plt.bar(range(10), topDeath['Total Test'], label="Test", width=0.4)
# Stack the cases on top of the tests by passing the first series as bottom=
plt.bar(range(10), topDeath['Total Cases'], label="Cases", width=0.4, bottom=topDeath['Total Test'])
plt.xticks(range(10), topDeath['Country'], rotation=330)
plt.xlabel("Country")
plt.ylabel("Numbers")
plt.title("Tests and Cases of the Top 10 Countries by Deaths")
plt.legend()
plt.show()

After looking at so many bar charts, we can also see their limitations. A bar chart gives no clear sense of the whole: we can compare the individual parts, but we cannot intuitively read off what proportion of the total each part represents.

In that case we can use a pie chart. Let's treat the sum of deaths across the top ten countries as the whole and see what proportion each country contributes.

plt.title('Share of Deaths Among the Top Ten Countries by Deaths')
# autopct labels each wedge with its percentage of the total
plt.pie(topDeath['Total Deaths'], autopct='%1.1f%%', labels=topDeath['Country'])
plt.show()

In this first example, we explored several basic visualization techniques that can be used to compare one or more attributes across different objects.

Next, let's move on to a second example, which illustrates how to show the way a property of an object changes over time.

Dataset 2 — Rainfall in India

# Load the district-wise rainfall dataset, using the first column as the index
df2 = pd.read_csv('/content/sample_data/district wise rainfall normal.csv', index_col=0)
df2

This dataset gives the normal monthly rainfall for every district of India, and the values have a clear time order.

If we want to see how rainfall in the NICOBAR district varies over the year, we can start with a scatter plot.

months = ['JAN', 'FEB', 'MAR', 'APR', 'MAY', 'JUN',
          'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC']
# The first row of the dataset is the NICOBAR district
nico = df2[['DISTRICT'] + months].head(1)

plt.scatter(months, nico[months].iloc[0])
plt.xlabel('month')
plt.ylabel('rainfall')
plt.title('Monthly Rainfall in NICOBAR')
plt.show()

In the plot above we can vaguely make out the trend in rainfall, but it is not very intuitive; scatter plots are better suited to showing the relationship between two variables. To see clearly how rainfall changes over time, we should use a line chart.

plt.plot(months, nico[months].iloc[0], alpha=0.8, label='NICOBAR')
plt.xlabel('month')
plt.ylabel('rainfall')
plt.title('Monthly Rainfall in NICOBAR')
plt.legend()
plt.show()

Now we can clearly see the trend in rainfall over time. Of course, we can also add data from more districts to compare their trends.

# The second row of the dataset is the SOUTH ANDAMAN district
sa = df2[['DISTRICT'] + months][1:2]

plt.plot(months, nico[months].iloc[0], alpha=0.8, label='NICOBAR')
plt.plot(months, sa[months].iloc[0], alpha=0.8, label='SOUTH ANDAMAN')
plt.xlabel('month')
plt.ylabel('rainfall')
plt.title('Monthly Rainfall in NICOBAR and SOUTH ANDAMAN')
plt.legend()
plt.show()

Apparently the rainy season in SOUTH ANDAMAN is considerably heavier.

Through these two examples, we have covered many of the basic charts used in data visualization. After years of development, however, data visualization now takes far more intuitive and diverse forms.

Advanced Data Visualization

First of all, through continuous development we now have more types of charts, such as the Mekko chart, the radar chart, the circular rose (Nightingale) diagram, and so on.
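As a small taste, here is a minimal sketch of a circular rose (Nightingale) diagram, reusing the months list and the NICOBAR data from the rainfall example; it simply wraps the monthly bars around a polar axis.

# A rose (Nightingale) diagram: the monthly rainfall bars arranged on a polar axis
angles = np.linspace(0, 2 * np.pi, len(months), endpoint=False)
values = nico[months].iloc[0]

fig, ax = plt.subplots(subplot_kw={'projection': 'polar'})
ax.bar(angles, values, width=2 * np.pi / len(months), alpha=0.7)
ax.set_xticks(angles)
ax.set_xticklabels(months)
ax.set_yticks([])               # hide the radial ticks to keep the figure clean
ax.set_title('Monthly Rainfall in NICOBAR (rose diagram)')
plt.show()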

BI (Business Intelligence) Systems

Beyond the growing variety of chart types, the biggest advance in data visualization has been the emergence of the Business Intelligence (BI) system.

A BI system is an interactive analytics platform built on advanced analysis technology. By giving business staff direct access to enterprise data, it lets them take a much larger part in data processing and chart production, which relieves the development pressure on technical staff, gives other personnel more accuracy in and control over the data, and makes analysis work more agile and efficient.

Compared with earlier data visualization methods, the biggest change in BI is its interactivity. The data is wrapped in a UI: through HTML5, JavaScript, and similar technologies, users can explore high-dimensional data dynamically and interactively, and even users with no technical background can filter and drill down into the data they are interested in.
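The article names no specific BI tool, but as a rough illustration of this kind of interactivity, here is a minimal sketch using Plotly Express (an assumption, not something the original uses), which renders the earlier top-ten table as an HTML/JS figure with hover tooltips, zooming, and panning built in.

# Rough illustration of an interactive, HTML/JS-based chart (Plotly Express is
# an assumption; the article does not name a specific BI tool or library)
import plotly.express as px

fig = px.bar(
    topDeath,
    x='Country',
    y='Total Deaths',
    hover_data=['Total Cases', 'Total Test', 'Population'],
    title='Top 10 Countries by Deaths (interactive)',
)
fig.show()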

In the End, Data Humanism

In the latest data visualization attempts, data has become a set of interactive, dynamic, and concise elements.

Data humanism wants to go one step further and add an element of humanism: artistry.

One example is a personal tracking project in which each week of activities was represented in a unique and creative way: as a bouquet of flowers, one flower per activity, covering fitness, family time, food, and more. To track progress, the author came up with a clever system: the more they engaged in an activity, the more petals the corresponding flower earned, with a goal of up to five petals per flower for completing a full week's worth of goals. This let them literally see their progress and spot areas for improvement, and at the end of each week they had a bouquet of flowers representing their activities and accomplishments. A rough sketch of this petal-counting idea appears after the list below.

  • The vision is that the bouquet will evolve as the person grows: the composition of the flowers should change as they build stronger habits and discover new interests.
  • During the second week of the experiment, the first digital family gathering took place. They found it so fulfilling that they decided to make it a weekly habit, added a new flower, purple salvia, to the bouquet, and began to track its development.
  • They are excited about the possibility of the bouquet growing and developing in this way and look forward to watching it mirror their personal development.
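As a loose sketch of this petal-counting idea (the activity names and counts below are hypothetical), the snippet maps one week of activity counts to petals, capped at five per flower, and draws each flower on a small polar axis.

# Hypothetical weekly activity log: each activity becomes a flower, each
# completed session becomes a petal, capped at max_petals per week
week_log = {'fitness': 3, 'family time': 5, 'food': 2, 'reading': 4}
max_petals = 5

fig, axes = plt.subplots(1, len(week_log), figsize=(3 * len(week_log), 3),
                         subplot_kw={'projection': 'polar'})
for ax, (activity, count) in zip(axes, week_log.items()):
    petals = min(count, max_petals)
    angles = np.linspace(0, 2 * np.pi, petals, endpoint=False)
    # one bar per earned petal, fanned around the centre of the "flower"
    ax.bar(angles, np.ones(petals), width=2 * np.pi / max_petals, alpha=0.6)
    ax.set_title(f'{activity}\n{petals}/{max_petals} petals')
    ax.set_xticks([])
    ax.set_yticks([])
plt.show()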

Conclusion

While data humanism offers an optimistic approach to integrating ethical, human-centered principles into data-driven technologies, we must also scrutinize its potential limitations and flaws. Executing its principles is difficult because of the intricacy of data, the competing interests of stakeholders, and rapid technological change, and those who benefit from present data-driven processes may resist these principles, creating obstacles to widespread adoption. Tackling these challenges will require continued research, dialogue, and interdisciplinary collaboration to strike a balance between the idealistic vision of data humanism and the reality of our dependence on data. It is vital that data humanism does not become an overused buzzword but instead a practical framework for building a fair and ethical digital society.

References:

[1] Qlik-oss. (n.d.). sn-mekko-chart [Mekko chart]. GitHub. https://github.com/qlik-oss/sn-mekko-chart

[2] Zhihu. (n.d.). What is a Mekko chart? Retrieved February 23, 2023, from https://www.zhihu.com/question/52240981

[3] Kaggle. (n.d.). Rainfall in India. Retrieved February 23, 2023, from https://www.kaggle.com/datasets/rajanand/rainfall-in-india?select=district+wise+rainfall+normal.csv

[4] Kaggle. (n.d.). Covid deaths and cases worldwide. Retrieved February 23, 2023, from https://www.kaggle.com/code/finnheaslop/beginner-covid-deaths-and-cases-work-in-prog/notebook

[5] Zhihu. (n.d.). Pie charts and other chart types. Retrieved February 23, 2023, from https://zhuanlan.zhihu.com/p/345262150

[6] Zhihu. (n.d.). What is data visualization? Retrieved February 23, 2023, from https://zhuanlan.zhihu.com/p/162338503

[7] Khowala, D. (2018, June 7). Data humanism: Visualizing data to connect people with numbers. https://devikakhowala.com/data-humanism

[8] Zhihu. (n.d.). Data visualization. Retrieved February 23, 2023, from https://zhuanlan.zhihu.com/p/439354353

[9] Winkler, L. (2019, July 15). Data visualization for humans: How I turned my data into watercolour art. Towards Data Science. https://towardsdatascience.com/data-visualization-for-humans-how-i-turned-my-data-into-watercolour-art-651d6acb16a3
