“Bullshit” and Exploratory Data Analysis

It’s time to use Exploratory Data Analysis to avoid “bullshit” in a dataset.

Kexin Zhai
Spring 2019 — Information Expositions
Jan 28, 2019


About “Bullshit”

Our lives are filled with “bullshit”. It shows up in conversations between people, in everyday life, and online. In On Bullshit, Harry Frankfurt writes, “The expression bullshit is often employed quite loosely — simply as a generic term of abuse, with no very specific literal meaning.[…]Bullshit is disconnected from a concern with the truth.” Bullshit, however, is not the same as misinformation.

From On Bullshit by Harry Frankfurt:

“Deceptive misrepresentation: This may sound pleonastic. No doubt what Black has in mind is that humbug is necessarily designed or intended to deceive, that its misrepresentation is not merely inadvertent. In other words, it is deliberate misrepresentation. […]A person may be lying even if the statement he makes is true, as long as he himself believes that the statement is false and intends by making it to deceive.”

Although bullshit is unconcerned with truth, it is not necessarily false, and this is a fundamental aspect of its nature. It also suggests that bullshit is easier to get away with than a lie. People do seem more tolerant of bullshit than of lies, perhaps because a lie is more likely to hurt us. Then what about bluffing?

From On Bullshit by Harry Frankfurt:

“Bullshit is just this lack of connection to a concern with truth — this indifference to how things really are. […]It does seem that bullshitting involves a kind of bluff. It is closer to bluffing, surely than to telling a lie. Lying and bluffing are both modes of misrepresentation or deception. Now the concept most central to the distinctive nature of a lie is that of falsity: the liar is essentially someone who deliberately promulgates a falsehood. Bluffing too is typically devoted to conveying something false. Unlike plain lying, however, it is more especially a matter not of falsity but of fakery. This is what accounts for its nearness to bullshit.”

Lying requires a certain level of skill: the liar must submit to the objective constraints imposed by what he takes to be the truth. A liar is inevitably concerned with truth values. To make up a lie, he must think he knows what is true, and to make the lie effective, he must design it under the guidance of that truth. The bullshitter, by contrast, does not care whether what he says accurately describes reality. He just picks things out, or makes them up, to suit his purposes.

Application of Exploratory Data Analysis on “Bullshit”

Exploratory data analysis helps limit bullshit. For this project, I found a Google Play Store Apps dataset on Kaggle and worked through the steps of the Exploratory Data Analysis Checklist in a Jupyter notebook. My question for this dataset is “How do ratings differ between categories?”

From Professor Brian Keegan’s slide
First 6 rows of Google Play Store Apps dataset
Last 5 rows of Google Play Store Apps dataset

For step 5, checking the “n”s, I used .value_counts() on the “Type” and “Content Rating” columns and found a “0” value in “Type” that belongs to neither “Free” nor “Paid”. The row with that “0” also has an unknown value in the “Content Rating” column. The dataset originally had 10840 rows, so after deleting this one incomplete row it has 13 columns and 10839 rows.

Counts of Content ratings for apps and games

This is what “Content Rating” column looks like after deleting the incomplete row.
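The check on “Type” and “Content Rating” can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the DataFrame is a made-up four-row stand-in for the Kaggle file, and the "App" column and all values are assumptions. Only the "Type" and "Content Rating" column names and the stray "0" come from the article.

```python
import pandas as pd

# Toy stand-in for the Kaggle file; every value here is invented for illustration.
apps = pd.DataFrame({
    "App": ["A", "B", "C", "D"],
    "Type": ["Free", "Paid", "0", "Free"],
    "Content Rating": ["Everyone", "Teen", None, "Everyone"],
})

# Step 5: check the "n"s. dropna=False keeps missing values visible in the tally.
type_counts = apps["Type"].value_counts(dropna=False)
content_counts = apps["Content Rating"].value_counts(dropna=False)
print(type_counts)
print(content_counts)

# Keep only rows whose Type is one of the two expected labels.
is_valid_type = apps["Type"].isin(["Free", "Paid"])
cleaned = apps[is_valid_type].reset_index(drop=True)
print(cleaned.shape)
```

Filtering on the expected labels, rather than deleting a row by position, keeps the step readable and still works if the file is re-downloaded with rows in a different order.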

However, I found another problem after calling .count() on the whole DataFrame.

Counts for each column

The “Rating” column has a value in only 9366 of the 10839 rows, which leaves 1473 rows without a rating. That is too many rows to simply delete.

Incomplete rows like these can bias the results of this project.
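To see how large the gap is, the per-column counts can be paired with a count of missing values. With the article's numbers, 10839 - 9366 = 1473 rows lack a rating, which is roughly 13.6% of the dataset. The sketch below shows the same bookkeeping on a made-up five-row DataFrame; the "App" column and all values are assumptions.

```python
import pandas as pd

# Toy data; None in a numeric column becomes NaN, pandas' missing-value marker.
apps = pd.DataFrame({
    "App": ["A", "B", "C", "D", "E"],
    "Category": ["Family", "Game", "Family", "Tools", "Game"],
    "Rating": [4.5, None, 3.9, None, 4.1],
})

non_null = apps.count()                       # non-missing values per column
missing = apps.isna().sum()                   # missing values per column
share_missing = apps["Rating"].isna().mean()  # fraction of rows with no rating

print(non_null)
print(missing)
print(f"{share_missing:.1%} of rows have no rating")
```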

Next, I looked at which category has the most apps. In this dataset, “Family” is the largest category, followed by “Game”.

The number of apps on each category
Average rating of “Family” apps and the number of reviews of “Family” apps
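A minimal sketch of these two summaries, assuming "Category", "Rating", and "Reviews" columns and using invented toy values. The category labels follow the article's spelling, which may differ from the raw file, and "number of reviews" is read here as a simple total, which is only one possible interpretation.

```python
import pandas as pd

# Toy data; all values are invented for illustration.
apps = pd.DataFrame({
    "Category": ["Family", "Game", "Family", "Tools", "Family", "Game"],
    "Rating": [4.0, 4.5, None, 3.5, 4.4, 4.1],
    "Reviews": [100, 2500, 10, 40, 900, 300],
})

# Number of apps per category, largest first.
category_counts = apps["Category"].value_counts()
print(category_counts)

# Summaries for one category. mean() skips missing ratings by default.
family = apps[apps["Category"] == "Family"]
family_mean_rating = family["Rating"].mean()
family_total_reviews = family["Reviews"].sum()
print(family_mean_rating, family_total_reviews)
```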

From the plot below, we can see “Events” apps have the highest average score. The ratings of “Personalization”, “Sports”, “Auto and Vehicles”, and “Dating” apps span a relatively large range.
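The numbers behind such a plot can be tabulated with a group-by. The sketch below is an assumption about how one might do it, not the notebook's code, and the ratings are invented. It reports the count alongside the mean because a category with only a few rated apps can top the ranking by chance, which is the same concern as checking the “n”s.

```python
import pandas as pd

# Toy data; all values are invented for illustration.
apps = pd.DataFrame({
    "Category": ["Events", "Events", "Dating", "Dating", "Dating", "Sports", "Sports"],
    "Rating": [4.5, 4.4, 2.0, 4.8, None, 3.0, 4.9],
})

# Per-category summary; "count" only counts apps that actually have a rating.
summary = apps.groupby("Category")["Rating"].agg(["count", "mean", "min", "max"])
summary["range"] = summary["max"] - summary["min"]
summary = summary.sort_values("mean", ascending=False)
print(summary)
```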

My future work on this dataset is to find a way to deal with the blank values. If a dataset has only a few blank values, deleting them has little influence on the analysis. In my dataset, however, 1473 of the 10839 ratings are “N/A”, so I could not simply delete them.
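As a starting point for that future work, here is a hedged sketch of two common options on invented toy data: dropping rows without a rating, or keeping them, flagging the gap, and filling it with the category median. Neither is the article's chosen method, and median filling is only one of several possible strategies. The last step checks whether missing ratings are concentrated in certain categories, since that would bias a comparison between categories.

```python
import pandas as pd

# Toy data; all values are invented for illustration.
apps = pd.DataFrame({
    "Category": ["Family", "Family", "Family", "Game", "Game", "Tools"],
    "Rating": [4.0, None, 4.4, 4.5, None, 3.5],
})

# Option 1: drop rows without a rating (fine when only a few rows are affected).
dropped = apps.dropna(subset=["Rating"])

# Option 2: keep every row, flag the gap, and fill it with the category median.
filled = apps.copy()
filled["rating_missing"] = filled["Rating"].isna()
category_median = filled.groupby("Category")["Rating"].transform("median")
filled["Rating"] = filled["Rating"].fillna(category_median)

# Share of missing ratings per category, to spot uneven gaps.
missing_share = apps["Rating"].isna().groupby(apps["Category"]).mean()
print(missing_share)
```

Keeping the "rating_missing" flag means later steps can still tell observed ratings from filled ones.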

A “Bullshit” example in the wild

Browsing the internet, we come across lots of videos shared on social platforms. One type of video easily arouses people’s interest: life hacks. While some life hack videos are actually useful, others are just “bullshit”. Troom Troom is a YouTube channel that teaches people how to “DIY”, and 11 million users subscribe to it. But does that show that what Troom Troom teaches is useful?

Troom Troom YouTube Channel

Another YouTuber, Danny Gonzalez, made a video called Trying Troom Troom’s Awful Crafts 2 to “analyze” Troom Troom’s videos. In it, Danny tries to recreate Troom Troom’s school-related crafts. One craft from 10 Weird Ways To Sneak Gadgets Into Class / School Pranks And Life Hacks claims to teach you how to sneak music into class so that you can listen without your teacher knowing.

Here are the steps to make a musical pencil sharpener.

“Take apart a pencil sharpener of a suitable size. Cover it with a decorative tape.”

Screenshot of 10 Weird Ways To Sneak Gadgets Into Class / School Pranks And Life Hacks

“Put an MP3 player inside, plug headphones into the pencil sharpener and sneakily turn on some hype music.”

Screenshot of 10 Weird Ways To Sneak Gadgets Into Class / School Pranks And Life Hacks
Screenshot of 10 Weird Ways To Sneak Gadgets Into Class / School Pranks And Life Hacks

“This doesn’t fix the problem with listening to music in class! The problem is that you’ve got wires running directly to your ears!” Danny Gonzalez said. “Okay, be honest. Can you tell that I’m listening to music right now?”

“Oh, what, this? No, I’m not listening to music! I’m just listening…to a…pencil sharpener,” Danny Gonzalez joked.

Moreover, Troom Troom’s video claims, “The teacher doesn’t notice anything. […]Next thing you know a dance battle will start.”

Danny Gonzalez commented: “If you were my teacher and you saw me, just at my desk like this, with my headphones plugged into a pencil sharpener, I mean, either you would assume I was listening to music or you would just be really worried about me and ask what was wrong, right?”

Finally, Danny Gonzalez concluded his “analysis” of Troom Troom’s video: “Overall, I got to say this is pretty inventive and a great way to trick your blind teacher!”
