‘R’ You Kidding Me?

Sanjay Unni
Data Science Library
Feb 27, 2019


Some Interesting Projects That Prove Data Science Can Be Fun!

Programming may seem dull to some, but you don’t need to be part of a professional team working on a cutting-edge project to make something interesting. A few data scientists on the Internet have gone out of their way to prove it, building entertaining programs for everyone to enjoy. Listed below are a few projects that might inspire you to show off your own ideas and carve your own path in the world of data science!

Did You Know 43% Of All Statistics Are Made Up On The Spot?

You’d be surprised how much influence data has on comedy. It makes sense, though: either you know your audience, or you get heckled off the stage. In other words, if you don’t know which jokes to tell, you won’t go far. Professional comedians haven’t just perfected this to an art; they’ve got it down to a science (more like a data science, am I right?). For example, comedian Aziz Ansari held a series of small-scale test sessions to try out new jokes on different demographics (like single people over 30, newly married couples, or college students) and tweaked them so audiences at his actual shows would respond better. Essentially, he took the idea of A/B testing and applied it to his own material.

Source: http://nathanieldphillips.com/wp-content/uploads/2016/10/threeivpp-1.png
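To make the A/B-testing idea concrete, here is a minimal Python sketch (not Ansari’s actual process) that compares how often two versions of a joke got a laugh, using a standard pooled two-proportion z-test. The function name and the audience counts are hypothetical, made up purely for illustration.

```python
import math


def two_proportion_z_test(laughs_a, trials_a, laughs_b, trials_b):
    """Compare the laugh rates of joke versions A and B.

    Returns (z, two-sided p-value) from a pooled two-proportion z-test.
    """
    rate_a = laughs_a / trials_a
    rate_b = laughs_b / trials_b
    # Pooled laugh rate under the null hypothesis that both versions are equally funny
    pooled = (laughs_a + laughs_b) / (trials_a + trials_b)
    if pooled in (0, 1):
        # Nobody laughed (or everybody did) in both groups: nothing to compare
        raise ValueError("pooled laugh rate of 0% or 100% leaves nothing to test")
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (rate_a - rate_b) / se
    # Standard normal CDF via the error function, then a two-sided p-value
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - phi)
    return z, p_value


# Hypothetical test-session counts: audience members who laughed at each version
z, p = two_proportion_z_test(laughs_a=42, trials_a=60, laughs_b=30, trials_b=60)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the gap in laugh rates is unlikely to be pure chance; with test audiences this small, though, it is better read as a hint than as proof.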

This isn’t the first time data science has popped up in comedy, either. Oregon State University professor Heather Knight has created a small robot named Data that relies on massive amounts of data (pun intended) to tell stand-up comedy jokes. It reads the audience’s reactions and adjusts its next bit accordingly. Some of its jokes are pretty good, too, although Knight’s broader goal is to make robots more intelligent and charismatic so that they stand out that much more.

If you’re inclined to add a bit of humor to your data science portfolio, there are a few resources available to you. The Python scripts that many have used to scrape the Internet for jokes can be found here, in a GitHub repo made by user amoudgl. Whether or not you decide to use them, there are ready-made datasets too. The Kaggle dataset ‘Short Jokes’ has around 230,000 jokes, and this GitHub dataset made by user taivop has around 210,000. Either one would make great data to analyze, not to mention all the visualizations you could create from specific keywords or phrases!
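As a starting point for that kind of keyword analysis, here is a hedged Python sketch that counts the most common words in a jokes CSV. It assumes the Kaggle file stores each joke in a `Joke` column (check the header of the copy you download); the stopword list and the sample rows are made up for illustration.

```python
import csv
import io
import re
from collections import Counter

# A tiny, illustrative stopword list; a real project would use a fuller one
STOPWORDS = {"a", "an", "the", "i", "to", "and", "of", "is", "it",
             "my", "you", "in", "what", "do", "did", "why"}


def top_keywords(csv_file, column="Joke", n=10):
    """Count the n most common non-stopword words in one CSV column."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        words = re.findall(r"[a-z']+", row[column].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)


# Made-up rows in the assumed ID,Joke layout; pass open("shortjokes.csv") for a real file
sample = io.StringIO(
    "ID,Joke\n"
    '1,"Why did the chicken cross the road?"\n'
    '2,"I told my chicken a joke about eggs."\n'
    '3,"What do you call a road with no cars?"\n'
)
print(top_keywords(sample, n=3))
```

The resulting word counts can feed straight into a bar chart or word cloud.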

Correlation Doesn’t Always Mean Causation, But It Should.

Data scientist Tyler Vigen created a program that searches through huge amounts of data to create hilarious graphs connecting completely unrelated sets of information together.

Are divorce rates in Maine really tied to the per capita consumption of margarine? Is the per capita consumption of chicken really connected to total US crude oil imports? Probably not, but the data does speak for itself here. You can’t argue with science.

A few of these charts can be found on his personal website, located here. His book Spurious Correlations, which contains much more than just what’s listed on his website, can be bought on Amazon right here. It’s definitely worth a read!
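Vigen’s charts are a reminder that two series which both drift over time can correlate strongly by accident. The short Python simulation below is an illustration, not Vigen’s code: it generates pairs of completely independent random walks and counts how many show a Pearson correlation above 0.8 in absolute value. The 10-step length, the 0.8 cutoff, and the seed are arbitrary choices.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def random_walk(steps, rng):
    """A series where each value is the previous one plus Gaussian noise."""
    value, walk = 0.0, []
    for _ in range(steps):
        value += rng.gauss(0, 1)
        walk.append(value)
    return walk


rng = random.Random(2019)
# 1,000 pairs of independent 10-step series (think ten years of annual figures)
strong = 0
for _ in range(1000):
    r = pearson(random_walk(10, rng), random_walk(10, rng))
    if abs(r) > 0.8:
        strong += 1
print(f"{strong} of 1000 unrelated pairs have |r| > 0.8")
```

Because each walk wanders, short independent series often move together by chance; scanning many real-world datasets for such pairs all but guarantees some striking matches.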

I’ve Been Listening To This One Indie Band, But You’ve Probably Never Heard Of Them…

The modern music industry leans heavily on data science, with big data from streaming platforms fueling many of its decisions. Even back in 2013, both Spotify and Shazam correctly predicted several Grammy winners based on their listening data alone. The technology has only progressed since then, and with new online resources anyone can make their own visualizations and analyses using massive amounts of music data.

Source: https://spotify.me/en

Spotify has its own stylish statistics page you can use, with additional options to check out its research on streaming patterns and to brand your own music. Since it’s Spotify, you’ll need a Spotify account for it to work. For reference, here’s a piece of the page generated from my own data. But there’s much more to it than this!

For those without a Spotify account, don’t worry! There are plenty of other visualizations you can look at. Here are just a few:

  • UX designer Ashrith Shetty created a series of radar charts that take in a Kaggle dataset of the top Spotify tracks of 2017 and display each track’s attributes. Compare songs like “Shape of You” with “Bad and Boujee,” and try to form your own conclusions about what the data says before peeking at what it really means (which can be found here, in Shetty’s article about the project). The charts can be found here!
  • RCharlie, a programmer with a talent for R, used data from Spotify along with several R packages to create a variety of custom scatterplots. Check it out here! Pick an artist and display their entire discography, with each song labeled by album and placed into one of four sections (“angry”, “joyful”, “peaceful”, “depressing”). The same can be done with a playlist instead of a specific artist.
  • For any rock fans out there, a few writers at the Silicon Valley Data Science blog came up with a unique visualization that links 100 notable rock songs by how much the bands behind them influenced one another. See how many artists Led Zeppelin and AC/DC inspired, or look for surprising connections (like with Coldplay, interestingly enough). The visualization is located here!
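The four sections in RCharlie’s charts plausibly map onto two of the audio features Spotify reports for each track: valence (how positive a song sounds) and energy, both on a 0 to 1 scale. The Python sketch below assumes a simple split at 0.5 on each axis; the threshold, the `Track` class, and the sample values are illustrative assumptions, not RCharlie’s actual code.

```python
from dataclasses import dataclass


@dataclass
class Track:
    name: str
    valence: float  # 0.0 (negative-sounding) to 1.0 (positive-sounding)
    energy: float   # 0.0 (calm) to 1.0 (intense)


def mood_quadrant(track, midpoint=0.5):
    """Place a track in one of four mood quadrants (assumed split at `midpoint`)."""
    high_energy = track.energy >= midpoint
    positive = track.valence >= midpoint
    if high_energy and not positive:
        return "angry"
    if high_energy and positive:
        return "joyful"
    if positive:
        return "peaceful"
    return "depressing"


# Hypothetical feature values, not real Spotify data
tracks = [
    Track("Song A", valence=0.2, energy=0.9),
    Track("Song B", valence=0.8, energy=0.8),
    Track("Song C", valence=0.7, energy=0.3),
    Track("Song D", valence=0.1, energy=0.2),
]
for t in tracks:
    print(t.name, mood_quadrant(t))
```

Plotting valence on one axis and energy on the other, with lines at the midpoint, gives the same four-panel layout as a scatterplot.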
Source: https://blog.prototypr.io/have-you-heard-about-the-spotify-web-api-8e8d1dac9eaf

In case you’re interested in making your own projects with this data, there are plenty of resources available to you! Spotify offers its own Web API, which is useful for any major data science project; you’ll just need to register a free app on Spotify’s developer dashboard to get access credentials. Additionally, you could use the R package spotifyr to dig into even more data, like what The Beatles’ favorite musical key was. Combined with the R package shiny, which turns R code into interactive web apps, you can create beautiful graphs and finely detailed charts that rival the kind of infographics the big streaming companies themselves develop!
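The Beatles question boils down to counting key and mode combinations. Spotify’s audio features encode `key` as a pitch class (0 = C, 1 = C♯/D♭, and so on up to 11 = B, with -1 meaning no key was detected) and `mode` as 1 for major and 0 for minor. Here is a minimal Python sketch of that count; the records are hypothetical placeholders, and fetching real ones (through the Web API or spotifyr) is left out.

```python
from collections import Counter

# Pitch-class names indexed by Spotify's integer `key` value
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F",
                 "F#", "G", "G#", "A", "A#", "B"]


def key_mode_name(key, mode):
    """Turn Spotify-style (key, mode) integers into a readable name."""
    if key == -1:
        return "no key detected"
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


def favorite_keys(features, n=3):
    """Return the n most common key/mode combinations in a list of feature dicts."""
    counts = Counter(key_mode_name(f["key"], f["mode"]) for f in features)
    return counts.most_common(n)


# Hypothetical audio-feature records, not real data for any artist
features = [
    {"key": 9, "mode": 1},
    {"key": 2, "mode": 1},
    {"key": 9, "mode": 1},
    {"key": 4, "mode": 0},
]
print(favorite_keys(features))
```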

AIs Are Just A Ton Of Customized ‘If’ Statements, Change My Mind.

Source: https://old.reddit.com/r/ProgrammerHumor/comments/9u449e/machine_learning_drugs/

Neural networks, machine learning, and artificial intelligence are all tossed around a lot these days, mainly because of how sophisticated these programming concepts have become. But for every Fortune 500 company that uses an AI to predict and analyze its customers’ needs, there’s someone on the Internet using their own AI for increasingly pointless yet somehow necessary things.

One example is the collection of simple AIs created by scientist Janelle Shane. Drawing on IBM’s Project Debater, an AI system designed to argue with humans, Shane trained a neural net to generate debate topics for robots in the post-human era (as in, after they’ve conquered everybody). They’re fascinating to look at, mainly because topics like “allowing parents to develop nuclear weapons” are both hilarious and horrifying to think about.

Source: http://aiweirdness.com/post/182111367827/what-the-machines-will-debate-when-the-humans-are

Shane has also used the same neural network framework in other projects, like generating pick-up lines and names for guinea pigs. Personally, I find the guinea pig name creator to be the more interesting one, because you can almost exactly identify the point when the AI just kind of gave up (hint: it’s somewhere between the name “Me” and “Boooy”).
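Shane’s generators are neural networks, but the core idea of learning which letters tend to follow which can be sketched with a much simpler technique: a character-level Markov chain. The Python example below uses that swapped-in technique, not Shane’s method, and the training names are made up for illustration. With only a handful of examples, expect it to mostly remix fragments of its inputs.

```python
import random
from collections import defaultdict


def train(names, order=2):
    """Map each `order`-character context to the characters that followed it."""
    model = defaultdict(list)
    for name in names:
        padded = "^" * order + name.lower() + "$"  # ^ pads the start, $ marks the end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate(model, rng, order=2, max_len=12):
    """Sample one name by repeatedly picking a character seen after the current context."""
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = context[1:] + nxt
    return "".join(out).capitalize()


# Hypothetical training names; Shane's actual training list is not reproduced here
names = ["Peanut", "Pepper", "Biscuit", "Ginger",
         "Nugget", "Pumpkin", "Muffin", "Button"]
model = train(names)
rng = random.Random(0)
print([generate(model, rng) for _ in range(5)])
```

A longer context (a higher `order`) produces more name-like output but copies the training names more often; a neural network learns a richer version of the same next-character prediction.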

In Conclusion, Anything Can Be Data Science If You Try Hard Enough.

If there’s anything to take away from this article, it’s that the barrier to entry for data science projects isn’t as high as you’d think. With a basic understanding of R, a few public APIs, and enough time, anyone can make something equal in quality to some of the projects mentioned here.

Don’t let any preconceived notions about handling and manipulating data get in the way! Check out some of the articles we here at the Data Science Library have to offer, like a comprehensive list of beginner data science courses or a guide to learning how to code in R. Or jump in headfirst: learn by doing, not by watching others. There are limitless opportunities out there; all you have to do is go find them!

Thank you for reading. If you found this article useful, please give it a clap! Add me on LinkedIn here!
