There are tons of ways to find new music on Spotify, which is just one reason why the Swedish company leads the pack in music-streaming user experience. The most exciting of these options use machine learning, implementing recommendation systems and collaborative filtering. However, I have found that they miss some low-hanging fruit: recommending songs based on the samples they use. This is part one of a series working toward that goal, starting with locating all the samples used by the songs in a playlist and delivering them to you instantly as a new playlist.

Crate-digging is a huge part of creating and appreciating original hip hop instrumentals

I love hip hop and electronic sampled music, and I always want to know where my favorite snippet of a song originated. I often find that the sample is well worth listening to on its own, whether it comes from jazz, soul, classical, or some experimental sub-genre. Listening to samples is a way to better appreciate your favorite producers and better understand their stylistic influences. It also lets you appreciate the artist’s technique in distorting and chopping up audio. And finally, there are few feelings as satisfying as hearing a new popular song and recognizing where different parts of the instrumental came from! …


Twitter is a huge platform, with over 300 million monthly active users. Because tweets are limited in length, they are ideal for sentiment analysis and for gauging what people are saying about a company or topic of interest. In this post I will outline the steps to set up a relational SQL database using an ORM, and show two methods for pulling tweets from the Twitter API and storing them. The use cases for this type of task are virtually endless, so let's suppose we have an unhealthy (in more ways than one) obsession with PIZZA.
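The post builds its database through an ORM, which isn't shown here. As a rough stand-in, the sketch below uses Python's built-in `sqlite3` module with plain SQL to show the kind of table such an ORM model would map to. The table name, the column names, and the shape of the tweet dictionary are assumptions for illustration, not Twitter's actual payload fields.

```python
import sqlite3

# Hypothetical schema: column names are illustrative, not Twitter's field names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tweets (
    tweet_id   INTEGER PRIMARY KEY,  -- numeric tweet ID (fits in SQLite's 64-bit INTEGER)
    user_name  TEXT NOT NULL,
    created_at TEXT NOT NULL,        -- ISO-8601 timestamp string
    tweet_text TEXT NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database file and make sure the tweets table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_tweet(conn, tweet):
    """Insert one tweet dict; a row with an existing tweet_id is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO tweets (tweet_id, user_name, created_at, tweet_text) "
        "VALUES (:tweet_id, :user_name, :created_at, :tweet_text)",
        tweet,
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_db()
    example = {
        "tweet_id": 1,
        "user_name": "slice_fan",
        "created_at": "2020-01-01T12:00:00",
        "tweet_text": "Pizza for lunch again",
    }
    store_tweet(conn, example)
    store_tweet(conn, example)  # duplicate ID, ignored by INSERT OR IGNORE
    print(conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0])
```

Keying the table on the tweet ID means that re-running a search over overlapping time windows does not create duplicate rows.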

“pepperoni pizza” by Vita Marija Murenaite on Unsplash

First of all, we will need access to the Twitter API. At this link you can register an application and start collecting tweets. The API has many different endpoints, and we will be using two of the free options. The standard search API lets us collect tweets from the last 7 days; unfortunately, that is the limit for historical tweets without paying for premium access, but it is enough for some exploratory data analysis. The real-time streaming API lets us screen live tweets and collect any that mention the query term of interest. We will write a ‘listening’ script that gathers all pizza-related tweets, in order to satisfy our ungodly cheesy desires. Before we build our database, let's take a look at the data we can get from Twitter. …
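With the streaming API, the matching against the query term happens on Twitter's side. To show the idea of a ‘listening’ loop without network access, here is a hedged local simulation: `listen` and `mentions_term` are hypothetical helpers, and each incoming tweet is assumed to be a dict with a `text` key. The case-insensitive substring test is only an approximation of Twitter's own matching rules.

```python
from typing import Iterable, Iterator


def mentions_term(text: str, term: str) -> bool:
    """Case-insensitive substring check (a simplification of real term matching)."""
    return term.casefold() in text.casefold()


def listen(stream: Iterable[dict], term: str) -> Iterator[dict]:
    """Consume a (hypothetical) stream of tweet dicts, yielding those that mention `term`."""
    for tweet in stream:
        # Tweets without a "text" field are treated as empty and skipped.
        if mentions_term(tweet.get("text", ""), term):
            yield tweet


if __name__ == "__main__":
    # Stand-in for live tweets arriving from the streaming API.
    sample_stream = [
        {"id": 1, "text": "Best PIZZA in town, no contest"},
        {"id": 2, "text": "Tacos tonight"},
        {"id": 3, "text": "Late-night #pizza run"},
    ]
    for tweet in listen(sample_stream, "pizza"):
        print(tweet["id"], tweet["text"])
```

Writing `listen` as a generator means tweets are processed one at a time as they arrive, which is the same shape a real listener callback has.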


Monte Carlo (MC) methods are a class of computational algorithms that use repeated random sampling to make numerical estimates of unknown parameters. They allow us to model complex situations involving many random variables and to assess the impact of risk. The uses of MC are incredibly wide-ranging, and they have led to a number of groundbreaking discoveries in physics, game theory, and finance. There is a broad spectrum of Monte Carlo methods, but they all rely on random number generation to solve deterministic problems. …
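As a minimal illustration of repeated random sampling (a classic textbook example, not taken from the post), the sketch below estimates π by throwing random points into the unit square and counting how many land inside the quarter circle of radius 1. That fraction approximates the quarter circle's area, π/4.

```python
import math
import random


def estimate_pi(n_samples, seed=None):
    """Estimate pi by sampling points uniformly in the unit square.

    The fraction of points with x^2 + y^2 <= 1 approximates the
    quarter circle's area, pi / 4, so we multiply it by 4.
    """
    rng = random.Random(seed)  # private generator so results are reproducible
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        est = estimate_pi(n, seed=42)
        print(f"n={n:>9,}  estimate={est:.5f}  error={abs(est - math.pi):.5f}")
```

The statistical error shrinks roughly as 1/√n, so each extra decimal digit of accuracy costs about 100 times more samples.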


I recently had the chance to use machine learning to address an issue at the forefront of the American media: the difficulty of recognizing fake news. Specifically, my classmate David Masse and I applied two ML approaches to identify deliberately misleading news articles: logistic regression and a naïve Bayes classifier. Using a Kaggle dataset of 20,000 labeled articles, we achieved 93% accuracy when predicting labels for a test set. It was a great opportunity to practice natural language processing, as well as some effective techniques for building a powerful classification model.
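The post doesn't include its code. As a hedged illustration of the second approach only, here is a from-scratch multinomial naive Bayes classifier with add-one (Laplace) smoothing, trained on four made-up headlines rather than the Kaggle data. It is not the implementation used in the project; the `NaiveBayes` class, the `tokenize` helper, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # label -> number of documents
        self.word_counts = defaultdict(Counter)      # label -> word -> count
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, doc, label):
        """Unnormalized log P(label | doc) = log P(label) + sum of log P(word | label)."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for word in tokenize(doc):
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, doc):
        """Return the label with the highest posterior score."""
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Made-up toy headlines, for illustration only.
TRAIN_DOCS = [
    "shocking secret cure doctors hate",
    "you will not believe this shocking miracle",
    "senate passes budget bill after debate",
    "city council approves new budget",
]
TRAIN_LABELS = ["fake", "fake", "real", "real"]

if __name__ == "__main__":
    model = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
    for headline in ("shocking miracle cure", "council passes budget"):
        print(headline, "->", model.predict(headline))
```

Summing log-probabilities instead of multiplying raw probabilities avoids floating-point underflow on long articles; a real project would also need better tokenization and a held-out test set.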


Natural Language Processing (NLP) is the field of computer science devoted to processing and analyzing natural human language in any form (written, spoken or otherwise). Put simply, computers understand zeros and ones, while humans use a wide range of language to communicate. NLP aims to bridge the gap between these two worlds, so that data scientists and machine learning engineers can analyze large quantities of human communication data. …
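One common way to bridge that gap is the bag-of-words representation, in which each document becomes a vector of word counts that a model can consume. A minimal sketch, assuming naive lowercasing and whitespace tokenization; `build_vocabulary` and `vectorize` are illustrative names, not functions from any library.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct lowercase token to a column index, in first-seen order."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            vocab.setdefault(token, len(vocab))  # new tokens get the next index
    return vocab


def vectorize(doc, vocab):
    """Turn one document into a list of token counts, one entry per vocabulary word."""
    counts = Counter(doc.lower().split())
    # Dicts preserve insertion order, so iterating vocab walks indices 0, 1, 2, ...
    return [counts.get(token, 0) for token in vocab]


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat ate the fish"]
    vocab = build_vocabulary(corpus)
    print(vocab)
    print(vectorize("the cat ate the fish", vocab))
    print(vectorize("The dog sat", vocab))  # "dog" is not in the vocabulary, so it is dropped
```

The representation throws away word order entirely, which is the price paid for turning text into fixed-length numeric vectors.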


The Higgs boson decays into two jets of hadrons and two electrons in this simulation (CMS, CERN)

In the summer of 2012, the ATLAS and CMS collaborations at CERN’s Large Hadron Collider announced the discovery of the Higgs boson at 126 GeV. Instantly recognized by many as the greatest scientific achievement of the 21st century so far, the discovery was followed in 2013 by the Nobel Prize in Physics, awarded to Peter Higgs and François Englert. When describing the significance of this discovery, physicists often use the term “5 sigma”. But what does this mean? And how can we be so sure about the existence of a particle with a mean lifetime of 1.6×10⁻²² seconds? …
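In particle physics, “5 sigma” is usually quoted as a one-sided tail probability: the chance that background alone would fluctuate upward by at least five standard deviations of a normal distribution. Assuming that convention, the p-value follows from the complementary error function, P(Z > n) = erfc(n/√2)/2, which Python's standard `math.erfc` computes directly. The helper name `one_sided_p_value` is illustrative.

```python
import math


def one_sided_p_value(n_sigma):
    """Probability that a standard normal variable exceeds n_sigma.

    Uses P(Z > z) = erfc(z / sqrt(2)) / 2.
    """
    return 0.5 * math.erfc(n_sigma / math.sqrt(2))


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        p = one_sided_p_value(n)
        print(f"{n} sigma: p = {p:.3g} (about 1 in {1 / p:,.0f})")
```

For 5σ this gives p ≈ 2.87×10⁻⁷, roughly 1 in 3.5 million.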


The New York Times offers a few APIs at http://developer.nytimes.com, which are great for beginner programmers to start playing around with in data-driven projects. I recently used the Article Search API to query articles discussing the two most recent US presidents, to see what differences, if any, there were in coverage. This was a great opportunity for me to practice data exploration using an API and pandas, which I will describe in this post.

What is an API? An API (Application Programming Interface) lets us request data programmatically, for example from a database hosted on a server. It is one of the best ways for a data scientist to get well-organized data. Accessing one is as easy as signing up for a key. Once you have one, you can begin making queries. When I first looked at the NYTimes API, I spent about fifteen minutes just playing around with the console provided on the api…
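A minimal sketch of building an Article Search query and pulling headlines out of the JSON reply, using only the standard library. The endpoint URL, the `q`, `begin_date`, `end_date`, `page`, and `api-key` parameters, and the `response.docs[].headline.main` and `pub_date` fields reflect my understanding of version 2 of the API and should be checked against the documentation at developer.nytimes.com; `MY_API_KEY` is a placeholder, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Article Search API v2 endpoint (verify against the official documentation).
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_query_url(query, api_key, begin_date=None, end_date=None, page=0):
    """Build a request URL; dates are YYYYMMDD strings and are optional."""
    params = {"q": query, "page": page, "api-key": api_key}
    if begin_date:
        params["begin_date"] = begin_date
    if end_date:
        params["end_date"] = end_date
    return f"{BASE_URL}?{urlencode(params)}"


def fetch(url):
    """Perform the HTTP request (requires network access and a valid key)."""
    with urlopen(url) as resp:
        return json.load(resp)


def extract_articles(payload):
    """Pull (headline, pub_date) pairs out of a decoded JSON response."""
    docs = payload.get("response", {}).get("docs", [])
    return [(d.get("headline", {}).get("main"), d.get("pub_date")) for d in docs]


if __name__ == "__main__":
    url = build_query_url("Obama", "MY_API_KEY", begin_date="20160101", end_date="20161231")
    print(url)
    # rows = extract_articles(fetch(url))  # uncomment with a real key
```

From there, the list of (headline, date) pairs can be passed to `pandas.DataFrame(rows, columns=["headline", "pub_date"])` for exploration.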

About

Christopher Pease

Exploring the world through the lens of data science. Former physics researcher with a passion for machine learning and statistics.
