
The web app uses parsed headlines from the most highly rated Florida Man subreddit posts of all time.

For almost a decade, “Florida Man” has been a mainstay antihero of internet culture. Headlines like “Florida man too fat for jail” and “Florida man steals dinosaur bones” are easy fodder for meme-ification. In early 2013, “Florida Man” was canonized on Twitter with @_FloridaMan and on Reddit with the r/FloridaMan subreddit. And after seven years of retweeting and upvoting, we can gather the most popular headlines to see what makes a “Florida Man” headline successful.

Below is a quick overview of the numbers behind these headlines and here is a link to a web app where you can explore the articles on your own. …
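A minimal sketch of how the parsed headlines could be tallied, assuming the top r/FloridaMan titles have already been downloaded as plain strings. `strip_prefix` and `word_counts` are hypothetical helpers, not the web app's code, and the two sample headlines are simply the ones quoted above.

```python
import re
from collections import Counter

# The two headlines quoted in the post; a real run would use the full list of
# parsed top-of-all-time r/FloridaMan titles instead.
HEADLINES = [
    "Florida man too fat for jail",
    "Florida man steals dinosaur bones",
]

# Matches a leading "Florida Man" plus any trailing space, comma, colon, or hyphen.
PREFIX = re.compile(r"^\s*florida\s+man\b[\s,:-]*", re.IGNORECASE)


def strip_prefix(headline):
    """Drop the leading 'Florida Man' so only the story itself is counted."""
    return PREFIX.sub("", headline)


def word_counts(headlines):
    """Count lowercase words across all headlines, ignoring the shared prefix."""
    words = []
    for h in headlines:
        words.extend(re.findall(r"[a-z']+", strip_prefix(h).lower()))
    return Counter(words)


print(word_counts(HEADLINES).most_common(3))
```

Stripping the shared prefix first keeps "florida" and "man" from dominating every count.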



How a simple three-dimensional structure reduces error, outcompetes more complex models, and doubles savings.

In early 2019, we built a deep learning model that predicted electric consumption on an hour-by-hour basis. Because the smallest error can cost an electric utility tens of thousands of dollars, we explored a number of more complex forecasters. In the end we discovered that a simple day-long approach is the most effective, often cutting error in half.

THE STRUCTURE
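One plausible reading of a "three-dimensional structure" for day-long forecasting is the (samples, hours, features) array that sequence models such as LSTMs expect. The sketch below builds that shape with the standard library. It assumes each input is the previous day's 24 hours of [load, temperature] and each target is the next day's 24 hourly loads; the post does not spell out its exact features, and `make_day_samples` is a hypothetical helper.

```python
HOURS = 24  # one sample covers a full day of hourly readings


def make_day_samples(load, temp):
    """Turn hourly series into (days - 1, 24, 2) inputs and (days - 1, 24) targets.

    Assumption for illustration: each input is yesterday's 24 hours of
    [load, temperature]; each target is the following day's 24 hourly loads.
    """
    if len(load) != len(temp) or len(load) % HOURS:
        raise ValueError("need equal-length series covering whole days")
    days = len(load) // HOURS
    X, y = [], []
    for d in range(days - 1):
        start = d * HOURS
        X.append([[load[h], temp[h]] for h in range(start, start + HOURS)])
        y.append(load[start + HOURS : start + 2 * HOURS])
    return X, y


load = list(range(72))  # three days of dummy hourly values
temp = [20.0] * 72
X, y = make_day_samples(load, temp)
print(len(X), len(X[0]), len(X[0][0]))  # → 2 24 2
```

Predicting all 24 hours of a day at once, rather than one hour at a time, avoids feeding the model's own errors back in as inputs.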



The Weakest Link, a British TV quiz show, ran its last episode in 2017. Did players miss their chance to make the most money possible?

The Weakest Link has a group of people answer questions in a circle. For each correct answer, the team goes up the totem pole of values they can “bank.” After one correct answer they can bank £250, after two correct answers they can bank £500, and so on. Once a player gets a question wrong, however, the amount returns to £0. What is the optimal stopping strategy? Should players hedge their bets after reaching £500? £4000? What if they have low accuracy?

A Monte Carlo simulation let me estimate average earnings for each combination of stopping strategy and answer accuracy. Here is the link to the Jupyter notebook that runs these simulations. I assumed 100 questions per game and averaged the results over 500 simulated games. …
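A minimal sketch of such a simulation, not the notebook's code. The post names only £250, £500, "and so on", so the doubling `LADDER` below is an assumption, as are the simplifications that a chain banks automatically once it reaches the chosen threshold (or the top rung) and that any unbanked chain is lost when the game ends.

```python
import random
from statistics import mean

# Hypothetical money ladder: only the first two rungs come from the post.
LADDER = [250, 500, 1000, 2000, 4000, 8000]


def simulate_game(accuracy, bank_at, n_questions=100, ladder=LADDER, rng=random):
    """Play one game, banking as soon as the chain reaches `bank_at` or the top rung."""
    banked = 0
    rung = 0  # consecutive correct answers since the last bank or miss
    for _ in range(n_questions):
        if rng.random() < accuracy:
            rung += 1
            value = ladder[rung - 1]
            if value >= bank_at or rung == len(ladder):
                banked += value  # lock in the chain before the next question
                rung = 0
        else:
            rung = 0  # a wrong answer wipes out the unbanked chain
    return banked


def average_earnings(accuracy, bank_at, n_games=500, seed=0):
    """Average banked money over many simulated games."""
    rng = random.Random(seed)
    return mean(simulate_game(accuracy, bank_at, rng=rng) for _ in range(n_games))


for acc in (0.5, 0.7, 0.9):
    best = max(LADDER, key=lambda v: average_earnings(acc, v))
    print(f"accuracy {acc:.0%}: bank at £{best}")
```

Reusing the same seed for every threshold means each strategy faces the same sequence of right and wrong answers, so differences between thresholds come from the strategy rather than from sampling noise.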



In some departments, the number is as high as 85 percent.

Almost every Princeton graduate — from Senator Ted Cruz to Supreme Court Justice Elena Kagan, actress Brooke Shields to Chair of the Federal Reserve Jerome Powell — has written a senior thesis. All these graduates have also used a titular idiom that plagues nearly half of Princeton theses: the colon.

  • Cruz ’92 — “Clipping the Wings of Angels: The History and Theory behind the Ninth and Tenth Amendments of the United States Constitution”
  • Kagan ’81 — “To the Final Conflict: Socialism in New York City, 1900–1933”
  • Shields ’87 — “The Initiation: From Innocence to Experience: The Pre-Adolescent/Adolescent Journey in the Films of Louis Malle, ‘Pretty Baby’ and ‘Lacombe…
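The colon statistic is a simple share: titles containing ":" divided by all titles, optionally grouped by department. A minimal sketch follows, using the two complete titles quoted above (Shields's is truncated, so it is left out). `colon_share` and `share_by_department` are hypothetical helpers, and the department names in the usage are placeholders.

```python
from collections import defaultdict

# Complete titles quoted in the post.
TITLES = {
    "Cruz '92": "Clipping the Wings of Angels: The History and Theory behind "
                "the Ninth and Tenth Amendments of the United States Constitution",
    "Kagan '81": "To the Final Conflict: Socialism in New York City, 1900–1933",
}


def has_colon(title):
    return ":" in title


def colon_share(titles):
    """Fraction of titles that contain at least one colon (0.0 for no titles)."""
    titles = list(titles)
    return sum(map(has_colon, titles)) / len(titles) if titles else 0.0


def share_by_department(rows):
    """rows: iterable of (department, title) pairs -> {department: colon share}."""
    grouped = defaultdict(list)
    for dept, title in rows:
        grouped[dept].append(title)
    return {dept: colon_share(ts) for dept, ts in grouped.items()}


print(colon_share(TITLES.values()))  # → 1.0
print(share_by_department([("Dept A", "X: Y"), ("Dept A", "Z"), ("Dept B", "W")]))
```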


A 99-point error spread

Third-party Twitter apps aren’t built for accounts with millions of followers. Of course, that’s how users used them anyway.

Fake-follower calculators, especially a platform called Twitter Audit, have been cited by a number of news outlets, including The Telegraph, Vanity Fair, and the Columbia Journalism Review. But these platforms’ statistical techniques are far from rigorous for large accounts.

One fake-follower calculator created by SparkToro, a startup claiming approximately $1.8 million in funding, even uses machine learning algorithms to separate real accounts from fake. But a bottleneck in how much data Twitter allows third-party developers to access in a given time has forced these web apps to use a tenuous definition of “random sample” for accounts with millions of followers.

The math is wrong.

When a programmer requests follower data, Twitter returns at most 75,000 follower IDs per 15-minute window. If those IDs were a random sample, there wouldn’t be a problem. But they are ordered by who followed the user most recently. …
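A small simulation shows why a recency-ordered first page is not a random sample. The follower list and fake rates below are invented for illustration: it assumes the 100,000 most recent followers are fake at 40% and older followers at 10%. Only the 75,000-ID page size comes from the post.

```python
import random

FOLLOWERS = 1_000_000  # hypothetical account size
PAGE_LIMIT = 75_000    # IDs returned per 15-minute window, per the post


def make_followers(n, seed=0):
    """Return booleans (True = fake), most recent follower first.

    Assumption for illustration: the 100k newest followers are fake at 40%,
    everyone older at 10%.
    """
    rng = random.Random(seed)
    return [rng.random() < (0.4 if i < 100_000 else 0.1) for i in range(n)]


def fake_rate(sample):
    return sum(sample) / len(sample)


followers = make_followers(FOLLOWERS)
true_rate = fake_rate(followers)
recent_rate = fake_rate(followers[:PAGE_LIMIT])  # what a first-page "sample" sees
random_rate = fake_rate(random.Random(1).sample(followers, PAGE_LIMIT))
print(f"true {true_rate:.3f}  first 75k {recent_rate:.3f}  random 75k {random_rate:.3f}")
```

A truly random 75,000-ID sample lands close to the account-wide rate, while the first page reflects only the newest followers. Any difference in who followed recently therefore gets misreported as the whole account's fake share.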



Research in coordination with the Open Modeling Framework.

Forecasting technology has given utilities an opportunity to flatten their load curves, raising a whole new family of questions. Below are answers to key questions that can save utilities a good deal of money by reducing the capital and operating expenses of peaking power plants. All testing can be found here.

This research can also be viewed on my website:

Part I: What’s tomorrow’s load?

Main takeaways:

  • Simple machine learning isn’t adequate for a useful energy consumption forecast. Deep learning, however, can deliver the accuracy we need.
  • Given historical load and temperature data, a straightforward neural network can give a 24-hour forecast with about 97 percent accuracy. …
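The post does not define "accuracy" for a load forecast. A common convention, assumed here, is 100% minus the mean absolute percentage error (MAPE) across the forecast hours. `mape` and `accuracy` are hypothetical helper names.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and the same length")
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def accuracy(actual, forecast):
    """Forecast accuracy as 100% minus MAPE (an assumed definition)."""
    return 100 - mape(actual, forecast)


# Two toy hours, each off by 3%: accuracy comes out to about 97.
print(accuracy([100, 200], [97, 206]))
```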


Peak Shaving with Neural Networks: Part III

How one 19th-century physics equation can increase electric utilities’ savings by more than 60%

Research in partnership with the Open Modeling Framework.

This is the third in a three-part series about peak shaving with neural networks. Consider checking out the other two:

Even the best models for predicting energy consumption aren’t good enough to capture a majority of the possible value of peak shaving. When a forecast has just 3% error, it’s not unusual to lose half of the possible savings. Consider how the smallest inaccuracies dramatically affect these utilities’ expected savings from peak shaving (testing here):
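A toy calculation shows how small timing errors can erase savings. Demand charges bill the single highest interval, so a battery that discharges at the wrong hour may leave the peak untouched. The sketch assumes a hypothetical battery that can shave a fixed number of kW for one hour. The load values are invented for illustration and are not data from the post.

```python
def peak_reduction(load, discharge_hour, battery_kw):
    """Drop in the period's peak (kW) if the battery discharges for one hour."""
    shaved = list(load)
    shaved[discharge_hour] = max(0, shaved[discharge_hour] - battery_kw)
    return max(load) - max(shaved)


# Toy hourly load (kW) around a daily peak.
load = [80, 90, 100, 92, 85]
print(peak_reduction(load, 2, 10))  # right hour: peak falls from 100 to 92 → 8
print(peak_reduction(load, 3, 10))  # one hour late: peak stays at 100 → 0
```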



Peak Shaving with Neural Networks: Part II

Electric utilities can detect monthly peaks with only a three-day forecast.

Research in partnership with the Open Modeling Framework.

This is the second in a three-part series about peak shaving with neural networks. Consider checking out the other two:

For electric utilities, reducing monthly demand charges can be hugely profitable. Implementing a peak shaving strategy every day, however, could be costly. If a utility uses direct load control (paying customers to turn off air conditioners, water heaters, etc.), it may frustrate customers by doing so too frequently. If a utility uses storage, overuse can force it to replace expensive batteries more often than necessary. Therefore, it’s important to predict not only the load shape for the next day but also the month’s peak. …
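One simple way to use a three-day forecast for this decision is sketched below. It is a hypothetical rule, not necessarily the one the series tests: dispatch tomorrow only if tomorrow's forecast peak beats the month-to-date peak and is the highest of the three forecast days.

```python
def should_dispatch(month_peak_so_far, forecast_peaks):
    """Hypothetical rule: act tomorrow only if it looks like a new monthly peak.

    forecast_peaks: forecast daily peaks (kW) for tomorrow and the next two days.
    """
    tomorrow = forecast_peaks[0]
    return tomorrow > month_peak_so_far and tomorrow >= max(forecast_peaks)


print(should_dispatch(100, [105, 98, 101]))  # tomorrow sets a new peak → True
print(should_dispatch(100, [105, 110, 90]))  # a bigger day follows → False
```

Requiring tomorrow to top the whole window cuts down on dispatch days. The trade-off is that a forecast error on the later days can cause the rule to skip the true peak.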


“This is Tina.” http://redd.it/ctj32b #aww

The easy way and the hard way.

The “aww” subreddit is arguably the cuteness singularity of the internet. It has more than 20 million members, and tens of thousands of them typically judge the cuteness of each post. We can use Twitter’s and Reddit’s developer APIs to share the best content of r/aww on Twitter.

How it works.

Here’s the Twitter account @reddit_says_aww. And here is a link to the code. Naturally, I’ve removed the public and private keys for my Reddit and Twitter accounts, but if you apply for a developer account at both companies, they can generate authorization tokens for you. …
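Fetching from Reddit (for example with a wrapper such as PRAW) and posting to Twitter both need the developer credentials mentioned above, so this sketch covers only the glue logic in between. It picks the highest-scoring post that hasn't been tweeted yet and formats it like the account's tweets (“title” short link #aww). The post dictionaries and the plain 280-character limit are assumptions; Twitter's shortened-link weighting is ignored.

```python
MAX_TWEET = 280  # assumed limit; t.co link weighting is ignored here


def format_tweet(title, post_id, hashtag="#aww"):
    """Build a tweet in the account's style: “title” short-link hashtag."""
    suffix = f"” http://redd.it/{post_id} {hashtag}"
    room = MAX_TWEET - len(suffix) - 1  # one character for the opening quote
    if len(title) > room:
        title = title[: room - 1] + "…"  # truncate long titles with an ellipsis
    return f"“{title}{suffix}"


def pick_best_unposted(posts, posted_ids):
    """Highest-scoring post (dicts with 'id', 'title', 'score') not tweeted yet."""
    candidates = [p for p in posts if p["id"] not in posted_ids]
    return max(candidates, key=lambda p: p["score"], default=None)


posts = [{"id": "ctj32b", "title": "This is Tina.", "score": 1}]
best = pick_best_unposted(posts, posted_ids=set())
print(format_tweet(best["title"], best["id"]))
```

Tracking `posted_ids` between runs (in a file or small database) keeps the bot from tweeting the same post twice.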


Album art from 2020 candidates’ playlists.

What can Spotify data tell us about how some presidential campaigns are targeting voters?

Lizzo is one thing the Democratic Party can agree on. Three 2020 candidates have integrated Spotify into their campaigns: Senator Kamala Harris, Senator Kirsten Gillibrand, and Mayor Pete Buttigieg. She is featured in their playlists more than any other artist, and only she and Aretha Franklin appear on all three.

While it’s easy to notice Lizzo in each playlist, we can use Spotify’s datasets to reveal less obvious trends. …
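The two findings above (the most-featured artist and the artists shared by every playlist) reduce to a counter and a set intersection. A minimal sketch follows, assuming each playlist has already been fetched as a list of one artist name per track. The playlist and artist names in the usage are placeholders, not the candidates' actual playlists.

```python
from collections import Counter


def artist_stats(playlists):
    """playlists: {name: [artist per track]} -> (track counts, artists on every playlist)."""
    counts = Counter(a for artists in playlists.values() for a in artists)
    on_all = set.intersection(*(set(a) for a in playlists.values())) if playlists else set()
    return counts, on_all


# Placeholder data for illustration only.
toy = {
    "Playlist 1": ["Artist A", "Artist B", "Artist A"],
    "Playlist 2": ["Artist A", "Artist C"],
    "Playlist 3": ["Artist C", "Artist A"],
}
counts, on_all = artist_stats(toy)
print(counts.most_common(1), on_all)
```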

About

Kevin McElwee

🏳️‍🌈 Machine learning engineer and data journalist. Learn about me and my projects at www.BrownAnalytics.com
