For almost a decade, “Florida Man” has been a mainstay antihero of internet culture. Headlines like “Florida man too fat for jail” and “Florida man steals dinosaur bones” are easy fodder for meme-ification. In early 2013, “Florida Man” was canonized on Twitter with @_FloridaMan and on Reddit with the r/FloridaMan subreddit. And after seven years of retweeting and upvoting, we can gather the most popular headlines to see what makes a “Florida Man” headline successful.
Below is a quick overview of the numbers behind these headlines and here is a link to a web app where you can explore the articles on your own. …
In early 2019, we built a deep learning model that predicted electric consumption on an hour-by-hour basis. Because the smallest error can cost an electric utility tens of thousands of dollars, we explored a number of more complex forecasters. In the end we discovered that a simple day-long approach is the most effective, often cutting error in half.
The Weakest Link has a group of people answer questions in a circle. For each correct answer, the team goes up the totem pole of values they can “bank.” After one correct answer they can bank £250, after two correct answers they can bank £500, and so on. Once a player gets a question wrong, however, the amount returns to £0. What is the optimal stopping strategy? Should players hedge their bets after reaching £500? £4000? What if they have low accuracy?
A Monte Carlo simulation allowed me to calculate average earnings for each combination of stopping strategy and answer accuracy. Here is the link to the Jupyter notebook that runs these simulations. I assumed 100 questions per game and averaged over 500 simulated games. …
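As a rough illustration of that kind of simulation (a minimal sketch, not the code from the notebook), the snippet below plays out games in which the team banks after a fixed number of consecutive correct answers. The ladder values beyond £250 and £500 are illustrative assumptions, as are the function names and the simplification that every player answers with the same accuracy.

```python
import random

# Hypothetical ladder: only the first two rungs come from the text above;
# the remaining values are illustrative assumptions.
LADDER = [250, 500, 1000, 2000, 4000]


def simulate_game(accuracy, bank_after, n_questions=100, ladder=LADDER, rng=random):
    """Total banked in one game when banking after `bank_after` straight correct answers."""
    if not 1 <= bank_after <= len(ladder):
        raise ValueError("bank_after must correspond to a rung of the ladder")
    banked = 0
    streak = 0
    for _ in range(n_questions):
        if rng.random() < accuracy:
            streak += 1
            if streak == bank_after:
                # Bank the current rung and start a new chain.
                banked += ladder[streak - 1]
                streak = 0
        else:
            # A wrong answer wipes out the unbanked chain.
            streak = 0
    return banked


def average_earnings(accuracy, bank_after, n_games=500, seed=0):
    """Average banked total over `n_games` simulated games."""
    rng = random.Random(seed)
    total = sum(simulate_game(accuracy, bank_after, rng=rng) for _ in range(n_games))
    return total / n_games


if __name__ == "__main__":
    for accuracy in (0.5, 0.7, 0.9):
        best = max(range(1, len(LADDER) + 1),
                   key=lambda k: average_earnings(accuracy, k))
        print(f"accuracy={accuracy:.0%}: bank after {best} correct answers")
```

Sweeping `accuracy` and `bank_after` over a grid gives the strategy-versus-accuracy comparison described above; the results depend entirely on the assumed ladder.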
Almost every Princeton graduate — from Senator Ted Cruz to Supreme Court Justice Elena Kagan, actress Brooke Shields to Chair of the Federal Reserve Jerome Powell — has written a senior thesis. All these graduates have also used a titular idiom that plagues nearly half of Princeton theses: the colon.
Fake-follower calculators, especially a platform called Twitter Audit, have been cited by a number of news outlets, including The Telegraph, Vanity Fair, and the Columbia Journalism Review. But these platforms’ statistical techniques are far from rigorous for large accounts.
One fake-follower calculator created by SparkToro, a startup claiming approximately $1.8 million in funding, even uses machine learning algorithms to separate real accounts from fake. But a bottleneck in how much data Twitter allows third-party developers to access in a given time has forced these web apps to use a tenuous definition of “random sample” for accounts with millions of followers.
When a programmer requests data from Twitter, Twitter returns a maximum of 75k follower IDs in a 15-minute period. If the IDs that Twitter returned were random, there wouldn’t be a problem. But these IDs are ordered by how recently each account followed the user, most recent first. …
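To get a feel for the bottleneck, here is a back-of-the-envelope sketch of the arithmetic. It uses only the 75k-per-15-minutes figure quoted above; the function names and the 10-million-follower example are hypothetical, and each window is counted as a full 15 minutes.

```python
def windows_to_fetch_all(n_followers, ids_per_window=75_000):
    """Number of rate-limit windows needed to page through every follower ID."""
    return -(-n_followers // ids_per_window)  # integer ceiling division


def hours_to_fetch_all(n_followers, ids_per_window=75_000, window_minutes=15):
    """Approximate wall-clock hours, counting each window as a full 15 minutes."""
    return windows_to_fetch_all(n_followers, ids_per_window) * window_minutes / 60


def share_covered_by_one_window(n_followers, ids_per_window=75_000):
    """Fraction of all followers returned in a single window (the most recent ones)."""
    return min(1.0, ids_per_window / n_followers)


if __name__ == "__main__":
    n = 10_000_000  # hypothetical large account
    print(windows_to_fetch_all(n), "windows")
    print(hours_to_fetch_all(n), "hours")
    print(f"{share_covered_by_one_window(n):.2%} of followers in one window")
```

Under these assumptions, an app that wants an answer within a single window sees under one percent of a ten-million-follower account, and always the newest followers rather than a random draw.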
Research in coordination with the Open Modeling Framework.
Forecasting technology has given utilities an opportunity to flatten their load curves, raising a whole new family of questions. Below are solutions to important questions that can save utilities a good deal of money by reducing capital and operating expenses from peaking power plants. All testing can be found here.
This research can also be viewed on my website:
Research in partnership with the Open Modeling Framework.
This is the third in a three-part series about peak shaving with neural networks. Consider checking out the other two:
Even the best models for predicting energy consumption aren’t good enough to capture a majority of the possible value of peak shaving. When a forecast has just 3% error, it’s not unusual to lose half of possible savings as a consequence. Consider how the smallest inaccuracies dramatically affect these utilities’ expected savings from peak shaving (testing here):
Research in partnership with the Open Modeling Framework.
This is the second in a three-part series about peak shaving with neural networks. Consider checking out the other two:
For electric utilities, reducing the monthly demand charge can be hugely profitable. Implementing a peak shaving strategy every day, however, could be costly. If a utility is using direct load control (paying customers to turn off air conditioners, water heaters, etc.), it may frustrate customers by doing so too frequently. If a utility uses storage, overuse can force it to replace expensive batteries more often than necessary. Therefore, it’s important not only to predict the load shape for the next day, but also to predict the month’s peak. …
The “aww” subreddit is arguably the cuteness singularity of the internet. It has more than 20 million members, and tens of thousands of users typically judge the cuteness of each post. We can leverage Twitter and Reddit’s developer APIs to share the best content of r/aww to Twitter.
Here’s the Twitter account @reddit_says_aww. And here is a link to the code. Naturally, I’ve removed the public and private keys for my Reddit and Twitter accounts, but if you apply for a developer account at both companies, they can generate authorization tokens for you. …
Lizzo is one thing the Democratic Party can agree on. Three 2020 candidates have integrated Spotify into their campaigns: Senator Kamala Harris, Senator Kirsten Gillibrand, and Mayor Pete Buttigieg. In their playlists, she’s featured more than any other artist, and only she and Aretha Franklin appear on all three.
While it’s easy to notice Lizzo in each playlist, we can use Spotify’s datasets to reveal less obvious trends. …