Hacking Nuzzel

As an information junkie, I am a huge fan of the news aggregation app Nuzzel. Finding interesting things to read is one of my core use cases for Twitter, and Nuzzel makes that process significantly faster. For the uninitiated, the app ranks articles over some time period (24 hours by default) by how many of the people you follow have shared each article. Very neat.

One feature in the app that I find particularly useful is the ability to look at other people’s feeds. If I’m on a macroeconomics binge, I can check out Marc Andreessen’s feed. Curious about what tech journalists are buzzing about today? Check out Alexia Tsotsis’s feed.

While surfing other people’s feeds one night, I noticed that the top articles on everybody’s feed had a maximum of about 10 shares. This number felt surprisingly low. Low enough to be able to manipulate.

So I decided to try to do just that.

Framing the project

Given that I work in technology and most of the people I follow on Twitter do as well, I figured that there would be significant overlap in the users that are followed by the people I follow (what a mouthful). With that in mind, I decided to determine which 10 users would need to tweet about the same article for it to reach the top of the largest number of my Nuzzel friends’ feeds.

My strategy was as follows:

Get a list of all my Nuzzel friends
Get a list of all the people they follow on Twitter
Find the set of users who share the largest number of followers among my Nuzzel friends

Feels pretty straightforward, so let’s dive in. Or if you’re impatient, rapidly scroll down to the “Results” section.

(Make sure to check out the notes for extra data, info, and pithy comments.)

Getting a list of all my friends on Nuzzel

Luckily, Nuzzel has a discovery feature that shows you everybody that you follow who also uses Nuzzel. Unluckily, they only expose the full list through their mobile app, which made it a bit harder to gather.

I could have gathered the list manually, but I’m lazy. Instead, I decided to fetch the data straight from their API. In order to do that, I had to do a bit of packet sniffing with a tool called Charles. Here’s how I did it:

Download, install, and launch Charles.
Set up my iPhone to proxy through Charles.
MITM myself by installing the Charles root certificate so I could read SSL requests.
Add https://api.nuzzel.com to the SSL Proxying settings in Charles.
Launch Nuzzel, open the left menu, tap “Friends’ Feeds”, wait for it to load, and go check out the payload in Charles.

Looks like Nuzzel has a nice, RESTful API that uses an X-NuzzelApiKey header for authentication. The endpoint for your friends list is:

https://api.nuzzel.com/v1.0/users/[Your Nuzzel ID]/friends
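For illustration, here is a minimal sketch of how a request to that endpoint could be built in Python. Only the URL pattern and the header name come from the traffic above; the function name and the placeholder ID and key are made up, and the request is built but never sent.

```python
import urllib.request

API_ROOT = "https://api.nuzzel.com/v1.0"


def build_friends_request(nuzzel_id, api_key):
    """Build (but do not send) a request for a user's Nuzzel friends list."""
    url = f"{API_ROOT}/users/{nuzzel_id}/friends"
    # The header name is the one seen in the proxied traffic.
    return urllib.request.Request(url, headers={"X-NuzzelApiKey": api_key})


# Placeholder values: substitute your own Nuzzel ID and the key seen in Charles.
request = build_friends_request("12345", "my-api-key")
print(request.get_full_url())
```

Passing the request to `urllib.request.urlopen` would actually perform the call.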

The response from this endpoint is a well-structured JSON response with some interesting data: which services the user authenticated with, whether Nuzzel thinks they are spammy, all of their Twitter lists, etc.

But the only piece of information that I really cared about from the response was each friend’s Twitter ID, so I could fetch their following list from Twitter. A little bit of Python made parsing that a breeze.
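The parsing step might look something like the sketch below. Heads up: the `results` and `twitter_id` keys are hypothetical stand-ins, since the real field names only show up in the payload you capture in Charles.

```python
import json


def extract_twitter_ids(payload):
    """Return the Twitter ID of every friend in a friends-list response.

    ASSUMPTION: the "results" and "twitter_id" keys are hypothetical;
    check the captured payload for the real names.
    """
    data = json.loads(payload)
    return [
        friend["twitter_id"]
        for friend in data.get("results", [])
        if friend.get("twitter_id") is not None
    ]


# A tiny made-up payload, just to exercise the function.
sample = '{"results": [{"twitter_id": 111}, {"twitter_id": 222}, {"name": "no id"}]}'
print(extract_twitter_ids(sample))
```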

In my case, I had 128 friends on Nuzzel (~25% of the people I follow on Twitter).

Step 1 accomplished.

Fetching a list of the users my friends follow

Twitter might not have the best developer relations in the Valley, but their API is simple and well documented. Let’s just get straight to the code:
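As a rough sketch of the fetching loop, assuming the v1.1 `friends/ids` endpoint with its cursor-based paging and its limit of 15 requests per 15-minute window: the `fetch_page` callable is a hypothetical hook that makes one authenticated call and returns the decoded JSON, so the paging logic stays independent of whichever HTTP or OAuth library you use.

```python
import time


def fetch_following(fetch_page, user_id, pause_seconds=60.0, sleep=time.sleep):
    """Collect every account ID that `user_id` follows.

    `fetch_page(user_id, cursor)` is a caller-supplied (hypothetical) function
    that performs one friends/ids call and returns a dict with an "ids" list
    and a "next_cursor" value.
    """
    ids = []
    cursor = -1  # -1 asks for the first page
    while cursor != 0:  # a next_cursor of 0 means there are no more pages
        page = fetch_page(user_id, cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
        # One call per minute stays within 15 calls per 15 minutes.
        sleep(pause_seconds)
    return ids


def fake_fetch_page(user_id, cursor):
    """Stand-in for the real API call, serving two canned pages."""
    pages = {
        -1: {"ids": [1, 2], "next_cursor": 7},
        7: {"ids": [3], "next_cursor": 0},
    }
    return pages[cursor]


# The no-op sleep keeps the demo instant; drop it for real calls.
print(fetch_following(fake_fetch_page, "someone", sleep=lambda seconds: None))
```

At one call per minute, 128 friends works out to a little over two hours, which lines up with the run time I saw.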

Twitter’s rate limiting here is kind of brutal. Depending on how many Nuzzel friends you have, this might take a while. In my case, the run time was 2 hours.

On to step 3.

Analyzing the data

Once I had acquired all the data, it was time to dig in. First, I checked out a couple of interesting stats:

7 people in the set follow more than 5,000 people.
The 128 people in the dataset follow an average of ~1,300 people.
The data has 175k total follows, with 89k distinct people being followed.
The most followed person among my Nuzzel friends was Marc Andreessen — 102 of them follow him. He just narrowly beat Fred Wilson’s 101. Game on fellas.
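Stats like these fall out of a few lines of Python. The sketch below assumes the data has been loaded into a dict mapping each Nuzzel friend to the set of accounts they follow; the three-person dataset is a toy stand-in, not my real data.

```python
from collections import Counter

# Toy stand-in for the real data: Nuzzel friend -> accounts they follow.
following = {
    "alice": {"pmarca", "fredwilson", "cdixon"},
    "bob": {"pmarca", "fredwilson"},
    "carol": {"pmarca", "levie"},
}

total_follows = sum(len(accounts) for accounts in following.values())
average_following = total_follows / len(following)

# For each followed account, count how many of my friends follow it.
follower_counts = Counter(
    account for accounts in following.values() for account in accounts
)
distinct_followed = len(follower_counts)
most_followed, most_followed_count = follower_counts.most_common(1)[0]

print(total_follows, distinct_followed, most_followed, most_followed_count)
```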

Ok, on to the actual challenge: finding the set of 10 people that would cause an article to show up at the top of the largest number of my friends’ Nuzzel feeds. Turns out this isn’t all that easy.

With 89k distinct people, there are a goddamn-near-infinite number of distinct 10-person sets to check. So, the first step is to try to reduce the search space. We can make good progress with one basic observation: the top 10 most followed people in my list are likely to make a pretty strong 10-person set. Let’s find the number of my friends’ feeds that an article would top with this baseline set.
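Here is a minimal sketch of that baseline calculation. It assumes the same friend-to-followed-accounts dict as before (toy data again), flips it into account-to-followers, and intersects the follower sets of the most followed accounts. The function names are mine, picked for illustration.

```python
from collections import defaultdict


def invert(following):
    """Turn friend -> followed accounts into account -> friends who follow it."""
    followers = defaultdict(set)
    for friend, accounts in following.items():
        for account in accounts:
            followers[account].add(friend)
    return dict(followers)


def common_followers(group, followers):
    """Friends who follow every account in `group` (group must be non-empty)."""
    return set.intersection(*(followers[account] for account in group))


def baseline(followers, size=10):
    """Pick the `size` most followed accounts and count their shared followers."""
    top = sorted(followers, key=lambda a: len(followers[a]), reverse=True)[:size]
    return top, len(common_followers(top, followers))


following = {
    "alice": {"pmarca", "fredwilson", "cdixon"},
    "bob": {"pmarca", "fredwilson"},
    "carol": {"pmarca", "levie"},
}
top_accounts, shared_count = baseline(invert(following), size=2)
print(top_accounts, shared_count)
```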

In my case, the number of common followers amongst the top 10 most followed people was 43. Pretty impressive. This baseline allows us to cut anybody who doesn’t have at least 43 of my Nuzzel friends following them. Sorry y’all. Here’s some code to do that:
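The cut is safe because a group’s common followers are always a subset of each member’s followers, so one weakly followed member caps the whole group. A sketch of the filter, assuming an account-to-followers dict (toy data below, with a threshold of 2 instead of 43):

```python
def filter_candidates(followers, threshold):
    """Keep only accounts followed by at least `threshold` of my friends."""
    return {
        account: fans
        for account, fans in followers.items()
        if len(fans) >= threshold
    }


# Toy stand-in: account -> set of my Nuzzel friends who follow it.
followers = {
    "pmarca": {"alice", "bob", "carol"},
    "fredwilson": {"alice", "bob"},
    "cdixon": {"alice"},
    "levie": {"carol"},
}
candidates = filter_candidates(followers, threshold=2)
print(sorted(candidates))
```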

My filtered list now contains only 194 people. This is a nice reduction, but it still leaves us with 1.6 x 10^16 possible 10-person sets. Clearly still too many possibilities for a brute-force search.
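That count is just "194 choose 10", which is easy to sanity check with the standard library:

```python
import math

candidate_count = 194
group_size = 10

# Number of distinct 10-person sets that can be drawn from 194 candidates.
possible_sets = math.comb(candidate_count, group_size)
print(f"{possible_sets:.2e}")
```

Even at a million sets checked per second, that would take on the order of 500 years.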

Now the actual fun begins. Because the search space is so large, there’s no practical way to guarantee finding the global maximum 10-person set, but we can search for a good approximation.

The strategy I implemented was a hybrid loosely inspired by genetic algorithms and simulated annealing. The process was as follows:

Start with the base set of the 10 people with the most followers.
Find the person that is most dragging down the number of common followers. This is done by removing each person from the list and checking how many common followers the remaining set has. Whoever’s presence reduces the list the most is our target to kick out.
With some probability, ignore the kick target and instead choose a random member of the group as the kick target. As the algorithm gets closer to completion, this probability decreases.
Replace the kick target with a random, non-group member. Check to see if the number of common followers in the new group is more than the old group. If it is, keep it. If it isn’t, leave the old group intact.
Repeat a lot of times.
To push it even further, run the algorithm again but set the most influential people from your first pass as the base set.

Code time. Here is the optimization method. I passed in the filtered list of users from the previous step and 100,000 as parameters.
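Below is a compact sketch of the steps above, not the exact code from my run. It assumes an account-to-followers dict like the filtered list, and the 0.5 starting exploration probability and its linear decay are illustrative choices. The `start` parameter covers the "run it again from the previous result" step.

```python
import random


def shared(group, followers):
    """Number of friends who follow every account in `group`."""
    return len(set.intersection(*(followers[a] for a in group)))


def pick_kick_target(group, followers):
    """The member whose removal leaves the most common followers behind."""
    return max(group, key=lambda m: shared([a for a in group if a != m], followers))


def optimize(followers, iterations, size=10, seed=None, start=None):
    """Search for a `size`-member group (size >= 2) with many shared followers."""
    rng = random.Random(seed)
    ranked = sorted(followers, key=lambda a: len(followers[a]), reverse=True)
    group = list(start) if start else ranked[:size]
    best = shared(group, followers)
    for step in range(iterations):
        # Exploration probability shrinks linearly to zero (assumed schedule).
        explore = 0.5 * (1 - step / iterations)
        if rng.random() < explore:
            target = rng.choice(group)
        else:
            target = pick_kick_target(group, followers)
        outsiders = [a for a in followers if a not in group]
        if not outsiders:
            break
        newcomer = rng.choice(outsiders)
        trial = [newcomer if a == target else a for a in group]
        score = shared(trial, followers)
        if score > best:  # keep the swap only if it is a strict improvement
            group, best = trial, score
    return group, best


# Toy data where the two most followed accounts barely overlap.
followers = {
    "a": {1, 2, 3, 4, 5},
    "b": {1, 6, 7, 8, 9},
    "c": {1, 2, 3, 4},
    "d": {6, 7, 8},
}
group, best = optimize(followers, iterations=200, size=2, seed=0)
print(group, best)
```

Because swaps are only accepted when they improve the score, the result can never be worse than the starting set, but it can get stuck in a local maximum. A call shaped like `optimize(filtered, 100_000)` matches the parameters described above.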

You can take a look at this gist if you’re interested in seeing the rest of the code.

The Results!

Without further ado, I present you with the best 10-person set my algorithm came up with:

Marc Andreessen (https://twitter.com/pmarca)
Fred Wilson (https://twitter.com/fredwilson)
Chris Dixon (https://twitter.com/cdixon)
Aaron Levie (https://twitter.com/levie)
Hunter Walk (https://twitter.com/hunterwalk)
Elon Musk (https://twitter.com/elonmusk)
M.G. Siegler (https://twitter.com/mgsiegler)
Dick Costolo (https://twitter.com/dickc)
Keith Rabois (https://twitter.com/rabois)
Josh Elman (https://twitter.com/joshelman)

Welp. Not an easy task to get this list of tech-industry heavy hitters to share the same article within a 24-hour period.

But if they did, that article would land solidly at the top of 48 of my Nuzzel friends’ feeds (excluding their own feeds). Remember, the top 10 most followed people would have put the story at the top of only 43 feeds, so all this work got us an extra 5. Those 48 feeds are 37.5% of all my Nuzzel friends’ feeds. Not. Too. Shabby.

Congratulations, M.G. Siegler, Dick Costolo, Keith Rabois, and Josh Elman. You have more feed-topping power than Chris Sacca, Paul Graham, Ev Williams, and Dave McClure.

For completeness, the list of those 48 people’s feeds can be found at this gist.

Wrapping Up

Hopefully this was as fun to read and follow along with as it was to explore and write about. The long and short of the story is that getting the right group of people to share a story will get it more exposure than getting the most prominent people to do so. No surprise, I suppose, but certainly interesting to concretely examine.

If you have a better way to do the optimization step, I’m interested in learning how you might do it! Happy to share my data dumps :-)