Really simple diagram (and why I should improve my UX skills)

Rethinking Twitter’s “who to follow” (using Node.js and d3.js)

I did some work with the Twitter API and my approach gave me some pretty nice suggestions 😄

Rafael Specht da Silva
Published in Statuscode
5 min read · Feb 26, 2018


When I look at Twitter’s “who to follow” area, I rarely see people that I really want to follow. I don’t know which approach Twitter is using (and I don’t want to tell anyone how to do their job), but a long time ago I created followInsights as a way to find new GitHub profiles to be inspired by, and I’ve applied the same logic here.

Let’s say that we have a “root user” (because this kind of reminds me of a tree). If this “root user” follows some people, we can surmise that they trust this network, right? Let’s call the users followed by the “root user” the “first level” users. Now, if this “first level” is relevant to the “root user”, the people that they follow (let’s call them the “second level”) are probably relevant too. So what we have to do is get all the users from the second level and count how many times each one appears there. If lots of people on the first level follow a given user, that indicates some relevance (funny jokes, cat GIFs, nice ideas).
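
A minimal sketch of this counting step. The input shape (an object mapping each first-level user to the accounts they follow) and the usernames are made up for illustration; the real script may store the data differently.

```javascript
// Hypothetical input: each first-level user mapped to the accounts they follow.
const followingByFirstLevel = {
  alice: ["carol", "dave"],
  bob: ["carol", "erin"],
  frank: ["carol", "dave"],
};

// Count how many first-level users follow each second-level account.
function countAppearances(followingByUser) {
  const counts = new Map();
  for (const followed of Object.values(followingByUser)) {
    // A Set guards against the same account being listed twice for one user.
    for (const name of new Set(followed)) {
      counts.set(name, (counts.get(name) || 0) + 1);
    }
  }
  // Most-followed accounts first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

console.log(countAppearances(followingByFirstLevel));
// → [["carol", 3], ["dave", 2], ["erin", 1]]
```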

The problem: Twitter’s API is rate limited (which makes sense: they don’t want us to download everything from there). So we need to do this in batches, and it’ll take some time.
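
Here is a rough sketch of what paced, batched fetching could look like. It assumes the cursor convention of Twitter’s v1.1 list endpoints (start at cursor "-1", stop when `next_cursor_str` is "0"). `fetchPage` is a hypothetical function that wraps a single API call (for example through Twit), and the one-minute default pause is only an assumption about what keeps the process under the limit.

```javascript
// Resolve after `ms` milliseconds; used to pause between requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// `fetchPage(userId, cursor)` is a hypothetical function that performs one
// API call and resolves to { ids, next_cursor_str }.
async function fetchAllFollowing(fetchPage, userId, pauseMs = 60 * 1000) {
  const ids = [];
  let cursor = "-1"; // "-1" asks for the first page
  while (cursor !== "0") { // "0" means there are no more pages
    const page = await fetchPage(userId, cursor);
    ids.push(...page.ids);
    cursor = page.next_cursor_str;
    // Only wait when another request is actually coming.
    if (cursor !== "0") await sleep(pauseMs);
  }
  return ids;
}
```

Passing the request function in as a parameter keeps the pacing logic independent of any particular API wrapper, which also makes it easy to try out with fake pages.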

Once we have the users on the first level, we need to repeat the process for all of them to find out who they are following. This took more than one week with a process running in a terminal (at least it doesn’t consume much memory 😬). Also, I established a limit (configurable in the repository) to only consider profiles that follow fewer than 5000 users, because brands and companies usually follow lots of people who they aren’t necessarily interested in.
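
The cut-off could be applied with a filter like the one below. This is a sketch, not the repository’s code: `friends_count` is the field that Twitter’s v1.1 user object uses for the number of accounts a profile follows, while the constant name and the sample profiles are invented.

```javascript
// Configurable cut-off; 5000 matches the limit described above.
const MAX_FOLLOWING = 5000;

// Keep only profiles that follow fewer accounts than the cut-off.
function eligibleProfiles(profiles, maxFollowing = MAX_FOLLOWING) {
  return profiles.filter((profile) => profile.friends_count < maxFollowing);
}

// Invented sample data.
const sampleProfiles = [
  { screen_name: "some_dev", friends_count: 812 },
  { screen_name: "some_brand", friends_count: 48210 },
];

console.log(eligibleProfiles(sampleProfiles).map((p) => p.screen_name));
// → ["some_dev"]
```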

👋 Followers

Using myself as an example, here are the first eleven results, counting how many times each user appears:

  1. paul_irish, 277
  2. rafael_sps (it’s me 👋), 249
  3. jeresig, 247
  4. addyosmani, 247
  5. github, 228
  6. BrendanEich, 213
  7. elonmusk, 187
  8. nodejs, 183
  9. rauchg, 181
  10. mathias, 177
  11. BarackObama, 175

Obviously, if I follow many people who follow me back, that will make me relevant to my own network (even more than Obama? 🤔😄).

So let’s remove people that I’m already following from these results:

  1. elonmusk, 157
  2. BarackObama, 156
  3. chriscoyier, 126
  4. ChromiumDev, 124
  5. brianleroux, 114
  6. sarah_edo, 112
  7. slightlylate, 110
  8. SaraSoueidan, 108
  9. sindresorhus, 103
  10. rmurphey, 102

And here is a list with some recommendations. Yay!
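
The filtering step can be sketched like this, assuming the ranking is an array of [username, count] pairs. The function name and sample data are illustrative only.

```javascript
// Drop the root user and everyone the root user already follows.
function excludeAlreadyFollowed(ranking, alreadyFollowing, rootUser) {
  const skip = new Set([...alreadyFollowing, rootUser]);
  return ranking.filter(([name]) => !skip.has(name));
}

// Invented sample data: [username, appearances].
const sampleRanking = [
  ["alice", 30],
  ["me", 25],
  ["bob", 20],
  ["carol", 10],
];

console.log(excludeAlreadyFollowed(sampleRanking, ["alice"], "me"));
// → [["bob", 20], ["carol", 10]]
```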

But WHAT IF we weight each follow by how relevant the follower is? Let’s give every user a score that sums the appearance counts of the first-level users who follow them. Say a user is followed by jeresig (who appears 247 times) and paul_irish (277): this user will have a score of 247 + 277 = 524. The list becomes (username, score):

  1. paul_irish, 14306
  2. jeresig, 13284
  3. addyosmani, 12941
  4. rauchg, 10971
  5. mathias, 10968
  6. BrendanEich, 10957
  7. jaffathecake, 10474
  8. brianleroux, 9707
  9. tomdale, 9481
  10. stubbornella, 9388
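
A sketch of the scoring rule, reusing the jeresig and paul_irish counts from the example above. `some_user` and `other_user` are placeholders, and the data shapes are assumptions.

```javascript
// appearances: Map of username → how many times that user appears.
// followingByUser: first-level user → accounts they follow.
function scoreCandidates(followingByUser, appearances) {
  const scores = new Map();
  for (const [follower, followed] of Object.entries(followingByUser)) {
    // Each follow is worth the follower's own appearance count.
    const weight = appearances.get(follower) || 0;
    for (const name of new Set(followed)) {
      scores.set(name, (scores.get(name) || 0) + weight);
    }
  }
  // Highest score first.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}

const appearanceCounts = new Map([
  ["jeresig", 247],
  ["paul_irish", 277],
]);
const whoFollowsWhom = {
  jeresig: ["some_user"],
  paul_irish: ["some_user", "other_user"],
};

console.log(scoreCandidates(whoFollowsWhom, appearanceCounts));
// → [["some_user", 524], ["other_user", 277]]
```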

And after removing the people I’m already following:

  1. brianleroux, 9707
  2. slightlylate, 8956
  3. dalmaer, 8613
  4. chriscoyier, 8586
  5. sarah_edo, 8399
  6. rmurphey, 8008
  7. ChromiumDev, 7777
  8. cramforce, 7752
  9. reybango, 7706
  10. sindresorhus, 7469

And we have some new faces here!

If we parse the data to fit the amazing d3.js Hierarchical Edge Bundling graph, this is the result:

d3.js Hierarchical Edge Bundling graph for rafael_sps as the root user

The cool thing is that you can hover over a username to see how all the users are interconnected:

d3.js graph with hover effects
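
To feed that kind of graph, the follow data has to be reshaped into a list of nodes and links. The sketch below assumes the input shape used by the classic d3 Hierarchical Edge Bundling example, where every node has a `name` and an `imports` array of the names it links to. The repository may use a different shape.

```javascript
// Turn { user: [accounts they follow] } into [{ name, imports }] nodes.
function toEdgeBundlingNodes(followingByUser) {
  const names = new Set(Object.keys(followingByUser));
  return [...names].map((name) => ({
    name,
    // Keep only links whose target is also a node in the graph.
    imports: followingByUser[name].filter((target) => names.has(target)),
  }));
}

console.log(toEdgeBundlingNodes({ a: ["b", "x"], b: ["a"] }));
// → [{ name: "a", imports: ["b"] }, { name: "b", imports: ["a"] }]
```

Dropping links to accounts outside the node set avoids edges that point at nothing in the drawing.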

🌎 Locations

Since we have each user’s location (or something like it), we could group users by location to see if I’m missing someone who lives in the same city.

The problem with this approach is that the location is a free-text field and doesn’t follow a pattern. For example, here are some of the ways San Francisco is written:

San Francisco, CA
San Francisco
San Francisco, California
San Francisco Bay Area
San Francisco, California USA
San Francisco Bay Area, California, USA
San Francisco, CA, US
San Francisco, The Internet (🤔 cool state btw)
Vancouver || San Francisco

Maybe by parsing the names I could remove the state and try to group them (wow, such regular expression), then order the groups by number of users and/or score.
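
One possible, deliberately naive normalization: keep only the text before the first comma and fold “Bay Area” into the city name. Both rules are assumptions made for this sketch, and entries such as “Vancouver || San Francisco” are left untouched.

```javascript
// Reduce a free-text location to a rough city name.
function normalizeLocation(raw) {
  // Drop everything after the first comma (state, country, jokes).
  const city = raw.split(",")[0].trim();
  // Treat "San Francisco Bay Area" as "San Francisco".
  return city.replace(/\s+bay area$/i, "");
}

console.log(normalizeLocation("San Francisco, CA")); // → "San Francisco"
console.log(normalizeLocation("San Francisco Bay Area, California, USA")); // → "San Francisco"
console.log(normalizeLocation("Vancouver || San Francisco")); // → "Vancouver || San Francisco"
```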

Here are the accumulated score (all of the users’ scores summed) and the number of users that I’m not following for each location:

(location, score, total users in the location)

  1. Bend, 10088, 2
  2. Vancouver || San Francisco, 9707, 1
  3. Glasgow, 7798, 2
  4. South Florida, 7706, 1
  5. Bangkok, 7469, 1

Nice to see that really relevant people are not only in Silicon Valley (duh, Mr. Obvious!).

And when sorting locations by number of users:

  1. “”, 1419, 224 (most users leave the location blank)
  2. San Francisco, 1377, 116
  3. São Paulo, 610, 58
  4. New York, 2784, 40
  5. London, 808, 36

So I’m mostly following people from big cities and that’s a shame because this is kind of a bubble. I should try to read more from folks that are from different places, not only white men working in California.

📝 Descriptions/Bios

We can also group users by bio/description. My network’s top five words are (word, frequency):

  1. web, 46
  2. developer, 25
  3. javascript, 21
  4. google, 14
  5. creator, 14

So, the users that have “web” in their description are:

  1. slightlylate, 124
  2. reybango, 105
  3. lukew, 100
  4. cramforce, 96
  5. timberners_lee, 96 (I think this guy knows a thing or two about the “web”)

And for the word “developer”:

  1. ChromiumDev, 140
  2. sarah_edo, 126
  3. SaraSoueidan, 120
  4. MylesBorins, 100
  5. ThePracticalDev, 90
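
A sketch of how the word counts and the per-word lists could be computed. `description` is the bio field on Twitter’s user object. Everything else is an assumption: the stop-word list, counting a word once per bio, and the ASCII-only tokenizer (which ignores accented letters).

```javascript
// Tiny illustrative stop-word list; a real one would be longer.
const STOP_WORDS = new Set(["a", "and", "at", "i", "in", "of", "on", "the", "to"]);

// Lowercase a bio and split it into unique, meaningful words.
function bioWords(description) {
  const words = description.toLowerCase().match(/[a-z0-9]+/g) || [];
  return new Set(words.filter((word) => !STOP_WORDS.has(word)));
}

// Most common words across all bios, as [word, frequency] pairs.
function topWords(users, limit = 5) {
  const counts = new Map();
  for (const user of users) {
    for (const word of bioWords(user.description)) {
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, limit);
}

// Users whose bio contains the given word.
function usersMentioning(users, word) {
  return users.filter((user) => bioWords(user.description).has(word.toLowerCase()));
}

// Invented sample data.
const sampleBios = [
  { screen_name: "a", description: "Web developer and speaker" },
  { screen_name: "b", description: "JavaScript developer" },
  { screen_name: "c", description: "I work on the web platform" },
];

console.log(topWords(sampleBios, 2));
// → [["web", 2], ["developer", 2]]
console.log(usersMentioning(sampleBios, "Web").map((u) => u.screen_name));
// → ["a", "c"]
```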

This is a good method for finding relevant people in your network, but I believe that we should try to have a more diverse network and read about different subjects. Maybe I should run the same script on a different profile, someone I trust as a voice for underrepresented people, and take a look at that network (not only mine). That would give even better insights.

I’m using Twit as the API wrapper and d3.js to do the graph. The code is available here.

There are a lot of improvements to make, but I hope that this can be useful for some people! Liked this? Be sure to give it a clap. 👏
