Modeling the Contemporary Social Internet
This last week, I attended a several-day seminar about combating online harassment, saw an academic friend get an award for her book on trolling from the Association of Internet Researchers (you should buy it), and ended the week by seeing this cover from Time:
There’s something about the last several years, never mind the last several days, that has left me with the distinct sense that something qualitatively changed in how individuals interact online. Declarations of 2014 as the year of Internet Outrage, a read-through of Jon Ronson’s “So You’ve Been Publicly Shamed”, and numerous examples lost in the stream of media delivered over the past few years make it feel like perhaps, far from people changing, the conditions underlying interactions between people have changed such that we are in a new Internet era in terms of socializing online.
What I want to do with this is provide a model that has percolated from many conversations about the Internet these days, the “trolling/harassing/outrage” problem, and the potential dynamics that drove us to this point. I’m going to provide a bit of historical context, a formalization of this context into a theoretical model for contemporary social networks, and then we’ll walk through a series of tests to see the degree to which this general approach could potentially help us understand why things have developed as they have.
Pre-Web 2.0 Socialization Online
The way I’m approaching this question is from a light observation of historical interaction online. To be very fast and loose, without a proper accounting, the story of online socialization has largely involved a scaling up in the number of participants in any single network of interaction. For the BBS platforms, the current edit on Wikipedia shows that there were 60,000 BBSes serving 17 million users in the United States, or an average of about 280 individuals per BBS (though of course, the skewness of this distribution is likely considerable). And even then, the number of concurrent users was bound to the number of phone lines, which rarely exceeded the single digits.
Moving on to the early Internet, with systems like IRC and USENET (which of course also preceded and overlapped with BBSes), system sizes likely rarely exceeded 10⁴ in most common cases. As various technological and hardware affordances arrived, we scaled to 10⁵ on systems like PHP bulletin boards, which predominated from the late 90’s to the mid-aughts. Then, of course, “Web 2.0” brought new platforms like LiveJournal, which pushed system scales to 10⁶ and 10⁷, and over the past five to ten years, we have witnessed the creation of single online social networks that are routinely 10⁸ in size in terms of participants, largely as a result of leveraging highly distributed architectures, low-level hardware advances in compute facilities, and high-level programming language and operating system abstractions.
The main point, however, is not the historical frame — the historical frame only shows us that there has been a relatively steady exponential increase in terms of the size of the average popular social network online. We have gone, over the last ten years, finally, from federated small online social networks (like phpBB forums) to singular, dominant forums that are multi-use and allow us to engage with many different communities of individuals simultaneously. To give a visual of this, this is what a world of 1,000 people on the Internet would have looked like, structural-relationally speaking, in 1997:
In other words, we would have been siloed to different, single or few-topic platforms (or forums), and though we may have belonged to several or even many, we would have had to re-register ourselves across platforms, at which point we could either join with a linked name or pseudonymously, free to start a new form of one’s expression altogether. In 2016, we live in a system that looks more like this (again, the same number of people, and same number of relationships):
In this view, we have a single network. We may use Twitter as well as Facebook, but, against the great many forums that used to exist, we are left with few to no options about the networks on which we participate with others. We have all abandoned our bespoke, isolated digital villages for the big cities of Facebook, Twitter, and the like, which have subsumed all the former small “villages” on the Internet. Over time, these new systems have scaled enormously, and we have been left with the conditions for Internet Outrage and Online Harassment.
I’m open to discussions about this, but I think the necessary conditions for these types of behaviors are somewhere between the intersections of these phenomena:
- A system where users with widely heterogenous beliefs are encouraged to participate and discuss those many beliefs
- A system that has scaled beyond a point at which users can typically resolve conflicts relatively informally through social mechanisms
- A system which encourages users to find and interact with people that share their particular niche of the heterogenous belief space as easily as with their polar opposites
In a small social network in 1997, interests were generally self-selectingly aligned, the system scale was usually of a manageable order, where individuals could collectively and informally regulate behavior, and interaction with socially distant alters, as compared to socially proximate alters, had a higher cost. In the Internet of 2016, user beliefs are highly heterogenous across large systems, systems have become so large that informal social mechanisms fail to regulate bad behavior, and on platforms like Twitter, social interaction is just as easy at any distance.
What I’d like to do now is construct a model that creates Twitter-like social networks, and look at how, as compared to alternative models of social network evolution algorithms, this particular type of evolution creates networks particularly well-structured for the ease of information transmission. Of course, this only partially responds to those three points above — this approach only shows that some networks are particularly more “infectious”, and a more infectious network allows for more frequent infections, good and bad. Further work would have to be done to completely ground these claims, but this is a good starting point that I think is useful to share at this point.
Modeling the Evolution of a Social Network
While controversial to some (and really only in terms of academic attribution and gripes beyond its explanatory utility), the Barabási–Albert model of preferential attachment as a network grows is a simple, powerful idea for showing how network structures like the Internet look the way they do. Simply put, the algorithm creates a network like so:
- Start with a few connected nodes.
- Add a new node. Add a few edges for this new node. The probability that the new node connects to a given existing node should be proportional to the number of edges that existing node already has.
- Repeat until the desired system size is hit
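The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name, the fully connected starting core of m + 1 nodes, and the trick of sampling from a list that holds each node once per edge are all choices made for the sketch.

```python
import random


def barabasi_albert(n, m, seed=None):
    """Grow an undirected network of n nodes; each new node adds m edges."""
    rng = random.Random(seed)
    # Step 1: start with a few connected nodes (assumed here: a clique of m + 1).
    edges = {(i, j) for i in range(m + 1) for j in range(i + 1, m + 1)}
    # Each node appears in this list once per edge it has, so a uniform draw
    # from the list picks a node with probability proportional to its degree.
    degree_pool = [node for edge in edges for node in edge]
    # Steps 2 and 3: add nodes one at a time until the network has n nodes.
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(degree_pool))
        for target in targets:
            edges.add((target, new))
            degree_pool.extend([target, new])
    return edges


edges = barabasi_albert(n=1000, m=3, seed=42)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print("edges:", len(edges))
print("smallest degree:", min(degree.values()))
print("largest degree:", max(degree.values()))
```

Comparing the smallest and largest degree gives a quick feel for how unevenly ties end up distributed: early nodes keep collecting edges simply because they already have many.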
From this simple set of rules come emergent properties of networks such as their scale-free property, the short diameters (or the “Six Degree” game), and their clustering into sub-communities. One of the most distinct properties, of course, is the highly skewed degree distribution (which is the “scale free” in the scale free network), which we also see on most social networking platforms (particularly directed social networks as opposed to the undirected, mutual-tie Facebook). As a general model, it works close enough that we can use it to approximate the evolution of social networks. For our purposes though, we will create a specialized model that approximates the phenomenon we’re interested in by simulating the evolution of one social network, Twitter. First, I want to point out several evolutionary processes that are particular to Twitter:
A1. In the Barabási-Albert model, new nodes make new ties to existing nodes, but existing nodes never tie back. Obviously, there is reciprocity on Twitter, so sometimes, nodes should tie back. Typically, very popular nodes on Twitter follow only a few accounts, while small accounts follow many people back. So, we will estimate that the probability for reciprocal ties is proportional to the inverse percentile of how popular the account is. In other terms, the lowest node ranked by degree distribution will always follow someone back, the most popular by degree distribution will never follow someone back, and everyone else will fall somewhere along those extremes.
A2. In the Barabási-Albert model, nodes only link to other nodes proportional to the indegree of the node. That is to say, the only consideration “users” make in terms of “who to follow” is more or less, who is popular. Obviously, on Twitter, that is not the case. While people still likely trend towards this tendency, clearly people follow people who have similar tastes in some way or another. So, a model approximating Twitter should incorporate this type of behavior.
A3. Some opinions are more salient than others: while some sets of opinions may not be meaningfully correlated (e.g. preferred laundry detergent and favorite color), others may be (e.g. abortion and gun control). So, some subset of “tastes”, or “opinions”, should be somewhat correlated for users.
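The follow-back rule in A1 can be made concrete with a small sketch. The function name and the even spacing of probabilities between the two extremes are assumptions for illustration; the post only fixes the endpoints (the least popular account always follows back, the most popular never does).

```python
def follow_back_probability(in_degree):
    """Return {node: probability of following back}, based on popularity rank.

    The least-followed node gets 1.0, the most-followed gets 0.0, and the
    rest are spaced evenly in between (ties are broken arbitrarily).
    """
    ranked = sorted(in_degree, key=in_degree.get)  # least popular first
    if len(ranked) == 1:
        return {ranked[0]: 1.0}
    top = len(ranked) - 1
    return {node: 1.0 - rank / top for rank, node in enumerate(ranked)}


# Hypothetical follower counts for five accounts.
followers = {"a": 0, "b": 3, "c": 10, "d": 250, "e": 9000}
probs = follow_back_probability(followers)
print(probs)  # {'a': 1.0, 'b': 0.75, 'c': 0.5, 'd': 0.25, 'e': 0.0}
```

Because the rule uses rank and not raw follower counts, the gap between 250 and 9,000 followers matters no more than the gap between 3 and 10.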
Finally, we can outline a formal model for estimating the evolution process of a platform that could look remarkably similar to Twitter:
B1. Suppose there are ultimately N nodes on a network, and that each node Nᵢ has a vector of “opinions” Oᵢ within a larger matrix O (where every node holds a value for the same set of opinions). Each “opinion set”, or the set of values for one matter to have an opinion on, is normally distributed with a randomly selected mean and randomly selected standard deviation. Some proportion of opinions Pc are correlated to one another. When Pc=1, all opinions are correlated with one another, and when Pc=0, no opinions are correlated with one another.
B2. The evolution starts with a small set of initial users. New users arrive one at a time, just as in the BA model. Each new user creates k new ties to users that already exist on the platform. This tying process is a joint maximization over the visibility of existing users and the degree to which the new user and the existing user are similar in their opinions (operationalized as the MSE across the opinions, where a lower MSE means greater similarity). In the process, new users first surface a set of users randomly, proportional to their degree, and instead of just following them as in the BA model, they further filter that larger set according to similarity. Reciprocal ties are formed according to (A1), and occur proportional to the degree to which the two users are similar.
B3. This process continues until the system size, in terms of total nodes, is equal to N, at which point, summary statistics can be drawn about the network.
B4. These statistics can then be compared to other networks generated solely by preferential attachment, and solely by similar tastes in opinions. Then, we can look at various aspects of these networks in comparison.
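One way to build the opinion matrix O from B1 is sketched below. The post does not say how the correlation between opinions is induced, so the shared latent factor, the loading rho, and the ranges for the random means and standard deviations are all assumptions made for this example.

```python
import math
import random


def make_opinions(n_nodes, n_opinions, p_c, rho=0.7, seed=None):
    """Return an n_nodes x n_opinions list of lists of opinion values."""
    rng = random.Random(seed)
    n_corr = round(p_c * n_opinions)  # the first n_corr opinions are correlated
    # Each opinion gets its own randomly chosen mean and standard deviation.
    means = [rng.uniform(-1.0, 1.0) for _ in range(n_opinions)]
    sds = [rng.uniform(0.5, 2.0) for _ in range(n_opinions)]
    matrix = []
    for _ in range(n_nodes):
        shared = rng.gauss(0.0, 1.0)  # assumed: one latent draw per user
        row = []
        for j in range(n_opinions):
            noise = rng.gauss(0.0, 1.0)
            if j < n_corr:
                # Mix the shared draw with private noise; variance stays 1.
                z = rho * shared + math.sqrt(1.0 - rho**2) * noise
            else:
                z = noise
            row.append(means[j] + sds[j] * z)
        matrix.append(row)
    return matrix


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


O = make_opinions(n_nodes=2000, n_opinions=4, p_c=0.5, seed=1)
columns = list(zip(*O))
print(round(pearson(columns[0], columns[1]), 2))  # correlated pair: near rho**2
print(round(pearson(columns[0], columns[3]), 2))  # uncorrelated pair: near 0
```

With Pc=0.5 and four opinions, the first two columns share the latent factor and the last two do not, which matches the two ends of the Pc range described in B1.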
By example, this orange, new node, would likely attach as shown — it selects people jointly through a process of filtering by popularity as well as by interest matching.
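The selection step in B2 can be sketched as a two-stage filter: surface a pool of candidates in proportion to popularity, then keep the k candidates with the lowest MSE. The pool size (pool_factor * k), the +1 added to each in-degree, and the function names are assumptions for illustration, and reciprocal ties are left out to keep the sketch short.

```python
import random


def mse(a, b):
    """Mean squared error between two opinion vectors (lower = more similar)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def choose_follows(new_opinions, opinions, in_degree, k, pool_factor=3, rng=random):
    """Pick k existing users for a new user to follow."""
    candidates = list(opinions)
    pool_size = min(len(candidates), pool_factor * k)
    pool = []
    # Stage 1: surface candidates at random, weighted by popularity.
    while len(pool) < pool_size:
        # Assumed: +1 keeps users with no followers yet from being unreachable.
        weights = [in_degree[c] + 1 for c in candidates]
        pick = rng.choices(candidates, weights=weights)[0]
        candidates.remove(pick)
        pool.append(pick)
    # Stage 2: within the surfaced pool, keep the most similar users.
    pool.sort(key=lambda node: mse(new_opinions, opinions[node]))
    return pool[:k]


# Hypothetical users with two opinions each.
opinions = {
    "a": [0.0, 0.0], "b": [1.0, 1.0], "c": [0.1, -0.1],
    "d": [5.0, 5.0], "e": [-3.0, 2.0], "f": [0.2, 0.3],
}
in_degree = {"a": 1, "b": 40, "c": 2, "d": 900, "e": 15, "f": 0}
# With six users and k=2 the pool (3 * 2) covers everyone, so the result is
# simply the two users whose opinions are closest to the newcomer's.
follows = choose_follows([0.0, 0.1], opinions, in_degree, k=2)
print(follows)  # ['a', 'c']
```

On a larger network the pool is a small, popularity-weighted subset, so a very similar but obscure user may never be surfaced at all. That is the sense in which the choice is joint and not purely interest-driven.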
To analyze networks constructed in such a way, a series of simulations were run to see how the BA model differs from a pure Interest Matching (IM) model (where people only maximize on interest similarity as described in B2), and then the IM/PA model which includes preferential attachment and jointly maximizes as described in B2. Networks of various sizes and various edge densities were generated, and varying degrees of Pc were tested. Once the networks were generated, a series of stochastic rumor models were run on the network to determine how well a rumor could spread through the generated networks.
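The post does not specify which stochastic rumor model was used, so the following is only a stand-in: a simple cascade in which each newly informed node gets one chance to pass the rumor to each of its contacts with a fixed probability. The names spread_rumor and mean_reach, and the parameter p_transmit, are hypothetical.

```python
import random


def spread_rumor(neighbors, p_transmit, start, rng):
    """Return the fraction of nodes that end up hearing the rumor."""
    informed = {start}
    frontier = [start]
    while frontier:
        newly_informed = []
        for node in frontier:
            for other in neighbors[node]:
                # Each newly informed node gets one chance to tell each contact.
                if other not in informed and rng.random() < p_transmit:
                    informed.add(other)
                    newly_informed.append(other)
        frontier = newly_informed
    return len(informed) / len(neighbors)


def mean_reach(neighbors, p_transmit, runs=200, seed=0):
    """Average reach over many runs, each from a randomly chosen start node."""
    rng = random.Random(seed)
    nodes = list(neighbors)
    total = sum(
        spread_rumor(neighbors, p_transmit, rng.choice(nodes), rng)
        for _ in range(runs)
    )
    return total / runs


# A toy ring of ten nodes; neighbors[i] lists who hears from node i.
ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
print(round(mean_reach(ring, p_transmit=1.0), 3))  # 1.0
print(round(mean_reach(ring, p_transmit=0.0), 3))  # 0.1
```

Running mean_reach with the same p_transmit on a BA, an IM, and an IM/PA network of equal size and density would give the kind of comparison described here, under the assumptions of this particular cascade.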
Network sizes varied between 100 and 2,000 nodes with a step size of 100, edge densities varied between 2 and 6 edges on average per node with a step size of 1, and Pc varied from 0 to 1 with a step size of 0.05. Ultimately, 4,200 triplets of networks (one BA, one IM, and one IM/PA) were generated. For each triplet of observations, t-tests were run on various aspects of the networks, all of which returned significant differences between the three models:
The results, unsurprisingly, show the IM/PA model somewhere between the two extremes of network generation, as a result of the way in which the agents in this particular model generate their network. In a series of t-tests, all of these differences between the three models are significant. The rumor model, however, shows more surprising results:
In other words, while the IM/PA model finds itself likely between the two extremes in terms of its basic topological properties, when a rumor is spread on the IM/PA model, the rumor tends to spread a bit further than either of the two extremes. Digging in, we can see these effects with varying densities of networks and system sizes:
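The parameter sweep described above can be enumerated directly. The variable names are illustrative. Enumerated this way, the grid holds 2,100 combinations, so the 4,200 triplets reported would be consistent with two runs per combination, though the post does not say how runs were replicated.

```python
from itertools import product

sizes = range(100, 2001, 100)                         # 100, 200, ..., 2000
densities = range(2, 7)                               # 2, 3, 4, 5, 6
p_c_values = [round(i * 0.05, 2) for i in range(21)]  # 0.0, 0.05, ..., 1.0

# Every (size, density, Pc) combination in the sweep.
grid = list(product(sizes, densities, p_c_values))
print(len(sizes), len(densities), len(p_c_values), len(grid))  # 20 5 21 2100
```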
Together these charts show some interesting results — first, the infection process is largely size independent, while higher density always corresponds with higher infection rates. Surprisingly, the IM/PA mixed model consistently infects more individuals than BA or IM alone — which gives us a good view of the dynamics that were outlined above.
In other words, while there are potentially many different competing reasons for why the internet is such a hotbed of outrage and harassment, this model shows that processes similar to the one that likely created Twitter produce networks that are ideally built for rumor spreading, even beyond the Barabási–Albert model. Further, while system size doesn’t have an impact, increasing density does. Altogether, these results seem to support further inquiries of this model, and show that this dynamic could be one viable explanation of why the internet has become so awful: we have constructed our social networks in a way that is particularly ideal for the diffusion of information, good and bad, even though our individual goals were to create small communities of like-minded individuals within the larger context of a social network. And because we have a highly infectious space, where people have many different beliefs, at a scale where those spreading rumors look more like a swirling mass than a set of noticeable individuals, in a space where that mass can interact with anyone at any time, it seems reasonable to expect these networks to be particularly well suited for the problems we’ve been seeing.