Exploring Subjectivity in Algorithms
Let’s explore subjectivity in algorithm design using one of the most iconic algorithms of the internet age: PageRank.
First, a brief primer on the formulation of the PageRank algorithm. The algorithm starts with a model of the world as a directed network; fitting, as it was invented by Sergey Brin and Larry Page for ranking pages on the World Wide Web. The PageRank of a network is only well defined after one imposes a model of a mixing process on the directed network. The canonical PageRank algorithm uses a simple heuristic: each link has the same probability of being traversed, and the mixing process is easily interpreted in terms of random walks. The vector called the PageRank is the stationary distribution of those random walks, assuming the walks start at a particular seed vector and that any particular random walk ends and a new one is started with probability alpha, also known as the teleportation rate. If one uses additional link metadata to give the links different weights, the resulting ranking could differ significantly.
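To make the mixing process concrete, here is a minimal power-iteration sketch in Python. This is illustrative code of my own, not the implementation behind the figures; the function names and the iteration cutoff are arbitrary choices.

```python
def pagerank(edges, n, alpha=0.15, seed=None, iters=500):
    """edges: (src, dst, weight) triples; n: node count;
    alpha: teleportation rate (probability a walk ends and a new
    one starts at the seed); seed: node -> restart probability
    (uniform when None)."""
    if seed is None:
        seed = {i: 1.0 / n for i in range(n)}
    out_w = [0.0] * n
    for s, _, w in edges:
        out_w[s] += w
    rank = [seed.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        nxt = [alpha * seed.get(i, 0.0) for i in range(n)]
        # Walkers at dangling nodes (no out-links) restart at the seed.
        dangling = sum(rank[i] for i in range(n) if out_w[i] == 0)
        for i in range(n):
            nxt[i] += (1 - alpha) * dangling * seed.get(i, 0.0)
        # Each link is traversed with probability proportional to its weight.
        for s, d, w in edges:
            nxt[d] += (1 - alpha) * rank[s] * w / out_w[s]
        rank = nxt
    return rank

# Canonical case: every link has the same weight.
cycle = [(i, (i + 1) % 20, 1.0) for i in range(20)]
pr = pagerank(cycle, 20)  # symmetric graph -> uniform ranks of 1/20
```

The canonical algorithm is exactly this with a uniform seed and uniform link weights; every subjective variation discussed below amounts to changing `edges`, `seed`, or `alpha`.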
Changing the seed vector will also significantly change the algorithm’s results; the canonical PageRank algorithm uses a uniform distribution over the nodes as the seed vector. The algorithm is referred to as Personalized PageRank when using a single node or a subset of nodes as the seed. The results can be interpreted as the importance of each node relative to the seed, in this case often referred to as the personalization vector. We have introduced our first notion of subjectivity. Even if we all agree on one heuristic for applying weights to the mixing process, we can arrive at different rankings simply by acknowledging that a position in the network implies a unique perspective, and thus a unique ranking. I will demonstrate this effect using a simple reference case, a circle graph. This directed circle graph has a uniform canonical PageRank because it is symmetric.
When the seed is changed from the uniform distribution to the personalization vector for node 3, the symmetry is broken and the direction of the cycle is immediately apparent.
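For concreteness, here is the circle-graph experiment as a short sketch (again my own illustrative code; no dangling-node handling is needed here because every node in the cycle has an out-link):

```python
def personalized_pagerank(edges, n, seed, alpha=0.15, iters=500):
    # seed maps node -> restart probability; alpha is the teleportation rate.
    out_w = [0.0] * n
    for s, _, w in edges:
        out_w[s] += w
    rank = [seed.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        nxt = [alpha * seed.get(i, 0.0) for i in range(n)]
        for s, d, w in edges:
            nxt[d] += (1 - alpha) * rank[s] * w / out_w[s]
        rank = nxt
    return rank

cycle = [(i, (i + 1) % 20, 1.0) for i in range(20)]
pr = personalized_pagerank(cycle, 20, seed={3: 1.0})
# Symmetry is broken: rank is highest at the seed and decays
# geometrically in the direction of the cycle.
```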
So far this is still pretty basic: selecting personalization vectors gives context-dependent rankings, but it doesn’t yet scratch the surface of the family of mixing-process-based ranking algorithms, or of how many subjective choices an algorithm designer makes (intentionally or unintentionally) that affect the outcome of the metric.
In order to examine these questions, let’s expand our interpretation of the underlying network from hyperlinks and webpages to a more general attribution network. In this article, an attribution network is any directed graph where a link assigns credit (or credibility) from the source to the destination. Examples of such networks include academic citations, software dependencies and Twitter-style social networks (following assigns credibility). In introducing this form, we further parameterize our algorithm to allow our mixing process to flow both up and down a directed edge, albeit with different weights.
Using the academic citation reference case, this assumption acknowledges that by citing a well-known reference my paper gains credibility, albeit not to the same degree it confers it. We will not require that the edges carry more weight downstream than upstream, but it is a good convention to follow.
Why do we need a convention?
Because this is a subjective choice, motivated by an intuition that when work A cites work B, A lends more credibility to B than B being cited lends to A. Now let’s look at what happens to Personalized PageRank in our circle graph when upstream credit is conferred, but upstream flows have half the weight.
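A sketch of this variation: every citation gains a reverse edge at half the downstream weight, and we rerun the personalized mixing process seeded at node 3 (illustrative code, with the same caveats as the earlier snippets):

```python
def pagerank(edges, n, seed, alpha=0.15, iters=500):
    # Power iteration; seed maps node -> restart probability.
    out_w = [0.0] * n
    for s, _, w in edges:
        out_w[s] += w
    rank = [seed.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        nxt = [alpha * seed.get(i, 0.0) for i in range(n)]
        for s, d, w in edges:
            nxt[d] += (1 - alpha) * rank[s] * w / out_w[s]
        rank = nxt
    return rank

down = [(i, (i + 1) % 20, 1.0) for i in range(20)]
up = [(d, s, 0.5) for s, d, _ in down]  # upstream edges at half weight

plain = pagerank(down, 20, seed={3: 1.0})
both = pagerank(down + up, 20, seed={3: 1.0})
# Node 2, immediately upstream of the seed, captures far more weight
# once credit is allowed to flow upstream.
print(both[2] > plain[2])  # True
```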
A great deal more weight is captured by nodes 1 and 2, which have the closest upstream links from node 3, but somewhat counter-intuitively the nodes a bit further downstream, like 17 and 18, lost a lot of ground. This occurs because the ranking is a stochastic vector, that is, it has a conserved sum of 1. In giving upstream nodes credit, nodes immediately downstream of the seed gained importance through the creation of short cycles, and the nodes further downstream lost out proportionally. An important message here is that even a well-intentioned alteration to a simple algorithm can have unintended and undesirable consequences.
With this in mind, I would like to pose some simple reference cases with no ‘right answer’ per se. For the purposes of this discussion, let us suppose the links represent academic citations, and for plotting purposes I will retain the assumption that credit flows upstream at half weight.
How should credit be distributed over a sequence of dependent papers?
(As represented by a line graph of 20 nodes)
If we simply take the canonical PageRank algorithm with no personalization, something very strange happens. The algorithm considers the second node from the end (node 18), not the end (node 19), which we interpret as the first paper, to be the most important.
Why does this happen? The mathematical answer is that node 19 is disadvantaged by the fact that it sits on the boundary and receives no upstream credit. Most of us would intuitively prefer to see node 19 at or above node 18, because it feels more consistent. However, others might argue that we could not have appreciated the work in node 19 without corroboration by node 18; in practice, there are a lot of cases where an early paper exists in near obscurity until others begin to build on it. Who is to say that node 19 should be more important than node 18?
Let’s suppose we wish to rectify this perceived slight of the root node. I can do so with a simple change to the algorithm: introducing self loops with weight equal to the downstream citations.
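Here is the line-graph experiment as a sketch. I am reading ‘weight equal to the downstream citations’ as a unit-weight self loop on every node, matching the weight of 1 on downstream links; as with the other snippets, this is illustrative code rather than the implementation behind the figures.

```python
def pagerank(edges, n, alpha=0.15, iters=1000):
    # Uniform-seed power iteration over weighted directed edges.
    out_w = [0.0] * n
    for s, _, w in edges:
        out_w[s] += w
    seed = 1.0 / n
    rank = [seed] * n
    for _ in range(iters):
        nxt = [alpha * seed] * n
        for s, d, w in edges:
            nxt[d] += (1 - alpha) * rank[s] * w / out_w[s]
        rank = nxt
    return rank

# Node i cites node i+1, so node 19 is the first paper.
down = [(i, i + 1, 1.0) for i in range(19)]
up = [(i + 1, i, 0.5) for i in range(19)]   # upstream at half weight
loops = [(i, i, 1.0) for i in range(20)]    # unit self loops

no_loops = pagerank(down + up, 20)
with_loops = pagerank(down + up + loops, 20)
print(no_loops[18] > no_loops[19])  # True: the boundary penalizes node 19
```

The self loops improve node 19’s standing relative to node 18, which is the correction discussed next.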
Do the self loops make the algorithm better? It depends on who you ask. These algorithms are equally ‘objective’ in that they represent mathematical properties of a mixing process on the same graph. If you look carefully, our addition of self loops also improves node 0’s score. Where the mixing model without self loops disadvantaged nodes on the boundaries, the self loops favor them.
Another salient feature of the line graph reference case is the rate at which the PageRank falls off. The uniform seed vector drives some base level of credit into every node, and the mixing process allows it to diffuse throughout the network; the coefficient alpha can be interpreted as the rate at which this credit is injected. Larger alpha will drive the solution closer and closer to the uniform seed, and smaller alpha will allow the credit to diffuse much further, resulting in more accumulation in pockets and thus a much wider spread between the largest and smallest values. Here is what happens when I set alpha = 0.001.
What happens in the case that node 0 is deemed to be an important discovery and the community is instead interested in how prior work deserves credit for making this possible? This context calls for a personalized PageRank, but does not indicate whether to keep the self loops or the upstream links, or how to set alpha. I will demonstrate the result with self loops and half-weight upstream links, but acknowledge this choice is discretionary.
Since we are using a personalized PageRank, the scores are very sensitive to our choice of alpha, so I have computed the results for a range of alpha values.
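A sketch of the sweep. The ‘mean depth’ statistic (the rank-weighted average distance from the discovery) is my own summary, introduced here to make the comparison concrete:

```python
def ppr(edges, n, seed, alpha, iters=4000):
    # Personalized power iteration; seed maps node -> restart probability.
    out_w = [0.0] * n
    for s, _, w in edges:
        out_w[s] += w
    rank = [seed.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        nxt = [alpha * seed.get(i, 0.0) for i in range(n)]
        for s, d, w in edges:
            nxt[d] += (1 - alpha) * rank[s] * w / out_w[s]
        rank = nxt
    return rank

edges = ([(i, i + 1, 1.0) for i in range(19)]     # citations
         + [(i + 1, i, 0.5) for i in range(19)]   # upstream at half weight
         + [(i, i, 1.0) for i in range(20)])      # unit self loops

depth = {}
for alpha in [0.3, 0.1, 0.03, 0.01]:
    r = ppr(edges, 20, {0: 1.0}, alpha)
    depth[alpha] = sum(i * ri for i, ri in enumerate(r))
# Smaller alpha lets credit diffuse to the more foundational papers.
print(depth[0.01] > depth[0.3])  # True
```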
What stands out is that decreasing alpha shifts the credit from the most immediate dependencies to the earlier dependencies. If one believes the most foundational work was the most important, then a small alpha is required. On the other hand, if one believes that the most recent work leading up to the discovery deserves more credit, then a larger alpha is required. Personally, I like the result from alpha = 0.03 because it allows value to accrue to foundational results while still heavily weighting work closer to the discovery.
That is, however, not something I can plan for directly. The moment we change the network, the alpha that “feels right”, or even an alpha optimized for a particular characteristic on one graph, will not necessarily have that characteristic on another graph. Let’s examine a few more reference cases.
How should credit be distributed for a Star Graph?
As one might expect the credit accumulates at the shared dependency.
It is interesting to note that changing alpha has almost no effect at all on this reference case.
A more interesting case arises when the direction of the star is flipped and a large number of otherwise unconnected references are referenced by one paper. Now the focus is on the relative power of the upstream weights. When the upstream weights are small, as in the case below, the outer nodes all get more credit than the hub.
However, as the upstream weights are increased this is reversed, and the hub captures the lion’s share of the credit because it draws on the credibility of all of the references, none of which have any other links from which to accrue credit.
This flip in importance can be examined by repeating the experiment over a range of relative values for the upstream weights (relative to the weight of 1 assigned to downstream edges and self loops).
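A sketch of the reversed star. The choice of 200 leaves as ‘a large number’ is mine; in this model (unit downstream edges and self loops), the hub overtakes the periphery once the upstream weight exceeds roughly one over the number of leaves:

```python
def pagerank(edges, n, alpha=0.15, iters=400):
    # Uniform-seed power iteration over weighted directed edges.
    out_w = [0.0] * n
    for s, _, w in edges:
        out_w[s] += w
    seed = 1.0 / n
    rank = [seed] * n
    for _ in range(iters):
        nxt = [alpha * seed] * n
        for s, d, w in edges:
            nxt[d] += (1 - alpha) * rank[s] * w / out_w[s]
        rank = nxt
    return rank

n_leaves = 200  # hub is node 0; leaves are nodes 1..200

def star(up_weight):
    down = [(0, i, 1.0) for i in range(1, n_leaves + 1)]
    up = [(i, 0, up_weight) for i in range(1, n_leaves + 1)]
    loops = [(i, i, 1.0) for i in range(n_leaves + 1)]
    return down + up + loops

low = pagerank(star(0.001), n_leaves + 1)   # below the flip point
high = pagerank(star(0.01), n_leaves + 1)   # above it, still under 1%
print(low[0] < low[1], high[0] > high[1])  # True True
```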
This change happens much more quickly than one might expect; the influence of the central node referencing all the others surpasses the referenced nodes in the periphery with less than 1% upstream weight. This brings to our attention an important fact about upstream weights: if they are even within an order of magnitude of the downstream weights, a contributor wishing to pad their own ranking might spam references to elevate themselves inappropriately. However, one should also note that this reference case is hypersensitive to upstream weights. We should examine a reference case where the dependencies are more spread out.
How should credit be distributed for a Tree Graph?
In this reference case the network is a directed binary tree of depth 4. The directed tree graph shares properties of both our line graph and our star graph. As was the case with the line graph, without self loops the root node actually receives less credit than its immediate predecessors.
As was the case with the line graph, the inclusion of self loops with weight equal to the downstream links corrects for this effect.
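The tree experiments can be sketched the same way, indexing the 31 nodes as a heap (children of node i at 2i+1 and 2i+2); illustrative code as before:

```python
def pagerank(edges, n, alpha=0.15, iters=1000):
    # Uniform-seed power iteration over weighted directed edges.
    out_w = [0.0] * n
    for s, _, w in edges:
        out_w[s] += w
    seed = 1.0 / n
    rank = [seed] * n
    for _ in range(iters):
        nxt = [alpha * seed] * n
        for s, d, w in edges:
            nxt[d] += (1 - alpha) * rank[s] * w / out_w[s]
        rank = nxt
    return rank

n = 31  # binary tree of depth 4: 2**5 - 1 nodes, root at index 0
down = [(c, (c - 1) // 2, 1.0) for c in range(1, n)]  # children cite parents
up = [((c - 1) // 2, c, 0.5) for c in range(1, n)]    # upstream at half weight
loops = [(i, i, 1.0) for i in range(n)]               # unit self loops

no_loops = pagerank(down + up, n)
with_loops = pagerank(down + up + loops, n)
print(no_loops[0] < no_loops[1])  # True: the root sits on the boundary
```

As in the line graph, the self loops improve the root’s standing relative to its immediate predecessors.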
As was the case with the star graph, switching the direction of the references to fan out, rather than converge, highlights the effect of the upstream weights.
Unlike the case of the star graph, the node with only outbound references does not accumulate a significant amount of credit. There is still, however, an interesting phenomenon: the second-to-last layer accumulates the most credit per node in that layer. Further exploring how the credit accumulates in the layers, it is clear that at depth 4 the last layer does still accumulate the most total credit, but it is spread out over more nodes.
It is yet again not clear whether this is the behavior we want from a credit attribution algorithm. Choosing the strength of the upstream weights can significantly affect these credit distributions.
Similar to the case of the star network, larger upstream weights drive credit to the node with no inbound links, but unlike the star network, those weights must be quite large before the PageRank rises significantly.
Now, let’s repeat the experiment run on the line graph; suppose that the node that doesn’t have any references to it yet is a new breakthrough and we want to decide how to allocate credit to prior work. Our personalized PageRank version of the above analysis will show us.
The results indicate value still accumulates in the last two layers, with the penultimate layer capturing the most per-capita influence and the last layer capturing the most total influence.
Again, these results are very dependent on parameter choices. When sweeping the upstream weights parameter, lower upstream weights shift the importance to nodes which come much further down the dependency graph. When sweeping the alpha parameter, smaller alpha shifts the credit further down the dependency graph.
These two parameters have notably different effects on nodes in the intermediate depths; decreasing the upstream weights has a significant adverse effect on the nodes in the penultimate layer, whereas changing alpha causes little to no change in the credit attributed to these nodes.
I will conclude with one of my favorite experiments. What happens when a group of attackers contribute bogus content to the network, and start creating interconnected references among themselves?
In this reference case my project is made up of contributions in nodes 7 to 13, and my attacker has submitted contributions 0 through 6, with one link to the rest of the project and a lot of internal links amongst their own contributions.
Well, at first it doesn’t look great. Due to the absence of links from the legitimate work clique to the bogus work clique, the legitimate work has marginally higher PageRank, but not enough to be a meaningful defense against this kind of attack.
This is where seed vectors can be used as a form of trust anchoring. In an environment where some activity might be falsified in order to artificially inflate the credit of certain contributions, it is best to avoid the uniform seed, which rewards sybil attackers, and instead leverage human knowledge to determine a seed comprised of human-verified content.
Suppose that instead of using the uniform seed, the same experiment is repeated with nodes 9 and 13 set as trusted seeds. The results are very different.
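Here is a sketch of both runs of the attack experiment. The internal structure of each group is my own assumption (a complete digraph among the attacker’s nodes, a simple chain through the project’s, and unit self loops everywhere), since the text only specifies the two groups and the single connecting link:

```python
def pagerank(edges, n, seed, alpha=0.15, iters=500):
    # Personalized power iteration; seed maps node -> restart probability.
    out_w = [0.0] * n
    for s, _, w in edges:
        out_w[s] += w
    rank = [seed.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        nxt = [alpha * seed.get(i, 0.0) for i in range(n)]
        for s, d, w in edges:
            nxt[d] += (1 - alpha) * rank[s] * w / out_w[s]
        rank = nxt
    return rank

n = 14
edges = ([(i, j, 1.0) for i in range(7) for j in range(7) if i != j]  # sybils
         + [(0, 7, 1.0)]                            # lone link into the project
         + [(i, i + 1, 1.0) for i in range(7, 13)]  # project chain (assumed)
         + [(i, i, 1.0) for i in range(n)])         # unit self loops

uniform = {i: 1.0 / n for i in range(n)}
trusted = {9: 0.5, 13: 0.5}  # human-verified seed nodes

pr_u = pagerank(edges, n, uniform)
pr_t = pagerank(edges, n, trusted)
attacker_u = sum(pr_u[:7])
attacker_t = sum(pr_t[:7])
```

In this sketch the uniform seed hands the attackers close to half of the total credit, while the trusted seed starves them entirely, because no human-verified node links to them.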
One might argue that these choices of trusted seeds are arbitrary and subjective! To which I will say: of course they are, but what the person making this argument often fails to acknowledge is how much subjectivity was in the algorithm even before we allowed human experts to decide the trusted seed.
Who decided whether and to what extent credit can flow upstream on links? Who decided whether there would be self loops and how strong they are? Who decided what the alpha parameter should be? All of these are subjective choices, and they have non-trivial ramifications for how credit is assigned; once put into practice, such a measure will have equally non-trivial effects on the behavior of participants seeking credit or compensation for their contributions.
To be clear, this isn’t a statement about the PageRank algorithm; it is a statement about algorithms, and in particular algorithms used to reduce complex human efforts into quantitative boxes so we can compare them. There is no absolute objective measure of value. Every metric is objective conditioned on subjective choices of what to value. We make these subjective value judgements every day, and there is nothing wrong with that, but when it comes to encoding our subjective value judgements into code, and implicitly or explicitly imposing them on others, it is our obligation as algorithm designers to be mindful of those subjective choices, and the impacts they can and will have on others.