In this recent article Dandelion Mané introduced SourceCred an Open Source Software project taking aim at the age old challenge of credit attribution in Co-Creative Communities. I say age old because human collaboration to achieve collective goals with incentives driven predominately by social credit is a fundamental building block of human society.
The study of coordination appears in appears in anthropology and psychology, but advances in statistical methods in the mid 20th century gave birth to the first formal studies leveraging mathematics and data to structure and interpret data collected according to social scientific traditions of ethnographic research. I was introduced to the topic in 2004, working on an evolutionary game theory project using simulations to explore the results of Axelrod on the Evolution of Cooperation (1984); synopsis linked.
For the purpose of this article, I’ll highlight some early works focused on academic collaboration networks that set the stage for the modern study of attribution networks which are network formation games where the goal is to get credit from the community for your contributions.
- Networks of Scientific papers; Derek de Solla Price (1965)
- Bibliographic Coupling Between Scientific Papers; M. Kessler (1963)
- A dynamic look at a class of skew distributions. A model with Scientometric applications; A. Schubert & W. Glanzel(1984)
It is a bit humorous to note that of course academics were most interested in their own credit attribution problem, but as early as the 1970’s there is cross referencing between graphs of Firms internal networks and academic collaboration networks.
In the internet era this research arc has exploded as the data from social systems became more and more abundant. The field of network science, appeared as mathematicians, physicists, systems engineers, roboticists and computer scientists followed the data and migrated to computational social science. I find it particularly telling that many of the researchers who were working on non-linear dynamics and chaos in the 1990’s became network scientists as an underlying network structure can account for a lot of the unintuitive emergent behavior of large scale systems. To catch up on the modern methods and tools see textbooks:
- Social and Economic Networks; M. O. Jackson
- Network Science; A.L. Barabasi
- The Structure and Function of Complex Networks; M.E.J. Newman
To give due credit the field exploded around the turn of the century and its worth exploring the work of Duncan Watts, Steven Strogatz, Eva Tardos, Alex ‘Sandy’ Pentland and many more. Tardos for example wrote the chapter on Network formation games in the commonly used reference text Algorithmic Game Theory. As my research progresses, I plan to build out a broader set of references to aid the open source software community in accessing knowledge created by their academic counterparts.
The incredible thing about the open source software movement is, that unlike the academic collaboration network, the granularity of the data around open source software peer production is much higher. Here is an example of data collected using the Git and Github plugins in the SourceCred project:
The network above is a little bit hard to parse visually so here is an example of a smaller repo.
SourceCred allows maintainers to be real time game designers, but game design is hard and for an open source project lead there is a lot at stake. In theory, it’s nice to be able to manipulate the incentives but set the incentives poorly and people won’t respond as intended. Worse yet, manipulate the incentives too often or two much and contributors will lose faith in the project and depart for other opportunities.
Aside: What exactly are Network Effects Anyway?
At this point, I’d like to go on a short detour to review the concept of Network Effects, the seemingly magical phenomena that take software products and platforms from obscurity to ubiquity.
It turns out these effects are structural properties of network formation games with dynamics evolving over those networks. Specifically, they can occur when the process evolving on the network has positive externalities. When working with networks these are encoded relatively simply as benefits conferred to your neighbors in the graph caused by your actions, not theirs.
When I cite your previously published work confer credit to you; while ostensibly this is a by product of your past action, the immediate action that caused the increase in your reputation or social credit actually came from was taken by another actor. Furthermore, attribution networks are positive sum games in the sense that the action of citing or building on another’s open source work is creating value for both parties rather than competing for, trading or exchanging something as is the case in many economic game formulations. There may still be financial layers to these games that are conserved quantities, but we will discuss this later in the article.
The point here is that positive externalities in network formation games create incentives for new members to join the network and in turn joining creates additional incentives for more to join. Platform businesses benefit from this dynamic because the process on the network is a service market, transactions between service consumers and service providers have the dynamics where increasing service consumers incentivize more service providers and vice versa. It is also important to note that this dynamic cuts both directions, too few of either group, and the whole market breaks down.
In this article, I will set aside service market dynamics and continue to focus on attribution networks. For this case, the dynamic process of interest is PageRank as a measure of credit distribution, but we will augment this metric with a magnitude of credit being released as a function of accomplishments of the co-creative community. PageRank is a stochastic vector so if PageRank alone is the measure of credit, the game is ‘zero-sum’. However, by adding a magnitude or volume of credit being attributed proportionally to the PageRank vector, a ‘positive-sum’ game is realized by observing that coordinating effectively can be defined as growing the size of the pie.
Formalizing an Evolving Attribution Network
The field of network science uses formal mathematical representations of networks in order to combined formal theory with observed behavior. Below is a formal definition of a graph which is evolving in time through the addition of new nodes and edges. Networks are viewed as sets of nodes and edges which are monotonically increasing in magnitude since past contributions and their references do not go away, even if they are no longer being used or referenced by new work.
The graph evolution dynamics are one part of the state of the a complex dynamical system. I call this a ‘complex’ dynamical system because its state-space includes the network G, whereas traditionally a dynamical systems state-space is comprised of real-valued fields.
The evolution of this complex dynamical system may be viewed as a driven by random behavior modeled according to the existing literature on attribution networks and graph formation processes.
In our SourceCred models, there are two mechanisms which update the network:
- The arrival of a new contributor
- A contribution submitted by a current contributor
In this system model we distinguish between nodes which represent contributors and nodes which represent contributions. Once the network is updated, a Personalized PageRank may be computed with respect to any node in the network, returning a stochastic vector and a magnitude of credit may be assigned at that time. In order to make credit relevant over time, it is possible to disperse Cred as a result of milestones associated with project goals. In our computational experiments, there are no strong assumptions about how or why Cred is induced in the location (seed) and time, but this will come into play later.
Event Based Differential Games
In the next section, I will explore some results from a simulation organized according to the map above. However, this research is on an arc that will take it to analyzing this system as a ‘Differential’ Game, so I will first review the subject in minimally mathematical terms, and then describe the SourceCred Dynamic Network Formation Game before proceeding to the results of our preliminary computational experiments.
I specifically refer to these systems as event based differential games because the state of the system only matters when a player takes an action, but the actions themselves are asynchronous. Unlike traditional one-shot and evolution games, there are no ‘rounds’ of play with payoffs computed directly from the actions taken. Instead, the players objectives are private and state dependent; they may choose to take a legal action denoted u if and when they believe that this will increase their future utilities.
Being event based these games have the most in common with Hybrid Dynamical systems. “Optimal” Rational actors in this paradigm are stochastic optimal controllers which are solved using a method called dynamic program. In practice, these are computationally intensive methods. In robotics, approximations are used to stay within hardware limitations; in human decision making, the quality of a heuristic decision making policy can be judged in the same manner as approximation methods in robotics and control. Performance bounds are defined by much worse a policy performs than the optimal policy given any particular objective function.
It is important to note that the mechanism designer is only creating the mechanisms, even successfully incentivizing desired behavior in an instantaneous sense, can lead to undesirable emergent properties once we account for the temporal feedback.
In the differential games paradigm, it is computationally imprudent to implement the perfectly rational strategy so one cannot simply reduce the problem to characterizing Nash equilibria. However, control theoretic stability analysis is still valid for characterizing the tendencies of the system towards particular emergent properties. For further reading see:
- Overview of Concepts in the Jargon Party Article
- Textbook on Differential Game Theory
- Textbook on Hybrid Systems
- Economic Networks with Expanding State-spaces are introduced in this ICCS Paper.
There will be future publications expanding on mechanism design for according to the principles of these fields.
Open Source Credit Attribution as a Network Formation Game
Revisiting our map of the attribution network formation game, we introduce the strategic actions. We don’t eliminate the process that adds random contributors and contributions because there may always be some who are contributing for wholly different reasons and we do not want to over bias our conclusions too heavily by our assumptions about strategic actors. In general, it is best practice to explore a range of randomized conditions and strategic behavior polices.
SourceCred as defined essentially gives the maintainer the powers of a game designer. However, anyone who has worked in the gaming industry knows that balancing games is quite hard. There is no one magical setting such that everything always works out forever.
In an effort to empower open source communities, we are also creating the potential destroy the ecology that has made them so successful. It is important to also provide the guidance, and where possible tools to help maintainers be good stewards of their communities.
Taking as a given that the maintainer wishes to have a productive project and to generally guide the goals and values of that project. The game defined above allows the maintainer to define Cred as being released from a manually added seed nodes representing goals or value, and the volume of the release of that Cred may vary in time.
Specifically, suppose the maintainer release Cred in an event based manner according to some policy known to the contributors. For example 100 Cred issued for completing a milestone such as merging a pull request which resolves as specific issue. Rather than simply allocating the Cred to the contributor who wrote the code, the Cred is spread across the contributors.
This would be problematic if there was a fixed supply of Cred because collaborating would result in a smaller amount of Cred per contributor. However, when collaborating increases velocity and quality, more Cred will be release, and as mentioned earlier, the fixed sum nature of the PageRank vector is offset by increased volume in released Cred.
Controlling the inflow of Cred to be roughly proportional to the projects progress is comparatively easy. Challenges arise as the Cred may pool in a manner that doesn’t feel fair even if the individual allocations seem fair. This happens because network formation games tend to exhibit rich get richer phenomena. This is due to the structure of the game, and has been studied in much of the literature introduced in the first part of this article.
Some control on these dynamic is available through the parameter alpha, the choice of seed vector and the weighting heuristics as discussed in the Exploring Subjectivity in Algorithms article. However, these selections can have counter-intuitive effects even on simple reference graphs.
Ensuring that the incentives feel fair to contributors over time is even more challenging. A second order challenge is that challenging the parameters of the Cred release policies too often or too much can undermine the faith the contributors have in the incentive system. Blowback from players on game designers is not an uncommon story; one such event is part of Ethereum’s origin story.
Accounting for the maintainers intervention in the system as part of the system itself brings us into the domain of Second Order Cybernetics. Fortunately, our differential games representation is prepared for that leap, characterizing the maintainers also has players in the game, simply with different objectives. From this frame, one can analyze the balance of power of the roles and help steer the system towards desirable Schelling point where both parties are behaving in manner that is both expected and appreciated. It is worthwhile noting that when Schelling introduced what he called ‘Focal Points’ they were in the example was a positive-sum game.
The current status of computational mechanism design work in SourceCred is a model system with only random behaviors to gain intuition. The simulations in this section share their construction with probabilistic graph generator models analyzed by Barabasi and others. Below is a snippet of python code representing a complex differential equation.
In order to demonstrate the growth process of the attribution network, consider the Network updates at two points in the process:
These stochastic dynamic graph evolution process is continued for 50 iterations; resulting in a rich interconnected network with statistical properties associated with the body of collaboration network research cited above. A summary of the network growth is shown below:
However, what is really interesting is the way in which Cred changes over the course of the network’s evolution. Networks are discrete objects, so even though the PageRank itself is continuously valued computing PageRank over an evolving graph results in jerky scores.
It is unsurprising that the oldest contributions tend to retain the highest scores, but it is also interesting to note that the early contributors scores are actually declining to ‘make room’ for the new contributors because the Cred vectors sum at each time t sum to one.
The jerky dynamics are resolved by introducing a momentum term in the Cred update. Smoothed Cred is precisely the exponentially smoothed version of the PageRank, aka SnapShot Cred in the previous plot. Like in the previous plot, the early contributors remain at the top of the leaderboard, but their Cred is going down in time to ‘make room’ for the new contributors.
When Cred is assigned according to PageRank or smoothed PageRank, there is still a perceived zero-sum game. In order to combat this perception, the cumulative Cred which is the sum of the PageRank is introduced.
The resolves the effect of seeing ones Cred decrease as a result of other contributors efforts but it introduces an even strong favorability for early contributors when the Cred is introduced to the system uniformly. There is a delicate balance between having earlier contributors get too much credit as a result of being early, and recognizing that later work may be legitimately dependent on early contributions.
To handle this in a nuanced way, the current expectation is that Cred will be injected in different volumes, in different places in the graph over time, to ensure that Cred continuing to accumulate at early contributions does so because it continues to enable ongoing milestones, not simply through accumulation over over time.
- The Computational Experiments can be run as Monte Carlo experiments in order to extract the statistical properties of the Attribution Network Process over time.
- We can continue to extend our models to explore the implications of allocating funds via bounties distributed proportionally to Cred. As discussed above, in the short term this becomes a zero sum game, but over the long term, it is not because the totals funds injected will depend on some externally determined value of the output to the outside world. Multi-time scale considerations are critical to understand any particular maintainer strategy.
- Extend the simulation to include strategic behavior first on the part of the contributors and later on the part of the maintainers. With help from software engineering and data science resources, eventually plan to package these simulations up into flight simulators that help maintainers make inform decisions despite the complexities embedded in Network Formation Games.
Call for Contributors
SourceCred is an open source project actively seeking more engineers and researchers to join.
- The Production codebase can be found at https://github.com/sourcecred/sourcecred
- The computational experiments and data science work is in
- The ongoing discussions regard goals, values, narrative and design is in https://discourse.sourcecred.io/
- Any way to meet the contributors is to join our discord channel.
- Follow SourceCred on Twitter for more progress updates.
Thanks to Dandelion Mané, SourceCred founder and project lead for over a year of deep work to build the SourceCred software as it is today, as well as lots of ongoing work extending the project and guiding its values.
Other contributors to this research and discussions are its implications include Ryan Morton, David Sisson, Tyler Mace, Max Kudinov and Stephen Zargham. As always, thanks to BlockScience team for their operational support and the ongoing refinement and application of the methods and tools discussed during this article.
SourceCred is an open-source technology. We are grateful to Protocol Labs for funding and support.