DRE: Developer Reputation Estimator

Andrey Karnauch
Published in ASE Conference
Nov 6, 2019

The growth of the Free/Libre Open Source Software (FLOSS) ecosystem has given developers millions of projects to contribute to. As a developer, this comes as both a blessing and a curse.

On one hand, we have the freedom to stumble across a project on a platform such as GitHub, become engrossed in it for a week, make a contribution, and then move on to the next project that grabs our attention. This contribution becomes a footprint in our overall journey and experience as a developer.

On the other hand, we may run into projects where our fascination once again fuels our work, but contributing is not as simple and is often met with barriers. These projects tend to be more structured: a hierarchy of developers organized around the core developers, who take precautions so as not to jeopardize the structure and quality of a project they have put substantial effort into. As a first-time contributor, it would be helpful to have a way to present our reputation as a developer (all those footprints we have left behind) to the core developers to help lower that barrier to contribution.

The above scenario is just one example of how your reputation as a developer can help others make decisions. The use cases extend far beyond code contributions, however: employers seeking out developers with a certain reputation, backing up the claims on your resume, advertising your expertise to the FLOSS community, and so on.

To enable these opportunities, we developed DRE: Developer Reputation Estimator, a tool that aggregates your contributions across several open source platforms and generates your developer profile highlighting your experience, social network, and the impact of your work. Together, these measures serve as a summary of your overall reputation as a developer and can be made accessible for others to view.

The Data

DRE’s functionality is enabled by the World of Code (WoC) infrastructure. The WoC dataset has over 34 million developer identities used in over 1.6 billion commits gathered from over 73 million non-forked repositories. Furthermore, this data is collected across multiple open source Git platforms, including GitHub, SourceForge, and Bitbucket.

Although WoC has over 34M developer identities, this does not mean it covers over 34M distinct developers. Each Git commit records an author identity in its header, and that identity varies with the machine the developer is using and the name and email Git is configured with on that machine (e.g. John Doe <john@doe.org> and J. Doe <jd@doe.org>). As a result, the first step in generating a developer profile is for the user to search for and select all the developer identities they believe belong to them. This way, the tool identifies all your contributions, rather than just those made under the one author identity on your laptop from five years ago.
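To make the identity selection step concrete, here is a minimal sketch of how commits might be gathered once a user has picked their identities. The commit records and the function name are hypothetical and only for illustration; this is not DRE's actual implementation or the WoC API.

```python
# Hypothetical commit records: (commit_id, author_identity).
# Real data would come from WoC; these values are made up for illustration.
commits = [
    ("c1", "John Doe <john@doe.org>"),
    ("c2", "J. Doe <jd@doe.org>"),
    ("c3", "Jane Roe <jane@roe.org>"),
    ("c4", "John Doe <john@doe.org>"),
]


def commits_for_developer(commits, selected_identities):
    """Return the ids of commits authored under any of the selected identities."""
    selected = set(selected_identities)
    return [commit_id for commit_id, author in commits if author in selected]


# The user claims both identities, so commits from both are attributed to them.
my_commits = commits_for_developer(
    commits, ["John Doe <john@doe.org>", "J. Doe <jd@doe.org>"]
)
```

Selecting only one of the two identities would silently drop part of the history, which is why the selection step comes first.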

The Tool

After this identity selection stage, all you have to do is… wait. DRE’s backend runs its calculations over the massive WoC dataset described earlier, traversing millions of projects, blobs, developers, and commits to identify and expand upon the ones you are involved with. The more contributions you have made and the larger the projects you have worked on, the longer these calculations can take (possibly several hours). Your patience, however, is rewarded with your own developer profile containing the following insights:

Your experience

We give an overview of your experience in the FLOSS ecosystem with basic statistics including:

  • Total number of commits you have made
  • Total number of projects you have worked on
  • Total number of people you have collaborated with
  • Total number of files you have modified
Experience: The start of your developer profile
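The four statistics above could be derived from commit records roughly as follows. This is a sketch under stated assumptions: the record layout is hypothetical, and a "collaborator" is assumed here to be anyone else who committed to a project you committed to, which may differ from the definition DRE uses.

```python
def experience_summary(commits, my_identities):
    """Compute basic experience statistics from hypothetical commit records.

    Each record is a dict with keys "author", "project", and "files".
    """
    me = set(my_identities)
    my_commits = [c for c in commits if c["author"] in me]
    my_projects = {c["project"] for c in my_commits}
    # Count a file once per project, even if it was modified many times.
    my_files = {(c["project"], path) for c in my_commits for path in c["files"]}
    # Assumption: a collaborator is any other author in one of my projects.
    collaborators = {
        c["author"]
        for c in commits
        if c["project"] in my_projects and c["author"] not in me
    }
    return {
        "commits": len(my_commits),
        "projects": len(my_projects),
        "collaborators": len(collaborators),
        "files": len(my_files),
    }


# Made-up example data.
sample_commits = [
    {"author": "me", "project": "proj/a", "files": ["main.c", "util.c"]},
    {"author": "me", "project": "proj/a", "files": ["main.c"]},
    {"author": "alice", "project": "proj/a", "files": ["README"]},
    {"author": "me", "project": "proj/b", "files": ["app.py"]},
    {"author": "bob", "project": "proj/c", "files": ["x.js"]},
]
summary = experience_summary(sample_commits, ["me"])
```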

A deeper dive into your projects…

Rather than just providing a count of the projects you have worked on, DRE also displays the names of those projects and links to them. It also shows your commits against the total commits for each project. To others, this reveals not only which projects you have contributed to but also how much you contributed to each.

Projects: A deeper look at your project work
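The "your commits vs. total commits" comparison can be sketched with two counters. The `(project, author)` pairs and the function name below are hypothetical, chosen only to illustrate the idea.

```python
from collections import Counter


def project_shares(commits, my_identities):
    """Map each project I committed to onto (my commits, total commits)."""
    me = set(my_identities)
    total = Counter(project for project, _ in commits)
    mine = Counter(project for project, author in commits if author in me)
    return {project: (mine[project], total[project]) for project in mine}


# Made-up (project, author) pairs.
project_commits = [
    ("proj/a", "me"),
    ("proj/a", "alice"),
    ("proj/a", "me"),
    ("proj/b", "alice"),
    ("proj/c", "me"),
]
shares = project_shares(project_commits, ["me"])
```

Projects with no commits of your own (such as `proj/b` here) are left out, since they are not part of your profile.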

What do you like to code in?

DRE analyzes the file extensions of all the files you have modified in commits and uses that to show your coding language breakdown. This gives you the opportunity to show off your expertise in a particular language or back up claims on your resume about being proficient in C.

P.S. I swear I don’t like JavaScript that much; its frameworks and npm just love to generate as many files as possible per action.

Coding Language Breakdown
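A breakdown like this can be approximated by mapping file extensions to languages and counting. The sketch below assumes a small, hand-written extension table (`EXT_TO_LANG` is hypothetical); the mapping DRE actually uses is not described in this post.

```python
import os
from collections import Counter

# Hypothetical, deliberately tiny extension table for illustration.
EXT_TO_LANG = {
    ".py": "Python",
    ".c": "C",
    ".h": "C",
    ".js": "JavaScript",
    ".ts": "TypeScript",
}


def language_breakdown(paths):
    """Return the fraction of modified files per language, keyed by language."""
    counts = Counter()
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        counts[EXT_TO_LANG.get(ext, "Other")] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


breakdown = language_breakdown(["a.py", "b.js", "c.js", "Makefile"])
```

Because the count is per file rather than per line, ecosystems that generate many small files will look larger than they feel, which matches the P.S. above.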

Torvalds Index

DRE also provides a measure that we call the Torvalds Index. Like the Erdős number, it is the length of the shortest path to a fixed person, in this case Linus Torvalds, the creator and developer of the Linux kernel, in a graph with collaborators as nodes and shared projects as edges. Apart from being cute, this index lets you measure your position in the social network graph defined by the people you collaborate with. It can also provide further insight into your collaborators, revealing projects or people that may be of interest.

Torvalds Index: The path of collaborators between myself and Linus
Torvalds Index: Projects appear as edges on mouseover
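Since the index is a shortest path in an unweighted graph, breadth-first search is a natural way to compute it. The following is a minimal sketch, assuming the collaboration graph fits in memory and is built from a hypothetical project-to-members mapping; how DRE computes it at WoC scale is not covered here.

```python
from collections import defaultdict, deque


def build_graph(project_members):
    """Connect every pair of developers who share at least one project."""
    graph = defaultdict(set)
    for members in project_members.values():
        for person in members:
            graph[person].update(m for m in members if m != person)
    return graph


def torvalds_index(graph, start, target="Linus Torvalds"):
    """Breadth-first search for the number of hops from start to target.

    Returns None when no path of collaborators connects the two.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Made-up projects and members.
projects = {
    "p1": {"me", "alice"},
    "p2": {"alice", "bob"},
    "linux": {"bob", "Linus Torvalds"},
}
graph = build_graph(projects)
index = torvalds_index(graph, "me")
```

In this toy graph the path runs me, alice, bob, Linus Torvalds, so the index is 3.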

What impact do your contributions have?

You might find yourself on top of the world with a small Torvalds Index, but to what degree do you actually matter (in terms of your contributions, of course)? DRE provides more insight into your contributions through a measure called blob duplication. This is essentially a measure of how many other developers have used blobs (files) that you created, without any modification. In other words, how many people have copied your source code files into their own projects? DRE lists the blobs you have created along with each one’s duplication count, the number of child commits it has, and the people you have worked with (collaborators) who have used it in their own commits.

Blob Duplication: The list scrolls to the right to reveal collaborators using the blob (not captured in image)
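One way to picture blob duplication is sketched below. It assumes hypothetical records of the form `(timestamp, author, blob_hash)`, treats the author of the earliest commit containing a blob as its creator, and counts the distinct other authors who committed the identical blob. These are illustrative assumptions rather than DRE's exact definitions.

```python
from collections import defaultdict


def blob_duplication(records, my_identities):
    """Count, per blob I created, how many other authors committed it unchanged."""
    me = set(my_identities)
    creator = {}
    users = defaultdict(set)
    # Sorting by timestamp means the first author seen for a blob is its creator.
    for _, author, blob in sorted(records):
        creator.setdefault(blob, author)
        users[blob].add(author)
    return {
        blob: len(users[blob] - me)
        for blob, first_author in creator.items()
        if first_author in me
    }


# Made-up records: (timestamp, author, blob_hash).
records = [
    (1, "me", "b1"),
    (2, "alice", "b1"),
    (3, "bob", "b1"),
    (1, "alice", "b2"),
    (2, "me", "b2"),
    (5, "me", "b3"),
]
duplication = blob_duplication(records, ["me"])
```

Identical content yields an identical blob hash in Git, so matching on the hash is what captures "used without any modification".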

In-Depth Walkthrough

For a full walkthrough of the tool, starting from signup to viewing the developer profile, please watch the video below.

This blog post summarizes the paper DRE: Developer Reputation Estimator, accepted to the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE 2019) for the Demonstrations Track.
