Do your git blames show your technical skills? Yes and no.

Generating tech-stacks based on git blames — an experiment

Jonas Peeck
Published in Axel Springer Tech
6 min read · Dec 29, 2022


> TL;DR: Today we’re releasing two POC Open Source projects:
sklls-cli — sklls is a simple CLI tool that aggregates git-blames by committer and generates line-counts for every committer based on file-extension and NPM dependencies.

codecrew — codecrew makes it easy to discover the software developers inside your Github organization based on techstack

— — — —

In 2022 we set out to try and answer one simple question: Can you tell somebody’s tech-stack based on their git-blames?

The purpose of this experiment: Try to create a data-driven way of connecting software developers at Axel Springer based on their tech-stack.

Did it work?

Yes and no — but the reason is not what you think ;)

The idea

No matter if you work in frontend, backend, devops, fullstack, mobile or data & AI — we all produce the same work output: Code ❤

And no matter if you use Github, Bitbucket or Gitlab, they all have one thing in common: version control with git. So, in theory, every line of code in a repository can be attributed to one person inside that Github / Gitlab / Bitbucket organization in the form of git-blames, which name the author of the commit that last changed each line.
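To make "attributed in the form of git-blames" concrete: `git blame --line-porcelain <file>` prints a full commit header (including `author` and `author-mail`) before every source line, and prefixes the source line itself with a TAB. The hypothetical helper `blame_line_authors` below is a minimal sketch of turning that output into per-author line counts. It is not code from sklls-cli (which is written in Go), and the lower-casing of emails is our own assumption.

```python
from collections import Counter


def blame_line_authors(porcelain: str) -> Counter:
    """Count blamed lines per author email in `git blame --line-porcelain` output."""
    counts: Counter = Counter()
    current_mail = None
    for line in porcelain.split("\n"):
        if line.startswith("\t"):
            # TAB-prefixed lines are the source lines themselves; the header
            # lines right before each one name its author.
            counts[current_mail] += 1
            continue
        key, _, value = line.partition(" ")
        if key == "author-mail":
            # Emails are wrapped in angle brackets; lower-casing is our own
            # normalization so "Jane@Example.com" and "jane@example.com" merge.
            current_mail = value.strip("<>").lower()
    return counts
```

In practice you would feed it the `stdout` of `subprocess.run(["git", "blame", "--line-porcelain", path], capture_output=True, text=True)` for every tracked file and sum the resulting counters.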

So the idea was simple: How can we turn git-blames into tech-usage profiles that can roughly tell how much somebody uses specific technologies?

Hey Siri, find me someone who uses redux-mock-store v1.5.4

As part of our experiment, we created two pieces of technology which we’re open sourcing today.

One of them is sklls-cli, a Go-based CLI that converts git-blames into line-counts for each contributor of a repository. This makes it easy to figure out which kinds of files someone contributed to most (e.g. .tsx, .py, .js) and which NPM libraries that person used most often.
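The article doesn't spell out how sklls-cli attributes NPM dependencies to a person, so the rule below is an assumption: count every blamed line that imports or requires a package, collapsing deep imports (`lodash/get`) and scoped imports (`@reduxjs/toolkit/query`) to the package name, and skipping relative paths and `node:` built-ins. All names are hypothetical, and the sketch is in Python rather than Go for brevity.

```python
import re
from collections import Counter, defaultdict
from pathlib import PurePosixPath

# Captures the module specifier in `... from 'x'`, `import 'x'` and `require('x')`.
IMPORT_RE = re.compile(r"""(?:\bfrom\s+|\bimport\s+|\brequire\(\s*)['"]([^'"]+)['"]""")


def npm_package(specifier: str):
    """Return the npm package name for an import specifier, or None for relative/built-in paths."""
    if specifier.startswith((".", "/", "node:")):
        return None
    parts = specifier.split("/")
    if specifier.startswith("@"):
        # Scoped packages keep their scope: "@reduxjs/toolkit/query" -> "@reduxjs/toolkit"
        return "/".join(parts[:2]) if len(parts) >= 2 else None
    # Deep imports collapse to the package itself: "lodash/get" -> "lodash"
    return parts[0]


def build_profiles(blamed_lines):
    """Aggregate (author_email, filename, line_text) tuples into per-author counters."""
    profiles = defaultdict(lambda: {"extensions": Counter(), "npm": Counter()})
    for email, filename, text in blamed_lines:
        profile = profiles[email]
        # Every blamed line counts toward its file extension.
        profile["extensions"][PurePosixPath(filename).suffix or "(none)"] += 1
        # Only import/require lines count toward a package.
        for spec in IMPORT_RE.findall(text):
            pkg = npm_package(spec)
            if pkg:
                profile["npm"][pkg] += 1
    return dict(profiles)
```

Counting only import lines keeps the signal cheap but narrow: someone who writes hundreds of lines against `redux` in a file with a single import gets credit for one line. Counting all lines of files that import a package would be the opposite trade-off.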

The result was stunning: all of a sudden we could actually find people who were using specific NPM libraries. In the example below you can see the search results inside codecrew when typing in “redux”.
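A search like typing “redux” can be as simple as a substring match over everyone's dependency counts, ranked by line count. codecrew's actual search isn't described here, so the function name and the profile shape (`{author: {"npm": {package: lines}}}`) below are hypothetical; this is just a sketch of the idea.

```python
def search_by_dependency(profiles, query):
    """Return (author, package, line_count) hits whose package name contains `query`, most lines first."""
    q = query.lower()
    hits = [
        (author, pkg, count)
        for author, profile in profiles.items()
        for pkg, count in profile["npm"].items()
        if q in pkg.lower()
    ]
    # Highest line count first; author and package break ties for a stable order.
    return sorted(hits, key=lambda h: (-h[2], h[0], h[1]))
```

Substring matching is what makes “redux” also surface `redux-mock-store` users, which is usually what you want when looking for someone to talk to about an ecosystem rather than one exact package.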

Exactly what we wanted to achieve with our experiment: Create the ability to connect with specific people based on a shared use of a specific technology.

codecrew is the other project we’re open-sourcing today: a visualization layer for the data we generated with sklls-cli. By combining the data created by sklls-cli with the Github API, we wanted to create a place where all members of a Github organization can find each other based on tech-stack.

codecrew is a completely backendless approach. It’s a statically built website, hosted on (private) Github Pages, automatically updated through Github Actions and fed through the Github API.

It was all coming together at that point: the right data, meeting the right kind of visualization to be useful. And then came a huge buzzkill… from Github.

Screenshot from the codecrew search box — finding developers based on NPM dependencies

Github’s missing verification of usernames & emails

So we created a tool that converts git-blames into technical profiles and a UI that combined that data with data from the Github API to create a self-updating team-directory for Github organizations.

Sounds pretty damn great, right?

Yes, in theory. But in practice there was no good way to actually map the git-blames we found to the actual Github users of a Github organization. And it’s not git’s fault; it’s Github’s fault for not verifying the usernames & emails used in git commits.

Case in point: pretending to be Dan Abramov on Github is as simple as entering his username and the public email from his Github profile in your git commit config (user.name and user.email) and pushing your code.

Pretending to be Dan Abramov on Github is as simple as that 👆🏻
Git commit after pushing it to Github

And just like that, one of JavaScript’s most prolific and best-known developers has contributed a commit to our little project.

Don’t believe me? Check out the live commit below and be sure to actually click on the “gaearon” username to verify that, yes, for all Github knows and makes you believe, this is the actual Dan Abramov (who still works for Meta and not Axel Springer) who kindly contributed a commit to our project:
https://github.com/axelspringer/sklls-cli/commit/26f7be7f3bb03689bd0e62314c874d2b7cedd9e5

Not being able to map emails ruined our experiment

As silly as it sounds, this is where the experiment fell flat.

We wanted to create this beautiful, auto-updating team-directory of everyone’s dreams: a place where a software developer can find that one other developer-needle-in-the-haystack-of-commits who uses the same esoteric NPM dependency, right when that dependency has introduced breaking changes and they need someone to talk migration strategy with.

From the looks of it, we did succeed in creating the perfect team-directory for software developers:

Demo of codecrew in action

But obviously this fun & engaging UI is powered not by the git commits — it’s powered by Github’s API spitting out pretty profile pictures & usernames, links to Github profiles and what team someone belongs to.

And that’s where our experiment broke down: we simply couldn’t map git commits to Github users well enough to deliver a quality experience, or anything close to complete coverage of the members of our Github org. In our experience, most software developers in our Github organization actually ended up using 5–6 different email addresses (myself included).

Given that every Github user only has one company email address linked to their account (through the SSO link), we could only get mediocre mapping results, which rendered the wealth of git-blames we could aggregate fairly useless for the final product (the visual team-directory).
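To see why coverage suffers, here is a hypothetical sketch of the mapping step, not codecrew's actual code. It assumes two signals: a known `{email: login}` table (such as the one SSO email per member) and Github's no-reply commit addresses, which take the form `ID+username@users.noreply.github.com` (older accounts: `username@users.noreply.github.com`). Every other address, such as a personal mail or a laptop's default identity, falls through. Even a match only tells you what the commit claims, as the Dan Abramov example shows.

```python
import re

# Github's no-reply commit addresses look like "12345+octocat@users.noreply.github.com"
# (older accounts use "octocat@users.noreply.github.com").
NOREPLY_RE = re.compile(r"^(?:\d+\+)?([^@]+)@users\.noreply\.github\.com$")


def map_emails_to_logins(line_counts, known_emails):
    """Attribute per-email line counts to Github logins.

    line_counts:  {commit_email: blamed_lines}
    known_emails: {email: login}, e.g. one SSO email per org member (hypothetical input)
    Returns (lines_per_login, unmapped_lines_per_email, coverage as a 0..1 fraction).
    """
    known = {email.lower(): login for email, login in known_emails.items()}
    per_login, unmapped = {}, {}
    for email, lines in line_counts.items():
        email = email.lower()
        match = NOREPLY_RE.match(email)
        login = match.group(1) if match else known.get(email)
        if login:
            per_login[login] = per_login.get(login, 0) + lines
        else:
            # Personal, laptop-default or old work addresses end up here.
            unmapped[email] = unmapped.get(email, 0) + lines
    total = sum(line_counts.values())
    coverage = sum(per_login.values()) / total if total else 0.0
    return per_login, unmapped, coverage
```

With one developer committing from four addresses, only two of them resolve, so a third of that developer's lines, and every line from an unknown colleague, never reach a profile.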

So to conclude the experiment and this article’s headline:

Do git-blames show your technical skills? No, they don’t!

…they just show quantity.

That’s another thing we (obviously) discovered: counting lines as a proxy for actual skill is about as accurate as judging a screen design by how many pixels it has. Line-counts are good as a proxy, but not as actual proof of skills (not that that was ever our goal).

So no, strictly speaking, git-blames don’t show your technical skills; at best they hint at them, especially once you take our mapping inaccuracies into account.

And yet they kind of do

Still, we found that the profiles were fairly accurate. Just by looking at the file extensions people committed to most, many of them showed a clear focus (e.g. frontend or backend).
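Reading a "clear focus" off a profile can be sketched as a share of blamed lines per category. The extension-to-category table, the 60% threshold and the function name below are hypothetical choices for illustration, not something sklls-cli or codecrew is known to do, and the result is still only a line-count proxy.

```python
from collections import Counter

# Hypothetical, deliberately coarse mapping; real stacks are messier (e.g. .ts on a Node backend).
CATEGORY_BY_EXT = {
    ".tsx": "frontend", ".jsx": "frontend", ".css": "frontend", ".scss": "frontend", ".vue": "frontend",
    ".go": "backend", ".py": "backend", ".java": "backend", ".kt": "backend", ".sql": "backend",
}


def dominant_focus(ext_counts, threshold=0.6):
    """Return (category, share) for the category with the most lines, or ("mixed", share) below `threshold`."""
    by_category = Counter()
    for ext, lines in ext_counts.items():
        by_category[CATEGORY_BY_EXT.get(ext, "other")] += lines
    total = sum(by_category.values())
    if total == 0:
        return ("unknown", 0.0)
    category, lines = by_category.most_common(1)[0]
    share = lines / total
    # Only call it a focus if one category clearly dominates.
    return (category, share) if share >= threshold else ("mixed", share)
```

Using shares instead of raw totals keeps a prolific committer and an occasional one comparable, which matters when the absolute numbers say more about quantity than skill.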

Tell me you’re frontend without telling me you’re frontend (Screenshot from codecrew)

So we decided to Open Source it instead

In the last couple of years in my role as Intrapreneur Global Community Development at Axel Springer, I was given the chance to build and test out different ways to connect our global tech community.

Creating sklls-cli and codecrew was another super interesting approach that we tried out, but in the end it turned out to be too flawed to become more than a proof of concept.

Still, the data we were able to draw from git-blames using the sklls-cli was super interesting — and perhaps somebody else can figure out a better way to connect that data to Github profiles. If you’d like to try — have at it!

Today, we’re open-sourcing both sklls-cli and codecrew. Check them out below & happy hacking!

sklls-cli — sklls is a simple CLI tool that aggregates git-blames by committer and generates line-counts for every committer based on file-extension and NPM dependencies.

codecrew — codecrew makes it easy to discover the software developers inside your Github organization based on techstack

Jonas Peeck is the founder of uncloud (the first cloud platform that configures itself based on your code) and previously worked at Axel Springer as an Intrapreneur for Global (Tech) Community Development.

Connect with me on LinkedIn
Check out uncloud
