Prioritizing trust for SWEs

Talent in source code

Since I began my posts on Medium, I keep coming back to how building trust among engineers will be the critical piece of our service.

Building customer trust is the standard that defines a "good" business. For technology companies, its importance has grown substantially in the Internet age and app economy. Almost every day we are bombarded with headlines about a breach of public trust: a brand we use and admire, one we trust to put our best interests first, turns out to have been compromised illegally or to have riskily leveraged our data for monetization.

Two examples come to mind. Snapchat's Snap Map feature, which tracks the location of people using the app, has raised concerns among police and children's advocacy groups worried about the safety of younger users. And Uber has been accused of tracking customers even after they deleted the app, drawing the attention of Congress and federal regulators because of the serious privacy implications.

I believe that trustworthiness for technology companies does not start at the policy level. Yes, there should be regulation and transparency there, but much can also be done at the user level to honor a user's expectations about trust before engagement happens. I touched on this in my last post, "A hash of your proprietary code," where I described the mechanisms we put in place at sourcerer to ensure maximum transparency with our audience: keeping the sourcerer app open source so the community can scrutinize what we are doing under the hood, and adding opt-in controls that give users maximum control over the data we hope to analyze.

Trustworthiness for SWEs

While the above focuses specifically on ensuring privacy and transparency in the trust relationship at sourcerer, I also want to say a little about data accuracy and its interpretation in this trust relationship. This is essential to us, given that we are creating intelligent profiles from the data that software engineers allow us to analyze.

Given this responsibility, we focus on a few things:

1. Everything we analyze and interpret comes directly from a SWE's project contributions; we are not looking at or analyzing any third-party sources.

2. Quality commits, or "qommits" as we are calling them: a standard of measurement we are defining in our product. At its core is the belief that not all commits are created equal, and we want to differentiate what we consider trivial adjustments, such as a single-line text change, from meatier contributions like a custom refinement to a company's search-engine algorithm.

Exactly what a qommit is remains TBD, but one idea that seems productive is to train a neural network to differentiate "interesting" commits from "non-interesting" ones. For example, a lively discussion on a commit's pull request in a respectable public repository could be a sign that it is interesting. Another approach would be to train a classifier that determines whether a commit could plausibly belong to one of the high-quality public repositories. Boilerplate detection is also a useful ingredient: there is not much value in a commit that was auto-generated by Django or Visual Studio.
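To illustrate the boilerplate side of this, a cheap heuristic pass could flag commits that look trivial or auto-generated before any learned model is applied. This is only a sketch of the idea: the marker strings, thresholds, and function below are my own illustrative guesses, not part of sourcerer's actual pipeline.

```python
# Hypothetical pre-filter for "qommit" scoring: flag commits that are
# likely trivial or auto-generated before any learned model runs.
# Marker strings and thresholds here are illustrative guesses.

AUTOGEN_MARKERS = (
    "auto-generated",
    "do not edit",
    "generated by django",
)

def looks_trivial(changed_lines, min_substantive_lines=2):
    """Return True if a commit's changed lines look trivial.

    `changed_lines` is the list of added/removed lines from the diff,
    without the leading '+'/'-' markers.
    """
    if len(changed_lines) < min_substantive_lines:
        return True  # e.g. a single-line text tweak
    text = " ".join(changed_lines).lower()
    return any(marker in text for marker in AUTOGEN_MARKERS)
```

A real system would of course learn these signals rather than hard-code them; the point is only that a commit flagged here need not count toward a profile the way a substantive change does.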

3. Avoid making assumptions: one belief of ours is that it is very important not to make assumptions about a user that may not be true. This is well illustrated by a feature we are currently working on, the LTV (lifetime value) of a line of code. Longevity can act as a proxy for quality: the longer a line of code has survived in a repository, the more valuable it can be considered compared to a line that has lived there only briefly.

However, this is a broad thesis we don't want to apply blindly to every line of code an engineer writes. For example, refactoring at an early-stage startup is not only common but often preferred when trying to accommodate users in a scaling system. Different engineering roles also have very different work styles: a frontend engineer often deals with regular updates to the UI/UX, while a backend engineer, if things are architected properly, is presented with far fewer changes.

Keeping the above in mind makes the information architecture for this feature vitally important. We want to present the data precisely enough that the right inferences can be made. For example, average lifetime value alone isn't helpful, but presented alongside team size, the lifetime of the repo, and the engineer's specific role, we hope it will convey the right meaning.
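To make the longevity idea concrete, here is a minimal sketch of how a per-line "lifetime value" might be computed from the author timestamps that `git blame --line-porcelain` reports for each surviving line of a file. The function names and the choice of a simple average are my own assumptions for illustration, not sourcerer's actual metric.

```python
from datetime import datetime, timezone

def line_ages_days(author_times, now=None):
    """Ages in days for each surviving line of a file.

    `author_times` are Unix timestamps (seconds) for when each line was
    last written, e.g. the author-time fields from
    `git blame --line-porcelain`.
    """
    if now is None:
        now = datetime.now(timezone.utc).timestamp()
    return [(now - t) / 86400.0 for t in author_times]

def average_line_ltv(author_times, now=None):
    """A naive per-file LTV: mean age of its current lines, in days.

    As noted above, this number only makes sense alongside context
    such as team size, repo age, and the engineer's role.
    """
    ages = line_ages_days(author_times, now=now)
    return sum(ages) / len(ages) if ages else 0.0
```

Even this toy version shows why context matters: a frontend file under constant UI churn will score low on longevity without that saying anything about the engineer's quality.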

These are a few of the challenges we are addressing in our product here at sourcerer. As I mentioned in my last post, this type of transparency and openness is essential to providing the best service we can for software engineers.


More to come and I hope you can continue to follow me on this journey of mine.