A hash of your proprietary code

As I continue my thought experiment, one obvious challenge I’ve spent a lot of time thinking through is gaining a developer’s trust to analyze their proprietary source code.
There are a few key points I believe need to be addressed for Sourcerer to be trustworthy enough:
First, analyzing proprietary source code needs to happen locally. At no point should any service, including ours, lift a working code base from your machine. Your code is your or your company’s intellectual property. We do NOT want to put ourselves in any situation where this could be compromised, not only to avoid litigation ourselves but especially to protect the best interests of our software engineers.
Second, we believe in being fully transparent with our technology. Specifically, our source code analyzer will be open source. Although we only extract and process certain signals that give us insight into who you are, we want users to be able to look under the hood and critique exactly how we do it.
Early thinking with our analyzer is to start with quick and dirty fact extraction, such as listing the languages an engineer uses. This is trivial: a filename that ends with .py means Python. From here, we can move on to actually parsing commits, looking at library usage, and attributing experience based on third-party library calls. So, intuitively, if an engineer makes a call to calcOpticalFlowPyrLK() in OpenCV, they probably know a thing or two about optical flow. Further still, we can compare commits to those in open source repositories and guess the purpose of a commit based on this similarity.
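To make the first two steps concrete, here is a minimal sketch of what I have in mind. It is not the actual analyzer: the extension table, the function names, and the sample snippet are all made up for illustration, and the call detection only handles Python source with direct calls of the form module.function().

```python
import ast
from collections import Counter
from pathlib import PurePath

# Hypothetical, deliberately tiny extension-to-language table.
EXTENSION_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".go": "Go",
    ".rs": "Rust",
    ".java": "Java",
}


def count_languages(paths):
    """Count files per language, using only the file extension."""
    counts = Counter()
    for path in paths:
        language = EXTENSION_TO_LANGUAGE.get(PurePath(path).suffix.lower())
        if language is not None:
            counts[language] += 1
    return counts


def library_calls(source, library_aliases):
    """Count functions called directly on the given module names.

    Only catches calls such as `cv2.calcOpticalFlowPyrLK(...)`; aliased
    imports and `from x import y` are ignored in this sketch.
    """
    calls = Counter()
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id in library_aliases
        ):
            calls[node.func.attr] += 1
    return calls


# Made-up inputs: the source is only parsed, never executed.
sample = "import cv2\nnxt, status, err = cv2.calcOpticalFlowPyrLK(prev, cur, pts, None)\n"
language_counts = count_languages(["tracker.py", "README.md", "web/app.js", "utils.PY"])
opencv_calls = library_calls(sample, {"cv2"})
```

Because the source is parsed rather than run, nothing in the repository has to be importable or even working for this kind of signal to be extracted.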
Another interesting source of information is in the implicit graph that links co-workers. For example, if we look at how long an engineer’s code lives before being overwritten by their teammates, we can make a guess about how solid the code is.
Another idea along the same lines: whose code do you usually edit, yours or somebody else’s? Yet another one: do people on your team import your modules?
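Here is a rough sketch of how two of these team signals could be computed. It assumes we have already reduced the repository history to one record per line of code; the LineHistory record and both function names are hypothetical, not a real git format or an existing API.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class LineHistory:
    """Hypothetical record for one line of code; not a real git data format."""
    author: str
    written_day: int               # day the line was committed
    replaced_day: Optional[int]    # day it was overwritten, None if still alive
    replaced_by: Optional[str]     # who overwrote it, None if still alive


def median_lifetime_before_teammate_rewrite(history, engineer):
    """Median days an engineer's lines lived before a teammate overwrote them."""
    lifetimes = [
        h.replaced_day - h.written_day
        for h in history
        if h.author == engineer
        and h.replaced_by is not None
        and h.replaced_by != engineer
    ]
    return median(lifetimes) if lifetimes else None


def share_of_edits_to_own_code(history, engineer):
    """Fraction of an engineer's overwrites that hit their own lines."""
    edits = [h for h in history if h.replaced_by == engineer]
    if not edits:
        return None
    return sum(h.author == engineer for h in edits) / len(edits)


# Made-up history for a two-person team.
history = [
    LineHistory("alice", 0, 10, "bob"),
    LineHistory("alice", 0, 30, "bob"),
    LineHistory("alice", 5, 7, "alice"),
    LineHistory("alice", 5, None, None),
    LineHistory("bob", 2, 4, "alice"),
]
alice_lifetime = median_lifetime_before_teammate_rewrite(history, "alice")
alice_own_share = share_of_edits_to_own_code(history, "alice")
```

Numbers like these are only a guess about code quality. A short lifetime can just as easily mean a fast-moving project as fragile code, so I would treat them as hints rather than verdicts.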
Third, we believe that a software engineer should always be in full control. What this means is that after our “sourcerer” app has analyzed your code repository, you will be presented with a list of the data we are looking at, and you will then have the opportunity either to continue with the creation/update of your sourcerer profile or to stop the process before this information goes to the cloud. You are in control.
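In code, that review step could look something like the sketch below. The function name and the confirm/upload callables are assumptions of mine for illustration. The point is only that the upload sits behind an explicit yes.

```python
import json


def review_and_maybe_upload(signals, confirm, upload):
    """Show the extracted signals, then upload only on explicit consent.

    `confirm` and `upload` are injected, hypothetical callables, so nothing
    leaves the machine unless `confirm` returns True.
    """
    report = json.dumps(signals, indent=2, sort_keys=True)
    if not confirm(report):
        return False
    upload(report)
    return True


# Stand-ins for a real prompt and a real network call.
sent = []
signals = {"languages": {"Python": 2}, "libraries": ["cv2"]}
declined = review_and_maybe_upload(signals, lambda report: False, sent.append)
accepted = review_and_maybe_upload(signals, lambda report: True, sent.append)
```

The exact text shown to the user is the same text that gets uploaded, which keeps what you review and what leaves your machine from drifting apart.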
Finally, we will always have an open and engaged feedback loop for users to access. It personally frustrates me when I’m seeking an answer to a question and fail to get a prompt, intelligent response. I am looking to have a maintained Slack channel or some sort of ticketing system to funnel feedback from, and responses to, our users.

I believe trustworthiness in technology is a complex puzzle that involves both transparency and thoughtful user-level architecture and design. The first step, of course, is awareness of its priority.
So, in conclusion, think of Sourcerer as a hash of your proprietary code that digs deep and analyzes your technologies, habits, and preferences to show the world who you are as a software engineer.
More to come, and I hope you will continue to follow me on this journey.
