Supercharge Your Team with Software Engineering Analytics

Julius Uy
Big O(n) Development
8 min read · Aug 1, 2021

Ronny Kohavi, Head of Data and Experimentation at Microsoft, Airbnb, and Amazon, revealed that internal findings have shown that 60–90% of actions fueled by intuition do not create additional value or, worse, have negative value.

That’s HUGE!

As much as we want to paint ourselves as the epitome of intelligence, our limited cognitive radius betrays the confidence we exude. We know very little of anything there is to know about everything. The lack of epistemic humility in the professional world has caused so many problems in the workplace that studies have shown time and again that psychological safety is far and away the most important attribute of a highly functioning organization.

Of course, despite everything that has been said, right and wrong decisions are made all the time. However, successful organizations tend to make better-quality decisions in less time. In the age of information, data is your best friend. Data, not intuition, must inform decisions. Hence, organizations must always be conscious of improving two important elements: the time it takes to make decisions and the correctness of those decisions.

I currently run the Software Engineering department in a Singapore startup. In one year, we have roughly doubled our headcount. As we continue to expand, it has become apparent that we need to invest in a scalable approach to productivity management to keep the business running. 1-on-1s and retrospectives just won’t cut it anymore.

Hence, we recently embarked on an initiative to do proper Software Engineering Analytics, and boy did we find some very interesting data. In this article, I shall share some of the things I thought would be helpful to every engineering leader. (The technical details will come elsewhere.)

Amount of Commits Outside Work Hours

Research by Gallup has shown that companies with a highly engaged workforce outperform their peers by as much as 147% in earnings per share. For context, just by engaging employees, a company can get the equivalent of as much as 1.47 extra people for free for every person it employs! Most companies spend the bulk of their operating expenses on staff costs. Imagine how much they could save just by treating people better. Now, what causes a drop in employee engagement? Research from Microsoft has found that 20% overtime per week leads to a 1.6x higher disengagement rate.

Percentage of commits outside work hours

The graph above shows the density of code commits inside and outside work hours. The more commits made outside of working hours, the more likely engineers are to burn out. Looking at the graph, one can see that over a period of 2 years, the share of code committed outside working hours increased from 5% to 15% (a 3x increase!). If the trend continues, the organization is putting itself at serious risk in two years’ time.¹
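As a rough illustration of how such a percentage could be computed from commit timestamps, here is a minimal Python sketch. The 09:00–18:00, Monday-to-Friday definition of work hours, the function names, and the sample data are all assumptions made for this example, not the exact setup behind the graph.

```python
from datetime import datetime

# Assumption: work hours are 09:00-18:00, Monday to Friday, in the
# committer's local time. Adjust these to your own organization.
WORK_START_HOUR = 9
WORK_END_HOUR = 18


def is_outside_work_hours(ts: datetime) -> bool:
    """Return True if the timestamp is on a weekend or outside 09:00-18:00."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    return not (WORK_START_HOUR <= ts.hour < WORK_END_HOUR)


def outside_hours_percentage(commit_times: list[datetime]) -> float:
    """Percentage of commits made outside work hours (0.0 if there are none)."""
    if not commit_times:
        return 0.0
    outside = sum(1 for ts in commit_times if is_outside_work_hours(ts))
    return 100.0 * outside / len(commit_times)


# Hypothetical sample data, not taken from the graph above.
sample_commits = [
    datetime(2021, 7, 26, 10, 15),  # Monday morning, inside work hours
    datetime(2021, 7, 26, 22, 40),  # Monday late evening, outside
    datetime(2021, 7, 28, 14, 5),   # Wednesday afternoon, inside
    datetime(2021, 7, 31, 11, 0),   # Saturday, outside
]
print(f"{outside_hours_percentage(sample_commits):.1f}% outside work hours")
```

Running this per month and plotting the result over time would give a trend line similar in spirit to the one described above.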

Bus Factor

In software engineering, one of the risks to watch out for is the bus factor. The bus factor is the number of engineers who have to be run over by a bus for the project to come to a screeching halt. To reduce the risk to the company, the manager has to raise the bus factor as high as possible so that even when a person resigns, gets sick, and so forth, the wheels can keep turning.

Bus Factor Analytics

The metric above shows various projects and the software engineers working on each of them. The higher the delta between the contributors, the more at risk the project is. Consider Project 1. The project is considered low risk because John, Sheryl, and Sandy are all still in the company, and they have so many commits in the code that the amount of context they have is quite high. So even if one of them leaves, the others can still cover for the loss.

Now consider Project 4. If Dawson leaves, Pradeep and Vishal will be scrambling. As the engineering manager, one then has to de-risk the project by delegating Dawson’s responsibilities to Pradeep and Vishal. This is so that over time, even if Dawson leaves, the other two can limit the blast radius of his departure.

Consider Project 5, however, where both Enrique and Carl have already left the company. Significant knowledge decay is in full swing. Pragati is the only one left on the team, and when she leaves, there is not even one engineer who can absorb the departure. Moreover, her total commit count is less than half of theirs. This means that if something goes wrong in an area covered by Enrique and Carl, Pragati will be thrown into a lot of stress and hence risk burning out. The death spiral on the project thus continues. This is very risky for the company, and the manager has to make sure to put in another engineer or two to cover.
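The reasoning above can be sketched in a few lines of Python. This is only an illustration: the commit counts are made up, and the three indicators (number of active contributors, share of commits held by people still in the company, and share held by the top active contributor) are one possible way to quantify the risk, not the exact metric in the chart.

```python
def project_risk(commits_by_engineer: dict[str, int], departed: set[str]) -> dict:
    """Summarize knowledge concentration for one project.

    commits_by_engineer maps engineer name -> total commits in the project.
    departed is the set of engineers who have left the company.
    """
    total = sum(commits_by_engineer.values())
    active = {n: c for n, c in commits_by_engineer.items() if n not in departed}
    active_total = sum(active.values())
    top_active = max(active.values()) if active else 0
    return {
        # How many people with context are still around.
        "active_contributors": len(active),
        # How much of the commit history belongs to people still around.
        "active_share": active_total / total if total else 0.0,
        # How concentrated the remaining knowledge is in a single person.
        "top_active_share": top_active / active_total if active_total else 0.0,
    }


# Hypothetical commit counts, chosen only to mirror the scenarios above.
project_4 = {"Dawson": 900, "Pradeep": 60, "Vishal": 40}
project_5 = {"Enrique": 500, "Carl": 450, "Pragati": 200}

risk_4 = project_risk(project_4, departed=set())
risk_5 = project_risk(project_5, departed={"Enrique", "Carl"})
print("Project 4:", risk_4)
print("Project 5:", risk_5)
```

With these made-up numbers, Project 4 shows one person holding 90% of the active knowledge, while Project 5 shows a single remaining contributor who owns less than a fifth of the commit history.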

Burnout Risk

Burnout is a real issue that has been exacerbated by COVID-19. One study, for example, shows that 52% of survey respondents were experiencing burnout in 2021, up from 43% pre-COVID. A healthy professional life requires a good balance between time spent working and time spent resting. If an engineer is constantly working outside office hours, that is a burnout risk.

The number of days each engineer worked outside office hours

We measure this by the total number of days on which a commit is made outside work hours. In the illustration above, John spent 40 out of 67 working days committing outside work hours. Sheryl and Sandy are not far behind. This indicates that if the cadence continues, they will perform worse, projects will be further delayed, and they are all likely to be flight risks. There is no magic number for how often a person can work outside office hours. However, these numbers serve as signals for the company to course correct to protect its employees’ mental health.
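A minimal Python sketch of this measurement is shown below. It counts, per engineer, the distinct calendar days with at least one commit outside work hours. The work-hours definition (09:00–18:00 on weekdays), the function names, and the sample commits are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: work hours are 09:00-18:00 on weekdays, local time.
WORK_START_HOUR = 9
WORK_END_HOUR = 18


def is_outside_work_hours(ts: datetime) -> bool:
    """True for weekend commits and for weekday commits outside 09:00-18:00."""
    return ts.weekday() >= 5 or not (WORK_START_HOUR <= ts.hour < WORK_END_HOUR)


def outside_hours_days(commits: list[tuple[str, datetime]]) -> dict[str, int]:
    """Count distinct days per author with at least one out-of-hours commit."""
    days_by_author = defaultdict(set)
    for author, ts in commits:
        if is_outside_work_hours(ts):
            # A set of dates ensures several late commits on the same day
            # are counted as a single day.
            days_by_author[author].add(ts.date())
    return {author: len(days) for author, days in days_by_author.items()}


# Hypothetical sample data.
sample = [
    ("John", datetime(2021, 7, 26, 22, 40)),   # Monday night
    ("John", datetime(2021, 7, 26, 23, 10)),   # same Monday night
    ("John", datetime(2021, 7, 31, 11, 0)),    # Saturday
    ("Sheryl", datetime(2021, 7, 28, 14, 5)),  # Wednesday afternoon, in hours
]
result = outside_hours_days(sample)
print(result)
```

Engineers with no out-of-hours commits simply do not appear in the result, so `result.get(name, 0)` gives a count for anyone.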

Commit Size

A study done on a Cisco team has shown that as the number of lines of code under review increases, the density of defects found decreases.

Smartbear study on Defects Density

The suggested number of lines of code for a review should hence be somewhere around 200–250. With anything more than that, the reviewer will tend to simply skim through the changes and not point out anything substantial. Hence, it is important to nudge engineers in the right direction. For example, a weekly report on their commit size shows how far they are from the magic number.

Lines of Code per Commit

In the table above, one can see that Anthony, Jasper, and Anand are doing quite well. On the other hand, Janise, Bella, and Dawson seem to have a greater tendency to go beyond the 250-lines-of-code limit. This data can certainly help improve code quality and increase developer happiness.²
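A weekly report of this kind could be produced along the following lines. This is a hedged Python sketch: the 250-line threshold comes from the range discussed above, while the function name, the choice of median and share-over-limit as summary statistics, and the sample numbers are assumptions for illustration.

```python
from statistics import median

# Upper end of the 200-250 line range discussed above.
REVIEW_LIMIT = 250


def commit_size_summary(sizes_by_engineer: dict[str, list[int]]) -> dict[str, dict]:
    """Summarize lines changed per commit for each engineer.

    sizes_by_engineer maps engineer name -> list of lines changed per commit.
    Engineers with no commits in the period are skipped.
    """
    summary = {}
    for name, sizes in sizes_by_engineer.items():
        if not sizes:
            continue
        over_limit = sum(1 for size in sizes if size > REVIEW_LIMIT)
        summary[name] = {
            "median_lines": median(sizes),
            "share_over_limit": over_limit / len(sizes),
        }
    return summary


# Hypothetical weekly data, not the numbers from the table above.
weekly_sizes = {
    "Anthony": [120, 80, 210],
    "Janise": [400, 90, 610, 300],
}
report = commit_size_summary(weekly_sizes)
for name, stats in report.items():
    print(name, stats)
```

The median is used here rather than the mean so that a single very large commit does not dominate an engineer's summary.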

Value Stream Analytics

In running a Software Engineering organization, one of the major roles of the manager is to minimize bottlenecks. To do so, the manager needs to understand the entire value stream from code creation to deployment. Scrum teams call this cycle time. DORA calls this change lead time. Either way, they mean the same thing.

The graph above shows three things: the number of days from the first commit to a pull request, the number of days it takes for the pull request to be merged, and the number of days it takes for the merged code to be deployed to production. Ideally, the smaller these numbers, the better. It means that the customer can enjoy the benefits of the code in the shortest time possible.

Simply by observing the graph, one can quickly notice that the amount of time it takes for a pull request to be merged is going up over time, even though the other two variables remain constant. This certainly indicates a bottleneck in PR review. It could be that the senior engineers reviewing the code are overwhelmed, and so on. Notice that these kinds of problems cannot be reliably surfaced and acted on through feedback in retrospectives and 1-on-1s. The only way this piece of information can be reliably monitored and acted on is with a graph like this.
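The three stages can be derived from four timestamps per change. Below is a minimal Python sketch, assuming those timestamps have already been collected from the version control and deployment systems. The `Change` class, its field names, and the sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Change:
    """One change flowing through the value stream (hypothetical structure)."""
    first_commit: datetime
    pr_opened: datetime
    pr_merged: datetime
    deployed: datetime


def days_between(start: datetime, end: datetime) -> float:
    """Elapsed time in days, including fractions of a day."""
    return (end - start).total_seconds() / 86400


def stage_averages(changes: list[Change]) -> dict[str, float]:
    """Average number of days spent in each stage across all changes."""
    return {
        "commit_to_pr": mean([days_between(c.first_commit, c.pr_opened) for c in changes]),
        "pr_to_merge": mean([days_between(c.pr_opened, c.pr_merged) for c in changes]),
        "merge_to_deploy": mean([days_between(c.pr_merged, c.deployed) for c in changes]),
    }


# Hypothetical sample data.
changes = [
    Change(datetime(2021, 7, 1, 9), datetime(2021, 7, 2, 9),
           datetime(2021, 7, 4, 9), datetime(2021, 7, 4, 21)),
    Change(datetime(2021, 7, 5, 9), datetime(2021, 7, 6, 9),
           datetime(2021, 7, 10, 9), datetime(2021, 7, 10, 21)),
]
averages = stage_averages(changes)
print(averages)
```

Computing these averages per week or per month and plotting them side by side is what makes a rising `pr_to_merge` stand out against the other two stages.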

Social Graph

One study has shown that negativity in the workplace can dampen productivity by as much as 30 to 40%. Another has shown that for a relationship to reach optimal efficacy, the ratio of positive to negative interactions has to be 5.625 to 1. Being able to monitor the engineers’ social interactions can help identify various levels of toxicity.

This can be done using a social graph. Based on various interactions in PR activities, such as the toxicity of conversations, the number of comments created, the length of time taken to merge each other’s PRs, the density of interaction they have with each other, and so forth, the manager can find out who among the engineers tends to have good working relationships with the team and what kind of action to take with the others.

In the graph above, we can observe that John is doing well and Suzy is not too far behind. Roy, on the other hand, is really ruffling feathers. That said, even if Roy is deemed to be a “high” performer, the manager can use this as a signal to take action. If Roy is pushing the team down, then he might as well be shown the door. One study has shown that it takes 2.4 superstar employees to neutralize the damage created by a toxic employee.
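One simple way to turn PR interactions into such a graph is sketched below in Python. It assumes each PR comment has already been labeled as positive or negative (by hand or by some classifier, which is not shown here) and only aggregates those labels into weighted edges and a per-engineer ratio. The function names, the data layout, and the sample interactions are assumptions for illustration; the 5.625 target is the ratio mentioned earlier in this section.

```python
from collections import defaultdict

# Positive-to-negative ratio cited above.
TARGET_RATIO = 5.625


def build_edges(interactions):
    """Aggregate labeled interactions into directed, weighted edges.

    interactions is an iterable of (author, recipient, is_positive) tuples.
    Returns {(author, recipient): [positive_count, negative_count]}.
    """
    edges = defaultdict(lambda: [0, 0])
    for author, recipient, is_positive in interactions:
        edges[(author, recipient)][0 if is_positive else 1] += 1
    return dict(edges)


def ratio_by_author(edges):
    """Positive-to-negative ratio of all interactions each author initiated."""
    totals = defaultdict(lambda: [0, 0])
    for (author, _recipient), (pos, neg) in edges.items():
        totals[author][0] += pos
        totals[author][1] += neg
    # An author with no negative interactions gets an infinite ratio.
    return {a: (pos / neg if neg else float("inf")) for a, (pos, neg) in totals.items()}


# Hypothetical, already-labeled PR comments.
interactions = (
    [("John", "Suzy", True)] * 6 + [("John", "Suzy", False)]
    + [("Roy", "John", True)] + [("Roy", "John", False)] * 2
    + [("Roy", "Suzy", True), ("Roy", "Suzy", False)]
)
edges = build_edges(interactions)
ratios = ratio_by_author(edges)
below_target = sorted(name for name, r in ratios.items() if r < TARGET_RATIO)
print(ratios, below_target)
```

A ratio like this is only one input to the graph. Comment volume and time taken to merge each other's PRs could be added as further edge weights in the same way.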

Conclusion

These are some examples of what can be achieved by having good Software Engineering Analytics. Of course, the DevOps movement has been pushing for other metrics, such as the Four Key Accelerate Metrics, which are the most researched metrics known today. That said, there is much more that can be done. What you see above is simply the tip of the iceberg.

Happy leading!

If you enjoyed this article, I’d be happy to connect with you on LinkedIn. I also run a non-profit organization called Big O(n) Development.

____

¹ It must be noted that highly engaged employees also tend to gladly donate their personal time to support the business. This data point must be used in conjunction with employee pulse surveys, among other things, to assess employee engagement.

² In an ideal world, a software engineer tends to want to review things that fall within their epistemic radius.

Head of Technology at SMRT. ex-CTO here ex-CTO there. On some days, I'm also a six year old circus monkey.