Worst delivery metrics

Tomas Petras Rupšys
Published in Zedge Engineering
8 min read · Dec 16, 2020
Photo by Marcus Ganahl on Unsplash

Every management textbook says that "if you can't measure it, you can't manage it". And yes, it is normal practice to keep measuring results in order to define success. It's hard to imagine a company where everyone keeps working just because it's fun while the business metrics say the company is about to go bankrupt.

The art of measuring success becomes more complex when we want to identify the performance of a particular function inside that company, in our case, software engineering teams.

Below I'm listing the worst metrics I've seen. Have you seen even worse? Please comment or message me, and I'll definitely add them to the list!

Let’s dive in!

Hours spent per task

Some of the best engineers (know-how wise) I know like to go the extra mile and do things in a way they feel is best, even when it's not strictly required by product managers. They will surely spend more hours on a task than a colleague who doesn't care about scalability (how easy it will be for a team-mate to add another feature on top of their code) or who doesn't write tests because nobody asked. It all depends on the use case, but if you are working on a long-term product, you will likely be more agile and speedy by keeping order in your code so that it is able to scale.

This metric might work for extreme cases, where you have a lazy colleague who spends whole days by the coffee machine and Netflix and takes longer than everyone else at everything. But if you know your team is putting in the effort, this metric doesn't tell you much.

Why did Peter produce more Lines of Code (LoC) than John??? (LoC produced per developer / team)

At Zedge, from time to time we like to boast with a screenshot after refactoring code, showing more LoC removed than added. In reality, the less code you have, the less maintenance is needed and the less complexity everyone has to manage. Producing too much code for simple things might mean your code is not clean enough (recommended reading to tackle this problem: https://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882).
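To illustrate (a hypothetical sketch; the function names and the VAT rate are mine, not from any real codebase), here is how a refactoring can remove lines while making the code easier to maintain:

```python
# Before: two near-duplicate functions, each hard-coding the 21% VAT rate.
def monthly_price(base):
    return round(base * 1.21, 2)

def yearly_price(base):
    return round(base * 12 * 1.21, 2)

# After: one function and one constant. Fewer lines, and the VAT rate
# now lives in a single place instead of being repeated everywhere.
VAT = 1.21

def price(base, months=1):
    return round(base * months * VAT, 2)
```

The lower line count here is a by-product of removing duplication, not a goal in itself.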

A contrary example would be rewarding the least amount of LoC added. That may lead to cryptic code (e.g. very smart regex lines or magic one-liners that take a human brain 10 minutes to process) or to overusing "convention over configuration" practices, which might mean too many hidden black boxes that were likely not covered by tests in your software.
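A toy illustration of that trade-off (both functions and the input format are made up for this example): the two functions below do the same thing, but the first optimises for LoC while the second optimises for the reader.

```python
import re

# Minimal LoC: a "smart" one-liner using a lookbehind assertion.
def extract_ids_cryptic(s):
    return [int(m) for m in re.findall(r"(?<=#)\d+", s)]

# More LoC: each step is named and obvious at a glance.
def extract_ids_readable(text):
    """Return the numbers that follow a '#' marker, e.g. 'fix #12 and #34'."""
    ids = []
    for match in re.finditer(r"#(\d+)", text):
        ids.append(int(match.group(1)))
    return ids
```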

Why did Peter deliver more Story Points (SP) than John if they are in the same team??? (SP delivered per developer)

There is a type of developer to whom everyone in the team comes for advice on how to do things. They usually have one task assigned to them on the Jira board, but in reality they are working on everything. It's even likely they will close only that one task during the sprint while their colleagues close more. A rhetorical question: how many SPs did this developer complete?

That was an extreme example, but in reality, in good teams people go to each other, consult, reiterate, consult again, create a merge request, consult again. And some people take more complex tasks than others during the sprint, which means the team's estimates for those tasks were also more vague.

Why did team Alpha produce 15 bugs and team Gamma only 5??? (Number of bugs produced per developer / team)

This one is one of my favourites. I have heard of companies that paid bonuses to their QA engineers per bug found. As a result, the QA engineers made barter agreements with their dev colleagues: the devs would produce known bugs and they would share the cut. Sounds like a perfect organisation!

So in reality, there are bugs that you know about, there are bugs that your QA will find, there are bugs that your customers will find and there are bugs that nobody will ever find because that piece of software isn’t used.

Which means that the amount of bugs produced will depend on:

  • whether you have a QA team/person or not
  • how often and by how many different people the produced piece of software is used
  • how thoroughly different features in the software are used

If you compare a team that is building a B2C product used by millions against a team whose software is used only once a year (e.g. election software) or is an internal tool, of course you will find more bugs sooner in the first case.

Why is Sonar always green for the frontend team??? (SonarQube results per developer / team)

So Sonar is definitely useful if it is maintained well and its rules are kept up to date with how teams work and with the internal agreements within each team.

But the problem arises if you start comparing teams that use different tech stacks: there are more advanced rules for statically typed languages than for dynamic ones, and even among statically typed languages the number of rules and their strictness may differ. Which means you may end up comparing apples and oranges.

Moreover, things like Sonar are usually quickly forgotten once the deadline suddenly moves closer and you have two weeks to complete things. And if you think deadlines are always missed because of software engineers, please think again.

If Sonar is part of your CI build, critical issues will likely be solved in advance. Then I don't see the point of connecting it to appraisal reviews, or even of monitoring how many issues were created BEFORE the merge request (MR) was merged, because the issues will be solved during it (the build will fail, right?).
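As a sketch of that setup (the project key and server URL are placeholders I made up; `sonar.qualitygate.wait` is the scanner property that makes the analysis step fail when the quality gate fails):

```shell
# Hypothetical CI step. With sonar.qualitygate.wait=true the scanner
# waits for the quality gate result and exits non-zero if it fails,
# so the MR pipeline goes red before anything gets merged.
sonar-scanner \
  -Dsonar.projectKey=my-project \
  -Dsonar.host.url=https://sonar.example.com \
  -Dsonar.qualitygate.wait=true
```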

Why is everyone commenting so much on John's Merge Requests (MRs) while Peter's MRs are merged immediately??? (Number of comments per Merge (Pull) Request per developer)

Yes, it happens quite often that MRs hang for weeks without being merged, with tons of comments from other team members (or even people outside the team) about how bad something is and how things must be refactored.

BUT there are at least three more cases where I'd expect a bigger amount of comments in MRs to be normal (and even needed!):

  • The team is new (they need to discuss on ground rules of coding)
  • There is a new team member (s/he didn’t know about unwritten ground rules of the team)
  • Someone is introducing something controversial (e.g. wants to introduce a new library or a pattern that wasn't used before)

That burn-down chart didn’t break!!! (Scrum burn-down charts)

I feel the hype around agile and Scrum is starting to normalise, and many organisations are settling into a love/hate relationship with it. It's like democracy: there are many flaws, but please point to a better methodology if you know one!

There are many reasons why that line didn't go down; some of them are worth attention, while others may create overhead that doesn't add much value to your organisation:

  • QA got all the stories on the last day of the sprint, so they couldn't complete them on time (a classical case, definitely worth fixing)
  • You have just launched a new software update to production and a lot of bugs began popping up (could you predict there would be bugs? Yes, you could. But could you predict the exact amount? And could you predict how many SP a bug will take to solve, before knowing the details, well enough to remove user stories worth the same amount of SP? I highly doubt it.)
  • External factors: unplanned meetings in the company, a sudden change of requirements, a sudden change in priorities, someone got sick. These things do happen and they might impact the burn-down chart. The teams shouldn't feel bad that the graph doesn't look nice because of this.

Although I do look at it myself and it's a useful metric to glance at, it doesn't give you the full picture of what really happened. That's why I don't believe requiring teams to break that graph every sprint makes much sense: you may end up spending too much time on "scrumming" (devs switching context because the PO/SM wants to change something and creates a meeting) rather than solving real issues.

So what metrics to use???

Probably you have read this enough times: "there are no silver bullets". And that's true. I believe that all metrics are good as long as we look into them in a smart way. Let's quickly think about the most popular metric today: new cases of COVID-19 per day. If the number goes up or down drastically, it does tell you something. But if you look into two different countries (just as you would look into two different teams), you may not know that one country tests all inhabitants randomly whereas the other tests only those who have symptoms. Or one country does twice as many tests per day for the same number of inhabitants as the other (a parallel comparison: one software engineer keeps focusing on bugs, which are often impossible to estimate, while the other keeps baking standard features which are very similar every sprint).

The topic itself is definitely worth another blog post (and I will write it for sure), though to give a quick glimpse: at the time of writing, I tend to look at many of the worst metrics I described in this post. What a plot twist! BUT to me they aren't really important, and I only look at them to see whether anything is going wrong in general or how processes differ between teams.

I’d highlight two things:

  • Try focusing more on the high level, e.g. the satisfaction of your customers or self-assessed productivity. Every engineer is an interface between a particular customer (it might be an external user of your product or another department in the organisation) and the software. Self-assessed productivity might sound subjective, but in reality it may cover many more things than you may have thought about.
  • Focus on having as few "blockers" as possible. I do believe that CircleCI have highlighted very reasonable metrics that do matter, kudos to them: https://www2.circleci.com/rs/485-ZMH-626/images/5-Key-Metrics-Engineering.pdf. Note that every metric they highlight is focused on "what is stopping us" rather than "how many hours did we put in" or "how many bugs did we produce".

Summary

Be very careful when trying to tie engineering metrics to appraisal reviews or to feedback for your employees, because it may lead people to focus on the wrong things. Instead, try focusing on things that really matter and are easy to measure when you work in R&D, such as team productivity and customer satisfaction.
