The Metrics of Code
How to Measure the Performance of a Coding Team?
During the last few years, I managed coders and had to evaluate the performance of my teammates. It always ended with feelings and evaluations of soft skills. Opinions.
In this article, I want to focus only on objective metrics of technical skill and their measurable indicators. Facts rather than opinions. Not because soft skills (team spirit, mentorship, and so on) are useless; they are probably as important as technical skills, but that is not today’s point. Also note that the indicators below are suited to evaluating team performance rather than individuals, though they could probably be adapted.
Production — How Much Do You Produce?
An obvious indicator of how much a team produces is the number of lines of code. A very common unit for this is KLOC (thousands of lines of code). If you are coding in Python on a Linux or macOS system, it can be easily computed with the following command
find . -name "*.py" -print0 | xargs -0 wc -l
In this command, we have decided to count only Python code, thereby excluding HTML, CSS, JSON, and config files of all kinds, which can be very verbose. It is up to you to adapt it and count what is relevant to your project.
If your team uses a BDD (Behavior-Driven Development) framework like behave or pytest-bdd (which is a brilliant idea), then you might want to count the number of feature files rather than lines. That would also be a good motivation for developers to write those tests!
Quality — How Well Do You Produce?
If you read this article I published recently, then you already know I use pylint for linting. When you run pylint, it returns a score out of ten along with comments on how to improve your code. I usually divide the pylint score by 10 to get a number between 0 and 1. It is also possible to change the rules that pylint checks so that the score it returns matches your team’s practices. That is a good indicator of quality. Another very common linting tool is flake8, but I have never used it; perhaps you could use it in the same way.
Reliability — How Confident Are You in Your Code?
In the above-mentioned article, we also talked about coverage tools. Writing unit tests is sometimes painful or tedious for coders, but their main goal is to ensure that, in the long term, changes can be made with a lot of confidence. After you change something in the codebase, simply run the full test suite, and if it is still all green, your change can go live with a high probability of not breaking anything. Then again, it’s easy to measure this indicator using a coverage tool. pytest-cov does this for us Python coders and returns the coverage as a percentage, which divided by 100 gives a number between 0 and 1. Run
pytest . --cov-report term-missing --cov=.
to get the overall coverage of your code by unit tests and see which lines are never run. When it comes to unit tests, 100% coverage is the ideal of course, but covering the main methods is the vital minimum. In the company I work for, merge requests are refused if the coverage rate is under 80%.
Putting It All Together
If you multiply the 3 indicators listed above you get the following overall indicator:
KLOC x Quality Rate x Coverage Rate
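As a minimal sketch (the sample numbers below are made up for illustration):

```python
def overall_indicator(kloc: float, quality_rate: float, coverage_rate: float) -> float:
    """KLOC x quality rate (pylint score / 10) x coverage rate."""
    return kloc * quality_rate * coverage_rate


# e.g. 12 KLOC, a pylint score of 8.5/10 and 76% test coverage
print(round(overall_indicator(12, 0.85, 0.76), 3))  # 7.752
```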
If the quality rate or the coverage rate is very low and close to 0, the overall ranking will be very low, even if you have thousands and thousands of lines. Big projects with no tests and no coding standards rank very badly. That is the worst situation, but it sometimes happens when you inherit an old project. Nightmare. If you have no choice but to keep the project up and running, and need to maintain and improve it, the indicator (and its follow-up over time) may be of some help to check how well you recover. As a manager, you might track this indicator on a regular basis and watch it go up as you act on the last two factors. Most of the time, the pylint rate is the easiest to improve, though it might be very tedious, as pylint gives explanations for each file.
Be careful not to compare projects written in different languages using this ranking. Some languages are much more verbose than others, and this could lead to confusion.
Another piece of advice is to be cautious with big numbers, say more than 50 KLOC, though this depends on team maturity and practices. Even if the code is well written and tested, big repositories are often very hard to maintain in the long term, because it is hard to find your way around them, especially if you are a new joiner who did not participate in the initial writing. Break them up into pieces that call each other.