Implementing Grading System Soup-To-Nuts. Part 1 of 2

Dmitry Derbenev
Devexperts
12 min read · Nov 14, 2022

Sooner or later, every software development company reaches the point where it needs a grading system. Creating and implementing such a system takes a lot of effort and can easily fail at any stage due to various circumstances.

Pic.0 — Grading system in a nutshell

In this article, I’ll shed some light on the process of creating and implementing a grading system from soup to nuts. It should help engineering managers and C-level managers understand the basic pros and cons of such systems and give them a framework for choosing between different approaches. It will also show how hard it is to implement systems of this magnitude, and which risks and obstacles one might face along the way.

From an individual contributor’s perspective, it might be useful for a better understanding of why companies have such systems and how they are created.

The article is split into two parts: in the first, I’ll cover the initial state of things, prepare a basic solution for one competency, and then run a test with it to gather useful insights for further implementation.

Let’s start our journey with research on the current state of things.

Section 1. Initial research

Starting point

It’s crucial to understand the context and current state of things before starting to implement any kind of system. You should know point A to reach point B (wishful state).

This example is no exception. Here is a brief initial state of things.

Devexperts is a fintech company that has been operating for 20+ years. The engineering team consists of 800+ engineers from all over the world, working in remote-first and hybrid modes. There are 6 development competencies, including:

  • Frontend: React, Typescript, GWT
  • Backend: Java, C#
  • Desktop: Java Swing
  • Mobile: Kotlin, Swift, Flutter

The company develops its own products and takes on outsourcing projects, creating software from scratch and improving the existing ones.

Developers were traditionally divided into several grades (intern, junior, middle, senior, lead), but there were no standard evaluation forms, no criteria, and no clear growth path.

However, the company conducted Personal Performance Reviews every 6–12 months, creating a personal development plan based on the results.

The company had been attempting to implement a grading system for at least two years, but the results were unsatisfactory because of roadblocks such as:

  • Constant arguing and lack of shared vision,
  • Too complex systems without proper instructions,
  • Technically deep systems that lack the business side of things,
  • Scope changes before release, etc.

As you can see, the reasons look pretty similar to usual software development projects’ problems. Let’s create a high-level roadmap for this project, at least for part 1.

Pic. 1 — Phase 1 roadmap

Now that we understand the initial state of things, let’s start working on our goals!

Goals

When defining goals, separate them into different categories or points of view. Usually, there is more than 1 stakeholder/user group for each initiative.

Everyone wants to be praised and adequately promoted based on their achievements. Long story short, people in IT are obsessed with growth. But once a company reaches a certain number of developers, it gets harder to provide the same context for everybody. In my experience, 50 engineers is the threshold beyond which a grading system becomes a must-have. However, it’s always about business priorities, as time is a limited resource.

A good grading system should answer the questions and meet the expectations of all stakeholders. The engineers might want to know:

  • Who is the Junior, Middle, Senior, Team Leader, and so forth?
  • What should I do to reach the next level as an engineer?
  • Why have I been a Middle for 10 years already?
  • Why do Helena and Jack from another team have higher grades?
  • What goes after a senior position? Is leading a team the only way to grow?

The second point of view is business. Here are typical expectations:

  • Don’t disrupt the projects’ work: the business should run no worse than usual with the new system.
  • Create a system of adequate evaluation of performance, skills, and value for each developer.
  • Establish a transparent process of evaluation.

Finally, we have team leaders, who are responsible for the performance and growth of their team:

  • How can I spend less time creating a personal development plan for my teammates?
  • What are the standard criteria for developer levels in the company?

Considering these 3 stakeholder groups’ questions and goals, we are ready to start. Let’s see what our colleagues from other companies did.

Solutions on the market

I consider market research a must-have step before building any custom solution from scratch. You might gain a lot of useful insights, and there is always a chance an out-of-the-box solution already exists.

Basically, there are 3 approaches to building a grading system:

1. Hard-skills based. Deep knowledge of a certain technology is crucial. The grading formula looks more or less like this:

Frontend Grade = JS + CSS + TS + React + ….

Mainly used in outstaffing/outsourcing companies, where business success is tightly connected with the ability to pass technical interviews with clients.

This seems more or less valid, given well-written criteria and a sound evaluation process. However, knowledge of a certain technology doesn’t guarantee the results the business requires, and technical growth may be limited by the project itself, which can easily lead to a conflict of interests.

For example, one of the biggest outsourcing companies in the world — EPAM — previously had a mainly hard-skills-based system.

Pic. 2 — GROW platform by EPAM

2. Soft-skills based. Cultural fit matters more than engineering skill or knowledge. The grade is defined by criteria like the following:

Frontend Grade = business impact + communication skills + area of impact + area of accountability + …

This approach makes sense for big product companies, where the number of social connections is very high and the ability to communicate therefore becomes crucial. It could be considered subjective, because soft skills are tricky to evaluate properly. Also, the level of technical knowledge may fall out of scope, which can lead to issues later (it’s engineering, after all). Perhaps the most famous example is the Dropbox career framework. On the one hand, impact is harder to evaluate than technical knowledge; on the other, this approach is better aligned with business goals.

Pic 3 — Dropbox engineer-level example

3. Hybrid. This approach combines soft- and hard-skills parts that are evaluated and calculated together. Implemented well, it brings the advantages of both approaches (or, in the worst case, the drawbacks of both).
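For illustration, a hybrid grade score could be sketched as a weighted blend of hard- and soft-skill averages. All skill names, marks, and weights below are hypothetical, not anyone’s actual formula:

```python
# Hypothetical hybrid grade score: average each skill group (marks 0-1),
# then blend the two averages. Weights and skill names are illustrative.
HARD_WEIGHT = 0.4
SOFT_WEIGHT = 0.6  # a bit more emphasis on the soft-skills side

def hybrid_score(hard_skills: dict, soft_skills: dict) -> float:
    """Blend the average hard-skill and soft-skill marks."""
    hard = sum(hard_skills.values()) / len(hard_skills)
    soft = sum(soft_skills.values()) / len(soft_skills)
    return HARD_WEIGHT * hard + SOFT_WEIGHT * soft

score = hybrid_score(
    {"js": 0.8, "ts": 0.7, "react": 0.9},            # hard-skills part
    {"business_impact": 0.6, "communication": 0.8},  # soft-skills part
)
```

A real system would then map such a score to a level, but a single number is only a rough guide; the hard part is producing honest marks in the first place.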

A good example of such an approach is the CircleCI Competency Matrix.

Pic. 4 — CircleCI has guidelines which are very useful

Let’s choose the basic approach for our system.

The best way to choose the right basic approach is to align it with company goals and needs. As with any other kind of task, there is no silver bullet.

We decided to try a hybrid approach with more soft-skills criteria. Market solutions didn’t seem customizable enough for our needs: they gave a grasp of the overall framework, nothing less, nothing more.

The reason is pretty simple: our products and projects are very complex and require a lot of communication. We do care about technical excellence, but to be fully aligned with business needs and goals, the ability to implement something matters more than bare knowledge.

Let’s finally start the creation of our new shiny grading system!

Section 2. Proof of Concept

Creating a structure

I believe that an incremental approach with several iterations is probably the only way to succeed. There is only one way to eat an elephant: a bite at a time.

With that in mind, we decided to start creating and implementing the grading system within one of the six development competencies: the Frontend Department. Despite the obvious advantages of such an approach, we kept the following drawbacks in mind:

  • After the first iteration, you will have to scale to the other competencies, so proper abstractions for scaling must be in place.
  • You’ll have to reach a consensus with the heads of other departments, as they might have their own vision.

These two aspects require additional time and effort. However, I believe these drawbacks are bearable. I’ll cover common approaches to them in the second part of the article.

What is step 0? I started by defining the number of grades. Given that some positions already existed, and keeping backward compatibility in mind, I decided on 8 levels:

  • intern,
  • junior,
  • 3 levels for middle developers,
  • senior,
  • lead,
  • principal.

We have the levels, but how will they be distinguished from each other?

Pic.5 — Initial scheme

First, let’s define the primary criteria:

  • Tasks’ complexity level. The most important one: it reflects technical skill through practical experience and proof from real-life projects. This factor is heavily aligned with business needs.
  • Independence. How often does a developer need a helping hand from other developers? What level of technical control does their code require?
  • Level of influence on the team. If a person actively mentors, grows, and onboards other team members, this parameter should be high.
  • Level of influence on other teams. This includes various kinds of knowledge sharing with developers from other projects: a talk, sporadic advice, or an open-source contribution.
Pic. 6 — Part of the future presentation

I considered these 4 criteria crucial for defining a grade. However, they don’t give the full picture, so more parameters were added.

Secondary parameters to pay attention to are:

  • Hard skills. They still play a big role in growing as an engineer, no doubt. Plus, it’s always good to have a default path for gaining new knowledge.
  • Soft skills. Software development requires a lot of communication, so this is a key competence.
  • Anti-patterns. In other words, things to avoid. No one wants arrogant senior developers or lazy juniors on the team.
  • Addons. Additional nice-to-have requirements. For example, being a public speaker.
Pic. 7 — One of the first structures

With that structure in place, it makes sense to start filling each grade with content. Let’s check how it was done for the intern level. For now, everything is stored per level; later on, general grading rules will be abstracted and become common to every level.

Pic. 8 — The first description of the first level
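For illustration only, one level’s description under this structure could be modeled like this. Every field name and value here is a hypothetical encoding, not the actual Devexperts intern description:

```python
from dataclasses import dataclass, field

# One possible encoding of a level's description: the four primary
# criteria plus the secondary parameters. All contents are invented.
@dataclass
class GradeDefinition:
    name: str
    task_complexity: str           # primary: complexity of tasks handled
    independence: str              # primary: how much help/control is needed
    team_influence: str            # primary: mentoring, onboarding teammates
    cross_team_influence: str      # primary: knowledge sharing beyond the team
    hard_skills: list = field(default_factory=list)    # secondary
    soft_skills: list = field(default_factory=list)    # secondary
    anti_patterns: list = field(default_factory=list)  # secondary
    addons: list = field(default_factory=list)         # secondary

intern = GradeDefinition(
    name="intern",
    task_complexity="well-defined subtasks under supervision",
    independence="needs regular guidance and code review",
    team_influence="none expected yet",
    cross_team_influence="none expected yet",
    hard_skills=["language basics", "version control"],
)
```

Keeping the primary criteria as required fields and the secondary ones as optional lists mirrors the structure’s intent: the former define the grade, the latter refine it.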

Let’s skip the obvious part about describing each grade and its parts. Are we ready to go?

But wait: upon completing this simple part, we bump into the following questions:

  • What will the grade-definition process look like?
  • Who should evaluate a grade?
  • What are the thresholds?
  • Which units should be used?
  • What will the results look like?

And this is not the full list of questions to answer. At this point we have a fully described system, but no evaluation rules or final documents. Let’s create them and start our beta testing.

Basic principles

The ground rules of assessment cover the following questions (with our answers for each):

  • Is a downgrade possible?

No, it’s not. The rules have no retroactive power, plus it makes no sense to humiliate a person with a downgrade.

  • Is it possible to give a grade in advance?

No, it’s not: to be promoted to a level, you should already be performing at it.

  • How to calculate grades based on evaluations?

There is no magic checkbox, field, or formula, but rather common guidance with minimal thresholds.

  • How can facts be collected and validated to evaluate performance?

A wide range of sources could and should be used to collect and validate facts on performance: 360-degree feedback, metrics from different tools, training certificates, feedback from relevant colleagues, etc.

  • Who takes part in the evaluation process?

Both the developer and the team leader fill out the same spreadsheet without seeing each other’s results, then compare them and come to a single conclusion.

  • What will happen if a developer does not agree with the final outcome?

Then they can escalate the issue to a grading committee, whose decision will be final.
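The “common guidance with minimal thresholds” idea can be sketched in a few lines. The criteria names and threshold numbers below are invented for illustration; the real guidance was prose, not code:

```python
# Hypothetical minimal-thresholds guidance: a grade is reachable only
# when every criterion (scored 0-1) meets that grade's minimum.
THRESHOLDS = {  # ordered from lower to higher grades
    "junior": {"task_complexity": 0.3, "independence": 0.2},
    "middle": {"task_complexity": 0.5, "independence": 0.5},
    "senior": {"task_complexity": 0.8, "independence": 0.8},
}

def highest_grade(scores: dict) -> str:
    """Walk the grades upward and stop at the first unmet threshold."""
    result = "intern"  # the default starting level
    for grade, minimums in THRESHOLDS.items():
        if all(scores.get(crit, 0.0) >= m for crit, m in minimums.items()):
            result = grade
        else:
            break
    return result
```

Note that, as stated above, this is guidance rather than a magic formula: the thresholds bound the discussion, but the final call is made by people.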

Let’s define an evaluation and decision-making process.

We’ve tried 2 approaches:

  1. The team leader evaluates their teammate on their own, then schedules a call to make adjustments if needed. This allows a gradual introduction to the system but skips the self-evaluation step.
  2. Both the team leader and the developer fill out the form separately, then merge the results. This takes more time but gives everyone a better understanding.

Either way, the final decision about the grade is the team leader’s responsibility.

The only missing part now is the evaluation form, so let’s fill that gap. Which marks will we use? For the first try, I decided to use binary 0/1 marks for everything except main responsibilities, which were evaluated on a 0–100% scale.

Pic. 9 — Initial evaluation form
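A minimal sketch of that marking scheme, with made-up item names: a 0–100% scale for main responsibilities and binary 0/1 marks for everything else, plus a check that every mark stays within its scale:

```python
# Sketch of the first form's marking scheme. Item names are hypothetical.
form = {
    "main_responsibilities": {            # evaluated on a 0-100% scale
        "delivers mid-size features": 80,
        "covers code with tests": 60,
    },
    "hard_skills": {                      # binary 0/1 marks
        "knows the framework internals": 1,
        "can set up a CI pipeline": 0,
    },
}

def validate(form: dict) -> bool:
    """Reject marks that fall outside their scale."""
    pct_ok = all(0 <= v <= 100 for v in form["main_responsibilities"].values())
    bin_ok = all(v in (0, 1) for v in form["hard_skills"].values())
    return pct_ok and bin_ok
```

As the beta-testing results below show, both scales turned out to be problematic in practice, each in its own way.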

Finally, we are fully ready to test the grading system in real life!

And here comes the tricky part about such systems:

The hardest part is not creating, but implementing the grading system and making it work.

Implementation requires a lot of communication and context-sharing with people who are already heavily loaded with tasks. However, with the right attitude and preparation, it becomes just another routine task.

Beta testing

Here is one small but very important formula that people often forget:

Implementing Effectiveness = Team’s engagement x Solution Quality

where each factor ranges from 0 to 1 (0–100%).

People care so much about quality that they tend to neglect engagement. Don’t do that! A perfect system with zero engagement from your team multiplies out to zero very quickly.
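The formula is trivial to express in code, which makes the multiplicative effect easy to see; the function name is just my own label for it:

```python
# The formula above, literally: both factors live in [0, 1], so a zero
# in either one zeroes the product no matter how good the other is.
def implementing_effectiveness(engagement: float, quality: float) -> float:
    assert 0.0 <= engagement <= 1.0 and 0.0 <= quality <= 1.0
    return engagement * quality

perfect_but_ignored = implementing_effectiveness(0.0, 1.0)  # 0.0
decent_and_adopted = implementing_effectiveness(0.7, 0.8)   # ~0.56
```

A mediocre system the team actually uses beats a perfect one nobody engages with, which is exactly the point of the paragraph above.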

So, the ideal scenario is to involve your team in the creation of this system sooner rather than later. Or, at least present it and ask for feedback. Also, the smaller your sandbox is, the shorter the feedback loops are.

Pic.10 — Initial presentation slides

This is my ultimate beta-testing checklist:

  • Synchronize with people who will do the grading. A high-level presentation along with a thorough explanation during 1-on-1s should work.
  • Have a dry run. Ask to perform a shadow-grading: evaluate someone you know pretty well. Check the results and discuss them with a team leader to find flaws in the decision-making process.
  • Have the first 2–3 grading test sessions with team leaders and their developers. It might be awkward for someone, so make sure that you communicate goals properly, and set up a dry-run context.
  • Check the feedback results. Proper feedback gathering is crucial. The best way to do it is to combine 2 approaches: a personal interview and an anonymous survey.

I’m sure that you will find the results interesting and surprising. So, be prepared to face the truth even if it’s not very pleasant.

The results of our beta-testing in the Frontend Department turned out to be the following:

  • The overall concept works.
  • There were no clear criteria for defining the grade; team leaders struggled to identify the proper one.
  • It’s hard to estimate anything in %. The human brain handles roughly 7±2 distinct marks well, but a 0–100 scale has far too many shades of grey.
  • It’s hard to give a 0/1 evaluation of hard or soft skills. This is the opposite of the previous point: too little choice makes the evaluation even harder.

The results were good enough to proceed with implementation for the whole company.

Results & Part 2 Overview

As the outcome of the first part, we have the following:

  • Grading system with described levels,
  • Evaluation process rules,
  • Beta-tests within the Frontend Department,
  • Feedback from testers: lack of criteria definitions, issues with %, and binary evaluations.

In the second part of the article, we will see how to gather a team across different competencies, why deleting might be more useful than adding, how to properly split the middle level, and how to do a rollout across the whole engineering department.

Stay tuned!
