
How we evaluate each other’s performance

Manuel Küblböck
The Caring Network Company
Jul 19, 2021 · 14 min read


Out of all the critical core components of self-organized companies, we found performance evaluation to be the most complicated one, simply because of the large number of aspects we needed to make decisions on. On top of that, it is also the most touchy component, because it determines people’s compensation and with that their livelihood. And as if all of that wasn’t enough, it has one of the most heated concepts of all at its very center: fairness. Ahh, what a beautifully complicated, touchy, heated challenge. Let’s dive in, shall we?

The first part of the post goes through our assumptions that underpin our implementation before the second part walks you through our current implementation. The third and final part lists the aspects and options we considered to arrive at our implementation.

Spoiler: In a nutshell, we settled on a process that collects evaluations from a group of peers, as well as self-assessments, which then go into a council where we decide if a performance increase has taken place.

Part 1: Assumptions that underpin our implementation

We evaluate people based on their results and behavior, not based on the time they spend in the office or the time they have been with the company.

We aim for both procedural justice (the process needs to be just) and distributive justice (everyone needs to be treated justly).

Avoid demotivation

According to Deming, ranking people against each other to determine compensation is demotivating. Who are we to disagree with Deming? That would be blasphemy (and probably foolish).

Comparison with others leads to envy. We minimize envy by comparing ourselves to our past selves instead: the comparison probably favors our present self, and our past self doesn’t care.

This is why we keep comparison with others out of the evaluations and only use it within the council.

No fixed path

We don’t have fixed career ladders to follow. We think they are too limiting and prevent us from reaching our full potential. However, without them, our sense of future possibilities is often vague. We can kind of feel the future, but we cannot see it and specify precisely what it is. Our leads and peers help us uncover our path as we walk it. We may walk down a path others have taken before us. But at any stage, we are free to take a turn and walk through the field where no-one has set foot before. See this post on stewarding about how leads can support this uncovering of the path.

The description of valued behavior in our competency model is generic (not role-specific) and at the behavior level, so as not to limit the ways people can contribute.

Compensation is a hygiene factor

According to Frederick Herzberg, there are separate sets of factors that cause job satisfaction and dissatisfaction. There are motivators (e.g. challenging work, recognition for one’s achievements, responsibility, the opportunity to do something meaningful, involvement in decision making, a sense of importance to the organization) that give positive satisfaction, arising from intrinsic conditions of the job itself.

And then there are hygiene factors (e.g. status, job security, salary, fringe benefits, work conditions, paid insurance, vacations) that do not give positive satisfaction or lead to higher motivation, though dissatisfaction results from their absence.

We consider compensation a hygiene factor, so we minimize the amount of effort we spend on it. The compensation we offer is competitive so people are not distracted or tempted by the money they could make elsewhere. However, the motivation to show up every morning and to give it our best comes from aligning on goals, a sense of belonging in our culture, and our striving for individual growth.

Separating performance evaluation from advice and appreciation

According to Roger Fisher and Alan Sharp, there are at least three different kinds of feedback:

  • Appreciation is the expression of gratitude or approval of someone’s effort. It is an expression of emotion, designed to meet an emotional need.
  • Advice consists of suggestions about a particular behavior that should be repeated or changed. It focuses on the performance, rather than judging the person.
  • Evaluation is ranking someone’s performance in relation to their past self, to someone else, or to an explicit or implicit set of standards.

The emotional impact of being evaluated tends to drown out the advice on improving performance. Of the three, evaluation is the least likely to be helpful and the most likely to distract from the other two purposes. So it’s best to separate evaluation from advice and appreciation.

Growth feedback (appreciation and advice) is each employee’s responsibility: selecting the right feedback givers and acting on the advice. Performance evaluation is the company’s responsibility: carrying it out, ensuring the right evaluators are selected, and acting on the evaluation.

Consistency

Growth feedback and performance evaluation need to be coherent in their results to avoid confusion and frustration. They are connected via the competency model and the group of people that provides the feedback. No-one should be surprised by the outcome of either based on the result of the other.

To avoid big differences in outcome compared to the performance evaluation process, a similar process for determining someone’s level on our competency model needs to be used during hiring to set the initial compensation.

Our focus on relative comparison to our past selves assumes we have a fair (enough) compensation distribution to start with.

Part 2: Our current implementation

Alright, now that you have an idea of where we are coming from, let’s get to what this looks like in our concrete implementation of evaluating each other’s performance.

Picking evaluators

Each evaluatee picks their own evaluators after seeking advice from their lead. This is often done asynchronously in a brief exchange on our chat tool. We ask everyone to pick between 5 and 8 evaluators with a variety of perspectives to ensure a comprehensive evaluation.

We suggest keeping the group of evaluators as consistent as reasonable with the group of feedback givers in our growth feedback process, which we run twice a year: once before and once after the performance evaluation. Keeping the group consistent minimizes the chance of getting very different results, especially being disappointed by your performance evaluation (which is mostly quantitative) after having heard only affirmative feedback in your growth feedback (which is mostly qualitative).

Questionnaire

We collect evaluations via a survey that we send out to all evaluators. The survey contains four fields.

  1. Relative comparison with our past selves
    Results development: In your perception, how much did the RESULTS of your colleague increase during the last year?
    not at all — barely — a little — noticeably — clearly — very clearly — exceptionally
  2. Relative comparison with our past selves
    Behavior development: Given our competency model, how much did the BEHAVIOR of your colleague improve during the last year?
    not at all — barely — a little — noticeably — clearly — very clearly — exceptionally
  3. Absolute comparison based on behavior levels
    Competency level:
    Now please take our competency model and decide which LEVEL currently best reflects your colleague.
    1–9
  4. Achievements
    Qualitative feedback:
    Please support your evaluation with specific EXAMPLES of achievements or situations.
    200 characters (1000 characters for self-evaluation)
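
For illustration, here is a minimal sketch of how one completed questionnaire could be represented so the verbal scales can be averaged later. The field names and the 0–6 numeric mapping are assumptions for this sketch, not a spec of our actual tooling.

```python
from dataclasses import dataclass

# The 7-point verbal scale from questions 1 and 2, mapped to 0-6.
SCALE = ["not at all", "barely", "a little", "noticeably",
         "clearly", "very clearly", "exceptionally"]

@dataclass
class Evaluation:
    evaluator: str
    results_development: str   # one of SCALE
    behavior_development: str  # one of SCALE
    competency_level: int      # 1-9
    examples: str              # free text, max 200 chars (1000 for self-evaluation)

    def results_score(self) -> int:
        """Encode the verbal answer as a number so it can be averaged."""
        return SCALE.index(self.results_development)

    def behavior_score(self) -> int:
        return SCALE.index(self.behavior_development)
```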

In previous iterations of this process, we used 27 questions to evaluate each other in more detail based on our competency model. We learned through a statistical analysis of the evaluations that 27 questions did not increase evaluation quality compared to a much less complex model. Answering patterns heavily correlated with the evaluator’s general impression of the evaluatee.

The time and effort required of each evaluator made this part of the process feel more substantial than it actually was for the final decision. The little extra information gained by asking more questions didn’t justify the extra time and effort the process required.

So we condensed it down to these four questions. As an added benefit, they are far less time-consuming to answer.
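
One way such a redundancy check could look, assuming the 27 answers per questionnaire are exported as numbers (the file name and encoding are made up for this sketch):

```python
import numpy as np

# Hypothetical export: one row per completed questionnaire, one column per
# question, answers encoded numerically. File name and format are assumptions.
answers = np.loadtxt("evaluations.csv", delimiter=",")  # shape: (n, 27)

# If the pairwise correlations between questions are uniformly high, the 27
# questions mostly measure one underlying factor (the evaluator's general
# impression) and add little information over a single score.
corr = np.corrcoef(answers, rowvar=False)
off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
print(f"mean inter-question correlation: {off_diagonal.mean():.2f}")
```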

Competency model

Our competency model describes the desired behavior we wish for from each other. It is the basis we evaluate each other on. It consists of defining behaviors on different levels for each of our 4 company values. As an example, here is how we describe our value “We focus on what makes sense” with the behaviors of being goal-oriented and quality-oriented.

As a rule of thumb, levels 1–3 are about leading yourself, 4–6 are about leading in your team, and 7–9 are about leading on a company level and beyond.

The more we are aligned on our competency model and what each level means, the more accurately it can serve us in evaluating each other fairly. Other than describing each level as best as we can, we haven’t found a great solution yet for calibrating our interpretations.

Compensation curves

Each of our chapters has a compensation curve that maps a level (1–9) to a monetary compensation value. We determine these curves by using salary data from online sources, salary wishes from candidates in our recruiting process, and most importantly by comparing data with befriended companies.

Compensation increases don’t happen only at whole-level jumps but from any change in level, including changes after the decimal point. While we only use the whole levels 1–9 for evaluation to keep it simple, we use one decimal place during the council and for inferring the compensation. Someone might be at level 4.2, for instance.

The curves generally don’t grow linearly, which reflects our assumption that the distribution of people’s impact isn’t linear but follows a power-law distribution.
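
As a minimal sketch, a curve of this kind could look as follows. The shape and all numbers are illustrative assumptions, not our actual curve:

```python
def compensation(level: float) -> float:
    """Map a (possibly fractional) competency level 1-9 to a yearly salary."""
    base, scale, power = 30_000, 2_000, 2.0  # made-up parameters
    return base + scale * level ** power     # superlinear, power-law-like growth

print(round(compensation(4.0)))  # 62000
print(round(compensation(4.2)))  # 65280; raises also happen after the decimal point
```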

We review these curves each year before the council and adjust them if we believe they are no longer competitive in the market. When we do, we adjust the entire curve, which may mean making it flatter or steeper. There is no built-in standard increase each year; instead, we look at whether the market justifies a change. Typically, we adjust 3–5 curves each year.

When we adjust a curve, everyone’s compensation in the respective chapter gets adjusted accordingly on top of potential increases through the performance evaluations. The only exceptions are chapter members who do not fulfill expectations for their current level. We never decrease salaries, but through this mechanism, a level decrease with an unchanged salary can occur.
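
Sketched in the same illustrative terms as the curve above, the adjustment rule could look like this (the curve shape and the expectation check are assumptions):

```python
def adjusted_salary(current_salary: float, level: float,
                    new_curve, meets_expectations: bool) -> float:
    """Apply a newly adjusted chapter curve to one person's salary."""
    if meets_expectations:
        # Salaries follow the adjusted curve, but are never decreased.
        return max(current_salary, new_curve(level))
    # Below expectations: the salary stays unchanged, so the level implied
    # by the new curve can effectively decrease.
    return current_salary
```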

So far, we haven’t made the compensation curves transparent. We are very careful about avoiding envy through comparison, in this case between chapters, knowing that transparency is a step we can’t reverse.

We have heard a lot of voices asking for transparency on the curves, though. And in terms of knowing your own curve as an outlook on a possible compensation progression, it seems like a very reasonable request. In the next iteration, we will most likely make each curve transparent to its respective chapter members, but not to others. We also plan on giving more insight into how we derive the curves to increase perceived procedural justice.

Spreadsheets

Over the course of the last few years, we developed and fine-tuned two spreadsheets that automate as much of the process as possible.

The first spreadsheet and some accompanying scripts take care of

  • Collecting all evaluations from the survey
  • Calculating averages
  • Sending out emails, and
  • Creating reports for each evaluatee summarizing the results
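
A minimal sketch of the aggregation step, reusing the hypothetical Evaluation class from the questionnaire sketch above: each quantitative field is averaged across evaluators to produce the numbers in an evaluatee’s report.

```python
def summarize(evaluations: list["Evaluation"]) -> dict[str, float]:
    """Average each quantitative field across all evaluators."""
    n = len(evaluations)
    return {
        "results_development": sum(e.results_score() for e in evaluations) / n,
        "behavior_development": sum(e.behavior_score() for e in evaluations) / n,
        "competency_level": sum(e.competency_level for e in evaluations) / n,
    }
```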

The second spreadsheet is used during the council sessions and takes care of

  • Laying out the evaluation data so it can be easily understood,
  • Giving a recommendation for warrant and degree of a level increase based on the evaluations (together with a respective amount of compensation increase),
  • Showing us if we are within our allocated budget, and
  • Allowing us to adjust compensation raises to stay within budget
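
In the council, the adjustment happens in discussion, but a purely mechanical fallback illustrates the budget constraint. This sketch scales proposed raises down proportionally; proportional scaling is our assumption here, not a description of how the council actually decides.

```python
def fit_to_budget(proposed_raises: dict[str, float], budget: float) -> dict[str, float]:
    """Scale proposed raises down proportionally if they exceed the budget."""
    total = sum(proposed_raises.values())
    if total <= budget:
        return proposed_raises
    factor = budget / total
    return {name: amount * factor for name, amount in proposed_raises.items()}

print(fit_to_budget({"A": 3_000, "B": 5_000}, budget=6_000))
# {'A': 2250.0, 'B': 3750.0}
```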

Council

We cluster the evaluations discussed in the council by chapter. This allows us to look at all chapter members in relation to each other; this is where the relative comparison between people comes in that we keep out of the evaluation part. Chapter leads receive preliminary performance reports for each of their chapter members prior to the council so they can form a personal opinion, and they lead the discussion about their chapter in the council.

The council discusses each evaluatee and decides on an appropriate level based on the information and recommendations in the council spreadsheet. Using the chapter curves, this level translates into the resulting new salary. The chapter lead makes sure they understand the reasoning behind the decision so they can explain it to the evaluatee.

We use a schedule for the council days so the chapter leads know ahead of time when it is their turn. We ask them to monitor our chat tool so we can inform them in case we are running slightly ahead of or behind schedule. We reserve seven minutes to discuss each evaluatee. This short amount of time works for us because of the council spreadsheet described above and because we expect the chapter leads to come to the council prepared.

Delivery

Each individual gets their report in a 1:1 session with their lead where they walk through the report together. The lead explains the results and conveys how the decision was reached in the council. The evaluatee has space to raise clarifying questions as well as emotions that come up.

Some people will not be happy with the outcome. That is inevitable. Some of them will ask for a reconsideration of the council decision, usually along with a monetary expectation. We distinguish between two scenarios:

  1. There is information the council didn’t consider when deciding on this person’s new level, and it is likely that they would have come to a different decision with this piece of information. For these cases, we hold a small council after the delivery sessions have taken place.
  2. There is no indication that the level decision is incorrect, but the person is unhappy with their compensation or with the size of the increase. In these cases, we stay firm. If we didn’t, we would have just built a very elaborate process that good negotiators can still bypass.

Part 3: Aspects and options within them

Choices. Choices. Choices. This section lists the options we considered for each aspect, with a brief explanation of why we chose each one or not. The options we use in our implementation are highlighted in bold font. There are more options for each aspect; the ones listed are the ones we found worth discussing.

Evaluation — what is evaluated

Relative comparison

  • with others: Deming says: Don’t do it. We agree, so we refrain from it during the evaluation by peers. HOWEVER, we find it useful within the council to increase relative fairness within chapters.
  • with past self: focuses on each individual’s development

Absolute comparison with a description of the desired state

  • Behavior levels: An individual’s level is transparent for the evaluatee, their lead, and the council, but not for the evaluators. However, level buckets (1–2, 3–4, 5–7, 8–9) correlate with the seniority prefixes in our job titles.

Type

  • Quantitative scale: easy to infer salary from
  • Qualitative: in very limited form in a free text field (200 char for others, 1000 char for self), no provided list because that would limit the ways to contribute, more substantially happening in growth feedback

Channel

  • Questionnaire: efficient, reduces the hurdle to be candid
  • In-person: inefficient because of too many overlapping peer groups

Evaluatee — who is evaluated

  • Selective pull: doesn’t catch low-performers who keep quiet, puts shy people at a disadvantage
  • Everyone who is a permanent employee after their first 9 months with a non-terminated contract: feedback is essential for growth

Evaluator — who evaluates

  • Centralized (e.g. leads): doesn’t match our org structure
  • Decentralized (e.g. peers): in line with our org structure
  • Self: valuable self-reflection

Compensation inference — how to get from performance to compensation

  • Negotiation: favors good negotiators
  • Salary formula: high complexity, no human judgment
  • Service provider for compensation and benefits with a catalog: high complexity, expensive
  • Compensation curves: reflects market compensation data
  • Council: sanity check for evaluations, uses peer evaluations, self-evaluations, company health, and market compensation data

Progression — how to advance

  • Levels in competency model: provides clarity on valued behavior
  • Undetermined skill development path: freedom (and uncertainty) with lead and peers as guides
  • Career ladder: limits options but also gives guidance, we hope to cover the guidance with leads and peers

Anonymity — who sees the evaluations

  • Transparent evaluations: adds social aspects to evaluations and reduces the likelihood of candid feedback
  • Anonymous evaluations, transparent group of evaluators: enough anonymity to avoid downsides of transparent evaluations; delivery by lead avoids the frustration of not being able to have a dialogue about the feedback
  • Transparent evaluations and evaluators for the council: helps to interpret the evaluations, but should already be reflected in peer group constellation, adds bias of council of evaluators

Frequency — how often we evaluate

  • Yearly: minimum effort
  • Additionally self-triggered: keeps employees in the driver’s seat, the council meets once a quarter

Delivery — how we communicate the results

  • Performance report with averages: clarity
  • 1:1 with lead: in-person delivery avoids misinterpretation of the results
  • Send performance report to evaluatees directly: leaves evaluatees alone with the interpretation of their report and has a high likelihood of misinterpretation and frustration
  • Performance evaluation and inferred compensation are delivered together

Transparency — who sees the results

  • evaluatee: helpful as feedback
  • lead: necessary for delivery
  • council: necessary for level decision
  • evaluators: more comparison with others than necessary
  • chapter: more comparison with others than necessary
  • whole company: more comparison with others than necessary

Does it have to be this complicated?

I started this post by claiming that performance evaluation is the most complicated component out of all the critical core components. I guess by now, you agree. The obvious question to ask is: Does it have to be this complicated? Why not hand the responsibility of performance evaluation to one person per colleague, like the chapter lead role?

The main reason is that this wouldn’t fit a cross-functional team setup. The chapter lead would complain: “I can’t really fully judge the daily performance of my chapter members because I don’t work with them on a daily basis.”

Fine. Let’s give the responsibility for performance evaluation to someone who does, like the product owner role of a team. The product owner would complain: “I can’t really fully judge the functional performance of my squad members because I can’t compare them to their chapter colleagues that work in other squads. Besides, I don’t have an in-depth understanding of what they do.”

So, yes, it has to be this complicated if we want the benefits of cross-functional teams. They necessitate decentralized feedback formats, including performance evaluation. It’s one of the trade-offs of cross-functional teams: they make feedback more complicated.

But at least I will receive lots of appreciation, right?

Once you’ve worked this out for your context, give yourself a big pat on the back. This is a massive feat. Don’t expect anyone else to get excited about it, though. Performance evaluation is a hygiene factor, after all. And just because you put a lot of thought into procedural and distributive justice doesn’t mean everyone will experience the process and outcome as fair. Fairness is highly subjective.

Peer evaluation is the worst form of salary setting, except for all the others.

Churchill didn’t say this. But I believe he would have, had he been tasked with finding a fair way to set salaries within a company.

I can’t publish this post without acknowledging four of my colleagues who were essential in the evolution of this process. 🙏 Thank you Neele, Sabine, Silke, and Wiebke.

This, together with all the other concepts on this blog, is bundled up with 88 visualizations, 37 videos, and 11 templates in my New Work by Design Transformation course, helping you put New Work into practice for less than the price of a consulting day.
