Why Performance Reviews are BS

12 min read · Feb 14, 2024

“He is a jerk! A total jerk!”

The head of a large man named Vic fills the screen on the video conference, and my friend Taylor is on the brink of a total breakdown. The “total jerk” in question is Alex, and Taylor is his manager. Vic and Taylor each manage their own teams of ten people, and only one promotion is available across all twenty of them. Taylor believes the spot should go to Alex, but Vic clearly thinks Alex is a total jerk. At least that’s what he needs to convince the calibration committee of, so the spot can go to the person he wants for the promotion, Brian. The bigger the commotion Vic makes about Alex being not only bad at his job but also bad as a person, the more likely he is to win this fight.

Trapped in a stuffy conference room alongside eight other managers, Taylor and Vic continue arguing for twenty-five minutes. Nobody can leave until a decision is reached.

Welcome to calibrations.

Calibration meetings happen twice a year in conference rooms across Google and Facebook. Just as God decides the fate of Jews everywhere each Rosh Hashanah while we Jews pray for a good verdict, calibrations are the altar on which employee worth is judged and fates are sealed.

In calibration meetings, promotions and performance scores are decided by a group of managers for a given team. While your direct manager attends, the others in the room usually have little to no clue who you are, let alone the quality of your work. Some of you may be wondering why employees are graded by a committee like this. Shouldn’t your manager just give you a grade based on how well you did?

Please, come take a seat and let me tell you a little story about what really goes on in the world’s most progressive tech companies.

The reason calibration meetings exist, and a group of managers have to fight for scores, is because those scores must fit a specific distribution. A certain number of employees must end up in the top, middle, and lower buckets of the distribution. In a given department, only so many people can be good, a certain number must be bad, and most are required to fall somewhere in the middle.

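Generally, it looks something like this:

[Figure: a forced distribution of performance scores, with most employees in the middle bucket and fewer in the top and lower buckets]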

Fights ensue not just over who gets the scarce promotional spot, but also over how each person on the team gets scored. If too many people are in the “good” bucket, managers have to collectively decide who will get pushed down to “fairly good.” In these tense moments, when an employee’s fate is being decided, his or her competence, talent, and effort hardly matter. The better scores go to the manager who’s better at getting his or her way. Vic won in the fight against Taylor that day, and Brian got promoted, not because Brian deserved it more than Alex, but because Vic was so aggressive and intolerable that he wore everyone down until they gave in.

The story of Vic and Taylor serves as an example of how performance is rated in the absence of objective measures, like grades. The process has little to do with the merits of your work; so many unrelated factors go into the decision. For example, if you have a manager who doesn’t represent your work well or doesn’t fight hard enough for you in the calibration meeting, his or her failure becomes the biggest influence on your promotion.

An important clarification: grading on a curve in school is different from the forced distribution described above. In school, teachers make the curve after the students take the test — for example, if the highest grade is a 90, then they’ll start the curve there and retrofit the other grades. At Google and Facebook, the distribution is set before any work is done, and performance is then forced to fit into it. The only thing these distributions have in common is the word used to describe them. In school, curves are used as a tool for fairness; if the test was too hard, and the highest score was an 85, it makes some logical sense to set that as the benchmark for an A grade. In contrast, a forced distribution is completely arbitrary. If everyone scores 100 percent, the forced distribution means you have to decide who gets the As and who gets the Bs. Even if everyone scored perfectly, theoretically, managers would still need to decide who fails. It’s just how the distribution works.
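To make the contrast concrete, here is a minimal Python sketch of the two approaches. It is an illustration only, not how any company actually computes ratings: the function names, the bucket labels, the quota sizes, and the team members other than Alex and Brian are all assumptions made up for the example.

```python
def curve_grades(scores, top_grade=100):
    """School-style curve: set after the test, anchored to the best score.

    Every score is shifted by the same amount so the highest one
    lands on top_grade. Nobody is pushed down to make room.
    """
    bump = top_grade - max(scores.values())
    return {name: score + bump for name, score in scores.items()}


def forced_distribution(scores, quotas):
    """Forced distribution: bucket sizes are fixed before any work is done.

    quotas is a list of (label, head_count) pairs, filled from the top
    down. Ties are broken by input order, which is exactly as arbitrary
    as it sounds (sorted() is stable, even with reverse=True).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    buckets = {}
    start = 0
    for label, head_count in quotas:
        for name in ranked[start:start + head_count]:
            buckets[name] = label
        start += head_count
    return buckets


# Hypothetical team in which everyone did perfect work.
team = {"Alex": 100, "Brian": 100, "Casey": 100, "Dana": 100, "Eli": 100}

# Hypothetical quotas: one top slot, three middle slots, one low slot.
quotas = [("top", 1), ("middle", 3), ("low", 1)]

print(curve_grades(team))
print(forced_distribution(team, quotas))
```

In this sketch the curve leaves all five people at 100, while the forced distribution still puts someone in the low bucket. Here that person is Eli, for no better reason than being listed last.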

Calibrations are where people get slotted into the distribution, but it’s important to understand that managers show up to the meetings with people’s scores already in mind. Their ability to fight for the scores is just one piece of the puzzle. How do they come up with them in the first place?

Big companies have lots of people around, doing lots of different things, making it hard to draw clear lines between the work people do and its impact on the business. High performance is hard to detect, and low performance is easy to cover up. To explain how this environment affects performance ratings and determines the winners, let’s do a thought experiment.

For the NBA draft, basketball players are ranked by experts who use data on their past performance and observations from the court. The data is concrete, objective, and neatly benchmarked; players can be compared with each other on an apples-to-apples basis. The observations are direct; the experts can watch players in action.

Now imagine a scenario in which the panel of experts was not allowed to observe the players on the court and had very little performance data. What they did have was essentially worthless because it was inconsistent across players; some had points per game, while others had numbers of assists.

To decide the rankings, each player stands before the panel, describes his effort, and summarizes his performance. If players want to be ranked highly, they have to make a case for it, demonstrating why they’re the most talented and deserve to be at the top.

If we consider school to be loosely analogous to the first scenario and the corporate world to the second, we can ask a few helpful questions. Which system would be better at surfacing talent? In the second scenario, which basketball players would we assume get the high rankings? If the experts couldn’t identify actual talent, what would be used as its proxy? If we had to advise the basketball players on winning a top spot, wouldn’t we tell them to argue their case for being number one, even if they weren’t technically that good? Wouldn’t it help if they exaggerated their talent and made the other players seem inferior by comparison? Shouldn’t they seek out the judges as often as they can, kiss their asses a little, and make sure they’re always standing in front of the crowd, where they can be seen and heard?

A real-world example of this can be seen in the realm of orchestras, which over the past several decades have changed the way they hire musicians. In the late 1970s, women made up fewer than 5 percent of the United States’ top orchestra musicians. By 1997 that number had increased to 25 percent, and today some orchestras are above 30 percent. Considering that women still make up only around 4 percent of CEOs after decades of effort, the leap in female orchestra musicians appears remarkable.

So how did orchestras do it? They put up a screen, so the people evaluating the performance couldn’t see the performer. Their ratings were based purely on the music produced instead of the person producing it. As with grades, when performance is judged objectively, women fare remarkably better.

In school, good grades are based on competence and effort. At work, success is predicated on acting competent and making a big show of your effort. It’s no coincidence that self-aggrandizement, aggression, and bravado, behaviors correlated more highly with men, are rewarded more often.

Clarifying the problem in this way makes it a lot easier to see the solutions. Are we better off trying to change women into men, or should we try to define success more clearly and objectively?

Although the latter is obviously more practical, it’s a highly unlikely solution, not because it wouldn’t work (it probably would), but because having measurable targets makes managers accountable for hitting them. Managers don’t like that.

On the marketing team at Facebook, we were responsible for writing the sales pitches for our advertising products. Writing stories that sell is really hard. Like most people, my teammates sucked at it. But it’s also a critical part of selling, so the sales teams were frustrated that marketing kept handing them shitty PowerPoints that were boring and full of jargon, and that made their job harder. When Robert, the head of US sales, began complaining loudly about it, the marketing executives finally decided that something had to be done.

Creating an entirely new library of stories and pitch decks became priority number one, and Sonia, my direct boss, who also was new to the company, was chosen to head up the highly visible and politically fraught project.

Over the next month, in sales meetings across the company, Sonia publicly committed to a solution. She promised the salespeople that the marketing team would deliver a brand-new library of compelling stories and pitch decks by the end of the year. It was January at this point. She had fewer than twelve months to save the marketing team’s reputation, not to mention her own.

This kind of storytelling was exactly what I did in my final two years at Google. I wrote pitch decks for YouTube, Facebook’s biggest competitor, and I traveled around the United States and Europe, teaching our salespeople how to deliver the pitches. While storytelling wasn’t a strength for the majority of people on our team, it was the one thing I knew how to do well, and I had a successful track record of delivering effective pitch decks to the sales teams at Google. I felt uniquely suited to help Sonia write new ones and build out the library.

Believing this was a win-win for both of us, I approached Sonia and explained how I could help with the project. I laid out a plan that would allow me to create the story library while keeping up with my day job. And on top of that, I’d be able to have the first draft of stories done in a month and the final versions in just eight weeks.

Sonia nodded along and thanked me for the suggestion, but said she needed some time to think about it.

I learned of her decision a few weeks later, when she presented her plan at the national sales meeting. Sonia had hired a team of consultants from Accenture, who would rewrite all of the sales stories and build a better archive. They’d have everything done by December 31. It was now February. The salespeople seemed pleased.

Later on, Sonia shared the finer details of the plan with a smaller group of us in marketing. Of the eight consultants we hired, only one had ever worked in digital advertising. They knew virtually nothing about our products, our market, our competition, or our salespeople’s needs. Over the coming year, we’d act as their stewards in the process, while they wrote the stories and designed the slides. It would be nine months before sales could see a first draft. The total cost of the project? A quarter of a million dollars.

As far as I know, Sonia never mentioned my original proposal, and I didn’t ask. We never spoke of it again.

Although having me write the stories would have saved Facebook $250,000 and would have taken two months instead of ten, I now see why it would have been a bad move for Sonia. Imagine if she stood up in front of hundreds of salespeople and announced that she was going to solve their biggest sales challenge by . . . putting someone from her team on the job in that person’s spare time. It doesn’t project the right image, especially for a senior manager whose first assignment at the company was such a high-visibility affair. It sounds way more impressive to announce that we were laying out a quarter of a million dollars and devoting an entire year to it. This really makes an impression on the audience: marketing is very serious about this, and their commitment makes them an invaluable partner to sales.

Sonia’s visibility increased in direct proportion to the size and magnitude of the project. Hiring Accenture was also a smart move; it’s a well-known, prestigious firm. Associating herself with their brand made the project, and thus her reputation, appear more distinguished.

Sonia positioned herself as a hero in her first month at Facebook, and it solidified her image as a leader who’s willing to put a stake in the ground to get things done.

But what would have happened if Sonia’s success had been defined up front? What if the marketing executives had planned to survey sales about the utility of the new sales decks, and had set clear targets to hit? And what if efficiency metrics had been put into place? That is, how much time and money it would take, along with trade-offs on quality? If success had been defined in terms of what’s best for the business, not for her reputation, the situation might have proceeded differently.

Without clear goals to assess how well Sonia solved the problem, what mattered was how important she looked while solving it.

Another common example of this phenomenon is what’s known as the “company reorg.” It seems that every time a new leader joins an organization, or whenever the current one needs to prove he or she has a plan to turn things around, a reorg is inevitable. Reorgs usually start off with a vague announcement from the leadership team: an ominous warning about an upcoming structural change to the organization. The lack of clarity, coupled with news of impending change, creates palpable fear among employees, who suddenly are unsure about their job security. Some people see the reorg as an opportunity to score a better, more senior role, while others see it as a need to justify their current positions. Either way, a frenzy of posturing, secret meetings, and politicking erupts, as people vie for more information and try to secure their spots in the new org structure. From the outside or from above, this sudden movement, the shuffling of the existing order, looks like progress. Appearing as someone who gets things done, the leader of a reorg gets an instant shot of credibility and a boost to his or her image. And by hoarding information and creating fear, the leader only grows more powerful and important in the company.

The impact of a reorg is almost never assessed after the fact. Nobody ever knows, or seems to care, whether it was a good decision or how it impacted the bottom line. The lack of accountability makes reorgs one of the sharpest and most often used tools in the toolbox of corporate leadership. It solidifies a leader’s reputation and increases his or her power, regardless of how things turn out. Reorgs are a no-lose proposition for a corporate executive, and almost always a no-win for everyone else.

In cases like Sonia’s story project, or with any company reorg, success metrics would instill much-needed accountability. So why are they seldom put in place? Because objectivity and accountability are just plain scary to most people. If Sonia had to get ten stories done in three months with a 7+ rating from sales, success or failure would be clear-cut. Imagine the fear of missing the target. Would she be seen as incompetent? Would she be fired? Given a second chance? The same goes for corporate leadership amid a reorg. Keeping the objectives vague and success ambiguous is the much safer route. It preserves power among the powerful. And it’s nothing unique; it’s what everyone does across all of corporate America, every day, across every industry. Accountability is scary when your livelihood depends on it. And we’re only human.

These challenges to objectivity are solvable, but only if we recognize the part they play in the gender gap. A bigger problem than gender bias is system bias. The way we judge competence and evaluate good work is broken and biased toward traits that are more common among men.

In contrast, schools grade on outcome, not behavior. The objectivity of grades is the ultimate equalizer. And that is why women dominate academia. The lack of objectivity in the corporate world results in a dysfunctional and biased system that rewards male-dominant behaviors. Those behaviors are treated as proxies for competence, yet they don’t correlate with competence. Right now, instead of trying to design better performance systems, we try to design better women.

For the rest of this chapter and more:

Lean Out: The Truth About Women, Power, and the Workplace (p. 77), by Marissa Orr, HarperCollins Leadership.

Written by Marissa Orr