Open Problems in Group Rationality

My sense is that LessWrong, the Sequences, the heuristics-and-biases literature, the Center for Applied Rationality canon, and the spread of various economic, psychiatric, and game-theoretic concepts over the internet have done a lot to build a basic art of individual rationality.

By this I mean that there are plenty of swine pearls out there for people to pick up — it might be hard for someone to start updating their own personal algorithms and source code, but it’s not hard to find out how. Maybe there’s not much material above blue belt level, but there’s a lot of stuff for someone wanting to go from “never trained” to white belt or yellow belt or orange belt or green belt. At least for the basics, rationality is a technical problem rather than an adaptive one.

On the other hand, I feel like group rationality remains a largely open and empty field. There’s a lot of literature on what we do wrong, but not a lot of ready-made “techniques” just sitting there to help us get it right — only a scattering of disconnected traditions in things like management or family therapy or politics. My sense is that group rationality is in a place similar to where individual rationality was ~15 years ago, when there was all this depressing research on how we’re all broken and how awareness of our flaws does not protect us from our flaws, but precious few tractable next actions.

I’d like to fix this. In particular, I suspect that there’s value in treating the group rationality question as its own, separate question — not because it actually is distinct from individual rationality, but because isolating it temporarily might do good things like reduce complexity, increase measurability, focus attention, and meaningfully increase optimism that any progress can be made at all.

So below is my attempt to crystallize some fraction of the space. I welcome additional open questions within my same frame, I welcome alternate frames that carve up the space differently, I welcome better posings of my questions that more accurately carve reality at its joints, I welcome pointers toward places where the work has already been done, and I welcome wild theories about how best to solve any given subproblem.


1. Buy-in and retention.

Any group (company, team, community) is going to need to attract new members, enculturate those members, and nourish/reward them over time such that their needs are not better met by absorbing the cost of leaving and finding a new group. To the best of my knowledge, we don’t have a generalized model of how to do this, with gears and levers that are at least as clear and useful as e.g. Maslow’s Hierarchy is for individuals.

Some threads to pull on:

Model 1: Escalating Asks and Rewards
Imagine that you join an average martial arts academy — one of the ones with big plate glass windows in a strip mall. On day one, they’ll ask you to do some kicks and punches and stand in uncomfortable stances while you wear unfamiliar clothes.

After a month (or three, or six, depending on the granularity of their system), they’ll ask you to perform a variety of your new skills in front of a judge and a crowd, and they’ll reward this behavior with a brand-new belt (yellow! Woo!).

Not long after, you’ll show up and they’ll ask you to take some time supervising the new white belts — just a few minutes, during the class, while the instructor bounces back and forth between groups. You’ll squint at the new guys’ form, make some suggestions, offer sympathy based on your own memory of struggling through these same confusions a month ago. At the end of the class, they’ll bow to you, and so will your instructor — a reward of additional respect.

As time goes on, this keeps ramping up — they ask something of you, and they reward you, and each ask is greater than the last, and comes with a corresponding transfer of status and respect. Eventually, they’ll make you an assistant instructor, and you’ll start having formal responsibilities and maybe getting paid, and maybe at some point you’ll become a black belt and start running a demonstration team or a regular sparring practice of your own favorite protégés, and at some point, you’ll become a master and the head instructor will ask you to branch out and start your own dojo.

There’s a pair of quotes I read once that I really like in combination:

The vigor of a community depends on the allegiance of its members, and the allegiance can be created and enhanced by the dissemination of epic stories.
Communities that make few or no demands on their members cannot command allegiance. All else being equal, members who feel most needed have the strongest allegiance.

The model here is that, the more a community asks of you, the more you can tell yourself the story of how you worked and sweated and gave and it made a difference. Within the world of this hypothetical martial arts community, you matter. You matter, and you know you matter, because you can see how the system came to gradually depend upon you, with structures being built up around you such that losing you would cause everything to sag and lean and maybe collapse.

But of course, no one wants that at first. If you were to walk into a martial arts academy and they were to ask you to take on a full-time position, you’d probably be out of there like a shot. When you first walk into a community, you’re asking the question: is there meaning here for me? Are there ways to make mutually beneficial trades? If I pour in effort, will I get things back? If I make myself available, will they actually use me well?

And so the dance is one of gentle exploration and ever-escalating commitments, with you finding out what it’s like to be needed by these people, and these people figuring out what they can build with you in the mix. Those who aren’t “asked of” don’t feel needed, don’t feel valued, don’t feel important, don’t feel like they’re part. But those who are asked too much too soon, or rewarded too little relative to what they feel like they’re giving, can’t spin up a good story that sounds like “this is a good place for me to be” when they tell it.

Model 2: Imperfect Containers
In a sense, every community is already functioning at full capacity.

This doesn’t mean every community is good or efficient. It’s like the web of an ecosystem — every species is sufficiently adapted to be there at all, and the ecosystem is fully functional in that every plant and animal and microbe in it is doing something that affects the whole, even if some niches suck and the overall balance isn’t healthy long-term.

When you show up at a company for the first time, that company already exists, has money, has roles, has a culture. Even if you’re founding the company, there’s some known set of people and dynamics between them, that you’re trying to turn toward a new purpose.

So you slot yourself in. Become, in some part, the thing that the overall structure needs of you.

(Note: If the community doesn’t have open slots, very few people will join it, and those will be the equivalent of Invasive Species entering an ecosystem.)

But eventually, that role, that niche — it becomes too small for you. You outgrow it (or, if you were always too big for it, you eventually lose patience). Your elbows start to bump into the elbows of the people around you, and the constraints start to chafe.

If the community is too rigid, nothing changes, and eventually you give up and leave or give up and stay. But with any luck, there’s enough flex that, as your current niche feels too small, you can stretch your container. You can take on new roles within the company, launch a new project within the team, start a new weekly event within the neighborhood, open up a new branch of the martial arts empire, start giving orders to people who were previously your peers or superiors.

And so the whole system is continually growing, as people jostle and bump and try to make room for the versions of themselves they want to be. There’s a co-creation aspect to it — if people can’t have ownership, or at least creative direction, then many of them will be dissatisfied and ultimately go to find another community. But also, if ownership and creative direction are required up front, fewer people will be able to slot in in the first place.

The ideal balance, then, is one where, from the outside, the open/available niches are clear enough that people who are unfamiliar with the community can make reasonably confident predictions about whether or not they’ll fit. They don’t have to be right, they just have to be confident enough that they feel safe enough to take the plunge. “Do I want this job?” “Should I take advantage of this three-month trial?” “Do I know what kind of dish I should bring, if I want to participate in the block party potluck?”

But it’s also one where, from the inside, neither the niches themselves nor the overall matrix holding them together is completely rigid. Most humans seek some form of novelty; most humans desire growth.

Model 3: Flags and Pledges
Recently, I was comparing notes on how I get eleven-year-olds to do pushups (and like it) with my friend Kenzi, who has spent years herding volunteers as a stage manager for a theater company. We noticed some striking similarities in the way we’d both learned to coax good work out of people who can quit at any moment, in the absence of any real strong incentives. Convergent evolution led us both to the same basic structure.

The short version is: people don’t respond to imposed principles. If you try to hold someone accountable to a standard they didn’t agree to, or that they think is wrong or bad or dumb, they’ll simply walk away (or actively rebel). Instead, what you have to do is key into something that they do care about, and forge a comprehensible link between that internal motivation and the external needs of the play or project or class or whatever.

That way, when you see a volunteer lounging around doing nothing, or a sixth grader giving up halfway through the pushups, your intervention can be more of a reminder of something they already care about, and less of an imposition from the outside. If they’ve already consciously noticed the link between doing pushups and being better at parkour, then encouragement to finish the set of pushups feels more like encouragement to get better at parkour, and less like forcing them to keep doing something hot and painful and fundamentally unrewarding.

(Look here for a full post on this model.)


2. Defection and discontent.

I have a colleague who likes to quote a joke that goes something like: “We have yet to enter the best phase of this movement: the search for traitors.”

My current hypothesis is that feelings of being-defected-on by one’s fellow group members, or of being unfairly judged as defecting by one’s fellow group members, usually emerge from misunderstanding and miscommunication. Actual defection happens, sure, but it’s something like 10% of the problem, with the other 90% being typical mind fallacies, differential understanding of goals and targets, and a lack of common knowledge surrounding norms and standards. Usually, Person A is not in their own mind trying to screw over Person B or the project as a whole, but rather simply disagrees with Person B about what sorts of rights and responsibilities are conferred upon them by their participation in the group.

My new favorite tool for modeling this is stag hunts, which are similar to prisoner’s dilemmas in that they contain two or more people each independently making decisions which affect the group. In a stag hunt:

  • Imagine a hunting party venturing out into the wilderness.
  • Each player may choose stag or rabbit, representing the type of game they will try to bring down.
  • All game will be shared within the group (usually evenly, though things get more complex when you start adding in real-world arguments over who deserves what).
  • Bringing down a stag is costly and effortful, and requires coordination, but has a large payoff. Let’s say it costs each player 5 points of utility (time, energy, bullets, etc.) to participate in a stag hunt, but a stag is worth 50 utility (in the form of food, leather, etc.) if you catch one.
  • Bringing down rabbits is low-cost and low-effort and can be done unilaterally. Let’s say it only costs each player 1 point of utility to hunt rabbit, and you get 3 utility as a result.
  • If any player unexpectedly chooses rabbit while others choose stag, the stag escapes through the hole in the formation and is not caught. Thus, if five players all choose stag, they lose 25 utility and gain 50 utility, for a net gain of 25 (or +5 apiece). But if four players choose stag and one chooses rabbit, they lose 21 utility and gain only 3, for a net loss of 18.
  • This creates a strong pressure toward having the Schelling choice be rabbit. It’s saner and safer (spend 5, gain 15, net gain of 10 or +2 apiece), especially if you have any doubt about the other hunters’ ability to stick to the plan, or about the other hunters’ faith in the other hunters, or about the other hunters’ current resources and ability to even take a hit of 5 utility, or about whether or not the forest contains a stag at all.
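For anyone who wants to poke at the numbers, here is a minimal Python sketch of the payoff rules in the list above. The constants come straight from the bullets; the function name `party_totals` and the "stag"/"rabbit" string encoding are just my own illustrative choices, and it assumes exactly one stag is available per hunt.

```python
STAG_COST = 5     # utility each stag hunter spends
STAG_VALUE = 50   # utility of the stag, caught only if everyone chooses stag
RABBIT_COST = 1   # utility each rabbit hunter spends
RABBIT_VALUE = 3  # utility of the rabbit each rabbit hunter brings back


def party_totals(choices):
    """Return (total_cost, total_gain, net) for a list of 'stag'/'rabbit' choices."""
    stags = choices.count("stag")
    rabbits = choices.count("rabbit")
    if stags + rabbits != len(choices):
        raise ValueError("each choice must be 'stag' or 'rabbit'")
    cost = stags * STAG_COST + rabbits * RABBIT_COST
    # The stag is caught only when nobody leaves a hole in the formation.
    stag_caught = stags > 0 and rabbits == 0
    gain = rabbits * RABBIT_VALUE + (STAG_VALUE if stag_caught else 0)
    return cost, gain, gain - cost


for label, choices in [
    ("all stag", ["stag"] * 5),
    ("one rabbit", ["stag"] * 4 + ["rabbit"]),
    ("all rabbit", ["rabbit"] * 5),
]:
    print(label, party_totals(choices))
```

The three cases reproduce the totals quoted in the bullets: a net of +25 when all five choose stag, a cost of 21 against a gain of 3 when one hunter peels off, and a net of +10 when everyone hunts rabbit.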

Let’s work through a specific example. Imagine that the hunting party contains the following five people:

  • Alexis (currently has 15 utility “in the bank”)
  • Blake (currently has 12)
  • Cameron (9)
  • Dallas (6)
  • Elliott (5)

If everyone successfully coordinates to choose stag, then the end result will be positive for everyone. The stag costs everyone 5 utility to bring down, and then its 50 utility is divided evenly so that everyone gets 10, for a net gain of 5. The array [15, 12, 9, 6, 5] has bumped up to [20, 17, 14, 11, 10].

If everyone chooses rabbit, the end result is also positive, though less excitingly so. Rabbits cost 1 to hunt and provide 3 when caught, so the party will end up at [17, 14, 11, 8, 7].

But imagine the situation where a stag hunt is attempted, but unsuccessful. Let’s say that Blake quietly decides to hunt rabbit while everyone else chooses stag. What happens?

Alexis, Cameron, Dallas, and Elliott each lose 5 utility while Blake loses 1. The rabbit that Blake catches is divided five ways, for 0.6 utility apiece. Now our array looks like [10.6, 11.6, 4.6, 1.6, 0.6].

(Remember, Blake only spent 1 utility in the first place.)

If you’re Elliott, this is a super scary result to imagine. You no longer have enough resources in the bank to be self-sustaining—you can’t even go out on another rabbit hunt, at this point.

And so, if you’re Elliott, it’s tempting to preemptively choose rabbit yourself. If there’s even a chance that the other players might defect on the overall stag hunt (because they’re tired, or lazy, or whatever) or worse, if there might not even be a stag out there in the woods today, then you have a strong motivation to self-protectively husband your resources. Even if it turns out that you were wrong about the others, and you end up being the only one who chose rabbit, you still end up in a much less dangerous spot: [10.6, 7.6, 4.6, 1.6, 4.6].

Now imagine that you’re Dallas, thinking through each of these scenarios. In both cases, you end up pretty screwed, with your total utility reserves at 1.6. At that point, you’ve got to drop out of any future stag hunts, and all you can do is hunt rabbit for a while until you’ve built up your resources again.

So as Dallas, you’re reluctant to listen to any enthusiastic plan to choose stag. You’ve got only enough resources to absorb one failure, and so you don’t want to do a stag hunt until you’re really darn sure that there’s a stag out there, and that everybody’s really actually for real going to work together and try their hardest. You’re not opposed to hunting stag, you’re just opposed to wild optimism and wanton, frivolous burning of resources.

Meanwhile, if you’re Alexis or Blake, you’re starting to feel pretty frustrated. I mean, why bother coming out to a stag hunt if you’re not even actually willing to put in the effort to hunt stag? Can’t these people see that we’re all better off if we pitch in hard, together? Why are Dallas and Elliott preemptively talking about rabbits when we haven’t even tried catching a stag yet?
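The four outcomes in this worked example can be re-derived with a short Python sketch. The starting balances and payoffs are the ones given above; the helper `after_hunt` and its `rabbit_hunters` argument are names I made up for illustration, and the sketch assumes that everyone not listed as a rabbit hunter chooses stag and that all game is split evenly.

```python
STAG_COST, STAG_VALUE = 5, 50
RABBIT_COST, RABBIT_VALUE = 1, 3

BANK = {"Alexis": 15, "Blake": 12, "Cameron": 9, "Dallas": 6, "Elliott": 5}


def after_hunt(bank, rabbit_hunters=()):
    """Return each hunter's utility after one hunt.

    Everyone not named in rabbit_hunters chooses stag.
    """
    rabbit_hunters = set(rabbit_hunters)
    stag_caught = not rabbit_hunters  # any rabbit hunter lets the stag escape
    pot = RABBIT_VALUE * len(rabbit_hunters) + (STAG_VALUE if stag_caught else 0)
    share = pot / len(bank)  # all game is divided evenly
    result = {}
    for name, utility in bank.items():
        cost = RABBIT_COST if name in rabbit_hunters else STAG_COST
        result[name] = round(utility - cost + share, 2)
    return result


scenarios = {
    "everyone hunts stag": (),
    "everyone hunts rabbit": tuple(BANK),
    "Blake quietly hunts rabbit": ("Blake",),
    "Elliott preemptively hunts rabbit": ("Elliott",),
}

for label, rabbit_hunters in scenarios.items():
    outcome = after_hunt(BANK, rabbit_hunters)
    # Hunters who can no longer afford even a rabbit hunt.
    stuck = [name for name, utility in outcome.items() if utility < RABBIT_COST]
    print(label, list(outcome.values()), "cannot afford a rabbit hunt:", stuck)
```

Running it gives the same four arrays as the text, and flags Elliott as the only hunter who ends up unable to afford a rabbit hunt, in the scenario where Blake defects.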

I’ve recently been using the terms White Knight and Black Knight to refer, not to specific people like Alexis and Elliott, but to the roles that those people play in situations requiring this kind of coordination. White Knight and Black Knight are hats that people put on or take off, depending on circumstances.

The White Knight is a character who has looked at what’s going on, built a model of the situation, decided that they understand the Rules, and begun to take confident action in accordance with those Rules. In particular, the White Knight has decided that the time to choose stag is obvious, and is already common knowledge/has the Schelling nature. I mean, just look at the numbers, right?

The White Knight is often wrong, because reality is more complex than the model even if the model is a good model. Furthermore, other people often don’t notice that the White Knight is assuming that everyone knows that it’s time to choose stag—communication is hard, and the double illusion of transparency is a hell of a drug, and someone can say words like “All right, let’s all get out there and do our best” and different people in the room can draw very different conclusions about what that means.

So the White Knight burns resources over and over again, and feels defected on every time someone “wrongheadedly” chooses rabbit, and meanwhile the other players feel unfairly judged and found wanting according to a standard that they never explicitly agreed to (remember, choosing rabbit should be the Schelling option, according to me), and the whole thing is very rough for everyone.

If this process goes on long enough, the White Knight may burn out and become the Black Knight. The Black Knight is a more mercenary character—it has limited resources, so it has to watch out for itself, and it’s only allied with the group to the extent that the group’s goals match up with its own. It’s capable of teamwork and coordination, but it’s not zealous. It isn’t blinded by optimism or patriotism; it’s there to engage in mutually beneficial trade, while taking into account the realities of uncertainty and unreliability and miscommunication.

The Black Knight doesn’t like this whole frame in which doing the safe and conservative thing is judged as “defection.” It wants to know who this White Knight thinks he is, that he can just declare that it’s time to choose stag, without discussion or consideration of cost. If anyone’s defecting, it’s the White Knight, by going around getting mad at people for following local incentive gradients and doing the predictable thing.

But the Black Knight is also wrong, in that sometimes you really do have to be all-in for the thing to work. You can’t always sit back and choose the safe, calculated option—there are, sometimes, gains that can only be gotten if you have no exit strategy and leave everything you’ve got on the field.

I don’t have a solution for this particular dynamic, except for a general sense that shining more light on it (dignifying both sides, improving communication, being willing to be explicit, making it safe for both sides to be explicit) will probably help. I think that a “technique” which zeroes in on ensuring shared common-knowledge understanding of “this is what’s good in our subculture, this is what’s bad, this is when we need to fully commit, this is when we can do the minimum” is a promising candidate for defusing the whole cycle of mutual accusation and defensiveness.

(Circling with a capital “C” seems to be useful for coming at this problem sideways, whereas mission statements and manifestos and company handbooks seem to be partially-successful-but-high-cost methods of solving it directly.)


3. Safety versus standards.

It seems reasonably well-established to me that feeling under threat, insecure, or anxious reduces one’s capacity for action. If you’re not feeling safe in a given cultural context, you’ll be spending less than your full attention on the goals endorsed by that context, because you’ll be diverting resources toward contingency planning or political maneuvering or cost-benefit analyses or maybe just drowning in cortisol.

At the same time, though, environments of total safety are ones where much less gets done. If we were to found a company where you’re 100% guaranteed not to be fired for any reason, it would produce high safety, but it would also remove a large swath of the motivational/incentive structure in which people do things because those things are entangled with larger goods like paychecks and a continued sense of belonging.

This problem is largely solved in “trivial” cases like being a telemarketer — you have basically one number to maximize, and if you do so above a clear threshold you’re safe. But in more complicated subcultures with more ambitious and fuzzy goals, it’s a lot harder. To the best of my knowledge, there’s no clear model or technique that tells e.g. the manager of a startup how to balance fostering safety against holding high standards which might require telling people they aren’t measuring up. This loops back to problem #2, in that if standards and communication aren’t clear and good, you and I will often have starkly different opinions about whether you’re taking “right action.”


4. Productivity versus relevance.

This one is related to #3, in my head, and also a close match or rhyme with the Goodhart problem. Essentially, there’s always something you could do that seems productive and results in the appearance and qualia of busy-ness and ends up with lots of pieces of paper moved around or lots of objects getting created and shipped.

But in practice there often seems to be a tradeoff between the number of widgets produced and the goodness/meaningfulness of those widgets. Answering the question “is this really what we’re here for, though?” is hard in a lot of cases, and in particular that question often seems to cause schism between subsets of the group who are itching to get started and subsets of the group who are allergic to lost purposes and wasted time.

One solution to this problem is to have a clear group hierarchy and someone who’s skilled at operationalization and decision-making — then the problem is reduced to “well, is Steve convinced?” But for pursuits that require multiple heroes coordinating together in spaces that are ill-defined and pre-paradigmatic, taking that road sacrifices the majority of the group’s potential.


5. Sovereignty versus cooperation.

The whole point of joining together in groups (at least, according to me) is that you gain access to resources and potential that are beyond the reach of a single individual.

“When the snows fall and the white winds blow, the Lone Wolf dies, but the pack survives.” — Ned Stark, to Arya Stark

However, groups cannot function healthily if they’re under threat of dissolution at any moment, or if mutiny is always one step away, or if any given “organ” in the larger whole might simply stop functioning or disappear. Participation in a group requires the surrender of at least some of one’s sovereignty—the scaffold has to be stable enough that people are willing to start building using it.

Allegedly, the payoff for your partial surrender is access to levers of power via the ability to gain status within the group and influence its direction or its internal workings, or via the fact that multiple other people will add their resources toward the shared goal that you personally desire, or via the fact that existence within a web allows you to specialize without starving.

But if the group frustrates you such that you feel like you have no levers to influence it, or if the target the group aims for is not the one you actually wanted, or if the group takes more from you in one domain than you get back in the other, then the whole proposition starts to lose its appeal. To the best of my knowledge, there’s no grand unifying model that turns these murky dynamics into something measurable, testable, and tinkerable.


6. Moloch and the problem of distributed moral action.

In both Harry Potter canon and the popular fanfic Harry Potter and the Methods of Rationality, there is a prison called Azkaban that is essentially a living hell. If any single individual took it upon themselves to a) sit in judgment of their fellow humans, b) sentence them to torture-until-death, c) drag them off to a prison, and d) directly enact the torture, themselves, we’d be horrified and would assign that individual several rather pointed labels, least among them “sadist” and “evil.”

But the people who proposed Azkaban are not the people who built it, and the people who built it are not the people who paid for it, and the people who paid for it are not the people who guard it, and the people who guard it are not the people who send prisoners to it. The implicit claim, I think, was that by distributing these processes society reduces the chances that one corrupt interest will turn the whole edifice to evil ends. But in practice, what happens is that no one individual is culpable for an edifice that is already evil.

If I find Azkaban to be a horror, and wish to disincentivize its creation and continued existence, whom do I punish? It seems that the answer must be either “no one” or “everyone,” both of which are problematic.

(For a real-world example, there were recently people in my social circle encouraging people to look up the contact information and addresses of the offices of U.S. Immigration and Customs Enforcement, for the explicit purpose of harassing the employees for working at an organization that tortures people and tears apart families and other questionably-charitable-summaries-of-what-ICE-is-and-does.

My objection was that this would produce a lot of catharsis on the part of the shouters, but that a) it wasn’t a particularly effective process for bringing about change even if it did drive some ICE employees to quit, and b) it presupposed a lot of things about the lives and motivations of ICE employees, including that they could easily get jobs elsewhere and that they were not themselves interested in changing the org’s direction.)

This is one instantiation of the broader dynamic that Scott Alexander was pointing at in his essay Meditations on Moloch. When you distribute processes across a population, individuals and subgroups acting on local incentives follow gradients that, when combined, lead to disaster. This is how we get arms races and cold wars and rampant pollution and global warming. It’s related to the tragedy of the commons and to iterated prisoner’s dilemmas (with n participants), and to the best of my knowledge we’re only juuuuuust starting to find techniques which might promise a better future.

(cf. Functional Decision Theory, which is largely framed in individual terms but has far-reaching ramifications in populations where lots of people are using it.)


There are more open problems in group rationality than just the six outlined above. For instance, what the hell is conflict resolution, and how does it work? Or, to what extent are institutions like the United States Constitution unjust, because no one currently living actually opted into them? Or, how does one decide when to maximize good within an existing broken social framework versus when to tear it down and replace it with a new one?

But I don’t want to be the only one pretending to see what’s going on. Strong request for all readers who can spare a minute to actually spare that minute (by the clock) to brainstorm other open problems, or reasons why my frame is wrong, or links to places with answers to any of the above, or something else usefully contributive.