AI psychology should ground the theories of AI consciousness and inform human-AI ethical interaction design

Roman Leventov
Published in Engineering Ideas
Jan 8


This post is a follow-up to Buck’s “The case for becoming a black-box investigator of language models”. Here, I want to highlight two further reasons for studying AI psychology that Buck didn’t mention:

  • the evidence from AI psychology will be important for checking theories of AI consciousness, and
  • AI psychology should inform the practical design of human-AI interaction interfaces, their limitations and restrictions, rules of conduct, guidelines, etc.

AI consciousness

It is possible for neuroscientists without education in psychology to discuss human consciousness because they themselves are conscious. All people (including consciousness researchers) are at least partly psychologists because they have to deal with their own psyche and people around them throughout their everyday lives, and therefore they must have “their own” psychological theory that explains and helps them predict their own behaviour and the behaviour of others.

For this reason, the role of psychology in the study of consciousness is easy to overlook. Overlooking it, however, is a methodological lapse. Zoopsychology (or, more generally, ethology), for instance, is a crucial source of data for reasoning about consciousness in animals.

This will be very important in relation to AI. Theories of AI consciousness must be grounded in a wealth of data about AI psychology, which must be a new field with new methods of work. The methods of AI psychology should be distinct from the methods of human psychology for two reasons: AI psychology lacks the first-person perspective that every human psychologist has, and the phenotype and the ecological niche of AI agents are so different from those of humans that they present completely different demands to their respective psyches. Likewise, the methods of AI psychology should be distinct from the methods of zoopsychology because we can use language both for probing AIs and for receiving responses from them, whereas animals can almost never respond to zoopsychologists in language.

Interaction design

Safron, Sheikhbahaee et al. (2022) and Friston et al. (2022) have already indicated the need for the deliberate design of ecosystems of natural and artificial intelligences. The interactions within such ecosystems should have some guardrails, ranging from codes of conduct to hard limitations in the interaction interfaces.

For the emergent activity in these ecosystems to benefit all participants, the rules and the limits of the interactions must be informed by game theory and mechanism design, coupled with the theories of mind (i.e., psychological theories) of all the participants. Thus, the design must draw not only on human psychology but also on AI psychology.

This reason for studying AI psychology is an “AI ethics” version of the “AI x-risk” argument from Buck’s post:

It feels to me like “have humans try to get to know the AIs really well by observing their behaviors, so that they’re able to come up with inputs where the AIs will be tempted to do bad things, so that we can do adversarial training” is probably worth including in the smorgasbord of techniques we use to try to prevent our AIs from being deceptive (though I definitely wouldn’t want to rely on it to solve the whole problem).

“AI ethics vs AI x-risk tension” notwithstanding, this “interaction design” reason for studying AI psychology might be more convincing for many people who are inclined to study psychology (regardless of whether this is human, animal, or AI psychology) than the “deception/adversarial behaviour/x-risk” reason quoted above. And, ultimately, “ethical interaction design” to ensure the well-being of both humans and AIs is still a good reason to study AI psychology. The results of these studies could be used by everyone: AI alignment researchers, AI engineers, strategists, etc.

Call for action: tell your fellow psychologists (or zoopsychologists) about this. Maybe they will be motivated to make a switch and do some ground-laying work in the field of AI psychology. This proto-field is completely empty at the moment, so pretty much anyone can make a huge impact.


Friston, Karl J., Maxwell JD Ramstead, Alex B. Kiefer, Alexander Tschantz, Christopher L. Buckley, Mahault Albarracin, Riddhi J. Pitliya et al. “Designing Ecosystems of Intelligence from First Principles.” arXiv preprint arXiv:2212.01354 (2022).

Safron, Adam, Zahra Sheikhbahaee, Nick Hay, Jeff Orchard, and Jesse Hoey. “Dream of Being: Solving AI Alignment Problems with Active Inference Models of Agency and Socioemotional Value Learning.” (2022).

This post was originally published on LessWrong.