SociaLLM: proposal for a language model design for personalised apps, social science, and AI safety research

Roman Leventov
Published in Engineering Ideas · Dec 19, 2023


I first proposed this model here, as the base model for an app that would improve global online discourse through personalised comment ordering on all websites.

This post is also a response to the “Reverse-engineering prosociality” agenda described in the post “The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda”.

Architecture and training

SociaLLM is a foundation language model (note: this is a proposal; the model hasn’t been trained, or even designed in detail, yet!) to be trained on chat, dialogue, and forum data in which the identities of message authors (called “users” below) are stable and all messages have timestamps, so that they can be put in a global order.

SociaLLM builds upon the Mamba architecture, a language model that uses so-called state-space model (SSM) blocks instead of self-attention blocks. SociaLLM combines SSM blocks that track three separate message streams:
(1) the “local conversation”, i.e., the flow of messages (which is exactly the training regime of current LLMs);
(2) the message history of the particular user as well as their general “reading history”, which in forum data could be approximated as the N (1–10) messages preceding each of the user’s messages;
(3) the message history of the particular interlocutor of the user, i.e., the subset of the general “reading history” from the previous point that was authored by a particular other user.
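To make the three-stream idea concrete, here is a minimal sketch, assuming a toy *non-selective* diagonal linear SSM (Mamba's actual selective scan makes the parameters input-dependent, which this does not) and scalar per-message "embeddings" instead of vectors. The names `DiagonalSSM` and `residual_update` are hypothetical. Each stream gets its own block, and the latest read-out of every block is added to the residual value; whether the user and interlocutor blocks share weights is left aside here.

```python
from dataclasses import dataclass


@dataclass
class DiagonalSSM:
    """Toy linear time-invariant SSM block with a diagonal state matrix.

    Per state dimension i:  h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
    Read-out:               y_t    = sum_i c[i] * h_t[i]
    Unlike Mamba's selective SSM, a, b and c here do not depend on the input.
    """
    a: list[float]  # per-dimension decay; |a[i]| < 1 keeps the state bounded
    b: list[float]  # input weights
    c: list[float]  # read-out weights

    def scan(self, xs: list[float]) -> list[float]:
        """Run the recurrence over one stream, returning one output per input."""
        h = [0.0] * len(self.a)
        ys = []
        for x in xs:
            h = [ai * hi + bi * x for ai, hi, bi in zip(self.a, h, self.b)]
            ys.append(sum(ci * hi for ci, hi in zip(self.c, h)))
        return ys


def residual_update(x_t: float, local_ys: list[float], user_ys: list[float],
                    inter_ys: list[float]) -> float:
    """Add the latest read-out of each stream's block to the residual value x_t."""
    return x_t + local_ys[-1] + user_ys[-1] + inter_ys[-1]


# Hypothetical scalar "embeddings" for the three streams of one user pair.
local = DiagonalSSM(a=[0.5], b=[1.0], c=[1.0]).scan([1.0, 0.0, 2.0])
user = DiagonalSSM(a=[0.9], b=[1.0], c=[1.0]).scan([1.0, 1.0])
inter = DiagonalSSM(a=[0.9], b=[1.0], c=[1.0]).scan([0.5])
print(local)  # [1.0, 0.5, 2.25]
x_next = residual_update(0.0, local, user, inter)
```

In a real model, the streams advance at different moments (the interlocutor block only updates when that user posts), so the read-outs would be aligned by timestamp rather than simply taking the last element.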

Training this model would cost from 2 times (on purely one-to-one dialogue data) to ~10–15 times (on chat-room and forum data, where messages from the most active users tend to be thoroughly interleaved) as much as training current LLMs. The data should be wrangled into training sub-datasets from the perspective of each user pair, but otherwise the training shouldn’t be much fancier or more complicated than the current distributed training algorithms for LLMs (it seems to me).
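The per-user-pair wrangling could look roughly like the following sketch. It assumes a single thread of messages with unique timestamps and uses the “N preceding messages” approximation of reading history; `Message`, `reading_history`, `pair_streams`, and `build_pair_datasets` are hypothetical names, not an existing pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    author: str
    ts: float  # timestamp; assumed unique, it defines the global order
    text: str


def reading_history(messages, user, n=3):
    """The n messages preceding each of `user`'s messages (the post's forum
    approximation, with N between 1 and 10), deduplicated, in global order."""
    ordered = sorted(messages, key=lambda m: m.ts)
    picked = set()
    for i, m in enumerate(ordered):
        if m.author == user:
            picked.update(range(max(0, i - n), i))
    return [ordered[j] for j in sorted(picked)]


def pair_streams(messages, user, interlocutor, n=3):
    """The three streams SociaLLM would see from `user`'s perspective."""
    ordered = sorted(messages, key=lambda m: m.ts)
    read = reading_history(ordered, user, n)
    own = [m for m in ordered if m.author == user]
    return {
        "local": ordered,  # whole thread, as in ordinary LLM training
        "user": sorted(set(own) | set(read), key=lambda m: m.ts),
        "interlocutor": [m for m in read if m.author == interlocutor],
    }


def build_pair_datasets(messages, n=3):
    """One sub-dataset per ordered (user, interlocutor) pair that actually interacted."""
    datasets = {}
    for u in {m.author for m in messages}:
        others = {m.author for m in reading_history(messages, u, n)} - {u}
        for v in others:
            datasets[(u, v)] = pair_streams(messages, u, v, n)
    return datasets


msgs = [Message("A", 1, "a1"), Message("B", 2, "b2"), Message("A", 3, "a3"),
        Message("C", 4, "c4"), Message("B", 5, "b5")]
ds = build_pair_datasets(msgs, n=1)
print(sorted(ds))  # [('A', 'B'), ('B', 'A'), ('B', 'C'), ('C', 'A')]
```

Each sub-dataset re-reads the local conversation, so the thread is processed once per ordered pair. One plausible reading of the cost estimate above: a one-to-one dialogue yields exactly two ordered pairs (hence ~2×), while well-mixed forum threads yield many more.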

The first upside of this model is that we can create (what seems to be) strong inductive biases towards developing a large self-other overlap (see also this AI Safety Camp project by AE Studio):
(1) connecting the “user’s own” SSM blocks and the interlocutor’s SSM blocks into the residual stream symmetrically (maybe just through a parallel connection, as in multi-head attention);
(2) using the same weights for the user’s own and the interlocutor’s SSM blocks (at inference time the blocks are separate and track states separately, but their weights are the same and are updated in lockstep, batch after batch); and
(3) probably some extra regularisation techniques, such as intermittently “forgetting” either the user’s own or the interlocutor’s state (which is not completely unlike some real-world situations for humans: sometimes people tell us that we have met before, but we don’t remember them), thus teaching the model to degrade gracefully under these circumstances.
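The three inductive biases above can be illustrated with a small sketch, again assuming a toy diagonal linear SSM rather than Mamba's selective block. `SelfOtherBlocks`, `ssm_step`, and `forget_prob` are hypothetical names. One parameter tuple is shared by both blocks (so any training update changes both in lockstep), each block keeps its own state, the read-out is summed symmetrically, and the "forgetting" regulariser occasionally wipes one of the two states.

```python
import random


def ssm_step(params, h, x):
    """One step of a toy diagonal linear SSM: h'[i] = a[i]*h[i] + b[i]*x."""
    a, b, _ = params
    return [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]


class SelfOtherBlocks:
    """User's own and interlocutor's blocks: ONE parameter set, TWO separate states."""

    def __init__(self, params, forget_prob=0.0, seed=0):
        self.params = params  # shared object: any update hits both blocks in lockstep
        self.forget_prob = forget_prob
        self.rng = random.Random(seed)
        dim = len(params[0])
        self.states = {"self": [0.0] * dim, "other": [0.0] * dim}

    def step(self, who, x):
        """Advance only the state of `who` ("self" or "other")."""
        self.states[who] = ssm_step(self.params, self.states[who], x)

    def maybe_forget(self):
        """Training-time regulariser: with probability forget_prob, wipe one state."""
        if self.rng.random() < self.forget_prob:
            who = self.rng.choice(["self", "other"])
            self.states[who] = [0.0] * len(self.states[who])

    def readout(self):
        """Symmetric connection: both states use the same read-out c and are summed,
        so swapping "self" and "other" leaves the result unchanged."""
        c = self.params[2]
        return sum(ci * hi for h in self.states.values() for ci, hi in zip(c, h))


blocks = SelfOtherBlocks(params=([0.5], [1.0], [1.0]), forget_prob=1.0)
blocks.step("self", 1.0)
blocks.step("other", 2.0)
print(blocks.readout())  # 3.0
blocks.maybe_forget()    # forget_prob=1.0, so one of the two states is wiped
print(blocks.readout())  # 1.0 or 2.0, depending on which state was forgotten
```

Summation is just the simplest symmetric choice; a multi-head-style concatenation followed by a projection would also work, as long as the two states enter with tied weights.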

Industrial applications

As I mentioned at the beginning of the post, I originally thought about this model as a base model that can be fine-tuned to predict whether the human user will find this or that information novel, insightful, boring, helpful, saddening, fun, and so on. This fine-tuned model, in turn, could be used within a browser extension to reorder comments on websites (YouTube, Reddit, Facebook, Twitter feed or replies, NYT, The Guardian, etc.) so that the “good” or “informationally valuable” comments come first, which (I hope) should change the dynamics of online echo chambers.
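The reordering step itself is simple once such a fine-tuned predictor exists. The sketch below assumes a hypothetical `predict(comment)` function returning reaction probabilities for the current reader, and hypothetical reaction weights; the post names the reactions but does not propose any weighting.

```python
# Hypothetical weights: the post lists these reactions but proposes no weighting.
WEIGHTS = {"novel": 1.0, "insightful": 1.0, "helpful": 0.8,
           "fun": 0.3, "saddening": -0.2, "boring": -1.0}


def reorder_comments(comments, predict, weights=WEIGHTS):
    """Return comments sorted best-first by a weighted sum of predicted reactions.

    `predict(comment)` stands in for the fine-tuned, personalised SociaLLM head and
    returns {reaction: probability} for the current reader. Ties keep page order,
    because Python's sort is stable (also with reverse=True).
    """
    def score(comment):
        return sum(weights.get(label, 0.0) * p for label, p in predict(comment).items())

    return sorted(comments, key=score, reverse=True)


# Toy predictions standing in for the model's output for one reader.
fake = {"a": {"boring": 0.9}, "b": {"insightful": 0.8}, "c": {"fun": 0.5}}
print(reorder_comments(["a", "b", "c"], fake.__getitem__))  # ['b', 'c', 'a']
```

Personalisation lives entirely inside `predict`: the same comment list would be scored against each reader's own state.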

More generally, SociaLLM could improve almost all applications that currently use LLMs and for which personalisation matters more than raw reasoning and creative power: personalised content recommendations and filtering, customer service and engagement, education and language-learning assistants, and mental health and personal counselling (à la Pi AI).

In the media and entertainment industries, SociaLLM could also help with narrative analysis (for mass media products, such as movies and novellas) and interactive storytelling for new forms of media and games.

There are also possible applications that enhance the collective intelligence of teams:

Research and AI safety applications

The value of SociaLLM in social science research should be obvious: it could be directly used for research and experiments in language intentionality, Theory of Mind, social group or team dynamics, etc.

Beware: the discussion below is somewhat above my pay grade in terms of statistics and ML theory. Take it with a grain of salt, and if something in it looks wrong to you, please point it out.

Collective intelligence mechanisms and research (such as “Collective Intelligence in Human-AI Teams” mentioned above) often require a measure of the information content of the messages that agents send to each other. For SociaLLM to provide such a measure, the user’s own and interlocutor’s SSM blocks must use the same weights (as suggested above), so that we can treat these SSM blocks as producing the same state representation structure.
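For illustration, one standard way to quantify the information content of a message (an assumption here, not a definition the post commits to) is its surprisal under the model, conditioned on the receiver's state $s$ produced by an SSM block with the shared weights $\theta$; $m = (w_1, \dots, w_{|m|})$ denotes the message's tokens. These symbols are introduced only for this sketch.

```latex
I_\theta\left(m \mid s\right) \;=\; -\log p_\theta\left(m \mid s\right)
\;=\; -\sum_{t=1}^{|m|} \log p_\theta\left(w_t \mid w_{<t},\, s\right)
```

Because $\theta$ is shared, values computed against the user's own state and against the interlocutor's state come from one model and are in the same units (nats), which is what makes them comparable.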

Also, for such an informational measure, the SSM blocks should simultaneously provide an energy measure of the current state, i.e., they should also be Energy-Based Models (EBMs). I’m not sure how to engineer this into SSM blocks. Maybe the techniques from the “Recurrent Neural Filters” paper (Lim, Zohren, and Roberts, 2020) could help: there, the Error Correction term, aka the auto-encoding (posterior) error, could be used as the current state’s energy. If you have other ideas on how to turn SSMs into (quasi-)energy-based models (or better yet, Bayesian models, but this seems a taller order), please share.
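As a rough intuition pump for "error-correction term as energy", here is a sketch using a classical scalar Kalman filter, not the learned networks of the Recurrent Neural Filters paper. It assumes a linear-Gaussian model with hypothetical parameters `a`, `q`, `r`, and turns the innovation (the error-correction term) into an energy via its Gaussian negative log-likelihood.

```python
import math


def kalman_energies(ys, a=1.0, q=0.1, r=0.5, m0=0.0, p0=1.0):
    """Scalar linear-Gaussian filter; returns (filtered means, per-step energies).

    Assumed model: x_t = a*x_{t-1} + N(0, q),  y_t = x_t + N(0, r).
    The error-correction (innovation) term e_t = y_t - predicted y_t is turned into
    an energy via the Gaussian negative log-likelihood 0.5*(log(2*pi*S_t) + e_t**2/S_t).
    """
    m, p = m0, p0
    means, energies = [], []
    for y in ys:
        # Predict step
        m_pred = a * m
        p_pred = a * a * p + q
        # Innovation (error correction) and its variance
        e = y - m_pred
        s = p_pred + r
        energies.append(0.5 * (math.log(2 * math.pi * s) + e * e / s))
        # Update step: correct the prediction by the Kalman gain times the error
        k = p_pred / s
        m = m_pred + k * e
        p = (1 - k) * p_pred
        means.append(m)
    return means, energies


_, calm = kalman_energies([0.0, 0.1, 0.0])
_, shock = kalman_energies([0.0, 0.1, 5.0])
print(calm[2] < shock[2])  # True: the surprising observation has higher energy
```

In this linear-Gaussian case, the sum of the per-step energies equals the negative log marginal likelihood of the observations, so the "energy" is an exact Bayesian quantity; a learned filter would only approximate this.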

On the AI safety front, SociaLLM could also be used to study (social) deception (e.g., when analysing Diplomacy game logs) and collusion, and, perhaps, help engineer and test mechanisms that disincentivise or prevent deception and collusion in AI teams, aka agencies.

This post was originally published on LessWrong.