From using virtual characters to understand people to understanding people to build virtual characters

Creation: Open Minds
Creation: Being Human
6 min read · Feb 27, 2018

Contributed by Dr Harry Farmer, Institute of Cognitive Neuroscience, University College London

The past decade has seen a rapid increase in the use of computer-generated virtual characters (VCs) across a range of industries. This increased interest has been driven by technological innovations in several fields, including advances in the realism of computer graphics, in the ability of artificial agents to interact with people and, perhaps most importantly, in the use of head-mounted displays and motion tracking to deliver experiences in virtual and augmented reality. These advances raise the possibility of a future in which we regularly interact with such VCs in many walks of life, from personal sales assistance to teaching and therapy.

The increasing sophistication of these VCs is of particular interest to researchers working to understand human behaviour. For the last four years I have been working as a researcher in Dr Antonia Hamilton’s Social Neuroscience group at University College London’s Institute of Cognitive Neuroscience. One of our lab’s key aims is to understand the role of non-verbal behaviour in social interaction: that is, how bodily actions such as our facial expressions, postures and gestures are used both to structure our overall social interactions and to signal our attitude towards interaction partners. One particularly well-studied part of this work is imitation, or mimicry, in which people tend to copy the actions made by others. A quick Google Scholar search suggests that since the year 2000 there have been around 160,000 academic papers that have touched on imitation/mimicry in humans.

Despite this large amount of attention, previous research in this area has tended to fall into one of two categories. The first method is to employ naturalistic studies in which a researcher (known as a confederate) interacts with an experimental participant and either deliberately copies the participant’s actions and measures how the participant responds, or makes a set number of gestures and counts how often the participant copies them. This method allows for a relatively natural setting, which makes it easier to draw conclusions about real behaviour, but it suffers from a lack of experimental control and the risk that the confederate may unconsciously influence the results by behaving differently in different conditions. The other commonly used method is computerised tasks in which people observe videos of other people making gestures and have to produce either the same or a different gesture. Here, research has found that people are faster to respond when making the gesture they have just seen, and the difference in reaction speed between the two conditions can be used to measure how inclined the participant is to imitate others’ actions. However, while this method offers greater experimental control, it is much harder to apply the findings to real-world situations.
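To make that second method concrete, here is a minimal sketch of how such a reaction-time measure of imitation might be computed. The trial data, field names and threshold-free scoring are illustrative assumptions of mine, not taken from any specific study.

```python
# A minimal sketch of the reaction-time measure of imitation described
# above. The data and names here are illustrative, not from a real study.
from statistics import mean

# Each trial records the gesture the participant saw, the gesture they
# were cued to make, and their reaction time in milliseconds.
trials = [
    {"observed": "open_hand", "executed": "open_hand", "rt_ms": 412},
    {"observed": "open_hand", "executed": "close_hand", "rt_ms": 468},
    {"observed": "close_hand", "executed": "close_hand", "rt_ms": 405},
    {"observed": "close_hand", "executed": "open_hand", "rt_ms": 471},
]

def imitation_index(trials):
    """Mean RT cost of making a gesture that differs from the one observed.

    A larger value suggests a stronger automatic tendency to imitate.
    """
    congruent = [t["rt_ms"] for t in trials if t["observed"] == t["executed"]]
    incongruent = [t["rt_ms"] for t in trials if t["observed"] != t["executed"]]
    return mean(incongruent) - mean(congruent)

print(f"Imitation index: {imitation_index(trials):.1f} ms")
```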

In the Social Neuroscience group we combined VCs with motion-tracking technology to gain the best of both worlds: an interaction partner whom we could place in a realistic setting but whose behaviour we could precisely control, increasing our experimental efficiency and removing the possibility of experimenter bias affecting our results. Using this technique, Jo Hale, one of the group’s PhD students, was able to create setups in which our participants could interact with a VC that would either imitate their own head movements and posture with a few seconds’ delay or simply play back the movements of a previous participant.
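The logic of that delayed-mimicry setup is simple enough to sketch. The class below buffers the participant’s tracked head pose and replays it after a fixed delay, or, in the control condition, replays a stored recording. The 60 Hz tracker rate, three-second delay and pose format are all assumptions for illustration, not the parameters of our actual experiments.

```python
# A minimal sketch of the delayed-mimicry logic described above. All
# names, the 60 Hz tracker rate and the 3 s delay are assumptions.
from collections import deque

TRACKER_HZ = 60                          # assumed tracker sample rate
DELAY_SAMPLES = int(TRACKER_HZ * 3.0)    # ~3 seconds of buffered poses

class MimicryController:
    def __init__(self, playback_recording=None):
        # Playback condition: pose frames come from a stored recording.
        # Mimicry condition: they come from a fixed-length delay buffer.
        self.playback = iter(playback_recording) if playback_recording else None
        self.buffer = deque(maxlen=DELAY_SAMPLES)

    def next_vc_pose(self, participant_pose):
        """Return the head pose the VC should adopt on this frame."""
        if self.playback is not None:
            # Hold the current pose if the recording runs out.
            return next(self.playback, participant_pose)
        self.buffer.append(participant_pose)
        if len(self.buffer) < DELAY_SAMPLES:
            # Hold a neutral pose until enough history has accumulated.
            return {"yaw": 0.0, "pitch": 0.0, "roll": 0.0}
        return self.buffer[0]  # the oldest retained pose, ~3 s old
```

One controller would be created per trial, with `playback_recording` supplied only in the control condition; calling `next_vc_pose` once per tracker frame then drives the VC in either condition through the same interface.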

At this point readers may be thinking that, while I’ve made a case for the use of VCs in research on human behaviour, it’s not clear that the outcome of our work should be of much interest to those using VCs outside of the cognitive sciences. To address this, I would like to set out three key lessons my colleagues and I learnt while doing this research that might be of interest to anyone working towards the construction of truly interactive VCs.

One early lesson we learnt was that VCs present the opposite problem to confederates. With a confederate, one of the key issues is to ensure that they stick to a script and minimise additional behaviour as much as possible; with a VC, the problem is that any behaviour you want it to have must be programmed in by the experimenter. For example, we spent several weeks thinking that there was something very odd about our VC before we realised that we had forgotten to program blinking into our control script. While this is a particularly obvious and easily fixed example, the inability of VCs to realistically model human behaviour can often be more subtle and can contribute to the VC falling into the well-known “Uncanny Valley”.

A second key lesson of our work was that, at present, the amount of information that VCs can process about their human interaction partners is still very limited. Our VCs only received information about participants’ head movements and posture, and could only use this information to drive their own actions, which of course greatly limited the naturalism of our interaction partner. Part of this reflects the enormous difficulty of building a VC that can perceive and respond at anything like the level of an actual human. However, even if building a fully interactive AI is a long way off, recent advances combining deep-learning algorithms with sensor technology mean that designing a VC that can use information about its partner’s gaze and facial expressions to guide its actions is already possible.
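As a rough illustration of what “using that information” could mean in practice, the sketch below maps estimated gaze and expression labels to candidate VC behaviours. The input labels are assumed to come from an external eye tracker and expression classifier, and the whole mapping is hypothetical, not a description of our system.

```python
# An illustrative sketch of a sensor-driven response rule: the VC picks
# a behaviour based on its partner's estimated gaze target and facial
# expression. The labels and mapping are hypothetical assumptions.
def choose_vc_behaviour(gaze_target, expression):
    """Map the partner's estimated gaze and expression to a VC action."""
    if gaze_target == "vc_face" and expression == "smile":
        return "return_smile_and_hold_gaze"
    if gaze_target == "vc_face":
        return "hold_mutual_gaze"
    if expression in ("frown", "confusion"):
        return "tilt_head_and_pause"
    return "idle_glance_around"

print(choose_vc_behaviour("vc_face", "smile"))
```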

But even if a VC can learn what a person is looking at or what expression they are making, what should it do with that knowledge? The final lesson I draw from our work is that to truly build an interactive VC we need a much richer understanding of the fine-grained aspects of everyday human interaction. To go back to the importance of blinking, recent research has found that listeners often use longer blinks at the end of a speaker’s “turn constructional units” (TCUs), which is a fancy way of saying those points in a conversation where the listener could jump in with their own point. In essence, this suggests that long blinks have a role in signalling that the listener gets what the speaker is saying and wants them to move on to the next point. So instead of our own solution to the blinking issue, which was simply to tell the VC to blink randomly every few seconds, what we should have done was have our VC decode the participant’s speech for characteristics marking the end of a TCU and then lengthen its blinks whenever those characteristics were found. Of course, blinking is only one of many signals that help to structure our interactions. The task of creating a fully responsive VC will involve identifying the most significant of these signals and developing the technology that will allow a VC to both perceive and produce those signals reliably.
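To show the difference between the two approaches, here is a small sketch contrasting random blinking with TCU-aware blinking. The pause-based end-of-turn heuristic is a crude stand-in of my own; real TCU detection would need richer prosodic and lexical cues, and all durations and thresholds below are assumed values.

```python
# A sketch contrasting our actual solution (random blinks) with the
# TCU-aware approach suggested by the research above. The pause-based
# heuristic and all timing constants are illustrative assumptions.
import random

SHORT_BLINK_S = 0.15     # assumed typical blink duration
LONG_BLINK_S = 0.40      # assumed "listener feedback" blink duration
PAUSE_THRESHOLD_S = 0.5  # silence length treated as a possible TCU end

def random_blink_schedule(duration_s):
    """Our original solution: short blinks at random few-second intervals."""
    t, blinks = 0.0, []
    while t < duration_s:
        t += random.uniform(2.0, 6.0)
        blinks.append((t, SHORT_BLINK_S))
    return blinks

def tcu_aware_blinks(speech_gaps):
    """Produce a long blink whenever a pause suggests a TCU has ended.

    `speech_gaps` is a list of (time_s, gap_length_s) pauses detected in
    the participant's speech, e.g. by a voice-activity detector.
    """
    return [(t, LONG_BLINK_S) for t, gap in speech_gaps
            if gap >= PAUSE_THRESHOLD_S]

print(random_blink_schedule(20.0))
print(tcu_aware_blinks([(4.2, 0.3), (9.8, 0.7), (15.1, 0.9)]))
```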

This final task is not easy and will not be achieved by either computer scientists or cognitive scientists alone. Rather, it is a job that will require a truly interdisciplinary approach, combining technological and computational expertise with cutting-edge research into human interaction. A key first step will be greatly improving our knowledge of human interaction, and work currently being undertaken by our group aims to do just that by using motion tracking and face-recognition technology together with deep learning to better analyse real-world interactions. Using such methods, we are building on our previous research by moving from using VCs to study human interaction to using a greater understanding of human interaction to inform the design of VCs.
