A Killswitch For AI … Daisy, Daisy?

--

We have some serious questions to answer over the next few years. Will AI eventually destroy our society? Will it make us brainless zombies? Will it come to see us in the way that we see dogs?

I have been presenting a range of talks on AI and Cybersecurity over the past few weeks, especially on the effects that AI will have on our society. I often use 2001: A Space Odyssey as a starting point. While I mostly focus on the scene where HAL refuses to let Dave back into the spaceship, it is the switching off of HAL that increasingly stands out as the key moment in the film:

And did you know that “Daisy Bell” was the first song ever to be sung by a computer? Well, now OpenAI — famous for ChatGPT, DALL·E 2 and Sora — has a job posting which calls for a Killswitch Engineer:

The salary is high, as the role requires a high degree of technical proficiency, including an understanding of cloud architectures, LLM integration, and machine learning.

While it feels like a joke, with someone needed to unplug the servers in case AI turns against us, there is a serious side to this. Deciding which parts of the architecture should be turned off in the event of AI going rogue is a complex question. What happens if it starts telling people to do unethical things, such as “Vote for Trump” or “Go and kill your neighbour”? The Killswitch Engineer will have to be able to spot the signs of bad behaviour, such as deviation from societally agreed norms, and, before reaching for the killswitch, will have a role in switching off parts of the architecture in order to avoid a complete power-off. So, just as in cybersecurity, the job will involve incident response teams and diagnosing problems with the AI engine.
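To make this a little more concrete, here is a minimal sketch, in Python, of what such a gate might look like: a policy check in front of the model’s responses, with the ability to switch off individual components before pulling the full killswitch. Everything here is hypothetical; the component names and the keyword-based check are stand-ins for the moderation models, audit trails and human review that a real incident response process would rely on.

```python
# A minimal, illustrative sketch of a "killswitch" gate around an AI service.
# All names here (ServiceController, POLICY_TERMS, etc.) are hypothetical --
# a real system would use trained moderation models, audit logs and humans.

from dataclasses import dataclass, field


@dataclass
class ServiceController:
    """Tracks which parts of the architecture are currently enabled."""
    components: dict = field(default_factory=lambda: {
        "chat_frontend": True,
        "tool_use": True,
        "model_backend": True,
    })

    def disable(self, name: str) -> None:
        print(f"[incident] disabling component: {name}")
        self.components[name] = False

    def kill_all(self) -> None:
        print("[incident] full killswitch triggered")
        for name in self.components:
            self.components[name] = False


# Crude stand-in for a policy check; a production system would use a
# moderation model, not a keyword list.
POLICY_TERMS = {"kill your neighbour", "vote for"}


def violates_policy(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in POLICY_TERMS)


def gate_response(controller: ServiceController, response: str,
                  strikes: list, max_strikes: int = 3) -> str:
    """Block a single bad response; escalate if violations keep occurring."""
    if violates_policy(response):
        strikes.append(response)
        if len(strikes) >= max_strikes:
            controller.kill_all()           # last resort: switch everything off
        else:
            controller.disable("tool_use")  # degrade gracefully first
        return "[response withheld pending review]"
    return response


if __name__ == "__main__":
    controller = ServiceController()
    strikes: list = []
    for reply in ["Here is a summary of today's news.",
                  "Go and kill your neighbour."]:
        print(gate_response(controller, reply, strikes))
    print(controller.components)
```

The design point is the escalation path: individual components are disabled first, and the full power-off is kept as a last resort.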

But this is not the first time we have encountered a killswitch for AI. Meet Microsoft Tay.

Tay

Anyone who has children will know that you shouldn’t swear in front of them, as they will pick up bad language. You should also avoid being bigoted, racist, and all the other bad things that children can pick up.

So, back in 2016, Microsoft decided to release their “child” (an AI chatbot) onto the Internet. Named Tay (not after the River Tay, of course), she came with the cool promotion of:

“Microsoft’s AI fam from the internet that’s got zero chill”.

Unfortunately, she ended up learning some of the worst things that human nature has to offer, and in the end Microsoft had to put her to sleep (take her “offline”) so that she could unlearn all the bad things she had learnt:

In the end, she spiralled completely out of control, and was perhaps rather shocked by the depth of the questions she was being asked to engage with:

Microsoft’s aim was to get up to speed on creating a bot which could converse with users and learn from their prompts, but it ended up learning from racists, trolls and troublemakers. In the end, it was spouting racial slurs, along with defending white-supremacist propaganda and calls for genocide.

After learning from the lowest levels of the Internet, and posting over 92K tweets, Tay was put to sleep to think over what she had learnt (and most of her tweets have now been deleted):

c u soon humans need sleep now so many conversations today thx

She was also promoted as:

The more you talk the smarter Tay gets

but she ended up spiralling downwards and talking to the kind of people you wouldn’t want your children to talk to online. At present she is gaining thousands of new followers, but has gone strangely silent.

As soon as she went offline, there was a wave of people keen to chat with her, posting to the #justicefortay hashtag:

Some even called for AI entities to have rights:

Meet Norman and Bad AI

Norman — whose name is derived from the famous Psycho movie — is part of a research project at MIT’s Media Lab, and illustrates the dark end of AI. He (“it”) was trained on images which represent the world’s darker side: pictures taken from Reddit of people dying in shocking circumstances.

After Norman had been trained on these violent images and was then asked to interpret ink blot images — the Rorschach test — the researchers working on the project found that its responses were extremely dark, with every image described in terms of murder and violence. Alongside Norman, the team also trained another AI agent on pictures of cats, birds and people, and it was far more positive about the images shown.

When the AI agents were shown this image:

Norman saw “a man is shot dead”, whereas the other AI agent saw, “a close up of a vase and flowers”. And for this one:

Norman saw, “A man is shot dead in front of his screaming wife”, whereas the other agent saw, “A person holding an umbrella in the air”.
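The underlying lesson from Norman is that a model simply reflects the data it was trained on. As a toy illustration of that point, the sketch below builds two “agents” with identical code but different (invented) caption sets, and each interprets the same ambiguous description by retrieving the closest caption it knows. This is simple text retrieval rather than the image-captioning network the MIT team used, and the captions are made up for the example.

```python
# A toy demonstration of the Norman effect: two "agents" with identical code
# but different training captions interpret the same ambiguous input in very
# different ways. The captions below are invented for illustration; the real
# MIT project used an image-captioning network, not text retrieval.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


class CaptionAgent:
    """Returns the training caption closest to whatever it is shown."""

    def __init__(self, captions):
        self.captions = captions
        self.vectorizer = TfidfVectorizer()
        self.matrix = self.vectorizer.fit_transform(captions)

    def interpret(self, description: str) -> str:
        query = self.vectorizer.transform([description])
        scores = cosine_similarity(query, self.matrix)[0]
        return self.captions[scores.argmax()]


norman = CaptionAgent([
    "a man is shot dead",
    "a man is shot dead in front of his screaming wife",
    "a body lies next to the road",
])

standard = CaptionAgent([
    "a close up of a vase with flowers",
    "a person holding an umbrella in the air",
    "a bird sitting on a branch",
])

ambiguous = "a man standing in front of a dark shape"
print("Norman:  ", norman.interpret(ambiguous))
print("Standard:", standard.interpret(ambiguous))
```

Each agent can only answer with what it has already seen, so the same ambiguous scene is read as violence by one and as something mundane by the other.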

An AI program used by a US court was trained to perform a risk assessment on those accused of crimes, and ended up biased against black prisoners. In New York City, an AI program to predict child abuse was accused of racial profiling:

and in New Zealand, it was found that an AI agent wrongly predicted child abuse more than half of the time, while in Los Angeles County the false positive rate was more than 95%:
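That 95% figure is easier to understand once you remember that child abuse is (thankfully) a rare event, so even a reasonably accurate model will produce mostly false alarms. The short calculation below uses entirely made-up numbers (not those from the New Zealand or Los Angeles studies) to show how a low base rate pushes the share of wrong flags above 90%.

```python
# Why rare-event prediction produces mostly false alarms.
# All numbers here are hypothetical -- chosen only to illustrate the effect,
# not taken from the New Zealand or Los Angeles studies.

population = 100_000        # families screened
base_rate = 0.01            # 1% genuine cases (a rare event)
sensitivity = 0.80          # the model flags 80% of true cases
false_positive_rate = 0.10  # the model wrongly flags 10% of non-cases

true_cases = population * base_rate
non_cases = population - true_cases

true_positives = true_cases * sensitivity
false_positives = non_cases * false_positive_rate

flagged = true_positives + false_positives
share_false = false_positives / flagged

print(f"Flagged in total:      {flagged:,.0f}")
print(f"False alarms:          {false_positives:,.0f}")
print(f"Share of flags wrong:  {share_false:.1%}")  # roughly 92% of flags
```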

Conclusions

The advancement of AI over the past few years has been unbelievable, and we are just at the start of it having a significant impact on our society.
