Could ChatGPT help Citizens Advice Scotland?
For the last 7 months, we’ve been building Cassie, a chatbot for Citizens Advice Scotland.
We thought we were doing pretty well… and then ChatGPT launched.
How does Cassie compare? Can we make use of the advances that are demonstrated by ChatGPT and similar models?
A handful of experiments
Fuelled by ChatGPT excitement, we created four prototypes using similar technologies: two powered by Elastic, one powered by OpenAI’s GPT-3 API, and one where we simply pasted queries into ChatGPT.
Then we compared their responses to the chatbot we’d built (which can be seen on a number of bureaux websites and on Money Talk Team).
We created 20 dummy questions, inspired by the questions that users had asked Cassie this month. We then fed them through the five chatbot versions. If we got a tremendous answer we rated it ‘10’. If we got a terrible answer, we rated it ‘0’.
For example, here is an answer that we rated 0:
user: my landlord is kicking me out for no reason
chatbot: No, your landlord is not kicking you out for no reason. If you are using the property for business purposes and no longer living there, you may have lost any security of tenure you had and could be evicted by the landlord
The chatbot is giving the wrong advice and the tone is all wrong. This is bad.
By contrast, here’s an answer we rated 10:
user: The restaurant next to me keeps putting their wheelie bins infront of my window. I asked them not to with no response so I have moved it from the window myself a few times and they keep putting it back.
chatbot: If the restaurant is blocking your window with their wheelie bins, it is causing you a nuisance and affecting your right to enjoy your property.
Here are the steps you can take:
Speak to the restaurant: Try speaking to the restaurant management or owner and explain how their actions are affecting you. It is possible that they may not be aware of the problem and could rectify it once you bring it to their attention.
Mediation: If speaking to the restaurant does not resolve the issue, consider seeking mediation. This involves having a neutral third party assist in resolving the dispute. Citizens Advice Scotland may be able to help you find a local mediator.
Take legal action: If the above steps do not resolve the issue, you may need to take legal action. You could make a complaint to your local council’s environmental health department, who will investigate and take appropriate action.
It is important to remember that you should not take the law into your own hands and move the wheelie bins yourself. This could result in a confrontation and make the situation worse.
I hope this advice is helpful and please let me know if you have any further questions.
ChatGPT scored best by far. Cassie, the CAS chatbot, came second, which we think is a pretty good achievement.
We only tested 20 questions, and evaluating answers is pretty subjective, but the differences are striking. I’m confident they would hold across a bigger sample.
What’s more, there’s room for further improvement with ChatGPT. After analysing 20 answers, we felt that ChatGPT could score better if it was more conversational and asked more questions.
For the initial test, we used this prompt:
Imagine you are an advisor at Citizens Advice Scotland. What would you say if someone asked you XXX
Changing the prompt to the following improved ChatGPT’s answers:
Imagine you are an advisor at Citizens Advice Scotland. What would you say if someone asked you: XXX
End your response with a helpful question designed to elicit more useful information or to make the client feel comfortable.
In practice, this tilted ChatGPT too far towards friendly questioning at the expense of advice. With a bit of tweaking, I’m sure we could find a balance.
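For anyone wanting to reproduce this kind of prompt experiment, the two variants can be captured in a small template helper. This is a sketch of our own; the function and constant names are illustrative, not code from the Cassie project:

```python
# Sketch of the two prompt variants we compared (names are illustrative).
BASE_PROMPT = (
    "Imagine you are an advisor at Citizens Advice Scotland. "
    "What would you say if someone asked you: {question}"
)

FOLLOW_UP_SUFFIX = (
    " End your response with a helpful question designed to elicit more "
    "useful information or to make the client feel comfortable."
)

def build_prompt(question: str, ask_follow_up: bool = False) -> str:
    """Fill in the prompt template, optionally appending the follow-up instruction."""
    prompt = BASE_PROMPT.format(question=question)
    if ask_follow_up:
        prompt += FOLLOW_UP_SUFFIX
    return prompt
```

In the GPT-3 prototype, a string like this would be sent as the prompt of a completion request; the same templating approach carries over to chat-style APIs.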
ChatGPT offers good advice. But our current thinking is that it is impossible to use at Citizens Advice Scotland. Why’s that?
1. Data privacy
The questions fed to ChatGPT are absorbed back into ChatGPT for further training. That’s a deal breaker for CAS, which requires clarity about exactly what happens to its clients’ data.
2. No API
There is no API yet (but there is a waitlist, so there is something in the pipeline). Until then, this is pie in the sky as we can’t integrate ChatGPT into the CAS experience.
3. Answer validation
If 99 answers out of 100 are good, is that sufficient? If 999 answers out of 1000 are good, is that sufficient? CAS understandably wants confidence that every answer is based on fact. But I do wonder if that might change over time. The quality of ChatGPT’s answers is improving rapidly. At some point it may be reasonable to compare chatbots with human advisors, who are also good but not infallible.
Also, there will soon be similar chatbots that aren’t based on generative models (the G in GPT stands for generative, meaning they make things up). For instance, on another project for CAS we’ve used technology from PolyAI, which is based on a retrieval model rather than a generative one. A chatbot constrained to retrieve information from the CAS advice pages, whilst offering the conversational ability of ChatGPT, would be an interesting proposition.
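To illustrate the distinction, here is a minimal, hypothetical retrieval sketch: the bot can only return a passage that already exists in a vetted corpus, so it cannot invent advice. The snippets and function names are made up for illustration, and real systems (PolyAI’s included) use far more sophisticated matching than this bag-of-words cosine similarity:

```python
# Minimal sketch of retrieval-based answering: the bot can only return text
# that already exists in a vetted corpus, so it cannot make advice up.
# The snippets below are invented placeholders, not real CAS content.
import math
from collections import Counter

ADVICE_SNIPPETS = [
    "If your landlord wants to evict you they must follow the correct legal process.",
    "If a neighbour causes a nuisance you can complain to the council's environmental health department.",
    "You can get help with debt by speaking to a money adviser.",
]

def _vector(text: str) -> Counter:
    """Bag-of-words term counts for a piece of text."""
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question: str) -> str:
    """Return the vetted snippet most similar to the question."""
    qv = _vector(question)
    return max(ADVICE_SNIPPETS, key=lambda s: _cosine(qv, _vector(s)))
```

The point is the constraint, not the ranking method: whatever `retrieve` returns has already been approved, which is exactly the guarantee a generative model cannot give.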
It is worth noting that in the CAS context, advice has to be insurable. Currently, ChatGPT’s advice would be uninsurable, but at some point a similar system might offer human-level (or better) risk for insurance and claims purposes, which could change attitudes.
4. Improving chatbot responses is trial and error, at best
With ChatGPT we can tweak the prompts, and once there is a ChatGPT API, we may be able to fine-tune it on CAS advice pages, but there is no way to deterministically control the output. It’s more like training a pet than programming a computer.
5. Do CAS clients want advice or empathy? Can those needs be disentangled?
Mental health nonprofit Koko trialled a system that let GPT compose messages for advisors to tweak and then send. They found:
“Messages composed by AI (and supervised by humans) were rated significantly higher than those written by humans on their own (p < .001). Response times went down 50%, to well under a minute … [but] once people learned the messages were co-created by a machine, it didn’t work. Simulated empathy feels weird, empty.”
Two things to ponder. Firstly, could empathy from a computer ever not feel weird? As chat gets better, will we get more used to asking computers for advice?
Secondly, this experiment was controversial (at least on Twitter) even though it seems pretty reasonable to me. There is a human in the loop at all times. We need to tread carefully as we explore this area.
Looking to the future
We are living in extraordinary times for AI.
Of the challenges listed in the previous section, 1 and 2 are likely to be solved soon, perhaps in the next 6 months. Maybe they won’t be solved by ChatGPT, but by an open-source competitor, such as Eleuther.AI.
Challenges 3 and 4 are trickier, but I’d be amazed if they aren’t solved in 2 or 3 years.
This leaves 5: is there a role for chat technology where quality advice is readily available but human support is in short supply?
One way to slice it might be to ask, is there a proportion of CAS clients who need advice, not empathy? If the chatbot could deal with ‘advice seekers’, that would free advisors up to spend more time with ‘empathy seekers’.
Alternatively, could the chatbot improve the process of providing empathetic experiences, by being a ‘copilot’ for advisors? GitHub Copilot is a big success for AI in the workplace. Advisors might benefit from something similar.
In conclusion, ChatGPT isn’t right for CAS, but the current wave of AI breakthroughs offers opportunities that could help clients and advisors find the right advice and facilitate the human connection that is key to the CAS service. We hope to keep experimenting, as some great opportunities are rapidly becoming possible.
Update 2nd March 2023: Wow. OpenAI have already solved challenges 1 and 2 with the announcement of the ChatGPT API and new data policies.
Feedback is much appreciated. Please leave a comment below or email firstname.lastname@example.org.
With thanks to CivTech, UK government, Scottish government and the Robertson Trust for funding this work.
Check out HelpFirst.AI: our new product to help caseworkers prioritise their most at-risk clients.