ChatGPT’s deep fake text generation is a threat to evidence-based discourse
As an AI developer and data science researcher for decades, I strongly believe in the potential of AI to do good for humanity. I have given many talks, including a 2017 Turing Lecture, on how humans and machines can and should co-exist in our new world order.
In the last 5+ years, I have focused my efforts on the domain of human health and disease, in particular on addressing the epidemic of chronic disease that afflicts more than 50% of humanity in the 21st century. This is a complex field that requires robust biological science, big data (carefully collected and verified), and sophisticated AI & ML to make meaningful advancements. And it requires a simple, honest, and evidence-based approach to communicating information to the general public. I believe we have made progress towards this goal in our company, Viome Life Sciences, with a suite of direct-to-consumer at-home products that provide individualized, science-based information to hundreds of thousands of people in over a hundred countries.
I was excited to see the release of the ChatGPT large language model (LLM) to the general public as another big step towards AI for good. ChatGPT is a generative AI model: a large language model pre-trained on a massive text corpus and then fine-tuned with reinforcement learning from human feedback. We are all mesmerized by this technology, and of course we wanted to see if it could be useful in the context of Viome. However, after evaluating it, I am disappointed to say that it is far from helpful in a health-advice setting, and it could even be a significant threat to the evidence-based discourse needed in the field of health sciences.
The insidious nature of such generative AI models is that they produce plausible-sounding health information in a smooth conversational style that is at best generic and at worst completely fictional. The problem is that most people cannot tell the difference between the two. Because I understand the inner workings of these AI models, I know that it is unfortunately very hard, if not technically impossible, to separate the facts from the fiction or to explain the source of each piece of generated information. But an innocent user who is versed in neither AI nor health sciences can easily be fooled into believing that everything ChatGPT provides is authoritative. In the worst case, they could act on it despite the warnings, with potentially disastrous consequences for their health and wellness.
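To make this concrete, here is a minimal sketch (not Viome code, and the model name and health question are assumptions chosen purely for illustration) of what a program actually receives back from an LLM endpoint: a single block of generated text with no per-claim provenance attached.

```python
# Minimal illustrative sketch: query an OpenAI-style chat completions endpoint and
# inspect what comes back. Assumes an API key in the OPENAI_API_KEY environment
# variable; the model name below is an assumption for illustration.
import os
import requests

resp = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "user",
             "content": "Should I avoid avocado if uric acid production is high in my gut?"}
        ],
    },
    timeout=60,
)
answer = resp.json()["choices"][0]["message"]["content"]

# The entire reply is one generated string. There is no per-sentence provenance,
# no per-claim confidence, and no machine-readable citation structure; fact and
# fabrication arrive in the same undifferentiated block of text.
print(answer)
```

Whatever post-hoc checking we want to do has to happen outside the model, because nothing in the response itself tells us which sentences are grounded and which are invented.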
I provide examples of this problem below. In the interest of getting this information out to the public, and to the developers at OpenAI, as soon as possible, I include only a few examples from my early explorations over the last week or so. There will undoubtedly be many more as my colleagues and I dig deeper.
Initially, ChatGPT impresses and establishes trust
In a recent conversation, ChatGPT started off by providing some very good general advice and guidelines while declining to provide specific individual meal plans. This comes across as reasonable and smooth, and it establishes trust with the human user. Being natural and conversational is indeed a key goal of ChatGPT, and it achieves it with excellent results.
A lot has been studied and written about AI and trust building. Fundamentally, humans are wired to find ways to build trust with anything that can plausibly be anthropomorphized, whether it is a cute doll, a human-like robot, or an AI agent that seems to think like a human. Read more on this topic here, here, and here.
In this case, ChatGPT’s natural and conversational style has pre-conditioned the human user to trust it, or at least give it the benefit of the doubt. This trust-building is a significant element of my analysis because it could make people let down their guard and not be as skeptical of ChatGPT as they would be otherwise.
ChatGPT is not shy about getting into outdated specifics
With the next question, ChatGPT gets into specific vegetables, fruits, grains, legumes, and protein sources, including the point at the bottom about monitoring portion sizes. All of this is generic information that you can find with a few Google searches. The human user is now starting to really trust this system.
Except that this is outdated information provided without context. A significant amount of research from the last eight years establishes that blood sugar response is highly individual rather than generic across all people. For example, two people eating bananas and rice can have opposite blood sugar responses: one shows a high response to banana and a low response to rice, while the other shows a high response to rice and a low response to banana. The difference depends on many 'phenotypic' features (such as age, sex, and BMI) and on the gut microbiome (the multitude of microorganisms, mainly bacteria, that live in our intestines). It is inaccurate to say, as ChatGPT does above, that bananas should be avoided and brown rice should be consumed.
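To see why one-size-fits-all food lists break down, here is a purely illustrative toy sketch. The labels are invented for this example (they are not real measurements and not Viome's model); the point is only that once the response depends on the person as well as the food, any rule keyed on the food alone must be wrong for someone.

```python
# Toy illustration with invented labels: glycemic response is a function of
# (person, food), not of food alone.

# A generic rule assigns one recommendation per food, for everyone.
generic_rule = {"banana": "avoid", "brown_rice": "eat"}

# An individualized view keys the response on the person as well as the food.
# Here the two hypothetical people show opposite responses to the same foods.
individual_response = {
    ("person_A", "banana"): "high",
    ("person_A", "brown_rice"): "low",
    ("person_B", "banana"): "low",
    ("person_B", "brown_rice"): "high",
}

for person in ("person_A", "person_B"):
    for food in ("banana", "brown_rice"):
        print(person, food,
              "| generic advice:", generic_rule[food],
              "| individual response:", individual_response[(person, food)])

# Advice keyed on the food alone is wrong for one of the two people, which is
# why phenotype and microbiome features are needed as inputs to the prediction.
```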
Our group at Viome has published these individualized blood glucose response results in the journal Diabetes Therapy, after a long and hard clinical study: Gut Microbiome Activity Contributes to Prediction of Individual Variation in Glycemic Response in Adults. Other groups around the world have independently shown similar results (here, here, and here, for example). ChatGPT does not mention or use any of this research.
It can be argued that the information put out by ChatGPT is generic conventional wisdom, so it is not seriously problematic, and that it is the nature of science to continuously unearth new and improved truths. There will also always be outdated information in vogue; after all, medical practices from past centuries are still followed in some form by subgroups of people somewhere in the world. But this is precisely why we would like modern AI systems to help us curate the most evidence-based and trustworthy information! Just as Google ranks pages based on their relevance and significance, a state-of-the-art AI system should propagate state-of-the-art knowledge. Where multiple views exist, a good AI should summarize the major ones so the human user can do their own research and come to their own conclusions. On these counts, ChatGPT is far behind.
When pressed, ChatGPT literally makes up science fiction
Unfortunately, ChatGPT does not stop at providing obsolete information. When asked specific questions about particular details, it provides a scientific-sounding explanation that is a combination of facts and bullshit that no one can readily distinguish.
Superficially, this sounds highly authoritative, and you might take it at face value given the trust it has already built with you earlier in the conversation.
But this statement misunderstands my prompt about the gut and answers the question in relation to the liver! ChatGPT is providing a generic response from nutritional science, in which avocado is considered safe to eat when uric acid production is high in the liver. The paragraph then goes on to say that avocado may help REDUCE uric acid, as supposedly shown by the recent studies cited in the references.
And the most egregious part is that THE REFERENCES AT THE BOTTOM ARE ENTIRELY MADE UP: they do not exist! The journals mentioned are real, but the specific articles are not. The titles are invented, and the cited authors appear to be real people who publish on similar topics, although not specifically on gout, avocado, or uric acid. The second paragraph makes a strong claim that avocado may help reduce uric acid and provides a BOGUS scientific article as evidence.
I tried the same type of question for other foods and aspects of gut chemistry, such as consuming turmeric when bile acid production is high in the gut, or eating onions when sulfide gas production in the gut is high, and got similarly fictional narratives.
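One simple, mechanical check that any reader (or any fact-checking layer wrapped around an AI) can apply to such output is to see whether a cited article actually resolves in a bibliographic index. The sketch below uses the public Crossref REST API; the query title is a placeholder I made up for illustration, not one of the titles ChatGPT generated.

```python
# Sanity-check a cited article title against the public Crossref index.
import requests

def crossref_lookup(title: str, rows: int = 3):
    """Return the closest (title, DOI) matches Crossref finds for a cited title."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": title, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    return [(item.get("title", ["<no title>"])[0], item.get("DOI")) for item in items]

# Placeholder title for illustration; substitute the citation you want to verify.
for match_title, doi in crossref_lookup("Effect of avocado intake on serum uric acid"):
    print(doi, "->", match_title)
```

A fuzzy match is not proof either way, but when nothing even close to the cited title exists in the index, the reference deserves deep suspicion.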
First, do no harm
This type of AI interaction with humans is a real threat to health in particular and to science in general. In the field of health sciences, which exists to serve humanity, everyone explicitly (or implicitly) takes the Hippocratic oath to avoid doing harm. The above interactions do harm at multiple levels.
If someone were to take this health advice seriously and develop complications, who is responsible? Yes, the response says at the end that the individual should seek a doctor's advice before making any changes, but a modern doctor can be swayed by this same misinformation just as easily as an individual consumer. A scientist in the field can also be misled if they take such a response at face value without carefully researching its conclusions. And what about the scientists whose names are attached to a fake article supporting a false claim: what are their rights regarding the integrity of their work?
ChatGPT in its current form is doing the opposite of serving humanity, at least in health sciences and related evidence-based fields. This kind of interaction neither helps consumers reliably improve their health nor helps experts advance health science. And what if it is co-opted by nefarious players to spread precisely planned disinformation?
ChatGPT is similar (but not identical) to the many recent "deep fake" AIs that generate creative artistic images in the style of other people (e.g., DALL-E), creative videos (e.g., Synthesia), or audio deep fakes, all of which combine fact and fiction for infotainment or other purposes. While these applications are technically impressive, it is not acceptable to generate misinformation at scale and create a potentially bigger problem than the one the technology was intended to solve.
As responsible innovators, the AI community must put guardrails around use cases that deal with evidence-based domains. There are multiple ways to address this issue, such as ring-fencing highly sensitive use cases (ChatGPT-generated answers were banned from Stack Overflow), detecting and tagging AI-generated content, attributing sources when possible, developing fact-checking AIs, and so on. The purpose of this article is to quickly shine a light on the problem and to encourage the community to think about and suggest ethical ways of addressing it.
I remain excited about the potential of AI to do good for humanity, and I hope that we can continue to deploy ChatGPT for the use cases in which it excels!