Can We Put a Lie Detector on ChatGPT?

Bálint Kovács
HCLTech-Starschema Blog
9 min read · Mar 8, 2023


OpenAI’s ChatGPT is not only a powerful tool for content creation but has also emerged as a potential alternative for a variety of internet search activities — including those relating to information gathering or research — with efficiency, rather than accuracy or completeness, as its primary appeal. However, as a version of the GPT-3 large language model fine-tuned on dialogs, ChatGPT was created with a primary focus on producing output that imitates human conversation. As such, it doesn’t prioritize factuality or truthfulness in its output and has no built-in fact-checking mechanism, which is a problem for many business and academic use cases.

This is clearly an important modern-day challenge for data science, so let’s explore the prospects — and limits — of creating an accuracy assurance mechanism for a ChatGPT-like content creation tool in service of business and academic use cases.

Photo by Tingey Injury Law Firm on Unsplash

An intelligent agent that can give truthful, actionable answers to questions with varying degrees of domain-specificity could drive a rarely seen boost in human productivity. It could transform the paralysis of action that occurs when trying to evaluate search engine results into a more engaging and human-like interaction that delivers a truly authoritative and verified result. Ideally, these interactions would take up less time than our current standard avenues of research without compromising information quality, which would leave human workers with more time to dedicate to higher value-added tasks — or, at the very least, a reduced risk of burnout.

Why ChatGPT Has a Propensity for BS

ChatGPT is based on the large language model GPT-3. Models like this learn from enormous amounts of text collected mostly from the internet. They learn how words relate to each other by predicting the next word in a text sequence. However, this language mimicking is not what people typically use these models for in real-world scenarios — rather, they have specific questions or tasks in mind that they want the model to answer or complete. So it’s safe to say that the objective of the language model is in effect misaligned with the intentions of the user. Rectifying this by aligning language models with users’ tasks is an emerging area of research in natural language processing (NLP).
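To make the training objective concrete, here’s a toy Python sketch of next-word prediction — a bigram model over a made-up corpus. Note what it optimizes for: the model only learns which word tends to follow which, with no notion of whether the resulting text is true.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = "mars is a planet mars is red the sky is blue".split()

# Count which word follows which: the essence of next-word prediction.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    # Return the most frequent continuation seen in training data --
    # frequency, not truth, decides the output.
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("mars"))  # "is"
```

A real LLM predicts over tens of thousands of tokens with a neural network instead of raw counts, but the objective is the same: plausible continuation, not verified fact.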

ChatGPT has some impressive capabilities: from generating code to acting as a terminal to writing a novel, new use cases are constantly popping up. Its previous iteration, InstructGPT, was trained with the objective of following the intentions of the user. However, such an objective can lead to the false assumption that what the user wants is “correct.”

In a classic example, it used to be easy to confuse ChatGPT by asking it to explain why Mars is the smallest planet: everything about the resulting answer was wrong, because the question contained the assumption that Mars is the smallest planet — which ChatGPT accepted as correct — when in reality it’s not.

ChatGPT’s response to the Mars question. Note that OpenAI has since improved ChatGPT’s ability to give a reliable answer to this particular question, but we were still able to reproduce the underlying issue at the time of the publication of this article, e.g. by asking why Starschema CEO Tamás Földi was hired by OpenAI — which we’re happy to confirm is not true.

Another issue is that the initial model was trained with information available before September 2021. While there have been updates with new data since then, this information lag is likely here to stay for the foreseeable future. And providing consistently truthful answers to questions about recent events or dynamically changing scientific fields is impossible without access to the most current data.

ChatGPT also has problems related to logical reasoning — it’s prone to making incorrect inferences when presented with a set of statements. This could pose a problem when the user needs truthful answers to more complex questions.

Precedent We Can Rely On

A search engine like Google works by building an index of websites based on their content. When a user enters a query, the engine returns the pages most similar to that query, while also taking information about the user, such as search history and location, into consideration.

Personalization of results in this manner could also be implemented for ChatGPT. For search engines, it’s considered a more-or-less solved task: storing location and age data and extracting topics and sentiments from both long- and short-term search histories helps tailor the results to the needs of the user. Similar accuracy-enhancing information could be extracted from the prompts we enter into ChatGPT.

IBM Watson started out as a question-answering system, which famously beat two of the best Jeopardy! players in 2011. In Jeopardy!, players are given a statement and need to find the question to which that statement is the answer. Watson was trained on web-based sources, including Wikipedia, and learned to extract semantic elements of texts, such as the subject and the verb. Using this ability, it generated candidate solutions with probability scores based on how many reliable sources — compiled by humans — contained the same information, then inserted each candidate answer into the sentence and calculated a probability score for that as well.
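A heavily simplified sketch of this evidence-counting idea — the sources and scoring rule below are invented for illustration, and Watson’s actual pipeline was far more elaborate:

```python
# Toy "reliable sources," invented for illustration.
sources = [
    "mercury is the smallest planet",
    "the smallest planet is mercury",
    "mars is smaller than earth",
]

def evidence_score(candidate, claim_terms):
    # Score a candidate answer by the fraction of sources that
    # mention it together with all the terms of the claim.
    supporting = sum(
        1 for s in sources
        if candidate in s and all(t in s for t in claim_terms)
    )
    return supporting / len(sources)

print(evidence_score("mercury", ["smallest", "planet"]))  # 2/3
print(evidence_score("mars", ["smallest", "planet"]))     # 0.0
```

The key property: the score is grounded in how many independent sources agree, so an unsupported answer gets a low score instead of being asserted confidently.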

The key difference between Watson and ChatGPT is that the former was trained for a very specific task on relatively specific data, while the latter is a general conversational system. In theory, similar techniques could be applied to a ChatGPT-like system to make it a reliable augmentation for a wide range of human-driven tasks. OpenAI even provides a guide for fine-tuning its models for specific use cases.

Let’s now turn to the question of how we could enable a model like ChatGPT to validate the truthfulness of its output.

Potential Fixes

One naive approach to enabling the model to tell true information from false would be to connect it to a knowledge base of facts. As in the case of Watson, this can be an aggregation of curated sources such as Wikipedia. A disadvantage of this approach, however, is that maintaining such a knowledge base requires a lot of manual work and limits the model’s capabilities to the information stored in the knowledge base. This also somewhat defeats the intention of reducing human effort through AI integrations.
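In its crudest form, such a knowledge base lookup might look like this — the facts and the (subject, relation, object) triple format below are invented for illustration:

```python
# Toy curated knowledge base of (subject, relation, object) facts.
knowledge_base = {
    ("mercury", "is_smallest", "planet"),
    ("mars", "position", "fourth planet from the sun"),
}

def check_claim(subject, relation, obj):
    # A claim passes only if it appears verbatim in the curated KB --
    # which is exactly why coverage is limited to what humans entered.
    return (subject, relation, obj) in knowledge_base

print(check_claim("mars", "is_smallest", "planet"))     # False
print(check_claim("mercury", "is_smallest", "planet"))  # True
```

A real system would need entity linking and paraphrase matching to map free-form model output onto such triples, which is where most of the manual work and brittleness comes in.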

If, however, a company is only interested in factual answers within a specific domain, it can build an effective fact checker as an extra layer on top of ChatGPT. The fact checker can be trained on articles from the relevant domain, with the objective of telling whether a piece of information is true or not. And when it catches ChatGPT making a mistake, it can correct or replace the output by generating new output from its own sources. Training such a model can require as few as a few hundred articles, which makes this approach feasible for smaller companies as well. This is precisely the approach that Got It AI is using to build its Auto ArticleBot.
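A minimal sketch of such a domain fact checker, using scikit-learn and a toy labeled dataset — a real system would train on hundreds of curated in-domain articles rather than four hand-written statements:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled statements (1 = true, 0 = false), invented for illustration.
statements = [
    ("mercury is the smallest planet in the solar system", 1),
    ("the sun is a star at the center of the solar system", 1),
    ("mars is the smallest planet in the solar system", 0),
    ("the moon is made of cheese", 0),
]
texts, labels = zip(*statements)

# TF-IDF features plus a linear classifier: a simple true/false filter.
checker = make_pipeline(TfidfVectorizer(), LogisticRegression())
checker.fit(list(texts), list(labels))

# Screen a model output before it reaches the user.
verdict = checker.predict(["mars is the smallest planet"])[0]
print("looks true" if verdict == 1 else "flag for review")
```

The classifier sits downstream of the generative model: generation stays general-purpose, while the cheap domain-specific layer decides what gets through.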

A different approach is to use a search engine to fetch sources related to a prompt from the internet and use GPT-3’s summarization capabilities to create an answer to the prompt. While this would only retrieve information that is deemed true according to the search engine, it solves the problem of putting out information that’s out of date. And while the category of “truth” according to a search engine can sometimes be misleading, it’s often closer to the actual, complete and relevant truth than always assuming the user is correct, as ChatGPT does.
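Schematically, the flow looks like this — `web_search` and `llm_summarize` below are hypothetical stand-ins for real search and LLM APIs, with canned data for illustration:

```python
def web_search(query):
    # Stand-in: a real system would call a live search engine API here,
    # so answers reflect current data rather than a training cutoff.
    return [
        "Mercury is the smallest planet in the solar system.",
        "Mercury is smaller than Mars and closer to the sun.",
    ]

def llm_summarize(question, snippets):
    # Stand-in: a real system would prompt an LLM to summarize the
    # retrieved snippets into an answer.
    context = " ".join(snippets)
    return f"Based on {len(snippets)} sources: {context}"

def grounded_answer(question):
    snippets = web_search(question)   # facts come from live sources...
    return llm_summarize(question, snippets)  # ...the LLM only rewords them

print(grounded_answer("Why is Mars the smallest planet?"))
```

Because the model is constrained to summarize what was retrieved, a false presupposition in the question can at best be contradicted by the sources — though, as the examples below show, the summarizer can still hallucinate around them.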

This is how YouChat (by You.com) and ChatSonic, search engines with chatting capabilities, work. However, they still made mistakes when asked the question about Mars:

YouChat’s response to the Mars question
ChatSonic’s response to the Mars question

ChatSonic successfully retrieved two sites about the actual smallest planet, Mercury — already an improvement — but the generated text contained information that was far from true. Such outputs are called “hallucinations” in the context of AI models: fictitious statements delivered confidently. ChatSonic’s answer contained valid facts, but it did not deny the user’s incorrect presumption.

The Cost of Searching

Broader and business-critical applicability for a technology like ChatGPT will not solely depend on its ability to filter information for truthfulness and relevance — it will also need to be a viable proposition in terms of resource demands. We don’t know the exact figures, but based on some estimates, running prompts on the current version of ChatGPT may require four to five times the computational resources of running a Google search. This is a massive cost that would reduce Google’s profits by billions if it were to use the current version of ChatGPT to provide search services — not to mention that it would turn the current list- and ad-based revenue model upside down.

This high computational cost is clearly an obstacle to the technology’s broad and flexible applicability as a search engine replacement. There is ongoing research into reducing the size of large neural networks without loss of performance, and it’s worth keeping an eye on, as compressing ChatGPT’s footprint promises to be a major step in its development.

Keep Your Expectations in Check

OpenAI is no doubt already working on solving the issues related to the fact-checking capabilities of ChatGPT. They’ve already shown willingness to hire a large workforce for model training, so building a knowledge base might be a next step as an interim fix. ChatGPT was originally fine-tuned by content writers who created expected outputs for a prompt and ranked different model predictions, and the company might extensively rely on human intuition once again to help evolve the model.

The solutions and useful precedent we discussed above would primarily improve the truth-telling capabilities of ChatGPT in domain-specific settings. A domain-agnostic system with perfect general truth-telling capabilities is likely still far away — for one, it’s unlikely that artificial intelligence will be equipped anytime soon to deal with absolutely any logical or linguistic curveball that authentic human intelligence can throw at it.

Avoiding the realm of science fiction, a realistic near-term solution could involve a system that returns results with attached probabilities and references. In such a system, it would ultimately fall on the user to decide which result to accept based on this additional information. This workflow would be similar to the one currently used for self-driving cars, where user supervision is needed to make critical decisions.

If you need help implementing or improving a natural language processing solution to boost your analytics and decision-making capabilities, we at Starschema are here to leverage our expertise in building NLP augmentations for Fortune 500 companies and beyond to help you make the right choices and get the most value out of them. Get in touch — we’d love to talk to you.

About the author

Bálint Kovács is a data scientist at Starschema with a background in software development. He has worked in diverse roles and projects, including as a research fellow and assistant lecturer at a top Hungarian university, a deep learning developer at a big multinational company and, currently, as a consultant data scientist. He enjoys diving deep into user data to uncover hidden insights and leverage them to create effective prototypes. Connect with Bálint on LinkedIn.