ChatGPT: a revolutionary, misunderstood AI
This article is a personal opinion on ChatGPT. I’m not a data scientist, just a tech-savvy guy who tests and learns all the cutting-edge tech stuff.
The year 2022 was very rich in tech innovations, but ChatGPT was definitely the most popular one in Q4. Lots of buzzwords, articles with catchy titles, (…) a lot of noise, but is it always true and realistic?
Let me share my opinion on this revolutionary app: the pros, the cons, the fakes, and what to expect next.
The power of ChatGPT
After just a few tests, ChatGPT is totally outstanding. The conversational agent is really good, verbose, with much better English than mine (I’m a native French speaker).
Whatever the question, you immediately get an aggregation of useful information on your topic: a summary of each aspect, and you can ask it to deep dive into some part. You no longer have to browse half of the internet with Google Search to get useful information!
You can even ask it to write code samples implementing simple but also complex algorithms, like sorts or binary trees.
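As an illustration of that kind of request, here is a minimal binary search tree in Python, the sort of sample ChatGPT can produce (this sketch is mine, not an actual ChatGPT output):

```python
# Minimal binary search tree with insertion and sorted traversal.

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    """Insert value into the BST rooted at root; return the (new) root."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def in_order(root):
    """Yield the stored values in sorted order."""
    if root is not None:
        yield from in_order(root.left)
        yield root.value
        yield from in_order(root.right)

root = None
for v in [5, 3, 8, 1, 4]:
    root = insert(root, v)
print(list(in_order(root)))  # [1, 3, 4, 5, 8]
```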
In short, the conversational experience is totally new, and it makes the existing ones (Google Assistant, Alexa, Siri,…) look totally corny!
But not useless, as I will explain later.
What (Who) is threatened?
With such power, many have seen ChatGPT as a killer of many companies, and especially the largest player in internet information: Google.
You have probably seen catchy titles like “Code Red at Alphabet” or “ChatGPT: What if it was the Google Killer?”.
But personally, I think not. At least, not yet, and not in its current state. It’s a great prototype, a powerful example of what AI can achieve when we push it to the edge of our compute power and AI research.
But that’s all, and Google won’t be the first victim of ChatGPT.
Note that some companies have chosen to switch to ChatGPT and use it as a core product/assistant. Basing a business on a prototype is totally suicidal!
The ChatGPT expertise
An expert or an analyst, very popular on news channels, has to know their topic deeply. For that, they have to browse the internet (with Google Search, for example, or other sources, like books in libraries), read dozens or hundreds of articles, extract the interesting parts, and release a report that makes sense for the current questions and context.
That skill requires a lot of knowledge, years of learning and investigation, to know even the lesser-known details and stories. The Ukrainian crisis is a good example: there is the conflict, but, oblast by oblast, there are local stories that are important to know to better understand the stance of each side.
That kind of expertise and analysis is the most threatened by ChatGPT. Knowledge that takes years to acquire, reports that take long to produce: you now have a solution that gives the same result in a few questions and a few minutes! …or almost…
All experts have biases
In addition to human slowness, experts have biases, also called “opinions”.
When you watch political analysts, you obviously know whether they are Democrat or Republican, and their conclusions and arguments favor one side or the other.
The same goes for sports, religion, and other opinionated topics. Experts have biases because they are humans with emotions, good or bad, which leads to preferences.
I personally am a big fan of Google, an expert on Google Cloud, and I have a huge bias about Google technology!
Is ChatGPT an expert without bias?
Sadly not: ALL experts have biases, ChatGPT included. And that’s not a problem when the bias is known and publicly shared.
For ChatGPT, the bias is not related to emotion, a preferred candidate, a sports team or a country. AI is emotionless, and the bias doesn’t come from there.
As with any kid, the bias comes from how it grows and what it learns. For an AI, the source (and quality) of the data shapes its understanding of the world!
Coming back to the Ukrainian war: Putin is no more or less intelligent than Biden; he grew up in a different context, in the same reality as us, but with another point of view, another opinion on the world.
My point is: if ChatGPT had been built by China or Russia, its answers would be totally different on topics that oppose the way of life in those parts of the world.
In addition, because ChatGPT is emotionless, it’s impossible for it to produce its own judgment of a song, movie or artwork. ChatGPT can aggregate reviews of a work and detect patterns (“that furniture is Gothic style”), but it can’t tell by itself whether a song is catchy or a movie is thrilling or boring.
ChatGPT’s Achilles heel
As with any AI, the main problem to resolve is the data: data quality, data quantity, data biases, data completeness,…
That’s the first problem with ChatGPT: transparency!
I don’t know which data sources ChatGPT was trained on, or how those sources were curated, cleaned and validated.
So, I don’t know the biases of ChatGPT!
Source of data
I don’t know what the data sources are, but whatever they are, there are several issues:
- If the sources are too limited, ChatGPT’s answers will be too narrow. Worse, you will generate a single way of thinking, omitting every variation and nuance of history or information. No better than any dictator!
- If the sources are too broad, the curation process will take a very long time, and you will also collect wacky theories that degrade ChatGPT’s performance.
- If a source of information is omitted, for example the darknet, you could miss something.
Indeed, during the Ukrainian conflict, I heard experts saying that a list of spies had been released on the darknet. If ChatGPT ignores that source, its analysis is partial, and less trustworthy than a human’s!
Data quality is of course paramount. And here comes the issue of fake news: how did the data source handle it?
- Use a pre-processing algorithm to detect the fakes and remove them from the training set? Not sure that’s fast enough; sometimes it takes days, months or more to prove a fake!
- Use a limited and trusted source of data, to be sure not to be polluted by fakes.
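The first option can be sketched very naively. Assuming a curated blocklist of already-debunked claims (a huge simplification of real fact-checking, which, as noted above, can take months), a pre-filter could look like this:

```python
# Naive sketch of a fake-news pre-filter for a training corpus.
# The DEBUNKED blocklist is a toy assumption; real fact-checking is far
# slower and harder than a substring match.

DEBUNKED = {
    "the earth is flat",
    "vaccines contain microchips",
}

def is_suspect(document: str) -> bool:
    """True if the document repeats a known-debunked claim."""
    text = document.lower()
    return any(claim in text for claim in DEBUNKED)

def filter_corpus(corpus):
    """Keep only documents that match no known-debunked claim."""
    return [doc for doc in corpus if not is_suspect(doc)]

corpus = [
    "Eratosthenes measured the circumference of the Earth.",
    "BREAKING: scientists confirm the Earth is flat!",
]
print(len(filter_corpus(corpus)))  # 1
```

Even this toy version shows the weakness: a claim not yet on the blocklist sails straight into the training set.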
In addition to the limitation mentioned above (a single way of thinking), there are many examples in history proving that even experts and the scientific community have been fooled by clever fake publications.
So, how can we be sure no fake news remains in the training dataset? And if some does, what is the mechanism to remove that wrong learning from the system?
It’s a really hard challenge for the OpenAI researchers!
Freshness of data
The training data of ChatGPT currently ends in 2021. We are in 2023. That’s bad!
My guess is that this is a consequence of the fake-news handling and data curation process. Those operations can’t keep up with real time, which creates gaps like this one.
One consequence concerns the use of ChatGPT as a personal assistant: if the data is outdated, your assistant is outdated too and, thus, useless!
Google Search scans the whole internet every day to rebuild its index and provide the best possible information, including false information (i.e. fake news).
The indexation of fake news is mitigated by AI algorithms and the search engine’s ranking, but it’s not fully bulletproof!
It’s always a matter of tradeoff!
Hierarchy and importance of information
Finally, because ChatGPT is a powerful tool to aggregate and summarize information, the question of similarity, ranking and sorting is legitimate.
How is the ranking of information performed?
Google Search got us used to algorithms such as PageRank and its successors, which interpret/guess the importance and trustworthiness of a website: the better the score, the higher the website appears in the search results.
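As a reminder of the idea behind that ranking, here is a minimal PageRank sketch in Python. The link graph and the classic 0.85 damping factor are purely illustrative, nothing like Google’s real, far more elaborate system:

```python
# Minimal PageRank: power iteration over a tiny link graph.

DAMPING = 0.85

def pagerank(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - DAMPING) / n for p in pages}
        for page, outgoing in links.items():
            if not outgoing:  # dangling page: spread its rank evenly
                for p in pages:
                    new_rank[p] += DAMPING * rank[page] / n
            else:
                share = DAMPING * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
        rank = new_rank
    return rank

graph = {
    "a.com": ["b.com", "c.com"],
    "b.com": ["c.com"],
    "c.com": ["a.com"],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))  # c.com, the most-linked-to page
```

A page is important if important pages link to it: that circular definition is exactly what the iteration resolves.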
But now, with ChatGPT, how is this ranking/sorting/aggregation done?
- If all the data sources have the same importance, ChatGPT would just be an evolution of Wikipedia, a simple listing of what exists in the world.
- If the data sources are ranked, what is the process? Who validated it?
Is it a question of usage? Number of citations? Number of reuses? Links to the article?
I would like to remind you that for centuries EVERYBODY on earth thought the earth was flat. And it’s not because everybody believed it that it was true!
And then there is the question of minority reports: maybe useless at one point in time, even considered fake news, but finally proven true and correct.
Think about Galileo and Eratosthenes!
However, if that kind of information is never considered, mentioned or listed, ChatGPT becomes totally useless and presents only a poor and demagogic version of reality. It’s even a danger for a researcher who wants to know the state of the art of a specific domain.
In that case, Google Search is much more relevant!
Transparency and explainability
In the end, the ChatGPT black box is my main pain point.
- The training dataset is obscure and unclear. What has been learned by the AI? What are its biases?
- The use of the information, its ranking, its sorting by importance, all lack explainability.
I would love to have links to the source documents ChatGPT used when an answer is generated, like the “References” section of a Wikipedia article. The data sources, their ranking, and all that hidden information would be revealed and would help us trust this new tool.
In fact, everything that would help me know and understand the biases of ChatGPT!
And Google in that picture?
Because I’m a Google fan, I can’t omit that part!
Most of the catchy titles of ChatGPT articles predict the end of Google! And yes, the question is legitimate: Google’s motto is to organize the world’s information.
ChatGPT goes a step further, by analyzing and summarizing the world’s information.
But I don’t see an opposition here. Think about the data source:
where does ChatGPT get its training dataset from?
I’m pretty sure Google Search is in the loop. I can’t imagine a company reproducing the huge job done by Google over the past 20 years just to train an AI prototype.
In that sense, ChatGPT enhances the job done by Google, and downgrades Google to a simple data provider.
And here I see more a symbiosis option than an open conflict.
- If ChatGPT really uses Google Search data as a data source, a weaker Google will lead to weaker data, lower data quality and lower ChatGPT performance. A lose-lose situation.
In addition, Google has many tools and products that can enhance ChatGPT usage: Speech-to-Text, Text-to-Speech, translation, Android integration,…
- On the contrary, if ChatGPT scans the whole internet itself, there will be a deep conflict for the user: which source to trust? Who holds the truth?
Explainability and transparency will be the key here, and Google leads the way for now!
A data war, or war in general, never leads to a better world!
- Another consideration is the possibility that Google creates a similar AI, as powerful as ChatGPT, maybe better, and continues the story alone!
It’s also a symbiosis option, but with another, similar AI.
In addition, for now, only Google Search offers to list all the sources and lets you go directly to them, to read the original text without aggregation or interpretation. A trustworthy way to get the exact truth!
In fact, the true power of ChatGPT has been totally hidden by its ability to aggregate tons of data and generate wise answers from them!
The true power is the generative part!
Generative? Yes, the ability to chat seamlessly with any user; to generate text, a poem or even code of very high quality. You can even provide constraints to comply with an author’s style or a specific programming language!
And I have to admit that I feel like I’m in a science-fiction movie. Plug a Speech-to-Text and a Text-to-Speech (made by Google?) as the input and output of ChatGPT, and you are in the future!
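That science-fiction pipeline can be sketched in a few lines. The functions `transcribe`, `chat` and `synthesize` below are hypothetical stubs standing in for a Speech-to-Text service, ChatGPT, and a Text-to-Speech service; only the wiring is the point:

```python
# Hypothetical voice-assistant pipeline: voice in, voice out.
# All three services are placeholder stubs, not real API calls.

def transcribe(audio: bytes) -> str:
    # Stand-in for a Speech-to-Text API call
    return "What is the tallest mountain?"

def chat(prompt: str) -> str:
    # Stand-in for a call to a conversational model such as ChatGPT
    return f"You asked: {prompt!r} (the model's answer would go here)"

def synthesize(text: str) -> bytes:
    # Stand-in for a Text-to-Speech API call
    return text.encode("utf-8")

def assistant_turn(audio_in: bytes) -> bytes:
    """One turn of the assistant: microphone audio to speaker audio."""
    prompt = transcribe(audio_in)
    answer = chat(prompt)
    return synthesize(answer)

audio_out = assistant_turn(b"...microphone samples...")
print(audio_out.decode("utf-8"))
```

The glue is trivial; the hard parts are the three services themselves, which is exactly why a symbiosis with existing speech products makes sense.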
In that sense, Google Assistant (and not Google Search) is the real Google product in danger.
And even that is not entirely true: the number of languages supported by Google Assistant is truly incredible, and it won’t be easy to match!
What if ChatGPT was an advantage for Google?
Google, i.e. Alphabet, is a sprawling company. Google Cloud offers resources with dedicated hardware (TPUs) to train large AI models.
ChatGPT, with its hundreds of billions of parameters, requires a huge quantity of resources to work.
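A back-of-the-envelope calculation shows why. The 175-billion-parameter figure often cited for GPT-3 and the 2-byte half-precision weight format are assumptions for illustration:

```python
# Rough memory estimate for just storing the weights of a large model.
# 175e9 parameters and fp16 (2 bytes/param) are illustrative assumptions.

params = 175e9          # assumed parameter count
bytes_per_param = 2     # half-precision (fp16) weights
weights_gb = params * bytes_per_param / 1e9
print(f"Weights alone: {weights_gb:.0f} GB")  # Weights alone: 350 GB
```

350 GB for the weights alone, before activations, optimizer state or serving redundancy: far beyond any single machine, hence the need for dedicated infrastructure.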
The combination of the two, for storing the dataset and/or training and/or serving the model, could be a benefit for Google!
Even though ChatGPT looked like a threat to Google just before!
A final challenge
In the end, there is a final challenge for ChatGPT. As mentioned just before, ChatGPT requires a huge quantity of resources, and the question of the business model will come at some point.
Google bases its business model on ads: when you search for something, ads are added as top results to invite you to click on them.
For better results (and higher billing by Google to advertisers), personalization is key. Google knows you: your search history, your emails, your smartphone, your trips on Google Maps,…
What about the business model of ChatGPT?
You could imagine doing as Google does and adding ads to the data aggregation and summary.
However, that makes no sense and would degrade the experience.
Otherwise, the power of ChatGPT is its conversational agent: you could use it as a personal assistant, to book a flight for instance.
However, there are also many challenges here:
- If you add ads to the result, what will ChatGPT do: book the best flight or follow the ad? Maybe it could ask whether you want to take advantage of the promotion, but I doubt the real impact.
- To propose a truly personalized experience, the assistant must know you: what you prefer (economy or business class, your preferred airport and airline,…). Google already knows all that, and has for 20 years now!
It will be an impossible challenge for ChatGPT to gather as much data as Google already has. The symbiosis option is much more relevant here.
- A last point, not related to the business model (even though it would be costly to deploy): a personal assistant needs near-real-time data to propose the best options to its user. What are the flights? The options? The prices?
That is still a challenge for ChatGPT today.
The business model is not clear for ChatGPT today. Maybe the era of ads is over and something new is required for this disruptive solution.
ChatGPT is a wonderful creation and I love its breakthrough in the current market. The usual competitors, such as Google (Assistant), are jostled and forced to act, or die.
Die? Not immediately, but it’s a real game changer and a catalyst that will speed up the development and deployment of similar solutions.
Today, ChatGPT is still a prototype, with many aspects to fix before reaching a commercial and mature version:
- Data source transparency
- Explainability of the reports/answers
- Data quality and freshness
- Business model
- Openness to other languages and other user interactions (voice)
The true change is not about the content, the aggregation, or the data summaries. It’s about the conversational experience with the machine.
I have been happy to have a human-like interaction with a machine and to chat naturally with it!
But, in the end, wasn’t that the true purpose of ChatGPT? Chat, Generative Pre-trained Transformer.