Is AI-Generated Information Reliable or Not?

AI: The Pathway to the Future Being Slowly Built

Ritvik Nayak
Data Storytelling Corner
7 min read · Jul 2, 2024

We can all agree that AI has come a long way since its beginnings in the early 1950s. In just 74 years, humanity has gone from the Turing Test, a thought experiment proposed by Alan Turing for judging whether a machine can exhibit intelligent behaviour, and ELIZA, a program that could respond to typed dialogue but was unfortunately restricted to a script, all the way to Waymo, the self-driving car project that began at Google, and even the Metaverse, an entire virtual universe powered by AI and VR. Alan Turing would truly be proud.

Image by: https://www.zabala.eu/news/artificial-intelligence-and-consultancy/

Recent Mishaps with AI

Although AI has advanced significantly in areas such as Natural Language Processing, Computer Vision, and Data Analysis, it is still not always reliable. For instance, in 2018, an Uber self-driving test car struck and killed a pedestrian in Tempe, Arizona, after the AI system failed to identify her as a hazard and did not stop the car quickly enough. The death of Elaine Herzberg, the woman the car struck, is a reminder that while AI has come a long way, human supervision is a must in real-world scenarios.

In another instance, a New Jersey man was wrongfully arrested because of a false match from an AI facial recognition system used by the police. The police relied solely on the AI system's match and did not verify it through further investigation, showing that AI should assist human decision-making rather than replace it.

How do AI Systems Generate Information and Data?

AI systems generate information from user input using a variety of methods and techniques.

Data Acquisition and Preprocessing

AI systems require large amounts of data to learn from. They are trained on data from various sources, such as images, text, sensor readings, and user interactions, which can also be used to improve future responses, so be careful what you share with AI. Collecting this data is called data acquisition. The raw data is then cleaned of noise and errors that would make it inaccurate and put into a consistent format, a step called preprocessing, which helps make the AI system's future responses more reliable.
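To make the cleaning step a little more concrete, here is a minimal sketch in plain Python. The record layout (a `text` field and a `label` field), the sample rows, and the `clean_records` helper are all made up for illustration; real pipelines are far more involved.

```python
def clean_records(raw_records):
    """Drop incomplete or duplicate rows and normalise the text field."""
    seen = set()
    cleaned = []
    for record in raw_records:
        text = record.get("text")
        label = record.get("label")
        # Skip rows where a field is missing.
        if text is None or label is None:
            continue
        # Normalise whitespace and letter case so equal texts compare equal.
        text = " ".join(text.split()).lower()
        if not text:
            continue
        # Skip rows that duplicate one we have already kept.
        key = (text, label)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"text": text, "label": label})
    return cleaned


# Hypothetical raw data containing typical noise.
raw = [
    {"text": "  The cat SAT  ", "label": "animal"},
    {"text": "the cat sat", "label": "animal"},   # duplicate after normalising
    {"text": None, "label": "animal"},            # missing text
    {"text": "A red car", "label": None},         # missing label
    {"text": "a red car", "label": "vehicle"},
]

cleaned = clean_records(raw)
print(cleaned)
```

Of the five raw rows, only two survive: one copy of the cat sentence and the labelled car sentence.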

Machine Learning

Supervised learning is a branch of machine learning that focuses on training algorithms to make predictions using labelled datasets. In a supervised learning system, the algorithm is given a labelled dataset in which each data point consists of an input and the corresponding correct output (the label). The algorithm processes this data to learn a mapping from inputs to outputs, so that it can predict the correct output for a given input.

After this learning stage, the algorithm is given another labelled dataset that it did not see during learning. This new dataset is called the test dataset. For each input, the algorithm predicts what it thinks the correct output is, based on the mapping it created in the learning stage. The predicted outputs are then compared with the test dataset's true labels to evaluate the algorithm's performance.

A Representation of a Supervised Learning System. Image by: https://www.geeksforgeeks.org/supervised-unsupervised-learning/
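The learn-then-test cycle described above can be sketched in a few lines of plain Python. This uses a 1-nearest-neighbour classifier, chosen only because it is short; the measurements and labels are invented for illustration.

```python
def predict(train_inputs, train_labels, x):
    """1-nearest-neighbour: return the label of the closest training input."""
    distances = [
        sum((a - b) ** 2 for a, b in zip(point, x)) for point in train_inputs
    ]
    return train_labels[distances.index(min(distances))]


# Learning stage: labelled (height_cm, weight_kg) examples. Values are made up.
train_inputs = [(20, 4), (25, 5), (30, 6), (60, 25), (65, 30), (70, 32)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

# Test stage: labelled data the algorithm has not seen before.
test_inputs = [(22, 4), (28, 6), (62, 27), (68, 31)]
test_labels = ["cat", "cat", "dog", "dog"]

# Predict each test input, then compare against the true test labels.
predictions = [predict(train_inputs, train_labels, x) for x in test_inputs]
correct = sum(p == t for p, t in zip(predictions, test_labels))
accuracy = correct / len(test_labels)
print(predictions, accuracy)
```

The key point is the last step: the predictions are scored against the test labels, which the algorithm never saw while learning.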

Unsupervised learning is another branch of machine learning. In an unsupervised learning system, the algorithm is given an unlabelled dataset and tries to find patterns, relationships, and groupings on its own, with little or no instruction about what the correct output should be.

A Representation of an Unsupervised Learning System. Image by: https://www.geeksforgeeks.org/ml-types-learning-part-2/
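As a rough illustration of finding groups without labels, here is a minimal sketch of k-means clustering on one-dimensional data. K-means is just one common unsupervised method, and the data points and starting centroids below are assumptions picked for the example.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its old centroid.
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Unlabelled data: nobody tells the algorithm there are "small" and "large" values.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
print(centroids, clusters)
```

The algorithm ends up with one group of small values and one group of large values, even though no labels were ever provided.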

Reinforcement Learning is a branch of machine learning that trains an algorithm to make the best decisions in a given scenario through trial and error: actions that move it towards the objective are rewarded (reinforced), while actions that do not are penalised. In Reinforcement Learning, the algorithm is called the Agent, and the Environment is the external system that the Agent interacts with.

Image by: https://techvidvan.com/tutorials/reinforcement-learning/
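Here is a minimal sketch of the Agent and Environment idea using tabular Q-learning, one well-known reinforcement learning method. The five-cell corridor, the reward of 1 at the goal, and the learning settings are all assumptions made for this example. To keep it short, the Agent explores with purely random actions; Q-learning can still work out the best action for each cell from that experience.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)


def step(state, action_index):
    """The Environment: move the Agent and return (next_state, reward)."""
    next_state = min(max(state + ACTIONS[action_index], 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    # q[state][action] estimates how good each action is in each cell.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            action = rng.randrange(2)                 # trial and error
            next_state, reward = step(state, action)
            # Q-learning update: nudge the estimate towards
            # reward + discounted value of the best next action.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy: in each non-goal cell, pick the action with the higher value.
policy = ["right" if right > left else "left" for left, right in q_table[:-1]]
print(policy)
```

After training, the greedy policy steps right in every cell, because moving right is what eventually earns the reward.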

Natural Language Processing (NLP)

Some models can process, understand, and even generate text in many human languages because they are trained on text from languages across the world. A notable example is OpenAI's GPT series of large language models, which powers ChatGPT.

Sequence-to-Sequence Models: Sequence-to-Sequence models, or Seq2Seq models for short, are a type of NLP model that takes a sequence of words or characters as input, processes it, and generates another sequence as output. They have 2 main components: the encoder and the decoder.

The encoder is the component of a Seq2Seq model that reads the input sequence and compresses it into a context vector. Encoders are commonly built from Recurrent Neural Networks (RNNs) or their variants, Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks.

The decoder is the second and final component of a Seq2Seq model. It generates the output sequence, one element at a time, from the context vector produced by the encoder.
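To show how data flows through the encoder and decoder, here is a minimal sketch of a tiny recurrent Seq2Seq forward pass in plain Python. Everything in it is an assumption for illustration: the five-word vocabulary, the hidden size of 4, and above all the weights, which are random and untrained, so the output words are meaningless. Only the flow from input sequence to context vector to output sequence is the point.

```python
import math
import random

HIDDEN = 4
VOCAB = ["<start>", "hello", "world", "bonjour", "monde"]  # toy vocabulary
rng = random.Random(0)


def random_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def one_hot(token):
    return [1.0 if word == token else 0.0 for word in VOCAB]


def rnn_step(w_in, w_hidden, token, hidden):
    """One recurrent update: new_hidden = tanh(W_in * x + W_hidden * hidden)."""
    from_input = matvec(w_in, one_hot(token))
    from_hidden = matvec(w_hidden, hidden)
    return [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]


# Untrained, random weights (a real model would learn these from data).
enc_in, enc_hidden = random_matrix(HIDDEN, len(VOCAB)), random_matrix(HIDDEN, HIDDEN)
dec_in, dec_hidden = random_matrix(HIDDEN, len(VOCAB)), random_matrix(HIDDEN, HIDDEN)
dec_out = random_matrix(len(VOCAB), HIDDEN)


def encode(tokens):
    """Read the input one token at a time; the final state is the context vector."""
    hidden = [0.0] * HIDDEN
    for token in tokens:
        hidden = rnn_step(enc_in, enc_hidden, token, hidden)
    return hidden


def decode(context, length):
    """Start from the context vector and emit one token per step (greedy choice)."""
    hidden, token, output = context, "<start>", []
    for _ in range(length):
        hidden = rnn_step(dec_in, dec_hidden, token, hidden)
        scores = matvec(dec_out, hidden)
        token = VOCAB[scores.index(max(scores))]
        output.append(token)
    return output


context = encode(["hello", "world"])
generated = decode(context, length=2)
print(context, generated)
```

However long the input is, the encoder always hands the decoder a context vector of the same fixed size, which is what "compresses" means here.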

Computer Vision

Convolutional Neural Networks

Convolutional Neural Networks or CNNs are a type of neural network used particularly for pattern recognition, image analysis, and object detection. So how do they work? Well, in essence, a CNN processes an image by applying a series of convolutional layers, which act like filters that each try to detect a specific feature, such as an edge, a texture, or part of an object. The outputs of each convolutional layer are then passed through an activation function, which introduces non-linearity. Finally, pooling layers downsample these outputs, reducing dimensionality while preserving the most important features.
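The three steps above (convolution, activation, pooling) can be sketched in plain Python. The 6x6 image and the hand-written vertical-edge filter are made up for illustration; in a real CNN the filter values are learned during training, and ReLU is just one common choice of activation function.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def relu(feature_map):
    """Activation function: replace negative values with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep only the largest value in each size x size block."""
    return [
        [
            max(feature_map[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# A 6x6 image with a bright vertical stripe in the middle (made-up data).
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]
# A filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = convolve2d(image, kernel)   # 4x4
activated = relu(feature_map)             # negatives removed
pooled = max_pool(activated)              # 2x2
print(feature_map[0], activated[0], pooled)
```

Each row of the feature map is `[3, 3, -3, -3]`: positive where the image goes from dark to bright, negative where it goes from bright to dark. ReLU keeps only the positive responses, and pooling shrinks the 4x4 map to `[[3, 0], [3, 0]]` while still recording that the edge is on the left.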

Generative Adversarial Networks

In my opinion, Generative Adversarial Networks or GANs are the best type of neural network, used in Text-to-Video Generation, Video Synthesis, Art Creation, etc. They consist of 2 neural networks, the Generator and the Discriminator. The Generator's purpose is to turn random input noise into data samples that resemble the training data. The objective of the Generator is to try to fool the Discriminator.

The Discriminator is the second neural network in the GAN, and it tries to distinguish between the real training data and the data produced by the Generator. It takes in both the Generator's data and the real data and, for each sample, outputs the probability that the sample is real.

So, in the beginning, the Generator tries to create data similar to the training data from the random noise it is given, its objective being to make the Discriminator guess incorrectly. The Generator's data is then fed into the Discriminator, along with real data. The Discriminator is not told which samples are which, so it has to work that out from the data itself. It processes each sample and, finally, outputs the probability that the sample is real.

These outputs are then compared with the truth, and both networks are updated. Whenever the Discriminator labels a sample incorrectly, it is adjusted to become more accurate; whenever the Generator fails to fool the Discriminator, it is adjusted to produce more convincing data. This cycle is repeated many times, making training a sort of win-or-lose game between the two networks.

Image by: https://www.linkedin.com/pulse/exploring-fascinating-realm-generative-adversarial-networks-kaurav
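The win-or-lose game can be made concrete by looking at how each network is scored. Below is a minimal sketch of one common way to write the two losses (binary cross-entropy for the Discriminator, and the so-called non-saturating loss for the Generator). It is an assumption that a given GAN uses exactly these formulas, and the probabilities are invented; no actual networks are trained here.

```python
import math


def discriminator_loss(real_probs, fake_probs):
    """Low when real samples score near 1 and generated samples score near 0."""
    real_term = -sum(math.log(p) for p in real_probs) / len(real_probs)
    fake_term = -sum(math.log(1 - p) for p in fake_probs) / len(fake_probs)
    return real_term + fake_term


def generator_loss(fake_probs):
    """Low when the Discriminator scores generated samples near 1 (it is fooled)."""
    return -sum(math.log(p) for p in fake_probs) / len(fake_probs)


# Made-up Discriminator outputs: probability that each sample is real.
real_probs = [0.9, 0.95]

# Early in training: the Discriminator easily spots the fakes.
fakes_spotted = [0.1, 0.05]
# Later in training: the Generator's samples look convincing.
fakes_convincing = [0.8, 0.9]

d_early = discriminator_loss(real_probs, fakes_spotted)
g_early = generator_loss(fakes_spotted)
d_late = discriminator_loss(real_probs, fakes_convincing)
g_late = generator_loss(fakes_convincing)
print(round(d_early, 3), round(g_early, 3), round(d_late, 3), round(g_late, 3))
```

When the Generator's samples become convincing, its loss drops while the Discriminator's loss rises: one network's win is the other's loss.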

New Information Generation

Some AI models are trained to generate new information similar to the training data. GANs are one of these models as they can generate images, videos, or even sounds similar to training data.

So is AI Reliable?

So is AI reliable? Well, AI is reliable, but not always. AI reliability depends on various factors, such as the data it is given, its initial training, ongoing monitoring, and human oversight. So that isn't necessarily the question. The question we should be asking is 'Do humans supervise AI enough for it to give accurate results?' Unfortunately, for now, the answer to this question is no. AI is not even a century old yet, and humans are already relying on it completely for some tasks, resulting in major mishaps and accidents like the unfortunate death of Elaine Herzberg. Let me be clear: I am NOT saying that AI is wrong. I am just stating that, for now, maybe AI should be more of an assistant rather than performing tasks entirely on its own.

But perhaps, in the near future, with its growing advancements, AI can lead the world in the ever-sought-after quest for unprecedented innovation and problem-solving capabilities.

Ritvik Nayak
Data Storytelling Corner

International Math Olympiad Gold Medalist | Programmer & Software Developer | AI, ML, Astrophysics, Quantum Computing, and Mathematical Researcher |