A Brief Introduction to Generative AI

Shishir Bastola
7 min read · Jun 4, 2024


Fig: GenAI

Introduction

Generative AI (GenAI) refers to a category of artificial intelligence systems designed to generate new content, such as text, images, music, or even code, based on patterns learned from existing data. These systems are trained on large datasets and leverage sophisticated algorithms to produce outputs that resemble human-created content.

The main types and applications of GenAI include:

1. Text Generation:

  • Natural Language Processing (NLP): Systems like OpenAI’s GPT-4 (Generative Pre-trained Transformer) generate coherent and contextually relevant text. These models are used for tasks like content creation, chatbots, and language translation (see the sketch after this list).
  • Content Creation: Automating the generation of articles, reports, or creative writing.

2. Image Generation:

  • Generative Adversarial Networks (GANs): These involve two neural networks (a generator and a discriminator) that work together to create realistic images. GANs are used in art creation, image enhancement, and even deepfake generation.
  • Style Transfer: Techniques to apply the style of one image (e.g., a painting) to another image.

3. Music Generation:

  • AI models can compose music by learning patterns from existing compositions, useful for creating background scores, personalized music, and aiding musicians in the creative process.

4. Code Generation:

  • Tools like GitHub Copilot use AI to assist developers by suggesting code snippets, generating boilerplate code, and even creating entire functions or classes based on descriptions.

5. Video and Animation:

  • Generating animations or editing videos by predicting and creating new frames, enhancing video quality, or even generating completely new video content.

6. Other Applications:

  • Gaming: Creating new levels, characters, or even entire games.
  • Healthcare: Generating synthetic medical data for training purposes, or creating personalized treatment plans.
  • Finance: Generating financial reports or simulating market conditions.
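
As a concrete illustration of the text-generation case, here is a minimal sketch using the Hugging Face transformers library. GPT-2 stands in for larger models like GPT-4, and the prompt is just an example:

```python
# Minimal text-generation sketch with Hugging Face `transformers`.
# GPT-2 is a small, openly available stand-in for larger models.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Generative AI can", max_new_tokens=30, num_return_sequences=1)
print(result[0]["generated_text"])
```

The same pipeline interface works for many of the applications above simply by swapping the task name and model.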

Key Technologies Behind GenAI

  1. Neural Networks: Deep learning models, particularly those using architectures like transformers (e.g., GPT, BERT), are the backbone of GenAI. These models learn complex patterns and representations from vast amounts of data.
  2. Transfer Learning: Pre-training models on large datasets and fine-tuning them for specific tasks allows for more efficient and effective generation capabilities (see the sketch after this list).
  3. Reinforcement Learning: Some generative models use reinforcement learning to improve their outputs by receiving feedback on the quality of generated content.
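
To make the transfer-learning idea concrete, here is a hedged PyTorch sketch: a backbone pretrained on ImageNet is frozen and only a new task-specific head is trained. The model choice and class count are illustrative assumptions, not prescriptions:

```python
# Transfer-learning sketch: reuse a pretrained vision backbone and
# fine-tune only a freshly added classification head.
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # pretrained backbone
for p in model.parameters():
    p.requires_grad = False                       # freeze pretrained weights
model.fc = nn.Linear(model.fc.in_features, 10)    # new head for a 10-class task
```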

Relationship to AI and ML:

Generative AI (GenAI) is a specialized subset of artificial intelligence (AI) and machine learning (ML) that focuses on creating new content rather than merely analyzing or predicting existing data. Within the broader field of AI, which encompasses any machine or system that can perform tasks requiring human-like intelligence, GenAI specifically leverages advanced ML techniques to generate text, images, music, and more.

  • AI (Artificial Intelligence): The overarching domain involving systems that mimic human cognitive functions such as learning and problem-solving. GenAI falls under this broad category.
  • ML (Machine Learning): A subset of AI where systems learn from data to make decisions or predictions. GenAI uses ML algorithms to identify patterns in data and generate new content based on those patterns.
  • Deep Learning: A specialized area of ML involving neural networks with many layers (hence “deep”). Most state-of-the-art GenAI models, like GPT (Generative Pre-trained Transformer), are built using deep learning techniques.

In essence, GenAI represents the intersection of these fields, utilizing deep learning to push the boundaries of what AI can create autonomously.

Fig: GenAI

Deep learning

Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way that is inspired by the human brain. Deep learning models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions.

Machine learning

Machine learning (ML) is a branch of artificial intelligence (AI) where computers learn from data to make decisions or predictions without explicit programming. It powers applications like recommendation systems, fraud detection, and self-driving cars.

Comparison of Deep Learning and Machine Learning

Machine learning means computers learning from data, using algorithms to perform a task without being explicitly programmed. Deep learning uses a complex structure of algorithms modeled on the human brain, which enables the processing of unstructured data such as documents, images, and text.

Fig: Comparison of Deep Learning and Machine Learning

History

Mapping Sequences

Mapping sequences can involve several types of relationships:

  1. One-to-One Mapping: A single input maps to a single output, as in image classification.
  2. One-to-Many Mapping: One input yields a sequence of outputs, as in image captioning.
  3. Many-to-One Mapping: A sequence of inputs maps to a single output, as in sentiment analysis (see the sketch after this list).
  4. Many-to-Many Mapping: Multiple elements in one sequence correspond to multiple elements in another sequence, as in language translation using RNNs, LSTMs, or GRUs.
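
As an example of the many-to-one case, here is a minimal PyTorch sketch of an LSTM sentiment classifier; the vocabulary and layer sizes are hypothetical:

```python
# Many-to-one sketch: an LSTM reads a whole token sequence and emits a
# single sentiment logit. All sizes are illustrative.
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        _, (h_n, _) = self.lstm(self.embed(tokens))
        return self.head(h_n[-1])               # one output per sequence

logits = SentimentLSTM()(torch.randint(0, 10_000, (2, 20)))
print(logits.shape)  # torch.Size([2, 1])
```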

Sequence-to-sequence Paper

The sequence-to-sequence paper (Sutskever et al., 2014) introduces a novel approach called sequence-to-sequence (seq2seq) modeling to address the limitation of fixed-length inputs and outputs in traditional sequence mapping tasks. This technique employs two distinct components: an encoder and a decoder. The encoder processes variable-length input sequences and generates a fixed-length vector representation, capturing the semantic information of the input. The decoder then uses this vector representation to generate variable-length output sequences, allowing flexibility in the output structure. By employing the encoder-decoder architecture, the seq2seq model overcomes the constraint of fixed-length input-output mapping, enabling it to handle a wide range of tasks with variable-length sequences, such as machine translation and text summarization.

Fig: Encoder and decoder
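
A minimal PyTorch sketch of this encoder-decoder idea (the vocabulary and hidden sizes are arbitrary assumptions, not values from the paper):

```python
# Minimal seq2seq sketch: a GRU encoder compresses a variable-length input
# into a fixed-length hidden state (the context vector), and a GRU decoder
# unrolls that state into the output sequence. All sizes are illustrative.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab=1000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, src, tgt):
        _, context = self.encoder(self.embed(src))   # fixed-length context
        dec_out, _ = self.decoder(self.embed(tgt), context)
        return self.out(dec_out)                     # per-step vocabulary logits

model = Seq2Seq()
src = torch.randint(0, 1000, (2, 7))   # batch of 7-token inputs
tgt = torch.randint(0, 1000, (2, 5))   # batch of 5-token outputs
print(model(src, tgt).shape)           # torch.Size([2, 5, 1000])
```

Note how the input length (7) and output length (5) differ freely; only the context vector's size is fixed.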

Context Vector

A context vector is a fixed-length representation that captures the essential information of a variable-length input sequence in natural language processing tasks like machine translation. Put simply, think of a context vector as a superhero’s special power: it takes a big, long story (like a comic book) and squishes it down into a short summary, like a mini-comic strip, so the superhero can understand it quickly and know what to do next.

Attention Is All You Need

In seq2seq models, once input text exceeded roughly 30 words, a single context vector could no longer retain all the relevant information, so attention was introduced as a solution. Attention lets each output word attend to the most relevant input words, improving the prediction of subsequent words. Google’s “Attention Is All You Need” paper, published in 2017, marked a milestone in NLP by introducing the transformer architecture, which is faster than traditional architectures like RNNs and LSTMs because it processes input in parallel rather than sequentially.

Fig: Attention

Discriminative vs Generative Models

Discriminative models focus on learning the boundary between classes in data, making direct predictions about the target variable given input features. Generative models, on the other hand, learn the joint probability distribution of both input features and the target variable, allowing them to generate new data samples.

Fig: Discriminative Vs Generative model
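
A small scikit-learn sketch of the contrast (the dataset and model choices are illustrative): logistic regression learns the decision boundary p(y | x) directly, while Gaussian naive Bayes models p(x | y) per class, which is what lets it synthesize new samples:

```python
# Discriminative vs generative sketch on a toy two-class dataset.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

disc = LogisticRegression().fit(X, y)   # learns p(y | x): the class boundary
gen = GaussianNB().fit(X, y)            # learns p(x | y) and p(y) per class

# The generative model's per-class means/variances can generate a new point.
new_x = np.random.normal(gen.theta_[0], np.sqrt(gen.var_[0]))
print(disc.predict([new_x]), new_x)
```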

Generative AI Model Training Approach

In training generative models, labelled data isn’t required. This is especially beneficial when dealing with large datasets where labelling each data point is impractical. Instead, generative models focus on understanding the underlying data distribution to capture relationships within the data. In generative AI, unstructured data is often fed to the model for training purposes, which can initially involve unsupervised learning to learn from the data’s inherent structure. Subsequently, the model can undergo fine-tuning or supervised learning with labelled data to enhance its performance in specific tasks.
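
A hedged sketch of this two-stage recipe with the Hugging Face transformers library: a backbone pretrained on raw, unlabelled text is loaded and given a fresh classification head for supervised fine-tuning (the model name and label count are illustrative):

```python
# Two-stage sketch: unsupervised pretraining happened elsewhere; here we
# attach a new 2-way classification head for supervised fine-tuning.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # backbone pretrained on raw text, no labels
    num_labels=2,         # fresh head for the downstream labelled task
)
```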

Large Language Model (LLM)

A large language model (LLM) is a powerful deep learning system that can understand and produce text in a way that’s similar to human writing. It’s particularly good at generating text with the same complexity and size as human language. LLMs are based on transformer architectures, which are well-suited for handling sequences of data like text.

LLM Classification

Some openly available LLMs include BLOOM and LLaMA (commonly run locally with tools like Ollama); others, such as PaLM, are accessed through hosted APIs.

LLM Use Cases

Here are five notable use cases of large language models (LLMs):

  1. Text Generation: LLMs are adept at generating human-like text across various domains, from creative writing to technical documentation, serving writers, marketers, and content creators alike.
  2. Translation: LLMs facilitate seamless language translation, enabling effective communication across linguistic barriers, benefiting travelers, businesses, and global communities.
  3. Chatbots and Virtual Assistants: LLMs power conversational agents, enhancing customer service, providing personalized recommendations, and automating interactions in sectors like e-commerce, healthcare, and finance.
  4. Summarization: LLMs condense large volumes of text into concise summaries, aiding researchers, students, and professionals in extracting key information from documents, articles, and research papers.
  5. Sentiment Analysis: LLMs analyze text to determine sentiment, valuable for businesses in gauging customer feedback, monitoring brand reputation, and predicting market trends in sectors like retail, hospitality, and finance.

Positional encodings in transformer

Positional encodings provide the model with information about the position of tokens in the sequence, ensuring that the model can differentiate between tokens based on their position. This is essential for tasks where word order matters, such as language translation and text generation.

Fig: Positional encodings
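
Here is a short NumPy sketch of the sinusoidal positional encoding from “Attention Is All You Need”; the sequence length and model dimension are arbitrary examples:

```python
# Sinusoidal positional encoding: each position gets a unique pattern of
# sines and cosines so the model can tell tokens apart by where they sit.
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dims: cosine
    return pe

print(positional_encoding(50, 512).shape)  # (50, 512)
```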

Attention

The attention mechanism calculates a weighted sum of the values based on the similarity between the query and key vectors. The resulting weighted sum, along with the original input, is then passed through a feed-forward neural network to produce the final output.

Fig: Attention mechanism
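
A minimal NumPy sketch of scaled dot-product attention as just described (the shapes are illustrative):

```python
# Scaled dot-product attention: scores come from query-key similarity,
# softmax turns them into weights, and the output is a weighted sum of values.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                                        # weighted sum of values

Q = np.random.rand(4, 8)   # 4 queries of dimension 8
K = np.random.rand(6, 8)   # 6 keys
V = np.random.rand(6, 8)   # 6 values
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```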

Context Window

A context window is the span of text around a target token that a large language model (LLM) can take into account at one time when interpreting input or generating output.
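
A small illustration using a Hugging Face tokenizer, where truncation enforces a deliberately tiny, purely illustrative maximum context length:

```python
# Context-window illustration: the tokenizer truncates input to a maximum
# number of tokens (4 here, just for demonstration).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
ids = tok("Generative AI models read a limited window of tokens",
          truncation=True, max_length=4)["input_ids"]
print(tok.decode(ids))  # only the first few tokens survive
```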
