Exploring LLMs: A Collection of My Articles ❤️

JAIGANESAN
5 min read · Jun 25, 2024


Dive into the intricate world of large language models with in-depth articles on their architectures, MoE, and RAG. Discover more by exploring the links below.

Photo by Wes Hicks on Unsplash

What to Expect from My Articles? 👋

In my articles, I cover a range of topics, from the basics to advanced architectures: how they work, code for these concepts, and the mathematical representations behind them. Together, they offer a comprehensive visual journey into the world of AI (LLMs and NLP).

If you’re looking to improve your understanding of Large Language Models (LLMs) and Natural Language Processing (NLP), I highly recommend checking out my articles. They’re worth your time, and I’m confident they’ll help you grasp complex concepts easily. Keep in mind that my writing style might not be perfect in some places, but my goal is to make complicated ideas simple to understand.

All of these articles are published under the Towards AI publication. 👽

Note: I’ll be publishing new content regularly and will add links to new articles here as they become available. Be sure to check back for updates and to continue learning about the latest developments in NLP and LLMs.

1. Large Language Model (LLM)

In this article, I’ll take you through the key components of Large Language Models, including:

📌 Word Embeddings: how words are represented as vectors

📌 Self-Attention: how the model focuses on specific parts of the input

📌 Multi-Head Attention: several self-attention heads running in parallel to capture different kinds of relationships

📌 Feed Forward Network: a crucial layer in the model’s architecture

📌 Linear Layer and Softmax: how the model makes predictions

📌 Inference Mechanism: how the model generates output from input

By the end of this article, you’ll have a solid understanding of these fundamental concepts in Large Language Models.
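To give a concrete flavor of the bullet points above, here is a minimal NumPy sketch of scaled dot-product self-attention. This is my own toy illustration, not code from the article, and the shapes and variable names are assumptions made purely for demonstration:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Toy scaled dot-product self-attention for a single head.
    X: (seq_len, d_model) word embeddings; W_q/W_k/W_v: (d_model, d_head) projections."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # project embeddings into queries, keys, values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mixture of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, embedding size 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)        # (4, 8)
```

Multi-head attention simply runs several such heads with independent projection matrices and concatenates their outputs before the feed-forward network.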

2. BERT (Bidirectional Encoder Representations from Transformers)

In this article, I have explored the following key concepts:

📌 Word Embeddings: how machines capture the meaning and context of words
📌 Position Embeddings: why the position of a word in a sentence matters
📌 Masked Language Model Task: the cloze-style task that BERT was pre-trained on
📌 Self-Attention and Multi-Head Attention: how the model focuses on the input data
📌 Feedforward Networks, Linear Layers, and Softmax: essential learning components of language models

By the end of this article, you’ll have a deep understanding of the BERT architecture and its components, as well as practical knowledge of how to work with this powerful model.
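As a quick taste of the masked language model task mentioned above, the Hugging Face transformers library (my own example, not code from the article) can ask a pre-trained BERT to fill in a [MASK] token:

```python
from transformers import pipeline

# The "fill-mask" pipeline uses BERT's masked-language-model head:
# it ranks candidate tokens for the [MASK] position.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("The capital of France is [MASK]."):
    print(f"{pred['token_str']:>10}  score={pred['score']:.3f}")
```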

3. Mistral 7B

In this article, I have explored the architecture of Mistral 7B, a powerful language model. Specifically, I have covered:

📌 The overall architecture of Mistral

📌 Relative Positional Embeddings: a technique for encoding word positions

📌 Rotary Positional Embeddings: another approach to positional encoding

📌 Self-Attention: how the model focuses on specific parts of the input

📌 Multi-Head Attention: several self-attention heads running in parallel to capture different kinds of relationships

📌 KV Cache: caching attention keys and values so inference doesn’t recompute them for every new token

📌 Sliding Window Attention: a technique for processing long input sequences

📌 KV Cache and Inference in Mistral 7B: how these components work together

📌 Calculating Parameters in Mistral 7B: a step-by-step guide

By the end of this article, you’ll have a deep understanding of the Mistral architecture and its components, as well as practical knowledge of how to work with this powerful model.
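To make the KV cache idea more tangible, here is a toy single-head decoding loop in NumPy. It is my own simplified sketch, not Mistral’s implementation; multi-head attention, rotary embeddings, and the rolling-buffer details are omitted:

```python
import numpy as np

class KVCache:
    """Toy key/value cache: keys and values of already-processed tokens
    are stored once and reused at every new decoding step."""
    def __init__(self, d_head):
        self.keys = np.empty((0, d_head))
        self.values = np.empty((0, d_head))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def decode_step(x_new, W_q, W_k, W_v, cache):
    """One autoregressive step: only the newest token is projected,
    then attended against all cached keys and values."""
    q = x_new @ W_q
    cache.append(x_new @ W_k, x_new @ W_v)
    scores = q @ cache.keys.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache.values

d = 8
rng = np.random.default_rng(1)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
cache = KVCache(d)
for _ in range(5):                        # 5 decoding steps
    out = decode_step(rng.normal(size=(d,)), W_q, W_k, W_v, cache)
print(cache.keys.shape)                   # (5, 8): the cache grows by one row per generated token
```

Sliding window attention additionally caps how many past rows are kept, which is what keeps Mistral’s memory bounded on long sequences.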

4. Mixture of Experts (MoE) and Sparse Mixture of Experts (SMoE)

In this article, I have explored the concept of a Mixture of Experts, a powerful technique in Generative AI. You’ll learn about:

📌 Mixture of Experts (MoE): a method for combining the strengths of multiple feed-forward networks (FFNs)

📌 Sparse Mixture of Experts (SMoE): an efficient variant that reduces the computational cost

This article is highly recommended, as it provides a clear and concise explanation of these complex concepts. By the end of this article, you’ll have a solid understanding of Mixture of Experts and its efficient variant, Sparse Mixture of Experts.
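For readers who like to see routing before reading about it, here is a toy top-k gating sketch in NumPy. It is my own illustration under simplifying assumptions: real SMoE layers route whole batches of tokens, use learned experts, and add load-balancing losses:

```python
import numpy as np

def sparse_moe_layer(x, experts, W_gate, k=2):
    """Toy sparse MoE: a router scores all experts, but only the
    top-k experts actually run for this token.
    x: (d_model,) one token's hidden state
    experts: list of callables, each standing in for a small FFN
    W_gate: (d_model, num_experts) router weights"""
    logits = x @ W_gate
    top_k = np.argsort(logits)[-k:]                             # indices of the k best experts
    gate = np.exp(logits[top_k]) / np.exp(logits[top_k]).sum()  # softmax over the chosen experts only
    return sum(g * experts[i](x) for g, i in zip(gate, top_k))

d, num_experts = 8, 4
rng = np.random.default_rng(2)
# each "expert" is just a random linear map standing in for an FFN
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(num_experts)]
W_gate = rng.normal(size=(d, num_experts))
print(sparse_moe_layer(rng.normal(size=(d,)), experts, W_gate).shape)  # (8,)
```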

5. Fine-Grained Experts and Shared Expert Isolation

In this article, I have explored advanced and efficient variants of a Mixture of Experts (MoE), including:

📌 Fine-Grained Experts: a technique for improving expert specialization

📌 Shared Expert Isolation: always-active experts that capture common knowledge; like fine-grained experts, this was introduced by DeepSeek researchers

These variants tackle two major challenges in MoE: knowledge redundancy and knowledge hybridity. By the end of this article, you’ll understand how these advanced methods overcome these limitations, enabling more effective and efficient MoE models.
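As a rough sketch of how these two ideas fit together (again my own toy code under assumed shapes, not the DeepSeek implementation): a few shared experts run for every token to hold common knowledge, while many small, fine-grained experts are routed sparsely:

```python
import numpy as np

def moe_with_shared_experts(x, shared_experts, routed_experts, W_gate, k=2):
    """Toy sketch: shared experts are always active (isolating common
    knowledge), while fine-grained routed experts are picked top-k."""
    out = sum(e(x) for e in shared_experts)              # always-on shared experts
    logits = x @ W_gate                                   # router scores the routed experts only
    top_k = np.argsort(logits)[-k:]
    gate = np.exp(logits[top_k]) / np.exp(logits[top_k]).sum()
    return out + sum(g * routed_experts[i](x) for g, i in zip(gate, top_k))

d = 8
rng = np.random.default_rng(5)
make_expert = lambda: (lambda x, W=rng.normal(size=(d, d)): x @ W)   # stand-in for a small FFN
shared = [make_expert()]                                  # one always-active shared expert
routed = [make_expert() for _ in range(8)]                # eight small routed experts
W_gate = rng.normal(size=(d, len(routed)))
print(moe_with_shared_experts(rng.normal(size=(d,)), shared, routed, W_gate).shape)  # (8,)
```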

6. Retrieval Augmented Generation (RAG)

In this article, I have explained the basic working mechanism of RAG (Retrieval-Augmented Generation), a powerful technique that grounds a language model’s answers in retrieved documents. You’ll learn about:

📌 Embedding Models: how RAG represents input text as vectors

📌 Chunks: breaking source documents into manageable pieces before indexing

📌 Vector Index: a data structure for efficient vector storage and retrieval

📌 Vector Search Methods: including Naive Search (Flat), NSW, and HNSW, with code examples and working flow diagrams to illustrate each approach

By the end of this article, you’ll have a solid understanding of RAG’s underlying mechanics and how these components work together to enable efficient and effective language generation.
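To illustrate the simplest of the search methods listed above, here is a small NumPy sketch of flat (exhaustive) cosine-similarity search. The data and names are made up for the example; graph-based methods like NSW and HNSW exist precisely to avoid this full scan on large indexes:

```python
import numpy as np

def flat_search(query_vec, chunk_vecs, top_k=3):
    """Naive ("flat") vector search: compare the query embedding with the
    embedding of every stored chunk and return the most similar ones."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q                                  # cosine similarity with every chunk
    best = np.argsort(sims)[::-1][:top_k]         # indices of the top_k most similar chunks
    return best, sims[best]

rng = np.random.default_rng(3)
chunk_vecs = rng.normal(size=(100, 16))           # pretend embeddings of 100 text chunks
query_vec = rng.normal(size=(16,))                # pretend embedding of the user's question
ids, scores = flat_search(query_vec, chunk_vecs)
print(ids, scores)
```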

7. Multi-Head Latent Attention

In this article, I have explored two critical topics in the realm of deep learning:

📌 The GPU Bottleneck Problem: how memory access patterns can slow down your model’s performance

📌 Multi-Head Attention: a key component of transformer architectures, and how it can be optimized to mitigate this bottleneck

You’ll gain a deeper understanding of the challenges posed by GPU bottlenecks and how multi-head attention can be optimized to overcome these limitations, leading to faster and more efficient model training.
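As a very rough sketch of the optimization direction the article discusses: the latent-attention idea is to cache one small latent vector per token instead of full-size keys and values, shrinking the memory traffic that causes the bottleneck. The code below is my own toy illustration with made-up names and dimensions, not the actual multi-head latent attention implementation:

```python
import numpy as np

def latent_kv(X, W_down, W_up_k, W_up_v):
    """Toy low-rank KV compression: cache a small latent per token and
    expand it back to keys and values only when attention is computed.
    X: (seq_len, d_model); W_down: (d_model, d_latent) with d_latent << d_model."""
    latent = X @ W_down           # this small matrix is what would be cached
    K = latent @ W_up_k           # reconstructed keys
    V = latent @ W_up_v           # reconstructed values
    return latent, K, V

rng = np.random.default_rng(4)
d_model, d_latent, seq_len = 64, 8, 10
X = rng.normal(size=(seq_len, d_model))
latent, K, V = latent_kv(X,
                         rng.normal(size=(d_model, d_latent)),
                         rng.normal(size=(d_latent, d_model)),
                         rng.normal(size=(d_latent, d_model)))
print(latent.shape, K.shape, V.shape)   # (10, 8) (10, 64) (10, 64): cache 8 numbers per token instead of 128
```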

If you found my articles useful 👍, give Clapssss👏😉! Feel free to follow for more insights.

Let’s stay connected and explore the exciting world of AI together!

Join me on LinkedIn: linkedin.com/in/jaiganesan-n/ 🌍❤️
