Exploring LLMs: A Collection of My Articles ❤️

JAIGANESAN
5 min read · Jun 25, 2024


Dive into the intricate world of large language models with in-depth articles on their architectures, MoE, and RAG. Discover more by exploring the links below.

Photo by Wes Hicks on Unsplash

What to Expect from My Articles? 👋

In my articles, I cover a range of topics, from the basics to advanced architectures: how they work, code for these concepts, and the mathematical representations behind them. Together, they offer a comprehensive visual journey into the world of AI (LLMs and NLP).

If you’re looking to improve your understanding of Large Language Models (LLMs) and Natural Language Processing (NLP), I highly recommend checking out my articles. They’re worth your time, and I’m confident they’ll help you grasp complex concepts easily. Keep in mind that my writing style might not be perfect in some places, but my goal is to make complicated ideas simple to understand.

All of these articles are published under the Towards AI publication. 👽

Note: I’ll be publishing new content regularly and will add links to new articles here as they become available. Be sure to check back for updates and to continue learning about the latest developments in NLP and LLMs.

1. Large Language Model (LLM)

In this article, I’ll take you through the key components of Large Language Models, including:

📌 Word Embeddings: how words are represented as vectors

📌 Self-Attention: how the model focuses on specific parts of the input

📌 Multi-Head Attention: several self-attention heads running in parallel to capture different kinds of relationships

📌 Feed Forward Network: a crucial layer in the model’s architecture

📌 Linear Layer and Softmax: how the model makes predictions

📌 Inference Mechanism: how the model generates output from input

By the end of this article, you’ll have a solid understanding of these fundamental concepts in Large Language Models.
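To give a concrete flavor of the bullet points above, here is a minimal NumPy sketch of scaled dot-product self-attention. This is my own toy illustration, not code from the article, and the shapes and variable names are assumptions made purely for demonstration:

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Toy scaled dot-product self-attention for a single head.
    X: (seq_len, d_model) word embeddings; W_q/W_k/W_v: (d_model, d_head) projections."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # project embeddings into queries, keys, values
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # how strongly each token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mixture of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, embedding size 8
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)        # (4, 8)
```

Multi-head attention simply runs several such heads with independent projection matrices and concatenates their outputs before the feed-forward network.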

2. BERT (Bidirectional Encoder Representations from Transformers)

In this article, I have explored the following key concepts:

📌 Word Embeddings: how machines capture the meaning and context of words
📌 Position Embeddings: why the position of a word in a sentence matters
📌 Masked Language Model Task: the cloze-style task that BERT was pre-trained on
📌 Self-Attention and Multi-Head Attention: how the model focuses on the input data
📌 Feedforward Networks, Linear Layers, and Softmax: essential learning components of language models

By the end of this article, you’ll have a deep understanding of the BERT architecture and its components, as well as practical knowledge of how to work with this powerful model.
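As a quick taste of the masked language model task mentioned above, the Hugging Face transformers library (my own example, not code from the article) can ask a pre-trained BERT to fill in a [MASK] token:

```python
from transformers import pipeline

# The "fill-mask" pipeline uses BERT's masked-language-model head:
# it ranks candidate tokens for the [MASK] position.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("The capital of France is [MASK]."):
    print(f"{pred['token_str']:>10}  score={pred['score']:.3f}")
```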

3. Mistral 7B

In this article, I have explored the architecture of Mistral 7B, a powerful language model. Specifically, I have covered:

📌 The overall architecture of Mistral

📌 Relative Positional Embeddings: a technique for encoding word positions

📌 Rotary Positional Embeddings: another approach to positional encoding

📌 Self-Attention: how the model focuses on specific parts of the input

📌 Multi-Head Attention: several self-attention heads running in parallel to capture different kinds of relationships

📌 KV Cache: caching attention keys and values so inference doesn’t recompute them for every new token

📌 Sliding Window Attention: a technique for processing long input sequences

📌 KV Cache and Inference in Mistral 7B: how these components work together

📌 Calculating Parameters in Mistral 7B: a step-by-step guide

By the end of this article, you’ll have a deep understanding of the Mistral architecture and its components, as well as practical knowledge of how to work with this powerful model.
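To make the KV cache idea more tangible, here is a toy single-head decoding loop in NumPy. It is my own simplified sketch, not Mistral’s implementation; multi-head attention, rotary embeddings, and the rolling-buffer details are omitted:

```python
import numpy as np

class KVCache:
    """Toy key/value cache: keys and values of already-processed tokens
    are stored once and reused at every new decoding step."""
    def __init__(self, d_head):
        self.keys = np.empty((0, d_head))
        self.values = np.empty((0, d_head))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def decode_step(x_new, W_q, W_k, W_v, cache):
    """One autoregressive step: only the newest token is projected,
    then attended against all cached keys and values."""
    q = x_new @ W_q
    cache.append(x_new @ W_k, x_new @ W_v)
    scores = q @ cache.keys.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ cache.values

d = 8
rng = np.random.default_rng(1)
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
cache = KVCache(d)
for _ in range(5):                        # 5 decoding steps
    out = decode_step(rng.normal(size=(d,)), W_q, W_k, W_v, cache)
print(cache.keys.shape)                   # (5, 8): the cache grows by one row per generated token
```

Sliding window attention additionally caps how many past rows are kept, which is what keeps Mistral’s memory bounded on long sequences.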

4. Mixture of Experts (MoE) and Sparse Mixture of Experts (SMoE)

In this article, I have explored the concept of a Mixture of Experts, a powerful technique in Generative AI. You’ll learn about:

📌 Mixture of Experts (MoE): a method for combining the strengths of multiple feed-forward networks (FFNs)

📌 Sparse Mixture of Experts (SMoE): an efficient variant that reduces the computational cost

This article is highly recommended, as it provides a clear and concise explanation of these complex concepts. By the end of this article, you’ll have a solid understanding of Mixture of Experts and its efficient variant, Sparse Mixture of Experts.
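For readers who like to see routing before reading about it, here is a toy top-k gating sketch in NumPy. It is my own illustration under simplifying assumptions: real SMoE layers route whole batches of tokens, use learned experts, and add load-balancing losses:

```python
import numpy as np

def sparse_moe_layer(x, experts, W_gate, k=2):
    """Toy sparse MoE: a router scores all experts, but only the
    top-k experts actually run for this token.
    x: (d_model,) one token's hidden state
    experts: list of callables, each standing in for a small FFN
    W_gate: (d_model, num_experts) router weights"""
    logits = x @ W_gate
    top_k = np.argsort(logits)[-k:]                             # indices of the k best experts
    gate = np.exp(logits[top_k]) / np.exp(logits[top_k]).sum()  # softmax over the chosen experts only
    return sum(g * experts[i](x) for g, i in zip(gate, top_k))

d, num_experts = 8, 4
rng = np.random.default_rng(2)
# each "expert" is just a random linear map standing in for an FFN
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(num_experts)]
W_gate = rng.normal(size=(d, num_experts))
print(sparse_moe_layer(rng.normal(size=(d,)), experts, W_gate).shape)  # (8,)
```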

5. Fine-Grained Experts and Shared Expert Isolation

In this article, I have explored advanced and efficient variants of a Mixture of Experts (MoE), including:

📌 Fine-Grained Experts: a technique for improving expert specialization

📌 Shared Expert Isolation: always-active experts that capture common knowledge; like fine-grained experts, this was introduced by DeepSeek researchers

These variants tackle two major challenges in MoE: knowledge redundancy and knowledge hybridity. By the end of this article, you’ll understand how these advanced methods overcome these limitations, enabling more effective and efficient MoE models.
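As a rough sketch of how these two ideas fit together (again my own toy code under assumed shapes, not the DeepSeek implementation): a few shared experts run for every token to hold common knowledge, while many small, fine-grained experts are routed sparsely:

```python
import numpy as np

def moe_with_shared_experts(x, shared_experts, routed_experts, W_gate, k=2):
    """Toy sketch: shared experts are always active (isolating common
    knowledge), while fine-grained routed experts are picked top-k."""
    out = sum(e(x) for e in shared_experts)              # always-on shared experts
    logits = x @ W_gate                                   # router scores the routed experts only
    top_k = np.argsort(logits)[-k:]
    gate = np.exp(logits[top_k]) / np.exp(logits[top_k]).sum()
    return out + sum(g * routed_experts[i](x) for g, i in zip(gate, top_k))

d = 8
rng = np.random.default_rng(5)
make_expert = lambda: (lambda x, W=rng.normal(size=(d, d)): x @ W)   # stand-in for a small FFN
shared = [make_expert()]                                  # one always-active shared expert
routed = [make_expert() for _ in range(8)]                # eight small routed experts
W_gate = rng.normal(size=(d, len(routed)))
print(moe_with_shared_experts(rng.normal(size=(d,)), shared, routed, W_gate).shape)  # (8,)
```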

6. Retrieval Augmented Generation (RAG)

In this article, I have explained the basic working mechanism of RAG (Retrieval-Augmented Generation), a powerful technique that grounds a language model’s answers in retrieved documents. You’ll learn about:

📌 Embedding Models: how RAG represents input text as vectors

📌 Chunks: breaking source documents into manageable pieces before indexing

📌 Vector Index: a data structure for efficient vector storage and retrieval

📌 Vector Search Methods: including Naive Search (Flat), NSW, and HNSW, with code examples and working flow diagrams to illustrate each approach

By the end of this article, you’ll have a solid understanding of RAG’s underlying mechanics and how these components work together to enable efficient and effective language generation.
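To illustrate the simplest of the search methods listed above, here is a small NumPy sketch of flat (exhaustive) cosine-similarity search. The data and names are made up for the example; graph-based methods like NSW and HNSW exist precisely to avoid this full scan on large indexes:

```python
import numpy as np

def flat_search(query_vec, chunk_vecs, top_k=3):
    """Naive ("flat") vector search: compare the query embedding with the
    embedding of every stored chunk and return the most similar ones."""
    q = query_vec / np.linalg.norm(query_vec)
    c = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
    sims = c @ q                                  # cosine similarity with every chunk
    best = np.argsort(sims)[::-1][:top_k]         # indices of the top_k most similar chunks
    return best, sims[best]

rng = np.random.default_rng(3)
chunk_vecs = rng.normal(size=(100, 16))           # pretend embeddings of 100 text chunks
query_vec = rng.normal(size=(16,))                # pretend embedding of the user's question
ids, scores = flat_search(query_vec, chunk_vecs)
print(ids, scores)
```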

7. Multi-Head Latent Attention

In this article, I have explored two critical topics in the realm of deep learning:

📌 The GPU Bottleneck Problem: how memory access patterns can slow down your model’s performance

📌 Multi-Head Attention: a key component of transformer architectures, and how it can be optimized to mitigate this bottleneck

You’ll gain a deeper understanding of the challenges posed by GPU bottlenecks and how multi-head attention can be optimized to overcome these limitations, leading to faster and more efficient model training.
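As a very rough sketch of the optimization direction the article discusses: the latent-attention idea is to cache one small latent vector per token instead of full-size keys and values, shrinking the memory traffic that causes the bottleneck. The code below is my own toy illustration with made-up names and dimensions, not the actual multi-head latent attention implementation:

```python
import numpy as np

def latent_kv(X, W_down, W_up_k, W_up_v):
    """Toy low-rank KV compression: cache a small latent per token and
    expand it back to keys and values only when attention is computed.
    X: (seq_len, d_model); W_down: (d_model, d_latent) with d_latent << d_model."""
    latent = X @ W_down           # this small matrix is what would be cached
    K = latent @ W_up_k           # reconstructed keys
    V = latent @ W_up_v           # reconstructed values
    return latent, K, V

rng = np.random.default_rng(4)
d_model, d_latent, seq_len = 64, 8, 10
X = rng.normal(size=(seq_len, d_model))
latent, K, V = latent_kv(X,
                         rng.normal(size=(d_model, d_latent)),
                         rng.normal(size=(d_latent, d_model)),
                         rng.normal(size=(d_latent, d_model)))
print(latent.shape, K.shape, V.shape)   # (10, 8) (10, 64) (10, 64): cache 8 numbers per token instead of 128
```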

If you found my articles useful 👍, give Clapssss👏😉! Feel free to follow for more insights.

Let’s stay connected and explore the exciting world of AI together!

Join me on LinkedIn: linkedin.com/in/jaiganesan-n/ 🌍❤️
