<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Tilak Mudgal on Medium]]></title>
        <description><![CDATA[Stories by Tilak Mudgal on Medium]]></description>
        <link>https://medium.com/@tilak559?source=rss-e88414f712dd------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*g3_p8oTfVfq8ylfzYseGTw.jpeg</url>
            <title>Stories by Tilak Mudgal on Medium</title>
            <link>https://medium.com/@tilak559?source=rss-e88414f712dd------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Tue, 19 May 2026 04:11:23 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@tilak559/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Vector Databases: The Memory Engine of RAG]]></title>
            <link>https://medium.com/@tilak559/vector-databases-the-memory-engine-of-rag-62e4f1ed59d9?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/62e4f1ed59d9</guid>
            <category><![CDATA[faiss]]></category>
            <category><![CDATA[embedding]]></category>
            <category><![CDATA[retrieval-augmented-gen]]></category>
            <category><![CDATA[milvus]]></category>
            <category><![CDATA[vector-database]]></category>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Wed, 15 Oct 2025 23:38:29 GMT</pubDate>
            <atom:updated>2025-10-15T23:42:31.727Z</atom:updated>
            <content:encoded><![CDATA[<p>Every intelligent system needs a way to remember what it has learned.<br>In Retrieval-Augmented Generation (RAG), that memory is powered by <strong>embeddings</strong> numerical representations of text that capture meaning instead of just words. These embeddings allow AI to understand that <em>“refund”</em> and <em>“return”</em> are related even if they’re written differently.</p><p>If you’re already familiar with how embeddings work, feel free to continue.<br>But if you’d like a quick refresher on how AI turns text into numbers and meaning, check out the below post.</p><p><a href="https://medium.com/@tilak559/rag-embeddings-how-ai-turns-text-into-understanding-aa83f941b746">RAG Embeddings: How AI Turns Text into Understanding</a></p><p>Now, once we’ve transformed our data into embeddings, the real question becomes:<br><strong>Where do we store them, and how do we find the right ones when a user asks something new?</strong><br>That’s where <strong>Vector Databases</strong> step in acting as the <em>memory engine</em> that helps RAG systems recall, search, and connect the right pieces of information in milliseconds.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*0Aa3zExNzhspmQJq.png" /></figure><p>They act as the <strong>memory</strong> of your RAG system whenever a user asks a question, the database helps find the <strong>most relevant pieces of information</strong> by comparing vectors.</p><blockquote>For example:</blockquote><blockquote>You ask, “What are the benefits of using solar panels?”<br>The system doesn’t look for the word “benefits” it looks for <strong>meaning</strong>.</blockquote><blockquote>It fetches chunks whose embeddings are closest in meaning to your question, like “Solar panels reduce electricity costs and carbon emissions.”</blockquote><h3>Why Traditional Databases Don’t Work</h3><p>Traditional databases are great for structured information things like customer names, invoice IDs, or sales amounts.<br>You can easily run queries like:</p><pre>SELECT * FROM cars WHERE brand = &#39;Porsche&#39;;</pre><p>But what if your question isn’t that simple?<br>Say you ask:</p><blockquote>“Show me cars that feel sporty but practical.”</blockquote><p>A relational database will stare blankly at you it doesn’t “understand” what <em>sporty but practical</em> means.<br>A vector database, on the other hand, can find cars like the <strong>Porsche 911</strong>, <strong>BMW M2</strong>, or B<strong>ugatti Chiron</strong>, because it stores <em>meanings</em> as numerical vectors, not just text strings.</p><p>That’s the magic vector databases let machines <strong>search by meaning</strong>, not by exact words.</p><h3>How Vector Databases Work</h3><p>When we turn text into embeddings, every sentence becomes a list of numbers like giving it a unique spot in a giant 3D map.<br>But instead of just three directions (x, y, z), this map has <strong>hundreds or even thousands of directions</strong> one for every tiny piece of meaning.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*eDpPHakwKAfqEYchTI_3wA.png" /></figure><p>A <strong>vector database</strong> stores all these “locations” along with the original text, so it knows both <em>where</em> something is in this space and <em>what</em> it means.<br>It also creates special shortcuts (called <strong>indexes</strong>) so it can find things quickly, even if you have millions of sentences stored.</p><p>Now, when you ask a question, your query is also turned into a vector a point on the same map.<br>The database simply looks for other points that are <strong>closest</strong> to it.<br>Those nearby points usually represent text with <strong>similar meaning</strong>, so the database returns them as the most relevant results.</p><h3>Vector Similarity: How It Finds the Closest Meaning</h3><p>Imagine every car you know plotted in a huge invisible 3D space:</p><blockquote>Cars with <strong>similar personality</strong> like “sporty,” “luxury,” or “eco-friendly” end up close to each other.</blockquote><blockquote>A <strong>Porsche 911</strong> and <strong>BMW M4</strong> would be neighbors because they share “performance” and “sporty” vibes.</blockquote><blockquote>Meanwhile, a <strong>Toyota Prius</strong> lives far away in the “eco &amp; efficiency” zone.</blockquote><p>So when you ask the system, <em>“Recommend cars that feel thrilling to drive,”</em> it doesn’t just find the word “thrilling” it finds the vectors closest in <em>meaning</em> to that feeling.</p><p>That’s what a vector database does: it measures <strong>closeness in meaning</strong> using something called <em>vector similarity</em>.</p><p><strong>Vector search</strong> is about finding which vectors are closest to your query vector meaning they represent similar ideas.<br>Instead of text comparison, we measure how similar two points (vectors) are in multi-dimensional space.</p><p>The most common similarity measures are:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*neRIwyByZRAEMJZdoJCvCg.png" /></figure><h4>Cosine Similarity — The Shortcut to Meaning</h4><p>Most vector databases rely on <strong>cosine similarity</strong>, which measures the <em>angle</em> between two vectors.<br>The smaller the angle, the more similar they are.<br>If two vectors point in almost the same direction, it means their meanings align closely.</p><p>So, when your query “thrilling to drive” points in nearly the same direction as “high performance car,” the cosine similarity is high and your database confidently returns results like <strong><em>Porsche 911 GT3</em></strong> or <strong>BMW M4</strong>.</p><p>It’s not looking for keywords.<br>It’s comparing ideas mathematically.</p><h4>Inside the Vector Database</h4><p>Every piece of text you store whether it’s a paragraph from a document, a product description, or a customer review gets an embedding vector and a unique ID.<br>The database keeps track of both:</p><pre>{<br>  &quot;id&quot;: &quot;123456&quot;,<br>  &quot;text&quot;: &quot;The Porsche 911 GT3 offers incredible handling and track-ready precision.&quot;,<br>  &quot;embedding&quot;: [0.11, -0.82, 0.57, ...]<br>}</pre><p>When a user asks, <em>“Which car is fun on corners?”</em><br>the database finds vectors with similar patterns.<br>Those become your <strong>top results</strong>, which are then sent to the LLM for final answer generation.</p><h4>How the Database Stores and Searches</h4><ol><li><strong>Storage</strong> — Each document chunk and its embedding vector are stored together.</li><li><strong>Search</strong> — When a query comes in, its embedding is computed.<br>Then the database compares it with all stored vectors (using cosine similarity or another metric).</li><li><strong>Result</strong> — The closest ones are retrieved (top-k results) and passed to the LLM for context.</li></ol><h4>Popular Vector Databases</h4><figure><img alt="" src="https://cdn-images-1.medium.com/max/972/1*NQSTJOryq-AIx76mrLOQDw.png" /></figure><h4>The Role of Vector Databases in RAG</h4><p>In a Retrieval-Augmented Generation system, the vector database acts as the <strong>retriever’s engine </strong>it quickly finds the right context before passing it to the LLM.</p><p>The flow looks like this:</p><ol><li>Your documents are broken into chunks and embedded.</li><li>Those embeddings are stored in the vector database.</li><li>A user asks a question → it’s embedded → the database finds similar vectors.</li><li>The retrieved chunks are sent to the LLM → the final answer is generated.</li></ol><p>Without this database, your system would have to re-scan every document each time someone asked a question slow, inefficient, and impractical.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=62e4f1ed59d9" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[RAG Embeddings: How AI Turns Text into Understanding]]></title>
            <link>https://medium.com/@tilak559/rag-embeddings-how-ai-turns-text-into-understanding-aa83f941b746?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/aa83f941b746</guid>
            <category><![CDATA[token]]></category>
            <category><![CDATA[chunking]]></category>
            <category><![CDATA[vector-embeddings]]></category>
            <category><![CDATA[embedding]]></category>
            <category><![CDATA[openai]]></category>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Fri, 10 Oct 2025 04:27:42 GMT</pubDate>
            <atom:updated>2025-10-10T04:27:42.701Z</atom:updated>
            <content:encoded><![CDATA[<p>If you’ve ever wondered <em>how</em> an AI “understands” that</p><blockquote><em>“dog” and “puppy” are similar,<br>but “dog” and “table” are not<br>you’re really wondering about </em><strong><em>embeddings</em></strong><em>.</em></blockquote><p>Embeddings are the <em>heart</em> of Retrieval-Augmented Generation (RAG).<br>They’re what allow an AI system to find meaning, not just matching words.</p><p>Before we create embeddings, we need to prepare our text so it’s manageable for the model to process.<br>This step is called <strong>chunking</strong> breaking a long document into smaller, meaningful sections.</p><h3>Chunking — Preparing Text for Embedding</h3><p>Every embedding model has a <strong>token limit</strong> (for example, 512, 1024, or 8192 tokens).<br>If a document exceeds that limit, you can’t embed it in one go you have to split it into smaller <strong>chunks</strong>.<br>Each chunk is then treated as a separate piece of text that gets its own embedding.</p><h4>Why Chunking Matters</h4><p>Chunking ensures:</p><ul><li>Each piece of text fits within the model’s token limit.</li><li>The content remains <strong>contextually meaningful</strong> (you don’t cut sentences in half).</li><li>Search and retrieval later happen on <strong>relevant sections</strong>, not entire documents.</li></ul><p>Think of it as dividing a book into chapters before summarizing it you get better organization and more precise meaning.</p><h4>How Chunking Works</h4><p>Here’s what happens in a typical workflow:</p><ol><li><strong>Document</strong> → Start with your full text (a report, webpage, or transcript).</li><li><strong>Chunking</strong> → Split it into smaller pieces, often around <strong>200–500 words</strong> or <strong>512–1024 tokens</strong> per chunk.</li><li><strong>Embedding</strong> → Send each chunk to the embedding model to get a numerical vector.</li><li><strong>Vector Database</strong> → Store all embeddings for later retrieval and semantic search.</li></ol><p>Good chunking doesn’t just cut by length it respects structure.<br>You can chunk:</p><ul><li><strong>By paragraphs or headings</strong> (for articles)</li><li><strong>By conversation turns</strong> (for chat logs)</li><li><strong>By logical sections</strong> (for PDFs or reports)</li></ul><p>You can also add <strong>overlap</strong> between chunks (e.g., 20–30%) to preserve context between them, ensuring smoother retrieval later in RAG systems.</p><h3>What Are Embeddings</h3><p>At their core, <strong>embeddings</strong> are just numbers but very special ones.</p><p>They’re vectors (lists of numbers) that represent the <em>meaning</em> of text.<br>Each word, sentence, or paragraph can be converted into a vector.</p><p>The trick?<br>Texts with similar meanings have vectors that are <em>close together</em> in this mathematical space.</p><h4>Example:</h4><p>Let’s say you have these two sentences:</p><ol><li>“The cat is sleeping on the couch.”</li><li>“A kitten is napping on the sofa.”</li></ol><p>Even though the words are different, they mean almost the same thing.</p><p>When converted to embeddings, their numeric representations might look like:</p><pre>Sentence 1: [0.12, -0.43, 0.88, ...]<br>Sentence 2: [0.11, -0.45, 0.91, ...]</pre><p>These two vectors are <em>very close</em> in distance meaning the model recognizes they’re similar in meaning.</p><p>That’s how AI finds related information even when you don’t use the same words.</p><h3>Why Embeddings Matter in RAG</h3><p>When someone asks a question, your RAG system doesn’t do a keyword search like “Cmd+F”.<br>Instead, it:</p><ol><li>Converts the question into an embedding vector.</li><li>Searches for <em>similar</em> embeddings in a large collection (your knowledge base).</li><li>Retrieves the most relevant passages based on meaning, not keywords.</li></ol><p>So when someone asks:</p><blockquote><em>“How can I return my order?”</em></blockquote><p>It can still find:</p><blockquote><em>“Refund policy for purchased products”<br>even though the words “return” and “refund” are different.</em></blockquote><p>That’s the real power of embeddings <strong>semantic search</strong> instead of keyword search.</p><blockquote><strong><em>What Makes This “Semantic” Search Different</em></strong></blockquote><blockquote>In traditional <strong>keyword search</strong>, systems only look for exact word matches.<br>So if you type <strong>“return order”</strong>, it’ll only find documents that literally contain those words.<br>If your database says <strong>“refund policy”</strong> instead, the system won’t connect the two even though both mean the same thing.</blockquote><blockquote>That’s where <strong>semantic search</strong> changes everything.</blockquote><blockquote>Semantic search understands meaning, not just words.<br>It works by comparing the <strong>embeddings </strong>those numeric representations of text meanings.<br>If two pieces of text have similar meanings, their embeddings will be <strong>close together</strong> in the vector space, even if they don’t share any words.</blockquote><blockquote>That’s why semantic search feels more like how humans think.<br>We don’t look for exact words we look for what’s related in meaning.</blockquote><h3>How Embeddings Are Created</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*E44xfG4OZxj02CBM.jpg" /></figure><p>To create embeddings, we use special models called <strong>embedding models</strong>.<br>These models convert text into numerical representations vectors that capture meaning, not just words.</p><p>Let’s see how this works using a sentence :</p><blockquote><strong><em>“Porsche 911 GT3 is the best car ever built.”</em></strong></blockquote><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*w7TXPPMj6_z1t4dTSfzI8g.png" /></figure><p><strong>Static vs Contextual Meaning</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/794/1*LvY8yyIkBMk6yTjf6MP7jA.png" /></figure><p>Older models like <strong>Word2Vec</strong> gave “drive” one fixed vector.<br>Modern embedding models like <strong>BGE-M3</strong> or <strong>OpenAI embeddings</strong> create <strong>context-aware vectors</strong>, so the same word means different things depending on its usage.</p><h3>Choosing an Embedding Model</h3><p>You’ll often hear about embedding models like:</p><ul><li><strong>BGE-M3</strong> (multi-lingual, open-source, strong accuracy)</li><li><strong>E5-Large</strong> (great for English semantic search)</li><li><strong>OpenAI text-embedding-3-large</strong> (high-quality, API-based)</li><li><strong>MiniLM / Instructor-xl</strong> (lighter, good for smaller setups)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1000/1*7OXM7WMlmrVVzUxBfEB8TA.png" /></figure><ul><li><strong>Tokenization type</strong> — how the model breaks text into smaller pieces (like words or sub-words) before turning them into numbers.</li><li><strong>Dimension</strong> — the length of each embedding vector, showing how much detail the model can capture about meaning.</li><li><strong>Pooling method</strong> — how the model combines all token vectors into one final vector that represents the entire sentence.</li></ul><h4>Types of Tokenization</h4><p><strong><em>Sentence:</em></strong><em> “The Porsche 911 GT3 is insanely fast.”</em></p><p><strong>1. Word-level tokenization</strong><br>This method splits text into full words separated by spaces or punctuation.</p><blockquote><em>Example: </em><em>[&quot;The&quot;, &quot;Porsche&quot;, &quot;911&quot;, &quot;GT3&quot;, &quot;is&quot;, &quot;insanely&quot;, &quot;fast&quot;]<br>It’s simple but limited if the model hasn’t seen a word like “GT3” before, it treats it as unknown.</em></blockquote><p><strong>2. Subword-level tokenization</strong><br>This approach breaks uncommon or complex words into smaller, known parts.</p><blockquote><em>Example: </em><em>[&quot;The&quot;, &quot;Porsche&quot;, &quot;911&quot;, &quot;GT&quot;, &quot;3&quot;, &quot;is&quot;, &quot;insane&quot;, &quot;ly&quot;, &quot;fast&quot;]<br>This helps the model handle rare words or new combinations without losing meaning.</em></blockquote><p><strong>3. Character-level tokenization</strong><br>Here, every single character (including spaces and punctuation) becomes its own token.</p><blockquote><em>Example: </em><em>[&quot;T&quot;, &quot;h&quot;, &quot;e&quot;, &quot; &quot;, &quot;P&quot;, &quot;o&quot;, &quot;r&quot;, &quot;s&quot;, &quot;c&quot;, &quot;h&quot;, &quot;e&quot;, &quot; &quot;, &quot;9&quot;, &quot;1&quot;, &quot;1&quot;, &quot; &quot;, &quot;G&quot;, &quot;T&quot;, &quot;3&quot;, &quot; &quot;, &quot;i&quot;, &quot;s&quot;, &quot; &quot;, &quot;i&quot;, &quot;n&quot;, &quot;s&quot;, &quot;a&quot;, &quot;n&quot;, &quot;e&quot;, &quot;l&quot;, &quot;y&quot;, &quot; &quot;, &quot;f&quot;, &quot;a&quot;, &quot;s&quot;, &quot;t&quot;, &quot;.&quot;]<br>It captures all details but creates extremely long token lists, which is inefficient for large texts.</em></blockquote><p><strong>4. Byte Pair Encoding (BPE)</strong><br>BPE starts at the character level and repeatedly merges the most frequent pairs of characters into subwords.</p><blockquote><em>Example: </em><em>[&quot;The&quot;, &quot;▁Porsche&quot;, &quot;▁911&quot;, &quot;▁GT&quot;, &quot;3&quot;, &quot;▁is&quot;, &quot;▁insane&quot;, &quot;ly&quot;, &quot;▁fast&quot;, &quot;.&quot;]<br>This creates a balance between word and character-level tokenization compact yet flexible.</em></blockquote><p><strong>5. WordPiece tokenization</strong><br>Used in models like BERT, it works similarly to BPE but uses probabilities to decide which subwords to merge.</p><blockquote><em>Example: </em><em>[&quot;The&quot;, &quot;Porsche&quot;, &quot;911&quot;, &quot;GT&quot;, &quot;##3&quot;, &quot;is&quot;, &quot;insane&quot;, &quot;##ly&quot;, &quot;fast&quot;, &quot;.&quot;]<br>The “##” prefix shows that a token continues from a previous one (e.g., “GT” + “##3” → “GT3”).</em></blockquote><p><strong>6. SentencePiece tokenization</strong><br>SentencePiece treats the entire text as a raw byte stream, including spaces, and can handle multiple languages naturally.</p><blockquote><em>Example: </em><em>[&quot;▁The&quot;, &quot;▁Porsche&quot;, &quot;▁911&quot;, &quot;▁GT3&quot;, &quot;▁is&quot;, &quot;▁insanely&quot;, &quot;▁fast&quot;, &quot;.&quot;]<br>The underscore (▁) represents a space, making it language agnostic and ideal for multilingual embedding models like </em><strong><em>BGE-M3</em></strong><em>.</em></blockquote><h3>Final Takeaway</h3><p>Embeddings aren’t just numbers they’re how machines <strong>understand meaning</strong>.<br>And chunking isn’t just splitting text it’s how we <strong>preserve structure and relevance</strong> in that meaning.</p><p>When combined, they transform unstructured data into <strong>searchable, intelligent knowledge </strong>the backbone of every modern RAG system.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=aa83f941b746" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Why RAG Exists and How It Works — Part 1]]></title>
            <link>https://medium.com/@tilak559/why-rag-exists-and-how-it-works-part-1-2a5abc4a85a9?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/2a5abc4a85a9</guid>
            <category><![CDATA[vector-database]]></category>
            <category><![CDATA[agentic-rag]]></category>
            <category><![CDATA[llm]]></category>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Thu, 09 Oct 2025 23:29:28 GMT</pubDate>
            <atom:updated>2025-10-10T04:29:32.536Z</atom:updated>
            <content:encoded><![CDATA[<h3>Why RAG Exists and How It Works</h3><p>If you’ve ever asked ChatGPT or any other AI model a question like <em>“What’s the revenue of Tesla in 2024?”</em> and it confidently gave you an outdated or completely wrong answer, you’ve experienced the limits of large language models (LLMs).</p><p>They’re brilliant at <em>language</em>, but not always great at <em>facts</em>.<br>That’s where <strong>Retrieval-Augmented Generation (RAG)</strong> comes in it’s like giving your AI a search engine brain.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/942/0*oT_ZfMaPr4WoQZ9N.png" /><figcaption>Source: seekr</figcaption></figure><h3>Why RAG Exists</h3><p>LLMs, like GPT or LLaMA, are trained on massive amounts of text. But:</p><ul><li>Their training data is <em>static </em>it ends at a certain point.</li><li>They can’t “look up” new or private information.</li><li>They sometimes “hallucinate” i.e., make up facts that <em>sound</em> correct.</li></ul><p>Let’s take an example:</p><p>Imagine you built a chatbot for your company’s customer support.<br>Someone asks:</p><blockquote><em>“What’s the return policy for our premium customers?”</em></blockquote><p>A normal LLM might <em>guess</em> based on patterns in text it’s seen before, saying something like:</p><blockquote><em>“You can return products within 30 days.”</em></blockquote><p>But your real policy might be:</p><p><em>“Premium customers can return products within </em><strong><em>60 days</em></strong><em>.”</em></p><p><em>“Non Premium customers can return products within 3</em><strong><em>0 days</em></strong><em>.”</em></p><p>That’s a big difference.<br>RAG fixes this problem by letting your chatbot <em>retrieve</em> the real policy from your database or documents <strong>before</strong> it generates an answer.</p><h3>Where RAG Fits vs. Fine-Tuning</h3><p>There are two main ways to give LLMs new knowledge:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*bfzQ82yPZyHoj1rsPMt4kQ.png" /></figure><p>Let’s simplify it:</p><ul><li><strong>Fine-tuning</strong> is like teaching your assistant new habits (“always greet customers politely”).</li><li><strong>RAG</strong> is like giving your assistant a live knowledge base they can search whenever they need an answer.</li></ul><p>In real-world systems, companies often combine both:</p><ul><li>Fine-tune for tone and behavior.</li><li>Use RAG for real-time information.</li></ul><h3>High-Level Architecture of RAG</h3><p>Let’s walk through what’s happening step by step:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/800/1*qdfooG-M8hdqo7AXVXj7_g.webp" /><figcaption>Source: GeeksforGeeks</figcaption></figure><ol><li><strong>Knowledge Base → Data Chunks</strong><br>Your documents, FAQs, or PDFs or text live in the <em>knowledge base</em>. Before we can use them, we split them into smaller pieces (called <em>chunks</em>) so that the model can search and understand them efficiently.</li><li><strong>Embedding Model</strong><br>Each chunk of text is converted into a list of numbers called an <strong>embedding</strong> a numerical representation that captures the meaning of the text. Ex: “Return policy for premium customers” → [0.23, 0.91, -0.45, ...]</li><li><strong>Vector Database (Vector DB)</strong><br>All these embeddings are stored in a <strong>vector database</strong> such as Milvus, FAISS, or pgvector.<br>When a user asks a question, the system converts the <em>query</em> into another embedding and searches for the most similar ones inside this database just like finding the closest points in space.</li><li><strong>Retrieved Documents</strong><br>The most relevant chunks are retrieved from the vector database. These chunks contain the factual information the model needs to answer correctly.</li><li><strong>Generation Step (LLM)</strong><br>The retrieved text is passed to the LLM (like GPT or LLaMA) along with the user’s question.<br>The model reads both and composes a fluent, context-aware response that’s grounded in those retrieved facts.</li></ol><p><strong>Example</strong>:</p><p>You ask:</p><blockquote><em>“What is the warranty period for solar panels?”</em></blockquote><p>Without RAG:</p><blockquote><em>“Most solar panels come with a 10-year warranty.” </em>(generic answer)</blockquote><p>With RAG:</p><blockquote><em>“According to our installation manual, all panels installed after 2022 have a </em><strong><em>25-year product warranty and 10-year performance warranty</em></strong><em>.”</em></blockquote><p>See the difference?<br>RAG doesn’t just <em>talk smart</em> it <em>answers correctly</em> by pulling from verified data.</p><h3>Why It Matters</h3><ul><li><strong>Trust:</strong> Answers come from your real data, not random web text.</li><li><strong>Relevance:</strong> The model can access your internal knowledge (policies, product data, contracts).</li><li><strong>Scalability:</strong> You can update your documents anytime without retraining the model.</li></ul><p>That’s why RAG has become the <em>standard</em> architecture for any AI assistant, chatbot, or knowledge engine that needs to stay current and factual.</p><p><a href="https://medium.com/@tilak559/rag-embeddings-how-ai-turns-text-into-understanding-aa83f941b746">RAG Embeddings: How AI Turns Text into Understanding</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=2a5abc4a85a9" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[How to Write a Dockerfile (Step-by-Step for Beginners)]]></title>
            <link>https://medium.com/@tilak559/how-to-write-a-dockerfile-step-by-step-for-beginners-22413c730512?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/22413c730512</guid>
            <category><![CDATA[image]]></category>
            <category><![CDATA[docker]]></category>
            <category><![CDATA[dockerfiles]]></category>
            <category><![CDATA[docker-image]]></category>
            <category><![CDATA[containers]]></category>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Sat, 26 Apr 2025 22:05:42 GMT</pubDate>
            <atom:updated>2025-04-26T22:05:42.961Z</atom:updated>
            <content:encoded><![CDATA[<p>If you’ve been following along, you already know <strong>what Docker is</strong> and <strong>how it helps</strong> you package your app into neat little containers.</p><p>Now the next question is:<br>👉 <em>“How do I actually create a Docker container for my app?”</em></p><p>The answer: <strong>You write a Dockerfile.</strong></p><p>Let’s walk through it in the simplest way possible.</p><h3>What is a Dockerfile?</h3><p>A <strong>Dockerfile</strong> is just a text file with <strong>instructions</strong> for Docker.<br>It tells Docker <strong>how to build your app</strong> into an image.</p><p>Think of it like a recipe:</p><ul><li>Start with a base ingredient (Python)</li><li>Add your app files</li><li>Install any extra packages</li><li>Tell Docker how to start the app</li></ul><p>When you “build” the Dockerfile, Docker follows these steps to create a ready-to-run <strong>image</strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*UAkDVSG8M-eJcyfH.png" /></figure><h3>Let’s Build One Together!</h3><p>Suppose you have a simple Python app with these two files:</p><pre>/myapp<br>  ├── app.py<br>  └── requirements.txt</pre><ul><li>app.py has your Python code.</li><li>requirements.txt lists your Python packages (like Flask, Pandas, etc.).</li></ul><p>Now, let’s write a <strong>Dockerfile</strong> to containerize this app!</p><h3>Here’s a simple Dockerfile:</h3><pre># 1. Use an existing Python image<br>FROM python:3.11<br><br># 2. Set the working directory inside the container<br>WORKDIR /app<br><br># 3. Copy the requirements file first<br>COPY requirements.txt .<br><br># 4. Install the dependencies<br>RUN pip install --no-cache-dir -r requirements.txt<br><br># 5. Copy the rest of the app files<br>COPY . .<br><br># 6. Set environment variables<br>ENV PYTHONDONTWRITEBYTECODE=1<br>ENV PYTHONUNBUFFERED=1<br><br># 7. Define the command to run the app<br>CMD [&quot;python&quot;, &quot;app.py&quot;]</pre><h3>Line-by-Line Simple Explanation</h3><p>FROM python:3.11 Start from an official Python environment</p><p>WORKDIR /app Create and move into a working folder</p><p>COPY requirements.txt .Copy the requirements.txt into the container</p><p>RUN pip installInstall Python libraries from requirements.txt</p><p>COPY . .Copy all your app code into the containerENV</p><p>PYTHONDONTWRITEBYTECODE=1Prevent Python from creating unnecessary .pyc files</p><p>ENV PYTHONUNBUFFERED=1Make Python print output immediately</p><p>CMD [&quot;python&quot;, &quot;app.py&quot;]Tell Docker to start your app by running</p><p>python app.py</p><h3>How to Build and Run Your Dockerfile</h3><h3>Step 1: Open your terminal and move to the project folder:</h3><pre>cd /path/to/your/myapp</pre><p>Make sure your Dockerfile, app.py, and requirements.txt are inside.</p><h3>Step 2: Build the Docker Image</h3><pre>docker build -t my-python-app .</pre><p>This command builds an image called my-python-app based on your Dockerfile.</p><h3>Step 3: Run the Docker Container</h3><pre>docker run my-python-app</pre><p>You should see your app running! 🎉</p><p>For example, if your app.py simply prints something like:</p><pre>print(&quot;Hello from inside Docker!&quot;)</pre><p>You’ll see:</p><pre>Hello from inside Docker!</pre><h3>Folder Structure You Should Have</h3><pre>/myapp<br>  ├── app.py<br>  ├── requirements.txt<br>  └── Dockerfile</pre><p>Very simple and clean!</p><h3>Real-World Tip 💡</h3><p>Always copy requirements.txt first and install packages before copying your full code.<br>Because if you only change your code (not dependencies), Docker will <strong>cache</strong> earlier steps and build faster!</p><p>Setting environment variables like PYTHONUNBUFFERED=1 helps <strong>get real-time logs</strong>, especially useful when debugging.</p><p>Using --no-cache-dir in pip install makes your final image <strong>smaller and lighter</strong>.</p><h3>Final Thoughts</h3><p>Writing a Dockerfile might seem intimidating at first, but once you see it as a simple set of “steps” or a “recipe,” it becomes easy.</p><p>Every Dockerfile is basically:</p><ul><li>Pick a base</li><li>Copy your stuff</li><li>Install what’s needed</li><li>Start your app</li></ul><p>That’s it!</p><h3>📋 Quick Summary</h3><ul><li>Dockerfile = Instructions for building a Docker image</li><li>Build image: docker build -t myapp .</li><li>Run container: docker run myapp</li><li>Think of it like cooking: <strong>recipe ➔ dish ➔ serve!</strong></li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*9__iN9_OZFEE6l2ZPAiohQ.png" /></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=22413c730512" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Getting Started with Docker and Docker Compose file]]></title>
            <link>https://medium.com/@tilak559/getting-started-with-docker-and-docker-compose-file-e95d16379843?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/e95d16379843</guid>
            <category><![CDATA[orchestration]]></category>
            <category><![CDATA[docker]]></category>
            <category><![CDATA[containers]]></category>
            <category><![CDATA[docker-compose]]></category>
            <category><![CDATA[image]]></category>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Sat, 26 Apr 2025 21:31:35 GMT</pubDate>
            <atom:updated>2025-04-26T21:31:35.702Z</atom:updated>
            <content:encoded><![CDATA[<p>When you’re new to the world of containers, two names pop up almost immediately: <strong>Docker</strong> and <strong>Docker Compose</strong>. At first, they can sound confusing are they the same thing? Do you need both?</p><p>Let’s break it down in simple words.</p><h3>What is Docker?</h3><p>Imagine you’re building a project maybe a website or an app. Normally, you need to set up everything: the programming language, the database, the server, libraries, and a hundred other things.<br>And if someone else wants to run your project, they need to set up everything exactly the same way.<br>That’s painful and messy.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*8NCLjOvP0UkDPU11.png" /></figure><p><strong>Docker fixes that.</strong></p><p>Docker is a tool that <strong>packages your application and everything it needs into a container</strong>.<br>Think of a container like a little box that has your app, your libraries, your system settings everything.<br>You can send this box anywhere (your friend’s laptop, a cloud server, etc.), and it will run <strong>exactly the same</strong> every time.</p><blockquote><strong>Wait, what’s an Image and what’s a Container?</strong></blockquote><blockquote>A <strong>Docker Image</strong> is like a blueprint — it’s the read-only template of your app (think of it like a recipe).</blockquote><blockquote>A <strong>Docker Container</strong> is the running instance of that image (like a dish you cooked using the recipe).</blockquote><p>You create an image once, and you can run many containers from it whenever you want.</p><h3>Quick Example:</h3><ul><li>You build a web app using Python and MySQL.</li><li>Instead of installing Python and MySQL everywhere, you “wrap” your app inside a Docker container.</li><li>Now you can ship your container and run it with a simple command — no messy installations needed!</li></ul><h3>What is Docker Compose?</h3><p>Okay, now imagine your project isn’t just one container.<br>Maybe you need:</p><ul><li>One container for your app (Python)</li><li>Another container for your database (MySQL)</li><li>Another container for a caching system (Redis)</li></ul><p>Managing all of them separately would be annoying: you’d have to start each one manually, connect them together, set environment variables, etc.</p><p><strong>Docker Compose solves that.</strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*vnYn72YKMfDoRtYB.jpeg" /></figure><p>Docker Compose is a tool that lets you <strong>define and run multi-container applications easily</strong>.<br>You simply write a <strong>docker-compose.yml file</strong> where you list out all the containers your app needs, and their settings.</p><blockquote><strong>What if containers need to save data?</strong> Sometimes containers need to store information — like database files or user uploads.</blockquote><blockquote>That’s where <strong>Volumes</strong> come in.<br>Volumes are like shared storage areas that survive even if a container stops or gets deleted.</blockquote><blockquote>In Docker Compose, you can easily define volumes too!</blockquote><h3>What’s the Difference Between Docker and Docker Compose?</h3><h4><strong>Docker</strong></h4><ul><li>It is for running individual containers.</li><li>Good for small/simple apps</li><li>Container settings are given in the command line</li></ul><h4><strong>Docker Compose</strong></h4><ul><li>It is for orchestrating <strong>many</strong> containers at once.</li><li>Great for bigger apps needing multiple services (like app + database + cache)</li><li>Settings are written in a YAML file</li></ul><h3>When Should You Use Each?</h3><ul><li><strong>Use Docker</strong> if you just have a <strong>single app</strong> — like a small script, a single website, or a simple backend server.</li><li><strong>Use Docker Compose</strong> when your app <strong>needs multiple things</strong> — like a server, a database, a message queue, etc.</li></ul><h3>Real-Life Examples</h3><ul><li>You built a simple API in Python Flask?<br>→ A single Docker container is enough.</li><li>You built a full web application that needs a web server, a database, and a Redis cache?<br>→ You should set up a Docker Compose file.</li></ul><h3>Advantages</h3><p>✅ <strong>Portability:</strong><br>Both Docker and Docker Compose make it super easy to move your project between computers without worrying about “It works on my machine” problems.</p><p>✅ <strong>Consistency:</strong><br>Every environment (development, testing, production) can behave exactly the same if you use containers.</p><p>✅ <strong>Isolation:</strong><br>Each container runs its own little world — no conflicts between different apps or versions.</p><p>✅ <strong>Scaling:</strong><br>Docker Compose can also help you scale — you can easily run multiple copies of a service if needed.</p><h3>Final Thoughts</h3><p>Docker and Docker Compose make modern app development cleaner, faster, and a lot less frustrating.<br>At first, they might seem a bit technical, but once you understand the basic ideas “pack everything into a box” and “manage multiple boxes together” it becomes really fun and powerful.</p><p><strong>Quick Summary:</strong></p><ul><li>Images = Blueprint</li><li>Containers = Running app</li><li>Volumes = Persistent storage</li></ul><p>In the next parts of the blog, we’ll dive deeper into how to create a basic Dockerfile and a Docker Compose file, with step-by-step examples!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e95d16379843" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Mastering Power BI DAX: From Basics to Advanced Calculations]]></title>
            <link>https://medium.com/@tilak559/mastering-power-bi-dax-from-basics-to-advanced-calculations-463ddb7378fc?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/463ddb7378fc</guid>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Mon, 23 Dec 2024 08:21:32 GMT</pubDate>
            <atom:updated>2024-12-23T08:21:32.321Z</atom:updated>
            <content:encoded><![CDATA[<p>Power BI’s Data Analysis Expressions (DAX) is a powerful formula language that enables users to create custom calculations and data analysis within Power BI. In this guide, we will take you from the basics to advanced concepts, so by the end, you’ll be equipped to write your own DAX calculations with confidence.</p><h3>What is DAX?</h3><p>DAX stands for Data Analysis Expressions. It is a collection of functions, operators, and constants that you can use to create formulas and expressions in Power BI, Power Pivot, and Analysis Services. DAX is used to perform calculations on data in tabular models.</p><p>Key highlights:</p><ul><li>DAX operates on <strong>columns and tables</strong> rather than individual cells.</li><li>It’s designed to work with relational data.</li><li>DAX formulas are used in calculated columns, calculated tables, and measures.</li></ul><h3>Getting Started with DAX</h3><h3>1. Syntax Basics</h3><p>A DAX formula begins with an equals sign = and can include functions, operators, and references to columns or tables.</p><p>Example:</p><pre>=SUM(Sales[Amount])</pre><p>This formula sums up the values in the Amount column of the Sales table.</p><h3>2. Commonly Used Functions</h3><h4>Aggregation Functions:</h4><ul><li>SUM: Adds all numbers in a column.</li><li>AVERAGE: Calculates the mean.</li><li>COUNT: Counts the number of rows.</li><li>MAX and MIN: Find the maximum and minimum values.</li></ul><h4>Logical Functions:</h4><ul><li>IF: Performs conditional logic.</li></ul><pre>=IF(Sales[Amount] &gt; 1000, &quot;High&quot;, &quot;Low&quot;)</pre><ul><li>AND, OR, NOT: Combine or negate conditions.</li></ul><h4>Text Functions:</h4><ul><li>CONCATENATE: Joins two strings.</li></ul><pre>=CONCATENATE(Customer[FirstName], Customer[LastName])</pre><ul><li>LEFT, RIGHT, MID: Extract parts of strings.</li></ul><h3>Intermediate DAX Concepts</h3><h3>1. Calculated Columns vs Measures</h3><ul><li><strong>Calculated Columns</strong>: Add new data to the table; values are computed row by row. Example:</li></ul><pre>Profit = Sales[Revenue] - Sales[Cost]</pre><ul><li><strong>Measures</strong>: Perform calculations on aggregated data; results change based on the context of the visualization. Example:</li></ul><pre>Total Sales = SUM(Sales[Revenue])</pre><h3>2. Contexts in DAX</h3><p>Understanding context is crucial for writing effective DAX formulas.</p><ul><li><strong>Row Context</strong>: Applies to calculated columns and iterates through rows of a table.</li><li><strong>Filter Context</strong>: Comes into play with measures and is determined by filters applied in visuals, slicers, or other DAX calculations.</li></ul><p>Example of Filter Context:</p><pre>Sales for Region = CALCULATE(SUM(Sales[Amount]), Region[Name] = &quot;West&quot;)</pre><p>This calculates total sales for the “West” region.</p><h3>3. Iterators</h3><p>DAX includes functions that iterate over rows to perform calculations.</p><ul><li>SUMX: Iterates over a table and evaluates an expression for each row.</li></ul><pre>=SUMX(Sales, Sales[Revenue] - Sales[Cost])</pre><ul><li>AVERAGEX, MAXX, MINX: Similar to SUMX but perform other aggregations.</li></ul><h3>Advanced DAX Techniques</h3><h3>1. Time Intelligence Functions</h3><p>DAX offers built-in functions to work with time-based data.</p><ul><li>TOTALYTD, TOTALMTD, TOTALQTD:</li></ul><pre>=TOTALYTD(SUM(Sales[Revenue]), Calendar[Date])</pre><ul><li>DATESYTD, DATESBETWEEN, PREVIOUSYEAR:</li></ul><pre>=CALCULATE(SUM(Sales[Amount]), PREVIOUSYEAR(Calendar[Date]))</pre><h3>2. Advanced Filtering with CALCULATE</h3><p>The CALCULATE function modifies the filter context.</p><p>Example:</p><pre>Sales in 2023 = CALCULATE(SUM(Sales[Amount]), Year[Year] = 2023)</pre><h3>3. Dynamic Measures</h3><p>Use dynamic calculations that respond to user selections.</p><p>Example:</p><pre>Dynamic Sales = IF(SELECTEDVALUE(Region[Name]) = &quot;East&quot;, SUM(Sales[Amount]), 0)</pre><h3>4. Variables in DAX</h3><p>Variables simplify complex calculations by allowing you to store intermediate results.</p><p>Example:</p><pre>Profit Margin =<br>VAR TotalCost = SUM(Sales[Cost])<br>VAR TotalRevenue = SUM(Sales[Revenue])<br>RETURN (TotalRevenue - TotalCost) / TotalRevenue</pre><h3>5. Advanced Table Functions</h3><ul><li>ADDCOLUMNS: Adds a calculated column to a table.</li><li>SUMMARIZE: Groups data and adds aggregations.</li><li>CROSSJOIN: Returns all combinations of two tables.</li></ul><p>Example:</p><pre>Grouped Sales = SUMMARIZE(Sales, Sales[Region], &quot;TotalSales&quot;, SUM(Sales[Amount]))</pre><h3>Best Practices for Writing DAX</h3><ol><li><strong>Start Simple</strong>: Begin with basic formulas and gradually layer complexity.</li><li><strong>Use Meaningful Names</strong>: Name measures and calculated columns clearly.</li><li><strong>Leverage Variables</strong>: Simplify calculations and improve readability.</li><li><strong>Optimize Performance</strong>: Use aggregations and filters efficiently to avoid unnecessary computation.</li><li><strong>Test Extensively</strong>: Validate your formulas in different contexts to ensure accuracy.</li></ol><h3>Conclusion</h3><p>Mastering DAX takes practice, but it is a rewarding skill for unlocking the full potential of Power BI. By understanding the basics, practicing with intermediate concepts, and exploring advanced techniques, you’ll be well-equipped to perform sophisticated data analysis and create impactful visualizations.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=463ddb7378fc" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[SQL Questions and Answers]]></title>
            <link>https://medium.com/@tilak559/sql-questions-and-answers-2c029997d1c2?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/2c029997d1c2</guid>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Mon, 23 Dec 2024 08:10:36 GMT</pubDate>
            <atom:updated>2024-12-23T08:10:36.880Z</atom:updated>
            <content:encoded><![CDATA[<h3>1. What is the difference between HAVING and WHERE?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wKCgpjueBlZNNpHXzNfwZA.png" /></figure><p><strong>Example:</strong></p><pre>SELECT department_id, COUNT(*) AS employee_count<br>FROM employees<br>WHERE salary &gt; 50000<br>GROUP BY department_id<br>HAVING COUNT(*) &gt; 5;</pre><ul><li>WHERE salary &gt; 50000: Filters rows before grouping.</li><li>HAVING COUNT(*) &gt; 5: Filters groups after aggregation.</li></ul><h3>2. What is the difference between UNION and UNION ALL?</h3><p><strong>UNIONUNION ALL</strong>Combines results and removes duplicates.Combines results without removing duplicates.Slower due to duplicate elimination.Faster as no duplicate check is performed.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Jy8giQ0yACPNGR2He7237g.png" /></figure><p><strong>Example:</strong></p><pre>-- UNION: Removes duplicates<br>SELECT name FROM customers<br>UNION<br>SELECT name FROM suppliers;<br><br>-- UNION ALL: Includes duplicates<br>SELECT name FROM customers<br>UNION ALL<br>SELECT name FROM suppliers;</pre><ul><li>Use UNION when you need unique values.</li><li>Use UNION ALL for better performance when duplicates are acceptable.</li></ul><h3>3. What is the difference between GROUP BY and ORDER BY?</h3><p><strong>GROUP BYORDER BY</strong>Groups rows based on one or more columns.Sorts rows based on one or more columns.Used to perform aggregations.Used to organize the result set.The order of rows is not guaranteed.Ensures a specific order in the result.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*dRrGhNaI3OKk-8jufFXldw.png" /></figure><p><strong>Example:</strong></p><pre>-- GROUP BY: Summarize data<br>SELECT department_id, COUNT(*) AS employee_count<br>FROM employees<br>GROUP BY department_id;<br><br>-- ORDER BY: Sort results<br>SELECT name, salary<br>FROM employees<br>ORDER BY salary DESC;</pre><ul><li>GROUP BY is used for aggregation (e.g., COUNT, SUM).</li><li>ORDER BY sorts the final output.</li></ul><h3>4. What is the difference between DELETE and TRUNCATE?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gV6klBA2cdKOoaUdsGJ8WA.png" /></figure><p><strong>Example:</strong></p><pre>-- DELETE specific rows<br>DELETE FROM employees WHERE department_id = 1;<br><br>-- TRUNCATE entire table<br>TRUNCATE TABLE employees;</pre><ul><li>Use DELETE when removing specific rows.</li><li>Use TRUNCATE for clearing all data in a table quickly.</li></ul><h3>5. What is the difference between a Primary Key and a Unique Key?</h3><p><strong>Primary KeyUnique Key</strong>Uniquely identifies each row in a table.Ensures unique values in a column or set of columns.Only one per table.Can have multiple unique keys per table.Implicitly NOT NULL.Can contain NULL (but only one NULL per unique key).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-VwkFDYWtXRmUv7OljYIDw.png" /></figure><p><strong>Example:</strong></p><pre>-- Primary Key<br>CREATE TABLE employees (<br>    employee_id INT PRIMARY KEY,<br>    name VARCHAR(50)<br>);<br><br>-- Unique Key<br>CREATE TABLE users (<br>    user_id INT,<br>    email VARCHAR(100) UNIQUE<br>);</pre><ul><li>Primary Key is the main identifier for a table.</li><li>Unique Key ensures data uniqueness without being the primary identifier.</li></ul><h3>6. What is the difference between INNER JOIN and OUTER JOIN?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Z7RFbTL3e3YUbNesK31k8A.png" /></figure><p><strong>Example:</strong></p><pre>-- INNER JOIN<br>SELECT e.name, d.department_name<br>FROM employees e<br>INNER JOIN departments d ON e.department_id = d.department_id;<br><br>-- OUTER JOIN<br>SELECT e.name, d.department_name<br>FROM employees e<br>LEFT JOIN departments d ON e.department_id = d.department_id;</pre><ul><li>INNER JOIN: Only rows with matches in both tables are included.</li><li>LEFT OUTER JOIN: Includes all rows from the left table, even if there’s no match in the right table.</li></ul><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=2c029997d1c2" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Normalization and Denormalization in SQL: Simplified Explanation with Examples]]></title>
            <link>https://medium.com/@tilak559/normalization-and-denormalization-in-sql-simplified-explanation-with-examples-57e6f8935fea?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/57e6f8935fea</guid>
            <category><![CDATA[normalization]]></category>
            <category><![CDATA[denormalization]]></category>
            <category><![CDATA[sql]]></category>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Mon, 23 Dec 2024 07:56:19 GMT</pubDate>
            <atom:updated>2024-12-23T07:56:19.877Z</atom:updated>
            <content:encoded><![CDATA[<p>When designing a database, it’s essential to strike a balance between efficient storage and performance. This is where <strong>Normalization</strong> and <strong>Denormalization</strong> come into play. Normalization focuses on organizing data to reduce redundancy, while Denormalization aims to optimize query performance by reintroducing some redundancy. Let’s break these concepts down with simple explanations and examples.</p><h3>What is Normalization?</h3><p>Normalization is the process of organizing a database to reduce redundancy and improve data integrity. This is done by dividing data into smaller, related tables and establishing relationships between them. Normalization follows a series of steps called <strong>Normal Forms</strong> (1NF, 2NF, 3NF, etc.), each with specific rules to achieve a better-structured database.</p><h3>The Normal Forms</h3><h4>1. First Normal Form (1NF)</h4><p><strong>Rule:</strong> A table is in 1NF if:</p><ul><li>Each column contains atomic (indivisible) values.</li><li>Each row is unique.</li></ul><p><strong>Example:</strong> A table with multiple phone numbers in one column is not in 1NF:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*QTY9nwl6XGUSM2hUM5HV0A.png" /></figure><h4>2. Second Normal Form (2NF)</h4><p><strong>Rule:</strong> A table is in 2NF if:</p><ul><li>It is already in 1NF.</li><li>All non-key columns depend on the entire primary key (no partial dependency).</li></ul><p><strong>Example:</strong> Consider a table where Order_ID and Product_ID form the composite primary key:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vt4EfwdQQs3KcYVEptZ9ig.png" /></figure><h4>3. Third Normal Form (3NF)</h4><p><strong>Rule:</strong> A table is in 3NF if:</p><ul><li>It is already in 2NF.</li><li>All non-key columns depend only on the primary key (no transitive dependency).</li></ul><p><strong>Example:</strong> Consider a table:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*bkla9UkiJcyY0vRcSE2_vA.png" /></figure><h3>What is Denormalization?</h3><p>Denormalization is the process of combining tables to optimize query performance, especially in analytical scenarios. It reduces the need for complex joins by intentionally introducing redundancy.</p><p><strong>Why Use Denormalization?</strong></p><ul><li>Faster read performance.</li><li>Simplified querying for reports or dashboards.</li></ul><p><strong>Example:</strong> Instead of normalized tables:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ryNObvLH8XxzEuKtpevCqA.png" /></figure><p><strong>Benefits of Denormalization:</strong></p><ul><li>Queries are simpler and faster.</li><li>Useful for read-heavy systems like reporting databases.</li></ul><p><strong>Drawbacks of Denormalization:</strong></p><ul><li>Increased storage usage.</li><li>Potential for data inconsistencies during updates</li></ul><h3>Normalization vs. Denormalization</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*geAuS3ObMVqU-tffW1mjyg.png" /></figure><h3>When to Use Normalization or Denormalization?</h3><p><strong>Use Normalization:</strong></p><ul><li>When data integrity and consistency are critical.</li><li>For systems with frequent data updates (e.g., transactional systems).</li></ul><p><strong>Use Denormalization:</strong></p><ul><li>When performance is a priority (e.g., analytical systems, reporting).</li><li>For systems with frequent reads and few updates.</li></ul><h3>Conclusion</h3><p>Normalization and Denormalization are complementary techniques for structuring a database. Normalization ensures data consistency and eliminates redundancy, while Denormalization improves performance in read-heavy scenarios. Choosing the right approach depends on the specific requirements of your system. By mastering these concepts, you can design efficient and reliable databases.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=57e6f8935fea" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Understanding Database Constraints: Ensuring Data Integrity in SQL]]></title>
            <link>https://medium.com/@tilak559/understanding-database-constraints-ensuring-data-integrity-in-sql-16ba4eecd957?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/16ba4eecd957</guid>
            <category><![CDATA[primary-keys]]></category>
            <category><![CDATA[constraints]]></category>
            <category><![CDATA[notnull]]></category>
            <category><![CDATA[foreign-key]]></category>
            <category><![CDATA[default]]></category>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Mon, 23 Dec 2024 07:48:03 GMT</pubDate>
            <atom:updated>2024-12-23T07:48:03.920Z</atom:updated>
            <content:encoded><![CDATA[<p>Database constraints are essential tools for maintaining data integrity and enforcing rules in SQL. They define rules for the data in a table, ensuring it remains accurate, consistent, and reliable. In this blog, we’ll cover key constraints like <strong>Primary Key</strong>, <strong>Foreign Key</strong>, <strong>Unique</strong>, <strong>Not Null</strong>, <strong>Check</strong>, and <strong>Default</strong>, with examples to illustrate their use.</p><h3>1. Primary Key</h3><p>A <strong>Primary Key</strong> uniquely identifies each row in a table. It cannot contain NULL values, and the values in the column(s) must be unique.</p><p><strong>Use Case:</strong> Ensuring that each row in a table has a unique identifier.</p><p><strong>Example:</strong></p><pre>CREATE TABLE employees (<br>    employee_id INT PRIMARY KEY,<br>    name VARCHAR(50),<br>    salary DECIMAL(10, 2)<br>);</pre><p><strong>Explanation:</strong></p><ul><li>The employee_id column is the primary key and must contain unique, non-null values.</li></ul><h3>2. Foreign Key</h3><p>A <strong>Foreign Key</strong> enforces a relationship between two tables. It ensures that the value in a column (or columns) matches a value in another table’s primary key.</p><p><strong>Use Case:</strong> Maintaining referential integrity between related tables.</p><p><strong>Example:</strong></p><pre>CREATE TABLE departments (<br>    department_id INT PRIMARY KEY,<br>    department_name VARCHAR(50)<br>);<br><br>CREATE TABLE employees (<br>    employee_id INT PRIMARY KEY,<br>    name VARCHAR(50),<br>    department_id INT,<br>    FOREIGN KEY (department_id) REFERENCES departments(department_id)<br>);</pre><p><strong>Explanation:</strong></p><ul><li>The department_id column in the employees table references the department_id column in the departmentstable.</li><li>Ensures that each employee belongs to a valid department.</li></ul><h3>3. Unique</h3><p>The <strong>Unique</strong> constraint ensures that all values in a column are unique. Unlike the primary key, a table can have multiple unique constraints.</p><p><strong>Use Case:</strong> Enforcing uniqueness for specific columns, such as email addresses.</p><p><strong>Example:</strong></p><pre>CREATE TABLE users (<br>    user_id INT PRIMARY KEY,<br>    email VARCHAR(100) UNIQUE,<br>    username VARCHAR(50) UNIQUE<br>);</pre><p><strong>Explanation:</strong></p><ul><li>Both email and username must have unique values across all rows in the users table.</li></ul><h3>4. Not Null</h3><p>The <strong>Not Null</strong> constraint ensures that a column cannot have NULL values.</p><p><strong>Use Case:</strong> Enforcing mandatory fields, such as names or dates.</p><p><strong>Example:</strong></p><pre>CREATE TABLE employees (<br>    employee_id INT PRIMARY KEY,<br>    name VARCHAR(50) NOT NULL,<br>    hire_date DATE NOT NULL<br>);</pre><p><strong>Explanation:</strong></p><ul><li>Both name and hire_date must have non-null values.</li></ul><h3>5. Check</h3><p>The <strong>Check</strong> constraint ensures that a column’s value meets a specified condition.</p><p><strong>Use Case:</strong> Validating data, such as ensuring salaries are within a valid range.</p><p><strong>Example:</strong></p><pre>CREATE TABLE employees (<br>    employee_id INT PRIMARY KEY,<br>    name VARCHAR(50),<br>    salary DECIMAL(10, 2),<br>    CHECK (salary &gt;= 30000 AND salary &lt;= 200000)<br>);</pre><p><strong>Explanation:</strong></p><ul><li>The salary column must have a value between 30,000 and 200,000.</li></ul><h3>6. Default</h3><p>The <strong>Default</strong> constraint sets a default value for a column when no value is provided.</p><p><strong>Use Case:</strong> Providing default values for optional fields.</p><p><strong>Example:</strong></p><pre>CREATE TABLE employees (<br>    employee_id INT PRIMARY KEY,<br>    name VARCHAR(50),<br>    hire_date DATE DEFAULT CURRENT_DATE<br>);</pre><p><strong>Explanation:</strong></p><p>If no value is specified for hire_date, it defaults to the current date.</p><h3>Combining Constraints</h3><p>You can combine multiple constraints in a single column to enforce complex rules.</p><p><strong>Example:</strong></p><pre>CREATE TABLE employees (<br>    employee_id INT PRIMARY KEY,<br>    name VARCHAR(50) NOT NULL,<br>    email VARCHAR(100) UNIQUE,<br>    salary DECIMAL(10, 2) CHECK (salary &gt;= 30000 AND salary &lt;= 200000),<br>    hire_date DATE DEFAULT CURRENT_DATE<br>);</pre><p><strong>Explanation:</strong></p><ul><li>employee_id is a primary key.</li><li>name cannot be null.</li><li>email must be unique.</li><li>salary is constrained to a valid range.</li><li>hire_date defaults to the current date if not provided.</li></ul><h3>Why Use Constraints?</h3><ol><li><strong>Ensure Data Integrity:</strong> Prevent invalid or inconsistent data from entering the database.</li><li><strong>Reduce Errors:</strong> Enforce business rules directly in the database.</li><li><strong>Simplify Application Logic:</strong> Move validation rules from application code to the database.</li></ol><h3>Best Practices</h3><ol><li><strong>Plan Constraints Carefully:</strong> Analyze your data requirements before defining constraints.</li><li><strong>Use Descriptive Names:</strong> When naming constraints (e.g., in ALTER TABLE), use meaningful names like chk_salary_range.</li><li><strong>Test Thoroughly:</strong> Ensure that constraints don’t conflict with valid data during insert or update operations.</li><li><strong>Document Constraints:</strong> Clearly document the purpose of each constraint for future reference.</li></ol><h3>Conclusion</h3><p>Database constraints are vital for maintaining the integrity, accuracy, and reliability of your data. By using constraints like <strong>Primary Key</strong>, <strong>Foreign Key</strong>, <strong>Unique</strong>, <strong>Not Null</strong>, <strong>Check</strong>, and <strong>Default</strong>, you can enforce rules directly at the database level, ensuring that your data adheres to your business requirements. Start incorporating these constraints into your database designs to create robust and error-resistant systems.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=16ba4eecd957" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Window Functions in SQL: A Complete Guide]]></title>
            <link>https://medium.com/@tilak559/window-functions-in-sql-a-complete-guide-199412f69c0d?source=rss-e88414f712dd------2</link>
            <guid isPermaLink="false">https://medium.com/p/199412f69c0d</guid>
            <category><![CDATA[window-functions]]></category>
            <category><![CDATA[rank]]></category>
            <category><![CDATA[dense-rank]]></category>
            <category><![CDATA[row-number]]></category>
            <dc:creator><![CDATA[Tilak Mudgal]]></dc:creator>
            <pubDate>Mon, 23 Dec 2024 07:44:17 GMT</pubDate>
            <atom:updated>2024-12-23T07:44:17.274Z</atom:updated>
            <content:encoded><![CDATA[<p>Window functions are powerful tools in SQL that allow you to perform calculations across a subset of rows related to the current row, without collapsing the data into a single result set. Unlike aggregate functions like SUM() or COUNT(), which return a single value for a group of rows, window functions retain the individual rows while adding additional analytical insights.</p><p>In this guide, we’ll explain what window functions are, their common use cases, and walk through examples of popular functions such as ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE(), LEAD(), and LAG().</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*JXIS2GtqVzrOQv4Tvn8yPA.png" /></figure><h3>What Are Window Functions?</h3><p>A window function performs a calculation across a set of rows (called a window) defined by a PARTITION BY clause. This &quot;window&quot; can be the entire dataset or a subset of it. The results of window functions are returned for each row in the result set.</p><p><strong>Syntax:</strong></p><pre>function_name(expression) OVER (<br>    [PARTITION BY column_name(s)]<br>    [ORDER BY column_name(s)]<br>)</pre><ul><li><strong>PARTITION BY</strong>: Divides the result set into subsets (like groups). Optional.</li><li><strong>ORDER BY</strong>: Specifies the order of rows within each partition.</li><li><strong>Function Name</strong>: Determines the type of calculation (e.g., ranking, summing).</li></ul><h3>Key Window Functions</h3><h4>1. ROW_NUMBER()</h4><p>Assigns a unique number to each row within a partition, starting at 1 for the first row.</p><p><strong>Use Case:</strong> Generate unique row numbers for each row in a dataset or partition.</p><p><strong>Example:</strong> Rank employees by hire date within each department.</p><pre>SELECT department_id, employee_name, hire_date,<br>       ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY hire_date) AS row_number<br>FROM employees;</pre><p><strong>Explanation:</strong></p><ul><li>PARTITION BY department_id: Groups rows by department.</li><li>ORDER BY hire_date: Assigns row numbers based on hire date within each department.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TMEzCkR4Bhgj12GYVrxsBw.png" /></figure><h4>2. RANK()</h4><p>Assigns a rank to each row within a partition. If there are ties, the same rank is given, and the next rank is skipped.</p><p><strong>Use Case:</strong> Determine rankings with ties in a competition or scores.</p><p><strong>Example:</strong> Rank employees by salary within each department.</p><pre>SELECT department_id, employee_name, salary,<br>       RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank<br>FROM employees;</pre><p><strong>Explanation:</strong></p><ul><li>PARTITION BY department_id: Groups rows by department.</li><li>ORDER BY salary DESC: Ranks employees by descending salary within each department.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*bQWfJBrQTi-uWekjcC-cMA.png" /></figure><h4>3. DENSE_RANK()</h4><p>Similar to RANK(), but without skipping ranks for ties.</p><p><strong>Use Case:</strong> Assign continuous ranks even when there are ties.</p><p><strong>Example:</strong></p><pre>SELECT department_id, employee_name, salary,<br>       DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank<br>FROM employees;</pre><p><strong>Difference from </strong><strong>RANK():</strong> If two employees in department 1 have the same salary, both will be ranked 1, and the next rank will be 2 (not 3).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ve_ZLJmRfdzHm5gjFBJSLw.png" /></figure><h4>4. NTILE(n)</h4><p>Divides rows into n equal-sized groups and assigns a bucket number to each row.</p><p><strong>Use Case:</strong> Distribute rows into quantiles (e.g., quartiles, deciles).</p><p><strong>Example:</strong> Divide employees into four salary quartiles.</p><pre>SELECT employee_name, salary,<br>       NTILE(4) OVER (ORDER BY salary DESC) AS quartile<br>FROM employees;</pre><p><strong>Explanation:</strong> Rows are divided into four groups, with each group assigned a quartile (1 to 4).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*lSSZnEtD1AMfqjEQGVt_fg.png" /></figure><h4>5. LEAD() and LAG()</h4><ul><li><strong>LEAD()</strong>: Accesses data from the next row in the result set.</li><li><strong>LAG()</strong>: Accesses data from the previous row in the result set.</li></ul><p><strong>Use Case:</strong> Compare a value in one row to the previous or next row.</p><p><strong>Example:</strong> Compare each employee’s salary to the next highest salary.</p><pre>SELECT employee_name, salary,<br>       LEAD(salary) OVER (ORDER BY salary DESC) AS next_salary<br>FROM employees;</pre><p><strong>Explanation:</strong></p><ul><li>LEAD(salary): Returns the salary from the next row in descending order.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*YPws0O2eU_8kQYKxVd-REw.png" /></figure><h4>6. SUM() (or Other Aggregate Functions with a Window)</h4><p>Calculates cumulative or running totals across a window.</p><p><strong>Use Case:</strong> Calculate cumulative sales totals by month.</p><p><strong>Example:</strong></p><pre>SELECT employee_name, salary,<br>       SUM(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS cumulative_salary<br>FROM employees;</pre><p><strong>Explanation:</strong></p><p>Calculates a running total of salaries within each department.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*svvW5dnOnci5q4dl3581gA.png" /></figure><h3>Best Practices for Window Functions</h3><ol><li><strong>Understand the Dataset:</strong> Ensure you partition and order rows appropriately for meaningful results.</li><li><strong>Avoid Overusing:</strong> Window functions can be computationally expensive; optimize queries with indexes.</li><li><strong>Combine Functions:</strong> Use multiple window functions together for advanced analytics.</li></ol><h3>Conclusion</h3><p>Window functions are invaluable for data analysis, providing deep insights while preserving individual rows in the result set. Whether ranking employees, calculating running totals, or comparing rows, window functions unlock new possibilities for SQL queries.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=199412f69c0d" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>