<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Das Sudeept on Medium]]></title>
        <description><![CDATA[Stories by Das Sudeept on Medium]]></description>
        <link>https://medium.com/@das.sudeept?source=rss-999fdebe56f------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*65DTXlZLLCNZuGIowl7a2g.png</url>
            <title>Stories by Das Sudeept on Medium</title>
            <link>https://medium.com/@das.sudeept?source=rss-999fdebe56f------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 30 May 2026 17:34:01 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@das.sudeept/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Parking Lot System: A Complete Low-Level Design Walkthrough for Machine Coding Interviews]]></title>
            <link>https://medium.com/javarevisited/parking-lot-system-a-complete-low-level-design-walkthrough-for-machine-coding-interviews-387d0fcc36ec?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/387d0fcc36ec</guid>
            <category><![CDATA[spring-boot]]></category>
            <category><![CDATA[low-level-design]]></category>
            <category><![CDATA[interview]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[java]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Tue, 28 Apr 2026 21:26:21 GMT</pubDate>
            <atom:updated>2026-04-29T08:35:38.748Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*uVi-4JzbplIkWreELls6UA.jpeg" /></figure><p>This article is a focused revision guide. I’ll walk through the key design decisions, the class structure, the patterns used, and the edge cases that separate an average attempt from a great one.</p><h3>The Problem at a Glance</h3><p>Design a Parking Lot system that:</p><ul><li>Has <strong>multiple floors</strong>, each with <strong>multiple spots</strong></li><li>Supports <strong>different vehicle types</strong>: Motorcycle, Car, Truck</li><li>Has <strong>different spot sizes</strong>: Small, Medium, Large</li><li>Issues a <strong>ticket on entry</strong> and calculates a <strong>fee on exit</strong></li><li>Tracks <strong>real-time availability</strong> of spots</li></ul><p>Sounds manageable, right? The real challenge is doing it in a way that is clean, extensible, and doesn’t collapse into a single God class.</p><h3>Step 1: Identify the Entities</h3><p>Before writing any code, pause and list the nouns in the problem. 
These become your classes.</p><ul><li><strong>Vehicle</strong> (Motorcycle, Car, Truck)</li><li><strong>ParkingSpot</strong> (has a size, a floor number, and an availability status)</li><li><strong>ParkingFloor</strong> (a collection of spots)</li><li><strong>ParkingLot</strong> (the whole system: its floors and gates)</li><li><strong>ParkingTicket</strong> (issued at entry, presented at exit)</li><li><strong>EntryGate</strong> and <strong>ExitGate</strong></li><li><strong>FeeStrategy</strong> (how you charge)</li><li><strong>SpotAllocationStrategy</strong> (how you assign spots)</li></ul><p>Getting this list right in the first few minutes of an interview immediately signals structured thinking to the interviewer.</p><h3>Step 2: Model the Relationships</h3><pre>ParkingLot<br>  ├── has many ParkingFloors<br>  │     └── each has many ParkingSpots<br>  ├── has EntryGates<br>  └── has ExitGates</pre><pre>ParkingTicket<br>  ├── belongs to a Vehicle<br>  └── points to a ParkingSpot</pre><p>A ParkingTicket is the bridge between entry and exit. At entry you issue it; at exit you look it up, calculate the duration, charge the fee, and free the spot.</p><h3>Step 3: The Core Classes</h3><h3>Vehicle Hierarchy</h3><pre>public abstract class Vehicle {<br>    private final String licensePlate;<br>    private final VehicleType type;<br>    public Vehicle(String licensePlate, VehicleType type) {<br>        this.licensePlate = licensePlate;<br>        this.type = type;<br>    }<br>    public VehicleType getType() { return type; }<br>    public String getLicensePlate() { return licensePlate; }<br>}<br>public class Car extends Vehicle {<br>    public Car(String licensePlate) {<br>        super(licensePlate, VehicleType.CAR);<br>    }<br>}<br>// Motorcycle and Truck follow the same pattern as Car</pre><p>Keep it simple. The vehicle hierarchy exists primarily so the vehicle type can determine the required spot size. 
Don’t over-engineer it.</p><h3>ParkingSpot</h3><pre>public class ParkingSpot {<br>    private final String spotId;<br>    private final SpotSize size;<br>    private final int floorNumber;<br>    private boolean isAvailable;<br>    private Vehicle parkedVehicle;<br>    public ParkingSpot(String spotId, SpotSize size, int floorNumber) {<br>        this.spotId = spotId;<br>        this.size = size;<br>        this.floorNumber = floorNumber;<br>        this.isAvailable = true; // a new spot starts empty<br>    }<br>    public SpotSize getSize() { return size; }<br>    // synchronized read so every gate sees the latest state<br>    public synchronized boolean isAvailable() { return isAvailable; }<br>    // returns false if another thread claimed the spot first<br>    public synchronized boolean assignVehicle(Vehicle vehicle) {<br>        if (!isAvailable) return false;<br>        this.parkedVehicle = vehicle;<br>        this.isAvailable = false;<br>        return true;<br>    }<br>    public synchronized void freeSpot() {<br>        this.parkedVehicle = null;<br>        this.isAvailable = true;<br>    }<br>}</pre><p>Notice the synchronized keyword on every method that reads or changes availability: spots are a shared resource, and concurrent access is a real concern.</p><h3>ParkingTicket</h3><pre>public class ParkingTicket {<br>    private final String ticketId;<br>    private final Vehicle vehicle;<br>    private final ParkingSpot spot;<br>    private final LocalDateTime entryTime;<br>    private LocalDateTime exitTime;<br>    private double fee;<br>// constructor, getters...<br>}</pre><p>Tickets are effectively immutable once created, except for the exit time and fee, which are populated at checkout. Never store just the vehicle plate. Store the entire ParkingSpot reference so the exit gate can free it directly.</p><h3>ParkingLot: Singleton</h3><pre>public class ParkingLot {<br>    private static ParkingLot instance;<br>    private final List&lt;ParkingFloor&gt; floors;<br>    private final Map&lt;String, ParkingTicket&gt; activeTickets;<br>    private ParkingLot() {<br>        floors = new ArrayList&lt;&gt;();<br>        activeTickets = new ConcurrentHashMap&lt;&gt;();<br>    }<br>    public static synchronized ParkingLot getInstance() {<br>        if (instance == null) instance = new ParkingLot();<br>        return instance;<br>    }<br>}</pre><p>The ParkingLot is a natural Singleton: there’s only one lot. 
Use ConcurrentHashMap for the active tickets map if you&#39;re supporting concurrent access.</p><h3>Step 4: Strategy Pattern for Allocation and Fee</h3><p>This is where many candidates stumble. The temptation is to hardcode spot assignment logic inside EntryGate and fee logic inside ExitGate. Don&#39;t.</p><h3>Spot Allocation Strategy</h3><pre>public interface SpotAllocationStrategy {<br>    // Returns a candidate spot; the caller still has to claim it with assignVehicle()<br>    Optional&lt;ParkingSpot&gt; allocate(List&lt;ParkingFloor&gt; floors, VehicleType type);<br>}<br>public class NearestFirstAllocationStrategy implements SpotAllocationStrategy {<br>    @Override<br>    public Optional&lt;ParkingSpot&gt; allocate(List&lt;ParkingFloor&gt; floors, VehicleType type) {<br>        SpotSize required = getRequiredSize(type);<br>        // Assumes floors (and the spots on each floor) are ordered nearest-first<br>        return floors.stream()<br>            .flatMap(floor -&gt; floor.getSpots().stream())<br>            .filter(spot -&gt; spot.isAvailable() &amp;&amp; spot.getSize() == required)<br>            .findFirst();<br>    }<br>    private SpotSize getRequiredSize(VehicleType type) {<br>        return switch (type) {<br>            case MOTORCYCLE -&gt; SpotSize.SMALL;<br>            case CAR        -&gt; SpotSize.MEDIUM;<br>            case TRUCK      -&gt; SpotSize.LARGE;<br>        };<br>    }<br>}</pre><p>Now if tomorrow you need a “Handicap Nearest” strategy or a “Load Balanced” strategy, you implement a new class. You don’t touch existing code. 
That’s the Open/Closed Principle in action.</p><h3>Fee Strategy</h3><pre>public interface FeeStrategy {<br>    double calculate(ParkingTicket ticket);<br>}<br>public class HourlyFeeStrategy implements FeeStrategy {<br>    private final double ratePerHour;<br>    public HourlyFeeStrategy(double ratePerHour) {<br>        this.ratePerHour = ratePerHour;<br>    }<br>    @Override<br>    public double calculate(ParkingTicket ticket) {<br>        // Uses the exit time recorded by ExitGate, so the charge matches the ticket<br>        long minutes = Duration.between(ticket.getEntryTime(), ticket.getExitTime()).toMinutes();<br>        long hours = Math.max(1, (minutes + 59) / 60); // round up, with a 1-hour minimum<br>        return hours * ratePerHour;<br>    }<br>}</pre><p>You can easily swap in a DayRateFeeStrategy or a WeekendSurgeFeeStrategy without touching the exit gate at all.</p><h3>Step 5: Entry and Exit Gates</h3><pre>public class EntryGate {<br>    private final SpotAllocationStrategy allocationStrategy;<br>    public EntryGate(SpotAllocationStrategy allocationStrategy) {<br>        this.allocationStrategy = allocationStrategy;<br>    }<br>    public ParkingTicket parkVehicle(Vehicle vehicle, List&lt;ParkingFloor&gt; floors) {<br>        // Another gate may claim the chosen spot first, so retry until a claim succeeds<br>        while (true) {<br>            ParkingSpot spot = allocationStrategy.allocate(floors, vehicle.getType())<br>                .orElseThrow(() -&gt; new ParkingLotFullException(&quot;No spot available for &quot; + vehicle.getType()));<br>            if (spot.assignVehicle(vehicle)) {<br>                return new ParkingTicket(vehicle, spot);<br>            }<br>        }<br>    }<br>}<br>public class ExitGate {<br>    private final FeeStrategy feeStrategy;<br>    public ExitGate(FeeStrategy feeStrategy) {<br>        this.feeStrategy = feeStrategy;<br>    }<br>    public double processExit(ParkingTicket ticket) {<br>        ticket.setExitTime(LocalDateTime.now()); // record the exit before pricing<br>        double fee = feeStrategy.calculate(ticket);<br>        ticket.setFee(fee);<br>        ticket.getSpot().freeSpot();<br>        return fee;<br>    }<br>}</pre><p>Each gate has a single responsibility. 
They depend on interfaces, not concrete implementations. This is dependency inversion at its cleanest.</p><h3>Step 6: Enums Over Magic Strings</h3><p>Always model fixed sets of values as enums.</p><pre>public enum VehicleType {<br>    MOTORCYCLE, CAR, TRUCK<br>}<br>public enum SpotSize {<br>    SMALL, MEDIUM, LARGE<br>}</pre><p>Avoid strings like &quot;car&quot; or &quot;medium&quot; scattered across the codebase. Enums give you compile-time safety and work cleanly with switch expressions.</p><h3>The Key Design Decisions: Quick Reference</h3><table><thead><tr><th>Decision</th><th>Choice</th><th>Why</th></tr></thead><tbody><tr><td>ParkingLot lifecycle</td><td>Singleton</td><td>Only one lot exists</td></tr><tr><td>Spot assignment</td><td>Strategy interface</td><td>Swap algorithms without changing gates</td></tr><tr><td>Fee calculation</td><td>Strategy interface</td><td>New pricing models without touching exit logic</td></tr><tr><td>Vehicle types</td><td>Inheritance</td><td>Common base, type-specific behaviour</td></tr><tr><td>Thread safety</td><td>synchronized + ConcurrentHashMap</td><td>Spots are shared resources</td></tr><tr><td>Ticket storage</td><td>Map in ParkingLot</td><td>O(1) lookup at exit</td></tr></tbody></table><h3>Edge Cases You Must Handle</h3><p><strong>1. Lot is full.</strong> Throw a meaningful ParkingLotFullException rather than returning null. Null returns are silent failures; exceptions are explicit.</p><p><strong>2. Invalid ticket at exit.</strong> Validate that the ticket ID exists in the active tickets map before proceeding. Throw InvalidTicketException with the ticket ID in the message.</p><p><strong>3. Vehicle already parked.</strong> Track active license plates and reject a vehicle that’s already inside.</p><p><strong>4. Zero-duration exit.</strong> The fee calculation should handle 0 minutes gracefully; rounding up to a 1-hour minimum is a reasonable real-world rule.</p><p><strong>5. Concurrent entry.</strong> Two cars arriving at Entry Gate 1 and Entry Gate 2 simultaneously could be assigned the same spot if assignVehicle isn&#39;t synchronized, or if the gate ignores its false return value instead of picking another spot.</p><h3>What Interviewers Look For</h3><p><strong>Modelling clarity</strong> — Can you identify entities without being prompted? 
Can you articulate why a ParkingTicket exists as a separate class?</p><p><strong>Separation of concerns</strong> — Does each class do one thing? Is the ParkingLot class clean, or is it a dumping ground for all logic?</p><p><strong>Extensibility</strong> — If I ask you to add a new vehicle type mid-interview, can you do it without rewriting existing logic?</p><p><strong>Real-world thinking</strong> — Do you bring up thread safety without being asked? Do you think about overflow scenarios, such as a full lot?</p><p><strong>Code quality</strong> — Are your names meaningful? Is there dead code? Are exceptions informative?</p><h3>What to Do in the First 10 Minutes of the Interview</h3><ol><li><strong>Clarify scope</strong> — How many floors? Is there a display board? Payment modes? (Pin down what’s in and out of scope.)</li><li><strong>List entities</strong> — Say them out loud: Vehicle, Spot, Floor, Ticket, Gate</li><li><strong>Sketch the class diagram</strong> — Even rough boxes and arrows show structured thinking</li><li><strong>Start with models</strong>, then services, then strategies — work bottom-up</li><li><strong>Mention thread safety</strong> — even if you don’t implement it, naming the problem earns points</li></ol><h3>Summary</h3><p>The Parking Lot problem isn’t about memorising a template — it’s about demonstrating that you can take a vague real-world system and decompose it into clean, maintainable, extensible code under pressure. The Strategy pattern for fee and allocation, Singleton for the lot, proper ticket lifecycle management, and handling edge cases gracefully are the four things that elevate a solution from “decent” to “strong hire.”</p><p>Build it, extend it, break it with edge cases, then fix it. 
That’s the practice loop that actually prepares you.</p><p><em>Find the full source code on GitHub: </em><a href="https://github.com/sudeept-das/Machine_Coding_Parking_Lot"><em>Machine_Coding_Parking_Lot</em></a></p><h3>Grab these resources: 🛒 Full Editions (use code FRIENDS20 for 20% off):</h3><p>Grokking the Java Interview: <a href="https://javinpaul.gumroad.com/l/QqjGH">link</a><br>Grokking the Spring Boot Interview: <a href="https://gumroad.com/a/797676435/hrUXKY">link</a><br>250+ Spring Professional Certification Practice Questions: <a href="https://gumroad.com/a/797676435/sygyq">link</a></p><h3>🆓 Try before you buy — Free Sample Copies:</h3><p><a href="https://gumroad.com/a/797676435/HMOAv">Grokking the Java Interview [Free Sample Copy]</a><br><a href="https://gumroad.com/a/797676435/pfolo">Grokking the Spring Boot Interview [Free Sample Copy]</a><br><a href="https://gumroad.com/a/797676435/qelhye">Spring Boot Certification Practice Questions [Free Sample Copy]</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=387d0fcc36ec" width="1" height="1" alt=""><hr><p><a href="https://medium.com/javarevisited/parking-lot-system-a-complete-low-level-design-walkthrough-for-machine-coding-interviews-387d0fcc36ec">Parking Lot System: A Complete Low-Level Design Walkthrough for Machine Coding Interviews</a> was originally published in <a href="https://medium.com/javarevisited">Javarevisited</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Five Latency Worlds Every Backend Engineer Should Understand]]></title>
            <link>https://medium.com/@das.sudeept/the-five-latency-worlds-every-backend-engineer-should-understand-c41eb4c72a66?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/c41eb4c72a66</guid>
            <category><![CDATA[software-architecture]]></category>
            <category><![CDATA[design-systems]]></category>
            <category><![CDATA[system-design-interview]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[interview]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Tue, 10 Mar 2026 05:15:09 GMT</pubDate>
            <atom:updated>2026-03-10T05:15:09.697Z</atom:updated>
            <content:encoded><![CDATA[<p>Modern software systems feel complex.</p><p>We talk about:</p><ul><li>microservices</li><li>distributed systems</li><li>databases</li><li>caching layers</li><li>queues and event streams</li></ul><p>But underneath all of this complexity lies a much simpler reality:</p><p>Every system is governed by latency.</p><p>The time required to access data changes dramatically depending on where that data lives.<br>Sometimes the difference is millions of times.</p><p>Experienced engineers carry a mental model called back-of-the-envelope latency numbers — rough estimates that help reason about performance instantly.</p><p>But memorizing a table of numbers is not enough.</p><p>The real insight comes from understanding the five latency worlds that exist inside every modern system:</p><ol><li>CPU World</li><li>Fast I/O World</li><li>Service Layer World</li><li>Database World</li><li>Network-Dominated World</li></ol><p>Each world operates on a completely different time scale.</p><p>Understanding these worlds can fundamentally change how you design backend systems.</p><h3>The Latency Ladder</h3><p>Let’s start with the classic latency numbers engineers often memorize.</p><p>All values below are written in seconds using scientific notation.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/944/1*4l0JkmsqbgahEm3uwtbtGQ.png" /></figure><p>Looking at the table alone can be overwhelming.</p><p>Instead, it helps to group these numbers into latency worlds.</p><h3>World 1: CPU World (10⁻¹⁰ → 10⁻⁸ seconds)</h3><p>The CPU world represents operations that occur inside the processor itself.</p><p>This is the fastest possible environment for computation.</p><p>Typical operations include:</p><ul><li>register access</li><li>L1 cache access</li><li>L2 cache access</li><li>L3 cache access</li></ul><p>The CPU memory hierarchy looks like this:</p><p>CPU Core -&gt; L1 Cache -&gt; L2 Cache -&gt; L3 Cache -&gt; RAM</p><p>Each step further away from 
the CPU increases latency significantly.</p><p>Typical sizes:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*CohGzt7cteJqogCIlnb5vQ.png" /></figure><p>This hierarchy exists for a simple reason:</p><p>CPUs are much faster than memory.</p><p>If every memory access required going to RAM, modern processors would spend most of their time waiting.</p><h3>Example: CPU Cache Behavior</h3><p>Consider a simple Java loop:</p><pre>int sum = 0;<br>for (int i = 0; i &lt; 1000; i++) {<br>  sum += i;<br>}</pre><p>The variables sum and i are extremely hot.</p><p>The CPU keeps them in:</p><p>CPU registers or L1 cache</p><p>That means the CPU can access them in roughly:</p><p>10⁻⁹ seconds</p><p>That is incredibly fast.</p><h3>World 2: Fast I/O World (10⁻⁸ → 10⁻⁵ seconds)</h3><p>The next latency tier involves interactions with the operating system or hardware devices.</p><p>Typical operations include:</p><ul><li>mutex locks</li><li>thread synchronization</li><li>context switching</li><li>kernel system calls</li><li>sending packets through network interfaces</li></ul><p>Example latencies:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/964/1*2uVDrt9hhowdlwlZjlhlqQ.png" /></figure><p>Although these operations are slower than CPU cache access, they are still extremely fast compared to disk or database operations.</p><p>This world represents the boundary between CPU execution and system interaction.</p><h3>World 3: Service Layer World (10⁻⁴ → 10⁻³ seconds)</h3><p>This is where most backend engineers spend their time.</p><p>Typical operations in the service layer include:</p><ul><li>request parsing</li><li>validation</li><li>authentication</li><li>object creation</li><li>serialization</li><li>internal caching</li><li>business logic execution</li></ul><p>A typical service request flow might look like this:</p><p>HTTP request arrives -&gt; Request parsing -&gt; Input validation -&gt; Business logic -&gt; Database request</p><p>The service layer 
typically executes within:</p><p>100 microseconds → 1 millisecond</p><p>Compared to CPU cache operations (~10⁻⁹ s), this is already roughly 10⁵ to 10⁶ times slower.</p><p>But it is still fast enough to serve high-throughput systems.</p><h3>World 4: Database World (10⁻³ → 10⁻² seconds)</h3><p>Databases introduce additional latency because they involve:</p><ul><li>query planning</li><li>index traversal</li><li>transaction management</li><li>disk access</li><li>concurrency control</li></ul><p>Typical operations:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/836/1*eNg5LbA0Or1xOyXnHy0NTA.png" /></figure><p>This is why caching layers are so important.</p><p>If your service performs multiple database calls per request, latency adds up quickly: five sequential queries at 2 ms each already cost 10 ms.</p><p>Many high-scale architectures therefore rely on:</p><ul><li>Redis</li><li>Memcached</li><li>in-memory caches</li></ul><p>to reduce database load.</p><h3>World 5: Network-Dominated World (10⁻³ → 10⁻¹ seconds)</h3><p>The final latency world appears when systems communicate across machines.</p><p>Network operations introduce several delays:</p><ul><li>packet transmission</li><li>routing</li><li>serialization</li><li>congestion control</li><li>round trip time</li></ul><p>Typical network latencies:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/906/1*v5uEZmOwivV0mdH0FJe5EQ.png" /></figure><p>These delays are largely governed by the speed of light and network infrastructure.</p><p>For example:</p><p>San Francisco → London round trip ≈ 85 ms even at the speed of light in fiber; real-world round trips are typically well above 100 ms</p><p>This explains why globally distributed systems require:</p><ul><li>edge caches</li><li>regional replicas</li><li>CDN layers</li></ul><p>to maintain low latency.</p><h3>The Orders of Magnitude Problem</h3><p>The most important takeaway is how quickly latency grows.</p><pre>CPU cache access      ~10⁻⁹ s<br>RAM access            ~10⁻⁷ s<br>SSD read              ~10⁻⁴ s<br>Database query        ~10⁻³ s<br>Cross-region request  ~10⁻¹ s</pre><p>This massive gap explains many architectural decisions in 
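modern systems.</p><p>For example:</p><ul><li>why caching dramatically improves performance</li><li>why batching operations helps throughput</li><li>why distributed systems are difficult to optimize</li></ul><h3>A Simple Java Experiment</h3><p>Although we cannot directly control CPU caches in Java, we can approximate cache behavior by changing dataset size.</p><pre>public class CacheDemo {<br>  static int[] small = new int[1024];       // ~4 KB: fits in L1<br>  static int[] medium = new int[32_000];    // ~128 KB: fits in L2 on most modern CPUs<br>  static int[] large = new int[4_000_000];  // ~16 MB: may exceed L3<br>  static long sink; // consumes the sums so the JIT cannot remove the loops<br>  <br>  // Sums arr the given number of passes and returns the average nanoseconds per element read<br>  static double nsPerElement(int[] arr, int passes) {<br>    long sum = 0;<br>    long start = System.nanoTime();<br>    for (int p = 0; p &lt; passes; p++) {<br>      for (int i = 0; i &lt; arr.length; i++) {<br>        sum += arr[i];<br>      }<br>    }<br>    long elapsed = System.nanoTime() - start;<br>    sink += sum;<br>    return (double) elapsed / ((long) arr.length * passes);<br>  }<br>  <br>  public static void main(String[] args) {<br>    for (int w = 0; w &lt; 10; w++) nsPerElement(large, 1); // JIT warm-up<br>    // Each size does ~4 million reads, so the results are comparable<br>    System.out.printf(&quot;small:  %.3f ns/element%n&quot;, nsPerElement(small, 4000));<br>    System.out.printf(&quot;medium: %.3f ns/element%n&quot;, nsPerElement(medium, 128));<br>    System.out.printf(&quot;large:  %.3f ns/element%n&quot;, nsPerElement(large, 1));<br>    System.out.println(sink);<br>  }<br>}</pre><p>As the dataset grows, it stops fitting in each cache level in turn:</p><p>L1 → L2 → L3 → RAM</p><p>Each time it spills to a slower level, the average cost per element rises. Because this loop reads memory sequentially, hardware prefetching hides much of that cost, so expect a smaller gap than the raw latency numbers suggest.</p><h3>The Mental Shortcut Engineers Use</h3><p>Many senior engineers remember a simplified ratio:</p><p>L1 : L2 : L3 : RAM : SSD : Network (same datacenter)</p><p>1 : 4 : 30 : 100 : 10,000 : 1,000,000</p><p>Taking L1 as roughly 1 ns, this puts an SSD read near 10 µs and a datacenter round trip near 1 ms. This rough model lets you estimate performance quickly, without running benchmarks.</p><h3>The Most Important Insight</h3><p>Most performance problems are not caused by slow CPUs.</p><p>CPUs are incredibly fast.</p><p>The real bottleneck is usually data movement.</p><p>Every time the CPU needs data, it asks:</p><p>Is the data in L1?</p><p>Is it in L2?</p><p>Is it in L3?</p><p>Do I need RAM?</p><p>Do I need disk?</p><p>Do I need another machine?</p><p>Each step further out adds latency, often by an order of magnitude or more.</p><p>Great systems minimize how often data must travel across these boundaries.</p><h3>Final Thoughts</h3><p>Understanding latency worlds changes how you think about system design.</p><p>Whenever you build a system, ask a simple 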
question:</p><p>Where does the data live?</p><p>Because the difference between <strong>10⁻⁹ seconds and 10⁻¹ seconds</strong> is the difference between <strong>a system that feels instant and one that feels slow.</strong></p><p>Sometimes the biggest architectural improvements come from something as small as a few nanoseconds.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c41eb4c72a66" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Distributed Message Queues — What Actually Matters in Production]]></title>
            <link>https://medium.com/techtrends-digest/distributed-message-queues-what-actually-matters-in-production-2d7e8b015bc0?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/2d7e8b015bc0</guid>
            <category><![CDATA[message-queue]]></category>
            <category><![CDATA[design]]></category>
            <category><![CDATA[distributed-systems]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[interview]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Mon, 26 Jan 2026 18:36:32 GMT</pubDate>
            <atom:updated>2026-03-15T09:46:59.439Z</atom:updated>
            <content:encoded><![CDATA[<h3>Distributed Message Queues: What Actually Matters in Production</h3><p>Microservices sound clean on slides.<br>In production, they mostly fail at <strong>communication</strong>.</p><p>When teams move from a monolith to microservices, the same problems appear every time:</p><ul><li>Services become <strong>tightly coupled</strong> through synchronous APIs</li><li>Scaling one service forces others to scale</li><li>One service outage cascades into many</li><li>User-facing requests wait on slow downstream systems</li></ul><p><strong>Distributed message queues exist to break these dependencies.</strong><br>They let services coordinate <em>without calling each other directly</em>.</p><h3>Message Queues vs Event Streaming (Stop Mixing Them Up)</h3><p>These are often grouped together as “messaging,” but they solve <strong>different problems</strong>.</p><h4>Message Queues → Work execution</h4><p>Examples: RabbitMQ, Amazon SQS, ActiveMQ, Redis queues<br><strong>Flow:</strong><br>Producer → Queue → Consumer processes → message disappears<br><strong>Mental model:<br></strong><em>“Do this task and forget about it.”<br></em>Use message queues when:<br> — exactly one worker should do the work<br> — retries matter<br> — background jobs must not block users</p><h4>Event Streaming → Event history</h4><p>Examples: Kafka, Pulsar, Kinesis<br><strong>Flow:</strong><br>Producer writes events → events stay for retention → many consumers read independently<br><strong>Mental model:<br></strong><em>“This happened. 
Anyone interested can react.”<br></em>Use event streams when:<br> — multiple systems react to the same event<br> — replaying history matters<br> — analytics, auditing, and reprocessing are required</p><h3>Core Ideas</h3><ul><li><strong>Point-to-point messaging</strong> delivers each message to exactly one consumer.</li><li><strong>Publish-subscribe</strong> delivers each event to all subscribed consumers.</li><li>A <strong>topic</strong> is a named channel for messages or events.</li><li><strong>Partitions</strong> split a topic into ordered logs for parallelism.</li><li><strong>Brokers</strong> store partitions and serve reads and writes.</li><li>A <strong>consumer group</strong> cooperates so each partition is handled by one consumer.</li><li><strong>Message storage</strong> must favor sequential writes and ordered reads.</li><li>A <strong>producer</strong> decides where messages go and retries safely on failure.</li><li>A <strong>consumer</strong> tracks progress using offsets to resume after crashes.</li><li><strong>Push delivery</strong> is low latency but risky for slow consumers.</li><li><strong>Pull delivery</strong> is safer under load but needs long polling.</li><li><strong>State storage</strong> tracks offsets and ownership.</li><li><strong>Metadata storage</strong> defines topics, partitions, and replicas.</li><li><strong>Replication</strong> keeps data available when brokers fail.</li><li><strong>Acknowledgements (acks)</strong> trade latency for durability.</li><li><strong>Partitions</strong>, not consumer count, usually limit throughput.</li><li><strong>At-most-once</strong> allows loss, <strong>at-least-once</strong> allows duplicates, <strong>exactly-once</strong> is expensive.</li></ul><h3>Mental Model You Can Picture</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/870/1*0d-1qHHfmDHykExL3VLifw.jpeg" /><figcaption>Topics, partitions, brokers and consumer group</figcaption></figure><figure><img alt="" 
src="https://cdn-images-1.medium.com/max/922/1*8d9YauuhfUFy6OwT6yJ4sA.png" /><figcaption>Message Data Structure</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/332/1*ECcZG5pz2L4rCiMfwqiK0Q.png" /><figcaption>Producer Flow: <strong>Producer-Side Routing with Buffering &amp; Batching (Industry Standard)</strong></figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/822/1*5CbknRK6WMW1J_fb9cjXuw.jpeg" /><figcaption>New consumer joining the consumer group</figcaption></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/810/1*RmoGMVBwnSqHtmVJml_acw.png" /><figcaption>State and metadata storage with centralized coordination (e.g., ZooKeeper)</figcaption></figure><h3>Message Queues vs Event Streams: Real Trade-offs</h3><h4>Traditional Message Queues</h4><p>Optimized for <strong>task execution</strong>:</p><ul><li>messages disappear after consumption</li><li>retention is short</li><li>storage needs are small</li><li>global ordering is not guaranteed</li></ul><p><strong>Common misunderstandings</strong></p><ul><li>❌ “Queues are just smaller Kafka”<br>✅ Queues execute work; they don’t preserve history</li><li>❌ “Ordering is guaranteed”<br>✅ Ordering breaks with retries or multiple consumers</li></ul><h4>Event Streaming Platforms</h4><p>Optimized for <strong>event history</strong>:</p><ul><li>events are immutable</li><li>retention is configurable</li><li>consumers can replay independently</li><li>ordering is guaranteed <em>within partitions</em></li></ul><p><strong>Common misunderstandings</strong></p><ul><li>❌ “Event streaming is always better”<br>✅ It’s better for events, not tasks</li><li>❌ “More partitions always increase throughput”<br>✅ Coordination and leader placement still cap performance</li></ul><h3>P2P vs Pub/Sub (Quick Intuition)</h3><ul><li><strong>Point-to-point</strong> distributes <em>work</em></li><li><strong>Publish-subscribe</strong> distributes 
<em>information</em></li></ul><p><strong>Misunderstandings</strong></p><ul><li>❌ “Add consumers to scale forever”<br>✅ Throughput is capped by brokers and partitions</li><li>❌ “Pub-sub is always more scalable”<br>✅ It scales distribution, not processing speed</li></ul><h3>Topics, Partitions, Brokers, Consumer Groups (And How They Fail)</h3><p><strong>Definitions</strong></p><ul><li>Topics hold data</li><li>Partitions enable parallelism</li><li>Brokers store partitions</li><li>Consumer groups split work</li></ul><p><strong>Failure modes</strong></p><ul><li>Topic explosion with unclear ownership</li><li>Hot partitions from bad key choice</li><li>Broker overload from uneven leader placement</li><li>Rebalance storms from frequent consumer churn</li></ul><p><strong>Reality check</strong></p><ul><li>Topics are cheap logically, expensive operationally</li><li>Partitions are a scalability tool, not free performance</li><li>Adding consumers doesn’t beat partition limits</li></ul><h3>Storage: Why “Just Use a Database” Breaks</h3><p>Queues stress storage very differently from OLTP systems.</p><ul><li><strong>SQL queues</strong> fail due to locks, deletes, and index maintenance</li><li><strong>LSM stores</strong> suffer from compaction storms and read amplification</li><li><strong>Append-only logs</strong> scale well but bottleneck on hot leaders</li></ul><p><strong>Common myths</strong></p><ul><li>❌ “Databases are durable so they’re good queues”<br>✅ Durability ≠ streaming throughput</li><li>❌ “Indexes make consumption fast”<br>✅ Index maintenance kills write performance</li></ul><h3>Producer Flow: Why Batching Is Non-Negotiable</h3><h4>External routing layers</h4><p>Add latency and create a new single point of failure.</p><h4>Producer-side routing without batching</h4><p>Too many small requests → throughput collapses.</p><h4>Producer-side routing with buffering and batching (standard)</h4><ul><li>cache metadata</li><li>choose partition</li><li>buffer messages</li><li>send 
batches to leaders</li></ul><p><strong>Failure modes</strong></p><ul><li>batches too large → latency spikes</li><li>batches too small → throughput drops</li><li>backpressure → buffers fill, memory pressure builds, and sends block or fail</li><li>stale metadata → retries</li></ul><p><strong>Key insight:</strong> producers are <strong>stateful systems</strong>, not thin clients.</p><h3>Consumer Flow: Where Systems Stall</h3><ul><li>Slow consumers lag far behind producers</li><li>Push delivery overwhelms slow consumers</li><li>Pull delivery needs long polling to avoid busy-waiting on empty partitions</li><li>Rebalancing pauses consumption</li><li>Failure detection depends on heartbeat timeouts</li></ul><p><strong>Misunderstanding</strong></p><ul><li>❌ “Rebalancing is seamless”<br>✅ Rebalancing pauses progress, at least for the partitions being reassigned</li></ul><h3>Coordination (ZooKeeper-Style): Powerful but Fragile</h3><p>Centralized coordination tracks:</p><ul><li>consumer group membership</li><li>partition ownership</li><li>leader election</li></ul><p><strong>Where it fails</strong></p><ul><li>high consumer churn</li><li>frequent offset commits</li><li>metadata becoming too dynamic</li></ul><p><strong>Key distinction</strong></p><ul><li>State = offsets and ownership (hot, write-heavy)</li><li>Metadata = cluster configuration (cold, consistency-critical)</li></ul><h3>Replication &amp; ACKs: Speed vs Safety</h3><ul><li>Leaders handle all writes</li><li>Followers replicate by fetching from the leader</li><li>The in-sync replica set (ISR) shrinks when replicas fall behind</li></ul><p><strong>ACK trade-offs</strong></p><ul><li>acks=0 → fastest, but the producer never learns whether a write failed</li><li>acks=1 → fast, but data is lost if the leader fails before followers replicate it</li><li>acks=all → safest (paired with min.insync.replicas), higher latency</li></ul><p><strong>Reality</strong><br>Replication improves availability, not throughput.</p><h3>Scalability: What Actually Limits You</h3><ul><li>Producers scale until leaders melt</li><li>Consumers scale until partitions cap parallelism</li><li>Brokers scale until leader placement becomes skewed</li></ul><p><strong>One-line takeaway</strong><br>Scalability is constrained by partitions, leaders, and controlled 
replica movement — not by instance count.</p><h3>Delivery Semantics (Why Guarantees Lie)</h3><ul><li><strong>At-most-once</strong>: data loss is normal</li><li><strong>At-least-once</strong>: duplicates are normal</li><li><strong>Exactly-once</strong>: expensive and limited</li></ul><p><strong>Truth</strong><br>Delivery guarantees are <strong>end-to-end</strong>, and the weakest dependency defines correctness.</p><h3>Real-World Example: Payments &amp; Orders at Scale</h3><p>In a commerce platform, message queues decouple checkout, payments, inventory, and notifications so user requests stay fast. When an order is placed, checkout publishes a PaymentRequested message. A payment service processes it asynchronously and emits success or failure events. Failed messages go to a retry queue with delayed processing so transient gateway issues don’t block new orders. Idempotency keys and safe offset commits enforce correctness because queues alone can’t protect against external side effects. Event streams retain immutable payment events for replay, reconciliation, and debugging.</p><p><strong>Takeaway:</strong> queues execute work, streams preserve history, and real systems use both.</p><h3>Interview-Grade Takeaways</h3><ul><li>Messaging protocols define how production, consumption, retries, and heartbeats work under failure</li><li>Retry queues prevent failures from blocking progress</li><li>Historical replay requires external archival once retention expires</li></ul><h3>Final One-Line Summary</h3><p><strong>Distributed message queues are not about moving data — they are about isolating failure, controlling load, and choosing which parts of your system are allowed to wait.</strong></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=2d7e8b015bc0" width="1" height="1" alt=""><hr><p><a href="https://medium.com/techtrends-digest/distributed-message-queues-what-actually-matters-in-production-2d7e8b015bc0">Distributed Message Queues — 
What Actually Matters in Production</a> was originally published in <a href="https://medium.com/techtrends-digest">Coffee☕ And Code💚</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[SCALE FROM ZERO TO MILLIONS OF USERS]]></title>
            <link>https://medium.com/@das.sudeept/scale-from-zero-to-millions-of-users-a7d4873ddd17?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/a7d4873ddd17</guid>
            <category><![CDATA[design-systems]]></category>
            <category><![CDATA[systems-thinking]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[interview]]></category>
            <category><![CDATA[software-engineering]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Sun, 18 Jan 2026 22:45:23 GMT</pubDate>
            <atom:updated>2026-01-18T22:45:23.896Z</atom:updated>
            <content:encoded><![CDATA[<h3>How to scale systems from zero to millions of users?</h3><ul><li>In practice, a system starts out supporting a few users and is gradually scaled up to serve millions.</li><li>What breaks at scale? Single-Server Bottlenecks, Database Limitations, Monolithic Design, State Management Issues, Load Balancing Failures, Cache Inconsistencies, Background Jobs &amp; Queues, Logging, Monitoring &amp; Alerting, Data Center &amp; Network Constraints, and Content Delivery.</li></ul><h3>Core ideas:</h3><ul><li>Multiple servers: Split a single server into separate web/mobile traffic servers, database servers, backend job servers, etc.</li><li>Database (SQL/NoSQL) choice</li><li>Database scaling: Vertical Scaling (more power: CPU/memory) vs Horizontal Scaling (more instances)</li><li>Load balancer: balancing traffic</li><li>Database Replication: Master/Slave handling and election</li><li>Cache: volatile storage for expensive-to-compute or frequently accessed data</li><li>Content Delivery Network (CDN): storage for static content (video, images, CSS, JS files, etc.)</li><li>Stateless/Stateful architecture: sticky user sessions on one server vs independent, interchangeable servers</li><li>Data centers: multiple data centers for availability</li><li>Message Queue: supports asynchronous communication, e.g., using a pub/sub model</li><li>Logging, metrics, automation</li></ul><h3>One mental model / diagram</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/972/1*ejRXixOjOREKJtW7SPI34Q.png" /></figure><h3>Trade-offs &amp; failure modes</h3><ol><li><strong>Multiple servers:</strong> <br>Separating web/mobile traffic (web tier) and database (data tier) servers allows them to be scaled independently.</li><li><strong>SQL</strong> (RDBMS: MySQL, Oracle Database, PostgreSQL, etc.): <br>Stores data in tables and rows. 
You can perform join operations using SQL across different DB tables.</li><li><strong>NoSQL:<br>- </strong>4 categories:<br>key-value stores: Redis, Amazon DynamoDB, etc.;<br>graph stores: Neo4j, etc.;<br>column stores: Cassandra, HBase, etc.;<br>document stores: MongoDB, CouchDB, etc.<br>Non-relational DBs might be the right choice if:<br>• Your application requires super-low latency.<br>• Your data are unstructured, or you do not have any relational data.<br>• You only need to serialize and deserialize data (JSON, XML, YAML, etc.).<br>• You need to store a massive amount of data.</li><li><strong>Vertical vs Horizontal Scaling:</strong><br>When traffic is low but each request is computationally heavy, <strong>vertical scaling</strong> is a great option, and its simplicity is its main advantage. Unfortunately, it comes with serious limitations.<br>• Vertical scaling has a hard limit: it is impossible to add unlimited CPU and memory to a single server.<br>• Vertical scaling provides no failover or redundancy. If that one server goes down, the website/app goes down with it completely.<br><strong>Horizontal scaling</strong> (adding more instances; for databases this is known as sharding) is more desirable for large-scale applications because of the limitations of vertical scaling.<br>The most important factor to consider when implementing a sharding strategy is the choice of a sharding key that distributes data evenly. 
This sharding key may include multiple columns to determine data distribution.<br>- Sharding introduces new complexities and challenges: <strong>Resharding data</strong> (needed when data becomes unevenly distributed across shards), <strong>Celebrity problem</strong> (a single shard overloaded by requests for a few very popular keys), and <strong>Join and de-normalization</strong> (join operations across shards become difficult, so data is often de-normalized instead)</li><li><strong>Load Balancer: <br></strong>The load balancer communicates with servers through private IPs.<br>• If one server instance goes offline, all traffic is routed to the remaining server instances (pods), which keeps the website online. A new healthy web server also needs to be added to the pool to rebalance the load; autoscaling (or another approach suited to the requirements) can do this without manual effort.<br>• If website traffic grows rapidly and two server instances (pods) are no longer enough, the load balancer handles this gracefully: autoscaling adds more servers to the web server pool, and the load balancer routes requests to them automatically.</li><li><strong>Database Replication (Master/Slave):<br></strong>• If only one slave DB instance is available and it goes offline, read operations are directed to the master DB instance temporarily. As soon as the issue is found, a new slave DB instance replaces the old one. If multiple slave DB instances are available, read operations are redirected to the other healthy slave DB instances.<br>• If the master DB instance goes offline, a slave DB instance is promoted to be the new master, and all DB operations temporarily execute on it. A new slave DB instance immediately replaces the old one for data replication. In production systems, promoting a new master is more complicated because the data in a slave DB might not be up to date. 
Various approaches address this, depending on the requirements and the system.</li><li><strong>Cache:<br>- </strong>Consider using a cache when data is read frequently but modified infrequently. Because a cache is volatile, losing its data on restart is expected, so the primary copy of the data should live in a persistent database.<br>- Expiration policy (TTL): Choose the TTL carefully. Too long and cached data goes stale; too short and the system reloads data from the database too often.<br>- Consistency: Keep the cache and the data store in sync. Writes to the data store and to the cache do not happen in a single transaction, so the two can diverge.<br>- Mitigating failures: A single cache server is a single point of failure (SPOF). To mitigate this, replicate the cache across multiple data centers or regions. Another approach is to overprovision cache memory and alert when usage crosses a threshold.<br>- Eviction policy: Once the cache is full, adding new items requires evicting existing ones. Common eviction policies are LRU (least recently used), LFU (least frequently used), and FIFO (first in, first out).</li><li><strong>CDN (Content Delivery Network):<br></strong>• Cost: CDNs are run by third-party providers, and you are charged for data transfers in and out of the CDN, so consider moving infrequently used assets out of the CDN.<br>• Setting an appropriate cache expiry: the same trade-off as the cache expiration policy (TTL) applies.<br>• CDN fallback: If the CDN has a temporary outage, the website should detect the problem and request resources from the origin.<br>• Invalidating files: Remove a file from the CDN before it expires by performing one of the following operations:<br>- Invalidate the CDN object using APIs provided by CDN vendors.<br>- Use object versioning to serve a different version of the object.</li><li><strong>Stateless/Stateful architecture:<br>- </strong>Stateful architecture: The server remembers client state/data from one request to the next. This is usually implemented with sticky sessions in the load balancer. 
This adds complexity: state lives on a single server (pod), which makes scaling and failure handling harder.<br>- Stateless architecture: State data is stored in a shared data store and kept out of the web servers (pods). The result is a simpler, more robust system that is easy to scale (using autoscaling).</li><li><strong>Data Centers:<br></strong>• Traffic redirection: Effective routing is needed to direct each user to the correct data center, typically the nearest available one.<br>• Data synchronization: In failover cases, traffic might be routed to a data center where the user’s data is unavailable. A common strategy is to replicate data across multiple data centers.<br>• Test and deployment: With a multi-data-center setup, it is important to test your website/application at different locations. Automated deployment tools are vital to keep services consistent across all the data centers.</li><li><strong>Message queue: <br>-</strong> With a message queue, the producer can post messages even when the consumer is unavailable, and the consumer can consume messages even when the producer is unavailable. This decoupling lets producers and consumers scale independently.</li><li><strong>Logging, Metrics, Automation<br>- </strong>Logging: Monitoring error logs is important because it helps identify errors and problems in the system. You can monitor error logs per server or use tools to aggregate them into a centralized service for easy search and viewing.<br>- Metrics: Collecting different types of system metrics helps us gain business insights and understand the health of the system. Useful metrics include:<br>• Host-level metrics: CPU, memory, disk I/O, etc.<br>• Aggregated-level metrics: the performance of the entire database tier, cache tier, etc.<br>• Key business metrics: daily active users, retention, revenue, etc.<br>- Automation: When a system gets big and complex, we need to build or leverage automation tools to improve developers’ productivity. 
Continuous integration is a good practice, in which each code check-in is verified through automation (SLTs), allowing teams to detect problems early.</li></ol><h3>Real-world system example</h3><p><strong>Context:</strong> Users abandoned checkout due to slow multi-option payment processing.<br> <strong>Scale Challenge:</strong> Support peak traffic while maintaining low latency.</p><p><strong>Architecture Solutions:</strong></p><ul><li>Microservices and stateless architecture: Separate services for payments, checkout, and order management.</li><li>Load Balancer: Routes requests efficiently across service instances.</li><li>Cache Layer: Session/checkout data cached for fast access.</li><li>Monitoring &amp; Alerts: Metrics and alerts detect issues early and trigger retries.</li></ul><p><strong>Impact:</strong></p><ul><li>Reduced latency and increased conversion rate.</li><li>System scaled easily during traffic spikes.</li></ul><h3>One interview-grade takeaway</h3><p>To scale our system to support millions of users:<br>• Keep the web tier stateless<br>• Build redundancy at every tier<br>• Cache data as much as you can<br>• Support multiple data centers<br>• Host static assets in a CDN<br>• Scale your data tier by sharding<br>• Split tiers into individual services<br>• Monitor your system and use automation tools</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=a7d4873ddd17" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building a Distributed Sequence Generator Using DynamoDB (UKey Pattern)]]></title>
            <link>https://medium.com/techtrends-digest/building-a-distributed-sequence-generator-using-dynamodb-ukey-pattern-ef1b7ceff2b4?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/ef1b7ceff2b4</guid>
            <category><![CDATA[database-scalability]]></category>
            <category><![CDATA[software-engineering]]></category>
            <category><![CDATA[uniqueness]]></category>
            <category><![CDATA[dynamodb]]></category>
            <category><![CDATA[software-development]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Thu, 18 Dec 2025 14:02:20 GMT</pubDate>
            <atom:updated>2026-03-15T09:46:58.251Z</atom:updated>
            <content:encoded><![CDATA[<p>Generating <strong>unique, incremental IDs</strong> in a distributed system sounds trivial — until you actually need to do it at scale, without a single database or leader. Traditional auto-increment columns don’t work well when multiple services need IDs independently, and UUIDs, while convenient, often fail business requirements around ordering or readability.</p><p>In this article, we’ll walk through a <strong>DynamoDB-backed UKey (Unique Key) pattern</strong> — a <strong>distributed, per-key, monotonic sequence generator</strong> that provides database-like sequence behavior in a highly available environment.</p><h3>The Problem</h3><p>Many systems need IDs that are:</p><ul><li><strong>Unique</strong></li><li><strong>Incremental</strong></li><li><strong>Ordered per business entity</strong></li><li><strong>Safe under concurrency</strong></li><li><strong>Available across services</strong></li></ul><p>Examples:</p><ul><li>Product IDs per region</li><li>Order numbers per marketplace</li><li>Invoice numbers per seller</li></ul><p>UUIDs don’t provide ordering.<br>Snowflake-style IDs add complexity.<br>Database sequences don’t scale across services.</p><p>So how do we build something <strong>simple, reliable, and distributed</strong>?</p><h3>The Idea: UKey as a Distributed Counter</h3><p>At its core, UKey is:</p><blockquote><em>A </em><strong><em>DynamoDB-backed, per-key atomic counter</em></strong><em> that returns a monotonically increasing number on every request.</em></blockquote><p>Each logical “sequence” is identified by a <strong>string key</strong> (for example, product_id or order_id).</p><h3>High-Level Architecture</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/556/1*b0cdCaXzeMpiFIyUR_FBtQ.png" /></figure><ul><li>Multiple services can request IDs concurrently.</li><li>DynamoDB acts as the <strong>source of truth</strong>.</li><li>No leader election, no locks, no coordination service.</li></ul><h3>Data 
Model</h3><p><strong>Table: </strong><strong>ukey_counters</strong></p><ul><li><strong>key</strong> (partition key, String): sequence identifier</li><li><strong>last_value</strong> (Number): last issued ID</li><li><strong>updated_at</strong> (Number): timestamp (optional)</li></ul><p>Example item:</p><pre>{<br>  &quot;key&quot;: &quot;product_id&quot;,<br>  &quot;last_value&quot;: 12894,<br>  &quot;updated_at&quot;: 1702900000<br>}</pre><p>Each row represents <strong>one independent sequence</strong>.</p><h3>How getNext(key) Works</h3><p>The magic lies in <strong>DynamoDB’s atomic updates</strong>.</p><h3>Step-by-Step Flow</h3><ol><li>Client calls getNext(&quot;product_id&quot;)</li><li>UKey client issues a DynamoDB UpdateItem</li><li>DynamoDB atomically increments last_value</li><li>Updated value is returned to the client</li></ol><h3>DynamoDB Operation</h3><p>Conceptually, in SQL-like pseudocode:</p><pre>UPDATE ukey_counters<br>SET last_value = if_not_exists(last_value, 0) + 1<br>WHERE key = :key<br>RETURNING UPDATED_NEW</pre><p>In DynamoDB terms:</p><pre>UpdateItem<br>  TableName:                 &quot;ukey_counters&quot;<br>  Key:                       { &quot;key&quot;: &quot;product_id&quot; }<br>  UpdateExpression:          &quot;ADD last_value :inc&quot;<br>  ExpressionAttributeValues: { &quot;:inc&quot;: 1 }<br>  ReturnValues:              &quot;UPDATED_NEW&quot;</pre><p>If the item or the last_value attribute does not exist yet, ADD treats the missing value as 0, so the first call creates the item and returns 1 without a separate initialization step.</p><p>This guarantees:</p><ul><li>No duplicates</li><li>Correct ordering per key</li><li>Safe concurrency</li></ul><h3>Why This Works</h3><h3>Atomicity</h3><p>DynamoDB guarantees atomic updates at the <strong>item level</strong>. 
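Even with 100 concurrent callers, each increment is serialized correctly.</p><h3>Consistency</h3><p>Each getNext() returns a <strong>unique, strictly increasing number</strong> for that key. The sequence is not guaranteed to be gap-free: if a request succeeds but its response is lost and the client retries, that value is consumed and skipped.</p><h3>Availability</h3><p>DynamoDB is fully managed and highly available — no single point of failure.</p><h3>Concurrency and Safety</h3><ul><li>Multiple clients can call getNext() simultaneously</li><li>No distributed locks required</li><li>No race conditions</li><li>No leader election</li></ul><p>This makes UKey ideal for <strong>multi-service architectures</strong>.</p><h3>Performance Characteristics</h3><ul><li><strong>Latency:</strong> a single DynamoDB write (typically milliseconds)</li><li><strong>Throughput:</strong> scales with the number of keys</li><li><strong>Ordering:</strong> guaranteed per key</li><li><strong>Bottleneck:</strong> a hot key under heavy traffic</li></ul><h3>Limitations and Trade-offs</h3><h3>Hot Key Problem</h3><p>If a single key is hit very frequently, it becomes a write hotspot: every increment targets the same item, and therefore the same partition, which DynamoDB limits to about 1,000 write capacity units (roughly 1,000 small writes per second).</p><h3>Write Latency</h3><p>Each ID generation requires a DynamoDB write.</p><h3>Not Globally Ordered</h3><p>Ordering is guaranteed <strong>per key</strong>, not across all keys.</p><h3>Optimizations for Scale</h3><h3>1. Range Allocation</h3><p>Instead of incrementing by 1:</p><ul><li>Allocate ranges (e.g., +100)</li><li>Cache locally</li><li>Reduce DynamoDB calls by up to 100x</li></ul><p>The trade-off: IDs left in a cached range are lost if the instance restarts (creating gaps), and IDs handed out by different instances interleave, so they remain unique but are no longer strictly increasing in time.</p><h3>2. Sharded Counters</h3><p>Split one logical key into multiple shards:</p><pre>product_id#1<br>product_id#2<br>product_id#3</pre><p>Each shard must issue IDs from a disjoint number space (for example, with N shards, shard i issues only values congruent to i modulo N); otherwise the shards produce duplicates. Ordering across shards is lost.</p><h3>3. 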
In-Memory Buffering</h3><p>Keep the next N IDs in memory and refill asynchronously.</p><h3>Security &amp; Access Control</h3><p>Restrict IAM permissions:</p><ul><li>Allow only <strong>dynamodb:UpdateItem</strong> on the counter table</li><li>No read access required</li><li>No public exposure</li></ul><h3>How It Compares to Other ID Strategies</h3><ul><li><strong>UUID:</strong> Pros: stateless. Cons: no ordering.</li><li><strong>Snowflake:</strong> Pros: high throughput. Cons: more complexity.</li><li><strong>DB sequence:</strong> Pros: simple. Cons: depends on a centralized DB.</li><li><strong>DynamoDB UKey:</strong> Pros: distributed, ordered per key. Cons: hot key risk.</li></ul><h3>When to Use UKey</h3><p>✅ Business IDs<br>✅ Human-readable sequences<br>✅ Per-entity ordering required<br>❌ Ultra-high-throughput global IDs<br>❌ Security-sensitive public identifiers (sequential IDs are easy to guess)</p><h3>Final Thoughts</h3><p>The DynamoDB UKey pattern is a <strong>clean, reliable way to generate sequential IDs in a distributed system</strong>. By leveraging DynamoDB’s atomic counters, you get correctness, availability, and simplicity — without introducing heavy coordination or complex ID schemes.</p><p>If your system needs <strong>ordered, unique identifiers across services</strong>, UKey is a practical and battle-tested approach.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ef1b7ceff2b4" width="1" height="1" alt=""><hr><p><a href="https://medium.com/techtrends-digest/building-a-distributed-sequence-generator-using-dynamodb-ukey-pattern-ef1b7ceff2b4">Building a Distributed Sequence Generator Using DynamoDB (UKey Pattern)</a> was originally published in <a href="https://medium.com/techtrends-digest">Coffee☕ And Code💚</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[ Mastering Divide & Conquer: Different Ways to Add Parentheses (LeetCode 241)]]></title>
            <link>https://medium.com/techtrends-digest/mastering-divide-conquer-different-ways-to-add-parentheses-leetcode-241-6f34a0295d79?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/6f34a0295d79</guid>
            <category><![CDATA[interview]]></category>
            <category><![CDATA[algorithms]]></category>
            <category><![CDATA[data-structures]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[leetcode]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Wed, 29 Oct 2025 22:08:37 GMT</pubDate>
            <atom:updated>2026-03-15T09:46:55.786Z</atom:updated>
            <content:encoded><![CDATA[<blockquote><em>“It’s not about computing one result — it’s about exploring </em>every possible world<em> that parentheses can create.”</em></blockquote><h3>🎯 Introduction</h3><p>Imagine you’re given a math expression like &quot;2*3-4*5&quot;.</p><p>Now, what if you could parenthesize it in <strong>every possible way</strong> — and compute <em>all</em> the results?</p><p>That’s the challenge behind <strong>LeetCode 241: Different Ways to Add Parentheses.</strong></p><p>At first glance, this looks like a pure brute-force problem. But if you approach it smartly — with <strong>divide and conquer + memoization</strong> — it turns into one of the most elegant recursive problems on LeetCode.</p><h3>💡 Problem Statement</h3><blockquote><em>Given a string expression of numbers and operators (</em><em>+, </em><em>-, </em><em>*),<br> return </em><strong><em>all possible results</em></strong><em> from computing all different ways to group numbers and operators.</em></blockquote><p>Example:</p><pre>Input: &quot;2*3-4*5&quot;<br>Output: [-34, -14, -10, -10, 10]</pre><h3>🧩 Intuition</h3><p>Every operator (+, -, *) is a <strong>potential split point</strong>.<br> If we split the expression at that operator:</p><ul><li>The <strong>left</strong> side becomes one subexpression.</li><li>The <strong>right</strong> side becomes another.</li></ul><p>We can recursively compute all possible results for the left and right sides, and then combine them.</p><p>The only problem?<br> You’ll compute the same subexpressions over and over again.</p><p>That’s where <strong>memoization</strong> saves the day.</p><h3>⚙️ Approach Breakdown</h3><p>Let’s go step-by-step 👇</p><h3>1. 
Parse the Expression</h3><p>Instead of repeatedly slicing the string inside recursion (which is expensive),<br> we <strong>preprocess</strong> the expression into two lists:</p><ul><li>numbers: all numeric values</li><li>operators: all operation symbols</li></ul><p>Example:</p><pre>Expression: &quot;2*3-4*5&quot;<br>→ numbers = [2, 3, 4, 5]<br>→ operators = [&#39;*&#39;, &#39;-&#39;, &#39;*&#39;]</pre><h3>2. Recursive Function with Memoization</h3><p>Define a recursive function:</p><pre>computeRecursive(l, r)</pre><ul><li>It computes all possible results from numbers in range [l, r].</li><li>Base case: if l == r, just return that number.</li><li>Recursive case:<br> For every operator between l and r, split, compute left/right results, and combine.</li></ul><p>We use a <strong>HashMap memo</strong> to store results for each (l, r) range — so repeated subproblems are instantly reused.</p><h3>3️⃣ Combine Step (Divide &amp; Conquer in Action)</h3><p>For each operator between l and r:</p><ul><li>Compute all possible results of the left subexpression.</li><li>Compute all possible results of the right subexpression.</li><li>Combine each pair using the operator.</li></ul><p>Example:</p><pre>Left = [2, 6]<br>Right = [3, 5]<br>Operator = &#39;-&#39;</pre><pre>→ Combine all:<br>[2-3, 2-5, 6-3, 6-5]</pre><h3>💻 Code Implementation</h3><pre>import java.util.*;</pre><pre>class Solution {<br>    private Map&lt;String, List&lt;Integer&gt;&gt; memo;<br>    private List&lt;Integer&gt; numbers;<br>    private List&lt;Character&gt; operators;</pre><pre>    public List&lt;Integer&gt; diffWaysToCompute(String expression) {<br>        parseExpression(expression);<br>        memo = new HashMap&lt;&gt;();<br>        return computeRecursive(0, numbers.size() - 1);<br>    }</pre><pre>    private void parseExpression(String expression) {<br>        numbers = new ArrayList&lt;&gt;();<br>        operators = new ArrayList&lt;&gt;();<br>        int i = 0, n = expression.length();<br>        while 
(i &lt; n) {<br>            int start = i;<br>            while (i &lt; n &amp;&amp; Character.isDigit(expression.charAt(i))) i++;<br>            numbers.add(Integer.parseInt(expression.substring(start, i)));<br>            if (i &lt; n) operators.add(expression.charAt(i++));<br>        }<br>    }</pre><pre>    private List&lt;Integer&gt; computeRecursive(int l, int r) {<br>        String key = l + &quot;,&quot; + r;<br>        if (memo.containsKey(key)) return memo.get(key);<br>        if (l == r) return List.of(numbers.get(l));</pre><pre>        List&lt;Integer&gt; result = new ArrayList&lt;&gt;();<br>        for (int i = l; i &lt; r; i++) {<br>            List&lt;Integer&gt; leftResults = computeRecursive(l, i);<br>            List&lt;Integer&gt; rightResults = computeRecursive(i + 1, r);<br>            char op = operators.get(i);<br>            for (int a : leftResults)<br>                for (int b : rightResults)<br>                    result.add(evaluate(a, b, op));<br>        }<br>        memo.put(key, result);<br>        return result;<br>    }</pre><pre>    private int evaluate(int a, int b, char op) {<br>        return switch (op) {<br>            case &#39;+&#39; -&gt; a + b;<br>            case &#39;-&#39; -&gt; a - b;<br>            case &#39;*&#39; -&gt; a * b;<br>            default -&gt; 0;<br>        };<br>    }<br>}</pre><h3>🧮 Example Walkthrough</h3><p>Expression: &quot;2*3-4*5&quot;</p><ol><li>Split at first *:</li></ol><ul><li>Left = &quot;2&quot;</li><li>Right = &quot;3-4*5&quot;</li></ul><p>2. Split right at -:</p><ul><li>Left = &quot;3&quot;</li><li>Right = &quot;4*5&quot;</li></ul><p>3. 
Combine all possibilities recursively.</p><p>Possible results (one per grouping):</p><pre>(2*(3-(4*5))) = -34  <br>((2*3)-(4*5)) = -14  <br>(2*((3-4)*5)) = -10  <br>((2*(3-4))*5) = -10  <br>(((2*3)-4)*5) = 10</pre><p>Output (any order is accepted):<br> [-34, -14, -10, -10, 10]</p><h3>⏱️ Time &amp; Space Complexity</h3><ul><li><strong>Time:</strong> exponential in the worst case, because the number of results for n operators is the Catalan number C(n); memoization ensures each of the O(n²) subranges is computed only once, but it cannot shrink the output itself.</li><li><strong>Space:</strong> O(n²) memo entries plus O(n) recursion depth, although the stored result lists can themselves grow exponentially.</li></ul><h3>🚀 Key Takeaways</h3><ul><li>Use <strong>divide and conquer</strong> to explore all groupings.</li><li><strong>Memoization</strong> avoids recomputing repeated subexpressions, even though the total output can still be exponential.</li><li><strong>Parsing once</strong> saves time and keeps recursion clean.</li><li>Problems like this teach how <strong>expression trees</strong> and <strong>dynamic programming on intervals</strong> work.</li></ul><h3>🧭 Final Thoughts</h3><p>“Different Ways to Add Parentheses” is more than just a recursion exercise — <br> it’s a masterclass in <strong>breaking problems into subproblems</strong>, caching results, and combining answers intelligently.</p><p>If you can intuitively trace this recursion, you’ve already leveled up your divide-and-conquer skills.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6f34a0295d79" width="1" height="1" alt=""><hr><p><a href="https://medium.com/techtrends-digest/mastering-divide-conquer-different-ways-to-add-parentheses-leetcode-241-6f34a0295d79">🧠 Mastering Divide &amp; Conquer: Different Ways to Add Parentheses (LeetCode 241)</a> was originally published in <a href="https://medium.com/techtrends-digest">Coffee☕ And Code💚</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[VPC (Virtual Private Cloud): Your Private Highway in the Cloud]]></title>
            <link>https://medium.com/@das.sudeept/vpc-virtual-private-cloud-your-private-highway-in-the-cloud-be8b72e0e7ba?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/be8b72e0e7ba</guid>
            <category><![CDATA[system-design-interview]]></category>
            <category><![CDATA[vpc]]></category>
            <category><![CDATA[software-engineering]]></category>
            <category><![CDATA[software-architecture]]></category>
            <category><![CDATA[software-development]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Sun, 18 May 2025 04:55:07 GMT</pubDate>
            <atom:updated>2025-05-18T06:26:42.591Z</atom:updated>
            <content:encoded><![CDATA[<p>In today’s cloud-native world, secure and scalable networking is not a luxury — it’s a necessity. And at the heart of this lies one powerful construct: the <strong>VPC (Virtual Private Cloud)</strong>.<br>But what exactly is a VPC? Why is it important? And how do modern cloud architects leverage it to build secure, isolated, and scalable infrastructures? Let’s dive in.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OvlCErCcmNgjDb_PdoZFTw.png" /></figure><h3>What is a VPC?</h3><p>A <strong>Virtual Private Cloud</strong> is your <strong>isolated slice of the cloud provider’s network</strong>, where you can define and control your virtual network — just like you would in a traditional data center.</p><p>Imagine it as your <strong>private highway system in the public cloud</strong>, where you choose who can drive, which lanes they take, and what speed limits apply.</p><h3>Key Highlights:</h3><ul><li><strong>Network Isolation</strong>: Resources in your VPC are isolated from other tenants.</li><li><strong>Custom IP Ranges</strong>: You define your own IP address space (e.g., 10.0.0.0/16).</li><li><strong>Subnets</strong>: Divide your VPC into public and private zones.</li><li><strong>Routing Rules</strong>: Full control over traffic flow using route tables.</li><li><strong>Security</strong>: Built-in firewalls (security groups, NACLs) protect your resources.</li></ul><h3>VPC Core Components</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*XKILCT-ND_Tit2o8IT-K8g.png" /></figure><h3>Public vs. Private Subnets</h3><ul><li><strong>Public Subnet</strong>: Has a route to an Internet Gateway. Typically contains load balancers and bastion hosts.</li><li><strong>Private Subnet</strong>: No direct internet access. 
Hosts backend services, application servers, and databases.</li></ul><h3>🛠️ VPC Design Best Practices</h3><ul><li>Use multiple Availability Zones (AZs) for high availability (HA) and fault tolerance.</li><li>Separate public and private subnets.</li><li>Use a NAT Gateway to give private subnets outbound-only internet access.</li><li>Apply the principle of least privilege to security groups and NACLs.</li><li>Use VPC Flow Logs and logging services for observability.</li></ul><h3>VPC Use Cases</h3><h3>1. Web Application Architecture</h3><ul><li>Public subnet: Load balancer, web servers</li><li>Private subnet: Application servers, databases</li><li>NAT Gateway: Lets app servers in private subnets fetch updates (outbound only)</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*m8Ix0CR14c5Ut0kBq4tmLA.png" /></figure><h3>2. Multi-Tier Applications</h3><ul><li>Isolate layers (presentation, business, DB) in separate subnets</li><li>Use security groups to control who can talk to whom</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rpgF84-Aw4y7WnevoUOtXg.png" /></figure><h3>3. 
Hybrid Cloud</h3><ul><li>Connect on-premises data centers using <strong>VPN or Direct Connect</strong></li><li>Extend internal services securely into the cloud</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ytQeLWryjqYYFDphvGkcTQ.png" /></figure><h3>Security in a VPC</h3><p>Cloud providers (AWS, GCP, Azure) offer robust tools to secure traffic:</p><ul><li><strong>Security Groups</strong>: Stateful, instance-level firewalls</li><li><strong>NACLs</strong>: Stateless, subnet-level allow/deny rules for granular access</li><li><strong>PrivateLink &amp; Endpoint Services</strong>: Private access to services without traversing the public internet</li></ul><p><strong>Pro Tip</strong>: Always use private subnets for sensitive resources like databases.</p><h3>VPC Connectivity Models</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/940/1*5Xs33Wh03pYJfJKmGtL2jg.png" /></figure><h3>🧪 Hands-On Example: AWS VPC</h3><p>Here’s a basic 3-tier VPC setup on AWS:</p><pre>VPC (10.0.0.0/16)<br>├── Public Subnet (10.0.1.0/24)<br>│   └── Load Balancer<br>├── Private Subnet (10.0.2.0/24)<br>│   └── App Server<br>└── Private Subnet (10.0.3.0/24)<br>    └── Database</pre><ul><li>Internet Gateway (IGW) attached to the VPC</li><li>NAT Gateway in the public subnet, giving the private subnets outbound-only internet access</li><li>Security Groups and Route Tables define access</li></ul><h3>🚀 Benefits of VPC</h3><ul><li>✅ <strong>Control</strong> over networking — like in on-prem data centers</li><li>✅ <strong>Isolation</strong> from other cloud tenants</li><li>✅ <strong>Scalability</strong> — add subnets or secondary CIDR blocks, peer VPCs, use transit gateways</li><li>✅ <strong>Security</strong> — granular control over traffic, encryption, firewalls</li></ul><h3>Common Pitfalls to Avoid</h3><ul><li>Overlapping CIDRs across VPCs (VPC peering is not possible between overlapping ranges)</li><li>Overly permissive Security Group rules (e.g., SSH or database ports open to 0.0.0.0/0)</li><li>Public-facing databases</li><li>Not using flow logs for traffic visibility</li></ul><h3>Conclusion</h3><p>A <strong>VPC is the foundational layer of cloud networking</strong> — giving you full 
control over who can talk to whom, and how.</p><p>Whether you’re building a simple web app, a complex enterprise architecture, or connecting multiple regions, <strong>mastering VPCs is a must-have skill</strong> for any modern cloud engineer or architect.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=be8b72e0e7ba" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Load Balancers: The Silent Traffic Directors of the Web]]></title>
            <link>https://medium.com/techtrends-digest/load-balancers-the-silent-traffic-directors-of-the-web-6b6bd2edb0aa?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/6b6bd2edb0aa</guid>
            <category><![CDATA[software-architecture]]></category>
            <category><![CDATA[system-design-interview]]></category>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[interview]]></category>
            <category><![CDATA[infrastructure]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Sat, 17 May 2025 03:07:09 GMT</pubDate>
            <atom:updated>2026-03-15T09:46:57.216Z</atom:updated>
            <content:encoded><![CDATA[<p>Have you ever wondered how Netflix doesn’t crash even when millions binge at once? Or how Amazon handles a flurry of shoppers during Black Friday?<br>A big part of the answer lies in a silent, behind-the-scenes traffic director: the <strong>Load Balancer (LB)</strong>.</p><h3>What is a Load Balancer?</h3><p>A <strong>Load Balancer</strong> is like a smart traffic cop for your apps. It <strong>distributes incoming network traffic across multiple servers</strong> to ensure no single server bears too much load.</p><p>Whether it’s a web request, database query, or API call — load balancers make sure traffic flows smoothly.</p><h3>Why Do We Need Load Balancers?</h3><p>Here’s what makes them essential:</p><ul><li><strong>High Availability</strong>: If one server goes down, traffic gets rerouted to the healthy ones.</li><li><strong>Scalability</strong>: Add or remove servers without downtime.</li><li><strong>Improved Performance</strong>: Distributes workload evenly for faster responses.</li><li><strong>Security</strong>: Acts as a gatekeeper with features like SSL termination and DDoS protection.</li></ul><h3>Types of Load Balancers</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Cg1lgkCrnDY7EhYiK0TvYg.png" /></figure><h3>How Load Balancers Work</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*ApcEVvTIw6P9mXujmqBPsg.png" /></figure><ol><li><strong>Client sends a request</strong> (e.g., visits a website)</li><li><strong>Load balancer receives it</strong> and picks the best backend server (based on health, load, and routing rules)</li><li><strong>Server processes the request</strong></li><li><strong>Load balancer sends the response back to the client</strong></li></ol><h3>Common Load Balancing Algorithms</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Zn5iaJCHZv2AnFbQ41YKPg.png" /></figure><h3>Extra Perks of Modern Load Balancers</h3><ul><li><strong>Health Checks</strong>: Only routes to 
“healthy” servers</li><li><strong>SSL/TLS Termination</strong>: Decrypts traffic at the edge</li><li><strong>Sticky Sessions</strong>: Keeps a user on the same server</li><li><strong>Rate Limiting</strong>: Protects backends from abuse</li><li><strong>WebSocket Support</strong>: Enables real-time communication</li></ul><h3>Load Balancing in the Cloud</h3><p>Popular cloud-native solutions:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*TQngTBu0BMb6MH6aONpSVA.png" /></figure><p>In <strong>microservices</strong>, load balancing also happens <strong>inside</strong> the cluster for <strong>service-to-service</strong> traffic, typically through proxies such as <strong>Envoy</strong> or service meshes such as <strong>Istio</strong> (which uses Envoy) and <strong>Linkerd</strong>.</p><h3>Real-World Examples</h3><ul><li>Netflix uses <strong>Global Load Balancing</strong> to serve streams from the closest and fastest region.</li><li>Amazon uses <strong>Layer 7 LBs</strong> to route users to different services based on URLs.</li><li>Slack uses <strong>reverse proxies</strong> and load balancing to support real-time chat.</li></ul><h3>TL;DR</h3><p>A Load Balancer is your app’s <strong>traffic manager</strong>, ensuring uptime, speed, and scale.</p><p>Without load balancers, modern web applications would crumble under real-world loads.</p><h3>Final Thoughts</h3><p>As systems grow more distributed, load balancers are evolving from simple round-robin routers into <strong>full-blown edge decision-makers</strong>.</p><p>Next time you visit your favorite site and everything “just works,” remember — somewhere in the background, a load balancer is working quietly to keep it that way.</p><p>Please share your thoughts on this article and on load balancers in the comments.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6b6bd2edb0aa" width="1" height="1" alt=""><hr><p><a href="https://medium.com/techtrends-digest/load-balancers-the-silent-traffic-directors-of-the-web-6b6bd2edb0aa">Load Balancers: The Silent Traffic Directors of 
the Web</a> was originally published in <a href="https://medium.com/techtrends-digest">Coffee☕ And Code💚</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Ultimate Guide to Writing a Root Cause Analysis (RCA)]]></title>
            <link>https://medium.com/@das.sudeept/the-ultimate-guide-to-writing-a-root-cause-analysis-rca-b678d5236174?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/b678d5236174</guid>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[software-architecture]]></category>
            <category><![CDATA[software]]></category>
            <category><![CDATA[rca]]></category>
            <category><![CDATA[documentation]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Mon, 12 May 2025 20:16:19 GMT</pubDate>
            <atom:updated>2025-05-12T20:16:19.658Z</atom:updated>
            <content:encoded><![CDATA[<p>Whether you’re recovering from a system outage, a failed deployment, or a production bug, a good RCA is essential — not just to explain what went wrong, but to make sure it <em>doesn’t happen again</em>.</p><p>This guide breaks down the <strong>essential components of a strong RCA</strong>, with examples, templates, and best practices.</p><h3>✅ What is an RCA?</h3><p>A <strong>Root Cause Analysis (RCA)</strong> is a structured document that answers:</p><ul><li><strong>What happened?</strong></li><li><strong>Why did it happen?</strong></li><li><strong>What can we do to prevent it in the future?</strong></li></ul><p>An RCA is not about assigning blame. It’s about <strong>improving systems</strong> and <strong>learning as a team</strong>.</p><h3>✍️ When to Write an RCA?</h3><p>Use an RCA for:</p><ul><li>Production incidents or outages</li><li>Security or data breaches</li><li>Unexpected feature regressions</li><li>Cost or performance spikes</li><li>Any incident that had measurable user/business impact</li></ul><h3>🧱 Structure of an Effective RCA</h3><p>A well-written RCA usually includes these components:</p><ol><li><strong>Summary</strong></li><li><strong>Impact</strong></li><li><strong>Timeline of Events</strong></li><li><strong>Root Cause (5 Whys)</strong></li><li><strong>Learnings</strong></li><li><strong>Action Items</strong></li><li><strong>Appendix (Logs, Charts, Graphs)</strong></li></ol><h3>🪧 1. Summary (What happened?)</h3><p>Write a short, non-technical paragraph explaining:</p><ul><li>What went wrong</li><li>When it happened</li><li>How it was resolved</li></ul><p><strong>Example</strong>:</p><blockquote><em>On May 9th, a feature flag rollout caused the homepage and checkout to return 500 errors for 30% of users between 11:03 AM and 11:20 AM. The flag triggered a code path that overwhelmed an internal service, which had no retry or fallback. The issue was mitigated by rolling back the flag.</em></blockquote><h3>📉 2. 
Impact</h3><p>Explain the <strong>user and business impact</strong>:</p><ul><li>How many users were affected?</li><li>Was there data loss?</li><li>What was the revenue/latency/cost implication?</li></ul><p><strong>Example</strong>:</p><ul><li>500 errors for ~30% of traffic</li><li>Checkout failures for logged-in users</li><li>Revenue impact estimated at $18K</li><li>No data loss, no security exposure</li></ul><h3>⏱ 3. Timeline of Events</h3><p>Build a timeline with exact timestamps and key events.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*WIiC7NH2wSBHhYt1zBlmCw.png" /></figure><p>Tips:</p><ul><li>Stick to facts, not interpretations</li><li>Use logs, alert timestamps, and monitoring data</li></ul><h3>❓ 4. Root Cause Analysis (5 Whys)</h3><p>Use the <strong>5 Whys</strong> to move from symptoms to root causes.</p><p><strong>Tip</strong>: The 5 Whys depend on your ability to identify <em>systemic gaps</em>, <em>process failures</em>, or <em>human assumptions</em>.</p><h4>Example:</h4><ol><li><strong>Why</strong> were users seeing 500s?<br> → Because an internal API failed due to rate limits.</li><li><strong>Why</strong> did it fail?<br> → Because traffic surged from a flag rollout.</li><li><strong>Why</strong> wasn’t the API prepared for this load?<br> → Because it wasn’t load-tested for partial rollouts.</li><li><strong>Why</strong> wasn’t load testing tied to flag rollouts?<br> → Because we lacked a defined launch checklist.</li><li><strong>Why</strong> don’t we have a checklist?<br> → Because our deployment process doesn’t include flag readiness steps.</li></ol><blockquote><em>🎯 </em><strong><em>Root Cause</em></strong><em>: Missing rollout standards for feature flags and lack of capacity validation.</em></blockquote><h3>📚 5. Learnings</h3><p>Summarize the <em>insights</em> gained. 
Split into:</p><ul><li>What went wrong</li><li>What worked</li><li>What could have reduced the impact</li></ul><p><strong>Example</strong>:</p><p><strong>What went wrong:</strong></p><ul><li>Flag enabled at 50% without validating downstream impact</li><li>No fallback for API failure</li><li>Alerting was reactive, not proactive</li></ul><p><strong>What worked:</strong></p><ul><li>Rapid rollback mechanism</li><li>On-call rotation responded within SLA</li></ul><p><strong>What could have helped:</strong></p><ul><li>Canary rollout strategy</li><li>Monitoring for 429 errors</li><li>Feature flag guardrails</li></ul><h3>🔧 6. Action Items</h3><p>Your most important section — <strong>concrete steps</strong> to prevent recurrence.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MfRWy5pPuHIIK9qJZi_Ekg.png" /></figure><p>Tips:</p><ul><li>Assign owners and dates</li><li>Track status (In Progress, Done, Blocked)</li></ul><h3>📎 7. Appendix (optional)</h3><p>Attach graphs, logs, screenshots of alerts, etc. This provides supporting context without cluttering the main document.</p><h3>🧠 Bonus: Best Practices</h3><ul><li><strong>Be honest.</strong> Don’t sugarcoat failures.</li><li><strong>Be blame-free.</strong> Focus on systems, not people.</li><li><strong>Write for the future.</strong> Someone should understand this in 6 months.</li><li><strong>Share widely.</strong> Transparency builds trust.</li></ul><h3>🏁 Final Thoughts</h3><p>A strong RCA is your team’s chance to turn a failure into a feature of your culture. When done right, it reduces repeat mistakes, creates systemic safeguards, and fosters a learning mindset.</p><p>Your future self — and your customers — will thank you for it.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=b678d5236174" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Everything You Need to Know About CDNs (Content Delivery Networks)]]></title>
            <link>https://medium.com/@das.sudeept/everything-you-need-to-know-about-cdns-content-delivery-networks-9e5782541157?source=rss-999fdebe56f------2</link>
            <guid isPermaLink="false">https://medium.com/p/9e5782541157</guid>
            <category><![CDATA[software-development]]></category>
            <category><![CDATA[system-design-interview]]></category>
            <category><![CDATA[architecture]]></category>
            <category><![CDATA[servers]]></category>
            <category><![CDATA[cdn]]></category>
            <dc:creator><![CDATA[Das Sudeept]]></dc:creator>
            <pubDate>Sun, 11 May 2025 20:27:48 GMT</pubDate>
            <atom:updated>2025-05-11T21:15:12.595Z</atom:updated>
            <content:encoded><![CDATA[<p>In an era where users expect websites to load in milliseconds and downtime is unforgivable, <strong>Content Delivery Networks (CDNs)</strong> play a silent but powerful role in ensuring speed, scale, and security. Let’s break down what CDNs are, how they work, and why you should be using one.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*esWcK8lL93Q1oBykKdLIJQ.png" /></figure><h3>📦 What is a CDN?</h3><p>A <strong>Content Delivery Network (CDN)</strong> is a geographically distributed group of servers that work together to provide fast delivery of Internet content. It shortens the distance between users and the content they request by caching copies at edge locations across the globe.</p><h3>🔁 How CDNs Work — Under the Hood</h3><ol><li><strong>User Request</strong>: A user opens your site.</li><li><strong>DNS Routing</strong>: The request is steered, typically by DNS or anycast routing, to a nearby CDN PoP (Point of Presence).</li><li><strong>Edge Server Handling</strong>: If the content is cached (a cache hit), it’s served directly from the edge without contacting the origin.</li><li><strong>Cache Miss</strong>: If not, the edge server fetches it from the origin server, caches it (if it is cacheable), and delivers it to the user.</li></ol><blockquote><em>Most CDNs use </em><strong><em>Time-to-Live (TTL)</em></strong><em> rules and cache invalidation strategies to keep data fresh.</em></blockquote><h3>🧩 Key Components of a CDN</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*g_ORAn7-DJwXsQBQ5RiKdg.png" /></figure><h3>⚡ CDN Features and Benefits</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*C6xqywZSA_u6ja1PJSmPZQ.png" /></figure><h3>🧠 Types of Content Served by CDNs</h3><ul><li><strong>Static Assets</strong>: Images, JavaScript, CSS, fonts</li><li><strong>Dynamic Content</strong>: Personalized HTML, API responses (with edge logic)</li><li><strong>Video Streaming</strong>: HLS/DASH adaptive bitrate video</li><li><strong>Software 
Distribution</strong>: Updates, installers, patches</li><li><strong>Web Applications</strong>: Complete SPAs (Single Page Apps) served from the edge</li></ul><h3>🧪 CDN Use Cases in the Real World</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*b7n2ae7hCub5QHlZAUJ_rA.png" /></figure><h3>🔐 CDN + Security = Edge Shield</h3><p>CDNs today come bundled with robust <strong>security features</strong>:</p><ul><li><strong>WAF (Web Application Firewall)</strong>: Blocks XSS, SQLi, etc.</li><li><strong>DDoS Protection</strong>: Mitigates attacks at the edge</li><li><strong>SSL Offloading</strong>: Terminates SSL/TLS at the edge, shortening handshakes for users and taking encryption work off the origin</li><li><strong>Bot Protection</strong>: Identifies and blocks bad traffic</li></ul><blockquote><em>Example: </em><strong><em>Cloudflare</em></strong><em> automatically blocks over 100 billion malicious requests per day.</em></blockquote><h3>🏁 Which CDN is Best for You?</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*mYnP0Qucd7Tuo_RWpNFR0Q.png" /></figure><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*LmqZqANDzmqFRRAbRy_3Pg.png" /></figure><h3>🛠️ Developer’s CDN Checklist</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*nSASJhmiQLNZjBswxOPE7g.png" /></figure><h3>⚙️ Advanced CDN Concepts</h3><h3>1. Edge Computing</h3><p>Modern CDNs let you <strong>run code at the edge</strong>, close to users, for tasks like A/B testing, personalization, and auth token validation.</p><p>Examples: Cloudflare Workers, Fastly Compute@Edge, Akamai EdgeWorkers</p><h3>2. Origin Failover</h3><p>If your primary origin goes down, CDNs can <strong>fail over</strong> to a secondary origin for resilience.</p><h3>3. Real-Time Logs &amp; Analytics</h3><p>Monitor traffic, cache hit/miss ratios, and threat reports via dashboards or logging integrations.</p><h3>4. 
Dynamic Acceleration</h3><p>Some CDNs (like Fastly) can also cache <strong>API responses</strong>, which can reduce time-to-first-byte (TTFB) dramatically.</p><h3>🤔 When Not to Use a CDN?</h3><p>While CDNs are powerful, there are edge cases:</p><ul><li>Highly sensitive apps needing <strong>full control</strong> of delivery (e.g., banking apps)</li><li>Intranet-only apps with no public access</li><li>Dynamic, non-cacheable data changing every second</li></ul><h3>📊 CDN vs No CDN — Impact</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*X1LIFXoD0TPwp5R8L3WR2Q.png" /></figure><h3>✨ Conclusion</h3><p>A CDN is no longer a “nice-to-have”; it’s a <strong>critical part of modern application architecture</strong>. Whether you’re running a blog, SaaS product, streaming platform, or mobile app, a CDN helps you deliver <strong>fast</strong>, <strong>secure</strong>, and <strong>reliable</strong> content globally.</p><blockquote><em>The best part? You can get started for free or at low cost (Cloudflare, BunnyCDN) and scale as you grow.</em></blockquote><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9e5782541157" width="1" height="1" alt="">]]></content:encoded>
        </item>
    </channel>
</rss>