<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Bargav Jagatha on Medium]]></title>
        <description><![CDATA[Stories by Bargav Jagatha on Medium]]></description>
        <link>https://medium.com/@bargav25?source=rss-84c1de1b096f------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*fgsDV8MiYa9BkD_q5zFPCQ.jpeg</url>
            <title>Stories by Bargav Jagatha on Medium</title>
            <link>https://medium.com/@bargav25?source=rss-84c1de1b096f------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 30 May 2026 07:55:06 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@bargav25/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Distributed Training: Pipeline Parallelism]]></title>
            <link>https://medium.com/@bargav25/distributed-training-pipeline-parallelism-1cf1c1cb9150?source=rss-84c1de1b096f------2</link>
            <guid isPermaLink="false">https://medium.com/p/1cf1c1cb9150</guid>
            <category><![CDATA[machine-learning]]></category>
            <category><![CDATA[distributed-training]]></category>
            <category><![CDATA[large-language-models]]></category>
            <dc:creator><![CDATA[Bargav Jagatha]]></dc:creator>
            <pubDate>Wed, 19 Mar 2025 07:00:08 GMT</pubDate>
            <atom:updated>2025-03-19T07:00:08.030Z</atom:updated>
            <content:encoded><![CDATA[<p>When training very large models, we often run into memory limits on a single GPU. <strong>Model parallelism</strong> helps us overcome these limits by splitting the model across multiple GPUs. There are two main types:</p><h3><strong>Tensor Parallelism</strong></h3><p>Here, each large tensor (for example, a weight matrix) is split into slices and distributed across GPUs. Each GPU holds only a slice of every large tensor. During operations like matrix multiplication, each GPU works on its own slice, and the GPUs then communicate to assemble the full result.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*X279Qocp-8gDbKke2X7MZA.png" /></figure><h3><strong>Pipeline Parallelism</strong></h3><p>In contrast, pipeline parallelism splits the model <em>vertically</em>, assigning entire groups of layers to different GPUs. For example, in a simple 4-layer model:</p><pre>output = L4(L3(L2(L1(input))))</pre><p>We can assign layers L1 and L2 to GPU0 and layers L3 and L4 to GPU1. The forward pass flows as follows:</p><blockquote><strong>GPU0:</strong> <em>Computes intermediate = L2(L1(input))</em></blockquote><blockquote><strong>GPU1:</strong> <em>Receives intermediate, computes output = L4(L3(intermediate))</em></blockquote><p>During backpropagation, gradients from GPU1 are sent back to GPU0 so that each layer gets its correct gradient.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*EZYford1MIevvoU3VyB2vQ.png" /></figure><h3><strong>The Challenge with Naive Implementation</strong></h3><p>In a naive model parallel setup, <strong>only one GPU is active at a time</strong>, which causes two problems:</p><p><strong>Low GPU Utilization:</strong></p><p>While GPU0 is busy processing its layers, GPU1 sits idle, waiting for the output to be transferred. 
As more GPUs are added, each device might only be active for a small fraction of the time.</p><p><strong>Communication Overhead:</strong></p><p>Every time data moves from one GPU to another (e.g., from GPU0 to GPU1), a transfer occurs. On a single machine these transfers are relatively fast, but if the GPUs are on different machines, the overhead can significantly slow down training.</p><p>Imagine training on four GPUs: with naive model parallelism, each GPU might only be busy about 25% of the time (ignoring transfer times), which is not very efficient.</p><h3><strong>Enter GPipe: Smarter Pipeline Parallelism</strong></h3><p><strong>GPipe</strong> addresses these inefficiencies by splitting each mini-batch into smaller <strong>micro-batches</strong>. Instead of waiting for an entire batch to be processed layer-by-layer, each micro-batch can be pipelined through the layers concurrently. Here’s how it works:</p><p>1. <strong>Micro-batching:</strong></p><p>The original batch is divided into several micro-batches. For example, if you set <strong>chunks=4</strong>, a mini-batch is split into 4 micro-batches.</p><p>2. <strong>Pipeline Scheduling:</strong></p><p>• <strong><em>Forward Pass:</em></strong></p><p>Each micro-batch flows through the layers in a staggered fashion. While GPU0 is processing micro-batch 1 on its assigned layers, GPU1 might already be processing micro-batch 0 on its part of the model.</p><p>• <strong><em>Backward Pass:</em></strong></p><p>After the forward passes, gradients are computed in reverse order. The last GPU begins the backward pass once it has finished the forward pass for all of its micro-batches, and each earlier GPU starts the backward computation for a micro-batch as soon as the gradients for it arrive from the next GPU.</p><p><strong><em>This interleaving of computation and communication greatly reduces idle time. 
GPUs are busy processing different micro-batches simultaneously, which increases overall utilization and speeds up training.</em></strong></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*yci0nhzWbs1PxlmtB90y5Q.png" /></figure><p>In the diagram above, note how the <strong><em>bubbles (idle periods)</em></strong> are smaller than in the naive approach.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/859/1*48Vo8B8KMs58ZjNTpxSq9g.png" /></figure><p>With a pipeline parallelism degree of 4 (4 GPUs), each GPU handles multiple micro-batches in an overlapping manner: first processing several forward passes and then, as work on other GPUs completes, beginning the backward passes.</p><p>For example, GPU0 performs the same forward pass on chunks 0, 1, 2 and 3 (F0,0, F0,1, F0,2, F0,3) and then waits for the other GPUs to do their work. Only when their work is nearing completion does GPU0 start working again, running the backward pass for chunks 3, 2, 1 and 0 (B0,3, B0,2, B0,1, B0,0).</p><p>With chunks=1 you end up with naive model parallelism, which is very inefficient. With a very large chunks value you end up with tiny micro-batch sizes, which may not be very efficient either. 
So one has to experiment to find the value that leads to the most efficient utilization of the GPUs.</p><p>The diagram shows a bubble of “dead” time that can’t be parallelized, because the last forward stage has to wait for the backward pass to complete the pipeline. The purpose of finding the best value for chunks is to keep all participating GPUs busy concurrently, which translates to minimizing the size of the bubble. As a rough guide, assuming every micro-batch takes the same time on every GPU, a GPipe-style schedule with <em>p</em> GPUs and <em>m</em> micro-batches leaves each GPU idle for about (<em>p</em> − 1) / (<em>m</em> + <em>p</em> − 1) of each step. With 4 GPUs, that is 3/7 (about 43%) at chunks=4 and 3/19 (about 16%) at chunks=16.</p><h3><strong>Wrapping Up</strong></h3><p>Now that we understand the common terminology and goals of pipeline parallelism, it’s worth noting that there are several possible ways of scheduling forward and backward micro-batches across devices, and each approach offers different tradeoffs between pipeline bubble size, amount of communication, and memory footprint.</p><p>In future posts, we’ll delve into advanced pipeline scheduling strategies and discuss how they further improve performance and scalability.</p><p>Thanks for reading :)</p><h3>References</h3><ul><li><a href="https://huggingface.co/blog/huseinzol05/tensor-parallelism">Tensor Parallelism</a></li><li><a href="https://developer.nvidia.com/blog/scaling-language-model-training-to-a-trillion-parameters-using-megatron/">Scaling Language Model Training to a Trillion Parameters Using Megatron | NVIDIA Technical Blog</a></li><li><a href="https://huggingface.co/docs/transformers/v4.13.0/en/parallelism">Model Parallelism</a></li><li><a href="https://siboehm.com/articles/22/pipeline-parallel-training">Pipeline-Parallelism: Distributed Training via Model Partitioning</a></li></ul>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[3D Human Pose Estimation using LSTM and Transformer based models]]></title>
            <link>https://medium.com/@bargav25/3d-human-pose-estimation-using-lstm-and-transformer-based-models-9bc98fe090bc?source=rss-84c1de1b096f------2</link>
            <guid isPermaLink="false">https://medium.com/p/9bc98fe090bc</guid>
            <category><![CDATA[transformers]]></category>
            <category><![CDATA[3d-pose-estimation]]></category>
            <category><![CDATA[3d-computer-vision]]></category>
            <category><![CDATA[sequential-model]]></category>
            <category><![CDATA[lstm]]></category>
            <dc:creator><![CDATA[Bargav Jagatha]]></dc:creator>
            <pubDate>Tue, 31 Dec 2024 06:08:34 GMT</pubDate>
            <atom:updated>2025-01-02T00:59:27.620Z</atom:updated>
            <content:encoded><![CDATA[<p>Ever wondered how computers can understand and track human movement in 3D space? I recently developed a system that does exactly that, combining the power of LSTM networks and Transformers to create accurate 3D pose estimates from video footage, using a <strong><em>2D-to-3D lifting approach</em></strong>.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*15dJZWxLbs-g-3bB8BhanA.gif" /></figure><h3>The Challenge of Understanding Human Movement</h3><p>Tracking human movement in 3D space is a complex problem that has applications ranging from animation to medical analysis. While 2D pose estimation has made significant strides, accurately predicting 3D poses brings additional challenges:</p><ul><li>Depth perception from 2D videos</li><li>Handling occlusions and self-occlusions</li><li>Maintaining temporal consistency</li><li>Processing long sequences efficiently</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/512/1*J1OO3gWC_21wv1XON8hrRA.png" /><figcaption>Monocular 3D Pose Estimation</figcaption></figure><h3>Our Approach: Comparing Classic and Modern Architectures</h3><p>Since 2D-to-3D lifting is a sequential problem, we used the following neural network architectures as our models:</p><ol><li><strong>LSTM Networks</strong>: Well suited to sequential data and temporal relationships in movement</li><li><strong>Transformer Models</strong>: Excellent at capturing long-range dependencies and parallel processing</li></ol><p>The two models achieved the following MPJPE (Mean Per Joint Position Error), where lower is better:</p><ul><li><strong>55mm with our LSTM-based model</strong></li><li><strong>64mm with our Transformer-based approach</strong></li></ul><p>To put this in perspective, the current state of the art achieves around 30mm. Our implementation does not match that, but it provides solid performance while remaining accessible and adaptable.</p><h3>Building on Giants’ Shoulders</h3><p>Our work builds upon 
several outstanding projects in the field:</p><ul><li><a href="https://github.com/QitaoZhao/PoseFormerV2"><em>PoseFormerV2</em></a><em> by QitaoZhao</em></li><li><a href="https://github.com/facebookresearch/VideoPose3D"><em>VideoPose3D</em></a> <em>by Facebook Research</em></li><li><a href="https://github.com/una-dinosauria/3d-pose-baseline"><em>3D Pose Baseline</em></a></li></ul><p>By incorporating insights from these projects and adding our own innovations, we’ve created a flexible framework that researchers and developers can easily adapt to their needs.</p><h3>Real-World Applications</h3><p>This technology has practical applications across multiple fields:</p><ul><li><strong>Animation</strong>: Creating realistic character movements</li><li><strong>Sports Analysis</strong>: Studying athlete performance</li><li><strong>Medical Assessment</strong>: Tracking patient movement patterns</li><li><strong>Human-Computer Interaction</strong>: Building more intuitive interfaces</li></ul><h3>Technical Implementation</h3><p>We trained our model on the Human3.6M dataset, processing videos in windows of 81 frames to capture complex motion patterns. The system processes these sequences through:</p><ol><li>Initial pose detection using YOLOv3 and HRNet</li><li>Temporal modeling with our model architectures</li><li>Final 3D pose estimation</li></ol><h3>Future Directions</h3><ul><li>Reducing the computational requirements</li><li>Improving real-time performance</li><li>Extending to multi-person scenarios</li><li>Handling more challenging viewpoints</li></ul><h3>Join the Journey</h3><p>This project represents a step forward in making 3D pose estimation more accessible to researchers and developers. Whether you’re interested in computer vision, deep learning, or practical applications of AI, there’s something here for you to explore and build upon.</p><p>Want to learn more or contribute to the project? 
Check out our GitHub repository or reach out to discuss potential collaborations!</p><p><a href="https://github.com/bargav25/3D-Human-Pose-Estimation">GitHub - bargav25/3D-Human-Pose-Estimation</a></p><p><em>The code and pretrained models are available on GitHub, along with detailed setup instructions and documentation.</em></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[KeypointNeRF: A New Approach to 3D Motion Capture Using Neural Radiance Fields]]></title>
            <link>https://medium.com/@bargav25/keypointnerf-a-new-approach-to-3d-motion-capture-using-neural-radiance-fields-c8eb4a1c8ece?source=rss-84c1de1b096f------2</link>
            <guid isPermaLink="false">https://medium.com/p/c8eb4a1c8ece</guid>
            <category><![CDATA[3d-computer-vision]]></category>
            <category><![CDATA[computer-vision]]></category>
            <category><![CDATA[neural-radiance-fields]]></category>
            <category><![CDATA[3d-pose-estimation]]></category>
            <dc:creator><![CDATA[Bargav Jagatha]]></dc:creator>
            <pubDate>Tue, 31 Dec 2024 05:45:17 GMT</pubDate>
            <atom:updated>2024-12-31T05:45:17.537Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*zm65sKJKNQaH6i4gTeOZ_g.png" /><figcaption>Overall Pipeline</figcaption></figure><p>Ever wondered how we could capture the intricate movements of animals in 3D without complex multi-camera setups or pre-defined skeletal models? That’s exactly what we tackled in my directed study, developing <strong>KeypointNeRF</strong>, a novel approach that’s changing how we think about 3D motion capture.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*M10q1fVfyK_dFFy_fMKriw.png" /><figcaption>Comparing rendered images</figcaption></figure><h3>The Challenge: Why Traditional Methods Fall Short</h3><p>Think about capturing a rat’s movement in 3D. Traditional methods typically require one or more of the following:</p><ul><li>Multiple cameras capturing the subject from different angles</li><li>Pre-defined skeleton models (like those used for human motion capture)</li><li>Complex setups that aren’t practical in many real-world scenarios</li></ul><p>This becomes especially challenging when you’re working with animals, where you can’t simply apply human-based skeletal models, and setting up multiple cameras might disturb their natural behavior.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/600/1*SboK63Lc5_rniIC0AfCgJQ.gif" /><figcaption>Rendered Rat</figcaption></figure><h3>Our Innovation: Keypoint-Based Neural Radiance Fields</h3><p>We developed a solution that combines the power of Neural Radiance Fields (NeRFs) with a flexible keypoint-based approach. Instead of relying on rigid skeletal models or multiple camera views, we use 3D keypoints and their relationships to capture motion. 
Here’s what makes it special:</p><ol><li><strong>Single Camera Solution</strong>: Unlike traditional methods, our approach works with footage from just one camera — making it much more practical for real-world applications.</li><li><strong>No Skeleton Required</strong>: Rather than forcing a pre-defined skeleton model, we use keypoints that can adapt to any articulated object, whether it’s a rat, a robot, or any other moving subject.</li><li><strong>Smart Background Handling</strong>: We integrated <strong>SAM2</strong> (Segment Anything Model v2) to automatically remove backgrounds, letting us focus purely on the subject’s motion.</li></ol><h3>The Technical Magic Behind It</h3><p>The real innovation lies in how we handle the 3D space. For each point in space, we compute:</p><ul><li>Relative distances to keypoints</li><li>Directional relationships</li><li>View-dependent effects</li></ul><p>This creates a rich representation that captures not just position, but the complete dynamic nature of the subject’s motion. 
Think of it as creating a dynamic 3D map that updates with every movement.</p><h3>Real-World Applications</h3><p>This research opens up exciting possibilities across multiple fields:</p><ul><li><strong>Animal Behavior Studies</strong>: Scientists can now capture and analyze animal movements more naturally</li><li><strong>Computer Animation</strong>: Create more realistic animations without complex rigging</li><li><strong>Biomechanics Research</strong>: Study movement patterns with less invasive equipment</li><li><strong>Medical Motion Analysis</strong>: Track patient movements for physical therapy or diagnosis</li></ul><h3>Looking Ahead</h3><p>While our current results are promising, we’re already thinking about future improvements:</p><ul><li>Enhancing motion consistency across frames</li><li>Implementing real-time processing capabilities</li><li>Extending the framework to handle even more complex movements</li></ul><h3>The Bigger Picture</h3><p>This project represents more than just a technical achievement — it’s about making 3D motion capture more accessible and practical. By removing the need for complex multi-camera setups and pre-defined skeletons, we’re opening up new possibilities for researchers, animators, and scientists across various fields.</p><p>Would you like to learn more about this research or discuss potential applications? Feel free to reach out or check out our project materials on GitHub!</p><p><a href="https://github.com/bargav25/RatNeRF">GitHub - bargav25/RatNeRF</a></p><p><em>This research was conducted as part of my directed study at Boston University, building upon recent advances in Neural Radiance Fields and 3D computer vision technology.</em></p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Plan As You Go: How We Built an AI-Powered Boston Trip Planner in One Hour]]></title>
            <link>https://medium.com/@bargav25/plan-as-you-go-how-we-built-an-ai-powered-boston-trip-planner-in-one-hour-61dfbbcf4a97?source=rss-84c1de1b096f------2</link>
            <guid isPermaLink="false">https://medium.com/p/61dfbbcf4a97</guid>
            <category><![CDATA[ai-ml-app-development]]></category>
            <category><![CDATA[hackathons]]></category>
            <category><![CDATA[llm-applications]]></category>
            <category><![CDATA[travel-technology]]></category>
            <dc:creator><![CDATA[Bargav Jagatha]]></dc:creator>
            <pubDate>Tue, 31 Dec 2024 05:26:43 GMT</pubDate>
            <atom:updated>2024-12-31T05:52:17.710Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-Bt-InajtOzS2Pup3ZvEGw.png" /></figure><p>Ever tried planning a trip and felt overwhelmed by endless browser tabs, conflicting reviews, and the constant fear of missing out on the best experiences? That’s exactly what drove my roommate and me to create <strong><em>Plan As You Go</em></strong> during a recent hackathon — a smart trip planner that combines real-time Boston events with AI-powered personalization.</p><p><a href="https://github.com/bargav25/weekend_planner">GitHub - bargav25/weekend_planner</a></p><iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FvzGA5w4GMQg%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DvzGA5w4GMQg&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FvzGA5w4GMQg%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/50c89ca5fcd9387ae45f59ac481c0465/href">https://medium.com/media/50c89ca5fcd9387ae45f59ac481c0465/href</a></iframe><h3>The “Aha!” Moment</h3><p>As Boston residents, we’ve seen countless tourists (and even locals) struggle to piece together the perfect itinerary. Sure, everyone knows about the Freedom Trail and Fenway Park, but what about that underground jazz concert happening next weekend? Or that pop-up food festival in Cambridge? That’s when it hit us — why not create a tool that blends the best of both worlds: AI’s comprehensive knowledge of Boston’s attractions and real-time data about current events?</p><h3>Building the Time Machine</h3><p>The most exciting part? We built this entire project in just one hour! 
Here’s how we did it:</p><ol><li>First, we tapped into <a href="https://www.thebostoncalendar.com/">https://www.thebostoncalendar.com/</a> to get real-time event data, ensuring our users would never miss out on the city’s latest happenings.</li><li>Then, we leveraged <strong>Gemini’s Flash API</strong> to create a smart recommendation engine. We crafted our prompts to generate structured JSON responses, making it easy to parse and display personalized recommendations based on:</li></ol><blockquote><strong>Food preferences</strong> (because no one should miss out on Boston’s incredible culinary scene)</blockquote><blockquote><strong>Date flexibility</strong> (weekend warriors, we’ve got you covered)</blockquote><blockquote><strong>Budget constraints</strong> (from student-friendly to luxury experiences)</blockquote><blockquote><strong>Personal interests</strong> (history buff? Food enthusiast? Art lover? Check, check, and check!)</blockquote><h3>The Magic Behind the Scenes</h3><p>What makes Plan As You Go special isn’t just its comprehensive database — it’s how it understands what makes each trip unique. By combining real-time events with AI-powered recommendations, we created a system that doesn’t just list attractions; it crafts experiences.</p><p>Want to catch a Red Sox game and find the perfect pre-game dinner spot in Fenway? Our AI considers everything from walking distance to reservation availability. Interested in contemporary art? It might pair a visit to the ICA with an upcoming gallery opening in SoWa that perfectly matches your interests.</p><h3>The Learning Experience</h3><p>While we didn’t win the hackathon (note to self: always read the fine print about AI usage declarations!), we created something we’re genuinely proud of. 
Plan As You Go demonstrates how AI can transform the way we explore cities, making travel planning more personalized and spontaneous.</p><h3>What’s Next?</h3><p>This one-hour project opened our eyes to the possibilities of AI-powered travel planning. Could this be scaled to other cities? Could we add more real-time data sources? The possibilities are endless, and we’re just getting started.</p><p>For those interested in the technical details or wanting to contribute, check out our <a href="https://github.com/bargav25/weekend_planner"><strong>GitHub repository</strong></a>. Who knows? Maybe your contribution will help someone discover their next favorite Boston experience!</p><p>Remember, sometimes the best projects don’t come from months of planning — they come from recognizing a simple problem and realizing you have the tools to solve it right now.</p><p><em>Have you used AI to build something cool in record time? Share your story in the comments below!</em></p>]]></content:encoded>
        </item>
    </channel>
</rss>