Concurrent Streams and Copy/Compute Overlap: CUDA 101 (part 2).

Nacho Zobian
5 min read · Sep 26, 2023

Welcome back to the second part of our CUDA series! 🚀 🚀 🚀

In our previous post, we uncovered the foundational concepts of CUDA programming, unlocking the door to remarkable parallel processing capabilities. Today, we’re delving even deeper into the exciting realm of concurrent streams, a game-changing feature that can supercharge your CUDA applications for peak performance.


Introduction to Concurrent Streams

In GPU computing, peak performance is the ultimate goal. What if I told you there's a way to push your GPU even further? Enter concurrent streams, a powerful feature of CUDA programming. In this post, you'll gain a clear understanding of what concurrent streams are and why they matter in the CUDA toolkit. By unlocking this extra level of parallelism, you'll take a significant step towards realizing the full potential of your GPU.

You can think of concurrent streams as separate queues or pipelines for executing GPU tasks. Each stream operates independently and asynchronously, allowing multiple GPU operations to occur concurrently.
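As a minimal sketch of that idea (my own illustration, not code from this post), the snippet below creates two streams and queues an independent copy → kernel → copy pipeline in each. Work within a stream executes in order, while work in different streams may overlap on hardware that supports it. Note that `cudaMemcpyAsync` only overlaps with computation when the host buffers are pinned (allocated with `cudaMallocHost`).

```cuda
#include <cuda_runtime.h>

// Toy kernel: double every element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int N = 1 << 20;
    const size_t bytes = N * sizeof(float);

    // Pinned host memory is required for truly asynchronous copies.
    float *h_a, *h_b;
    cudaMallocHost(&h_a, bytes);
    cudaMallocHost(&h_b, bytes);

    float *d_a, *d_b;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);

    // Two independent queues of GPU work.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Stream 1: copy in -> compute -> copy out.
    cudaMemcpyAsync(d_a, h_a, bytes, cudaMemcpyHostToDevice, s1);
    scale<<<(N + 255) / 256, 256, 0, s1>>>(d_a, N);
    cudaMemcpyAsync(h_a, d_a, bytes, cudaMemcpyDeviceToHost, s1);

    // Stream 2: same pipeline, free to overlap with stream 1.
    cudaMemcpyAsync(d_b, h_b, bytes, cudaMemcpyHostToDevice, s2);
    scale<<<(N + 255) / 256, 256, 0, s2>>>(d_b, N);
    cudaMemcpyAsync(h_b, d_b, bytes, cudaMemcpyDeviceToHost, s2);

    cudaDeviceSynchronize();  // wait for both streams to drain

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFreeHost(h_a); cudaFreeHost(h_b);
    cudaFree(d_a);     cudaFree(d_b);
    return 0;
}
```

The key design point: nothing in stream `s2` waits on stream `s1`, so the driver is free to run stream 2's host-to-device copy while stream 1's kernel is computing, which is the copy/compute overlap this series is about.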

Independence and Parallelism
