Introducing Bend: The High-Level Language for Massively Parallel Computing

Aditya Pratap Singh
May 19, 2024

Bend is a new high-level programming language designed to harness the power of massively parallel computing. Imagine combining the expressiveness of Python and Haskell with the raw computational power of GPUs, all without the complexity of traditional parallel programming. That is what Bend aims to offer.

What is Bend?

Bend is a high-level programming language that lets you write expressive, Python-like code while achieving near-linear speedup with core count on multi-core CPUs and massively parallel hardware like GPUs. Unlike low-level alternatives such as CUDA and Metal, Bend abstracts away the complexities of parallel programming, so developers can focus on writing clean, high-level code. Bend runs on HVM2, a massively parallel runtime based on interaction combinators, which manages parallel execution without explicit thread management or synchronization primitives such as locks and mutexes.

Key Features of Bend

- High-Level Syntax: Bend’s syntax is intuitive and reminiscent of languages like Python and Haskell, making it accessible and easy to learn.
- Automatic Parallelization: Bend automatically parallelizes code, ensuring that any code that can run in parallel will do so without requiring explicit parallel annotations.
- Massively Parallel Execution: Bend leverages the power of GPUs and multi-core CPUs to deliver significant performance improvements for suitable tasks.
- Advanced Language Features: Supports higher-order functions, full closures, unrestricted recursion, and continuations.

Getting Started with Bend

To start using Bend, you’ll need to install Rust nightly, HVM2, and Bend. Here are the installation steps:

1. Install Rust nightly (for example, with `rustup toolchain install nightly`).
2. Install HVM2 and Bend using Cargo:

cargo +nightly install hvm
cargo +nightly install bend-lang

Running Bend Programs

Bend supports several execution modes:

- Sequential execution using the Rust interpreter:

bend run <file.bend>

- Parallel execution using the C interpreter:

bend run-c <file.bend>

- Massively parallel execution using the CUDA interpreter:

bend run-cu <file.bend>

For maximum performance, Bend programs can also be compiled to standalone C or CUDA files with `gen-c` and `gen-cu`.

Parallel Programming Made Easy

Bend simplifies parallel programming by automatically parallelizing code where possible. For instance, consider the following expressions:

# Cannot run in parallel because of sequential dependency
(((1 + 2) + 3) + 4)
# Can run in parallel because of independent computations
((1 + 2) + (3 + 4))
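The difference between the two expressions can be made concrete with two measures: *work* (the total number of additions) and *span* (the longest chain of additions that depend on each other). The Python sketch below is not Bend code and does not reflect how HVM2 schedules work internally. It only models the dependency structure, and the `Add` class and the `work`/`span` helpers are illustrative names.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Add:
    """An addition node; its two operands are independent of each other."""
    left: "Expr"
    right: "Expr"

Expr = Union[int, Add]

def evaluate(e: Expr) -> int:
    """Compute the value of the expression."""
    if isinstance(e, int):
        return e
    return evaluate(e.left) + evaluate(e.right)

def work(e: Expr) -> int:
    """Total number of additions: the cost on a single core."""
    if isinstance(e, int):
        return 0
    return 1 + work(e.left) + work(e.right)

def span(e: Expr) -> int:
    """Longest chain of dependent additions: the cost with unlimited cores.
    Both operands of an Add can be evaluated at the same time,
    so only the deeper side adds to the chain."""
    if isinstance(e, int):
        return 0
    return 1 + max(span(e.left), span(e.right))

sequential = Add(Add(Add(1, 2), 3), 4)  # (((1 + 2) + 3) + 4)
parallel = Add(Add(1, 2), Add(3, 4))    # ((1 + 2) + (3 + 4))

for name, e in [("sequential", sequential), ("parallel", parallel)]:
    print(name, evaluate(e), work(e), span(e))
# sequential 10 3 3
# parallel 10 3 2
```

Both forms perform three additions. The balanced form, however, finishes in two dependent steps instead of three. For a sum of n numbers, a balanced tree needs about log2(n) dependent steps, while the left-nested chain needs n - 1.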

Bitonic Sorter Example

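Here’s a more complex example: a bitonic sorter implemented in Bend. Bitonic sort is a sorting network, and this version expresses it as recursive rotations of immutable trees. That recursion-heavy style is not traditionally suited to GPUs, yet it still parallelizes well under Bend’s execution model.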

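# Sorting Network = just rotate trees!
def sort(d, s, tree):
  switch d:
    case 0:
      return tree
    case _:
      (x,y) = tree
      # sort the two halves in opposite directions (s = 0 and s = 1)
      lft = sort(d-1, 0, x)
      rgt = sort(d-1, 1, y)
      # merge the halves; rots takes the pair as a single tree
      return rots(d, s, (lft, rgt))

# Rotates sub-trees (Blue/Green Box)
# (the helpers `down` and `warp` are not shown in this excerpt)
def rots(d, s, tree):
  switch d:
    case 0:
      return tree
    case _:
      (x,y) = tree
      return down(d, s, warp(d-1, s, x, y))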
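The excerpt above omits the helpers `down` and `warp`, as well as any input generator. To show how the pieces could fit together, here is a Python sketch that mirrors the same tree-rotation structure. The bodies of `swap`, `warp`, `down`, `gen`, `from_list`, and `leaves` are assumptions: they implement a standard bitonic merging network over perfect binary trees, not Bend’s published code. Python also runs the whole thing sequentially.

```python
from typing import List, Tuple, Union

# A perfect binary tree: either a leaf (int) or a pair of subtrees.
Tree = Union[int, Tuple["Tree", "Tree"]]

def gen(d: int, x: int = 0) -> Tree:
    """Perfect tree of depth d whose leaves run from 2**d - 1 down to 0."""
    if d == 0:
        return x
    return (gen(d - 1, x * 2 + 1), gen(d - 1, x * 2))

def from_list(xs: List[int]) -> Tree:
    """Build a perfect tree from a list whose length is a power of two."""
    if len(xs) == 1:
        return xs[0]
    half = len(xs) // 2
    return (from_list(xs[:half]), from_list(xs[half:]))

def leaves(t: Tree) -> List[int]:
    """Read the leaves back, left to right."""
    if isinstance(t, int):
        return [t]
    return leaves(t[0]) + leaves(t[1])

def swap(s: int, a: Tree, b: Tree) -> Tuple[Tree, Tree]:
    return (a, b) if s == 0 else (b, a)

def warp(d: int, s: int, a: Tree, b: Tree) -> Tuple[Tree, Tree]:
    """Compare-and-swap each leaf of `a` with the matching leaf of `b`.
    With s = 0 the smaller values go to the first tree; s = 1 reverses that."""
    if d == 0:
        return swap(s ^ int(a > b), a, b)
    (a0, a1), (b0, b1) = a, b
    (p0, q0) = warp(d - 1, s, a0, b0)
    (p1, q1) = warp(d - 1, s, a1, b1)
    return ((p0, p1), (q0, q1))

def rots(d: int, s: int, tree: Tree) -> Tree:
    """Bitonic merge of a depth-d tree: one half-cleaner step, then recurse."""
    if d == 0:
        return tree
    (x, y) = tree
    return down(d, s, warp(d - 1, s, x, y))

def down(d: int, s: int, tree: Tree) -> Tree:
    """Merge each half of the tree independently."""
    if d == 0:
        return tree
    (x, y) = tree
    return (rots(d - 1, s, x), rots(d - 1, s, y))

def sort(d: int, s: int, tree: Tree) -> Tree:
    """Sort the 2**d leaves; s = 0 gives ascending order, s = 1 descending."""
    if d == 0:
        return tree
    (x, y) = tree
    lft = sort(d - 1, 0, x)  # ascending half
    rgt = sort(d - 1, 1, y)  # descending half, so the pair is bitonic
    return rots(d, s, (lft, rgt))

print(leaves(sort(3, 0, gen(3))))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Notice that the two recursive `sort` calls, the two `warp` calls, and the two `rots` calls inside `down` never depend on each other. That independence is exactly what lets a runtime like HVM2 evaluate them in parallel.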

Impressive Benchmarks

With no changes to the code, the reported timings are:
- CPU (1 thread): 12.15 seconds
- CPU (16 threads): 0.96 seconds
- GPU (NVIDIA RTX 4090, 16k threads): 0.21 seconds

That’s a speedup of over 57x compared with the single-threaded run, obtained purely through Bend’s automatic parallelization.
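The speedups follow directly from the quoted timings. The quick Python check below computes them, along with the parallel efficiency of the 16-thread CPU run (speedup divided by thread count). The dictionary labels are arbitrary.

```python
# Wall-clock times quoted in the benchmark above, in seconds.
timings = {
    "CPU, 1 thread": 12.15,
    "CPU, 16 threads": 0.96,
    "GPU, RTX 4090": 0.21,
}

baseline = timings["CPU, 1 thread"]
speedups = {name: baseline / t for name, t in timings.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
# CPU, 1 thread: 1.0x
# CPU, 16 threads: 12.7x
# GPU, RTX 4090: 57.9x

# Parallel efficiency on the CPU: speedup divided by thread count.
cpu_efficiency = speedups["CPU, 16 threads"] / 16
print(f"16-thread efficiency: {cpu_efficiency:.0%}")  # 16-thread efficiency: 79%
```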

Versatility in Parallel Computing

Bend is not limited to a specific paradigm. It supports a wide range of applications, from real-time rendering to concurrent systems. For example, you can render an image by building an immutable tree with one leaf per pixel and applying a simple shader function to every leaf:

# given a shader, returns a square image
def render(depth, shader):
  bend d = 0, i = 0:
    when d < depth:
      color = (fork(d+1, i*2+0), fork(d+1, i*2+1))
    else:
      # the tree has 2^depth leaves, so the image side is 2^(depth/2) pixels
      width = 1 << (depth / 2)
      color = shader(i % width, i / width)
  return color

# given a position, returns a color
# for this demo, it just busy loops
def demo_shader(x, y):
  bend i = 0:
    when i < 5000:
      color = fork(i + 1)
    else:
      color = 0x000001
  return color

# renders a 256x256 image using demo_shader
def main:
  return render(16, demo_shader)
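The `bend` loop in `render` builds a perfect binary tree. Each `fork` creates two independent subtrees, and leaf `i` holds the color of pixel `i`. The Python sketch below mirrors that structure with ordinary sequential recursion, where the inner function `go` stands in for `fork`. It assumes the image side length is `2 ** (depth // 2)`, so `depth = 16` gives 65,536 leaves arranged as 256x256. `pixels` and `gradient_shader` are hypothetical helpers for illustration only.

```python
from typing import Callable, List, Tuple, Union

# A perfect binary tree of pixel colors.
Tree = Union[int, Tuple["Tree", "Tree"]]

def render(depth: int, shader: Callable[[int, int], int]) -> Tree:
    """Build a tree with 2**depth leaves; leaf i is the color of pixel i."""
    width = 2 ** (depth // 2)  # assumed side length of the square image

    def go(d: int, i: int) -> Tree:
        if d < depth:
            # Two independent subtrees, like the two fork() calls in Bend.
            return (go(d + 1, i * 2 + 0), go(d + 1, i * 2 + 1))
        return shader(i % width, i // width)

    return go(0, 0)

def pixels(t: Tree) -> List[int]:
    """Flatten the tree left to right, which is pixel-index order."""
    if isinstance(t, int):
        return [t]
    return pixels(t[0]) + pixels(t[1])

def gradient_shader(x: int, y: int) -> int:
    # Hypothetical shader: x in the red byte, y in the green byte.
    return (x << 16) | (y << 8)

image = render(4, gradient_shader)  # a 4x4 image
print(len(pixels(image)))  # 16
```

Because the left child of node `i` is `2i` and the right child is `2i + 1`, reading the leaves left to right visits pixels in index order. Each leaf can therefore be shaded without knowing anything about its neighbors.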

Join the Bend Community

To dive deeper into Bend, explore the following resources:
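- GUIDE.md: a guide to writing Bend programs
- FEATURES.md: an overview of the language’s features
- HVM2 Paper: a description of the runtime behind Bend
- HigherOrderCO: the organization behind Bend and HVM2
- Discord Community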

Note on Performance

While Bend excels at parallel execution, its single-core performance is currently well behind that of mature compilers, so the biggest gains come from workloads with plenty of parallelism. The team is actively working on optimizing the compiler and improving performance with each release.

Conclusion

Bend represents a significant leap forward in parallel computing, making it easier for developers to harness the power of GPUs and multi-core systems. With its high-level syntax and automatic parallelization, Bend opens new possibilities for writing performant, concurrent applications without the traditional complexities of parallel programming. Whether you’re working on real-time rendering, complex algorithms, or concurrent systems, Bend is poised to be a valuable tool in your programming arsenal.
