Write CUDA Kernels in Pure Rust. No DSL. No Bindings.
If you've ever tried combining Rust and CUDA, you know the pain. You either wrap the CUDA C API with unsafe bindings, write inline PTX assembly, or use a half-baked DSL that abstracts too much and debugs too little.
But what if you could just write Rust? Like, actually write Rust, and have it compile directly to GPU code?
That's exactly what cuda-oxide does. It's a research project from NVIDIA Labs that lets you write CUDA kernels in pure Rust — no C, no DSL, no bindings. Just Rust code that compiles to PTX (the intermediate language for NVIDIA GPUs).
What It Does
cuda-oxide is a compiler and runtime that takes Rust source code and turns it into PTX that runs on NVIDIA GPUs. It's not a wrapper around CUDA C — it's a new path that uses the Rust compiler's backend to generate GPU code directly.
You write functions in Rust (with some #[kernel] attributes), and the tool compiles them to .ptx files that you can load and launch from your Rust host code. No C, no CUDA C, no bindings. Just Rust on both sides.
Why It's Cool
The obvious win: you get Rust's type system and memory safety on the GPU. That's huge for correctness in parallel code.
But there's a deeper benefit. cuda-oxide is fundamentally different from existing approaches:
- No DSLs. You're not learning a custom language or macro system. It's Rust.
- No bindings. No FFI to C, no manual memory management across languages.
- Full Rust semantics. You can use
if let,match, closures, iterators, and evenResulttypes in your kernels. The compiler handles the translation to PTX.
The clever part is how it works: instead of modifying the Rust compiler, it uses the existing LLVM backend and adds a PTX target. So your Rust code goes through the same optimizations as normal Rust, then gets emitted as PTX instead of x86 or ARM assembly.
How to Try It
This is an early research project, so don't expect a stable package manager install yet. But you can build and play with it:
git clone https://github.com/NVlabs/cuda-oxide
cd cuda-oxide
cargo build --release
The repo includes examples (like a vector addition kernel) and a runtime loader. You write your kernel in a #[kernel] function, compile with the cuda-oxide toolchain, and then use the cuda-oxide-runti