Write CUDA GPU kernels in pure Rust. No DSL. No bindings. Just Rust to PTX.

2,860 stars

Write CUDA Kernels in Pure Rust. No DSL. No Bindings.

If you've ever tried combining Rust and CUDA, you know the pain. Either you wrap the CUDA C API with unsafe bindings, write inline PTX assembly, or settle for a half-baked DSL that hides too much and is hard to debug.

But what if you could just write Rust? Like, actually write Rust, and have it compile directly to GPU code?

That's exactly what cuda-oxide does. It's a research project from NVIDIA Labs that lets you write CUDA kernels in pure Rust: no C, no DSL, no bindings. Your Rust code compiles to PTX, the virtual instruction set that NVIDIA's driver translates into machine code for the GPU.

What It Does

cuda-oxide is a compiler and runtime that takes Rust source code and turns it into PTX that runs on NVIDIA GPUs. It's not a wrapper around CUDA C. It's a separate path that uses the Rust compiler's backend to generate GPU code directly.

You write functions in Rust and mark them with a #[kernel] attribute, and the tool compiles them to .ptx files that you can load and launch from your Rust host code. No CUDA C, no bindings. Just Rust on both sides.
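To make the shape of such a kernel concrete, here is a minimal sketch of vector addition written as plain host Rust that emulates the GPU's thread grid on the CPU. It runs without any GPU toolchain. Everything in it is an assumption for illustration: `vec_add_thread` and `launch_1d` are hypothetical names, the explicit `tid` parameter stands in for whatever thread-index mechanism cuda-oxide actually provides, and the #[kernel] attribute appears only in a comment because this sketch does not use the real toolchain.

```rust
// CPU-only emulation of a data-parallel vector-add kernel.
// Nothing here is cuda-oxide's API; it only illustrates the structure.

/// Body executed by one logical GPU thread: handles element `tid`.
/// In a cuda-oxide project, a function like this would presumably carry
/// the #[kernel] attribute and obtain its index from the runtime.
fn vec_add_thread(tid: usize, a: &[f32], b: &[f32], out: &mut [f32]) {
    // Bounds guard: grids are rounded up to whole blocks, so the last
    // block usually contains threads with no element to process.
    if tid < out.len() && tid < a.len() && tid < b.len() {
        out[tid] = a[tid] + b[tid];
    }
}

/// Hypothetical launcher: visits every thread of a 1D grid sequentially.
/// A GPU would run these threads in parallel.
fn launch_1d(grid_dim: usize, block_dim: usize, a: &[f32], b: &[f32], out: &mut [f32]) {
    for block in 0..grid_dim {
        for thread in 0..block_dim {
            // Standard CUDA global index: blockIdx * blockDim + threadIdx.
            let tid = block * block_dim + thread;
            vec_add_thread(tid, a, b, out);
        }
    }
}

fn main() {
    let a = vec![1.0_f32, 2.0, 3.0, 4.0, 5.0];
    let b = vec![10.0_f32, 20.0, 30.0, 40.0, 50.0];
    let mut out = vec![0.0_f32; a.len()];

    let block_dim = 4;
    // Ceiling division: 5 elements with 4 threads per block needs 2 blocks.
    let grid_dim = (a.len() + block_dim - 1) / block_dim;

    launch_1d(grid_dim, block_dim, &a, &b, &mut out);
    println!("{:?}", out); // prints [11.0, 22.0, 33.0, 44.0, 55.0]
}
```

The bounds guard is the part worth noticing: the grid holds 8 logical threads for 5 elements, so threads 5 through 7 must do nothing. That pattern carries over to real kernels regardless of the language they are written in.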

Why It's Cool

The obvious win: you get Rust's type system and ownership rules in your kernel code. That matters for parallel code, where mistakes that C-style kernels only reveal at run time can instead be caught by the compiler.

But there's a deeper benefit. cuda-oxide is fundamentally different from existing approaches:

  • No DSLs. You're not learning a custom language or macro system. It's Rust.
  • No bindings. No FFI to C, no manual memory management across languages.
  • Full Rust semantics. You can use if let, match, closures, iterators, and even Result types in your kernels. The compiler handles the translation to PTX.
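As a rough illustration of the "full Rust semantics" bullet, the sketch below shows the kind of per-thread logic that combines match, if let, a closure, an iterator chain, and a Result type. It is ordinary host Rust that runs on the CPU. Whether cuda-oxide compiles each of these constructs exactly as written is the article's claim, not something this sketch verifies, and `ElemError`, `checked_sqrt`, and `process_chunk` are hypothetical names.

```rust
/// Hypothetical error type for a per-element computation.
#[derive(Debug, PartialEq)]
enum ElemError {
    Negative,
}

/// Checked square root: reports an error instead of producing NaN.
fn checked_sqrt(x: f32) -> Result<f32, ElemError> {
    if x < 0.0 {
        Err(ElemError::Negative)
    } else {
        Ok(x.sqrt())
    }
}

/// Work one logical thread might do over its slice of the input:
/// sum the square roots, substituting `fallback` for rejected elements.
fn process_chunk(chunk: &[f32], fallback: f32) -> f32 {
    // Closure capturing `fallback`, with a match on the Result.
    let or_fallback = |r: Result<f32, ElemError>| match r {
        Ok(v) => v,
        Err(ElemError::Negative) => fallback,
    };
    // Iterator chain instead of an index loop.
    chunk.iter().map(|&x| or_fallback(checked_sqrt(x))).sum()
}

fn main() {
    let data = [4.0_f32, 9.0, -1.0, 16.0];

    // 2 + 3 + 0 (fallback) + 4
    let total = process_chunk(&data, 0.0);
    println!("sum of roots = {total}"); // prints sum of roots = 9

    // if let picks out just the failure case.
    if let Err(e) = checked_sqrt(data[2]) {
        println!("element 2 rejected: {e:?}"); // prints element 2 rejected: Negative
    }
}
```

Nothing in this code allocates or calls into the operating system, which is the style of Rust most likely to translate cleanly to a GPU target.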

The clever part is how it works: instead of modifying the Rust compiler, it uses the existing LLVM backend and adds a PTX target. So your Rust code goes through the same optimizations as normal Rust, then gets emitted as PTX instead of x86 or ARM assembly.

How to Try It

This is an early research project, so don't expect a stable, packaged install yet. But you can build it from source and play with it:

git clone https://github.com/NVlabs/cuda-oxide
cd cuda-oxide
cargo build --release

The repo includes examples (like a vector addition kernel) and a runtime loader. You write your kernel in a #[kernel] function, compile with the cuda-oxide toolchain, and then use the cuda-oxide-runti

Last updated: May 18, 2026