We are excited to announce the availability of CuPy v9.0.0. This release is the result of the past 7 months of development and includes a CUDA JIT that transpiles Python code to CUDA, support for NVIDIA cuSPARSELt, and AMD ROCm support through binary packages, among other changes.
See the full release notes for details.
CuPy v9 introduces a JIT API that lets you define CUDA kernels in Python code. Here is what the code looks like:
This is equivalent to the reduction code shown in NVIDIA’s slide deck (page 7). You can then execute the kernel like this: