- Split “initializeCuda” into three functions: one initializes the CUDA API, one gets a device handle, and one creates a context
- Added a function to query device attributes
- Added convenience functions to query specific attributes: the compute capability, the maximum number of threads per block, and the maximum size of the first (x) grid dimension
- Added a function to query the device name
- Introduced the CudaEnvironment data class: it holds the context and the relevant device attributes
- Kernels are now compiled for the compute capability of the selected device
- The number of threads per block and the number of blocks are now computed automatically from the input dimension and the device attributes
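
The last two items can be illustrated with a minimal sketch. It is not the code from this change: `DeviceAttributes`, `arch_flag` and `launch_config` are hypothetical names, the attribute values would normally come from the device queries listed above, and a one-dimensional launch is assumed. The architecture option is written in the form NVRTC accepts (`--gpu-architecture=compute_XY`); the actual compiler invocation used here may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeviceAttributes:
    """Hypothetical stand-in for the attributes kept in CudaEnvironment."""
    compute_capability: Tuple[int, int]  # (major, minor), e.g. (7, 5)
    max_threads_per_block: int
    max_grid_dim_x: int


def arch_flag(attrs: DeviceAttributes) -> str:
    """Build an NVRTC-style architecture option from the compute capability."""
    major, minor = attrs.compute_capability
    return f"--gpu-architecture=compute_{major}{minor}"


def launch_config(n: int, attrs: DeviceAttributes) -> Tuple[int, int]:
    """Return (threads_per_block, blocks) covering n elements in one dimension.

    Simplification: the block size is not rounded to a multiple of the
    warp size, which a tuned implementation would likely do.
    """
    if n <= 0:
        raise ValueError("input dimension must be positive")
    threads = min(n, attrs.max_threads_per_block)
    blocks = (n + threads - 1) // threads  # ceiling division
    if blocks > attrs.max_grid_dim_x:
        raise ValueError("input does not fit into a one-dimensional grid")
    return threads, blocks
```

Because `threads * blocks` can exceed the input dimension, a kernel launched with this configuration still has to check that its global index is smaller than `n` before touching memory.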