Up and running with CUDA on Microsoft Surface Book

So you want to do some deep learning on the go

It was time to put my 7-year-old HP laptop out to pasture once the screen started turning yellow at anything but a perfect 35-degree angle. That was a good excuse to get a Surface Book with the discrete GPU (dGPU), since I’ve been meaning to try a few more deep learning experiments that don’t require running off to a cluster (plus, clusters are hard to carry on plane flights!).

Fresh out of the box, it was time to start downloading packages, and, oh wait, it can’t run CUDA. Darn.

The in-box NVIDIA driver for the Surface Book doesn’t support CUDA.

Go directly to the NVIDIA site to grab an official driver for the Surface Book: http://www.geforce.com/drivers/results/96081

After that, follow NVIDIA’s instructions to install the CUDA toolkit.


Note that the exact NVIDIA driver may have changed since this was written, and as of this post the CUDA 7.5 toolkit supports only Visual Studio 2013, not 2015.

Running for the first time:

Check that nvcc is installed:

C:\programdata\NVIDIA Corporation\CUDA Samples\v7.5\bin\win64\Release> nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:49:10_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17
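
If you would rather check this from a script than eyeball it, here is a minimal Python sketch that pulls the release number out of the nvcc -V text. The helper name parse_nvcc_release is my own invention for illustration, not part of any NVIDIA tool, and the sample string is simply copied from the output above.

```python
import re


def parse_nvcc_release(version_text):
    """Return the toolkit release (e.g. "7.5") found in `nvcc -V` output, or None."""
    # nvcc prints a line like: "Cuda compilation tools, release 7.5, V7.5.17"
    match = re.search(r"release\s+(\d+(?:\.\d+)*)", version_text)
    return match.group(1) if match else None


# Sample text copied from the nvcc -V output in this post.
sample_output = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 7.5, V7.5.17"""

print(parse_nvcc_release(sample_output))
```

For the output above this gives 7.5. If nvcc is missing or prints something unexpected, the helper returns None rather than guessing.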

Run deviceQuery from the same directory:

CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GPU"
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 5.0
Total amount of global memory: 1024 MBytes (1073741824 bytes)
( 3) Multiprocessors, (128) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 993 MHz (0.99 GHz)
Memory Clock rate: 2505 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 1048576 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce
Result = PASS
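
A couple of numbers in that dump can be sanity-checked with quick arithmetic. The sketch below is mine, not something deviceQuery does for you: it multiplies out the core count and estimates peak memory bandwidth from the reported memory clock and bus width. It assumes the memory moves two transfers per reported clock (double data rate); that factor of two is an assumption on my part, so treat the bandwidth figure as a rough ceiling.

```python
def total_cuda_cores(multiprocessors, cores_per_mp):
    """Total CUDA cores = multiprocessors x cores per multiprocessor."""
    return multiprocessors * cores_per_mp


def peak_memory_bandwidth_gb_s(memory_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Rough peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    transfers_per_clock=2 is an assumption (double data rate memory).
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer
    return bytes_per_second / 1e9


# Values taken from the deviceQuery output in this post.
cores = total_cuda_cores(multiprocessors=3, cores_per_mp=128)
bandwidth = peak_memory_bandwidth_gb_s(memory_clock_mhz=2505, bus_width_bits=64)

print(cores, "CUDA cores")
print(round(bandwidth, 2), "GB/s estimated peak memory bandwidth")
```

The core count matches the 384 that deviceQuery reports, and under the double data rate assumption the 2505 MHz clock on a 64-bit bus works out to roughly 40 GB/s.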

Here are the results of bandwidthTest, run from the same directory:

[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GPU
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1482.5
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 1592.4
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 33943.5
Result = PASS
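
To put those bandwidth numbers in practical terms, here is a small sketch that estimates how long a copy takes at a measured rate. The function name transfer_seconds is hypothetical, and I am assuming bandwidthTest's "MB" means 2^20 bytes; if it means 10^6 bytes on your version, pass bytes_per_mb=10**6 instead.

```python
def transfer_seconds(num_bytes, bandwidth_mb_s, bytes_per_mb=2**20):
    """Estimate the time in seconds to move num_bytes at bandwidth_mb_s.

    bytes_per_mb defaults to 2**20, which is an assumption about how
    bandwidthTest defines a megabyte.
    """
    return num_bytes / (bandwidth_mb_s * bytes_per_mb)


# Host-to-device rate measured by bandwidthTest in this post.
host_to_device_mb_s = 1482.5

# The 32 MiB buffer used by the test, and the full 1 GB of global memory
# reported by deviceQuery.
test_buffer_bytes = 33554432
global_memory_bytes = 1073741824

print(round(transfer_seconds(test_buffer_bytes, host_to_device_mb_s) * 1000, 1), "ms")
print(round(transfer_seconds(global_memory_bytes, host_to_device_mb_s), 2), "s")
```

Under that assumption, the test buffer takes about 21.6 ms to copy and filling the whole 1 GB of device memory from the host takes about 0.69 s, which is worth keeping in mind when shuffling training batches back and forth.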

That’s it: CUDA 7.5 is installed on the Surface Book.

Next time: actually toying around with training on the Surface Book.

Throughout all this, I’ve been using Chocolatey and the Windows 10 package management interface, which make bulk updates and re-installation much cleaner.

This is a blog and the opinions, thoughts, and hopefully occasional stupidity expressed here represent my own and not those of my employer.
