Published in Geek Culture

Running Deepy Locally on WSL2 in Windows 11

Hi!

In this post I’ll share my experience running Deepy, our multiskill AI Assistant platform, on a local PC running Windows 11 with WSL2.

Last week my trusty Lenovo ThinkPad X1 Yoga 2nd Gen showed its age (it was bought in December 2017 and has been used almost non-stop across Russia and the US ever since), and I decided to move to a new PC. This time, given the pandemic and working from home, my goal was to have a solid rig without compromises. The dream is to be able to host our AI Assistants/socialbots at least partially locally.

To give some perspective: at DeepPavlov, we have a DGX cluster with VMs running a number of NVIDIA GTX 1080 Ti GPUs. For last year's DREAM AI Assistant Demo I used one such VM with 3 GPUs; back then it was more or less enough to host the entire thing. We also have PCs with an i7-7700, 32GB RAM, and the same GPUs. At home, my X1 Yoga was equipped with a regular NVIDIA GTX 1080 GPU: a similar setup, but with a non-Ti GPU and just 16GB of RAM.

And so, my new rig is a PC (the fifth in my entire life) with an AMD Ryzen 7 5800X (8 cores/16 threads), 64GB RAM, the same NVIDIA GTX 1080 that previously lived in a Thunderbolt 3 dock, and a 2TB Samsung NVMe SSD. This rig has the potential to host up to 4 GPUs (3 PCIe x16 slots plus 2 Thunderbolt 3 ports enabled by the GIGABYTE VISION D-P motherboard) and up to 128GB of RAM.

This rig is currently running under Windows 11 (22000.51 at the time of writing). What I want is to be able to run our AI assistant platform locally (at least partially) and experience what you, our developers, would experience.

Note: There is no plan to run it under Ubuntu, as some desktop apps (mainly Microsoft Office and Visual Studio) won't run there.

The key thing is that Deepy, the DREAM AI Assistant Platform Demo, and our other socialbots use our own and third-party GPU-heavy NLP models. It makes sense to run them on my 1080 instead of on the CPU.

Why WSL2?

Until recently, it was impossible in Windows to pass the host's GPU through to virtual machines, but this is now standard functionality in Windows 11.

As a side effect, this also allows you to run Linux GUI apps, too.

To run our platform under Windows 11, we can therefore use one of at least three approaches:

  • Full-blown VM (running in Hyper-V)
  • Building on Windows
  • WSL2

While it might be interesting to build our platform natively on Windows, that would involve checking compatibility for all of the components, so it makes more sense to run things natively in Linux (e.g., Ubuntu 18.04), which we use in our data center anyway.

It turns out that if you want to use your GPU in a Hyper-V VM, you have to explicitly dismount it from your host machine and mount it in the VM. This is not an option, as my rig has only one GPU, and the host can't live without it.

Therefore, WSL2 with its host GPU sharing is the way to go. Below you can find detailed instructions for getting Deepy up and running inside WSL2 on a Windows 11 machine with a compatible NVIDIA GPU.

Step-By-Step Instructions

Prerequisites

You need admin rights to your machine. Use Windows Terminal (or any other terminal app of your choice) to begin installation.

Step 1: Install Windows Subsystem for Linux:

dism.exe /online /enable-feature /featurename:Microsoft-Windows-Subsystem-Linux /all /norestart

Step 2: Install Virtual Machine Platform (required for WSL2):

dism.exe /online /enable-feature /featurename:VirtualMachinePlatform /all /norestart

Step 3: Download and install the latest Linux kernel update package (AMD64).

Step 4: Make sure your WSL is set as WSL2 by default:

wsl --set-default-version 2

Step 5: Pick your Linux distribution from the Microsoft Store, e.g., Ubuntu 18.04:

  • Ubuntu 18.04 LTS (other distributions are listed here; you’ll need a glibc-based distribution to use the NVIDIA CUDA-enabled driver in WSL, e.g., Ubuntu or Debian)

Step 6: Download and install the preview GPU driver:

NVIDIA CUDA-enabled driver for WSL

Step 7 (Optional): Use NVIDIA GeForce Experience to upgrade your driver to the latest version.

Step 8: Make sure your WSL2 Linux kernel version is 4.19.121 or higher:

wsl cat /proc/version

If it’s not, use Windows Update to get the latest version.
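
If you want to check this programmatically rather than by eye, here is a small sketch of mine (plain Python, not part of any WSL tooling) that parses the output of `wsl cat /proc/version` and compares it against the 4.19.121 minimum:

```python
# Sketch: compare a WSL2 kernel version string (as printed by
# `wsl cat /proc/version`) against the minimum required 4.19.121.

def parse_kernel_version(version_line: str) -> tuple:
    """Extract the numeric kernel version from a /proc/version line."""
    # A /proc/version line starts like:
    # "Linux version 4.19.128-microsoft-standard (...) ..."
    token = version_line.split()[2]      # "4.19.128-microsoft-standard"
    numbers = token.split("-")[0]        # "4.19.128"
    return tuple(int(part) for part in numbers.split("."))

def meets_minimum(version_line: str, minimum=(4, 19, 121)) -> bool:
    """Tuple comparison handles e.g. 4.19.128 >= 4.19.121 correctly."""
    return parse_kernel_version(version_line) >= minimum

line = "Linux version 4.19.128-microsoft-standard (gcc version 8.2.0)"
print(meets_minimum(line))  # True for 4.19.128
```

Tuple comparison is used deliberately: comparing version strings lexically would wrongly rank "4.19.9" above "4.19.121".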

Step 9: Follow the instructions from NVIDIA to install the NVIDIA CUDA Toolkit. I’ve included them below for convenience, but feel free to follow the link above:

$ sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub

$ sudo sh -c 'echo "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" > /etc/apt/sources.list.d/cuda.list'

$ sudo apt-get update

Step 10: Install CUDA. We use CUDA 10 in our current version of DeepPavlov Library, so use this command:

$ sudo apt-get install -y cuda-toolkit-10-0

Step 11: Make sure your CUDA apps can run using your host NVIDIA GPU:

$ git clone https://github.com/NVIDIA/cuda-samples
$ cd /usr/local/cuda/samples/4_Finance/BlackScholes
$ ./BlackScholes

You should get an output like this:

Initializing data...
...allocating CPU memory for options.
...allocating GPU memory for options.
...generating input data in CPU mem.
...copying input data to GPU mem.
Data init done.

Executing Black-Scholes GPU kernel (131072 iterations)...
Options count : 8000000
BlackScholesGPU() time : 1.314299 msec
Effective memory bandwidth: 60.868973 GB/s
Gigaoptions per second : 6.086897

...

If this is what you see (your numbers will differ), great! It means you are now ready to move forward with the NVIDIA Docker installation.
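
As an aside, you can see what the sample is actually pricing with a few lines of plain Python. This is my own reference implementation of the closed-form Black-Scholes call price, not code shipped with the CUDA samples:

```python
import math

def black_scholes_call(S, K, T, r, sigma):
    """Closed-form Black-Scholes price of a European call option.
    S: spot price, K: strike, T: years to expiry,
    r: risk-free rate, sigma: volatility."""
    # Standard normal CDF via the error function.
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * N(d1) - K * math.exp(-r * T) * N(d2)

# An at-the-money option: spot 100, strike 100, 1 year, 0% rate, 20% vol.
print(round(black_scholes_call(100.0, 100.0, 1.0, 0.0, 0.2), 4))  # 7.9656
```

The GPU sample evaluates this same formula for millions of option parameter sets in parallel, which is why it makes a handy bandwidth/throughput benchmark.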

Step 12: Install Docker CE

Important: currently, Docker Desktop’s WSL2 backend isn’t supported by the NVIDIA Container Toolkit. I wouldn’t advise using Docker Desktop here anyway.

$ curl https://get.docker.com | sh

Step 13: Install NVIDIA Container Toolkit (you can read their instructions here):

Set up the stable and experimental repositories and the GPG key. The changes to the runtime to support WSL 2 are available in the experimental repository:

$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

$ curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list

Install the NVIDIA runtime packages (and their dependencies) after updating the package listing:

$ sudo apt-get update

$ sudo apt-get install -y nvidia-docker2

Open a separate WSL 2 window and restart the Docker daemon with the following commands to complete the installation:

$ sudo service docker stop
$ sudo service docker start

Step 14: Check that your NVIDIA docker containers can run on your WSL2 machine

In this example, let’s run an N-body simulation CUDA sample. This example has already been containerized and is available from NGC.

$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

From the console, you should see an output as shown below.

$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
Run "nbody -benchmark [-numbodies=<numBodies>]" to measure performance.
-fullscreen (run n-body simulation in fullscreen mode)
-fp64 (use double precision floating point values for simulation)
-hostmem (stores simulation data in host memory)
-benchmark (run benchmark to measure performance)
-numbodies=<N> (number of bodies (>= 1) to run in simulation)
-device=<d> (where d=0,1,2.... for the CUDA device to use)
-numdevices=<i> (where i=(number of CUDA devices > 0) to use for simulation)
-compare (compares simulation results running once on the default GPU and once on the CPU)
-cpu (run n-body simulation on the CPU)
-tipsy=<file.bin> (load a tipsy model file for simulation)
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
> Windowed mode
> Simulation data stored in video memory
> Single precision floating point simulation
> 1 Devices used for simulation
GPU Device 0: "GeForce GTX 1070" with compute capability 6.1
> Compute 6.1 CUDA device: [GeForce GTX 1070]
15360 bodies, total time for 10 iterations: 11.949 ms
= 197.446 billion interactions per second
= 3948.925 single-precision GFLOP/s at 20 flops per interaction
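
The headline numbers in that output are plain arithmetic over the benchmark parameters. Here is a quick sanity check of my own (all-pairs n-body does bodies² interactions per iteration, and the sample reports 20 flops per interaction):

```python
# Re-derive the nbody benchmark's reported throughput from its own numbers.
bodies = 15360
iterations = 10
total_time_s = 11.949e-3     # 11.949 ms for all 10 iterations
flops_per_interaction = 20   # the constant the sample itself reports

interactions_per_s = bodies ** 2 * iterations / total_time_s
gflops = interactions_per_s * flops_per_interaction / 1e9

# Both match the printed output up to rounding in the reported time.
print(f"{interactions_per_s / 1e9:.2f} billion interactions per second")
print(f"{gflops:.1f} single-precision GFLOP/s")
```

This is a useful back-of-the-envelope habit: if your own GPU's numbers don't reproduce from bodies, iterations, and time this way, something in the run (e.g., a CPU fallback) is off.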

Hooray!

Run Deepy on your new WSL2-based Linux environment

Finally, it’s time to try out Deepy!

Step 1: Clone Deepy’s repository

$ git clone https://github.com/deepmipt/deepy

Step 2: Build and run it

$ docker-compose up --build

Step 3: Try it out!

Once everything has been downloaded, built, and started, you can play with Deepy on your machine.
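
The first build downloads models and can take a while. A tiny helper like this (my own sketch in plain Python, not part of Deepy) can tell you when the agent's port starts accepting connections:

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout_s: float = 300.0) -> bool:
    """Poll until (host, port) accepts TCP connections, or give up after timeout_s."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # A successful connect means the service is listening.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not up yet; retry
    return False
```

Calling `wait_for_port("127.0.0.1", 4242)` in a second terminal while `docker-compose up` runs returns `True` once Deepy is listening on its REST port.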

Experiment With Deepy

There are several ways to play with Deepy:

  • via its REST API, available in this case at http://127.0.0.1:4242/ (see the docs here)
  • via Deepy 3000 (our small UWP app originally built for our talk at NVIDIA GTC Fall 2020)
  • via our demo.deeppavlov.ai website running locally on your machine
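
For the REST route, here is a minimal sketch using only the Python standard library. Note that the `{"user_id": ..., "payload": ...}` body shape follows the usual DeepPavlov Agent conventions, but the exact schema is an assumption on my part; double-check it against the docs linked above:

```python
import json
import urllib.request

# Deepy's REST endpoint in this local setup.
DEEPY_URL = "http://127.0.0.1:4242/"

def make_request_body(user_id: str, text: str) -> bytes:
    """Build the JSON body for one user utterance.
    Assumed schema: {"user_id": ..., "payload": ...} per DeepPavlov Agent docs."""
    return json.dumps({"user_id": user_id, "payload": text}).encode("utf-8")

def talk_to_deepy(user_id: str, text: str) -> dict:
    """Send one utterance to a locally running Deepy and return the parsed reply."""
    request = urllib.request.Request(
        DEEPY_URL,
        data=make_request_body(user_id, text),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))

# Example (requires Deepy to be up on 127.0.0.1:4242):
# print(talk_to_deepy("local-test-user", "Hello, Deepy!"))
```

The `user_id` is what lets the agent keep separate dialog states for different users, so reuse the same id across turns of one conversation.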

For brevity, I’ll explain here how to get our demo website up and running:

Step 1: Clone demo2’s repository:

$ git clone https://github.com/deepmipt/demo2

Step 2: Install node.js (below are instructions taken from here):

Enable the NodeSource repository by running the following curl command as a user with sudo privileges:

$ curl -sL https://deb.nodesource.com/setup_12.x | sudo -E bash - 

The command will add the NodeSource signing key to your system, create an apt sources repository file, install all necessary packages and refresh the apt cache.

Once the NodeSource repository is enabled, install Node.js and npm by typing:

$ sudo apt-get install nodejs

The nodejs package contains both the node and npm binaries.

Step 3: Install Yarn:

Enable Yarn repository:

$ curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | sudo apt-key add -

Add the Yarn APT repository to your system’s software repository list by typing:

$ echo "deb https://dl.yarnpkg.com/debian/ stable main" | sudo tee /etc/apt/sources.list.d/yarn.list

Once the repository is added to the system, update the package list, and install Yarn, with:

$ sudo apt-get update
$ sudo apt-get install yarn

Step 4: Edit URI used to access Deepy bot:

$ cd demo2/src/components/skills/en
$ nano Chat.tsx

In the editor, change the URI to 127.0.0.1:4242:

Save and close the editor (Ctrl+X, Y).

Step 5: Run the website:

Go to the root of your cloned demo2 repository:

$ cd ../../../../
$ yarn start

Once everything is built, you should see the dev server’s success output.

Voila.

Step 6: Open your copy of our demo2 website in the browser:

http://localhost:3000/#/en/chat

This should show you the Deepy 3000 web UI.

Click on “Agree”, and you’re good to go!

Now that you’ve got your system up and running, you can follow Deepy’s wiki to learn more about how to build your own skills and annotators for your own multiskill AI Assistant!

Best of luck and let us know what you’ve built with Deepy!

Daniel Kornev

Chief Product Officer @ DeepPavlov.AI. Shipped a part of Yandex AI Assistant Alice in 2018. Previously worked at Microsoft, Google, and Microsoft Research.