Using Windows Subsystem for Linux for Data Science

Hugo Ferreira
Hugo Ferreira’s blog
5 min read · Jun 19, 2018

How to run a Linux environment directly on Windows 10

Learning Data Science on Linux… on Windows!

If you work in Data Science on a Windows machine, you know that some libraries do not work on Windows. At that point, you either move your work to a Linux machine, install a virtual machine or deal with cumbersome dual-booting.

Another option, if you’re running the latest versions of Windows 10, is to use the Windows Subsystem for Linux (WSL), which allows developers to run Linux environments directly on Windows, without the overhead of a virtual machine.

In this post, I’ll describe how to install an Ubuntu distribution on Windows 10 as the Linux subsystem, and then install Anaconda and the usual Data Science libraries. As you’ll see, if you do most of your work on Linux, the WSL lets you keep your workflow almost unchanged when you need to use a Windows machine.

The version of Windows 10 I’m using at the moment is 1803 and the Linux distribution I’m going to install is Ubuntu 16.04.

Turn on Windows Subsystem for Linux

Before installing a Linux distribution, a few steps are necessary. First, you need to enable ‘Developer mode’. You can do this by going to Settings > Update & Security > For developers > Developer mode.

Then, go to Control Panel > Programs > Turn Windows features on or off and select ‘Windows Subsystem for Linux’.

You need to restart your computer after this.

Install Ubuntu from the Microsoft Store

After restarting your computer, go to the Microsoft Store and search for ‘Ubuntu’ (or your favourite Linux distribution; others available are openSUSE Leap 42, SUSE Linux Enterprise Server 12, Debian GNU/Linux and Kali Linux). Install it as you would any other Windows app.

Click the ‘Launch’ button. The first time you run it, you will be prompted to enter a UNIX username and password, which do not have to be the same as your Windows username and password.

After the initial setup, you can run the Ubuntu distribution directly from the Command Prompt by typing the bash command.

Note that you can access your Windows files and folders: the Windows file system is located at /mnt/c in the Bash shell environment.
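If you often move between the two file systems, a small helper can do the path translation for you. The following is a minimal Python sketch, not something provided by the WSL itself: windows_to_wsl_path is a hypothetical name, and it assumes the default setup in which each Windows drive is mounted under /mnt followed by the lowercase drive letter.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_path(windows_path: str) -> str:
    """Map an absolute Windows path to its assumed /mnt/<drive> location."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. "C:" for a drive-letter path
    if len(drive) != 2 or drive[1] != ":" or not path.root:
        raise ValueError(f"expected an absolute drive-letter path, got {windows_path!r}")
    # path.parts[0] is the anchor ("C:\\"); the rest are folder and file names.
    return str(PurePosixPath("/mnt", drive[0].lower(), *path.parts[1:]))
```

For example, windows_to_wsl_path(r"C:\Users\me\data.csv") gives /mnt/c/Users/me/data.csv, which you can then open from the Bash shell or from Python running in Ubuntu.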

Install Anaconda

For my Data Science work I have installed Anaconda for Windows, which is a good package and environment management system for Python. If you want to install it in the Linux subsystem, just type the following in the Bash shell:

wget https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
bash Anaconda3-5.2.0-Linux-x86_64.sh

The installer will prompt you several times, including once to ask whether you want to install Microsoft VS Code, an open-source code editor available for Windows, macOS and Linux. It is similar in many ways to Sublime Text, but I found it to be more user-friendly, and it is apparently the most popular development environment at the moment.

If you need further help, just check the Anaconda documentation.
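Before running a downloaded installer, it can be worth checking that the file arrived intact. Below is a minimal sketch of how to compute its SHA-256 digest in Python; sha256_of_file is a hypothetical helper name, and I am assuming you compare the result by hand against the hash Anaconda publishes for that installer (I am not reproducing any hash value here).

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large installer is never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_of_file("Anaconda3-5.2.0-Linux-x86_64.sh") from the folder where you downloaded the file returns a hex string to compare with the published one.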

Create conda environments

If you have used conda before in any Linux environment, you should feel right at home from this point onwards. For instance, to create a new environment, you just type:

conda create -n myenv

To activate it, type:

source activate myenv

Then install all the packages you need for your Data Science workflow. For instance, a common set of packages can be installed with:

conda install numpy pandas scipy matplotlib seaborn scikit-learn
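To confirm that the environment is the one you expect and that the packages are visible to Python, you could run a short check like the sketch below. The function names are hypothetical, and it relies on the assumption that conda exports the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import importlib.util
import os


def active_conda_env(environ=os.environ):
    """Return the name of the activated conda environment, or None."""
    # Assumption: conda sets CONDA_DEFAULT_ENV on activation.
    return environ.get("CONDA_DEFAULT_ENV")


def missing_modules(module_names):
    """Return the module names that this interpreter cannot find."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Import names for the packages above; scikit-learn is imported as "sklearn".
DATA_SCIENCE_MODULES = ["numpy", "pandas", "scipy", "matplotlib", "seaborn", "sklearn"]

if __name__ == "__main__":
    print("Active environment:", active_conda_env())
    print("Missing modules:", missing_modules(DATA_SCIENCE_MODULES))
```

An empty list of missing modules suggests the installation worked; anything listed can be installed with another conda install.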

Use JupyterLab

I do much of my work using Jupyter notebooks. JupyterLab is the next-generation user interface for Project Jupyter and I have been using it more in the last few months. You can have several notebooks, text editors and terminals open simultaneously and, importantly for me, it finally has a dark theme with the base installation!

Using the Linux subsystem, you just type jupyter lab in the Bash shell and use the provided URL in your browser — on Windows! However, all the code is running in the Ubuntu distribution.
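If you want to convince yourself of this, you can run a quick check in a notebook cell. This is only a heuristic sketch: looks_like_wsl is a hypothetical helper, and it assumes the kernel release string reported inside the WSL mentions "Microsoft".

```python
import platform


def looks_like_wsl(kernel_release=None):
    """Heuristic: guess whether a kernel release string comes from the WSL."""
    if kernel_release is None:
        kernel_release = platform.uname().release
    # Assumption: WSL kernels include "Microsoft" in their release string.
    return "microsoft" in kernel_release.lower()


print("Operating system:", platform.system())
print("Running under WSL:", looks_like_wsl())
```

Even though the browser showing the notebook is a Windows application, the cell should report the operating system of the kernel, which is the Ubuntu distribution.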

One JupyterLab extension I found very useful is jupyterlab-toc, which auto-generates a table of contents in the left area when you have a notebook or markdown document open (you can see it in the image above). To install, just follow the instructions provided in the link.

These are just a few tips on how to set up a good default Python environment for Data Science on Windows if you are used to working on Linux and/or need some Linux-only tools, but don’t want to deal with dual-booting or virtual machines. I find that, after the initial setup, it’s just as easy as running these tools in a proper Ubuntu distribution.

You can find me on LinkedIn:


Data Scientist and Machine Learning enthusiast; physicist and maths geek.