Toolkit Tour: Data Science Development on Windows 10

Ethan Henley
Published in The Startup
5 min read · Jun 10, 2020

Most of the programmers and researchers I know swear by their MacBooks as the best personal computers available for software development. From astrophysics to app dev, a lot of work is simply more convenient on a UNIX-based operating system like macOS, or so they think. While I’m not here to disparage anyone’s personal choices, I want to show that software development isn’t that difficult on my own platform of choice, Windows 10. To do that, I’ll walk through my typical toolkit.

1. The Windows Terminal

A screenshot of the experimental Windows Terminal with an open PowerShell tab.

Microsoft labels most of the features I use for programming as “experimental,” but in almost all cases they’ve been as stable and reliable as the more official features of Windows 10. Though it is billed as a preview, I’ve found the Windows Terminal as convenient and usable as the native Mac Terminal. It must be made clear, though, that the Windows Terminal is really just an interface: it consolidates numerous command-line windows into one conveniently formatted box. When Command Prompt or PowerShell shows up in the Windows Terminal, the functionality isn’t inherent to the terminal, which is just a generic window into these shells.

A screenshot of the preview Windows Terminal with an open cmd tab.

Of course, there is a positive side to this. Because it’s modular by necessity and doesn’t implement a shell of its own, Windows Terminal can be extended to host virtually any shell you can run on Windows 10 by editing an easily accessed JSON settings file.
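To make that concrete, here is a minimal Python sketch that builds one profile entry of the kind the settings file holds and prints it as JSON to paste in by hand. The keys `guid`, `name`, `commandline`, `startingDirectory`, and `hidden` are standard Windows Terminal profile settings, but the helper `make_profile` is hypothetical, the Anaconda install path is an assumed example, and exactly where the entry goes inside the file depends on your Terminal version.

```python
import json
import uuid


def make_profile(name, commandline, starting_directory=None):
    """Build one profile entry for a Windows Terminal settings file."""
    profile = {
        # Windows Terminal identifies each profile by a GUID wrapped in braces.
        "guid": "{" + str(uuid.uuid4()) + "}",
        "name": name,
        "commandline": commandline,
        "hidden": False,
    }
    if starting_directory is not None:
        profile["startingDirectory"] = starting_directory
    return profile


# Assumed install location; adjust to wherever Anaconda actually lives.
anaconda = make_profile(
    name="Anaconda Prompt",
    commandline=r"cmd.exe /K C:\Users\me\Anaconda3\Scripts\activate.bat",
    starting_directory="%USERPROFILE%",
)

print(json.dumps(anaconda, indent=4))
```

Generating the entry rather than typing it mostly saves you from malformed GUIDs and unescaped backslashes, which are the easiest ways to break the file.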

2. Anaconda & Jupyter

Windows Terminal running Anaconda Prompt, which itself has opened a Jupyter Notebook.

A great part of operating system preference depends on whether specific software has stable, still-updated versions for a given OS. Thankfully, the Windows 10 ecosystem is alive and well. I do most of my work in Python, and that usually means I’m working off of an Anaconda distribution. Frustratingly, the Anaconda software family does not integrate with Command Prompt or PowerShell as naturally as it does with a Mac’s native terminal, but with Windows Terminal, all that functionality can be accessed just by adding an Anaconda Prompt tab.

A Python 3 Jupyter Notebook, running on a kernel opened in Anaconda Prompt.

Jupyter Notebooks operate on Windows essentially the same way they do on any other OS. Though distributions will always differ somewhat across platforms, I can generally be confident that, with dependencies addressed, code will work on my machine the same way it does on someone else’s.
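One way to check that dependencies really are addressed is to print a short environment report in a notebook cell on each machine and compare the results. This is only a sketch: the function name `environment_report` and the package list are illustrative assumptions, and it relies on `importlib.metadata`, which needs Python 3.8 or later.

```python
import platform
import sys
from importlib import metadata


def environment_report(packages):
    """Collect interpreter, OS, and package versions for comparison across machines."""
    report = {
        "python": platform.python_version(),
        "os": platform.system(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


# Example package names; swap in whatever the notebook actually imports.
for key, value in environment_report(["numpy", "pandas", "notebook"]).items():
    print(f"{key}: {value}")
```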

I bring up Anaconda and Jupyter Notebooks here because I’m primarily a Python programmer right now, so I use them for most of my work. But I am learning R on the side, and I agree with the consensus that the free version of RStudio is the best place to code in R on any type of machine. My experience with the Windows distribution of RStudio is limited, but so far I’ve found it to be completely capable, with no apparent flaws. I mention this only so that any data scientists who work primarily in R know that the Windows 10 ecosystem can support them as well as it supports my Python-programming comrades.

3. Windows Subsystem for Linux

Windows Terminal for a WSL Ubuntu instance.

I’m very fond of UNIX-style architecture, and have always preferred it to the Windows filesystem. Most of my colleagues feel the same way, and often this is used as an argument for Macs over Windows machines. But for most of my adult life, this has been a false choice: Windows 10 is compatible with many flavors of Linux, and in my experience the two work together far more closely than macOS does with ordinary Linux distributions.

I’m writing, of course, about the Windows Subsystem for Linux (WSL), another “experimental” Windows feature that I consider a fundamental part of my tech stack. Even though I’m a Windows user, I know the bash shell better than I know PowerShell and cmd. So, when I want to do command line file manipulation, or better yet, shell scripting, I open up an Ubuntu instance in my Windows Terminal. This is what I use to interact with AWS, to SSH into a remote system, or to use Git version control and push and pull files to GitHub.

My command line editor of choice is Vim, though I often use Notepad++ and Sublime for scripting.

WSL does have limitations. While I can easily manipulate Windows files in the Ubuntu bash shell, Ubuntu files cannot currently be accessed by the normal Windows filesystem. And unless you write and navigate awkward path structures, it’s easy to accidentally end up with two independent installations of, say, Python: one for your Windows system and one for your WSL system. Some of that should be remedied by WSL 2 as part of Windows 10 version 2004 (the May 2020 Update), but that release has been plagued by bugs and delays, and while I’m comfortable using the experimental features of a stable Windows release, I’ve been reluctant to jump onto a beta build just to get at some incomplete new features. In fact, I had hoped that WSL 2 would be released before I posted this article, but alas, I think I’ll just need to write a follow-up in a few months to account for those changes.
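The awkward paths come from WSL mounting Windows drives under `/mnt/<drive letter>` by default, so `C:\Users\me` becomes `/mnt/c/Users/me` inside Ubuntu. The sketch below does that translation and reports which Python is actually running, which helps when two installations coexist. Both function names are hypothetical, the example path is made up, and the WSL check is only a heuristic based on the kernel release string.

```python
import platform
import sys
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl_path(windows_path):
    r"""Translate a path like C:\Users\me into WSL's default /mnt/c/Users/me form."""
    path = PureWindowsPath(windows_path)
    drive = path.drive.rstrip(":").lower()
    if len(drive) != 1 or not path.is_absolute():
        raise ValueError(f"expected an absolute path with a drive letter: {windows_path}")
    # path.parts[0] is the anchor (for example "C:\"); the rest are folder names.
    return str(PurePosixPath("/mnt", drive, *path.parts[1:]))


def running_under_wsl():
    """Heuristic: WSL kernels include 'microsoft' in their release string."""
    return platform.system() == "Linux" and "microsoft" in platform.release().lower()


print(windows_to_wsl_path(r"C:\Users\me\project"))
print("Interpreter:", sys.executable)
print("Under WSL:", running_under_wsl())
```

Running the last two lines from both Anaconda Prompt and the Ubuntu tab makes it obvious when each side is using its own Python.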

4. A Complete Workflow

So now, with the basics out of the way, let’s say I wanted to do some work in a Jupyter Notebook and then push it to my GitHub. I would:

  1. open up the Ubuntu bash shell in Windows Terminal and git pull for any updates
  2. open up Anaconda Prompt in Windows Terminal and spin up Jupyter Notebook
  3. work on the notebook (in a Chrome tab)
  4. save the notebook
  5. go back to the Ubuntu tab, then git add, commit, and push the directory
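The Git half of those steps could be scripted roughly as follows. This is a sketch rather than what I actually run: `git_steps` and `run_steps` are hypothetical helpers, the commit message and repository directory are placeholders, and it assumes `git` is on the PATH of whichever shell launches it (the Ubuntu tab, in my case). It defaults to a dry run that only prints the commands.

```python
import subprocess


def git_steps(message):
    """Return the git commands from the final workflow step, in order."""
    return [
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ]


def run_steps(steps, repo_dir, dry_run=True):
    """Print each command; execute it inside repo_dir only when dry_run is False."""
    for step in steps:
        print("$", " ".join(step))
        if not dry_run:
            # check=True stops the sequence as soon as one command fails.
            subprocess.run(step, cwd=repo_dir, check=True)


# "." and the commit message are placeholders for illustration.
run_steps(git_steps("Update analysis notebook"), repo_dir=".")
```

Passing `dry_run=False` runs the commands for real; keeping the dry run as the default avoids an accidental push while experimenting.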

I hope this gives a clear picture of the functional toolkit I use for data science, and that any Windows skeptics reading have a better understanding of how easy it is to develop on Windows 10.
