What the *$@% is Docker for the Beginner Programmer (Part 1): Virtual Environments

Introduction
Containers have been all the buzz for the past couple of years. A good friend once quipped that ‘you know something is actually good when even the IT guys say it’s good’. Apparently the IT guys are raving about Docker, so I guess you should be too! But — what exactly is this stupid ‘container’ thing, and why does it matter?
A big motivator for this post is that many guides to Docker already exist, but almost all of them assume an audience of experienced industry software developers. I found that as a college student, most of us in the CS program could code well but had little understanding of build dependencies and shipping to production, which made it incredibly difficult to actually understand Docker and its advantages.
Hopefully this guide will help demystify all these terms in a non-jargony way! There will be no big scary words like HYPERVISOR or DAEMON PROCESSES in this guide. Instead, I am going to work from the bottom up and help the beginner developer understand what Docker can do for you. In other words, this is the guide I wish I had when I was first trying to learn about Docker as a newbie.
Part 1: Virtual Environments
This first part of the guide focuses on introducing the concept of Virtual Environments for those who are not yet familiar with them. It’s a good build-up to the motivations behind containerization, and if you don’t know this material already, you should! Feel free to skip this guide and move on to the next part (UNDER CONSTRUCTION — Stay tuned!) if you already have a good grasp of $PATH, pip, and virtualenv.
Guide Overview
- Who: Beginner developers (at the college intro programming level) — you’ve at least coded in some Java/Python before, but perhaps not at scale or in larger teams. Python is used as the running example throughout because of its widespread use, so some familiarity with it is assumed.
- When: Reading this guide will take about half an hour. Grab some popcorn and get comfy!

Let’s jump right in!
Ugh, where is my Python??
Ever wondered what actually happens under the hood when you try to run a Python file? Let’s say you wrote a file called my_file.py and now you want to run it in Python. The way you do that is you go to your terminal and type in the following:
python /path/to/my_file.py
And… magic, it works! But wait — what did that command actually do, and how did it find Python on your computer? To fully understand what happened, we can look for where Python actually lives on your computer.
which python
The which command asks the terminal: which Python are we actually using? See, Python itself is really just a program that takes in a *.py file and runs its contents as a program on your computer. In other words, at any point in time there can be multiple installations of Python on your computer.

In my case, a call to which python reveals that my computer found Python in /Users/jaychia/anaconda/bin, and uses that whenever I invoke python. However, I actually have many other installations of Python available in /usr/local/bin as well! How did my computer know to use the one it found in /Users/jaychia/anaconda/bin?
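As an aside, on most systems you can also ask which to list every python it can find on your $PATH, not just the first one. A quick check might look like this (the paths shown are from my machine and will differ on yours):
which -a python
/Users/jaychia/anaconda/bin/python
/usr/local/bin/python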
An environment variable is a variable that is accessible by your current terminal window. At startup, certain variables are pre-defined, including the $PATH variable! To see what is currently stored in your $PATH, type echo $PATH into your terminal. $PATH is just a long string of folders separated by : characters, where each folder specifies a location in which your shell searches for programs to run.
The way your computer decides which Python to use is by looking in each folder specified on your $PATH. It looks for a binary (a compiled program, readable directly by your machine) called python, and selects the first one that it finds. The same thing happens for every other command you run in your terminal, such as ls, cd and which!
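For example, on my machine $PATH might look something like this (the exact folders are illustrative and will differ on yours):
echo $PATH
/Users/jaychia/anaconda/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin
Since /Users/jaychia/anaconda/bin appears first, the python binary found there is the one that gets used.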
In fact, when you install a Python dependency, the dependency is installed into a folder corresponding to one of these many Pythons on your computer. This is why you might have had the (bad) fortune of encountering one of those No module named x errors: even though you already installed x, it might have been installed relative to a different Python than the one your computer is actually using.
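Here is a sketch of how that mismatch happens, using the requests library as a hypothetical example and the same illustrative paths from above:
# my_file.py contains the line: import requests
/usr/local/bin/python -m pip install requests
python my_file.py
ModuleNotFoundError: No module named 'requests'
The install put requests next to the Python in /usr/local/bin, but plain python resolved (via $PATH) to the anaconda Python, which never got the library.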

Now you may imagine that keeping track of all these Python installations and their dependencies is going to be a pain, and it very much is! Imagine that we had 5 projects, all using different versions of Python and all requiring different Python libraries.
Key Idea: We need to maintain ISOLATED and CONSISTENT environments in which we can be sure that our code will work as intended (it can find the required dependencies, the dependencies are the correct versions, etc.)
That is why people have come up with solutions like virtual environments and containers. The idea is that by specifying the exact requirements for each project in advance, you can keep those requirements consistent across every developer on the team: everyone working on the project uses the same version of Python and the same installed libraries for that Python, isolated from every other environment. This eliminates versioning issues across machines. So if it works on your computer, it should work on your friend Bobby’s computer, and on the production server as well, as long as everyone uses the same virtual environment specifications.
Ugh, module x not found??
To achieve that synchronization across multiple developers and their machines, we can use Virtual Environments (virtualenv). When you run the command virtualenv venv on your machine (assuming you have installed pip and virtualenv), you create a new Python installation housed inside the venv folder that gets created (specifically at venv/bin/python, as can be seen below).
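A minimal sketch of that, assuming pip is already installed:
pip install virtualenv   # one-time install of the virtualenv tool itself
virtualenv venv          # creates a brand new Python installation inside ./venv
ls venv/bin              # contains activate, pip, python, among other things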

To set this Python as the ‘active’ Python, we need to activate the virtual environment with source venv/bin/activate. Now when we type python in our command line, we are using our spanking new Python installation in venv/bin/python, and we have access to any libraries that have been installed along with it (currently none).
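You can confirm the switch with the same which trick from earlier (the project path below is illustrative):
source venv/bin/activate
which python
/Users/jaychia/my_project/venv/bin/python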
This is why whenever we start a new project, we start with a clean install of Python (of a version of our choosing) and wipe the slate clean with a new virtual environment so everyone can be consistent. Then we start installing Python packages with pip, with our virtual environment activated, so that we keep track of exactly which dependencies we are using to run our code!
Pip is a package manager — a package manager is a program that lets you install and manage software packages. pip, for example, can install Python libraries built by other developers, published online at the Python Package Index (PyPI, https://pypi.org/), into your currently activated Python!
To let other developers know what to install before running your awesome my_file.py, you can run pip freeze > requirements.txt, which copies all of your current library installations into a file called requirements.txt. Other developers can then install those same requirements on their machines by running pip install -r requirements.txt! Here is a walkthrough with pip:
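(The library name and version below are purely illustrative; substitute whatever your project needs.)
pip install requests                 # install a library into the active virtual environment
pip freeze > requirements.txt        # record the exact versions of everything installed
cat requirements.txt                 # requests==2.31.0 (plus its own dependencies)
pip install -r requirements.txt      # what a collaborator runs to reproduce your setup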

Recap
That’s it for part 1!
As a recap, we covered some of the motivations behind Virtual Environments.
- We should maintain an isolated and reproducible environment for each of our projects so that code that works on your machine also works on Bobby’s machine and the production machine
- Virtual Environments help us specify and isolate language-specific (in this case, Python) dependencies
- The workflow for a new project usually looks like this:

- Create a new virtual environment
- Activate it
- Install your required dependencies
- Freeze your dependencies into a list and save it into a requirements.txt file
- Share your requirements.txt file with collaborators so that they can install exactly what you installed into their virtual environments as well
In the next guide, we will dive into Docker and containers proper.
