What makes virtualenv “virtual”?

Juan Gutierrez
Ordergroove Engineering
Jul 2, 2018

This post assumes the reader has a basic working knowledge of python, python virtual environments, and linux environments. Most linux distributions will follow along with the examples provided; Macs may have some deviations, and Windows paths will diverge greatly.

When I first started working on python projects, one of the first things I was told was “always use a virtual environment — it helps keep the dependencies of your projects separate.” Dependency management is a struggle all software engineers eventually butt up against, be the dependencies missing, conflicting, or otherwise. To hear there’s a tool out there that helps you work on multiple projects on your local <whatever_you_consider_local> is pretty damn handy.

One of the first things I thought as I got more familiar with python and these virtual environments was “I can make another copy of repo X and start testing the newer versions of dependency Y!” A concrete point of independence and freedom that isn’t necessarily obvious at first, setting aside the pesky bother of “time” needed to iron out all the kinks of upgrading. It was still nice to know that I could. But, like many things in software, I took virtual environments for granted. I took it at face value that they kept my dependencies separate, and I didn’t give much thought or care as to how. Why should I care how? It just does! It manages things for me, which then gives me the right to not have to think about “how” — right? Sure — but then this would be a pretty short post.

In a nutshell when the environment is activated, the PATH environment variable gets updated with your virtual environment’s bin path:

VIRTUAL_ENV="/path/to/virtual-env"
export VIRTUAL_ENV

PATH="$VIRTUAL_ENV/bin:$PATH"
export PATH

This allows any executable (e.g. python, or an executable a dependency may provide, such as celery or coverage) to be available and, more importantly, to take precedence over anything you may have in /usr/bin, /usr/local/bin, or whatever else you may have defined in your PATH variable.
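The precedence effect is easy to see from python itself. Here’s a minimal sketch using temporary directories as stand-ins for /usr/bin and a virtual environment’s bin (the directory names are throwaway, not real paths):

```python
import os
import shutil
import tempfile

# Temporary stand-ins for the system bin dir and a virtualenv's bin dir.
system_bin = tempfile.mkdtemp()
venv_bin = tempfile.mkdtemp()

# Drop a fake "python" executable into each.
for directory in (system_bin, venv_bin):
    exe = os.path.join(directory, "python")
    with open(exe, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(exe, 0o755)

# With only the "system" directory on PATH, lookup resolves there.
os.environ["PATH"] = system_bin
assert shutil.which("python") == os.path.join(system_bin, "python")

# Prepending the "virtualenv" directory -- exactly what activate does --
# makes its executable win the lookup.
os.environ["PATH"] = venv_bin + os.pathsep + os.environ["PATH"]
assert shutil.which("python") == os.path.join(venv_bin, "python")
```

`shutil.which` walks PATH left to right, the same way your shell does, which is why prepending is all activate needs to do.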

But python is a scripting language; the entire python core library isn’t baked into the python executable you use. For an OS-level, package-manager-installed flavor of python, you’ll typically find the core python packages and modules in /usr/lib/pythonX.Y (or /usr/local/lib/pythonX.Y if you compiled it from source), where X.Y is your relevant python’s major.minor version number.

My local system’s core python2.7 packages

So — if that’s where all the core libraries are…how does the virtual environment gain access to them if we’ve supposedly cut ourselves off from any outside influence? Well…we’ve only really MOSTLY cut ourselves off from outside influence. I mean, we still need the support of the core packages for crying out loud!

If you’ve gone poking around your file system, you may have noticed that at the operating system level, the python “goodies” are located in sibling bin and lib directories: under /usr by default, or /usr/local in my case. To be explicit: /usr/bin/pythonX.Y and /usr/lib/pythonX.Y. virtualenv being a python utility, and the community always referring to the “Pythonic way,” virtualenv follows a similar folder structure: in the same folder where you find your virtual environment’s bin folder, you’ll also find a lib folder. It contains a pythonX.Y folder and, within that…kind of the same things?

One of my virtual environment’s python2.7 lib folders

That’s a lot of links! virtualenv virtualizes paths for you in such a way that you’re no longer pointing at the operating system packages…sort of. It’s actually linking back to the bare minimum of what it needs in order to run. Notice how site.py and site-packages don’t have links? Take a mental note — we’ll get back to that.

So why does this linking work? It actually has to do with how python generally works when looking for packages: the executable looks for packages in lib/pythonX.Y relative to the executable path!

/path/to/env/bin/lib/pythonX.Y
/path/to/env/lib/pythonX.Y
...
/usr/lib/pythonX.Y

This is not an exhaustive list, but it gives you the idea. Remember how I mentioned above that site.py isn’t actually linked back? That’s because site.py is what handles discovery of the site-packages directory, A.K.A. the directory in which your dependencies get installed. Here’s an excerpt from the site.py docstring:

This will append site-specific paths to the module search path.  On
Unix, it starts with sys.prefix and sys.exec_prefix (if different) and
appends lib/python<version>/site-packages as well as lib/site-python.
It also supports the Debian convention of
lib/python<version>/dist-packages.

If it linked back to the global path, then its relative module path would be looking for site-packages in the global path as well, and…well…that would kind of defeat the purpose of isolating dependencies. “No link” is a must for site.py.
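You can inspect this prefix-relative lookup from inside python itself. A quick sketch (the paths in the comments are examples only; the actual output depends on which interpreter you run):

```python
import sys
import sysconfig

# python locates sys.prefix by walking up from the executable's location
# until it finds a landmark such as lib/pythonX.Y/os.py.
print(sys.executable)  # e.g. /path/to/env/bin/python
print(sys.prefix)      # e.g. /path/to/env

# The standard library and site-packages locations are then derived
# relative to that prefix.
print(sysconfig.get_path("stdlib"))   # e.g. /path/to/env/lib/pythonX.Y
print(sysconfig.get_path("purelib"))  # e.g. .../pythonX.Y/site-packages
```

Run this once with your system python and once inside an activated virtual environment, and you’ll see the prefix (and everything hanging off it) move with the executable.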

Side note: once upon a time, virtualenv would by default link back to the “global” site-packages directory, and you would need to explicitly pass the --no-site-packages parameter to tell it “don’t use the ‘global’ site-packages directory.” In more recent versions this isolation has become the default behavior and the parameter has been deprecated. You can pass the --system-site-packages parameter to regain access to the system-level site-packages folder.

So — virtualenv works by linking up to the core packages of your python version, gaining access to the basics, and then isolating your dependencies into a folder within your environment’s folder structure. In fact…all of this is exactly how the operating system level packages are structured. Notice how the docstring excerpt doesn’t have any hardcoded paths — it just generically references lib.

juan@zebra:/usr/local/lib/python2.7$ ll site*
-rw-r--r-- 1 root staff 19609 Jan 6 2017 site.py

Everybody needs to know where their site-packages are located.

Let’s take this one step further…

This is pretty handy if you’re working on lots of different projects and they all use the same version of python. But what happens if you want to work on projects that may support 2.7.6, 2.7.14, 3.3, 3.4, or 3.5.1?

MIND…BLOWN

Enter pyenv. pyenv allows you to install and use different versions of python in parallel, without having to go through the pain of compiling it all from source yourself. Magic! Now, combine that with the -p /path/to/python/executable (or --python=/path/to/python/executable) provided by virtualenv and you can create virtual environments on your local with different versions of python. Extrapolating my earlier epiphany of testing a package with newer versions of a dependency: “I can make yet another copy of repo X and start testing newer versions of python against it!” See “mind blown” image above one more time…

Packaging Deployments

Our team is currently working towards the panacea that is “containers,” but we have quite a number of systems already in play that will take time to migrate over to this veritable Garden of Eden we’ve all heard so much about. Just because you can’t use containers doesn’t mean ALL of your applications should be limited to running a single version of python though…right? There must be a way we can use this knowledge of what virtual environments are doing and come up with some kind of reasonable interim solution. Here’s the best we’ve come up with so far.

Remember how virtualenv creates links to the global operating system libraries? When using a python executable provided by pyenv, the links it creates point at the python libraries located under the pyenv path instead.

What’s to stop us from, instead of using links, copying all the contents of the source lib/pythonX.Y folder into our virtual environment’s target lib/pythonX.Y folder? The answer, in fact, is nothing. I get it — virtualenv is being kind and polite by reusing already-existing contents and not abusing the file system by constantly copying the same things over and over again. They don’t change, so why would you need to keep making copies? Well, in the case of deployments, there’s an argument to be made that everything the application needs (your code, bin, lib, site-packages, etc.) should be contained together and shipped to a server, ready to go, and that argument might go something like this:

It’s nice to be able to ship v1.tar.gz as a rollback of v2.tar.gz if your newest version accidentally cripples your application, and if that tarball already has all your python core packages and all the components it needs to run in a self-contained way…then your rollback can be as easy as a drag-and-drop replacement of your application version plus the necessary service restarts. (WARNING: I’m fully aware this is a gross oversimplification that does not take into account the very common complexities of database migrations, caching dependencies, and any other external mechanisms your application may control or rely on.)

But…I digress…

Point is — you can! We’ve started experimenting with this without issue as we deploy smaller, more independent applications that make use of certain goodies in newer versions of python.
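As a minimal sketch of that copy-instead-of-link idea (the directories below are throwaway stand-ins; in real life you’d be copying from something like a pyenv version’s lib/pythonX.Y into /path/to/env/lib/pythonX.Y):

```python
import os
import shutil
import tempfile

# Stand-ins for the source interpreter's lib/pythonX.Y and the virtual
# environment's lib/pythonX.Y.
source_lib = tempfile.mkdtemp()
env_lib = os.path.join(tempfile.mkdtemp(), "pythonX.Y")

# Pretend the source tree holds a core module.
with open(os.path.join(source_lib, "os.py"), "w") as f:
    f.write("# stand-in for a core module\n")

# virtualenv would normally symlink these; copying instead (with
# symlinks=False, so any links are resolved into real files) makes the
# environment fully self-contained and safe to tar up and ship.
shutil.copytree(source_lib, env_lib, symlinks=False)

assert os.path.isfile(os.path.join(env_lib, "os.py"))
assert not os.path.islink(os.path.join(env_lib, "os.py"))
```

Once the lib tree is real files rather than links, the environment no longer depends on the machine it was built on, which is what makes the tarball-style deployment above workable.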
