Things I wish Pip learned from Npm
or: a rant on how to make Python package management work for the rest of us.
I’m a Python hacker. I dabble in other languages from time to time (mainly JavaScript), but I really prefer Python, at least for the server side. But… I think Python package installation isn’t adequate for a modern development workflow.
Note: this post isn’t about the Python packaging wars (distutils vs setuptools, eggs vs wheels, and the rest of the package publishers’ problems). It’s about the package user’s (or app developer’s) perspective, and about better ways pip can serve it.
NPM
npm stands for Node Package Manager and it does exactly that (and does it very well). In many ways it’s similar to pip (and to Ruby’s gem). Using npm painfully reminds me, every time, of a couple of major features still missing from pip. So here is my npm-inspired pip wishlist:
Save a requirement on installation
npm makes this trivial:
$ npm install somePackage --save
And behold: npm both installs the desired package and adds it to package.json (npm uses the package config file as a sort of “requirements.txt” file).
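For reference, the command above ends up adding an entry roughly like this to package.json (the exact version range depends on your npm configuration, and somePackage is of course a placeholder):
"dependencies": {
  "somePackage": "~1.2.3"
}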
Look at the python equivalent:
$ pip install some_package
$ pip freeze >> requirements.txt
And guess what happens after I installed another package?
$ pip install another_package
$ pip freeze | grep another_package >> requirements.txt
You may ask: “So what? Two lines instead of one? That’s what you’re whining about?” And the answer is “yes and no”. Yes, I would prefer a one-liner, and no, this isn’t what I’m complaining about. The real problem is that using pip freeze like that is an anti-pattern, because of the way pip handles package installations.
Say I want to install some_package, which currently has a dependency on another_package. Pip goes through every package I install and runs the package’s setup.py. If the setup file declares its own dependencies, they get installed too at this stage. So when I use pip freeze to create the requirements file, I would expect this output inside:
some_package==2.1.0
But what I would really get is:
some_package==2.1.0
another_package==0.2
And my requirements file just became polluted with dependencies I don’t really care about, since they aren’t really my dependencies! They are my dependency’s dependencies, which could change someday and leave me with an unneeded lib in my requirements file (and I won’t even know it’s unneeded without using some specialized tool). If, maybe years after being dropped from my dependency’s dependencies, the another_package package is removed from PyPI, it would break my installation process for no reason whatsoever. There are use cases for pip freeze (npm’s close, but not equivalent, feature is shrinkwrap), but most of the time, especially during active development, it’s an anti-pattern.
So after making the case that pip freeze is broken, I’m left with manually adding the dependency to the requirements.txt file. npm 1, pip 0 ☹
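Until pip grows a --save flag, the best I can do is a small shell helper that installs a package and appends only that package, pinned, to requirements.txt. A minimal sketch (pipsave is a name I made up; it assumes pip show is available and that requirements.txt lives in the current folder):
# sketch: install a package and record just it (pinned) as a top-level requirement
pipsave () {
    pip install "$1" && \
    echo "$1==$(pip show "$1" | awk '/^Version:/ {print $2}')" >> requirements.txt
}
# usage:
$ pipsave some_package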
Different dependencies for deployment and development
It’s a common pattern: I install lots of helper libs to aid with testing and project setup, things I don’t really need in production (where I want to keep the dependency list as thin as possible). Think (off the top of my head): factory_boy for object fixtures, mock and pretend for mocking and stubbing, sure for assertions, nose or pytest as a test runner, coverage.py for coverage, lettuce for BDD, django-extensions and django-debug-toolbar for easier Django development and profiling, splinter or WebTest for end-to-end testing from the browser, Fabric (usually with fabtools and cuisine) to handle deployments, and on we go.
npm makes handling this problem easy:
# Installing
$ npm install lettuce --save-dev
# this command saves the dependency in the “devDependencies” section of package.json
# in testing or development:
$ npm install
# this would install everything
# while in production use:
$ npm install --production
and npm installs only the packages listed under “dependencies”, skipping lettuce and friends, saving time and avoiding potential security holes.
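The resulting package.json carries the split in two separate sections, roughly like this (the package names and version ranges here are purely illustrative):
{
  "dependencies": {
    "somePackage": "~1.2.3"
  },
  "devDependencies": {
    "lettuce": "~0.1.0"
  }
}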
While in Python I would have to maintain two different requirements files by hand! That sucks, even before taking into consideration the broken nature of pip freeze. Again, a clear win for npm (the usual Python workaround is sketched below).
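The common convention, assuming a second file named something like requirements-dev.txt (the name is just a convention, not anything pip enforces), relies on pip’s ability to include one requirements file from another with -r:
# requirements.txt (production)
some_package==2.1.0
# requirements-dev.txt (development and testing)
-r requirements.txt
pytest
coverage
lettuce
# in production:
$ pip install -r requirements.txt
# in development:
$ pip install -r requirements-dev.txt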
Separating installed dependencies from global state
I admit that while this is a major issue, there isn’t a clear winner here, certainly not one as decisive as in the previous two issues.
I find pip + virtualenv + virtualenvwrapper to be a superior solution to npm’s “just put everything in a local node_modules folder” approach. I like the separation of environment and project code that virtualenv + virtualenvwrapper provides: a separation that lets me reuse the same installed environment, or run commands against that environment while I’m in a completely different folder of my filesystem.
But… this doesn’t come out of the box; you need to install the extra tooling and remember to use it (activate). Even experienced Python developers can be caught puzzled for a moment when sshing into a server and trying to run a Python script without activating the venv first.
I know Python 3.3 has a form of virtualenv (the venv module) baked in, and that’s a huge step forward, but it still misses the convenience provided by virtualenvwrapper. (Besides, most of us use Python 2.7, and that will probably remain the situation for quite some time, with 2.7 in prolonged maintenance mode until at least 2020.)
The basic broken thing with pip here is that its choice of defaulting to a global installation is wrong; the default should be a local venv installation. Global installation (needed only in the rare case where we install tools rather than libs) should have to be asked for explicitly.
npm handles that correctly:
$ npm install someLib # does a local install in a local node_modules folder
$ npm install someLib -g # installs globally
With pip it’s far too easy to destroy some global tool’s setup or version, polluting the global namespace, by running a pip install without activating a virtualenv first. In our modern developer world of multiple projects with lots of dependencies (since we are all “dwarfs standing on the shoulders of giants”), this is simply not acceptable.
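For comparison, this is the discipline pip currently demands; every step after the first one has to be remembered by hand (venv and myproject are placeholder names):
$ virtualenv venv # or: mkvirtualenv myproject, with virtualenvwrapper
$ source venv/bin/activate # forget this step and the next install goes global
(venv)$ pip install some_package
(venv)$ deactivate # back to the global environment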
Dependency hell
A short exercise: spin up a virtualenv and
$ pip install splinter # a Python Selenium library
# check pip freeze for selenium (which was installed along with splinter):
$ pip freeze | grep selenium
# Output:
selenium==2.41.0
Now let’s install another package that depends on selenium:
$ pip install huxley
$ pip freeze | grep selenium
# Output:
selenium==2.35.0
Pip just downgraded a package version without even asking for permission! Maybe splinter is broken now? Maybe I’ll lose some core functionality? It wouldn’t matter even if I had installed selenium as a first-level dependency; pip would handle it the same way. There should be a big “Broken” sign all over this behavior.
npm wouldn’t fall into this pit, because each package gets its own dependency tree. Even the (still) fledgling Bower, the frontend package manager, would warn me and ask for guidance. But not pip ☹
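Until pip warns about this on its own, the only guard I know of is to diff the environment manually around every install; a crude sketch:
$ pip freeze > before.txt
$ pip install huxley
$ pip freeze > after.txt
$ diff before.txt after.txt # the silent selenium downgrade shows up here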
npm isn’t flawless
I don’t think npm is perfect; it has its own problems. My greatest pain with npm is the nested folders created inside node_modules.
You see, npm creates a node_modules folder for every package installed with the original npm install command. In this nested node_modules folder it installs the package’s dependencies, which can have dependencies of their own, and so on. With more and more folder depth, this can (and does) get ugly. Since npm installs dependencies for each package separately, it can actually install a specific someDep dependency quite a few times, in different versions, all in different nested folders under the same root.
Now guess what happens when, somewhere along the way, you get an error message about someDep? Good luck trying to find the specific troublesome someDep copy, installed in several versions and buried deep down some nested folder…
The Python philosophy of:
flat is better than nested
protects us from nested-folder hell: every dependency is installed in the site-packages folder (the global one or the venv-specific one) and can easily be located there.
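Finding where (and in which version) a dependency was installed is a one-liner (selenium here is just the package from the earlier exercise):
$ pip show selenium # prints the name, version and install location
# or, straight from Python:
$ python -c "import selenium; print(selenium.__file__)"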
The future
The Python community (probably the best programming community ☺) has taken some steps forward to engage with Python’s packaging and dependency-handling problems. Getting virtualenv into the standard library (as venv in 3.3) and pip (via the brand-new “ensurepip” module in 3.4) seems like a step in the right direction. It also makes clear that pip is the dependency management tool that’s here to stay, and won’t be replaced in the near future. Now it’s time to move on from the Python packaging wars and fix pip. And yes, I mean a new pip for Python 2.7 as well as for 3.x.
npm could be an example of the path ahead.