Vagga. The Higher Level Package Manager
People keep asking me why vagga exists. I’ve written a comparison of vagga with Docker and Vagrant, and about how it changes your workflow. Still, folks think that vagga is just a hack to run containers in Rust. In this article, I show what vagga aims to be and why it’s different from other tools around.
The first important thing is that vagga is a tool for making development environments. It’s not actually a generic containerization tool. But it’s much more powerful than anything else for development environments.
First let’s look at what development environments are. Today we usually set up a development environment with a package manager: npm, pip, bundler, leiningen, maven, cargo or one of many others. They usually install the packages either into a subfolder of the project directory or into the user’s home directory.
In ancient times, it was okay to install packages into the system. It’s not the case now. Usually, every developer has a number of projects to work on, each with different dependencies. So it’s unwise to install the dependencies of any of them into a system location.
But the problem is that different projects want different package managers. If you’re a frontend developer, you may have to deal with npm and leiningen. But when you switch projects, you may need to deal with pip, bundler or Maven just to be able to start the backend. Don’t say it’s easy. It is, sometimes. But as soon as you hit an issue, you ping a more experienced developer for help. That often consumes a lot of time in big projects.
During your career, you start using all package managers
Some of them you use for work. Several you may learn to find out whether a new framework is good enough for the next project. A few you touch just to contribute to an open-source project. But you can’t be proficient with all of them, because your profession is not “package installer”: you like coding more than installing packages.
Another important thing for backend development is native dependencies. Many Python, Ruby, and Node.js applications have some C libraries inside. They need to be compiled against system libraries that differ between Ubuntu 12.04 and CentOS 6.5 (which developers might have). In Python, there is a tool named zc.buildout to handle the issue. The tool is very Python-specific and is not very popular among Python developers. So most libraries ignore it and either build against the system package or download an arbitrary version of the C dependency.
Vagga to the Rescue
Vagga tries to unify package installation. For example to install python in Ubuntu you configure:
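A minimal sketch of such a config, assuming a container named “ubuntu” and the “trusty” release (both picked for illustration):

```yaml
# vagga.yaml
containers:
  ubuntu:                 # container name is arbitrary (hypothetical)
    setup:
    - !Ubuntu trusty      # base distribution; the release is an assumption
    - !Install [python]   # installed with the distribution's own package manager
```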
To install python in Alpine Linux use the following:
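A sketch of the Alpine counterpart, assuming a container named “alpine” and the “v3.2” release (both chosen for illustration). Only the base distribution step changes:

```yaml
# vagga.yaml
containers:
  alpine:                 # container name is arbitrary (hypothetical)
    setup:
    - !Alpine v3.2        # base distribution; the version is an assumption
    - !Install [python]   # same step, resolved by Alpine's package manager
```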
Compare that with a Dockerfile, where the same install on an Ubuntu base image is a “RUN apt-get update && apt-get install -y python” step.
It’s okay, just remember to do “update” and pass “-y”. Easy. But let’s compare a little bit. If you are a CentOS user, do you know what Ubuntu’s package manager is called? If you answer yes, ask yourself what package manager Alpine Linux has. Can you install Alpine packages with vagga? Yes, just type “Install” and list the packages.
Do you want to install Alpine? Yes, because an Alpine container with nginx occupies 8 MB, while an empty Ubuntu container has at least 250 MB of stuff. Even if you run Ubuntu in production (and so want an environment as similar to production as possible), you want smaller containers for build tools, documentation generators, test tools, whatever.
In case you’re confused, vagga does not cheat; there is no mapping of package names. You just install whatever your distribution provides.
This was just to show you the unification. Most of the packages that are handled by your favourite language’s package manager are not in the Ubuntu repositories. So you have to install the Python ones with pip (Python’s package manager):
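A minimal sketch of such a container, assuming a hypothetical container name “app” and “flask” as a stand-in for whatever packages the project needs:

```yaml
# vagga.yaml
containers:
  app:                      # hypothetical container name
    setup:
    - !Ubuntu trusty        # release is an assumption
    - !Py3Install [flask]   # package list is illustrative
```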
This will automatically install python3 (vagga considers python2 and python3 different programming languages) from Ubuntu packages. It will then download and install the latest pip and install the packages with it. You can do the same on Alpine too. Similarly, you install the Node.js packages:
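The Node.js counterpart might look like this. The container name “frontend” is hypothetical, and “babel” is used only because it is the package this article mentions elsewhere:

```yaml
# vagga.yaml
containers:
  frontend:                 # hypothetical container name
    setup:
    - !Ubuntu trusty        # release is an assumption
    - !NpmInstall [babel]   # package list is illustrative
```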
Similar enough, right?
Well, if you ask: it’s not going to replace “requirements.txt” or “package.json”. It’s a shortcut. You may use “requirements.txt” and “package.json” with all the same features.
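As a sketch, pointing vagga at the existing files could look like the following. The container names are hypothetical, and the “!NpmDependencies” step assumes a vagga version recent enough to provide it:

```yaml
# vagga.yaml
containers:
  backend:                  # hypothetical container name
    setup:
    - !Ubuntu trusty
    - !Py3Requirements "requirements.txt"   # install from the existing file
  frontend:                 # hypothetical container name
    setup:
    - !Ubuntu trusty
    - !NpmDependencies      # reads package.json; assumes a recent vagga
```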
Being in Control
I’m not going to lie to you that vagga will solve all the packaging problems. For example, to build C dependencies, you still have to remember to install a compiler and all the needed headers, and maybe set up the environment.
The point is often not to make configuring the initial environment easier, but to make it easy to reproduce for every team member.
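A sketch of what that looks like for a Python package with C parts. The package “lxml” and the Ubuntu package names are chosen for illustration, and the container name is hypothetical:

```yaml
# vagga.yaml
containers:
  app:                      # hypothetical container name
    setup:
    - !Ubuntu trusty
    # C libraries needed at run time
    - !Install [libxml2, libxslt1.1]
    # compiler and headers, needed only while building the container
    - !BuildDeps [build-essential, libxml2-dev, libxslt1-dev, python3-dev]
    - !Py3Install [lxml]    # compiled against the headers above
```

You still have to know which headers the package wants, but once that is written down, everybody on the team gets the same result.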
Beyond the Unification
There is more than just package management. Here is what you get when you run bare “vagga” on one of the small projects of my employer:
Now, just by running a single command, you know:
- How to build the project
- How to compile the documentation
- How to run the project on the localhost
- How to edit and rebuild things (watch)
- How to run unit tests
- Where to look for the list of dependencies (i.e. vagga.yaml)
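Such a listing comes from the “commands” section of vagga.yaml, where each command may carry a description. A hypothetical sketch (command names, container names and the programs being run are all made up for illustration, not taken from the project above):

```yaml
# vagga.yaml (containers section omitted)
commands:
  build: !Command
    description: Build the project
    container: build        # hypothetical container
    run: [make]
  doc: !Command
    description: Compile the documentation
    container: docs         # hypothetical container
    run: [make, -C, docs, html]
  test: !Command
    description: Run unit tests
    container: test         # hypothetical container
    run: [python3, -m, unittest, discover]
```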
Note, there is no “install dependencies” step at all. You don’t need to think about it. You don’t care if docs or tests require additional dependencies (if you care for some reason, just skim the vagga.yaml).
The README file is for important things, not for troubleshooting the installation
Another non-obvious thing here is that you don’t know whether running the project (the third item above) starts a single application or several. If you’re a frontend developer, you don’t care. Just run the server and the watch and do your work. If you’re a backend developer, you may take a look at vagga.yaml. But you may also ignore it, because fixing a bug in the service you own doesn’t interfere with anybody adding another service that is started by “vagga run”.
One more thing that I’ve written about before is that you don’t have to care when dependencies change. The images are rebuilt automatically.
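A sketch of how a single “vagga run” can hide several processes, assuming a hypothetical web application plus a Redis server, each in its own container:

```yaml
# vagga.yaml (containers section omitted)
commands:
  run: !Supervise
    description: Run the project on localhost
    children:
      web: !Command
        container: python   # hypothetical container
        run: [python3, app.py]
      redis: !Command
        container: redis    # hypothetical container
        run: [redis-server, --daemonize, "no"]
```

Adding one more service means adding one more child; whoever only types “vagga run” never notices.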
Still Not Convinced?
If you think that “npm install .” is a small price for dependency installation, it’s not. With “npm” (just like with almost any other package manager in 2015), you need to know when to re-run the command.
I mean, if you run “git pull” you don’t know whether dependencies have changed. You can run the command each time, but nobody does that. This means that sometimes you will get errors and try to debug problems that do not exist. Often your team members distract you with such issues. And sometimes you need to do “rm -rf node_modules” for some reason (it’s probably a bug that may be fixed, but we have this problem in practice on occasion).
Another issue is that you often have multiple sets of dependencies. The only case that is currently covered by many package managers is “dependencies” versus “devDependencies” (as they are called in npm). But there are more cases:
- NodeJS vs Frontend code
- Multiple sub-applications
- Migrating parts of application from one version of a library to another
- Multiple test suites (e.g. unit tests and selenium tests)
The third item may need a little more explanation. Briefly: in a frontend app you may migrate the application page by page, but then you need a separate “package.json” and “node_modules” for the old parts and the new ones. That setup is easy with vagga but complicated with npm.
On large teams, all of the above scales badly. The conventional techniques, keeping the project scope to a minimum and splitting work into more independent projects (should I say microservices?), help with managing teams and bug fixes, but they create even more dependency problems, because you end up with a larger number of projects with different dependencies. And chances are that you will have different technologies in different microservices, each with its own set of quirks.
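A sketch of the page-by-page migration with two containers. The container names and the library are hypothetical, and this assumes the “name@version” specifiers are passed through to npm unchanged:

```yaml
# vagga.yaml
containers:
  pages-old:                      # hypothetical: pages not yet migrated
    setup:
    - !Ubuntu trusty
    - !NpmInstall ["react@0.13"]  # library and versions are illustrative
  pages-new:                      # hypothetical: already migrated pages
    setup:
    - !Ubuntu trusty
    - !NpmInstall ["react@0.14"]
```

Each container has its own set of installed packages, so the two versions never meet.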
So vagga aims to provide the following:
- Common user experience for all the package managers and build tools
- A lower bar for contributing to a project, no matter what technology it uses
- Seamless support of subprojects (with different dependencies)
And the tools to provide this are:
- Simple YAML’y config
- Proper versioning of everything installed
- Configure commands not just dependencies
- Commands are higher-level, may have multiple containers and may supervise multiple processes
The Current Status
Vagga does an excellent job for many projects. Its process management facilities are good enough. It can track the list of dependencies and versions. It has built-in commands for Python and Node.js. Everything that has no built-in commands can be installed with a couple of lines of bash. There are examples for Rust, C, Ruby, Haskell and Nim.
But we want to take this to the next level. Currently vagga versions only the set of packages listed in vagga.yaml, not the exact versions that get resolved. I.e. if you write “!NpmInstall [babel]”, it would have installed babel 5.x in October 2015, but in November 2015 it would install babel 6.x and break everything. While you could specify the version explicitly, that is often overlooked. And it’s also tedious work for all the intermediate dependencies (well, installing babel + webpack in a simple project gives a tree of 1440 dependencies, so I would say it’s practically impossible).
The improved version is: during the first build we should generate “npm-shrinkwrap.json” and let the user commit it to the git repository. Every subsequent build would then use the same versions of all the packages until someone updates anything. This should, of course, work for all the supported package managers.
Vagga itself tries to be a higher-level package manager. So it needs some collaboration with every package manager to do its work well. Here is a quick round-up of what is needed:
- A way to extract a plain list of dependencies from a language-specific file (if it’s a complicated format like setup.py or project.clj)
- A command to generate a tree of dependencies with the exact versions installed (into a custom file)
- Some means to update only the changed dependencies
- A simple way to override the package installation and cache directories
Current package managers handle most of these things, but they either expose no interface for them or support some features poorly. For example, most package managers always put the cache in the home directory (and even crash silently when the home directory doesn’t exist). Pip has a “freeze” command which doesn’t work well with “git” repositories. Cargo has a “Cargo.lock” that is generated only as a side effect of the build process (and there is no separate step to install dependencies). npm is unable to store package.json and npm-shrinkwrap.json under different names.
In the near future, I’m going to file the issues for the respective package managers to add or fix the stuff. This was partly the motivation to write the article.
I realized that vagga is more like zc.buildout (except not python-centric) only when writing this article. I mean it had all the concepts since the beginning. But my vision became clearer while writing the text. This is just a funny fact. But this realization will probably improve my roadmap in the future.
Overall, package management is often a mess today. And I hope my work on vagga will both give vagga users a great experience and, as a side effect, contribute a lot to consistency between the various package managers.