The long journey of making PHP’s Composer memory-efficient and fast

Yanick Witschi
21 min readJan 26, 2022

--

I’ve been meaning to write this blog post for quite some time, but for us developers, writing code is often simply easier than typing a blog post. And so we put it off. First a few days, then a few weeks… you know the drill. But this blog post is really important to me. I’ve been working on making Composer faster and more memory-efficient for years and without this blog post, somehow this journey would not have felt like it was complete. I kind of had to put an official end to it. Also, I’ve been asked by a few people to share some insights into Composer.

So grab a cup of coffee or tea because for this one, I need to really reach back in time as my quest to improve performance in Composer started back in 2014.

Getting familiar with Composer internals

The reason I got interested in making Composer more efficient is a topic for yet another blog post but basically, in 2014 when I started to really use Composer a lot, a single composer update took me on average about 2 GB of RAM and it almost ran for a minute. At the beginning I would read through the Composer issues saying that "dependency resolution in Composer is complex" and thus needs these huge amounts of resources and that "you cannot compare NPM to Composer" (you really can't, more on that in a second). So at first, I thought that's just what it is and for a few weeks or even months, I would not bother much and just accept things the way they were. Like probably many of you did, too.

But I have always been a rather curious person. You know, the type of guy that wants to know why the sky is blue and then “Rayleigh scattering” is not enough and now there are a bunch of books on quantum physics on my bookshelf (and now I basically don’t understand anything in this world anymore but that too, is for yet another story).

Anyway, I began to spend the coming months and years getting to know the core of Composer, and I’ll try to share some insights with you which I’ve gained over the years.

Package managers are a dime a dozen. They exist in all kinds of programming languages and for all platforms. As PHP developers we all know Composer of course and most of you might also be familiar with “NPM” or “yarn”. For MacOS there is “homebrew”, for Debian “APT”, for Windows “Chocolatey” and so on and so forth — the list is virtually endless.

Compared to other programming languages, however, PHP is special in the way that in a PHP process, a class can exist only once. As a result, the same package cannot be loaded in different versions. Quite the opposite to JavaScript for example. This fact alone makes the dependency resolution in Composer significantly more complex than for JavaScript. In PHP, if a package pizza/margherita requires toppings/mozzarella in version 5 and pizza/napoli requires toppings/mozarella in version 6, then this must result in an error by definition. Both toppings/mozzarella packages cannot co-exist in both versions because they might offer the same class with different method signatures or even different methods altogether.

So the core task of Composer is to answer the following question:

Is there a package combination that satisfies all dependency requirements of all the packages?

If you express all package requirements in a single boolean expression and represent the installation of each version of a package as a boolean variable (install is true, remove is false), then the question becomes: Is there a set of true/false settings for all variables which makes the entire expression true? This problem is known in mathematics as the "Boolean satisfiability problem" and is solved with a so-called "SAT solver". "Boolean satisfiability" is also an "NP-complete" problem, which in simple terms means something along the lines of "it probably cannot be solved efficiently". If you are interested in the theory behind it, try to start at Wikipedia but it's absolutely not required to understand the rest of this blog post. I mentioned it only, so you know what I'm talking about when speaking about the "SAT solver".

If you feel like you haven't understood a word so far, don't worry, there are plenty of examples to follow.

Dependency Resolution in Composer

Dependency resolution in Composer works in such a way that for all require and conflict definitions of all the packages, the corresponding boolean satisfiability rules are built and finally passed on to the SAT solver. For further understanding we need to clarify a few classes within Composer and their respective responsibilities:

  • Package - represents a package in a specific version (e.g. pizza/margherita in 1.0.0)
  • Repository - represents a source (such as packagist.org or https://github.com/my/pkg, etc.) that provides Package instances
  • Pool - represents the collection of all relevant Package instances coming from all sources
  • Rule - represents a satisfiability rule for the solver (e.g. "pizza/margherita in 1.0.0 requires toppings/mozzarella in 3.0.0")
  • Solver - the SAT solver of Composer (which is by the way based on openSUSE's Libzypp)

The more package versions (Package) and more complex the dependencies, the more rules (Rule). And with each new version of a package published, these rules increase exponentially, since every possible combination must be considered. So we are not talking about a few thousand rules. Depending on the complexity of your project, we are talking about a few million!

These millions of rules are crucial for the memory consumption of Composer. Basically the Pool is passed on to the Solver, which then builds a huge array of rules (called RuleSet) and then starts to look for a solution. This is the point at which we reach absolute memory usage peak!

Why am I telling you all this? Because you probably feel the same way as I felt back a few years ago: You immediately think that there must be a way to store or represent these rules more efficiently or to make the solver itself more efficient.

So I started with rather simple pull requests, which in the end were only micro-optimizations. One day I tried to make function calls more efficient, then the other day I wanted to cache certain objects. I went as far as to try using the PHP extension ds in order to make the RuleSet storage more memory efficient and in 2018 I even wrote my own SAT solver in a few sleepless nights. That one, however, I've never published because it was not especially efficient.

So I remained relatively unsuccessful, because the real problem was not the Solver and also not whether the rules were efficiently stored or not. The solution was to try not to create millions of rules at all, but to optimize the Pool in such a way that it already contained significantly fewer Package instances before passing them on to the Solver, thus reducing the number of possible combinations and rules.

In other words, we’re going to largely ignore the Solver for the rest of this blog post and focus mostly on the Pool (spoiler: Nils found another smart way of reducing the number of rules without touching the Pool but we'll get to that too).

For the following sections, we need an example that we can use to understand the improvements that have been introduced over the different versions of Composer. We therefore define two packages, pizza/margherita and toppings/mozzarella and the following 10 versions (Package instances) shall exist in our Repository instances:

  1. pizza/margherita in 1.0.0, requires toppings/mozzarella in 3.*
  2. pizza/margherita in 1.0.1, requires toppings/mozzarella in 3.*
  3. pizza/margherita in 1.0.2, requires toppings/mozzarella in 3.*
  4. toppings/mozzarella in 1.0.0
  5. toppings/mozzarella in 1.0.1
  6. toppings/mozzarella in 2.0.0
  7. toppings/mozzarella in 3.0.0
  8. toppings/mozzarella in 3.0.1
  9. toppings/mozzarella in 3.0.2
  10. toppings/mozzarella in 3.0.3

I’m using pizza examples here because you’ll notice that it’s so much easier for you to remember and follow my explanations than if I had used something more general like package-a, package-b etc. It goes without saying that all of the pizza packages conflict with toppings/pineapple in every version, so I did not bother listing that extra for every package!

In the root composer.json of our project we require pizza/margherita in 1.* like so:

{
"require": {
"pizza/margherita": "1.*"
}
}

The Pool and Rule entanglement in Composer 1.x

Composer 1.x suffered from the crucial design issue that the Pool was not a self-contained, finite set of Package instances, but also knew about the Repository instances. Essentially that meant that when the Solver started to build all the Rule instances to start solving, the process was more or less something like this:

  1. Solver initiates building the RuleSet (all the Rule instances)
  2. Looping over all root composer.json require definitions, asking the Pool what Package instances exist for given constraints
  3. Building the rules for those packages
  4. Looping over all newly found Package instances, asking the Pool for their respective require and constraints
  5. Building the rules for those packages
  6. etc.

The same process of course also happened for all the conflict definitions, but we ignore those here for the sake of making this already pretty complex blog post a bit more readable.

In pseudocode, it looked something like this:

As you might see, there is no clear separation between building the Pool of relevant Package instances and the Rule instances for the Solver. The rules are built at the same time as we fetch all the relevant Package instances.

The idea behind this process was to make loading of the relevant package information “lazy”. An unnecessary complication, as the amount of data required to resolve dependencies did not reduce with the calculations done prior to loading additional packages. So overall execution time was not improved by lazily loading data, it really only mangled loading and dependency resolution. This made it impossible in Composer 1.x to optimize the Pool contents before creating all the Rule instances based on the pool.

Why impossible, you ask? Impossible because separating pool creation would have required changing the plugin API. Plugins in Composer 1.x could access the Pool via events and thus dynamically add repositories. Changing this logic would have been a break of backwards compatibility that required a new major version according to Semantic Versioning. We thus needed version 2.0 of Composer in order to improve the situation.

So in Composer 1.x, all 10 Package instances in our example were considered and thus we generated Rule instances for all of them, resulting in millions of rules for complex projects, making things slow and memory-greedy.

Composer 2.0 — here come the MultiConflictRule and PoolBuilder!

I mentioned that Nils had found a very clever solution to reduce the number of rules without even touching the Pool. I also said that we're going to largely ignore the Solver in this blog post but in order to understand his improvement, we need to look at the SAT solver a tiny bit more after all.

So we know that it works using Boolean algebra expressions and that every Rule within the Solver represents such an expression. In SAT solving terms, this "expression" is what's called a "clause" and - in our case - every package/version combination is called a "literal". It sounds a lot more complicated than it really is. Let's come back to our example and see, what rules Composer would actually create if we ran our example we've been working with all this time. Here it is again:

  1. pizza/margherita in 1.0.0, requires toppings/mozzarella in 3.*
  2. pizza/margherita in 1.0.1, requires toppings/mozzarella in 3.*
  3. pizza/margherita in 1.0.2, requires toppings/mozzarella in 3.*
  4. toppings/mozzarella in 1.0.0
  5. toppings/mozzarella in 1.0.1
  6. toppings/mozzarella in 2.0.0
  7. toppings/mozzarella in 3.0.0
  8. toppings/mozzarella in 3.0.1
  9. toppings/mozzarella in 3.0.2
  10. toppings/mozzarella in 3.0.3

In the root composer.json of our project we require pizza/margherita in 1.* like so:

{
"require": {
"pizza/margherita": "1.*"
}
}

In Boolean algebra we would use Greek letters and notations like ¬ for NOT, for AND and for OR. But as we're PHP developers and we're talking about Composer here, we'll just use the operators we can use in PHP. Instead of greek letters, we use integers (the Pool hands out an incrementing one for every Package), && and ||. For NOT we simply negate the assigned Package integer. So let's come back to our example and see what clauses (Rule instances) Composer would create for us:

  1. (-1 || 7 || 8 || 9 || 10)
  2. (-2 || 7 || 8 || 9 || 10)
  3. (-3 || 7 || 8 || 9 || 10)

Seeing that all three require statements generate a clause with a negated integer might seem confusing at first. After all we want to "require" something so why would the clauses say -1, -2 and -3? Why don't we use this:

  1. (1 && (7 || 8 || 9 || 10))
  2. (2 && (7 || 8 || 9 || 10))
  3. (3 && (7 || 8 || 9 || 10))

Ha! More logical to our brains but remember, all clauses are eventually merged into one big AND clause and only the ones that resolve to true are possible solutions for the SAT solver. The clauses with a positive integer, however, can never result in a solution because it would mean we want them all 1, 2 and 3 installed. That's not true, though. We only want one out of them plus their dependencies. So this is how you have to read the negated statements:

Either don’t install the package in this version, or also install one of the package versions which fulfill this requirement.

Now that we know how a require constraint is represented with Boolean algebra, let's look at how conflict rules are represented. As we don't have one in our example, we'll simply make one up by checking what would happen if pizza/margherita in 1.0.0 did not require toppings/mozzarella in 3.* but instead conflict with it. The rules then generated would be the following:

  1. (-1 || -7)
  2. (-1 || -8)
  3. (-1 || -9)
  4. (-1 || -10)

In the case of a conflict, the rule can be translated like so:

Either don’t install this package or don’t install the other one, but never install both at the same time.

And here is where you should see the difference between require and conflict! A require statement ends up in one rule. A conflict statement, however, can only ever be represented as a 2 literal rule. You cannot combine them in one clause like (-1 || -7 || -8 || -9 || -10) because that would mean that e.g. 7, can be installed, as long as 8, 9 or 10 are not installed. But that would be wrong because it conflicts with 1!

The details of Boolean algebra are not important, but it’s important that you understand that conflict Rule instances can only ever be represented by 2 literals and thus, are largely responsible for the huge amount of rules.

Now, did I mention how we ensure that Composer does not actually install multiple versions of the same package? So far we were only ever talking about how one package requires another one or conflicts with one. But what part of the Solver is responsible for ensuring that we don't end up with a solution where we have pizza/margherita in version 1.0.0 as well as 1.0.1 installed?

You guessed right, the Solver adds conflict rules! Lots of conflict rules. Given our three versions 1.0.0, 1.0.1 and 1.0.2 of pizza/margeritha, we would end up with 3 rules:

  1. (-1 || -2)
  2. (-1 || -3)
  3. (-2 || -3)

That doesn’t look too bad but for the ones amongst you familiar with combinatorics, that’s 𝐶(𝑛,𝑟) whereas 𝑟 is always 2 here. Or the nCr button on your good old pocket calculator. Remember those?

Let’s take a popular PHP library like symfony/http-foundation. At the moment of writing this blog post, this package exists in well over 500 versions! If we just calculate the number of possible combinations for 500 versions of this very package, that already results in 124,750 conflict rules. A gigantic number!

But we cannot do anything about that, can we? We’ve just learnt that conflict rules can only ever be represented as a 2 literal rule. However, something about the conflict rules to prevent the same package being installed in multiple versions is different. Can you spot it?

Yes! They are self-referencing! They do not conflict with another package but always with another version of itself!

This makes them special because we know that all these versions conflict with one another and it’s impossible for a version to conflict with just one of them but not any of the others! And so Nils came up with the MultiConflictRule, where he managed to represent all these rules with a single instance of MultiConflictRule per package. The new MultiConflictRule was now treated by the Solver as if it contained all the individual 2 literal conflict rules, but it was stored as a single MultiConflictRule. As you might tell from the numbers we've calculated before, a massive improvement!

Talking about reducing 124,750 rules to just a single one, you might ask yourself: Why did we even generate rules for all the package versions that existed? Surely, nobody required any version 2.* of symfony/http-foundation anymore by 2022 so why even load and generate rules for them?

Right, that’s where I joined the party and so I invite you to join me on the next deep-dive for yet another optimization.

The foundation for even more optimizations was laid by Nils too. He introduced the PoolBuilder. With that change, only the PoolBuilder now knew about all Repository instances and was thus responsible for building a finite Pool which now contained all Package instances referenced anywhere in the dependency tree. So the Pool was now a simple data container object. Your everyday collection class which looked like this in pseudocode:

However, the new PoolBuilder did not include any optimizations yet. In the previous section about the MultiConflictRule I've used symfony/http-foundation as a real-world example but for this part, we're coming back to our initial example to see how it evolved over the different Composer versions:

  1. pizza/margherita in 1.0.0, requires toppings/mozzarella in 3.*
  2. pizza/margherita in 1.0.1, requires toppings/mozzarella in 3.*
  3. pizza/margherita in 1.0.2, requires toppings/mozzarella in 3.*
  4. toppings/mozzarella in 1.0.0
  5. toppings/mozzarella in 1.0.1
  6. toppings/mozzarella in 2.0.0
  7. toppings/mozzarella in 3.0.0
  8. toppings/mozzarella in 3.0.1
  9. toppings/mozzarella in 3.0.2
  10. toppings/mozzarella in 3.0.3

So in this example, even though we had a PoolBuilder now, still all 10 Package instances would end up getting their Rule instances built for in the Solver.

The most obvious optimization would be to not even load any version of toppings/mozzarella that does not satisfy the constraint 3.* into the Pool as no other package requires it. That would allow us to remove packages 4) to 6).

This was my first significant pull request for Composer and it is probably the second most important performance improvement in regard to the dependency resolution process in Composer 2.0, coming right after the MultiConflictRule.

For this pull request, however, it was necessary to be able to determine whether a certain version constraint was already contained in another one. The reason for this, can be best illustrated with yet another example of the Pool building process:

  1. In our root composer.json we require pizza/margherita in 1.*
  2. The PoolBuilder asks the repositories for all packages matching pizza/margherita in 1.*
  3. Loading of dependencies of dependencies (which we call “transitive dependencies”) continues
  4. At some point, we encounter a requirement on pizza/margherita but this time it's ^1.1. At this point, the PoolBuilder needs to be able to tell, if ^1.1 is a subset of 1.* (which it is, yes) in order to not load any more data and continue.
  5. At some point, we encounter another requirement on pizza/margherita but this time it's ^2.4.6. At this point again, the PoolBuilder needs to be able to tell, if ^2.4.6 is a subset of 1.* (which it is not this time, no). So it has to extend the constraint of "already loaded packages" to 1.* || ^2.4.6 and load the remaining packages from the repositories.
  6. etc.

These version constraint subset logic basics were worked out by Jordi in another pull request for composer/semver. A very tedious task without which this optimization could've never been implemented.

Coming back to our initial example, we now have a really big optimization already! We are now left with 7 Package instances for which we'll generate Rule instances for the Solver:

  1. pizza/margherita in 1.0.0, requires toppings/mozzarella in 3.*
  2. pizza/margherita in 1.0.1, requires toppings/mozzarella in 3.*
  3. pizza/margherita in 1.0.2, requires toppings/mozzarella in 3.*
  4. [removed] toppings/mozzarella in 1.0.0
  5. [removed] toppings/mozzarella in 1.0.1
  6. [removed] toppings/mozzarella in 2.0.0
  7. toppings/mozzarella in 3.0.0
  8. toppings/mozzarella in 3.0.1
  9. toppings/mozzarella in 3.0.2
  10. toppings/mozzarella in 3.0.3

A first step was done! Compared to Composer 1.x, Composer 2.0 now only loaded the package versions that were referenced anywhere in the dependency tree. Before, Composer simply loaded all versions of all referenced package names.

Combined with considerably less conflict rules thanks to MultiConflictRule, Composer 2.0 needed much less time and memory than Composer 1.x. Strike! I was happy!

But as always when you deal with a problem intensively, you immediately see areas which could benefit from further optimizations. Even if it means, it will be complex and will take weeks to implement…

The PoolOptimizer changes the game for Composer 2.2

Let’s look at our example again. In a perfect world we would actually like to only keep the following 2 packages in our Pool:

  1. pizza/margherita in 1.0.0, requires toppings/mozzarella in 3.*
  2. pizza/margherita in 1.0.1, requires toppings/mozzarella in 3.*
  3. [keep] pizza/margherita in 1.0.2, requires toppings/mozzarella in 3.*
  4. toppings/mozzarella in 1.0.0
  5. toppings/mozzarella in 1.0.1
  6. toppings/mozzarella in 2.0.0
  7. toppings/mozzarella in 3.0.0
  8. toppings/mozzarella in 3.0.1
  9. toppings/mozzarella in 3.0.2
  10. [keep] toppings/mozzarella in 3.0.3

Or in the case of --prefer-lowest, we'd also like to end up with only 2 packages, but it should look like this:

  1. [keep] pizza/margherita in 1.0.0, requires toppings/mozzarella in 3.*
  2. pizza/margherita in 1.0.1, requires toppings/mozzarella in 3.*
  3. pizza/margherita in 1.0.2, requires toppings/mozzarella in 3.*
  4. toppings/mozzarella in 1.0.0
  5. toppings/mozzarella in 1.0.1
  6. toppings/mozzarella in 2.0.0
  7. [keep] toppings/mozzarella in 3.0.0
  8. toppings/mozzarella in 3.0.1
  9. toppings/mozzarella in 3.0.2
  10. toppings/mozzarella in 3.0.3

I have found a solution for this problem in my second significant pull request. The logic is based on the fact that in the PHP world we are lucky that many libraries are very well maintained and their respective maintainers frequently release new bugfix versions. In an extremely large number of cases only bugs in the source code are fixed and no changes are made to the composer.json. In other words, version 1.0.0, 1.0.1, 1.0.2 etc. all specify identical dependencies. These versions are thus the ones that are extremely well suited to be optimized depending on the --prefer-lowest setting as we can already pick 1.0.0 or 1.0.2. If they specify the exact same dependencies, we don't need all of them as only one is relevant for the dependency resolving process.

Determining whether a package has the identical dependencies is relatively trivial (pseudocode, it is a bit more complicated than that, but you get the point):

PS: Do not get confused by the term “hash” here. It’s just a string yes, but we’re using it as an array key and so technically speaking, PHP will turn it into a hash and take care of collisions.

So at first glance it seems like a relatively easy task. Why is the pull request so large and complex then?

What I have left out completely in the previous examples for reasons of comprehensibility: In reality toppings/mozzarella would likely also contain require and conflict definitions. And their dependencies would have dependencies again etc. (remember "transitive dependencies"?). In many cases they would also specify circular dependencies. So a package further down the dependency tree which then in turn requires some package, which we have already looked at. This is kind of hard to describe in words so let's look at yet another example:

  1. pizza/gamberetti in 1.0.0, requires pizza/quattro-stagioni in 2.*
  2. pizza/quattro-stagioni in 2.0.0, requires pizza/salami in 3.*
  3. pizza/quattro-stagioni in 2.1.0, requires pizza/salami in 3.*
  4. pizza/salami in 3.0.0, requires pizza/quattro-stagioni in <2.1.0

This package combination is actually solvable: 1), 2) and 4)

But if we wanted to optimize pizza/quattro-stagioni already and delete 2.0.0 (2)) from the Pool (because there is a newer 2.1.0 with the same dependencies), we could not fulfill the requirements of pizza/salami as the pizza/quattro-stagioni in <2.1.0 was removed. This would thus lead to an "unresolvable set of dependencies".

So blindly removing packages with the identical “dependency hash” from the Pool doesn't work. We have to make sure that we only remove those for which there is no specific require or conflict statement in any other package.

The new PoolOptimizer takes advantage of the fact that the Pool already contains all packages and knows there's nothing added to it anymore. It is therefore able to iterate over all packages and analyze the require and conflict statements of all packages in order to fix this issue.

This allows us to create “dependency groups”. What? Why? Yeah, I know. It’s example time again.

The previous example was well suited to show the issue of circular dependencies. But to explain dependency groups as straight forward as possible we imagine the following situation:

  1. pizza/margherita in 1.0.0, requires toppings/mozzarella in 3.*
  2. pizza/gamberetti in 1.0.0, requires toppings/mozzarella in ^4.1
  3. pizza/quattro-stagioni in 1.0.0, requires toppings/mozzarella in ^4.2

This would result in the following groups for require:

  • toppings/mozzarella -> 3.*
  • toppings/mozzarella -> ^4.1
  • toppings/mozzarella -> ^4.2

Of course, the same has to be done for conflict and we also have to consider aliases and replace definitions. But we'll leave that out of this blog post because it's not relevant to understand the logic. It's merely a reminder why things in Composer are never easy.

Cool, now we know that we can filter by “dependency hash” within these three groups! We loop over all the packages and assign every package to all the groups it matches. That way, every package is assigned to at least one group. If a group consists of only one single package, this means we cannot remove it (for example if a dependency required an exact version). For simplicity let’s look at only one package and assume that all the versions had the identical require, conflict, replace and provide definitions and therefore the identical "dependency hash":

  • 3.0.0
  • 3.0.1
  • 3.0.2
  • 4.1.0
  • 4.2.0
  • 4.3.0
  • 4.3.1

What has to remain in the Pool? Right! We can't only keep the latest 4.3.1, because we have to take the groups into account! So what must remain in our Pool is one match per group. One for 3.*, one for ^4.1 and one for ^4.2:

  • 3.0.2
  • 4.3.1 matches both (^4.1 as well as ^4.2 )

As you can see, 4.3.1 matches 2 dependency groups so in this case we would only end up having a total of 2 packages that remain in the pool. Great! But this depends on the policy! In the case of --prefer-lowest we now need these three:

  • 3.0.0
  • 4.1.0
  • 4.2.0

So while not the simplest solution, at least there is one! We can now reliably optimize the Pool because we now only remove packages that are really irrelevant!

In practice, this means that in most cases 80% (!) and more packages can already be removed from the Pool before the Solver even has to create rules for those! This provides an exponential reduction of the number of rules and thus reduces the memory consumption massively. Of course, fewer rules also means faster resolution, and so in most cases the additional resources that the PoolOptimizer requires should easily outperform what the Solver would require if no optimization was applied at all.

You can easily check the effect the PoolOptimizer has in your project by executing the following command:

composer update --dry-run --profile -vv

The output should then look something like this:

[67.0MiB/7.60s] Found 13,004 package versions referenced in your dependency graph. 11,130 (86%) were optimized away.
[35.7MiB/7.61s] Updating dependencies
[70.1MiB/7.92s] Dependency resolution completed in 0.031 seconds
[39.7MiB/7.94s] Analyzed 1874 packages to resolve dependencies
[39.7MiB/7.94s] Analyzed 51253 rules to resolve dependencies

To compare what would happen if the PoolOptimizer was disabled, let's just disable it using the environment variable COMPOSER_POOL_OPTIMIZER:

COMPOSER_POOL_OPTIMIZER=0 composer update --dry-run --profile -vv

The output then no longer contains information about the deleted package versions and should look something like this:

[60.8MiB/2.78s] Updating dependencies
[1182.2MiB/9.28s] Dependency resolution completed in 0.720 seconds
[132.9MiB/9.79s] Analyzed 13004 packages to resolve dependencies
[132.9MiB/9.79s] Analyzed 2096660 rules to resolve dependencies

As you can see, in my example, the PoolOptimizer removed 11,130 (86%) package versions from the Pool effectively reducing the number of Rule instances that needed to be built from 2,096,660 to 51,253! That's building and solving only about 2.44% of the original number of rules!

Also compare the resource profiling numbers in the brackets at the beginning of the output. The results speak for themselves.

My pull request was finally merged into Composer 2.2 and released on December 22nd, 2021. A Christmas present for hopefully everybody but especially for me. Finally, everything I wanted to contribute to the dependency resolving process had found its way into Composer!

Can we get even faster?

One question that you might have now, after reading this blog post: Given all these optimizations, is Composer now as fast and as efficient as it can possibly get?

Well, let me remind you that what you were reading about in this blog post was all related to the dependency resolution process only. While being one of the core tasks of Composer, there are many other tasks Composer needs to perform such as downloading and installing the dependencies. But that’s where Jordi and Nils already worked hard to improve the developer experience for version 2.0 like e.g. introducing parallel downloads or parallel package installation.

There are certainly further optimization possibilities (there’s always one thing you can find) but I honestly don’t think they will have any major impact anymore. In my opinion, I would say yes, folks: It is pretty much as fast as it can possibly get! We did it!

Closing words

I cannot express how happy I am to see all those years of trying and failing finally paid off. I know that my work will impact the lives of thousands of developers and the mere thought of it makes me proud. Not only will it save developers a lot of time, but it will also save loads of resources. Given how big the PHP community is and how many composer update runs are executed every single day, I'm pretty confident it has an effect on our planet. Not enough to really make a difference of course, but we all know it needs every little help it can get. If you fancy some numbers, I recommend you read the related blog post on contao.org.

As always with Open Source Software, this is of course not just my work, so it’s time to say thank you.

First of all, contributing complex code like this needs reviews by smart people. Code reviews but also conceptual reviews. I would like to take the opportunity to thank my dear friend Martin Auswöger and Christophe Coevoet in particular, who have always shown interest in progress, provided ideas and constructive feedback along the way.

The biggest shout out, however, goes out to Nils and Jordi, of course. You should have noticed while reading this blog post that my work almost always required some kind of foundation work, without which I would not have been able to work on my pull requests. The two of them also wrote countless unit tests for special cases that I didn’t consider. Without their knowledge and constructive feedback I don’t think any of my contributions would have been possible. As if that wasn’t enough, they also agreed to proofread this very blog post. The PHP community can truly consider themselves fortunate to have them in their ranks! Many thanks Nils and Jordi! For creating Composer, maintaining it with a lot of dedication and love and for putting up with my annoying questions over the past years! Composer has always been one of the best — if not the best — package manager(s) on the planet and this is largely thanks to you!

If you enjoy the latest performance improvements or want to show your appreciation for the many hours that went into writing this blog post, consider sponsoring me on GitHub, leave me a message with your performance comparison results on Twitter and/or become a Private Packagist customer in order to support the Composer ecosystem.

Thank you!

--

--

Yanick Witschi

Open-minded Swiss dev, creating stuff mainly with PHP. Especially interested in performance and APIs. When not at work, I like sports and Scotch!