How many third-party Java dependencies do you really need?

Benoit Baudry
Feb 13, 2020


Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application’s code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challenges related to dependency management. In this article, we focus on one such challenge: the emergence of bloated dependencies.

Your project depends on a large number of third-party libraries: all the libraries that you declare as dependencies in your project's build file, as well as all the libraries that these dependencies themselves need in order to build. We consider one of these libraries to be a bloated dependency if the build tool can ignore it and still successfully build and run your project.

DepClean generates an alternative build file that gets rid of bloated dependencies in Maven projects

We have built a tool, DepClean, that automatically identifies bloated dependencies for any project that compiles to Java bytecode. DepClean also generates an alternative build file that gets rid of bloated dependencies. DepClean is built on top of the Maven dependency analyzer (maven-dependency-analyzer) and statically analyzes the bytecode of the project, as well as that of all its direct and transitive dependencies. It is currently integrated into the Maven build pipeline.
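To make the idea concrete, here is a minimal sketch of how a dependency could be classified as used or bloated once class-level references are known. It is an illustration under simplifying assumptions, not DepClean's actual implementation: the class `BloatSketch`, the method `findBloated`, and all artifact and class names are hypothetical, and the reference map is assumed to have been extracted from bytecode beforehand.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class BloatSketch {

    /**
     * Returns the dependencies for which no class is reachable from the project's own classes.
     *
     * @param projectClasses    classes compiled from the project itself
     * @param classReferences   class name -> classes it references (assumed extracted from bytecode)
     * @param dependencyClasses dependency coordinates -> classes packaged in that dependency
     */
    static Set<String> findBloated(Set<String> projectClasses,
                                   Map<String, Set<String>> classReferences,
                                   Map<String, Set<String>> dependencyClasses) {
        // Follow class-to-class references, starting from the project's classes.
        Set<String> reachable = new HashSet<>(projectClasses);
        Deque<String> work = new ArrayDeque<>(projectClasses);
        while (!work.isEmpty()) {
            String cls = work.pop();
            for (String ref : classReferences.getOrDefault(cls, Set.of())) {
                if (reachable.add(ref)) {
                    work.push(ref);
                }
            }
        }
        // A dependency is reported as bloated when none of its classes was reached.
        Set<String> bloated = new TreeSet<>();
        for (Map.Entry<String, Set<String>> dep : dependencyClasses.entrySet()) {
            if (Collections.disjoint(dep.getValue(), reachable)) {
                bloated.add(dep.getKey());
            }
        }
        return bloated;
    }

    public static void main(String[] args) {
        // Hypothetical project: app.Main uses a JSON class, which in turn uses a charset class.
        Set<String> project = Set.of("app.Main");
        Map<String, Set<String>> refs = Map.of(
                "app.Main", Set.of("lib.a.Json"),
                "lib.a.Json", Set.of("lib.b.Charsets"));
        Map<String, Set<String>> deps = Map.of(
                "com.example:json", Set.of("lib.a.Json"),          // direct, used
                "com.example:charsets", Set.of("lib.b.Charsets"),  // transitive, used
                "com.example:unused", Set.of("lib.c.Unused"));     // never referenced
        System.out.println(findBloated(project, refs, deps)); // prints [com.example:unused]
    }
}
```

A purely static check like this one cannot see classes loaded through reflection or other dynamic mechanisms, so any dependency it reports is a candidate for removal that still needs to be confirmed by building and testing the project.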

We have used DepClean to analyze the presence of bloated dependencies in 9,639 Java artifacts hosted on Maven Central. The vast majority of these artifacts have between 1 and 23 direct dependencies and between 0 and 59 transitive dependencies. Some extreme cases include hundreds of dependencies. In total, we analyzed 723,444 dependency relationships. The figure below represents these dependencies: red edges are bloated, green edges are used. The nodes represent Maven artifacts, and the size of each node is proportional to its number of usages.

The state of bloated dependencies for 9,639 Maven artifacts. This graph includes 19,139 nodes in total. Each node is a Maven artifact, and 9,639 of them are the analyzed artifacts. Red edges represent a bloated dependency, while green edges represent a necessary dependency. The size of each node is proportional to its number of usages.

75% of the analyzed dependency relationships are bloated

Our key results are:

  • 75% of the analyzed dependency relationships are bloated
  • 36% of the artifacts have at least one bloated direct dependency
  • 86% of the artifacts have at least one bloated transitive dependency
  • Multi-module Maven projects tend to accumulate bloat in parent build files, which makes cleaning more challenging

Now, the question is: do these bloated dependencies matter to software developers? To get a qualitative sense of the importance of bloated dependencies, we selected notable open source projects that we could build locally, that have more than 100 stars on GitHub, and that contain some bloated dependencies. For each of them, we submitted a pull request that modifies the build file to remove some dependencies. We submitted 15 pull requests to remove direct dependencies; 14 of them were accepted and merged, removing a total of 68 dependencies. We submitted 8 pull requests to exclude transitive dependencies, but only 4 were accepted. The lesson learned here is that it is more difficult to understand the impact of transitive dependencies, and to maintain a list of excluded dependencies.
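For readers unfamiliar with how a transitive dependency is ignored in Maven, the fragment below shows the standard `exclusions` mechanism in a `pom.xml`. It is only an illustration: the coordinates are hypothetical, and it is not copied from any of the pull requests mentioned above.

```xml
<dependency>
  <groupId>com.example</groupId>
  <artifactId>some-library</artifactId>
  <version>1.0.0</version>
  <exclusions>
    <!-- Hypothetical transitive dependency that the analysis reported as bloated -->
    <exclusion>
      <groupId>org.example</groupId>
      <artifactId>unused-transitive</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Each exclusion is attached to the direct dependency that brings in the transitive one, so the list has to be revisited whenever that direct dependency is upgraded. This is one plausible reason why maintainers are more reluctant to accept such changes.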

19/23 open source projects accepted the debloated build file generated by DepClean

With this study, we have revealed significant opportunities to reduce the number of dependencies in software projects. Doing so can reduce the effort needed to maintain dependencies, the size of the binary files, and the security and licensing challenges implied by large dependency graphs. We also observed that the decision to remove bloated dependencies can be challenging, especially for transitive dependencies.

DepClean is open source and more details about this study can be found here.

References

A Comprehensive Study of Bloated Dependencies in the Maven Ecosystem.

Surviving software dependencies.

Collecting and leveraging a benchmark of build system clones to aid in quality assessments.
