Solid Engineering Tips on Maintaining Code Dependency

Emerson C Simbolon
Traveloka Engineering Blog
Aug 19, 2019

Traveloka aims to empower discovery for many people. Behind its simple interface and straightforward user experience lie countless lines of code written by our engineers, along with the open source libraries that we use as dependencies. Those dependencies run in our production system and need to be managed. If they are not, they leave us with many pain points that must eventually be repaid. Before going through the costs of code dependency, let us look at why engineers reach for dependencies so readily.

Credit: monkeyuser.com

Code reusability has been one of the most advocated software engineering practices for decades, captured in the well-known maxim “don’t reinvent the wheel”. It encourages developers to reuse existing code, so that we don’t waste time recreating the same implementation and can focus on writing more impactful code instead. Reuse also validates the code being reused: every additional user of a class or function is one more reason why that class is important, so its authors can “boast” about the decision and effort that went into creating something so reusable and widely depended upon.

In recent years of software development, almost anything has the potential to be reused. Design patterns, frameworks, libraries, and even snippets from StackOverflow are reused in personal and commercial projects. In an ideal world, we might be able to reuse everything, focus only on inventing new things, and have a moon base already. In reality, everything has a limit and a price. Engineers must be aware that every piece of code we reuse introduces a dependency, which may have consequences in the long run.

Analysis of Dependency’s Impact

Most of the time, engineers reuse code, libraries, and software with inadequate regard for the consequences. If the malpractice continues, it can drag down the velocity of software development. Here are the 5 dependency costs that engineers need to be aware of in order to avoid the pitfall:

Run Time Cost

A dependency will usually get the job done. However, we don’t know how efficiently it does the job until we benchmark it, which is how we confirm that it doesn’t run in exponential time or hog the available memory.

Note that unnecessary logic not only wastes CPU time and increases the bill for the instances that run the software, but can also compromise system performance and increase service latency.
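One way to act on this is to measure a candidate dependency before adopting it. The sketch below times a call and records its peak memory using only the Python standard library. The `candidate` function is a hypothetical stand-in for the dependency call under evaluation, and the input sizes are arbitrary.

```python
import time
import tracemalloc


def measure(func, *args):
    """Run func(*args) once and return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()  # returns (current, peak)
    tracemalloc.stop()
    return result, elapsed, peak


def candidate(n):
    # Hypothetical stand-in for the dependency call being evaluated.
    return sorted(range(n, 0, -1))


# Growing the input shows how time and memory scale, not just one data point.
# Memory tracing slows execution, so treat the timings as relative figures.
for n in (1_000, 10_000, 100_000):
    _, seconds, peak = measure(candidate, n)
    print(f"n={n:>7}  time={seconds:.4f}s  peak={peak / 1024:.0f} KiB")
```

If time or memory grows much faster than the input does, that is a signal to look closer before the dependency reaches production.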

Build Time Cost

Before a codebase can be built, all of its dependencies must be downloaded and installed, which takes time. The codebase is then compiled sequentially or in parallel, depending on the build environment. The more external libraries we add as code dependencies, the longer both activities take to complete.

Imagine that one engineer runs an average of 10 builds a day, that each build takes an average of 10 minutes longer because of unoptimized dependencies, and that there are 200 engineers. The organization then wastes a cumulative 20,000 minutes, or about 333 hours, every day just waiting for code to build.
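The estimate in this scenario is a plain product of three figures, so it is easy to recompute with another organization's numbers. A minimal Python sketch follows; the function and parameter names are illustrative only.

```python
def daily_build_wait_minutes(engineers, builds_per_day, extra_minutes_per_build):
    """Total minutes per day the organization spends on avoidable build time."""
    return engineers * builds_per_day * extra_minutes_per_build


# Figures from the scenario above: 200 engineers, 10 builds each, 10 extra minutes per build.
wasted_minutes = daily_build_wait_minutes(200, 10, 10)
print(f"{wasted_minutes} minutes per day, about {wasted_minutes / 60:.0f} hours")
```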

Disk Usage and Network Transfer Cost

More dependencies not only increase the binary size, but also cost us in these 3 areas:

  • Storing old production binaries will cost more disk space.
  • Deploying binaries will take longer due to their transfer size.
  • Fetching external dependencies during the build will also take longer to transfer.

Suppose one service has a 200 MB binary and is released 4 times per month on average, and there are 1000 services. Storing those binaries alone then takes about 800 GB of disk space per month, or roughly 9.6 TB per year. Yes, we can delete old binaries, but that effort would never be needed if we had optimized the services’ binary size in the first place.
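The storage figure can be recomputed the same way. The sketch below assumes decimal units (1 GB = 1000 MB) and that every released binary is kept; the function and parameter names are illustrative only.

```python
def binary_storage_gb(binary_mb, releases_per_month, services, months=12):
    """Disk space in GB needed to keep every released binary for the given period."""
    total_mb = binary_mb * releases_per_month * months * services
    return total_mb / 1000  # decimal units: 1 GB = 1000 MB


# Figures from the scenario above: 200 MB binaries, 4 releases a month, 1000 services.
per_month = binary_storage_gb(200, 4, 1000, months=1)
per_year = binary_storage_gb(200, 4, 1000)
print(f"{per_month:.0f} GB per month, {per_year / 1000:.1f} TB per year")
```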

Maintenance Cost

Even though many of our library dependencies are not written by us, we still need to keep each library at a compatible version so that it doesn’t break our own binary at build time. Most of the time, a library’s version is left untouched for fear of breaking compatibility. This version gap eventually becomes the grave of software maintenance itself. New features stop being added because the old library cannot support them, the software becomes hard to maintain, and eventually the cheapest option is to decommission it.

Constant library updates and refactoring help the project flourish. Take care, however, to stay on dependency versions that are still actively maintained.

Transitive Dependency

This illustration captures the exact condition of transitive dependency, which happens not only in NPM modules but also in our day-to-day software development in any programming language.

Credit: monkeyuser.com

Using even one dependency exposes us to transitive dependencies, meaning that all of the dependency costs above are propagated. We may only need to depend on 1 library. However, that single library may in fact depend on 3 other libraries, which increases the run time cost, build time cost, maintenance cost, and disk usage cost even more.
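To make the propagation concrete, the sketch below walks a dependency graph and collects everything that one direct dependency pulls in. The package names and the graph are invented for illustration; in practice the graph would come from the build tool's own dependency report.

```python
from collections import deque

# Hypothetical dependency graph: package -> its direct dependencies.
DEPENDENCIES = {
    "our-service": ["pdf-export"],
    "pdf-export": ["html-parser", "font-kit", "image-codec"],
    "html-parser": ["string-utils"],
    "font-kit": [],
    "image-codec": ["string-utils"],
    "string-utils": [],
}


def transitive_dependencies(package, graph):
    """Return every package reachable from `package`, direct or indirect."""
    seen = set()
    queue = deque(graph.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # shared dependencies are counted once
        seen.add(dep)
        queue.extend(graph.get(dep, []))
    return seen


direct = DEPENDENCIES["our-service"]
everything = transitive_dependencies("our-service", DEPENDENCIES)
print(f"declared: {len(direct)}, actually pulled in: {len(everything)}")
```

In this made-up graph, declaring a single library brings five packages into the build, each carrying its own run time, build time, disk, and maintenance cost.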

The impact analysis above is not meant to discourage us from using dependencies. Instead, it is meant to make us consider whether we should use someone else’s code, and to justify the cost that it adds to the codebase. A good practice is to introduce a policy under which any dependency added to a project codebase must be justified during code review. The author should explain why the dependency matters; otherwise, the code reviewer can challenge the author on why a new code dependency must be introduced for just that ‘specific’ scenario.

Tips to Optimize Code Dependency Awareness

The tips below are far from exhaustive. We hope that they will generate some discussion, which can spawn other tips that benefit software development in our organization.

As Library Owner

Keep the Library’s Purpose Specific and Its Size Minimal

Don’t publish a library that does many things. Users will rarely use all of its functionality, so the rest is a waste of resources. A library that contains many functionalities also tends to pull in other dependencies, which makes it even bigger. Be specific about what the library offers. For example, it could be as simple as “distance computation” or “HTML to PDF generator”. Having a specific purpose helps us remove excessive dependencies that leech the users' disk space and computation power.

Do Not Use External Objects as Part of the Library Contract

When we publish a library contract, we must only expose objects or classes that reside in the library itself (i.e. part of the code we manage). It is important to defend our library’s behavior from external factors. For example, if our contract exposes objects from an external library, those objects can change in the future without our knowledge. In addition, external objects in the contract force the unnecessary dependency costs described in the previous section onto our users.

Proactively Inform Users of Library Updates

As library owners, it is important to regard our users as our own customers. If a product is changed without the customers’ knowledge, they will be frustrated and will stop using it. It is the same with library users. Inform them of any change: deprecation notices, deletion notices, implementation changes that will cost them CPU or memory, and announcements of new library features. All of that information helps users decide when to move their codebase to the new version and adjust their code properly.
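Besides release notes, a deprecation can be announced from the code itself. The sketch below uses Python's standard `warnings` module; the function names `distance` and `distance_km` and the version number in the message are invented for illustration.

```python
import math
import warnings


def distance_km(a, b):
    """Current API: straight-line distance between two (x, y) points."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def distance(a, b):
    """Old API, kept temporarily so existing callers keep working."""
    warnings.warn(
        "distance() is deprecated and will be removed in 3.0; use distance_km()",
        DeprecationWarning,
        stacklevel=2,  # report the caller's line, not this one
    )
    return distance_km(a, b)
```

Callers keep getting correct results from `distance` while their test runs and logs surface the notice, which gives them time to migrate before the old function is deleted.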

As Library Users

Copy Instead of Depending on the Original Instance

If a specific piece of code is useful but is not published as a maintainable library, think about copying it instead. The module that the code belongs to might already carry a lot of dependencies that we don’t want, so there is no need to declare a proper dependency on it. A copy may introduce code duplication, but we can justify that during the code review process. Also, don’t forget to put attribution on the copied code, and check whether the original source code’s license allows duplication.

Here is what Wikipedia says about this copying practice:

Code reuse results in dependency on the component being reused. Rob Pike opined that “A little copying is better than a little dependency”. When he joined Google, the company was putting heavy emphasis on code reuse. He believes that Google’s codebase still suffers from results of that former policy in terms of compilation speed and maintainability.[10]

Exercise Objective Reasoning on Choosing Dependency

We may find that there are many libraries that satisfy business needs. In order to make the right dependency choice, we need to be objective and focus on what matters:

  1. Correctness
  2. Fast execution
  3. Memory usage
  4. Lower size of compiled binary

We can benchmark a library against our own implementation, which may well beat the external library on the criteria described above.

We can also use dependency analysis to learn the impact of the transitive dependencies that will be introduced when we add just one library to our codebase.

Shift Focus on Why We Use Code Dependency as the Company and Products Mature
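A minimal sketch of such a comparison follows, using Python's standard `timeit` module. Here `statistics.mean` merely plays the role of the external library and `own_mean` the in-house alternative; both choices are illustrative. Correctness is checked first, because a fast wrong answer is worthless.

```python
import statistics
import timeit


def own_mean(values):
    """In-house alternative to the library call being evaluated."""
    return sum(values) / len(values)


data = [float(i) for i in range(1_000)]

# 1. Correctness: both implementations must agree before speed matters.
assert abs(own_mean(data) - statistics.mean(data)) < 1e-9

# 2. Execution time: run each implementation the same number of times.
library_s = timeit.timeit(lambda: statistics.mean(data), number=200)
own_s = timeit.timeit(lambda: own_mean(data), number=200)

print(f"library: {library_s:.4f}s  own: {own_s:.4f}s")
```

The numbers vary by machine and Python version, so the comparison should be run in an environment close to production. A real evaluation would also cover memory usage and the effect on binary size.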

The reasons for having other people’s code in our codebase differ from phase to phase.

Phase 1: We need to get the product launched as fast as possible

In this phase, dependency awareness might not be strictly enforced since there are deadlines to meet.

Phase 2: Product is launched to the market

In this phase, the company starts building non-MVP features to help its product lead the market, and more features are planned for release. During development, code reuse is still favored in order to launch in production as fast as possible. However, stricter guidelines apply, encouraging engineers to reason about the cost and benefit of each decision.

Phase 3: Cost and performance optimization

In this phase, a dependency is used to lower the maintenance cost. For example, a library is chosen because of its open-source community support. For ordinary feature development, an internal implementation may be favored since its cost is more visible and controlled. So, not reusing an external dependency and reinventing the wheel may be preferred, provided that we reinvent a better wheel that we can put in production.

Most of Traveloka’s products are still in Phase 2, since we are still launching more features for our users. Our development uses lots of external and internal module dependencies. To keep us sane in maintaining the codebase and the systems that run the software, we employ many quality checks during code review, especially for any dependency change.

We are aware that this article has not fully addressed all the important things about code dependency. If you have more tips on dependency awareness, please share them in the comment section below.
