

Census II: Another tool for securing the software supply chain

In just the past few months, SBOM (Software Bill of Materials) has become one of the hottest acronyms in the world of software security, which is a very good thing.

An SBOM is an inventory of the software components an organization is using. Without it, as has been preached at security conferences for well over a decade, you don’t know what you’re using. If you don’t know what you have, you can’t protect it by keeping it up-to-date. Nor can you trust it to protect you, since if vulnerabilities are discovered in those unknown components, you won’t know that you need to apply patches or updates to fix them. You could have multiple time bombs in your software supply chain.

So President Biden’s call, in his May 2021 executive order on “Improving the Nation’s Cybersecurity,” for federal agencies eventually to be banned from using or purchasing software products that don’t include an SBOM is laudable.

That is because every company, tech or not, depends on software. It’s not just that, as venture capitalist, Mosaic creator and Netscape cofounder Marc Andreessen wrote in 2011, “software is eating the world.” It’s that software is powering and running the world.

Still, as any security expert will tell you, an SBOM isn’t a silver bullet. Software components aren’t all equally important, nor is the risk from all vulnerabilities the same.

Nor are they all easy to find. Software that an organization creates itself or buys from a commercial vendor is much easier to track than open source software, which is created by volunteer communities. And open source now makes up the large majority of just about every modern codebase for understandable reasons — it’s free and can be used any way users want, as long as they comply with licensing restrictions.

The 2021 “Open Source Security and Risk Analysis” (OSSRA) report by Synopsys found that of 1,546 codebases analyzed, 98% contained open source, and the average amount was 75%.

So while an SBOM is a very good start, it doesn’t necessarily provide the context that helps an organization manage risk effectively.

A selective list

Hence the need for Census II of Free and Open Source Software (FOSS) — Application Libraries, published last week by the nonprofit Linux Foundation and Harvard’s Lab for Innovation Science. It identifies the top 500 of the most widely deployed application libraries. (Synopsys participated in creating the report.)

As the authors put it, “It is difficult to fully understand the health, economic value, and security of FOSS because it is produced in a decentralized and distributed manner.”

Actually there are eight Top 500 lists, because of how different software components are packaged and how different versions of them are catalogued and identified.

Mike McGuire, security solutions manager with the Synopsys Software Integrity Group, says packages and versions are a bit like the different model, year, and trim of a car. “If I told you I drive a Toyota Camry, you still don’t know exactly what I drive. Is it the 1999 version or the 2022 version? It’s important to know this when ordering parts, getting service, tracking recalls, etc.”

So Census II includes 4,000 of those application libraries. Which might sound like an extensive list. But it doesn’t even amount to a rounding error out of the tens of millions of FOSS projects that are built into the software that individuals and organizations use every day.

In other words, it is a highly selective list — perhaps a bit like a government, during a pandemic, creating a list within the general population of those who are at especially high risk. It’s helpful to know which thousands of people among hundreds of millions need added protection.

Popularity isn’t everything

The Census II goal, according to the authors, is to “inform actions to sustain the long-term security and health of FOSS.” But they acknowledge that popularity is just one metric: a popular package is not necessarily critically important or inherently riskier.

The census represents “our best estimate of which FOSS packages are the most widely used by different applications … It does not try to measure the risk profiles of that software. There are many indicators that could be used to suggest risk and different organizations may weight factors differently,” the authors wrote.

McGuire agrees with the caveat, noting that TV is enormously popular, but it’s not critical for the health and safety of those who watch it. “The same goes for software,” he said. “Tons of apps can be using a specific Java GUI framework, making it very popular, but it may not serve as a critical part of the software should something happen to it.”

He added that what is considered critical “is going to be unique to each organization based on how their apps are built.”

Still, measuring risk profiles is “easier to do once the most widely used software is identified,” the Census II authors wrote.

Coming up with those lists was not easy. Just one example is that not all FOSS packages had what the team called a “unique identifier,” so they had to create them.

It is also a labyrinthine task to track all the so-called “dependencies” in software components, which can stretch multiple levels deep and create an exponential increase in the number of components in a single application.

Not all dependencies are the same, either. As the authors put it, “A direct dependency exists when a piece of code … includes a specific call to that package or component. However, each of these direct dependencies may in turn rely on other packages or components, known as indirect dependencies.”
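To make the distinction concrete, here is a minimal Python sketch that separates direct from indirect dependencies by walking a dependency graph. The graph, the package names and the function `classify_dependencies` are all hypothetical, invented for illustration; this is not how the Census II team did its analysis.

```python
from collections import deque


def classify_dependencies(graph, root):
    """Split everything reachable from `root` into direct and indirect dependencies.

    `graph` maps a package name to the list of packages it calls directly.
    A package that is both called directly and pulled in by another
    dependency is counted as direct.
    """
    direct = set(graph.get(root, []))
    seen = set(direct)
    queue = deque(direct)
    while queue:
        pkg = queue.popleft()
        for dep in graph.get(pkg, []):
            if dep not in seen and dep != root:
                seen.add(dep)
                queue.append(dep)
    indirect = seen - direct
    return direct, indirect


# Hypothetical application with two direct dependencies.
graph = {
    "my-app": ["web-framework", "json-parser"],
    "web-framework": ["http-core", "json-parser"],
    "http-core": ["tls-lib"],
    "json-parser": [],
    "tls-lib": [],
}

direct, indirect = classify_dependencies(graph, "my-app")
print(sorted(direct))
print(sorted(indirect))
```

Even in this toy graph, the application's author only ever wrote calls to two packages, yet two more arrive through them. In a real application the indirect set is usually much larger than the direct one.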

There are other limitations to the data of Census II that, collectively, led the authors to this disclaimer: “The findings of this report are indicative but cannot — and do not purport to — be a definitive claim of which FOSS packages are the most critical.”

The authors also encountered a number of hurdles that they believe could serve as a guide to improving the way software is identified, catalogued and maintained.

Among the “lessons learned” were the following.

The need for a standardized naming schema for software components

This lack, they wrote, “threatens to stymie efforts by industry and government to better protect themselves from software-based incidents.” They add that until one exists, “organizations will remain categorically unable to communicate with each other on the large scale — particularly, the global scale — necessary to share such information.”

Package version complexities

The team wrote that they encountered an unexpected problem — companies were “maintaining internal versions of a package and were not contributing their changes back to the official repository. In one instance, they observed version 2.87 of a package multiple times, but the official repository only went up to version 2.26.”

That means that if an SBOM “can’t distinguish between a ‘main’ version and a variant […] it will be difficult for the purchasers of such software to know if they are vulnerable to newly discovered vulnerabilities.”
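One rough way to spot such variants is to compare the versions listed in an SBOM against the versions the official repository has actually released. The sketch below assumes both lists are already available as plain Python data; the package name `example-lib`, the data layout and the function `flag_variants` are hypothetical, and real SBOM formats and package registries are more involved.

```python
def flag_variants(sbom_entries, official_releases):
    """Return (name, version) pairs whose version was never officially released.

    `sbom_entries` is a list of (package name, version string) pairs.
    `official_releases` maps a package name to the set of version strings
    published in its official repository. Packages with no known release
    list are skipped rather than flagged.
    """
    flagged = set()
    for name, version in sbom_entries:
        known = official_releases.get(name)
        if known is not None and version not in known:
            flagged.add((name, version))
    return sorted(flagged)


# Hypothetical data mirroring the example in the report: version 2.87
# shows up repeatedly, but the official repository stops at 2.26.
official_releases = {"example-lib": {"2.24", "2.25", "2.26"}}
sbom_entries = [
    ("example-lib", "2.26"),
    ("example-lib", "2.87"),
    ("example-lib", "2.87"),
]

print(flag_variants(sbom_entries, official_releases))
```

A flagged version is not proof of a problem. It only signals that the component may be an internal fork, so advisories written against the official version numbers may not map onto it cleanly.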

Spotty code maintenance

The OSSRA report labels it “the price of popularity.” It’s well-known in the open source community that some of the most popular FOSS has only a few developers, sometimes only one, to maintain it. The authors wrote that in one review they did, “94% of projects had fewer than 10 developers accounting for more than 90% of the lines of code added […] These findings are counter to the typically held belief that thousands or millions of developers are responsible for developing and maintaining FOSS projects.”

According to the OSSRA report, “As an open source project grows in popularity — with no corresponding growth in people maintaining the project — the consequence is often developer burnout, and many open source projects are abandoned.”

And if projects are abandoned, that means bugs don’t get fixed.
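The concentration measure the authors describe (how few developers account for more than 90% of the lines added) can be approximated with a few lines of Python. This is only a sketch under the assumption that lines added per developer have already been tallied, for instance from version control history; the function name and the sample numbers are made up.

```python
def developers_covering(lines_by_developer, threshold=0.9):
    """Smallest number of developers whose added lines exceed `threshold` of the total.

    `lines_by_developer` maps a developer name to the lines of code they added.
    Developers are counted from the largest contributor down.
    """
    total = sum(lines_by_developer.values())
    if total == 0:
        return 0
    covered = 0
    count = 0
    for lines in sorted(lines_by_developer.values(), reverse=True):
        covered += lines
        count += 1
        if covered > threshold * total:
            break
    return count


# Hypothetical project where one maintainer wrote almost everything.
solo_project = {"alice": 9200, "bob": 500, "carol": 300}
# Hypothetical project with a somewhat broader base.
team_project = {"a": 50, "b": 30, "c": 15, "d": 5}

print(developers_covering(solo_project))
print(developers_covering(team_project))
```

A result of one or two for a widely used package is the kind of signal that suggests a project depends heavily on very few people.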

Individual developer account security

Individual accounts generally aren’t as well-protected as organizational ones. That, the authors wrote, means “changes to code under the control of these individual developer accounts are significantly easier to make, and to make without detection. Further, a related issue could occur if the individual developer went on a long hiatus, or was hit by the proverbial bus, preventing updates to the code from occurring.”

That’s not the only risk. Others include solo developers removing or deleting their projects, breaking the hundreds to millions of packages that depend on them.

Persistence of legacy software

We’ve all heard of companies declaring that they are ending support for older versions of operating systems or applications. But that doesn’t mean everybody stops using those older versions. People keep them for any number of reasons, including familiarity, the sense that “it works fine,” and the time or money an upgrade would take.

“Many organizations will find it difficult to justify switching to different packages, since there are financial and time-related costs for switching to new software when there is no guarantee of an added benefit,” the authors wrote.

Indeed, the OSSRA report found that 85% of the codebases examined in 2020 had open source dependencies that were more than four years out-of-date, even though there were newer versions available — sometimes many newer versions.
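A simple check along these lines is sketched below in Python. It assumes that “out-of-date” means the version in use was released more than four years before the audit date while a newer release exists; the OSSRA report may define it differently. The package names, dates and the function `stale_dependencies` are hypothetical, and four years is approximated as 4 × 365 days.

```python
from datetime import date, timedelta


def stale_dependencies(deps, audit_date, max_age_years=4):
    """Names of dependencies whose in-use release is older than the cutoff
    even though a newer release is available.

    `deps` maps a package name to a pair of dates:
    (release date of the version in use, release date of the latest version).
    """
    cutoff = audit_date - timedelta(days=365 * max_age_years)
    return [
        name
        for name, (used_release, latest_release) in deps.items()
        if used_release < cutoff and latest_release > used_release
    ]


# Hypothetical inventory audited at the end of 2020.
deps = {
    "old-lib": (date(2015, 3, 1), date(2020, 6, 1)),       # old, newer exists
    "fresh-lib": (date(2020, 1, 15), date(2020, 1, 15)),   # already current
    "frozen-lib": (date(2014, 5, 1), date(2014, 5, 1)),    # old, nothing newer
}

print(stale_dependencies(deps, date(2020, 12, 31)))
```

Note that `frozen-lib` is not flagged here because no newer version exists. A project with no releases in years may be abandoned, which is a separate concern that this check does not capture.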

But that can be dangerous. One of the reasons for newer versions is to fix bugs in the older versions. And you can be sure that hackers are looking for those still using the older versions.

McGuire said while Census II doesn’t propose a feasible way to fix that problem, “it can raise awareness of just how many organizations are leveraging old and outdated versions.”

All of which suggests that the Census II team knows they have created a useful tool, but also that there is a considerable way to go before it can be called a revolution in software supply chain security.

McGuire said Census II “provides a view of popular FOSS and some observations about relative complexities.” But he said its real value “manifests before an SBOM is even built. I believe it’s best suited to the phase where organizations are planning and choosing which open source components they want to build their apps on top of.”

About Our Publication

Dark Sky Technology helps companies identify malicious threats, untrusted code, and cyber attacks in open source software. Our products use advanced analytics on open source packages, protecting the software supply chain and enabling our customers to deploy secure, reliable, trusted software with confidence.

Finally. Trust in Open Source.


