Report: Take care of open source software and it will take care of you
You might as well embrace open source software. Because you can’t avoid it. It’s in virtually every software product in existence and makes up the large majority — 77% — of the software in those products.
There’s no good reason to avoid it anyway. It’s every bit as good as proprietary or commercial software, and it’s generally free to use and modify to fit your creative needs. Those advantages have made it not just the foundation but also the building blocks for innovation in the millions of applications we rely on today.
But those good things come with a large caveat. Being as good as other software means it’s also as flawed as other software — it is made by imperfect humans, after all. Which means open source needs to be tracked, tested, and maintained. If you take care of it, it will take care of you — potentially help make your business prosperous. But if you don’t, it could break you by letting cyberattackers into your systems and products.
And that’s why you need the OSSRA — memorize that acronym because it can help you reap the benefits of open source while avoiding, or at least mitigating, the risks.
The annual “Open Source Security and Risk Analysis” report by the Synopsys Cybersecurity Research Center, now in its ninth iteration, is dedicated to “helping security, legal, risk, and development teams better understand the open source security and license risk landscape.” (Disclosure: I write for Synopsys.) Its findings cover all the bases: they are based on analysis of anonymized data from 1,067 commercial codebases across 17 industries in 2023.
The key question in this year’s report is the same as in past years: Do you know what’s in your code? But this year the question is more urgent. The omnipresence of open source in software products means it is the dominant element of every software supply chain.
As the report puts it, “With the prevalence of open source and the rise in AI-generated code, more and more applications are now built with third-party code. Without a complete view of what’s in your code, neither you, your vendors, nor your end users can be confident about what risks your software may contain.”
A matter of trust
In short, if you can’t trust your software, your customers can’t trust you or your products.
And ensuring that open source software is trustworthy requires a different level of awareness and maintenance, for several reasons.
- Open source projects are created and maintained (or not maintained) by volunteers, which means security updates are not “pushed” to users but have to be “pulled.” So if an organization doesn’t know it’s using a component, it won’t keep it up-to-date, and if that component has a vulnerability, it won’t know it needs to apply a patch, even if one is available. That remains a huge problem. This year’s OSSRA report found that 91% of the codebases assessed for risk contained components 10 versions or more behind the current one. It also found that 14% of those codebases had vulnerabilities more than 10 years old.
- Many popular open source projects have thousands of volunteers helping to maintain the code, but millions of less popular projects have fewer than 10 people maintaining them. Some have been abandoned. According to the report, 49% of the codebases analyzed had components with no development in the last two years. That is a major improvement from 91% the previous year, but for that problem to affect nearly half of the codebases is still ominous — no development means “no feature upgrades, no code improvements, no discovered security problems fixed,” according to the report.
- Although using an open source component usually doesn’t cost money, it’s not free of obligation — users have to comply with any licensing provisions, which can vary according to the type of license. In 2023, 53% of codebases analyzed had open source license conflicts. That’s a significant decline from several years ago, but about equal to last year, which means that more than half contained violations of license terms that could cause legal liability or possibly require that proprietary code in an application be made public.
The results of that lack of awareness and maintenance are predictable, and evident in the data. It shows that 84% of the codebases assessed for risk had at least one open source vulnerability, and 74% had high-risk vulnerabilities, up from 48% the previous year. High-risk vulnerabilities are those that have been exploited, already have proof-of-concept exploits, or are classified as remote code execution vulnerabilities.
Obviously that’s asking for trouble. So the goal of the OSSRA report is to show users how to avoid it. Open source software can deliver far more reward than risk if it’s tested and managed well. So the report doesn’t just document what’s wrong; it offers recommendations on well-established ways to fix the persistent failure by too many organizations to answer the “Do you know what’s in your code?” question in the affirmative.
Indeed, you can get to yes with the report’s guidance on three initiatives: Secure your software supply chain, test your code, and if you use artificial intelligence (AI), treat the code it generates with extreme care.
Secure your software supply chain
Securing the software supply chain starts with knowing what’s in it. As experts have said for decades, you can’t protect something if you don’t know you have it. And knowing starts by creating and maintaining a Software Bill of Materials (SBOM), another acronym that has gained considerable currency since President Biden made it a key component of his May 2021 “Executive Order (EO) on Improving the Nation’s Cybersecurity.” That EO called for federal agencies to stop buying software products that lacked an SBOM. Unfortunately, nearly three years later, that has yet to take effect.
But it shouldn’t take a presidential EO to convince open source users of the value of SBOMs. An SBOM amounts to a list of ingredients in a software product: which open source components are in it, who made each one and when, who maintains it (or not), its licensing requirements, any known vulnerabilities in it, and the patch status of those vulnerabilities.
That’s not something that can be done manually — the report found that the average number of open source components in a codebase was 526. But it can be done with a software composition analysis (SCA) tool, which automates identification, management, and mitigation of open source and third-party software security defects.
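To make that concrete, here is a minimal Python sketch of the kind of bookkeeping an SCA tool automates: take an inventory of components (a slice of an SBOM) and check each one against a public vulnerability feed. The component list is hypothetical, the OSV.dev endpoint and response fields are used as publicly documented and may change, and a real SCA tool does far more, including license and code-level analysis.

```python
# Minimal sketch: check each SBOM component against a public vulnerability
# feed (OSV.dev). The component inventory below is illustrative only.
import json
import urllib.request

# A tiny, hypothetical slice of an SBOM: name, version, ecosystem, license.
components = [
    {"name": "jinja2", "version": "2.4.1", "ecosystem": "PyPI", "license": "BSD-3-Clause"},
    {"name": "lodash", "version": "4.17.20", "ecosystem": "npm", "license": "MIT"},
]

def known_vulns(component):
    """Ask OSV.dev for known vulnerabilities affecting this exact version."""
    query = {
        "version": component["version"],
        "package": {"name": component["name"], "ecosystem": component["ecosystem"]},
    }
    req = urllib.request.Request(
        "https://api.osv.dev/v1/query",
        data=json.dumps(query).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("vulns", [])

for c in components:
    ids = ", ".join(v["id"] for v in known_vulns(c)) or "none found"
    print(f'{c["name"]} {c["version"]} ({c["license"]}): {ids}')
```

The point of the sketch is simply that once the inventory exists, checking it is mechanical; without the inventory, there is nothing to check.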
An SBOM is not a magic bullet, however. It is essential, but not sufficient. Besides SBOMs, other recommendations from the report on managing the open source software supply chain include:
- Stay informed. Look for newsfeeds or regularly issued advisories that provide advice and details about problems affecting open source components in your SBOM.
- Perform code reviews. Examine the code of downloaded software before including it in a project. Check for any known vulnerabilities and consider including a static analysis of source code to check for unknown security weaknesses.
- Be proactive. Just because a component isn’t vulnerable today doesn’t mean it won’t be tomorrow. Intentionally malicious packages may never even be discovered as “vulnerable.”
Test your code
Most organizations test the code they write themselves. They need to do the same for third-party code (open source and commercial). The report found plenty of evidence of what happens when you don’t — predictably, you end up with vulnerabilities.
One of its more ominous findings is that 8 of the top 10 vulnerabilities last year mapped to a single Common Weakness Enumeration (CWE), a so-called “pillar.” The CWE is one of the popular lists maintained by the MITRE Corporation that identify and classify security weaknesses and vulnerabilities in software.
According to the report, CWE-707, the “Improper Neutralization” pillar, “concerns security requirements that are not being met before data is read from an upstream component or sent to a downstream component. Failing to properly neutralize input can lead to exploits such as cross-site scripting (XSS) and SQL injection.”
Mike McGuire, senior software solutions manager with the Synopsys Software Integrity Group, said that “user input is being read and used to execute tasks, which sounds natural, but the problem is that when it comes in, it’s not being checked or sanitized. So it can manipulate otherwise trustworthy hosts into performing malicious tasks. The end target of most XSS attacks isn’t the host itself — it’s other users of the same web application.”
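As an illustration of that pattern (not an example from the report), here is a short Python sketch of what “failing to neutralize input” looks like in practice, using SQL injection plus an escaped-output fix for the XSS case. The table, data, and attacker string are invented for the example.

```python
# Illustrative only: the CWE-707 pattern of using input without
# neutralizing it, shown with SQL injection, plus the standard fixes.
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

user_input = "alice' OR '1'='1"  # attacker-controlled string

# Vulnerable: input is spliced straight into the query, so the attacker's
# quote characters change the query's meaning and return every row.
rows = conn.execute(
    f"SELECT email FROM users WHERE name = '{user_input}'"
).fetchall()
print("unsanitized:", rows)

# Fixed: a parameterized query treats the input as data, never as SQL.
rows = conn.execute(
    "SELECT email FROM users WHERE name = ?", (user_input,)
).fetchall()
print("parameterized:", rows)

# The same principle applies to XSS: escape untrusted input before it is
# rendered into a page, so it cannot be interpreted as markup or script.
comment = "<script>steal(document.cookie)</script>"
print("escaped for HTML:", html.escape(comment))
```

The vulnerable query hands every email address to the attacker; the parameterized version returns nothing, because the malicious string is treated as a literal name rather than as SQL.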
In short, as noted before, if you allow vulnerabilities to undermine trust in your software, your customers can’t trust you. Testing it helps keep it trustworthy.
Handle AI with care
There is no escaping AI. In the 15 months since OpenAI’s ChatGPT was launched, AI has invaded everything from university research papers to deepfake porn and, yes, software code. Indeed, the lure of AI-generated code is almost irresistible: AI can generate code faster than any human or group of humans. It never gets tired; doesn’t demand a salary, benefits, or vacations; and never tries to unionize.
But like open source in general, it can hurt as well as help. In a twist on the “you are what you eat” cliché, AI chatbots are what they have been fed, and since the code they have been fed is imperfect, what they regurgitate isn’t perfect either.
Not only can AI generate code with bugs, but that code, even a tiny snippet, can carry licensing obligations that it fails to flag.
A federal lawsuit filed in November 2022 by four anonymous plaintiffs over GitHub’s Copilot coding assistant is based on that exact complaint: that Copilot allegedly “includ[ed] the plaintiffs’ code [but] did not include, and in fact removed, copyright and notice information required by the various open source licenses.”
McGuire said organizations should “treat AI generated coding tools like junior developers.” In other words, supervise them — very closely.
Indeed, a Synopsys team tested Copilot last year, using it to generate some code and then mapping it back to a specific source project. That project, McGuire said, “had a GPL [General Public License], which is very restrictive. So if we’d used that in production, we could have gotten in trouble.”
He said the team used snippet analysis, “a line-by-line analysis of source code to see if it matches back to a database of known open source projects. It’s as slow and tedious as you would think, which is why it’s not a scan you would run every time you check code in — maybe nightly or weekly — but it’s extremely effective in making sure there are no IP conflicts.”
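For readers curious what snippet analysis looks like under the hood, here is a toy Python sketch of the general idea: fingerprint small windows of normalized source lines and match them against an index built from known open source projects. The project name, license, and code fragments are invented, and real tools use far larger indexes and more robust matching than this.

```python
# Toy snippet analysis: normalize source lines, fingerprint small windows
# of them, and look those fingerprints up in an index built from known
# open source projects. All names and code fragments are illustrative.
import hashlib

WINDOW = 3  # consecutive normalized lines per fingerprint

def normalize(line: str) -> str:
    # Drop comments and whitespace so formatting changes don't hide a match.
    return line.split("#")[0].strip()

def fingerprints(source: str):
    lines = [l for l in map(normalize, source.splitlines()) if l]
    for i in range(len(lines) - WINDOW + 1):
        chunk = "\n".join(lines[i : i + WINDOW])
        yield hashlib.sha256(chunk.encode()).hexdigest()

# Pretend this function came from a GPL-licensed project indexed earlier.
known_source = """
def rolling_sum(values, size):
    total = sum(values[:size])
    yield total
    for old, new in zip(values, values[size:]):
        total += new - old
        yield total
"""
index = {fp: ("some-gpl-project", "GPL-3.0") for fp in fingerprints(known_source)}

# Freshly generated (or pasted-in) code that happens to contain the same lines.
candidate = """
def window_totals(values, size):  # renamed wrapper
    total = sum(values[:size])
    yield total
    for old, new in zip(values, values[size:]):
        total += new - old
        yield total
"""
hits = {index[fp] for fp in fingerprints(candidate) if fp in index}
for project, license_id in hits:
    print(f"Possible snippet from {project} ({license_id}); review its license terms.")
```

Even with the function renamed, the overlapping lines still match, which is why this kind of scan can surface IP conflicts that a simple dependency check would miss.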
Of course, all these recommendations come with a caveat as well: you can follow them all and still not be bulletproof, just as no amount of physical security guarantees that nobody could ever break into your building.
But follow these recommendations and you will get a lot closer to perfect. Which puts you on the “make” side of “make or break.” And that is much less stressful, and likely to be much more profitable.