PART ONE - WHAT'S WRONG

I had to split this article into two parts. Part one describes what I think is wrong and part two, which is coming next week, is a proposed architecture to improve it. Discussions and comments for this can be found on my LinkedIn page.

Finding and fixing vulnerabilities is a staple of the entire security industry. At one end of the spectrum are hardcore security researchers, spending months obsessing over finding new exotic holes in things like satellites. At the other end of the spectrum are suits, coffee in hand, feet up on their desks, staring at a pretty report to see if their automated patch management system has applied the recommended updates to their fleet of Windows desktops.

Everyone else is somewhere on the spectrum, just like me.

What we all have in common is the need for a trusted place to record and reference vulnerabilities. Without it we are all fuxored, and in today's world of open source and supply chain security attacks, we are just that, fuxored.

Vulnerability databases as we know them today simply don’t work for open source and supply chain security attacks, and here is why.

The Quick History of CVE and NVD

There has been a lot of scar tissue in the industry over the years figuring out what works and what doesn’t for vulnerability disclosure. There have been the OG mailing lists like Bugtraq to disclose and then discuss vulnerabilities, and even an IETF RFC for responsible disclosure.

In September 1999, David Mann and Steven Christey from MITRE published a paper called Towards a Common Enumeration of Vulnerabilities which led to the creation of the Common Vulnerabilities and Exposures or CVE system. You can find a full history at the CVE site here. If you don’t know what a CVE is then this article is not for you and I recommend reading this one first. I don't know David but do know Steve, and despite my ongoing criticisms of CVE, I think we all owe Steve a debt of gratitude. The work he has done on Internet security over the years has had a massive impact and moved things forward. He's also a lovely human. He hasn’t worked directly on CVE for a long time. I think a lot of Steve and his work.

1999 also saw the birth of the National Vulnerability Database or NVD. NVD is run by the National Institute of Standards and Technology or NIST, and was initially called the Internet - Categorization of Attacks Toolkit or ICAT. NIST is part of the US government and can be found at https://www.nist.gov.

From the NVD website

The NVD performs analysis on CVEs that have been published to the CVE Dictionary. NVD staff are tasked with analysis of CVEs by aggregating data points from the description, references supplied and any supplemental data that can be found publicly at the time. This analysis results in association impact metrics (Common Vulnerability Scoring System - CVSS), vulnerability types (Common Weakness Enumeration - CWE), and applicability statements (Common Platform Enumeration - CPE), as well as other pertinent metadata.

As CVE became popular, tangential and related problems surfaced and were solved by a variety of people. There is a common taxonomy of weakness types called the Common Weakness Enumeration or CWE. Others like NIST pitched in with a way to describe the subjects of the vulnerabilities, the Common Platform Enumeration or CPE, and FIRST created a way to score vulnerabilities called the Common Vulnerability Scoring System or CVSS.

Today there is a total ecosystem around CVEs and the NVD including things like the Security Content Automation Protocol or SCAP, Common Security Advisory Framework or CSAF, the Vulnerability Exploitability eXchange or VEX, and even a Bill of Vulnerabilities or BOV by the excellent CycloneDX folks.

On top of this of course is the ecosystem consuming all of this, tools and humans using the data. All host and network vulnerability scanners use the data, all SCA scanners use the data, threat intel tools use the data and of course there are hundreds of thousands of security pros all using the data and looking at the noisy alerts.

CVE / NVD and the ecosystem around it are no doubt well intended, but they ignore the elephant in the room, which I can sum up with this one-liner.

Garbage in, garbage out.

It's not about the way we consume and pass around vulnerability data. There are no doubt ways to improve all of that, but as you can see from above, there are a load of smart people working on it already. The reality though is, just like the AppSec Letter Bomb Problem, that if the vulnerability data you are relying on is garbage, then it just doesn't matter. It’s garbage in, and garbage out.

The importance of this has never been more acute than today. We are literally living in a world of open source and supply chain security attacks. CVE / NVD doesn’t work for open source and supply chain security.

The Problems Today

CVE / NVD was designed for a different era

In 1999, when CVE and NVD were created, there was no AWS, no iPhone, and developers reusing open source code wasn’t really the thing we know today. GitHub wasn’t even created until 2008. We lived in a world where the majority of software was released by relatively few vendors. The first CVE was against FreeBSD and if you look at all vulnerabilities from that era, here is 1999 for instance, you will see they are almost all against Linux distributions and Windows.

If there was an issue in Internet Explorer (or what became known as Internet Exploder (link NSFW)) then it was easy to point to MSFT and the versions that it had released. There weren’t clones of Internet Explorer being used on the Internet.

This also meant that describing exactly what software was affected was a tractable problem.

CVE / NVD data is often incorrect and not technically verified

NVD itself says it doesn't technically verify data.

The NVD does not actively perform vulnerability testing, relying on vendors, third party security researchers and vulnerability coordinators to provide information that is then used to assign these attributes.

At SourceClear we initially took CVE / NVD data verbatim. When we built our vulnerable methods technology, something most SCAs are only just starting to release today, we had to have our researchers pull apart NVD entries, and figure out exactly what libraries were affected and what the actual vulnerable method was. What we found was that all too often, the NVD entry described the wrong system or library, and the vulnerability as described wasn't the actual vulnerability in the first place.

I remember one case where the NVD entry was about the Java Spring Framework, citing versions of Spring that were vulnerable. When we dissected it, the vulnerability was in fact in the Apache Xerces parser. The implications of that were big. Not only did the majority of Spring apps we saw not invoke the Xerces parser in a way that made them vulnerable, but many, many other applications using the Xerces parser were themselves vulnerable, and of course they didn’t know it.

With funding from https://www.iqt.org, we set out to build an experimental system that constructed a global call graph across public open source distribution systems and could show us what libraries depend on what other libraries. We called it SGL or the Security Graph Language. You can see me on this YouTube video doing a live demo at Hack in the Box in Singapore during a talk called Finding Vulns And Malware In Open Source Code At Scale.

Using SGL we estimated that at the time, there were probably 245,000 vulnerabilities in Java, Python and Ruby when CVE listed around 8,000 if I am not mistaken. This brings me to the next point.

CVE / NVD can't possibly deal with the rate of vuln ingestion

At one point in SourceClear's history, we started filing CVEs for vulnerabilities that we found, including what we called half-days (see below). We would sometimes find ten or even fifty a day, with a surge whenever we added a new library and a deep clone to Commit Watcher. When we compared the number of CVEs being processed by the team at CVE, who were doing a stellar job with the limited resources they had, with the rate at which we could find issues, it just wasn’t practical to report them all. We also didn’t agree with the rules about disclosure of issues that were already public in the commit logs, if you knew where to look.

This unfortunately leads to my next bone of contention.

The CNA scheme has become a way for vendors to hide issues

The CVE Numbering Authorities or CNAs were designed to scale things.

CNAs are software vendors, open source projects, coordination centers, bug bounty service providers, hosted services, and research groups authorized by the CVE Program to assign CVE IDs to vulnerabilities and publish CVE Records within their own specific scopes of coverage.

You report a vulnerability in a particular piece of software, let's say the Java Spring Framework, and it's passed to Pivotal for triage. The CVE record can then be marked as RESERVED, DISPUTED or REJECT, and if you play nicely by commonly accepted disclosure rules, you abide by the period of silence those labels dictate. From the CVE site:

After your announcement has been publicized, contact the MITRE CNA-LR via the CVE Request web form. Select "Notify CVE about a publication" and provide the following information:

The CVE ID(s) assigned to the vulnerabilities being publicly announced.
Links to the public forum(s) or advisories where the announcements can be found.
(Optional) A description for each vulnerability to be used in the official CVE List.

Until this information is provided to the MITRE CNA-LR, only a reserved CVE Record may be recorded on the CVE website. No description or details of the vulnerability will be made available in the CVE Record until the vulnerability has been publicized.

Sadly, what some software vendors do, and I know this for a fact, is use this as an opportunity to mark a vulnerability as RESERVED and effectively hide it for a period. I am far from the only person this has happened to. I believe the time period is two years, but I can't find the rules. The usual reason given is that they have long release cycles and need to coordinate disclosure to their customers. While true for some, it's just bullshit for most.

Of course when a vague reserved CVE is published, hackers do what we did at SourceClear. Unleash the bloodhounds, pore through the commits (see below), find the vulnerability and build the exploit. According to PANW in this article, 80% of public exploits are published before the CVEs are published.

On average, an exploit is published 23 days before the CVE is published.

It is also true that when people buy software composition analysis tools they have Fear Of Missing Out or FOMO, and often judge trials by the number of reserved CVEs (and of course private vulnerabilities) the tool can find.

I once got threatened by the legal team of a big software vendor who produces an open source Java framework, for reporting a vulnerability that was clearly in a commit. They changed the commit history but I had a deep clone. Idiots. They backed off when they knew their dirty tricks could be easily exposed.

Most developers don't care about reporting vulns

Developers have a lot of things to be concerned with. Security is one but it's rarely top of the stack. That’s reality. You can argue it should be until you are blue in the face, and I did for years, but you won’t change it.

You have to look at the incentive model for developers to submit CVEs. The only incentive is the developer doing the right thing, and that is not a strong one. In fact for many developers it's a disincentive. Your boss finds out you were writing code and found out that it had a vulnerability. You mean you write shitty code? Your code caused us public embarrassment? No blame is a romantic idea but rare in capitalist cultures.

When I was the CEO of SourceClear, the first pure-play SCA company, which was acquired by Veracode in 2018, we realized this was a problem and built a tool to operationalize it, Commit Watcher.

It’s not maintained and sadly hasn’t been marked as abandoned. What it did was analyze commits on open source projects, looking for hidden vulnerabilities being silently patched. We backed it up internally with some special sauce, including graph neural networks, and released an academic whitepaper about it, Enhancing Security Patch Identification by Capturing Structures in Commits, that you can download here. We used to classify vulnerabilities found using Commit Watcher with a simple taxonomy, and one category we called half-days. Half-days were things that were exposed but generally not known about: things in commit logs, copy and paste vulnerabilities, and embedded vulnerabilities from transitive dependency graphs. You can read about that here. Over time the majority of vulnerabilities we published in our vulnerability database were found this way.
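To make the idea of watching commits for silent patches concrete, here is a deliberately naive sketch. It is not how Commit Watcher or the graph neural network approach worked; the keyword list, the function name looks_like_silent_fix and the sample messages are all assumptions made up for illustration. It simply flags commit messages that hint at a security fix but reference no CVE ID.

```python
import re

# Hypothetical keyword list; a real system would need far richer signals
# (code diffs, file paths, structure), not just the commit message.
SECURITY_HINTS = re.compile(
    r"\b(xxe|xss|csrf|sql injection|overflow|sanitiz\w*|escap\w*|deserializ\w*|traversal)\b",
    re.IGNORECASE,
)
ADVISORY_ID = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def looks_like_silent_fix(message: str) -> bool:
    """True when a message hints at a security fix but cites no CVE ID."""
    return bool(SECURITY_HINTS.search(message)) and not ADVISORY_ID.search(message)


# Made-up commit messages, including a placeholder CVE ID.
sample_messages = [
    "Escape user input in template renderer",
    "Fix XXE in parser (CVE-2099-0001)",
    "Bump version to 2.1.0",
]

for msg in sample_messages:
    print(f"{looks_like_silent_fix(msg)!s:5}  {msg}")
```

A keyword filter like this is noisy in both directions, which is one reason to look at the structure of the commit itself rather than only at its message.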

CVEs are a badge of honor and are increasingly fluff

Some people, especially bug bounty hunters, collect CVEs to use on their resumes. Given how easy it is to submit a CVE it's not surprising that many are just pure fluff.

I am easily amused, and this one made me ROFL this summer. Yes, 2022.

CVE-2022-38392 - Certain 5400 RPM hard drives, for laptops and other PCs in approximately 2005 and later, allow physically proximate attackers to cause a denial of service (device malfunction and system crash) via a resonant-frequency attack with the audio signal from the Rhythm Nation music video. A reported product is Seagate STDT4000100 763649053447.

I guess that explains why NVD doesn't validate them eh? You can’t play the audio signal from the Rhythm Nation music video in a government office after all.

I have often toyed, after a Belgian beer (or five), with the idea of submitting a ridiculous entry for April Fools' Day. Maybe we should have an equivalent of the Eurovision Song Contest? I never have, for the record. I get to four beers and stop myself.

Vulnerabilities have a value, like it or not

Vulnerabilities have a financial value. That’s true if you are the NSO Group with their million dollar exploits, or an SCA vendor looking to get a competitive advantage. Submit your vuln to a public database and that financial advantage is wiped out. If you look at the incentives for people to do it, you have to go back to one of my favorite Upton Sinclair quotes, “It is difficult to get a man to understand something, when his salary depends on his not understanding it.”

Vulnerabilities also have intelligence value. We know that foreign state actors view the supply chain as a great vector for attack. Just look at the SolarWinds story. Huawei trying to backdoor the Linux kernel? The vectors to do it and hide it for financial and nation state value are huge.

CVE / NVD doesn’t have the right data to reproduce an open source vulnerability, let alone automate detection

While good efforts across the ecosystem have been made around operationalization and automation, it's a giant case yet again of garbage in and garbage out. This is perhaps the biggest issue I see that needs solving, and why, as you will see in part two of this article, I argue that we need specialized vulnerability databases. In particular we need a specialized database for open source libraries.

As I described above, CVE / NVD data is often incorrect, and not technically verified. There is a lot to describe if you are to make a vulnerability report reproducible and automatable.

First up you can’t rely on semantic versioning. We have learned this in dependency pinning. Yes, it's a double edged sword and I have written about that before in my article Dependency Pinning Only Works if you actually review the updates. We need to know the commit hashes of every vulnerable version if we are to be sure we are comparing apples to apples.
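Here is a minimal sketch of why a version string alone is not enough. Everything in it is hypothetical: the Artifact type, the library names, the short stand-in commit hashes and the advisory data are invented for illustration, and real hashes would be full-length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Artifact:
    name: str
    version: str  # the label the publisher chose
    commit: str   # the source commit the artifact was built from


# Hypothetical advisory data for one vulnerability.
VULNERABLE_VERSIONS = {"1.9"}
VULNERABLE_COMMITS = {"a1b2c3", "d4e5f6"}


def flagged_by_version(artifact: Artifact) -> bool:
    return artifact.version in VULNERABLE_VERSIONS


def flagged_by_commit(artifact: Artifact) -> bool:
    return artifact.commit in VULNERABLE_COMMITS


upstream = Artifact("example-lib", "1.9", "a1b2c3")
# Same version label, but rebuilt from a commit with the fix backported.
backported = Artifact("example-lib", "1.9", "0f9e8d")
# Vulnerable code republished under an unrelated version label.
relabelled = Artifact("example-lib-fork", "2.0.0-internal", "d4e5f6")

for a in (upstream, backported, relabelled):
    print(a.name, a.version, flagged_by_version(a), flagged_by_commit(a))
```

Matching on the version label produces a false positive for the backported build and misses the relabelled one, while matching on commit hashes gets both right, provided the advisory actually carries those hashes.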

We also need to have a call graph (or an AST at a minimum) to know how the vulnerable method could be called. At SourceClear we used inter-procedural call graphs. We never implemented full data flow graphs, but I can see where you may need that. You can’t just describe some high-level code constructs in human-readable English. Can you imagine a unit test written in prose?
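As a rough illustration of what a call graph buys you, the sketch below does a breadth-first search from an application's entry points to a vulnerable method. It is not SourceClear's implementation; the function find_call_path, the graph and every method name in it are invented, and a real inter-procedural graph would have to deal with dynamic dispatch, reflection and much more.

```python
from collections import deque


def find_call_path(call_graph, entry_points, target):
    """Return one shortest call path from any entry point to target, or None."""
    seen = set(entry_points)
    queue = deque([ep] for ep in entry_points)
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for callee in call_graph.get(path[-1], ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


# Hypothetical caller -> callees edges for two apps sharing one parser library.
CALL_GRAPH = {
    "app_a.main": ["app_a.load_config"],
    "app_a.load_config": ["xml.parse"],
    "app_b.main": ["framework.render"],
    "framework.render": ["framework.escape"],
    "xml.parse": ["xml.resolve_entity"],
}
VULNERABLE_METHOD = "xml.resolve_entity"

print(find_call_path(CALL_GRAPH, ["app_a.main"], VULNERABLE_METHOD))
print(find_call_path(CALL_GRAPH, ["app_b.main"], VULNERABLE_METHOD))
```

Both apps could ship the same vulnerable library, but only the first has a path that reaches the vulnerable method, which is exactly the distinction a prose description cannot give a tool.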

If you want to make risk decisions you also need developer reputation scores. Was this a critical vulnerability, marked as reserved, that has been just submitted originating from code committed by a guy or gal with a long history of shitty code? Was it Huawei or the Chinese government trying to backdoor something? IFTTP - If This Then Panic.

Last, certainly not least, and perhaps one of the hardest problems to solve, is that CVE / NVD doesn’t describe clones and forks. Let's take the Text4Shell vulnerability, CVE-2022-42889. Credit where credit is due, this NVD entry has perhaps the best level of technical write-up and analysis I have seen, so clearly things are getting better, but I also chose it because I believe it still illustrates my points.

CVE-2022-42889 affected Apache Commons Text. Apache Commons Text has been forked 240 times, including by Sonatype, who created Sonatype Lift Apache Commons Text. That itself has been forked 210 times, and its main branch is 203 commits behind the Apache Commons Text “Main” branch at the time of writing. There may be reasons for all of this, Sonatype are smart people, but the details are not the headline here. The headline is that open source is a graph of code, and just tagging one node on that graph is the tip of the iceberg.
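To show what tagging more than one node could look like, here is a small sketch that walks a fork graph and reports every repository that contains the commit introducing a flaw but not the commit fixing it. The repository names, commit IDs and the affected_repos function are all hypothetical, and it assumes you already have each repository's commit history to hand, which is the hard part in practice.

```python
def affected_repos(forks, history, root, introduced, fixed):
    """Walk the fork graph from root; list repos with the flaw but not the fix."""
    affected = []
    stack = [root]
    while stack:
        repo = stack.pop()
        commits = history.get(repo, set())
        if introduced in commits and fixed not in commits:
            affected.append(repo)
        stack.extend(forks.get(repo, ()))
    return sorted(affected)


# Hypothetical fork graph: parent repository -> direct forks.
FORKS = {
    "upstream/lib": ["vendor/lib-fork", "early/lib-fork"],
    "vendor/lib-fork": ["team/lib-fork-fork"],
}
# Hypothetical commit sets per repository (stand-in IDs, not real hashes).
HISTORY = {
    "upstream/lib": {"c1", "c2-bug", "c3-fix"},
    "vendor/lib-fork": {"c1", "c2-bug"},        # behind upstream, never pulled the fix
    "team/lib-fork-fork": {"c1", "c2-bug"},     # inherited the flaw from its parent fork
    "early/lib-fork": {"c1"},                   # forked before the flaw was introduced
}

print(affected_repos(FORKS, HISTORY, "upstream/lib", "c2-bug", "c3-fix"))
```

In this toy graph the advisory's one named repository is the only one that is actually fixed, while two forks downstream of it are still vulnerable and carry no tag at all.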

A lot of the world doesn’t trust the US government

I am a dual British and US citizen, and openly tell people who ask that I have worked, and will continue to work, with US and NATO intelligence. The reality, though, is that after the Snowden affair, the CIA Vault 7 compromise and recent global politics, especially the Trump years, the US has lost its position on the world's stage. The NVD is run by NIST at nist.gov. That's not me playing politics, that's just a fact. I have zero reason to suspect any impropriety, but others will.

The Results of Those Problems

The results of the issues above manifest themselves in many ways, but in my opinion can be summed up as creating a lack of trust in the security of open source code, an inability to create true end-to-end automation from detection to patching, and an arms race to create the best private vulnerability databases. This makes the entire internet less secure than it could be.

There are already laudable efforts to tackle some of these things, like OSV creating a better vulnerability schema and the work of the Cloud Security Alliance on the Global Security Database, so we are not doomed, but I don’t think it’s enough.

In part two of this article, which I will try to publish next week, I will lay out what I think we need to do to improve this problem, including creating a truly distributed vulnerability database system, data integrity measures, specialized vulnerability databases with specialized data, and a central public vulnerability database managed by the United Nations. And there is more.

Footnote : I know CVE does not describe itself as a vulnerability database but that's a whole other blog so please hold your comments on that one.