CVE / NVD doesn’t work for open source and supply chain security

PART TWO, SOME IDEAS TO IMPROVE IT

A few weeks ago I published an article, CVE / NVD doesn’t work for open source and supply chain security - Part one, what's wrong. The article got quite a bit of attention and judging by the comments, a near universal reaction of nodding heads. Not universal, but near from my vantage point. There were also some very kind comments about my writing. Knowing people appreciate your writing is very encouraging, so thank you.

Part one was very much me slinging mud. It’s easy to do, I get that. I always remember one of the partners at MSFT telling me “Don't come to me with problems, come with solutions”. I am not going to pretend I have the answers, or some sort of magic wand to fix the CVE system. Any system that has such widespread adoption, that is so critical to the Internet, and that is dealing with such a complicated topic, will be very hard to “fix”. Fix is the wrong word in my opinion and so this article is what I consider a set of ideas to improve it. I suspect the best we can hope for is continuous improvement in incremental steps, and I will certainly take that.

I also want to be clear that while I believe what we have today is not good enough for what we need today, and certainly not good enough for what we need in the future, without it we would be in a far worse place and credit and gratitude should go to what we have today.

I also understand and want to be clear that I have never been involved in the creation side of CVE / NVD, but do consider myself to have been a power user, spending five years building a product that was a heavy consumer. I will update this article at the bottom if I discover that ideas below are already addressed or being addressed, or that any of my assertions are not factually correct.

To recap, in the first article I used the phrase “garbage in, garbage out” as my top level summary and then broke that down into the following sections.

- CVE / NVD was designed for a different era
- CVE / NVD can't possibly deal with the rate of vulnerability ingestion
- The CNA scheme has become a way for vendors to hide issues
- Most developers don't care about reporting vulns
- CVE / NVD is a badge of honor and are increasingly fluff
- Vulnerabilities have a value, like it or not
- CVE / NVD doesn’t have the right data to reproduce an open source vulnerability, yet alone automate detection
- A lot of the world doesn’t trust the US government

The ideas in this followup map across those issues and so this article is broken up into the following sections.

- Change the economics of low hanging fruit public vulnerabilities
- A better understanding of what is critical open source and how it affects organizations
- A new governance model
- Specialized distributed databases, standards based protocols and schemas
- Incentivize people to validate and fix vulnerabilities and not just find them

Change the economics of low hanging fruit public vulnerabilities

The fact that vulnerabilities have real monetary value is simply a fact of life, and you are never gonna change the world so that all vulnerabilities are public, but we can raise the bar in certain areas and at least make sure the low hanging fruit is commoditized.

There is a common adage that raising the cost to exploit something shrinks the delta between the cost and the reward, so people move on to higher value targets or to other things altogether. It's cheap to phish people but expensive to backdoor hardware, so the former can be automated and run out of virtual sweatshops, while the latter is largely the domain of foreign governments.

As I explained (maybe confessed) in the first article, at SourceClear back in the day we found the majority of our vulnerabilities using a tool called Commit Watcher, which monitored commits for indications of silent patches and details about reserved entries. I forgot to mention in the article that we had a similar tool to watch public Jiras for security issues and issues suddenly marked as private. So cunning you could put a tail on it and call it a weasel! After being analyzed, these issues then formed a large part of our commercial vulnerability database, and in their aggregate, were very valuable to us in sales and marketing, and of course as content to fuel the users of our product.
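To make the idea of watching commits concrete, here is a minimal sketch of one way such a monitor could score commit messages. It is not how Commit Watcher was actually implemented; the patterns, weights, threshold and the `Commit` type are all hypothetical, and a real tool would also need to inspect the diff itself.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns that hint at a quiet security fix, each with an
# assumed weight. These are illustrative, not a vetted rule set.
SUSPICIOUS_PATTERNS = [
    (re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE), 5),
    (re.compile(r"\b(xss|csrf|sql injection|rce|xxe|ssrf)\b", re.IGNORECASE), 4),
    (re.compile(r"\b(security|vulnerab\w*|exploit\w*)\b", re.IGNORECASE), 3),
    (re.compile(r"\b(sanitiz\w*|escap\w*|overflow|traversal|deserializ\w*)\b",
                re.IGNORECASE), 2),
]


@dataclass
class Commit:
    """Hypothetical minimal view of a commit: just its hash and message."""
    sha: str
    message: str


def score_commit(commit: Commit) -> int:
    """Sum the weights of every pattern that matches the commit message."""
    return sum(weight for pattern, weight in SUSPICIOUS_PATTERNS
               if pattern.search(commit.message))


def flag_commits(commits, threshold=3):
    """Return (sha, score) pairs worth a human look, highest score first."""
    scored = [(c.sha, score_commit(c)) for c in commits]
    flagged = [item for item in scored if item[1] >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

The flagged commits would still need a human analyst to confirm them, which is why the scalable team matters as much as the scalable tool.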

Many people purchased SourceClear just because of this, FOMO sells and “Beyond CVE” sold subscriptions. We never hid the fact that we knew about the vulnerabilities, but we did hide the details of the vulnerable methods, the special sauce technology behind our paywall and the thing that we had spent real money on via our researchers. I continue to be proud that we built vulnerable methods in 2015, and that most other SCA tools are only just catching up in 2022.

The value of having vulnerabilities that very few people knew about, or to be accurate knew the details about, was of course at the expense of others and that was anyone that was not one of our paying customers. It was the right financial thing for us to do at the time but I do feel uncomfortable about it now.

I think that if the security industry organized itself today, we could change the economics of this practice for the benefit of everyone, by simply commoditizing it. We could do this by building a scalable version of Commit Watcher (long abandoned) and a scalable team to process the findings and submit them into a public database.

Maybe the neutral governance authority that I suggest below could take this on. This would not stop the practice of non-disclosed vulnerabilities but this would at least level the playing field a bit.

A better understanding of what is critical open source and how it affects organizations

The most valuable vulnerabilities to adversaries and vulnerability brokers (that includes SCA companies), in every sense of the definition of valuable, are the ones that will have the highest impact if exploited. I don’t think we have a good enough understanding of what they are.

I have a lot of respect for the work of the Linux Foundation and the Open Source Security Foundation. I know a lot of people there, they are smart and good humans and in a relatively short period of time they have had a massive impact. The OSSF Alpha-Omega project is one such project.

“Alpha” will work with the maintainers of the most critical open source projects to help them identify and fix security vulnerabilities, and improve their security posture. “Omega” will identify at least 10,000 widely deployed OSS projects where it can apply automated security analysis, scoring, and remediation guidance to their open source maintainer communities.

The work by Harvard LISH is the basis of where the Alpha-Omega project deploys its funds and to date I believe they have deployed over $2M to teams and plan for a lot more. That is truly brilliant but the question for me will always be “is the money being applied to the best place?”.

Identifying what is the most critical open source is obviously a massive challenge and while Census II of Free and Open Source Software — Application Libraries by Harvard Lish in partnership with the OSSF is far better than anything else I have ever seen, and a fine starting point, we need better.

Firstly, I don't believe that NPM is the most critical package system, even though it is a focus of the study. It's no doubt the most popular package manager today, but apps built with NPM typically have a different security profile than those built with other technologies. The majority of financial services apps still in production, for instance, are built with Java. Java developers are also older and more conservative about updating code bases.

Secondly, the data that the survey was based on came from three commercial vendors, Snyk, Synopsys and FOSSA, and I think it's reasonable to say there are obvious biases here based on their customer profiles.

However, in order to ensure the privacy of our data partners and to protect any proprietary aspects of their SCA services, some details have been obscured.

It's a privately funded report that has been used to do good so who am I to criticize but it is also clear to me that we need to do a better job of really understanding what is critical and what isn’t.

If you believe the venture investors and investment analysts, then app sec tools have only 5-10% market penetration. I am usually skeptical, but it is likely directionally correct. This also implies that the current customer bases will be skewed as they are by definition “early adopters”. The overlay here is that some of the biggest issues in recent years, like Heartbleed, that caused such chaos were embedded in older products and/or created by companies with a lower appetite for new tech.

Much like my overall comment on CVE, my point here is this is an area that is radically under invested and we need better. I know for a fact that open-source packages are part of satellite systems, missile systems and all sorts of critical infrastructure that is not scanned by those commercial tools. It is not a matter of summing up library use alone, but a complex model of weighted scores based on how and where they are used. It's also worth pointing out that what is critical to one industry may not be critical to another. No one size fits all.
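As a rough sketch of what "weighted scores based on how and where they are used" could mean in practice, the snippet below weights each deployment by its context and then scales by how much of that usage actually reaches the library's code. The context categories, the weights and the `reachable_fraction` input are all invented for illustration; a real model would need real data and would differ by industry.

```python
# Hypothetical weights for where a library is deployed. The categories and
# numbers are illustrative assumptions, not measured data.
CONTEXT_WEIGHTS = {
    "critical_infrastructure": 10.0,
    "financial_services": 6.0,
    "internal_tooling": 1.5,
    "test_only": 0.2,
}


def criticality(deployments, reachable_fraction):
    """Weighted score: sum(count * context weight), scaled by the share of
    usage that actually reaches the library's code at runtime (0.0 to 1.0).

    `deployments` maps a context name to the number of deployments seen.
    """
    if not 0.0 <= reachable_fraction <= 1.0:
        raise ValueError("reachable_fraction must be between 0 and 1")
    weighted_use = sum(count * CONTEXT_WEIGHTS[context]
                       for context, count in deployments.items())
    return weighted_use * reachable_fraction


# A widely used library that mostly sits in test directories...
popular = criticality({"test_only": 1000, "internal_tooling": 200}, 0.5)
# ...versus a niche library that is fully exercised in critical systems.
niche = criticality({"critical_infrastructure": 40}, 1.0)
```

With these made-up numbers the niche library outranks the popular one, which is the point: raw usage counts alone would have ranked them the other way around.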

When <Heartbleed, Shellshock, Log4Shell, Text4Shell> hit, many CSOs had to deploy teams of humans to go out into their companies to find out where it was, and how it was being used. SBOMs tell you whether it is being used in a repo but as of today there is no effective SBOM provenance that can tie a build to a production instance. I spoke about this in a previous article “The SBOM Frenzy is Premature”. Hundreds of thousands of hours were wasted with people finding libraries in test directories, on lab hosts, and of course, as with most vulnerable libraries, the vulnerable methods were never actually being used.

We have been working on a solution to this that can be used in conjunction with SBOMs. We plan to open-source it in the new year. It’s frigging brilliant.

There are two parts to determining how critical a library is. It's how it is being used on the Internet, the global view, and it is how it is being used in your environment, the local view. The global view does not always correspond with the local view.

I also think that there should be better and more widely accepted models to understand the value of vulnerabilities to attackers and, yes, the value to anyone else who profits from them. If we understand this we can determine where to invest the inevitably limited resources. If zero days have half-lives then early disclosure would kill the value.

All of these things would help inform those funding, remediating and generally improving software security quality, where to focus their efforts.

A new governance model

One of the issues I called out in the first article was that CVE and NVD are funded by the US Government. Trust in the US government is not what it once was, especially after the Snowden affair and the Trump era. If we are to encourage the world into sharing vulnerabilities, then we must have a neutral body, as free as possible from world politics. I have always been disappointed by the United Nations' inability to act rather than bluster, but I cannot think of a better place to host a global vulnerability database.

The United Nations was created in 1945, following the devastation of the Second World War, with one central mission: the maintenance of international peace and security.

The last part of that mission, in my opinion, lines up with the mission of a global vulnerability database. Each member nation already contributes financially including funding the UN security council. If there are other better neutral locations I would love to hear them.

The second issue that must be addressed is the cronyism of the CNA scheme. You simply cannot let the foxes guard the chicken coop. It’s clear that there are real advantages of a distributed model that includes the original software producers. They know their code best to both verify and fix any issues, and while punishing all for the bad behavior of a few seems unnecessary, having appropriate checks and balances is always a good thing.

Note: we are facing the same problem today at OWASP with some of the derivative OWASP Top Tens, some of which look shockingly similar to sales data sheets of vendors' products. Fair warning, I am coming for them in early 2023 and will name and shame if needed.

One way to improve this is to have an appropriate balance in the governance structure for each CNA or equivalent, meaning it has to include enough neutral parties that they cannot be outvoted by the vendor with a vested interest. This could be achieved by each CNA committing a number of people, of which only a portion work inside their own company's CNA. Distributed analysis and coordinated disclosure does not have to be local.

Specialized distributed databases, standards based protocols and schemas

This is by far the most important part of improving the ecosystem in my opinion, and the one that would have the highest impact and most immediate improvement. It is also the one that has the most current work and attention. That is brilliant. There are excellent efforts including osv.dev and GUAC by Google that do some of what I talk about in this section and elsewhere in this article.

Support for public, private and shared vulnerability databases

Public and private vulnerability databases are a fact of life, and there are many legitimate reasons to maintain a private database beyond the questionable practice of vendors hoarding them for money. Companies for instance create and maintain their own libraries, fork and extend open-source ones adding proprietary functionality. Vulnerabilities in these are not of legitimate interest to the public, unless the code is republished.

Likewise there are many security information sharing initiatives like the FS-ISAC and InfraGard, for whom sharing intelligence often goes through a cycle of initially being restricted and eventually being public. The same is true of cross company or cross community teams working on projects like those in the Apache Foundation. My experience is that an awful lot of the analysis and sharing of issues is buried in private Jira instances or email lists, with effectively only a free form, private, text based audit log.

First class support for distributed vulnerability databases that can normalize and dedupe issues, provide access control on the aggregate and down to a field level, privately, publicly and with a restricted audience, would be valuable in improving the speed and quality of analysis.
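A tiny sketch of two of those ideas, field level access control and deduplication, might look like the following. The audience names, the record layout and the choice of package plus fixing commit as the dedupe key are all assumptions made for illustration, not part of any existing standard.

```python
# Hypothetical audiences, ordered from least to most privileged.
AUDIENCE_RANK = {"public": 0, "restricted": 1, "private": 2}


def visible_fields(record, audience):
    """Return only the fields the given audience is allowed to see.

    `record` maps a field name to a (required_audience, value) pair; a field
    is visible when the caller's level is at or above the required level.
    """
    level = AUDIENCE_RANK[audience]
    return {name: value
            for name, (required, value) in record.items()
            if AUDIENCE_RANK[required] <= level}


def dedupe_key(package, fixed_commit):
    """Normalize identifying fields so that the same issue reported by two
    databases collapses to one key (assumed key: package + fixing commit)."""
    return (package.strip().lower(), fixed_commit.strip().lower())


record = {
    "summary": ("public", "Deserialization issue in example-lib"),
    "vulnerable_methods": ("restricted", ["Parser.read"]),
    "analyst_notes": ("private", "Reproduced on 1.4.2"),
}
```

Here `visible_fields(record, "public")` would expose only the summary, while a restricted sharing group would also see the vulnerable methods.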

An official standard based protocol and schema

It's true there are various vulnerability exchange formats, VEX being the most widely used, but a VEX document is a form of a security advisory that indicates whether a product or products are affected by a known vulnerability or vulnerabilities. This is excellent and certainly needed, but I believe we need a standards based protocol to connect and maintain vulnerability databases.

If we think about DNS as a standards based protocol, it is able to manage top level domains, subdomains, allows people to provide their own DNS servers for efficiency, and even run private IP address space. Many lessons have been learned from DNS as a delegation protocol resulting in things like DNSSEC. To be clear I am not saying use DNS but I am saying we have lots of knowledge and prior art from things like electronic voting and DNS that could be considered when designing a scalable distributed system.

I have always believed that if you do anything for the lowest common denominator, then you resign yourself to mediocrity. That's true for most areas of life, from building teams, content, and software to things like my cycling club. When we have a “no drop” ride, meaning the faster riders have to wait for the slower ones, the faster ones simply don’t turn up. If you try to please all the people, you usually end up pleasing none of the people.

When creating document formats or schemas I have seen this lowest common denominator approach used, with the same result. In the vulnerability world this is dangerous, because it results in missing data and/or data that is not machine readable. For instance, the requirements of data for browser vulnerabilities are fundamentally different from those for open source libraries. In the former you need all the browser versions affected whereas in the open source world you need all the versions and all the corresponding commit hashes of those versions (see below).

I believe the right approach is a common set of top level fields that includes operational data such as ensuring global naming uniqueness, identity and authenticity of the database, reporters, analysis team etc and then a domain specific schema for the subject, for instance open source supply chains.

In an open-source supply chain schema I would want to see at a minimum the data we put into vulnerability reports at SourceClear including

- The analysis history (a log of who did what and when to look at the issue)
- Ratings such as exploitability tightly coupled with the report
- The repo or repos where the original code was
- The commit or commits where the vulnerability was introduced
- The commit or commits where the vulnerability was fixed (if fixed)
- The vulnerable methods and call chain information

A lot of this data could then be used to automatically and dynamically enrich the report, and be used for additional research.
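The split between common top level fields and a domain specific section, using the minimum open-source fields listed above, could be sketched roughly as follows. Every class and field name here is hypothetical; this is not the SourceClear format or any published schema, just one way the pieces might fit together.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class AnalysisEvent:
    """One entry in the analysis history: who did what, and when."""
    analyst: str
    action: str
    timestamp: str  # ISO 8601 string, e.g. "2022-12-01T10:00:00Z"


@dataclass
class OpenSourceDetails:
    """Hypothetical domain specific section for open-source supply chains."""
    repos: List[str]
    introduced_commits: List[str]
    fixed_commits: List[str] = field(default_factory=list)  # empty if unfixed
    vulnerable_methods: List[str] = field(default_factory=list)
    call_chains: List[List[str]] = field(default_factory=list)
    exploitability: Optional[str] = None  # rating kept with the report


@dataclass
class VulnerabilityReport:
    """Hypothetical common top level fields shared by every domain."""
    report_id: str   # globally unique name
    database: str    # identity of the issuing database
    reporter: str
    history: List[AnalysisEvent]
    domain: str      # selects which domain specific schema applies
    details: OpenSourceDetails

    def to_json(self) -> str:
        """Serialize the whole report, nested sections included."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```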

For instance, using the repo location, you can look at the forks and clones to determine if the same issue applies. There are 1,500 forks of Log4J-2!

There are computer science techniques like code similarity that are very effective at figuring out plagiarism, and if you have ever done a computer science degree, and tried to turn in code that you “borrowed” from a friend, you will know what I mean. Good ones even understand the common code refactoring algorithms.
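As one simple illustration of code similarity, the sketch below compares two pieces of Python source by replacing identifiers and literals with placeholders and then measuring the overlap of short token sequences (Jaccard similarity over token shingles). This is a deliberately basic technique chosen for brevity, not what any particular plagiarism detector uses, and the function names are my own.

```python
import io
import keyword
import tokenize

# Token types that carry layout or comments rather than program structure.
_SKIPPED = (tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
            tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER)


def normalized_tokens(source: str):
    """Tokenize Python source, replacing identifiers and literals with
    placeholders so that simple renaming does not hide copied code."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")
        elif tok.type == tokenize.STRING:
            tokens.append("STR")
        else:
            tokens.append(tok.string)
    return tokens


def shingles(tokens, k=4):
    """Set of every run of k consecutive tokens."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of the two sources' token shingles (0.0 to 1.0)."""
    sa = shingles(normalized_tokens(a), k)
    sb = shingles(normalized_tokens(b), k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)
```

Because names are normalized away, a function copied into a fork and renamed still scores as identical, which is the property you would want when checking whether a known vulnerable method lives on in forks and clones.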

We also need to be able to better understand who is committing vulnerable and malicious code by profiling developers. It doesn’t need to be a blame game, but the reality is that some developers are better than others and some are malicious. Like investments, past performance is not an indicator of future performance, but you wouldn't put your life savings in the hands of a fund manager with a bad record, or no record at all, so why use a library with similar conditions?

Capturing this data in vulnerability reports will allow us to automatically enrich new reports, go back and enrich older reports and use the information to go out and find new issues.

Incentivize people to validate and fix vulnerabilities and not just find them

Finally I think we need to always reward people that work hard and do good. They should be celebrated, and rewarded. Bug Bounties like those run by HackerOne and BugCrowd have changed the assessment landscape. Companies can post targets and how much they are prepared to pay for a bug report, and let anyone eligible go fishing.

I believe that companies that rely on open-source should offer rewards both to validate and enhance vulnerability reports and to help fix the subjects of the reports. The platforms and the processes to make this happen already exist, but we do need to create a model that lines up the incentives of those funding such work, just like the Alpha-Omega project.

As I pointed out in part one, the NVD does little to no verification that the issues are valid and we know that there is a lot of data missing to operationalize the finding, including the vulnerable methods. This type of work would be ideal for a bounty style approach, where existing researchers can pick up an additional knowledge domain, and we can improve the quality and completeness of new submissions.

Just as bug bounties have surfaced a global army of bug finders, we could mobilize a global army of bug fixers.

I took weeks grabbing an hour here and there to write this article, and in truth am not happy with it. This is such a huge topic and clearly re-designing a global vulnerability disclosure system takes more than a few hours, but I do hope these ideas are interesting, and prompt discussion about practical ways we can improve what we have today.