Why SCA for Security is Really Hard

Last week I wrote about the SBOM frenzy, discussing the challenges of understanding what dependencies you have in your software and whether you are actually using them. This post focuses on using your SBOM and vulnerability data to determine whether you have vulnerabilities in your software, and explains why most vulnerability data is not up to the job.

In 2013 I founded SourceClear, the first pure-play Software Composition Analysis (SCA) security company, which was acquired by Veracode in 2018. We originally set out to build fast intra-procedural static analysis, but found religion for SCA when we realised that no one at the time knew what OSS was in their code.

Today there are many SCA companies and tools to choose from, and seemingly every week another startup is funded to build yet another. The SBOM frenzy and the supply chain security frenzy are well and truly upon us. SCA is one of those problems that looks easy on the surface but turns out to be very hard to solve. The same problems that made it hard back in 2013 make it hard today. From what I read and hear, some of the new tools companies like Semgrep are doing a great job addressing some of the challenges, but the biggest fundamental challenge of all, poor vulnerability data, remains largely untouched, probably because it's simply not solvable by security tools vendors.

SCA is a no-brainer to improve the security of everyone's software development process and everyone should be doing it, but we should all be aware of the very real gaps that exist today, and not have a false sense of security.

CVEs often cite the wrong dependencies

At SourceClear our dedicated research team started pulling apart CVEs to look for the vulnerable methods. What we learned was that, over and over again, CVEs were not accurate. A report for a CVE in a framework like, say, Spring often turned out to be a vulnerability in a supporting library like the Xerces XML parser. It just so happened that the researcher stumbled over an exploitable call chain to the parser from the framework. A key implication, especially for crude SCA tools that simply grep for the use of the Spring library in the build file, is that the vulnerability in the Xerces parser would go unreported and every other framework or dependency that used it would be blind to the problem.

The only way around this is to staff up a team of researchers and verify every single relevant CVE. In 2021 there were 19,796 CVEs. That's over 54 vulnerabilities every day, including 7 critical severity CVEs daily. Only a handful of those are in OSS dependencies, and many of the dependencies reported are not widely used, so being pragmatic you are left dealing with a subset of the CVE population, but that subset is still not insignificant.

We found that great researchers can typically verify an average of 2 CVEs a day: determining the actual dependencies, finding the vulnerable methods and then verifying which versions had the issue and which ones had fixed it. Even if you ignore retrofitting your vulnerability database, we found that we needed a team of around 10 full-time researchers to keep up. We never actually had 10; we worked backwards using the resources we had. If the average security researcher salary today is $132,781 (which seems incredibly low to me), that is 10 x $132,781 = $1,327,810 a year to ensure you have accurate basic data.
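For anyone who wants to plug in their own numbers, here is a back-of-the-envelope version of that arithmetic. The figures are the ones quoted above; the variable names are mine, and treating "2 CVEs a day" as a flat daily rate (ignoring weekends and holidays) is a simplifying assumption.

```python
# Figures quoted in this post; swap in your own.
CVES_2021 = 19_796
DAYS_PER_YEAR = 365
RESEARCHERS = 10
CVES_VERIFIED_PER_RESEARCHER_PER_DAY = 2
AVG_RESEARCHER_SALARY_USD = 132_781

# Average number of CVEs published per calendar day in 2021.
cves_per_day = CVES_2021 / DAYS_PER_YEAR

# How many CVEs a team of this size can verify per day (assumed flat rate).
team_capacity_per_day = RESEARCHERS * CVES_VERIFIED_PER_RESEARCHER_PER_DAY

# Salary cost only; no benefits, tooling or management overhead.
annual_salary_cost_usd = RESEARCHERS * AVG_RESEARCHER_SALARY_USD

print(f"CVEs published per day: {cves_per_day:.1f}")
print(f"Team verification capacity per day: {team_capacity_per_day}")
print(f"Annual salary cost: ${annual_salary_cost_usd:,}")
```

The gap between what is published per day and what the team can verify per day is why you end up working on a relevant subset rather than the whole CVE population.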

Most vulnerable methods are not known

It's great to see companies like Semgrep add vulnerable method detection to their products. Really great. We did this at SourceClear in 2015 because we knew that over 90% of the time when Java developers included a vulnerable library, the vulnerable method was not being used, and the percentage was even greater in other languages like JavaScript and Python. That meant that without vulnerable method detection, 90 to 95% of all issues being reported were actually false positives. It was busy work. The technology to solve this, inter-procedural static analysis, is hard engineering but well understood. However, to actually solve the problem you need to know where the vulnerable method in the vulnerable dependency is in the first place. As you saw in the previous section, CVEs are often wrong and they very, very rarely have detail about the vulnerable method.
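To make the idea concrete, here is a minimal sketch of the reachability question that vulnerable method detection asks. It is not how SourceClear or Semgrep implement it. It assumes someone has already built a call graph (the genuinely hard part) and already knows the vulnerable method name (the data problem this post is about). The function and all method names in it are hypothetical.

```python
from collections import deque


def find_call_path(call_graph, entry_points, vulnerable_methods):
    """Breadth-first search over a call graph.

    call_graph maps a method name to the methods it calls.
    Returns the first call path from an entry point to a vulnerable
    method, or None if no vulnerable method is reachable.
    """
    vulnerable = set(vulnerable_methods)
    seen = set(entry_points)
    queue = deque((entry, [entry]) for entry in entry_points)
    while queue:
        method, path = queue.popleft()
        if method in vulnerable:
            return path
        for callee in call_graph.get(method, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, path + [callee]))
    return None


# Hypothetical application that ships a vulnerable library
# but only ever calls its safe method.
graph = {
    "app.Main.main": ["app.Config.load"],
    "app.Config.load": ["lib.Parser.parse_safe"],
}
print(find_call_path(graph, ["app.Main.main"], ["lib.Parser.parse_unsafe"]))

# Same application after someone adds a call to the vulnerable method.
graph["app.Config.load"].append("lib.Parser.parse_unsafe")
print(find_call_path(graph, ["app.Main.main"], ["lib.Parser.parse_unsafe"]))
```

In the first case a build-file scanner would still raise an alert, even though nothing in the application can reach the vulnerable code. That is the false positive described above.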

Silent Patches

Many dependencies silently fix security issues. You find this out either from a note buried at the bottom of the release notes or by watching every issue and every commit. We used to be sneaky and built a tool called commit watcher (now abandoned) that looked for commits that mentioned keywords like security or vulnerability in the comments. We also noticed that when an issue went from being public to private it was often a sign that it had been classified as a security issue, and we built tools to detect that as well. Despite the policies of organisations like Apache, hiding security fixes is a very common practice.
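The keyword idea is simple enough to sketch in a few lines. This is not the original commit watcher code, just an illustration of the approach. The function name, the (sha, message) input shape and the exact pattern are my assumptions, and a real tool would need a much richer keyword list and a way to fetch commits.

```python
import re

# Matches "security" and words starting with "vulnerab"
# (vulnerable, vulnerability, vulnerabilities), case-insensitively.
SECURITY_PATTERN = re.compile(r"\b(security|vulnerab\w*)\b", re.IGNORECASE)


def flag_commits(commits):
    """Return the SHAs of commits whose message mentions a security keyword.

    commits is an iterable of (sha, message) pairs.
    """
    return [sha for sha, message in commits if SECURITY_PATTERN.search(message)]


# Hypothetical commit log.
commits = [
    ("a1", "Fix typo in README"),
    ("b2", "Harden XML parser against XXE (security)"),
    ("c3", "Patch vulnerable deserialization path"),
]
print(flag_commits(commits))
```

Keyword matching only catches maintainers who mention the fix at all. A truly silent patch with a message like "refactor parser" sails straight through, which is why we also watched for issues changing visibility.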

Clones and forks

When a dependency is cloned or forked it carries with it the same code DNA. There are a number of scenarios where that DNA may or may not be lost, but in my experience, in many cases, when a parent dependency is vulnerable the child dependency is as well. These are the forks of Log4J2: https://github.com/apache/logging-log4j2/network/members

Someone with more time than me could calculate all the versions that were forked from 2.0-beta7 to 2.17.0, excluding 2.3.2 and 2.12.4, and determine how many vulnerable versions have not been fixed. If you were to then look at the people that have starred those versions on GitHub, you would have a good list of hacking targets.
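The first step of that exercise, deciding whether a given version falls in the affected range, could look something like the sketch below. It assumes the range is inclusive at both ends and that fork versions follow the simple "major.minor[.patch][-alphaN|-betaN|-rcN]" shape, which many forks will not. The helper names are hypothetical, and actually listing forks and their versions is left out entirely.

```python
import re

# Pre-release ordering: alpha < beta < rc < final release.
_PRE_RANK = {"alpha": 0, "beta": 1, "rc": 2}
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-(alpha|beta|rc)(\d+))?$")

LOW, HIGH = "2.0-beta7", "2.17.0"
EXCLUDED = {"2.3.2", "2.12.4"}  # fixed versions inside the range


def version_key(version):
    """Turn a version string into a tuple that sorts in release order."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, pre, pre_num = match.groups()
    # A final release sorts after all of its own pre-releases.
    pre_part = (_PRE_RANK[pre], int(pre_num)) if pre else (len(_PRE_RANK), 0)
    return (int(major), int(minor), int(patch or 0)) + pre_part


def in_affected_range(version):
    """True if version is within LOW..HIGH (inclusive) and not excluded."""
    if version in EXCLUDED:
        return False
    return version_key(LOW) <= version_key(version) <= version_key(HIGH)


for v in ["2.0-beta6", "2.0-beta7", "2.3.2", "2.14.1", "2.17.0", "2.17.1"]:
    print(v, in_affected_range(v))
```

A version number alone is only a hint for a fork, of course. A fork can keep the parent's version string while having patched the code, or bump the version while still carrying the vulnerable DNA.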

Many vulnerabilities are private

SCA vendors like SourceClear understand that FOMO sells, and that having your own stash of vulnerabilities is highly valued by some customers. Our policy was to publish them all in our database after responsible disclosure but not to go back and claim a CVE. Other vendors have different practices. Today no one vendor has a complete and authoritative database of known vulnerabilities. Excellent projects like OSV are setting out to improve that situation, but unless you pay for and use all the tools you are missing out.

Software vendors hide CVEs behind CNAs

CVE Numbering Authorities (CNAs) were created to enable vendors of products to coordinate their own responsible disclosure. They effectively became the clearing house for CVEs reported in their products. The result was that a vendor could mark a report as reserved and, under some circumstances, take up to two years to disclose it, effectively silencing the vulnerability. I don't know if this still happens, but it certainly did.

Dependencies that are not maintained create a developer's dilemma

When we found a vulnerability at SourceClear and reported it to a dependency maintainer, we found that over 50% never responded to repeated notifications (we made 3) and around 10% let us know they had abandoned the code. When faced with that situation, a developer has two choices: fix the vulnerability themselves and effectively become the library maintainer, or move to a different library.

As you can see, there are very large, deep-rooted and very real issues with vulnerability data about open source dependencies. There are commendable efforts by organisations such as the OSSF and Google to improve this situation and we are already seeing change, but even with their resources it will take time to see radical improvement, if ever.

As I said earlier in this article, SCA is a no-brainer to improve the security of everyone's software development process and everyone should be doing it, but we should all be aware of the very real gaps that exist today, and not have a false sense of security.