We Need Modern Product Security Certification And We Need It Now

Certification blog

To restore trust in the claims of security vendors, we need independent testing

If you have been reading my articles, then you will already know that I am on a mission to try and clean up the security industry. I am not alone. We have bullshit paid awards that make desperate startups look successful, all the while lining the pockets of unscrupulous marketing companies. I don’t know who on either side of that equation thinks they are kidding. It is like a local public toilet buying an award from the local toilet cleaning company.

We have unethical marketing teams claiming that their products have zero false positives. They magically perform flawlessly against industry benchmarks that, of course, are not actually industry benchmarks. I have cases going back to 2017 of ‘public’ benchmarks written by, and fully loaded in favor of, a vendor. I have another case of a DAST benchmark, which one vendor wins against, that is ten years old.

We now, of course, have vendors touting their advanced use of AI and LLMs to brand themselves as ‘next generation’ technology.

The problem is in fact so bad that, even after we pushed the website (it may look like it is in a lull, but it is just getting going, trust me), some startups (1 and 2) that are themselves complicit in participating in silly awards, and even one that runs a directory for people looking for them called The Security Awards Guru, tried to get in on the action and pledge to stop it. The absolute gall of it. It's like Shell and Exxon signing a pledge to be environmental campaigners.

Dan Cuthbert posted a thought-provoking tweet yesterday about how we really have no idea what security practices take place during software development. This, and the benchmarking research I have been doing for my talks on the Myths of Software Security (the next public one is in Brussels next week), got me thinking. How can consumers ever trust the products and services being sold in an industry with this kind of marketing behavior and this lack of transparency about how products are built?

It seems to me that the only way to do this is a certification scheme that provides assurance in a scalable way. The downside? I have no fucking clue how this could possibly work in reality today, but I have a toy proposal that I think is worth discussing, and that could at least raise the bar from the bottom of the Mariana Trench to perhaps sea level. ‘Raise the bar’ is the key phrase. This would by no means solve the problem, but if we wait for perfect or even good solutions, it will be just like the story about Windows NT below: what we come up with will be irrelevant by the time it's operational.

To judge this proposal on its merits, or tear it apart, it's worth looking at some of the ways previous certification schemes worked, to understand why much of the prior art is not relevant today.

Since 1994 we have had the Common Criteria, which I first learned about through its predecessors, the Orange and Red Books, when I was studying Information Security at Royal Holloway in 1997. To my absolute amazement, products are still being certified today. I guess people are lining up to buy certified network security products from Huawei. Some people will never learn. There are many other schemes, but this is the one I know most about, so I will use it as an example to illustrate my points.

Putting aside the relationship with the US government, one of the first problems with the Common Criteria was the complexity of review and the time it took. For instance, when Windows NT was first certified, it took years from submission to certification. By the time the certification was complete, the version suitable for deployment, with all of the relevant patches, no longer matched the original target of evaluation. At the risk of sounding big-headed (I’ll do it anyway), I proved, cryptographically, that the Windows password scheme was broken for my Master's thesis. It took me two weeks. PHAC107 was my VAX username, if you have the Computer Security book and look at the Windows section. I did it at the same time as L0pht built L0phtCrack and Alec Muffett built Crack, which, by the way, I had been running on a very powerful machine at the time that wasn’t mine.

That is the first of several confessions in this article. Bless me, industry, for I have sinned, but at least I am honest about it and know where the bodies are buried.

This time-to-result challenge is, of course, unworkable in today's environment, where DevOps means we are pushing software constantly. If you certify one version of the code, the version you certified is almost certainly not the version running in production, which is the one you care about, unless the certification was 100% automated.

Another related problem (they are all related) was what is referred to as the target of evaluation. At the time I was studying, Windows NT was certified against what was called the Orange Book criteria, essentially a computer with no networking, rather than the Red Book, which covered a system with networking. That was because the target was significantly smaller, and so simpler to evaluate, than a system connected to a network, where the network stack and what passes over it have to be considered. This in turn led to a faster evaluation process, but one which I think was totally useless. With an air-gapped system, so many threats go away that physical protection becomes the main concern, so that's what you would focus on.

The target of evaluation for software, especially online applications, is incredibly complex today. You have to consider the infrastructure, not only as a dependency (recommended reading: What the bloody hell is an application?), but also as infrastructure as code. You also need to consider all of the APIs and other applications being consumed. Seen in that light, the Orange Book criterion of no networking starts to look like an understandable option for the time.

Another problem with certification is who does it, and here I have my second confession to make. When I ran the product team for a very, very large subscription business that also processed the credit cards used to pay for it, we were PCI certified, whatever that actually meant. I used to buy two tests each year from two services firms, one very good and one totally useless. I knew we would always pass with the totally useless one, so I got the good firm to do their review first and the useless one last. Every year, the good one produced meaningful results that my team acted on and, while expensive, was good value for money. The useless firm produced nothing meaningful, always passed me and, because they were dirt cheap, was amazing value for money. Blame the game, not the player.

I learned this trick from being on the other side of the equation years before, when I ran a consulting team. Yet another confession. PCI certification was, and is, such a commodity that if you wanted to make money from it, you did the bare minimum with your cheapest staff: interns. We had a human checklist, an interview script and a set of Perl scripts to do the technical evaluation.

Another issue with the Common Criteria was that you needed to put the source code into escrow for review. While this sounds like a relatively simple process, it's not, and it wasn’t even back in the day. Getting the software to build so you can match the source to the object code is a bastard in itself. Before that, you need to know what code is needed in the first place. Back in the day that was a million build scripts; today it's probably a load of Terraform, Ansible, Python and shell scripts. Yes, shell scripts. Reproducible builds are a pipe dream for the vast majority of people today.

As far as I recall, and I may be completely wrong on this but don’t have the time to really look it up, the Common Criteria had no notion of evaluating the practices used to create the software. Sure, it was an evaluation of a target and not the supply chain, but as we know, the supply chain has a significant effect on the assurance of the software that pops out the other end.

So what does all this teach us?

  1. Who (or what) does the evaluation matters. Different evaluators will produce different results, and if the evaluation has an economic component, it will be gamed.

  2. Time matters, so any scheme today would need to happen fast. Very fast.

  3. The target of the evaluation is critical. It must be possible to match the source to the target. You need to bind the deployed software to the code repo and the build.

  4. The process used to build software cannot be tied directly to specific issues, but it is indirectly related to software quality and so does provide a level of assurance.

So, given that in my eyes we have nothing whatsoever that's practical for everyday use today, what could we do that would at least raise the bar?

  1. I think there could be a scheme administered by a truly independent body, a World Bank type of thing. I am sure there are better options, and for total transparency, I worked in the security team for the IBRD in my twenties, so it was the high-integrity organization that immediately came to mind. Either way, I think a body that has a vested interest in raising the bar for the global economy is a better option than something with ties to the industry. You can't have the payment card industry trying to transfer risk to businesses. You can’t have the US government, which many people don't trust. If a truly independent body set out the rules for a scheme and provided oversight, it would be a step forward. We could allow people to self-certify with random audits and, of course, a giant black eye for anyone found gaming the system. If large buyers had procurement requirements that demanded certification and, say, Oracle were caught gaming the system, it would have massive financial ramifications for them.

  2. We need to accept the limitations of automated tools, and accept that speed of evaluation trumps accuracy and completeness. Unless they are total garbage, they are directionally correct. Being able to produce a result every time a build runs, bound to point 3 below, would again raise the bar. To understand even the directional correctness of automated tools, we would of course need reasonable benchmarks to judge the merits of the tool in question. That does not mean a DAST company using a benchmark repo abandoned ten years ago and producing near-perfect results. No one should ever buy a product from a company with that level of ethics, and if you do some research on them, you will find an independent test showing that the results they claim are bullshit anyway, and those are results against a loaded test bed.

  3. You have to know what's being evaluated. This is no easy problem to solve, as very few online services will ever want to share their source code, so the best we could probably hope for is a scheme that would allow a ‘trace back’ in the event of a prosecution or official audit. Once again, for the sake of full transparency, we are about to release an open-source tool and a commercial offering that solve the code provenance problem for organizations. They are aimed at different use cases: knowing what code is deployed where, so you can focus on what matters (what's in production) and ignore the rest; knowing where vulnerable libraries are deployed, rather than trying to update archived software; knowing who owns the code in production; and more.

  4. Creating a way for software producers to attest to the security practices they follow, and a way for that attestation to be audited, would at least provide some level of assurance that, combined with the other factors, would be useful. You could determine basics such as whether there is a security team, whether it follows practices like threat modeling, and on what targets. I know of well-known startups producing online security tools that manage very sensitive data with no security program and zero security staff. We have Rich Smith, former CSO of Etsy, in case you were wondering.
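Points 2, 3 and 4 all come down to tying evidence to one specific build. As a minimal sketch, assuming a build pipeline that can hash its own output, the Python below binds an artifact's SHA-256 digest to the commit, a build ID, the result of an automated scan and a list of self-attested practices, then checks whether what is running in production is what was evaluated. The `ProvenanceRecord` type, its field names and the helper functions are hypothetical, invented for illustration; they are not the API of our upcoming tool or of any existing standard, and a real scheme would also need the record to be signed so it cannot be forged.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical record tying one build output to its source and its evidence."""
    artifact_sha256: str   # digest of the exact bytes that get deployed
    commit: str            # source revision the artifact was built from
    build_id: str          # identifier of the CI run (assumed field)
    scanner: str           # automated tool run as part of that build (assumed)
    open_findings: int     # directional, not complete, result from that tool
    practices: tuple       # self-attested practices, e.g. "threat-modeling"


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def make_record(artifact: bytes, commit: str, build_id: str, scanner: str,
                open_findings: int, practices) -> ProvenanceRecord:
    """Create the record at build time, so scan result and source share one digest."""
    return ProvenanceRecord(sha256_hex(artifact), commit, build_id,
                            scanner, open_findings, tuple(practices))


def canonical_json(record: ProvenanceRecord) -> str:
    """Stable serialization an auditor could hash or sign (signing not shown)."""
    return json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))


def matches_deployed(record: ProvenanceRecord, deployed: bytes) -> bool:
    """'Trace back' check: is what runs in production what was evaluated?"""
    return sha256_hex(deployed) == record.artifact_sha256


if __name__ == "__main__":
    build_output = b"example build output"
    record = make_record(build_output, commit="3f2a9c1", build_id="ci-1042",
                         scanner="example-sast", open_findings=3,
                         practices=["threat-modeling", "code-review"])
    print(canonical_json(record))
    print(matches_deployed(record, build_output))              # True
    print(matches_deployed(record, build_output + b"hotfix"))  # False
```

The digest comparison says nothing about whether the code is secure; it only shows that the thing being judged is the thing being run, which is the binding of deployed software to repo and build that the lessons above call for. Everything else in the record is only as trustworthy as the audits behind it.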

Many buyers already send vendor questionnaires asking vendors and service providers about their own security. The Cloud Security Alliance's STAR scheme provides a way to pre-publish reusable answers to a central registry, with the goal of not having to repeat the same effort over and over again. I think the concept is solid, but the CSA is, as far as I am aware, self-funded, with all that implies about needing to attract and retain submitters.

  • Would this provide assurance that software is being produced with security in mind? No.

  • Would this provide assurance that software is being produced that is free of defects? No.

  • Would it at least provide some basic level of transparency into how securely the software is produced and into its security quality? Yes.

  • Is that better than what we have today? Yes. 

Will it happen? I doubt it, although, as a teaser, maybe we could take it on under the new software security foundation that is being set up now. Yes, it's happening!
