This week is code ownership week for us, and this is the first of three articles. It sets the scene on why code ownership matters, what is needed in a code ownership system, and why code owners files, the de facto feature in GitHub and GitLab today, fall short of being a code ownership system themselves. It then goes on to explain why adopting code owners files as a first step in building a code ownership system still makes sense for most organizations.
The next article in the series is a platform walkthrough, showing how to use a Crash Override campaign to make sure that you have code owners files for all the code that you have in production, the very first thing you should do when creating a code ownership system. To wrap up this short series, the third article will be a deep technical dive into how we automatically infer the code owners from Git commit histories, solving one of the deficiencies in using vanilla code owners files today. Inferred code owners files offer automated, high-fidelity and always up-to-date code ownership data.
It should go without saying that code ownership is essential for any DevOps team and needed far beyond knowing who to assign bugs to. When a production service goes down, you need to know who the developers are that you need to talk to in order to diagnose the issue and bring the system back online. If you have an automated on-call system, you certainly want to make sure you are paging the right person in the middle of the night. When your security tools find a critical vulnerability in production, you want to know who the right person is to get it fixed, and help them avoid making the same mistake again. If you are a developer, you want to know who to talk to when a service that you rely on starts throwing errors, and takes down the service you are responsible for along with it. The list goes on.
On the surface it seems like such an easy problem to solve, and you would be forgiven for thinking everyone already does it and does it well. At a small company this may be the case because you can manage things manually or in your head, but when you move beyond a few hundred repos, and a hundred or so developers, it quickly becomes an absolute bloody nightmare, and it's a problem that gets worse and worse as companies and code bases grow older.
I know of a well-known technology company that builds developer tools, which itself took days to find out who owned the code that was causing a service outage. It's one of those things, like eating your vegetables, that everyone knows they need to be doing all of the time, but most people don't until it's too late. Then, when they get caught with their pants down, they try to do better, only to get bored or sloppy or busy with other things over time, and it's rinse and repeat. Pants down by your ankles once again.
I am not going to try and tell you that we have solved code ownership, in fact quite the contrary, but I think we have solved a few valuable first steps, and from working with design partners we now understand enough to feel we are on a path to a complete solution. What we know from feedback is that what we have shipped so far, the code owners file campaign feature you will see in the walkthrough article coming next, has already helped our early users do significantly more with their code ownership efforts, with far fewer resources than they needed before. It has saved people from tearing their hair out, getting frustrated and wasting countless days on busy work.
There are two main parts needed for a scalable code ownership system:
- A Catalog
- A Change Ledger
If you don't have both of these then you simply don't have a code ownership system. As you will see below, if you just have static code owners files, then you just have, well, a set of static files with some somewhat relevant but likely stale data in them. It's definitely not a code ownership system. It's the crawl in the crawl, walk, run analogy.
The Catalog
A code ownership system must have an up-to-date record of the code and where it is deployed, and in particular it must have this for the code that is running in production. This is part of a Catalog, a core feature of the Crash Override platform. In Engineering Relationship Management parlance, the Catalog is the set of data records.
With the thousands, and even hundreds of thousands, of repos that we frequently see and hear about in big companies, many of which are just forks of each other, hackathon projects, abandoned ideas, or clones of public open-source projects, you have to be able to cut out the noise and focus on what matters. Scoping your ownership target down to production code is a pragmatic initial approach. There are cases where you need to go beyond that, such as retiring dormant repos, but for now we will focus on live systems.
Where the catalog starts to get interesting is that it’s not just a map of what repo is deployed to what cloud service, it’s knowing exactly which commit is running where, and has run where at any point in time. There is no point in contacting Fred who works on an experimental branch, and is named in the code owners file, if he doesn't really know what's happening in main, and can’t debug production when you call him. This tracing and mapping of what code, not just what repo, gets further complicated with mono-repos, something we increasingly see.
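To make the "which commit was running where, and when" idea concrete, here is a minimal Python sketch. It is not how the Crash Override Catalog is implemented. The `Deployment` record, the `commit_running_at` helper, and the service, repo and commit values are all hypothetical, and it assumes each deployment stays live until the next one for the same service replaces it.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Deployment:
    """One hypothetical catalog record: a commit going live on a service."""
    service: str
    repo: str
    commit: str
    deployed_at: datetime


def commit_running_at(deployments, service, when):
    """Return the Deployment live for `service` at time `when`, or None.

    Assumes a deployment stays live until the next one for the same service.
    """
    history = sorted(
        (d for d in deployments if d.service == service),
        key=lambda d: d.deployed_at,
    )
    times = [d.deployed_at for d in history]
    idx = bisect_right(times, when)  # number of deployments at or before `when`
    return history[idx - 1] if idx else None


# Illustrative data only: two services deployed from the same mono-repo.
deployments = [
    Deployment("checkout", "shop/monorepo", "a1b2c3d", datetime(2024, 5, 1, 9, 0)),
    Deployment("checkout", "shop/monorepo", "e4f5a6b", datetime(2024, 5, 3, 14, 30)),
    Deployment("search", "shop/monorepo", "0c9d8e7", datetime(2024, 5, 2, 11, 0)),
]

live = commit_running_at(deployments, "checkout", datetime(2024, 5, 2, 23, 0))
print(live.commit if live else "nothing deployed yet")  # prints a1b2c3d
```

Keying the lookup on the service rather than the repo is what lets a mono-repo map to several services, each with its own live commit.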
It is painfully obvious you can’t map code to services manually, although some Internal Developer Portals still try. Heaven forbid people use spreadsheets but we know they do. Talk about busy work that you could never finish.
You can roll your own code-to-cloud map by hand using Chalk (chalkproject.io), but we have already done it all for you and wrapped up all the features you need to operationalize it at scale, so you don't need to.
Now that you know more about what you need to know about the code itself, you also need to know about the people working on it and determine which of them are 'the owners'. That's the code-to-people mapping.
The Change Ledger
It's worth starting by saying code ownership isn't actually a great term. It's really a developer activity system. The pointy-haired boss may be considered the owner of an application, but he probably doesn't know how it works, so you could describe him as the ceremonial owner and not the real owner. My kids were the ceremonial owners of our dog until Mrs C. emerged as the real owner. She's the one that walks her and takes her to the vets. Code owners is also too broad of a term. Developers might be considered owners of repos, owners of files or even owners of functions, but the term is so widely used, let's just roll with it.
The only way to determine 'who did what' is to maintain a change ledger of all activity across your entire DevOps process, and use that information to derive the 'owner' for a particular use case, whether that is a repo, a file or something else, from a change or a set of changes.
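As a rough illustration of deriving an owner from a set of changes, the Python sketch below ranks authors per file by how many lines they have changed. This is a deliberately naive heuristic chosen for illustration, not the inference method Crash Override uses (that is the subject of the third article). The `Change` record, the `infer_owners` function and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One hypothetical ledger entry: who changed which file, and by how much."""
    author: str
    path: str
    lines_changed: int


def infer_owners(changes, top_n=1):
    """Map each path to the `top_n` authors with the most changed lines.

    Naive on purpose: it ignores recency, reviews, deployments and identity.
    """
    totals = defaultdict(Counter)
    for change in changes:
        totals[change.path][change.author] += change.lines_changed
    return {
        path: [author for author, _ in counts.most_common(top_n)]
        for path, counts in totals.items()
    }


# Illustrative ledger entries only.
ledger = [
    Change("fred", "billing/invoice.py", 12),
    Change("mina", "billing/invoice.py", 240),
    Change("mina", "billing/tax.py", 35),
    Change("fred", "experiments/idea.py", 400),
]

owners = infer_owners(ledger)
print(owners["billing/invoice.py"])  # prints ['mina']
```

Even this toy version shows why the granularity matters: Fred is the top contributor overall, yet he is not the person to call about `billing/invoice.py`.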
A change ledger contains everything about the DevOps process, and for us Chalk serves as the primary engine that collects data for our change ledger. It records what happened inside a build, where containers were pulled from, what scripts ran post-build and lots, lots more. It's all the metadata that can be captured in a Chalk report and correlated against your source code management system and cloud infrastructure.
A code ownership system must have an up-to-date record of who did what to that code. If it doesn't, it is just a 'white pages' book, a directory of people and how to contact them. You need to know who they are, but you also need to have all the information about them, normally their activity, because as soon as you contact someone with an ask, the first thing they will say is "Sure, I know about that, what's up?".
It's also important to understand how much you can trust the data in a catalog. Most developers commit code with their public GitHub account. It's a single Git config on their local filesystem, and they then flit between their personal and their work orgs, using little more than a GitHub handle and an email. Some orgs may enforce signed commits, but let's face it, that is the letter bomb problem. Knowing that a letter came from the Unabomber ain't that helpful unless the signature validates an identity you trust. Krasnow committed this cryptographically signed nasty code, but who the hell is Krasnow?
Beyond authentication and authorization, if you don't force your GitHub org to use your IDP, then you also have no idea whether a person you cite in a code owners file is still with the company and should be considered a valid owner. This is a massive problem, not only for access provisioning and de-provisioning but also for supportability after engineers have left a team. In my opinion, a big security value proposition of paying for GitHub Enterprise is being able to leverage SSO to track users back to corporate identity providers and validate that each one is an identity you trust.
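The check this implies can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `stale_owners` function, the handle-to-identity mapping (which in practice would have to come from SSO-linked accounts) and the set of active identities (which would come from your IDP).

```python
def stale_owners(owners, handle_to_identity, active_identities):
    """Return listed owners that cannot be tied to an active corporate identity.

    An owner is stale if the handle has no known corporate identity at all,
    or if that identity is no longer active in the identity provider.
    """
    stale = []
    for handle in owners:
        identity = handle_to_identity.get(handle)
        if identity is None or identity not in active_identities:
            stale.append(handle)
    return stale


# Illustrative data only.
listed_owners = ["@krasnow", "@mina", "@fred"]
handle_to_identity = {
    "@mina": "mina@example.com",
    "@fred": "fred@example.com",
}
active_identities = {"mina@example.com"}

print(stale_owners(listed_owners, handle_to_identity, active_identities))
# prints ['@krasnow', '@fred']
```

Here `@krasnow` is flagged because nobody knows who that handle maps to, and `@fred` because his corporate identity is no longer active.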
This is hardly a product requirements specification for a catalog, which has a lot more connected data such as code, infra, builds, tools and technologies, all connected together and associated with the changes across everything. But it should give you a flavor of what is needed, and it sets me up well to explain why code owners files aren't a code ownership system or even close. The TL;DR is they are just a file.
What is a code owners file and why using one isn't the same as having a code ownership system
GitHub's help page for code owners says 'You can use a CODEOWNERS file to define individuals or teams that are responsible for code in a repository.' GitLab's help page is a little more specific.
Essentially a code owners file is a static file used to list, manually by default unless you add third-party tools, the set of people who are responsible for a repo. There are specs for where it is located, its format and even how it is protected. You can think of it as a special type of README file.
Among the challenges with this are:
- They are static (unless you do additional things), so they go stale fast
- The syntax is loosely defined and easy to mess up
- They are not part of a default repo unless created by policy, and they are not widely used
- They are not tied to real identities (see above) or IDPs
- They are only designed for simple ownership info, not developer activity
And that is the subject of the next article.
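To show how easy the syntax is to mess up, here is a small Python sketch of a linter for a GitHub-style CODEOWNERS file. It is a simplification, not GitHub's or GitLab's actual parser: it assumes each rule is a path pattern followed by zero or more owners, that an owner is an `@user`, an `@org/team` or an email address, and that anything after a `#` is a comment. The `parse_codeowners` name and the sample file are hypothetical.

```python
import re

# Simplified owner shapes: @user, @org/team, or an email address.
OWNER_RE = re.compile(r"^(@[\w.-]+(/[\w.-]+)?|[^@\s]+@[^@\s]+\.[^@\s]+)$")


def parse_codeowners(text):
    """Return (rules, problems) for a CODEOWNERS-style text.

    rules    -> list of (pattern, [valid owners])
    problems -> list of (line number, [owner tokens that look malformed])
    """
    rules, problems = [], []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments (simplified)
        if not line:
            continue
        pattern, *owners = line.split()
        bad = [owner for owner in owners if not OWNER_RE.match(owner)]
        if bad:
            problems.append((lineno, bad))
        rules.append((pattern, [owner for owner in owners if owner not in bad]))
    return rules, problems


# Illustrative file: line 3 forgets the leading "@" on a team name.
sample = """# Hypothetical CODEOWNERS file
*.js    @js-owner
/docs/  docs-team @org/docs
/build/ dev@example.com
"""

rules, problems = parse_codeowners(sample)
print(problems)  # prints [(3, ['docs-team'])]
```

A missing `@` like the one on line 3 is exactly the kind of mistake that is easy to make by hand and easy to miss in review.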
So why should you still use code owners files as the first step in building a code ownership system?
Code owners files don't just serve as white pages. They can be used for Git workflow automation, including:
- Automated Review Requests: Automatically request reviews from relevant contributors when changes are made to specific files or directories.
- Protected Code Areas: Enforce code ownership to ensure that critical parts of the codebase are not modified without approval.
They are also a built-in feature of GitHub and GitLab, so developers know about them and embrace them. If you want a small win on the way to getting a code ownership system in place, they are a relatively easy first step. A win is a win. Advanced teams and technologies are also building automation around them, such as connecting them to pager systems. So as deficient as they are for code ownership, you can look at them as a small component or building block that provides some data and moves the ball forward.
I view them as similar to SBOMs, which list components but don't themselves solve use cases like vulnerability management or library updating. They can be used for this, but they aren't the solution. SBOMs are of course way more sophisticated than code owners files, with detailed specifications for their creation, exchange and much more, and they are far more widely adopted, go figure.
Working with design partners, we learned that the first step they wanted to take in building a code ownership system was making sure they had a code owners file for everything in production. The campaign you will see in the next article allows teams to quickly filter all of their repos down to the ones in production, see which ones do and don't have code owners files, and then easily raise a PR to get them created. When doing this we also automatically infer who the real code owners are for the repo and the files within it, the technology we will dive into in the third article of the series.
We would love to show you what we have built. We’re not going to bend your ear or twist your arm to sell you a solution you don’t need. Instead, we’ll take 30 minutes to demonstrate what Crash Override can do and how it solves DevOps challenges.
Book a demo with us here https://crashoverride.com/vip-tour
