So after taking a brief break to write about Twitter, because that’s everyone’s new favorite hobby, I wanted to revisit part of my central thesis in my posts on platform engineering: that it’s hard to find places with actual cross-functional teams capable of doing everything needed to build and run an application or service from concept to being used in production. I’m not totally sure why this is something that organizations don’t want to do, but I still don’t think platform engineering is the solution (or as I’m sure some companies will try to spin it, “compromise”).
Part of this is likely because some people are part of the software development process but not heavily involved in every single ticket. For example, a technical writer documenting a bugfix or a small change to an existing feature doesn’t have much to do on that ticket. Putting one per team can seem like overkill when one writer can support multiple teams. Another part of this is that some roles are best served being organization-wide instead of per-team, like UI/UX designers, who need to keep the look and feel consistent across everything.
Operations seems like it falls into both of those buckets, especially if you’re not deploying at the end of every sprint (or more often), because then it’s easy to assume there’s not as much for operations to proactively do. Having “paved roads” is a good practice (and what you get in an organization that focuses on getting platform engineering right), and that does require operational expertise sitting outside the day-to-day development. The problem with this thinking is that operations is an integral part of day-to-day software development, particularly with the rise of the “you build it, you run it” philosophy. With developers actively supporting their applications in production, it makes no sense to isolate them from operations. They’re generally still on the hook for support, and are going to have to be called anyway if there are issues that can be traced back to code. Even if developers have visibility into what’s happening in production, the team responsible for the software in question still lacks the ability to do anything about it.
Separating development and operations puts a wall between the people writing code and that code running in production. In theory, code only crosses that wall through a process that is supposed to enforce quality. In practice, you can replace “process” with “checklist,” and the only effect is that getting changes deployed becomes slower and more painful, without necessarily increasing quality. Siloing off the ability to deploy and run code from the team responsible for building it doesn’t change the level of testing that goes on before trying to push to production. It just adds emails, meetings, and delays to getting changes that make users’ lives better out into production, where those changes can actually make money. The reason DevOps started taking off is that when companies actually embraced it, code was written with operational concerns in mind, and processes were adapted to enforce quality throughout (rather than through a formal handoff with a checklist).
Often people will point out that the solution to this problem is continuous deployment or platform engineering. Continuous deployment requires a level of operational maturity that most companies don’t have, and platform engineering is likely to be implemented about as well as Scrum and DevOps, so I’m not optimistic about it actually working, despite its clear potential for improving software development. Embracing platform engineering also doesn’t excuse companies from having application and service developers work closely with the operations team building and maintaining the operations platform. After all, platform engineering is about building an “internal development platform,” and you can only build that platform if you understand what your developers need in order to run their code efficiently and reliably.
Putting every single role that’s involved in releasing software onto each and every development team is a lot harder than it seems at first, because some roles actually do better as a separate team serving the entire organization, some roles don’t have enough work to justify being at the team level, and some companies just refuse to put necessary people on teams under the mistaken belief that separating people who need to work together helps ensure quality. Operations tends to be the third category. Even if you push your development teams to own their own code in production, you still need to make sure those teams are equipped with the expertise to do that well. Current software development trends certainly encourage operations teams working at the organizational level, but that’s not actually a license to “wall off” developers from operations, nor does it mean you’re adopting platform engineering. Separating development and operations back into two teams is something that only works after different teams with both developers and operators have worked together long enough to establish organizational best practices and figure out a way to turn those into reusable tools, libraries, and application templates.