I’m generally more of a “blame the tools” than a “blame the people” person. Depending on how the system is set up and how the configs are generated, it’s easy for a change like this to slip by, especially if a bunch of the diff is autogenerated. It’s still humans doing the code review, and this kind of failure indicates process problems, regardless of whether laziness or stupidity was also present.
But, yes, a second mitigation here would be defense in depth. In an ideal world, all your systems use the same ops/deploy/etc. stack; in this one, you probably want a couple of extra steps in the way of potentially taking a large public service offline.