> clusters should be treated like cattle, not pets. Off-topic, but is this reall...

slovenlyrobot · on March 4, 2020

The most common thing I've heard is "blast radius reduction", i.e. the general public are not yet smart enough to run large shared infrastructures. That seems like something that should be obviously true.

People had exactly the same experiences with Mesos and OpenStack, but k8s has decent tooling for turning up many clusters, so there is an easy workaround.

yongjik · on March 4, 2020

I still feel like that would only work in very niche cases.

I mean, if people aren't smart enough to run a large shared infrastructure, how can I trust them to run a large number of shared clusters, even if each cluster is small? The final scale is still the same.

iampims · on March 5, 2020

Updating 100 clusters bears less risk than updating a single giant one.
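To make the risk argument concrete, here is a rough Python sketch of a wave-based rollout that halts at the first failing wave. It assumes equally sized clusters and that a bad upgrade stays contained to the cluster it hits; `staged_rollout`, `upgrade`, and the wave sizes are hypothetical names and values for illustration, not any real tool's API.

```python
from typing import Callable, Dict, List, Sequence


def staged_rollout(
    clusters: List[str],
    upgrade: Callable[[str], bool],       # hypothetical: returns True on success
    wave_sizes: Sequence[int] = (1, 5, 25),  # assumed canary-style wave sizes
) -> Dict[str, object]:
    """Upgrade clusters in growing waves and stop after the first wave with a failure."""
    upgraded: List[str] = []
    failed: List[str] = []
    remaining = list(clusters)
    sizes = list(wave_sizes)

    while remaining:
        # Once the configured waves run out, take everything that is left.
        size = sizes.pop(0) if sizes else len(remaining)
        wave, remaining = remaining[:size], remaining[size:]
        for name in wave:
            (upgraded if upgrade(name) else failed).append(name)
        if failed:
            break  # halt the rollout; untouched clusters keep serving

    return {
        "upgraded": upgraded,
        "failed": failed,
        "untouched": remaining,
        # Fraction of the fleet hit by the bad upgrade (equal-size assumption).
        "blast_radius": len(failed) / max(len(clusters), 1),
    }
```

Under those assumptions, an upgrade that breaks every cluster it touches takes out 1 of 100 small clusters before the rollout stops, whereas the same upgrade applied to one giant cluster takes out everything.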

GauntletWizard · on March 4, 2020

And no SRE would allow you to run your application in a single cluster. Borg Cells were federated but not codependent - Google's biggest outages were due to the few components that did not sufficiently isolate clusters from one another.

Clusters are probably still pets to most orgs, but the lessons about how to manage complexity still apply. Each of my terraform state files is a pet and I treat it as such... but I also use change-control to ensure that even though I don't regularly recreate it from scratch, I understand all that was there.

skboosh · on March 5, 2020

There are potentially quite a few benefits of being able to spin up clusters on demand [1]:

* Fully reproducible cluster builds and deployments.

* The type of cluster (can be) an implementation detail, making it easy to move between e.g. Minikube, Kops, EKS, etc. After all, K8s is just a runtime.

* Developers can create temporary dev environments or replicas of other clusters

* Promote code through multiple environments from local Minikube clusters to cloud environments

* Version your applications and dependent infrastructure code together

* Simplify upgrades by launching a brand new cluster, migrating traffic and tearing the old one down (blue/green)

* Test in-place upgrades by launching a replica of an existing cluster to test the upgrade before repeating it in production

* Increase agility by making it easier to rearchitect your systems - if you have a pet, modifying the overall architecture can be painful

* Frequently test your disaster recovery processes as a by-product for no extra effort (sans data)

* Reduced blast radius

[1] https://docs.sugarkube.io/#benefits-of-sugarkube
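The blue/green upgrade bullet above could look roughly like this in Python. This is only a sketch of the control flow: `create_green`, `is_healthy`, `set_traffic`, and `destroy` are hypothetical callbacks you would wire up to your own provisioning and load-balancing tooling, and the traffic steps are made-up values. It is not how Sugarkube or any other tool actually does it.

```python
from typing import Callable, Iterable


def blue_green_upgrade(
    create_green: Callable[[], str],          # hypothetical: builds the new cluster, returns its name
    is_healthy: Callable[[str], bool],        # hypothetical: health check for a cluster
    set_traffic: Callable[[str, int], None],  # hypothetical: send N percent of traffic to a cluster
    destroy: Callable[[str], None],           # hypothetical: tear a cluster down
    blue: str,
    steps: Iterable[int] = (10, 50, 100),     # assumed traffic percentages
) -> str:
    """Launch a new cluster, shift traffic to it in steps, then retire the old one.

    Returns the name of the cluster that ends up serving traffic.
    """
    green = create_green()
    if not is_healthy(green):
        destroy(green)
        return blue  # old cluster was never touched

    for percent in steps:
        set_traffic(green, percent)
        if not is_healthy(green):
            # Roll back: send everything back to blue and drop the new cluster.
            set_traffic(green, 0)
            destroy(green)
            return blue

    destroy(blue)
    return green
```

The useful property is that the old cluster stays untouched until the new one has taken full traffic, so a failed upgrade only costs you the replacement cluster.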

shaklee3 · on March 5, 2020

I think, for one, you cannot easily have masters span regions without the risk of them falling out of communication. Similarly, the workers should be located nearby. If there's a counterexample to this, I'd love to see it.