Kustomize – Templating in Kubernetes (stack-labs.com)
135 points by davinkevin on April 20, 2019 | 65 comments


There seems to be some confusion here, in no small part due to the title of this article. But...

> kustomize lets you customize raw, template-free YAML files for multiple purposes, leaving the original YAML untouched and usable as is.

That's the very first line of Kustomize's README[1].

Kustomize is not a templating DSL. The conversations here mocking templating DSLs are not relevant.

The next sentence[1]:

> kustomize targets kubernetes; it understands and can patch kubernetes style API objects. It's like make, in that what it does is declared in a file, and it's like sed, in that it emits editted text.

Kustomize is a patching framework. It takes valid k8s resources and lets you patch them with partial k8s resources. The entire point of Kustomize is to not invent something new.

[1] https://github.com/kubernetes-sigs/kustomize
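As a minimal sketch of that patching model (file and resource names below are illustrative, not from the README): a `kustomization.yaml` points at a complete resource, plus a partial resource that overrides only specific fields.

```yaml
# kustomization.yaml
resources:
- deployment.yaml          # a complete, valid Deployment
patchesStrategicMerge:
- replica-patch.yaml       # a partial Deployment, merged on top
---
# replica-patch.yaml -- only the fields being overridden
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 5
```

`kustomize build` then emits the Deployment with `replicas` overridden, leaving `deployment.yaml` itself untouched.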


> It's like make, in that what it does is declared in a file, and it's like sed, in that it emits editted text.

Is that different from a templating engine?



`kustomize` is an interesting and useful tool, and it is cool to see it available directly in `kubectl` in 1.14. A lower barrier to entry is nice to have for tools like this. It is useful on its own, but I'm also looking forward to additional features like being able to reuse a single patch on multiple targets (https://github.com/kubernetes-sigs/kustomize/issues/720). I'd also like to see a clear schema for what the YAML needs to look like and what the keys do. I wish it followed a pattern similar to `kubernetes` resources, where each resource has an `apiVersion` and the `kustomization` itself has an `apiVersion`.

Some newer documentation can be found at https://kubectl.docs.kubernetes.io/ which is still a little bit barren, but I expect it to improve. Some of the older documentation can be found at https://github.com/kubernetes-sigs/kustomize/tree/a5bb5479fb... (before the tool was integrated directly as a subcommand)


Joe Beda (one of the k8s cofounders) just did a livestream on kustomize last week and I learned a bunch of stuff: https://www.youtube.com/watch?v=NFnpUlt0IuM

Full Disclosure: I work with Joe @ VMware


I just did the big 1.x -> 2.x refactor myself, and whilst it's not exactly pretty, you can still reuse patches to some degree. It's just that every patch that will be applied independently now needs to be part of its own "base". You can then pull in the "base" wherever you want the patch applied. It's far from ideal, so I do hope the process improves, but it does at least work.

EDIT: Although, after properly reading that linked issue, I see you mean applying the same patch to different resources. I interpreted "target" to mean build target (e.g. deployment/app): you can use the same patch across different deployments (apps) and overlays (environments). However, I see your use case is a bit different.
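To illustrate the "patch as its own base" pattern, the layout I ended up with looks roughly like this (directory names are made up):

```
base/                      # raw resources, untouched
  kustomization.yaml
  deployment.yaml
patches/logging/           # a reusable patch, wrapped as its own "base"
  kustomization.yaml       # bases: ["../../base"], plus the patch file
  logging-patch.yaml
overlays/production/
  kustomization.yaml       # bases: ["../../patches/logging"]
```

Pulling in `patches/logging` from an overlay applies the patch; any overlay that wants the same patch pulls in the same "base".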

> I wish it followed a pattern similar to `kubernetes` resources, where each resource has an `apiVersion` and the `kustomization` itself has an `apiVersion`.

As of 2.0, this is now the case[1]:

> apiVersion: kustomize.config.k8s.io/v1beta1

> kind: Kustomization

[1] https://github.com/kubernetes-sigs/kustomize/blob/a5bb5479fb...


Honestly it would probably make more sense to have it versioned by git tag/branch (a la terraform modules) for straightforward gitops integration.


I have to bring up Jsonnet [1] in every one of these discussions. Jsonnet isn't the tool you want for managing a complex stack of configurable objects, but it is the tool you need. Jsonnet gets big and ugly and unwieldy at a certain point, and that is also coincidentally the point where you need to have rethought your model and refactored your config, just as much as your code.

Simple yaml is just fine, even repetitive simple yaml, and you're often better off building tooling to make repetitive updates rather than condense your config. Jsonnet is for building when you've got dozens rather than handfuls of objects.

Well defined Kubernetes objects don't actually take that much configuration, and don't generally need templating. Typically, my production and staging environments are precisely the same deployment definition, except image and replicas. Replicas isn't defined in the document - it's part of the working state rather than something that gets "configured" (i.e. that field is owned by an HPA, even if its min and max values are the same).

All configuration is stored in Configmaps and secrets. Those are more likely to be templated, but still probably not. If two values are the same in production and testing (and they should be - service names are the same, with namespace defining where they point), why configure it? Use a sane default.

[1]https://jsonnet.org/articles/kubernetes.html


We built and deployed a new pipeline based on kustomize and have been gradually moving some helm things over to it. The feedback from app developers is quite good, and there's no question that it is much less obfuscating of the underlying resources than helm is (if you follow all the chart conventions, etc.). And then as noted in other replies it is in kubectl as of 1.14... which I guess doesn't mean that much except it feels sort of like canonizing the approach.

That said its design is quite opinionated and committed to a declarative model, and there are some things you just can't do without falling back on either generating a patch or performing unstructured edits. A good example is something as simple as tagging an image with a string that isn't known until build time (such as the commit sha). Another is shared configuration. You can't glob object names and apply the same environment patch, for example, to more than one resource. These constraints can be considered features, but they're nonetheless constraints.


Another interesting project in this space is Microsoft’s Fabrikate (https://github.com/Microsoft/fabrikate). I like that approach better in that it separates config from resource manifests and allows you to establish higher level components from lower level ones.


Yaml engineering - now with inheritance, encapsulation, and polymorphism. Soon, we’ll move on to functional and immutable configs augmented with environment variables.

Good times.


I felt a part of my soul drift away when I saw an ex-coworker put a for loop in a yaml file. Joe Beda (Heptio) talks about the problem here: https://www.youtube.com/watch?v=M_rxPPLG8pU&t=2960s

Just use Python (or Lua if you want something simpler/lighter). Or, like Brigade, even JS. Hell, use BASIC. But these bizarre template-language-DSL bastardizations are horrific when it comes to maintainability. Incremental evolution from a config file to a full language is a path fraught with peril, and only ends in damnation!


Agreed on avoiding yaml templating at all costs, but I've also found that you don't normally need a full programming language for these types of config files - if you do find yourself reaching for those tools you might be better served by something like Dhall https://github.com/dhall-lang/dhall-lang (which has its own K8s bindings as well)


Huh. When I first looked at it my reaction was, "this is stupid!", which I've figured out over the years is my inner-ape reacting to the alien, and is an indication that something is the opposite of stupid. So I read more about it and it seems really cool - possibly too much for most people, but that makes it even more interesting! Thanks for pointing it out!


YAML templating is something I've come to truly hate. It seems that tools and services want the configuration to be "simple" by using a format that is fairly easy for humans to parse while not trying to force a language specific tool onto the users. However, when they move into the inevitable phase of wanting to be more configurable, they can move into confusing, unintuitive, and surprising territory. It also doesn't help that everybody keeps inventing their own way to do it. A few examples of YAML bastardization that give me headaches trying to understand:

* saltstack Jinja templating [1]

* GitLab CI `include` directive for including and merging external CI files together [2]

A few tools that I have seen take a decently pragmatic approach:

* Kubernetes resources that use ConfigMap and `envFrom` that declaratively say where to resolve a value from [3]

* Circle CI commands which offer some reusability with its "commands" and "executors" type features [4]. To me, Circle CI has both good and bad aspects with some templating and some clever patterns
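For reference, the `envFrom` pattern from [3] looks roughly like this (resource and image names here are mine, not from the docs):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: info
  CACHE_TTL: "300"
---
# container spec fragment in a Pod/Deployment:
# every key in the ConfigMap becomes an environment variable
containers:
- name: app
  image: example/app:1.0   # illustrative
  envFrom:
  - configMapRef:
      name: app-config
```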

On the other side of things, there is essentially fully programmable type configurations like Jenkins Pipeline Groovy `Jenkinsfile`, which can be a nightmare, too.

I think it is tough to find a sweet spot between making it configurable and expressive for users while retaining a low barrier of entry and not turning the configuration into a complete program itself. Tools like Terraform are trying to find that sweet spot as they slowly introduce more programmatic ways of configuration while still being declarative, like the fairly recent introduction of if statements and soon (I think) to be released for loops. As soon as users of a tool and service have more complex use cases, there needs to be some way to solve that. The most common way (it seems) is taking the easy and familiar of introducing templating.

[1]: https://docs.saltstack.com/en/latest/topics/jinja/index.html

[2]: https://docs.gitlab.com/ee/ci/yaml/#include

[3]: https://kubernetes.io/docs/reference/generated/kubernetes-ap...

[4]: https://circleci.com/docs/2.0/configuration-reference/#comma...


> there is essentially fully programmable type configurations like Jenkins Pipeline Groovy `Jenkinsfile`, which can be a nightmare, too. I think it is tough to find a sweet spot between making it configurable and expressive for users while retaining a low barrier of entry and not turning the configuration into a complete program itself.

Nowadays, Jenkins pipelines can be configured in either the Jenkins-provided "Declarative Syntax" [1] or the Apache Groovy-based "Scripted Syntax", with the Declarative Syntax used as the default for examples on the Jenkins website. I guess they've found the best way to not have users turn the configuration into a complete program is to provide declarative syntax only in the default option. It's good to see Kustomize is built with this in mind, too.

[1]: https://jenkins.io/doc/book/pipeline/syntax/


Often when I talk to founders who use Kubernetes, they could simply use docker and a bit of shell scripting instead.

Those of you who use Kubernetes: What is a functionality it brings to the table you would miss if you automated your docker handling without it?


- Simple DNS based service discovery

- namespace separations between resources

- optimized resource utilization and container scheduling

These are the biggest 3 for the smaller orgs I’ve been a part of


Interesting. Can you give an example of why DNS based service discovery was needed?


> Can you give an example of why DNS based service discovery was needed?

Not to be flippant, but any case where things need to connect to other things. At scale they usually need to connect to other things through some sort of load balancer. You get that out of the box with kubernetes for services hosted inside the cluster, and there are straightforward solutions to ingress for clients outside it. Another important feature is pod scheduling. Yes you could wire up a few machines using docker compose and any of a few different networking approaches, but if one of your VMs dies are those workloads going to move to a healthy instance by themselves?


Need is a strong word.

But you don't need containers either.

Service discovery just makes it easier to link up services if your architecture is truly microservice. We used to have an incredible amount of config that tightly coupled our APIs. Now we use a combination of service discovery and an API gateway (Ambassador) to decouple the services, cut down on the number of random endpoints in our config, and we also get the added benefit of load balancing, rate limiting, and additional logging.

There's always a tradeoff with scale. If you have four servers then obviously all of this stuff is overkill.


> If you have four servers then obviously all of this stuff is overkill.

I disagree, actually. I have four servers at home, and have some pods that have been running little tinkery things, and a bunch of open source software, with ridiculous uptime and little or no effort, even when I reboot one of those "servers" to do some gaming on Windows.

Now, do I need that uptime for all of those services? Not really. But for some of them, I want it, and it'd be annoying if I had to go figure out why they'd stopped running. The reality is, things just keep ticking without me worrying when they're on k8s.

These skills transfer into very in-demand job skills as well, and if I ever build anything that gains traction, I already have all the tools, configs, and knowledge to deploy that app across 500 generic cloud servers.


I agree with you. I think people tend to associate Kubernetes with the other underlying problems they're having with their infrastructure when they start thinking about using it. Just like it's tough when moving from 0 pieces of software in production to 1 piece of software in production, it's just as tough moving from 1 piece to 2 pieces. But if you do that transition correctly, then the 2 to infinity part is easy. I think you will find it just as painful to make that move with any orchestration system. (CloudFormation? Convox? They're not easy, and you get the feeling that nobody else is using them.)

I wouldn't recommend Kubernetes if you only have one application you run in production. Just rsync your production image to production whenever you remember to do a release. But if you have more than 1 thing, it's time to start thinking about it, because the "do whatever" that works great with 1 thing starts to break down when 1 becomes 2. That is not Kubernetes's fault. That's just the nature of the beast.


Because I can run `ping` in a busybox container that knows how to do all the same service discovery as a more complicated fully fledged microservice. No extra libraries. No smattering of supporting services (on top of DNS, which you need anyway, of course). Setup and debugging is 10x simpler than any of the more complicated service discovery solutions.

And it just works. Simply and intuitively. With everything. Because it's been an IETF standard since 1983.


Unless you're already a k8s guru, I don't really recommend k8s for most start-ups. Also, even if you are contemplating moving to k8s, I'd suggest waiting until https://github.com/kubernetes-sigs/cluster-api-provider-aws stabilises.

In saying that, I've just performed a reasonably significant migration to k8s, and despite the significant time investment, I am quite happy with it.

However, prior to this we spent 5 years on Dokku[1], which is a freaking fantastic project and was for the most part more than enough to meet our needs. It solves pretty much all the same issues start-ups use k8s for, with a lot less overhead.

The reason for the migration was simply that we've reached the point where our clients demand improved reliability (redundancy) and somewhat coincidentally we'd outgrown some other infrastructure; which on its own would require a large migration. So we moved to k8s at the same time as rejigging our infrastructure, for redundancy and future-proofing purposes.

[1] https://github.com/dokku/dokku


We're a B2B company with a microservice-architecture SaaS that wants to start to sell an on-prem offering. Kubernetes is a no-brainer.

Even if we had a monolith instead, and no plans for an on-prem offering, ultimately I still think that a managed Kubernetes offering makes sense. Efficient resource utilization is all the more important for small companies, and once you have more than a handful of servers, Kubernetes makes it much easier to right-size your fleet by handling all of the scheduling for you.

If you have a couple of servers and that's it, then sure, Kubernetes isn't giving you much. But if you're building a professional offering then you're likely to outgrow that couple of servers pretty quickly.


Are you sure you can't simply run it all from a single server? Computers are pretty fast these days.


It's a big data company doing an amount of traffic per day that exceeds what you can fit in RAM, including those fancy x1.32xl's with 2 TB RAM and a monthly cost of ~$6,000 for a reserved instance.

Yeah, I'm pretty sure we can't simply run it all from a single server.


And then the server goes down and you've got downtime. Or you want to release a new version and you have downtime (i.e. no rolling upgrades). At some point you end up re-inventing enough of k8s to make it sort of silly to not use the actual thing.


I run a data infrastructure on terraform and kubernetes. Things it gives me out of the box:

* DNS. This lets all the various components talk to each other across nodes (Airflow, Spark, Zeppelin, etc.). I can also, via VPN, connect to things to look at what's going on.

* Load-balancing. I create pods and then they run on the node where there's capacity without me thinking about it.

* Auto-scaling. I spin up pods and nodes get created to handle the load.

* Helm charts for things so I don't have to figure out how to run them myself.

* Built-in support in the tools I use. Airflow will spin up k8s pods to run tasks. Spark will spin up pods to run a job. Etc.


Getting to the point where my container restarts when it dies is hard with just userdata loaded in when the instance is created. Everyone will let me spawn a base image with docker installed and shove in a userdata script, but I want to say "please run these 3 docker images and if any of them die restart them". Additionally, in k8s I can say "oh, and update my DNS provider to point to this pod, and by the way stick TLS in front of it". I like those last bits a lot too.
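The TLS bit, for instance, is just a few lines of Ingress config (hostnames and secret names below are made up; the "restart if it dies" part comes free with a Deployment, since `restartPolicy: Always` is the default):

```yaml
apiVersion: extensions/v1beta1   # networking.k8s.io/v1beta1 as of 1.14
kind: Ingress
metadata:
  name: app
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls          # TLS cert/key stored as a Secret
  rules:
  - host: app.example.com
    http:
      paths:
      - backend:
          serviceName: app       # the Service fronting the pods
          servicePort: 80
```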


I only run on a 2+ node cluster.

- Rolling deployments

- Auto-scaling pods and nodes

- Health checks and self-healing


That's a question I ask myself. I use Kubernetes because I inherited a Terraform/Kubernetes project. I like the way it all works now I have learnt it, but I wouldn't have been brave enough to set that lot up on my own. I'd probably use a VM and Shell script. But I'm glad for the experience.


> I use Kubernetes because I inherited a Terraform/Kubernetes project... but I wouldn't have been brave enough to set that lot up on my own. I'd probably use a VM and Shell script.

If you use a shell script to handle a cluster with a dozen docker images and nodes, with intermittent crashes, out-of-memory issues, running out of disk, or network failures, your shell script will be so complex that you will basically need to recreate kubernetes from scratch.


Good point. The setup is quite simple in terms of number of containers/nodes. We are using Azure so would probably use cloud features to do the things you mention. A load balancer and a couple of VMs for example, with auto scaling set up.


> The setup is quite simple ... We are using Azure so would probably use cloud features ... [a] load balancer and a couple of VMs for example, with auto scaling set up.

Load balancing and auto-scaling are some of the more basic and popular services that kubernetes provides. If your platform already provides them for you, it makes the case for kubernetes weaker. That said, kubernetes can be run on multiple cloud providers, and provides far more features.

Once the complexity of the cluster grows, and vendor lock-in issues increase, kubernetes makes more and more sense.


Adding to what was already said:

- being able to force better design when engineers have to work with multiple services depending on each other (using init containers, readiness and liveness probes, etc)


Resume boosting. Using shell scripts is lame and straightforward. If you have a declarative configuration though, you can scale to thousands of nodes with nary a thought.


If you have a single service that runs on thousands of nodes then okay. If you have dozens of services that run across even dozens of nodes, that interact with each other, that have resource constraints, that need rolling updates, depend on external volumes, require security policies, etc., then you eventually recreate enough of k8s to make it simpler to just use k8s.


What service did you scale across thousands of nodes with a shell script?


Network policies, RBAC, the ecosystem...


Consider instead writing a small program in your favorite programming language that generates the YAML, and then piping the output of that into `kubectl apply`.
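A minimal sketch of that approach using only the Python standard library (kubectl accepts JSON as well as YAML, so `json` avoids a pyyaml dependency; names and images are illustrative):

```python
#!/usr/bin/env python3
"""Generate a Deployment manifest and print it as JSON,
suitable for piping into `kubectl apply -f -`."""
import json


def deployment(name, image, replicas=1):
    # Build the manifest as a plain dict; variables, loops, and
    # functions replace what a template language would do.
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(deployment("web", "nginx:1.15"), indent=2))
```

Run it as `python gen.py | kubectl apply -f -`.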


Why would I want to write my own Kustomize when this one already exists?


Yeah, I replaced a custom resource patching solution with Kustomize. I was very concerned my custom solution would inhibit on-boarding, and wasn't thrilled with Helm. Back in November I was pretty happy when I stumbled across Kustomize, it claimed to do exactly what I wanted it to do. In reality there were a few hiccups, but it's certainly moving in the right direction.

Granted, I still do have a bunch of "opinionated" scripts, but they delegate the heavy lifting to Kustomize, and all the configuration itself is just stock-standard k8s resources.


We just switched from a mess of incomprehensible kustomize to this, much nicer. Jkcfg.github.io makes this a lot easier. JavaScript may not be anyone's favorite, but everyone knows it.


Thanks for the pointer to jkcfg. Our helm chart templates are getting too complex for the tooling and it's making me wish we were using a more powerful programming environment. One where we could reuse code more easily across projects. We could still write our configuration code in a declarative style, that's a matter of code structure.

I do like helm's interface around packages: install, upgrade and test. I like that packages have a tree structure that you can compose (if you opt to do so). It'd be nice to extend the interface further to have other common actions you could apply to a more advanced package (e.g. backup, restore, chaos test, add tenant, remove tenant, etc).


Recently refactored some of our yaml/erb and it worked out quite well for us. With the addition of fixtures to show current vs next form of the entire yaml, it has made altering our cluster much easier to follow and keep changes consistent.


A DevOps guy walks into a bar. He replaces the bartender with a docker-container then replaces the whole bar with Kubernetes. He sits down and orders one drink.


I was looking at Helm the other day for merging common base values with per-cluster specific overrides but this looks like a simpler solution and integrated into kubectl.


That's one problem sugarkube[1] aims to solve. But it goes further and allows you to spin up k8s clusters from scratch and install all your stuff onto them. It can bootstrap an AWS account first (e.g. to create S3 buckets for kops/terraform), handles templating files, and gives you a hierarchical configuration. It's still under development but should be very flexible once it's at MVP in a month or so.

The aim of all this is to allow you to avoid tricky in-place cluster upgrades and to just spin up a new cluster, direct traffic to it and tear down the old one. An extra benefit is that it would allow you to give each dev their own k8s cluster in parity with your live envs, but they could select only a subset of charts.

Check out the sample project[2] for more of an idea about how to use it (but the actual sample is probably broken since it's under heavy development at the moment).

[1] https://github.com/sugarkube/sugarkube/

[2] https://github.com/sugarkube/sample-project/


Which part of helm (I’m assuming v2) is integrated into kubectl? It has its own client and installs its own server component (tiller) which is difficult to secure.


I think the op is saying that kustomize is integrated into kubectl, not helm.


Apparently the solution to heaps of unmanageable yaml is... templated unmanageable yaml.


There really isn't any need to templatize kubernetes. They have auto-generated client libraries for most popular languages which let you define configurations as code, and serialize to json/yaml if that's what you want.


Can you give some examples? I don't understand what you mean by autogenerated client libraries, unless you mean the python and go clients. Those aren't for kubernetes yamls, though.


Yeah client-go typed clientsets are codegened. It would be really labor intensive (and error prone) to write go code for every deployment.

May I suggest a middle ground =) github.com/stripe/skycfg (check out //_examples/k8s that I added)


I’m interested in learning that model, but a readme in there that explains how to go from nothing to a deployment might help.

Edit: also the fact that you have a 4 month old outstanding pull request isn’t giving me any warm fuzzies about the maintenance and velocity of skycfg.


nginx.cfg and app.cfg have an example deployment for nginx, and the rest is covered in the main README.

Yeah, not sure what is going on with that PR; tbh I forgot about it because we implement it in our own runtime. I got a bunch of others merged quickly since then.


Thanks, I saw those and they seem clear enough. So the go code is unrelated to nginx.cfg and app.cfg?


The go code under _examples/k8s is a demo client that will actually stuff `main` output into Kubernetes api endpoint.


It would be less labor intensive and less error prone.


The python one is a good example: https://github.com/kubernetes-client/python

This is auto generated from a swagger API spec. Of course the intent is that you define your service and launch it with the client, but if you want a yaml file you can dump the objects to yaml - it would be far better than templating.


This is what I meant. This is in no way related to kustomize or what it does. One is to make hierarchical templated deployment files, while the other is to connect and interact with the API server. The client library is more analogous to kubectl, which consumes yamls.


I'm telling you it is. Look at the models in the repo here:

https://github.com/kubernetes-client/python/tree/master/kube...

or the docs here: https://github.com/kubernetes-client/python/tree/master/kube...

These are python classes that serialize to dict which you can dump to yaml. In other words, you can define your kubernetes spec as python classes, with variables, inheritance, and all the other good things that come with a programming language.

Once you have your kubernetes spec, you can either deploy it with the endpoint helpers (https://github.com/kubernetes-client/python/tree/master/kube...)

Or you can serialize to yaml if what you want is to generate a bunch of yaml. There is no need for a yaml template language.
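The real client models are code-generated, but the serialize-to-dict idea can be sketched with plain dataclasses (all names below are stand-ins for the generated `V1*` classes, not the client's actual API):

```python
import json
from dataclasses import dataclass


@dataclass
class Container:      # stand-in for the generated V1Container
    name: str
    image: str


@dataclass
class Deployment:     # stand-in for V1Deployment
    name: str
    containers: list
    replicas: int = 1

    def to_manifest(self) -> dict:
        # Serialize to a plain dict; json.dumps (or yaml.safe_dump)
        # turns this into a manifest kubectl can consume.
        return {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": self.name},
            "spec": {
                "replicas": self.replicas,
                "selector": {"matchLabels": {"app": self.name}},
                "template": {
                    "metadata": {"labels": {"app": self.name}},
                    "spec": {
                        "containers": [vars(c) for c in self.containers]
                    },
                },
            },
        }


print(json.dumps(
    Deployment("web", [Container("web", "nginx:1.15")]).to_manifest(),
    indent=2))
```

Because the spec is ordinary code, you get variables, inheritance, and reuse without any template language.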


It's not the same. I have written a custom scheduler with the python client API. It is missing many things from the go API, despite it being an officially supported language. You will encounter things that you just can't do with the python client, and end up hacking in your own custom yaml. What you pointed to will definitely work for most of the basic resources, but it certainly won't work for everything.

Although, it's good that you pointed it out since I think it will help people that need fairly basic resources without doing something more complicated.



