Much different. When you are reporting histograms you can combine them and see t...

nvarsj · on May 14, 2022

Can you elaborate a bit? You can do the same in Prometheus by summing the bucket counts. Not sure what you mean by “true p50” either. With buckets it’s always an approximation based on the bucket widths.
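To make the comment above concrete, here is a rough sketch of merging fixed-bucket histograms by summing counts and then estimating a quantile by linear interpolation inside the bucket. The bucket bounds, the per-bucket (non-cumulative) counts, and the function names `merge` and `quantile` are all assumptions for illustration. Prometheus itself exposes cumulative `le` buckets, and this is not its actual implementation.

```python
# Hypothetical shared bucket upper bounds in seconds; the last bucket is open-ended.
BOUNDS = [0.1, 0.5, 1.0, float("inf")]


def merge(*histograms):
    """Sum per-bucket counts from histograms that share identical bounds."""
    return [sum(counts) for counts in zip(*histograms)]


def quantile(q, counts, bounds=BOUNDS):
    """Estimate quantile q by linear interpolation inside the target bucket."""
    total = sum(counts)
    if total == 0:
        return float("nan")
    rank = q * total
    cumulative = 0
    for i, count in enumerate(counts):
        if count > 0 and cumulative + count >= rank:
            lower = bounds[i - 1] if i > 0 else 0.0
            upper = bounds[i]
            if upper == float("inf"):
                # Cannot interpolate into an open-ended bucket; report its lower edge.
                return lower
            return lower + (upper - lower) * (rank - cumulative) / count
        cumulative += count
    return float("nan")


# Made-up per-bucket counts from two hosts.
host_a = [10, 30, 50, 10]
host_b = [30, 30, 30, 10]
merged = merge(host_a, host_b)
p50 = quantile(0.5, merged)
```

The estimate can only be as precise as the bucket that contains the requested rank, which is the approximation being discussed.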

spullara · on May 14, 2022

Ah, I misunderstood what you meant. If you are reporting static buckets, I can see how that is better than what folks typically do, but how do you know the bucket boundaries a priori? Others back their histograms with things like t-digest (https://github.com/tdunning/t-digest). It is pretty powerful because the buckets are dynamic, based on the data, and the histograms can still be added together.

gttalbot · on May 14, 2022

Yes. This. Also, displaying histograms in heatmap format can allow you to intuit the behavior of layered distributed systems, caches, etc. Relatedly, exemplars allowed tying related data to histogram buckets. For example, RPC traces could be tied to the latency bucket and the time at which they complete, giving a natural means to connect metrics monitoring and tracing, so you can "go to the trace with the problem". This is described in the paper as well.
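As a loose illustration of the exemplar idea described above, the sketch below keeps the most recent trace ID alongside each bucket count. The class names, fields, and the "keep only the latest exemplar per bucket" policy are hypothetical choices made here, not the design from the paper or from any particular monitoring system.

```python
import time
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Exemplar:
    trace_id: str      # hypothetical identifier of the RPC trace
    value: float       # observed latency in seconds
    timestamp: float   # completion time of the RPC


@dataclass
class ExemplarHistogram:
    bounds: List[float]  # sorted bucket upper bounds
    counts: List[int] = field(init=False)
    exemplars: List[Optional[Exemplar]] = field(init=False)

    def __post_init__(self):
        n = len(self.bounds) + 1  # one extra overflow bucket
        self.counts = [0] * n
        self.exemplars = [None] * n

    def observe(self, value, trace_id=None, timestamp=None):
        # Index of the first bound >= value; values above every bound overflow.
        i = bisect_left(self.bounds, value)
        self.counts[i] += 1
        if trace_id is not None:
            ts = time.time() if timestamp is None else timestamp
            self.exemplars[i] = Exemplar(trace_id, value, ts)


h = ExemplarHistogram([0.1, 0.5, 1.0])
h.observe(0.05, trace_id="trace-a", timestamp=1.0)
h.observe(0.3)  # no trace attached
h.observe(2.3, trace_id="trace-slow", timestamp=2.0)
slow = h.exemplars[-1]  # exemplar for the slowest bucket
```

Looking up the exemplar for a suspicious bucket gives a trace ID to follow, which is the "go to the trace with the problem" workflow.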

teraflop · on May 14, 2022

That is also possible in Prometheus, which is why I made the comparison.