Agreed. I think many people do a good job of logging when something goes wrong, and maybe they are good about logging inputs/outputs of a system, but I think that logging important decisions often falls off the table. Ideally I would like to take a request ID, grep the logs, and get an entire story of what happened to that request. In reality, this rarely happens!
> take a request ID, grep the logs, and get an entire story of what happened to that request. In reality, this rarely happens!
If that's the situation you find yourself in, I cannot recommend centralized logging with a good frontend highly enough. I frequently need to figure out what happened to a particular request, and it's like night and day.
Needing to ssh somewhere and run grep against log files is workable if there's only one VM or a handful of them, but it gets complicated beyond that, and even just scp-ing the logs off becomes time-consuming once there are a lot of machines. And once the logs are off, 'grep' quickly becomes inadequate. (I should know: I've built some truly horrible regexps to try to grep for date ranges because I didn't know any better.)
All that friction means that answering the original question, figuring out the detailed internal reason why my customer received an HTTP 500 response, is just too toilsome for all but the most serious incidents, so in practice (as you noted) it rarely happens.
With centralized logging, I can search for a request ID and see all of its logs, as often as I need to, which is exactly what debugging complex multi-system issues requires.
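For what it's worth, even without a logging backend, structured (JSON) logs get you most of the "grep for a request ID and read the story" workflow. A minimal sketch, assuming one JSON object per line and a hypothetical request_id field (both names are illustrative, not from the thread):

```shell
# A few fake structured log lines, one JSON object per line
printf '%s\n' \
  '{"request_id":"abc123","level":"info","msg":"request received"}' \
  '{"request_id":"zzz999","level":"info","msg":"unrelated"}' \
  '{"request_id":"abc123","level":"error","msg":"upstream returned 500"}' > story.log

# Pull out the full story for one request, in file order
jq -c 'select(.request_id == "abc123")' story.log
```

Unlike a plain grep for "abc123", this can't false-positive on the ID appearing inside some other field's value.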
Word of advice: the 'jq' tool for handling JSON files (coupled with a glob like '*.log', or something fancier with xargs or parallel) will absolutely save your bacon in those situations. It's way more powerful than it appears on the surface.
We had a series of Docker json-file driver log files. Each one is a raw stream of JSON objects, one per line, with no array around them -- which is a bit annoying to sort and filter based on properties of the objects.
'jq -n '[inputs]' *.log > combined.json' was my favourite command today; it reads every object from every file and wraps them all in a single array correctly. No awk needed! (The -n matters: without it, jq consumes the first object as its regular input before 'inputs' runs, so it gets dropped from the array.)
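A quick sanity check of that pattern (the file names here are just made up for the demo):

```shell
printf '%s\n' '{"a":1}' '{"a":2}' > one.log
printf '%s\n' '{"a":3}' > two.log

# Collect every object from every file into a single JSON array
jq -n '[inputs]' one.log two.log > combined.json
jq 'length' combined.json   # 3
```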
Combine that with a filter like:

jq 'select(.someProp | test("some search"; "i"))' *.log > filtered.json

(jq's built-in 'select' drops non-matching objects outright, so there's no need to emit nulls and strip them with grep afterwards.)
And you're off to the races. You can then load the combined file directly, group_by(.someProp), and it all magically works!
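A sketch of that last step, grouping on a hypothetical level field (the field name and file are illustrative):

```shell
printf '%s\n' \
  '{"level":"info","msg":"a"}' \
  '{"level":"error","msg":"b"}' \
  '{"level":"info","msg":"c"}' > levels.log

# Wrap the stream into an array, group by level, and count each group.
# group_by sorts by the grouping key, so "error" comes before "info".
jq -nc '[inputs] | group_by(.level) | map({level: .[0].level, count: length})' levels.log
# [{"level":"error","count":1},{"level":"info","count":2}]
```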
True, but even with a centralized logging system, if the logs themselves aren't good enough you can still find yourself wondering what the hell happened. Grep here is just the tool for extracting the "story".
Very good. I would add that I like my git commit messages to be a narrative of the development process. Hmmm... I wonder if someone could write a science fiction short story using only git commit messages?
That's where I was going when I wrote this post. Logging and log analysis are a large part of the continuous delivery tooling that I've been working on for the past 7 years.