So much this! I've long wanted something whereby I can chain logs into a sort of tree, to more easily follow or hide large subcomponents and subservices. For example, say my service A makes calls to two other services B and C, and all of them report logs to a centralized location; that could get rendered as,
─ info: GET /foo/endpoint
├─ info: user <user ID> valid auth
├─ info: request to service B
│ ├─ debug: opening new connection
│ ├─ debug: success, result = ...
│ └─ info: request took 1 second
├─ info: request to service C
│ ├─ debug: opening new connection
│ ├─ debug: success, result = ...
│ └─ info: request took 1 second
├─ info: preparing result took 1 second
└─ info: http request took 3 seconds
with the ability to hide sections, perhaps grep for only certain messages (particularly if you keep the formatting and the message separate, this should be doable, I think), attach metadata to messages…
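Roughly what I imagine on the viewer side: if every record carried its own ID and its parent's ID, reassembling and collapsing that tree is a small amount of code. A quick sketch, with made-up field names and a made-up log file, just to illustrate:

```python
import json

# Hypothetical record format: one JSON object per line with "id", "parent",
# "level", and "msg" fields; the field names are an assumption, not a standard.
with open("app.log") as f:
    records = [json.loads(line) for line in f if line.strip()]

children = {}
for rec in records:
    children.setdefault(rec.get("parent"), []).append(rec)

LEVELS = {"debug": 0, "info": 1, "warn": 2, "error": 3}

def render(rec, prefix="", last=True, min_level="debug"):
    """Print one record box-drawing style, then recurse into its children."""
    branch = "└─ " if last else "├─ "
    if LEVELS[rec["level"]] >= LEVELS[min_level]:
        print(f'{prefix}{branch}{rec["level"]}: {rec["msg"]}')
    kids = children.get(rec["id"], [])
    for i, kid in enumerate(kids):
        render(kid, prefix + ("   " if last else "│  "), i == len(kids) - 1, min_level)

# Records without a parent are the roots (the top-level HTTP request above);
# raising min_level to "info" hides the debug lines, i.e. collapses detail.
for root in children.get(None, []):
    render(root, min_level="info")
```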
As it is, we have a fairly standard "shove everything into syslog" setup, which then gets piped to a downstream logging system and a local file. But the downstream system is not very good at search (this is probably mostly our fault) and requires the message to be in JSON, so the stuff in the log file is _also_ JSON, b/c that's what syslog got. There are definitely better ways with our existing tools, but it sure makes one dream up what the perfect logging solution could look like.
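For what it's worth, the JSON-into-syslog part needs nothing beyond the standard library; something like this rough sketch (the field names are just whatever we picked, nothing standard about them):

```python
import json
import logging
import logging.handlers

class JsonFormatter(logging.Formatter):
    """Serialise each record as a single JSON object so the downstream system can parse it."""
    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname.lower(),
            "logger": record.name,
            "msg": record.getMessage(),
        }
        return json.dumps(payload)

# Local syslog socket on Linux; syslog then forwards to the downstream system and file.
handler = logging.handlers.SysLogHandler(address="/dev/log")
handler.setFormatter(JsonFormatter())

log = logging.getLogger("service-a")
log.setLevel(logging.INFO)
log.addHandler(handler)

log.info("request to service B")
```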
> As it is, we have a fairly standard "shove everything into syslog" setup
Assuming "everything" is "static log statements" which is the first issue I have with this, a developer must have thought before deployment that this could be useful, which usually results in a lot of useless garbage and a lack of actually useful information. And little by little you build up actually technically useful data collection, and rather than being spread throughout the codebase all of the probes are centralised and readable.
I've been thinking about using dynamic instrumentation tools (bcc/dtrace) for that purpose instead, you know you need something when you actually do need it, at that point you can add it to the probes/instrumentation (which is external to the program and deployable separately), and all information would be collected in a structured form in a database you can interact with (probably not something relational).
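To make that concrete, a rough sketch of what I mean with bcc's Python bindings: attach a uprobe to a running binary without touching its source. The binary path and symbol here are placeholders:

```python
from bcc import BPF

# Kernel-side program: fires whenever the probed user-space function is entered.
bpf_text = """
#include <uapi/linux/ptrace.h>

int trace_entry(struct pt_regs *ctx) {
    bpf_trace_printk("handle_request entered\\n");
    return 0;
}
"""

b = BPF(text=bpf_text)
# "./service-a" and "handle_request" are placeholders for the real binary and symbol.
b.attach_uprobe(name="./service-a", sym="handle_request", fn_name="trace_entry")

# In a real setup the probe would push structured events into a map or ring buffer
# and ship them to that database; printing the trace pipe is just the minimal demo.
b.trace_print()
```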
If you're seriously interested in building this and would be open to employment for that purpose – please drop me a line (contact info in my profile). This absolutely deserves to exist and it would be a natural extension of what we're building at Scalyr. I'd love to talk.
Tracing calls across threads/servers is exactly the kind of thing that should be easy and work out of the box. If we had this built into a kind of universal logging protocol, you could imagine some viewers getting extremely sophisticated with network-level analysis, parsing & plotting timing distributions, anomaly detection, etc. Other viewers might be geared more towards mobile development, emphasizing where in the code network requests were generated and their effect on the UI. Both should be possible from the same input data stream.
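As a rough illustration of the kind of hook that would make this possible, here is a sketch of propagating trace/span IDs across a service call over HTTP. The header names and fields are made up for the example (the W3C traceparent header is the real-world analogue):

```python
import uuid
import urllib.request

def make_span(trace_id=None, parent_id=None):
    """Every log event carries these three IDs so a viewer can reassemble the
    call tree across threads and servers."""
    return {
        "trace_id": trace_id or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent_id,
    }

def call_downstream(url, span):
    # Child span for the outgoing request; the downstream service keeps propagating it.
    child = make_span(trace_id=span["trace_id"], parent_id=span["span_id"])
    req = urllib.request.Request(url, headers={
        "X-Trace-Id": child["trace_id"],       # header names invented for the sketch
        "X-Span-Id": child["span_id"],
        "X-Parent-Span-Id": child["parent_id"],
    })
    return urllib.request.urlopen(req)
```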
This is what I did my MSc thesis on. I should consider open sourcing the framework, even though it’s relatively “homework grade”.
A couple of places to look, though:
- opentracing.io
- Fonseca et al’s X-Trace work
I don’t have a link handy, and the visualizations haven’t aged well, but you can probably find a copy of my thesis or conference paper under Anthony Arkles in Google Scholar. I extended the X-Trace protocol a bit to make it easier to reassemble function calls that potentially had parallelism.
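For anyone curious what that means in practice: X-Trace-style metadata is essentially a per-request task ID plus per-event operation IDs with edges back to their causes, and allowing more than one parent edge is what lets a join point record all the parallel branches it waited on. A toy sketch (not the actual thesis code):

```python
import uuid

class XTraceMetadata:
    """Toy version of X-Trace-style metadata: one task ID per request,
    one op ID per event, and edges back to the ops that caused it."""
    def __init__(self, task_id=None, parent_ops=()):
        self.task_id = task_id or uuid.uuid4().hex
        self.op_id = uuid.uuid4().hex[:8]
        self.parent_ops = list(parent_ops)

    def step(self):
        # Sequential step: a single parent edge.
        return XTraceMetadata(self.task_id, [self.op_id])

    @staticmethod
    def join(branches):
        # Parallel fan-in: the join event records an edge to every branch,
        # which is what makes the parallelism reconstructable later.
        return XTraceMetadata(branches[0].task_id, [b.op_id for b in branches])

root = XTraceMetadata()
branch_a = root.step()   # e.g. request to service B
branch_b = root.step()   # e.g. request to service C
joined = XTraceMetadata.join([branch_a, branch_b])
```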