-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Structured logging #1373
Comments
Some comments after musing on this topic for a day:
Lazy evaluation could be as simple as wrapping every log invocation in a |
What if we would change our logging statements to something like:
Metadata would include data about location of the call and some context such as:
Here is one example which illustrates the idea:
We can define macro to include ?LINE and ?MODULE for convenience if we want to. Since all of the arguments are key-value pairs we can use any serialization format (depending on logging infrastructure used):
There might be a need to define expected format for every argument. For example in which format we would send dates? We need to think about it we we design the FormatWhichSupportsBindings language or choosing the library. |
The metadata inclusion is inspired by Elixir logger. Which has following fields: https://hexdocs.pm/logger/Logger.html#module-metadata
We can include |
We also need to include timestamp in metadata or pass it as a separate argument. In order to have exact timestamps which correspond to the time when log function is executed rather then when we format the message in the logging backend. |
@iilyak have you looked at the Erlang 21 logger library? It seems like a decent target. |
API for Erlang 21 logger looks good. Logging macros
LOG_ERROR macro injects additional info to provided Metadata map. The injected info:
As you can see the logging macro uses Additional features
Implementation details
Default handlers are implemented as gen_servers. Overload protectionDefault handlers ( If the handler process is getting overloaded, the log event The ConclusionThe Logger library has good API and useful features. Implementation is robust and should provide good performance. The only place which could be optimized further (based on one hour review) is configuration module. Since it is implemented using ets which is the second fastest way (the fastest way is generated module). |
iirc we switched from lager because 1) it was too coupled with couch codebase and we wanted ability to change backends (apache/couchdb-couch-log#4) 2) we didn't want lager's overload protection, not with how it handle it with just quietly dropping the messages 3) and it was cumbersome to deal with all together, with its use of I like the idea of structured logging, but I'm not keen on proposed here change to I don't know how Elixir logger works, but personally I like how logrus went about idea of structured logging. They keep all meta or "fields" separate from a message and encourage to simplify the message itself, so it wouldn't need to include meta info. Quoting: "We've found this API forces you to think about logging in a way that produces much more useful logging messages.". Sure one still can use This is how I'd go about adding structured logging to
A bonus section:
|
Can you clarify how you are planing to use Args? Is it the same as we currently do:
|
@iilyak yes, exactly. If we think about message as just another attribute in log's meta, then it's not formatter's job to modify attributes, but to format the whole event to a string, JSON, key=vals or whatnot. For lazy evaluation and reduce cost of |
I am trying to understand what JSON formatter would do with |
You seems to miss my point, so I'll try again. Log message is just another attribute in log data, nothing more. It's not some special thing that every formatter have to know how to modify, that modification would happened before passing data to formatter, around the same step where we'll have time, level and the rest of auto-attributes injected. Formatter then only responsible for representation of log message. If you want some attribute been concatenated into message it need to be presented in both For example for text formatting instead of log message "Compaction started by pid <0.11.0> for shard \"shards/00-ff/mystic.123456\"" you'll get "Compaction started. pid=<0.11.0> shard=\"shards/00-ff/mystic.123456\"" - it's still readable and actually better greppable and you don't need to deal with I hope that makes sense. |
Thank you @eiri. In order to make it work we would need to forbid
Only if we agree to pass key-value list in Args. |
I'm seems to be very bad at explaining :( No, we do not forbid or change anything, quite the opposite! Default formatter then essentially is this step extracted and it gives back to a writer the same string as we have now to write. JSON formatter serialize the proplist and gives writer JSON to write. key=val formatter creates a string similar to my example above and passes it to writer. And so on. We also adding a new function With this we have a transparent migration, turning structured logging "on" is just a question of configuration and we can update existing code moving attributes from log message into Meta at our pace. Where with format forbidding or semantics changing approach we are facing a massive update of all log calls across the whole codebase which doesn't sounds very appealing. |
I think I understand you proposal. There is one contradiction in your description which prevents me from understanding it 100%. You've said that we would pass |
Back to my proposal (combined with ideas from erlang 21 Logger). I want to cover upgrade case. We would allow both to coexists for some time.
Somewhere in the formatter we can deal with legacy format (we can detect it by checking The updated calls would look like
Two bins are useful to avoid name clashes. Imagine that you want to have argument named with the same key used in meta-information. Log events in JSON for non updated callers would look like:
|
There are no contradiction really, it just serialization method of text formatter happened to be
Well... yes? If we can get structured logging with minimal changes, no migration and without re-educating everyone on how formatting in logs works then I believe it's a Good Thing.
Ah, but you have name clashes only because you are introducing two data bins. 😄 Here is what I don't like about it:
I guess the goal of your proposal is to have both structured and "informative" logging at the same time? I agree that it's tempting to make structuring kind of invisible, yet in my opinion the change is rather steep. Also that automatic A side note on those macros and semi-automatic updates - I think we can use |
No. The goal is to pass args as is in JSON (I'll be talking about JSON but it can be any format). In fact the more I think about it the more I am convinced that the message in log shouldn't have any embedded data at all. I wanted to see
The
I think the first approach is better for the following reasons:
I agree it could be a problem
Agree that same pid can end up in both. I think we should discourage putting any info in the message as I mentioned earlier.
On the contrary. If we make it clear that we are not doing io_lib format at all it would encourage people to remove any printf style formats from their messages. As you can see from the example (I copied the example here from my previous comment) there are no formats in the message:
This is changed see example with
We could but in this case I would propose to bump supported erlang to at least 18.0 so we can use merl. Writing parse_transfrom using erl_syntax is ugly and tedious. Overal there is not a big difference between our proposals.
As you can see the differences are:
I agree with you that the biggest problem with my proposal is that the developer would have to think where to put args either into Args or into Meta. I would be OK with merging two bins if we would find a way to detect keys collision. In this case the only differences between proposals would be:
|
Then you lost me again. If you are not using args to format message, then why are you insisting on spreading data across two distinct name spaces? Your first answer was "Two bins are useful to avoid name clashes" and second: "The goal is to pass args as is in JSON". Those are not "why", those are consequences of that decision. Both args and meta are data, list of key-values or JSON as you call it - what benefits are there from arbitrary splitting it?
There are quite a difference between discouraging and crippled functionality. I don't think breaking basic expectation of how logging works will win it a lot of fans.
Yes, except mine not requiring an ideological revolution and northern river reversal effort to implement. I know you said you don't think it's a big deal - well, I do. Massive re-writing of code base and enforcing people to re-learn things happens, for example PSE, but we can avoid it here and get exactly same result - structured logs.
Actually the real difference here is that you think It's about user-friendliness really. Purest form of structured log is Level = "info",
Msg = io_lib:format(Fmt, Args),
couch_log([{message, Msg}, {level, Level},..]). We can write this as |
The implementation would be less than 300 lines (most likely less than 50).
What are the benefits of doing |
I'm not talking about implementation, I'm talking about scale of change - all the code you are suggesting to modify and all public API you are suggesting to redefine.
Not benefits of doing, but benefits of allowing to do, in my opinion software should be helpful to programmer, not limiting and prescriptive. |
I really like the idea to use macro, because this way we would be able to embed additional meta data about location easily. However if it is one of the most important concerns for you we always can implement the desired functionality via parse_transform.
In my opinion if we would keep the current approach (when we call |
Compromise solutionAll developers would agree on
Implementation
ExamplesOld style logging call
line formatter output
json formatter output
New style logging
line formatter output
json formatter output
|
I think a move to structured logging will be very helpful. As @eiri points out, at least in my experience it makes one think more deeply about useful log ... content I'll call it 😄 FWIW I have tended to think of the "message" as the human readable bit that will show up by default in tools like Kibana or Splunk. In this thinking should therefore be a useful one line to scan. So for the compaction example, I would likely choose to output the message of "Compaction started for shards/....." rather than just "Compaction started.". I think this is my only main insight that I haven't seen in the awesome discussion on this ticket so far :) While I think @eiri is right that message is "just another field", most solutions appear to output a For the other fields, I've tended to think about what would be useful from a searching perspective. So "action: compaction_start" might be a useful one that these examples miss out. Broadly: what should I include that means that a search doesn't have to include free-text searches of the message field. In practice this usually means duplicated information between the message and the fields. For the API specifics:
|
Thank you @mikerhodes.
We definitely missed
Unfortunately Erlang
|
I think an On another note, here seems to be a general believe that If we are insist on treating |
@iilyak said:
Yes, I'd have it in extra fields. Not all log messages need it. Make it convention rather than enforced. @eiri said:
I can totally see this. I'd prefer to start with seeing if it's a problem in practice. Hopefully we are not starting the same process in many places 😄
That's a really good point. Looking at other things I've used in the past, they do split up the idea of the format args for the message from the extra structured fields args. I think @iilyak's proposal takes that into account:
I, at least, found this a very readable code stanza. |
I’d like to change the way we generate and format log messages inside CouchDB. Currently we use “printf” style invocations of the form
couch_log:Level(Format, Args)
whereLevel
is one of the standard syslog levels. In production I find that this approach makes it rather difficult to quickly find log messages that match some criterion. We have multi-line messages, individual values in messages aren’t always labeled, etc. I’ll sometimes end up grep'ing for a bit of text that I know to be unique for a particular message type, but … really? We can do better.The idea behind structured logging is that the application sends along a set of key-value pairs as an entry to the logging framework. The logger can then invoke one or more handlers to generate different message formats. A dev handler might write out the key-value pairs in a readable key=value format, while in production you might turn on a handler that produces newline-delimited JSON messages which can be easily consumed by any decent logging solution.
I plan to take a pass at converting our existing logging invocations to a more structured format, then see about making the
couch_log
framework less printf-focused in its handler API interface. Comments and ideas welcome.The text was updated successfully, but these errors were encountered: