
Structured logging #1373

Open
kocolosk opened this issue Jun 5, 2018 · 26 comments

Comments

@kocolosk
Member

kocolosk commented Jun 5, 2018

I’d like to change the way we generate and format log messages inside CouchDB. Currently we use “printf” style invocations of the form couch_log:Level(Format, Args) where Level is one of the standard syslog levels. In production I find that this approach makes it rather difficult to quickly find log messages that match some criterion. We have multi-line messages, individual values in messages aren’t always labeled, etc. I’ll sometimes end up grep'ing for a bit of text that I know to be unique for a particular message type, but … really? We can do better.

The idea behind structured logging is that the application sends along a set of key-value pairs as an entry to the logging framework. The logger can then invoke one or more handlers to generate different message formats. A dev handler might write out the key-value pairs in a readable key=value format, while in production you might turn on a handler that produces newline-delimited JSON messages which can be easily consumed by any decent logging solution.

I plan to take a pass at converting our existing logging invocations to a more structured format, then see about making the couch_log framework less printf-focused in its handler API interface. Comments and ideas welcome.
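To make the handler idea concrete, here is a minimal sketch of two formatters consuming the same key-value entry — one dev-friendly key=value renderer and one newline-delimited JSON renderer. The module and function names are hypothetical; nothing like this exists in couch_log today, and a real JSON formatter would use a proper library such as jiffy with full escaping.

```erlang
%% Hypothetical sketch: two formatters consuming the same key-value entry.
%% Module/function names are illustrative only, not part of couch_log.
-module(couch_log_sketch).
-export([kv_format/1, json_format/1]).

%% Dev-friendly "key=value" rendering.
kv_format(Entry) ->
    Pairs = [io_lib:format("~s=~p", [K, V]) || {K, V} <- Entry],
    lists:flatten(lists:join(" ", Pairs)).

%% Naive newline-delimited JSON rendering (no escaping; a real
%% implementation would use a JSON library such as jiffy).
json_format(Entry) ->
    Fields = [io_lib:format("\"~s\":\"~p\"", [K, V]) || {K, V} <- Entry],
    lists:flatten(["{", lists:join(",", Fields), "}\n"]).
```

The point is that the application emits one structured entry and the choice of representation lives entirely in the configured handler.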

@kocolosk
Member Author

kocolosk commented Jun 6, 2018

Some comments after musing on this topic for a day:

  • We'll need an interface that keeps the current lazy evaluation behavior so that we're not sitting around generating debug messages that won't get sent. The current printf style is efficient in this respect.
  • The reason I'm worried about the above is that a future JSON formatter won't be happy receiving arbitrary Erlang terms, so I was trying to figure out the right place to transform them. Doing it at the point of the logging invocation runs the risk of wasted effort if the log level is set higher.
  • We exhumed lager from the codebase a few years ago. Why? Was it simply that we thought it was a lot of unnecessary code?
  • Erlang 21 adds a new logger inspired by lager: http://erlang.org/documentation/doc-10.0-rc1/lib/kernel-6.0/doc/html/logger_chapter.html. The new Erlang logger does allow for maps and/or key-value lists to be logged directly.
  • The legacy error_logger also has the concept of a report which is a list of atoms and 2-tuples. A report is not necessarily JSON-compatible because each value is an arbitrary term.

Lazy evaluation could be as simple as wrapping every log invocation in a fun() -> ... end and only invoking the function once we've checked the log level. That should be efficient and cheap, but probably means reverting to macros instead of direct M:F/A calls ...
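A minimal sketch of that macro approach, using the couch_log_util:should_log/1 check mentioned later in this thread; the do_log/2 helper is hypothetical, and the key point is that Fmt/Args are only touched inside the fun after the level check passes:

```erlang
%% Sketch only: wrap Format/Args in a fun so nothing is evaluated
%% unless the level check passes. do_log/2 is a hypothetical helper
%% that would call the fun and format the result.
-define(LAZY_LOG(Level, Fmt, Args),
        case couch_log_util:should_log(Level) of
            true  -> couch_log:do_log(Level, fun() -> {Fmt, Args} end);
            false -> ok
        end).
```

Since macros can't be called as M:F/A, this is exactly the "reverting to macros" trade-off described above.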

@iilyak
Contributor

iilyak commented Aug 14, 2018

What if we changed our logging statements to something like:

couch_log:Level(FormatWhichSupportsBindings, MetaData, Bindings).

Metadata would include data about the location of the call and some context, such as:

  • nonce
  • user
  • namespace (if available)
  • full URL of the request (if available)
  • component

Here is one example which illustrates the idea:

couch_log:Level("Something {db_name} something else {view}",
   [{line, ?LINE}, {module, ?MODULE}, {user, #user_ctx.name}],
   [
       {db_name, DbName},
       {view, View}
   ]).

We can define a macro to include ?LINE and ?MODULE for convenience if we want to.

Since all of the arguments are key-value pairs, we can use any serialization format (depending on the logging infrastructure used):

  • JSON
  • message pack
  • BSON
  • Cap’n Proto
  • plain string as we currently use

There might be a need to define an expected format for every argument. For example, in which format would we send dates? We need to think about this when we design the FormatWhichSupportsBindings language or choose a library.
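A rough sketch of how such a "{binding}" format could be expanded (purely illustrative; no such function exists in couch_log, and a real implementation would need escaping rules and the per-type formatting questions raised above):

```erlang
%% Sketch: expand "{key}" placeholders from a proplist of bindings.
%% PCRE treats "{db_name}" literally since it is not a valid quantifier.
-module(bindings_sketch).
-export([render/2]).

render(Fmt, Bindings) ->
    lists:foldl(fun({Key, Value}, Acc) ->
        Pattern = "{" ++ atom_to_list(Key) ++ "}",
        Replacement = lists:flatten(io_lib:format("~s", [Value])),
        re:replace(Acc, Pattern, Replacement, [global, {return, list}])
    end, Fmt, Bindings).
```

Because the bindings are named, the same key-value pairs can also be emitted verbatim by a JSON formatter without touching the template at all.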

@iilyak
Contributor

iilyak commented Aug 14, 2018

The metadata inclusion is inspired by the Elixir logger, which has the following fields: https://hexdocs.pm/logger/Logger.html#module-metadata

  • application - the current application
  • module - the current module
  • function - the current function
  • file - the current file
  • line - the current line
  • pid - the current process identifier
  • crash_reason - a two-element tuple with the throw/error/exit reason as first argument and the stacktrace as second. A throw will always be {:nocatch, term}. An error is always an Exception struct. All other entries are exits. The console backend ignores this metadata by default but it can be useful to other backends, such as the ones that report errors to third-party services
  • initial_call - the initial call that started the process
  • registered_name - the process registered name as an atom

We can include {commit, ?COUCHDB_GIT_SHA} as well.

@iilyak
Contributor

iilyak commented Aug 14, 2018

We also need to include a timestamp in the metadata or pass it as a separate argument, in order to have exact timestamps which correspond to the time when the log function is executed, rather than when we format the message in the logging backend.

@wohali
Member

wohali commented Aug 15, 2018

@iilyak have you looked at the Erlang 21 logger library? It seems like a decent target.

@iilyak
Contributor

iilyak commented Aug 15, 2018

The API for the Erlang 21 logger looks good.

Logging macros

?LOG_ERROR(FunOrFormat,Args[,Metadata])

-define(LOG_ERROR(A,B,C),?DO_LOG(error,[A,B,C])).
-define(DO_LOG(Level,Args),
        case logger:allow(Level,?MODULE) of
            true ->
                apply(logger,macro_log,[?LOCATION,Level|Args]);
            false ->
                ok
        end).

The LOG_ERROR macro injects additional info into the provided Metadata map. The injected info:

-define(LOCATION,#{mfa=>{?MODULE,?FUNCTION_NAME,?FUNCTION_ARITY},
                   line=>?LINE,
                   file=>?FILE}).

As you can see, the logging macro uses a case to find out if the event needs to be logged.

Additional features

  • Process-specific metadata can be added using logger:set_process_metadata(Meta),
    which is implemented using the process dictionary (good).
    We can use it to set the full URI the user used to access the endpoint.
  • There is a concept of filters. Filters can be applied to the logger and to a logger handler.
  • There is some overload protection in a few default handlers, based on message queue size (drop on the floor).
  • There is a notion of formatters, to be able to reuse existing handlers (file or stdout).
    Formatters are called as Formatter:format(Log,FormatterConfig) (see logger_h_common.erl)
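For reference, using the OTP 21 logger directly looks like this; the metadata keys (user, component, what, db) are arbitrary examples, not anything couch_log defines:

```erlang
%% Sketch of using the OTP 21 logger directly (requires OTP >= 21).
%% Metadata keys shown are arbitrary examples.
-module(otp_logger_sketch).
-export([log_example/2]).

-include_lib("kernel/include/logger.hrl").

log_example(DbName, UserCtx) ->
    %% Per-process metadata rides along with every later log event
    %% from this process (stored in the process dictionary).
    logger:set_process_metadata(#{user => UserCtx, component => couch}),
    %% A structured report: a map instead of a format string.
    ?LOG_ERROR(#{what => compaction_failed, db => DbName}).
```

Note that the report map goes straight through to the handler, so a JSON formatter never has to parse a format string.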

Implementation details

  1. the log statement is wrapped in a case to check if it needs to be logged (ets:lookup)
  2. then we dispatch it to logger_backend.erl via logger.erl
  3. logger_backend.erl does the following:
    • logger_config:get(Tid,primary) (ets:lookup)
    • apply filters
    • call handlers. Every handler does
      • logger_config:get(Tid,Id,Level) (ets:lookup)
      • apply filters
      • dispatch the call using Module:log(Log1,Config1)
      • if the call to Module fails, the handler is automatically removed from the configuration
      • it is up to the handler to decide how to log the event (single process or parallel, and in which format)

Default handlers are implemented as gen_servers.

Overload protection

The default handlers (logger_disk_log_h and logger_std_h) have overload protection, which is implemented in logger_h_common.erl.

If the handler process is getting overloaded, the log event
will be sent synchronously (to the handler process) instead of
asynchronously (this slows down the logging tempo of a process
doing lots of logging). If the handler is choked, drop mode is
set and no events will be sent (to the handler process).

logger_h_common provides helper functions for gen_server-based handlers to flush all log events from the message queue in case of overload. It also provides functions to detect the overload case.

Conclusion

The Logger library has a good API and useful features. The implementation is robust and should provide good performance. The only place that could be optimized further (based on a one-hour review) is the configuration module, since it is implemented using ets, which is the second-fastest approach (the fastest is a generated module).

@eiri
Member

eiri commented Aug 15, 2018

@kocolosk wrote:

  • We exhumed lager from the codebase a few years ago. Why? Was it simply that we thought it was a lot of unnecessary code?
  • Erlang 21 adds a new logger inspired by lager:..

iirc we switched away from lager because 1) it was too coupled with the couch codebase and we wanted the ability to change backends (apache/couchdb-couch-log#4), 2) we didn't want lager's overload protection, not with how it handles it by just quietly dropping messages, and 3) it was cumbersome to deal with altogether, with its use of parse_transform and non-intuitive buffers and handlers

I like the idea of structured logging, but I'm not keen on the change proposed here to couch_log:Level(FormatWhichSupportsBindings, MetaData, Bindings). I don't see the benefit in introducing two separate bins to hold attributes while also forcing a change to both the formatting template and a switch of the arguments from list to proplist.

I don't know how the Elixir logger works, but personally I like how logrus went about the idea of structured logging. They keep all meta or "fields" separate from the message and encourage simplifying the message itself, so it doesn't need to include meta info. Quoting: "We've found this API forces you to think about logging in a way that produces much more useful logging messages." Sure, one can still use sprintf if necessary.
There is also the idea of configurable implicit default fields, akin to the mentioned ?LINE and ?MODULE, and automatic treatment of level, time and msg as just more fields.

This is how I'd go about adding structured logging to couch_log:

  • Extract formatting out of the writers and make it its own entity. I.e. this function would live in its own module and accept not a #log_entry record, but a map or a proplist. Depending on the formatter, the output would be the same log message string we have now, a JSON object, or a message + key=value pairs (this is called TextFormatter in logrus).
  • Add a function couch_log:Level(Format, Args, Meta) to allow passing additional fields. To emphasize: the formatter doesn't modify msg - the message is a message, not a template.
  • Formatters would be configurable in the [log] section, with a default matching what we have now. Or the default can be dictated by a writer module; I'm thinking of the couch_log_writer_syslog case
  • ...
  • Profit!

A bonus section:

  • Macros sound like a good idea, maybe not a complete switch everywhere, but they'd be nice to have. I use the following two in my dev a lot; I've added them to my local couch_db.hrl, so they're available almost everywhere.
-define(l(F, A), couch_log:info("~p:~b - " ++ F, [?MODULE,?LINE] ++ A)).
-define(r2kv(Rec, Name),
    lists:zip(record_info(fields, Name), tl(tuple_to_list(Rec)))).
  • We can make should_log more sophisticated, to turn things on and off based on meta. Something like dropping a function into the logger server state and then using it as a predicate for lists:all/2 to run over the meta.

@iilyak
Contributor

iilyak commented Aug 15, 2018

@eiri
Add function couch_log:Level(Format, Args, Meta) to allow to pass additional fields. To emphasize - formatter doesn't modify msg - the message is a message, not a template.

Can you clarify how you are planning to use Args? Is it the same as we currently do:

couch_log:debug("Something ~p~n", [Term], Meta).

@eiri
Member

eiri commented Aug 15, 2018

@iilyak yes, exactly. If we think about the message as just another attribute in the log's meta, then it's not the formatter's job to modify attributes, but to format the whole event to a string, JSON, key=vals or whatnot.

For lazy evaluation and to reduce the cost of the io:format call, we can match Args with an empty list and/or go with @kocolosk's idea of wrapping that Fmt + Args into a fun and calling it after the should_log evaluation.

@iilyak
Contributor

iilyak commented Aug 15, 2018

I am trying to understand what a JSON formatter would do with Args, and how we would access them in Splunk (or any other system)?

@eiri
Member

eiri commented Aug 15, 2018

You seem to have missed my point, so I'll try again.

The log message is just another attribute in the log data, nothing more. It's not some special thing that every formatter has to know how to modify; that modification would happen before passing data to the formatter, around the same step where we'll have time, level and the rest of the auto-attributes injected. The formatter is then only responsible for the representation of the log message.

If you want some attribute to be concatenated into the message, it needs to be present in both Args and Meta, but the point of structured logging is that you don't need to do that to get informative logging.

For example, with text formatting, instead of the log message "Compaction started by pid <0.11.0> for shard \"shards/00-ff/mystic.123456\"" you'll get "Compaction started. pid=<0.11.0> shard=\"shards/00-ff/mystic.123456\"" - it's still readable, actually more greppable, and you don't need to deal with io:format's Fmt Compaction started by ~p for shard ~s. Then as JSON it'll be just {"msg": "Compaction started", "pid": "<0.11.0>", "shard": "shards/00-ff/mystic.123456"}

I hope that makes sense.

@iilyak
Contributor

iilyak commented Aug 15, 2018

Thank you @eiri. In order to make it work we would need to forbid ~ in the Msg, either technically or enforced at the review stage. This means that Format needs to be renamed. Maybe the proposed API would be better described as follows:

couch_log:Level(MessageTag, Args, Meta).

@eiri but the point of structured logging is that you don't need to do that to get informative logging.

Only if we agree to pass a key-value list in Args.

@eiri
Member

eiri commented Aug 15, 2018

I seem to be very bad at explaining :(

No, we do not forbid or change anything, quite the opposite! couch_log:Level/2 stays exactly as it is; we just have a step at which we create a proplist with (this is pseudo code!) [{level, Level}, {msg, io_lib:format(Fmt, Args)}, {time, os:now()}, ...] and then pass this proplist to the configured formatter.

The default formatter then is essentially this step extracted, and it gives back to the writer the same string we have now to write. A JSON formatter serializes the proplist and gives the writer JSON to write. A key=val formatter creates a string similar to my example above and passes it to the writer. And so on.

We also add a new function couch_log:Level(MessageTag, Args, Meta), which is different in just one aspect - it has a Meta proplist to be merged into our "auto" proplist.

With this we have a transparent migration: turning structured logging "on" is just a question of configuration, and we can update existing code, moving attributes from the log message into Meta, at our own pace. Whereas with the format-forbidding or semantics-changing approach we are facing a massive update of all log calls across the whole codebase, which doesn't sound very appealing.
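The entry-building step described above might be sketched like this; the module name is invented, and the formatter/writer dispatch is left out since it depends on configuration plumbing that doesn't exist yet:

```erlang
%% Sketch of the "auto" proplist step from the proposal above.
%% Pure and testable; dispatching the entry to a configured
%% formatter and writer is omitted. Module name is hypothetical.
-module(couch_log_pipeline_sketch).
-export([make_entry/4]).

make_entry(Level, Fmt, Args, Meta) ->
    [{level, Level},
     %% The text serialization happens here, once, up front...
     {msg, lists:flatten(io_lib:format(Fmt, Args))},
     {time, os:system_time(millisecond)}
     %% ...and any caller-supplied Meta is merged in alongside.
     | Meta].
```

A text formatter would then just concatenate these pairs, while a JSON formatter would serialize the whole proplist.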

@iilyak
Contributor

iilyak commented Aug 16, 2018

I think I understand your proposal. There is one contradiction in your description which prevents me from understanding it 100%. You've said that we would pass [..., {msg, io_lib:format(Fmt, Args)}, ...] to the configured formatter. Then you mention that the default formatter would do the step we currently do,
which also includes the call to io_lib:format(Fmt, Args). Also, I think one of the goals of your proposal is to avoid modifying all the places where we call the logger. I don't think that is a big deal; we can always write scripts to update all callers in a semi-automatic manner.

@iilyak
Contributor

iilyak commented Aug 16, 2018

Back to my proposal (combined with ideas from the Erlang 21 Logger). I want to cover the upgrade case: we would allow both styles to coexist for some time.
Currently we have couch_log:Level(Format, Args); we would replace it with LOG_ERROR2(Format, Args). We can use search and replace to make this change. Updated log statements would use the LOG_ERROR macro instead. The macros would be defined as follows:

-define(LOG_ERROR2(Msg, Args),?DO_LOG(error, [Msg, couch_log:to_kv(Args), [{format_vsn, 1}]])).
-define(LOG_ERROR(Msg, Args, Meta),?DO_LOG(error, [Msg, Args, Meta])).
-define(DO_LOG(Level, Args),
        case couch_log_util:should_log(Level) of
            true ->
                apply(couch_log, macro_log, [?LOCATION, Level | Args]);
            false ->
                ok
        end).

-define(LOCATION, [
    {module, ?MODULE},
    {function_name, ?FUNCTION_NAME},
    {function_arity, ?FUNCTION_ARITY},
    {line, ?LINE},
    {file, ?FILE},
    {log_event_pid, self()}
]).

couch_log:to_kv would receive the list of values and return [{"arg1", value1}, {"arg2", value2}].
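A sketch of that proposed to_kv/1 shim (the function does not exist yet; this just numbers the positional arguments as described):

```erlang
%% Sketch of the proposed couch_log:to_kv/1: turn a legacy positional
%% argument list into numbered key-value pairs. Module is hypothetical.
-module(to_kv_sketch).
-export([to_kv/1]).

to_kv(Args) ->
    [{"arg" ++ integer_to_list(N), V}
     || {N, V} <- lists:zip(lists:seq(1, length(Args)), Args)].
```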

Somewhere in the formatter we can deal with the legacy format (we can detect it by checking format_vsn == 1) and call io_lib:format.

The updated calls would look like

?LOG_ERROR("Compaction started", [
    {compactor_pid, Pid},
    {shard, Shard}
], [{user, #user_ctx.name}]).

@eiri I don't see benefits in introducing two separate bins to hold attributes

Two bins are useful to avoid name clashes. Imagine that you want an argument named with the same key used in the meta-information.

Log events in JSON for non updated callers would look like:

{
     "msg": "Compaction started by '<0.11.0>' for shard 'shards/00-ff/mystic.123456'",
     "args": {
         "arg1": "<0.11.0>",
         "arg2": "shards/00-ff/mystic.123456"
     },
     "meta": {
        "module": "couch_bla_bla",
        "function_name": "compact_shard",
        "function_arity": 4,
        "line": 1234,
        "file": "src/couch/ebin/couch_bla_bla.beam",
        "log_event_pid": "<1.2.3>"
     }
}

@eiri
Member

eiri commented Aug 16, 2018

@iilyak : There is one contradiction in your description which prevents me from understanding it 100%. ...

There is no contradiction really; it's just that the serialization method of the text formatter happens to be sprintf (or rather io_lib:format, because Erlang). And it is what we are doing right now: we format the message here first, and then in a writer we pack the rest of the log entry attributes.

Also I think one of the goals of your proposal is to avoid modifying all places where we call logger.

Well... yes? If we can get structured logging with minimal changes, no migration and without re-educating everyone on how formatting in logs works then I believe it's a Good Thing.

Two bins are useful to avoid name clashes.

Ah, but you have name clashes only because you are introducing two data bins. 😄

Here is what I don't like about it:

  1. It makes search queries more complicated. With two data bins you have to guess whether you need to search for meta.pid = <0.11.0> or for args.pid = <0.11.0>, because the programmer felt like packing the pid into args instead of passing it to meta.
  2. It duplicates data. The same pid info can end up both in args and in meta. Not to mention that it is also written in the message.
  3. It encourages continuing to pack meta into the message instead of promoting thinking about logs as structured data.
  4. It requires inventing a new sprintf format and forcing everyone to learn it.

I guess the goal of your proposal is to have both structured and "informative" logging at the same time? I agree that it's tempting to make the structuring kind of invisible, yet in my opinion the change is rather steep. Also, that automatic couch_log:to_kv/1 conversion doesn't look too helpful to me; it's not really a structure if you have a lot of attributes args.arg1, args.arg2 meaning different things.

A side note on those macros and semi-automatic updates - I think we can use parse_transform to propagate auto-meta, iirc lager was doing that under the hood.

@iilyak
Contributor

iilyak commented Aug 17, 2018

@eiri I guess the goal of your proposal is to have both structured and "informative" logging at the same time?

No. The goal is to pass args as is in JSON (I'll be talking about JSON but it can be any format). In fact, the more I think about it the more I am convinced that the message in the log shouldn't have any embedded data at all. I want to see "Shard compaction: " instead of "Compacting shards/12344/3452342/something by <0.1.2>" in the textual part of the message.

Also that automatic couch_log:to_kv/1 conversion doesn't look too helpful to me, it's not really a structure if you have a lot of attributes args.arg1, args.arg2 meaning different things.

The couch_log:to_kv/1 is only a temporary measure to support old-style (not yet converted) log statements. I can see two ways of supporting the old style:

  • convert invocation in the macro
  • have a special case in the formatter which handles non key-value based arguments

I think the first approach is better for the following reasons:

  • having a special case in the formatter means that it needs to be implemented in every formatter
  • distinguishing a key-value list from a plain list is tricky
  • it uses exactly the same codepath, which means we wouldn't have bugs that affect only one of the styles
  1. It makes search queries more complicated. With two data bins you have to guess whether you need to search for meta.pid = <0.11.0> or for args.pid = <0.11.0>, because the programmer felt like packing the pid into args instead of passing it to meta.

I agree it could be a problem

  2. It duplicates data. The same pid info can end up both in args and in meta. Not to mention that it is also written in the message.

Agree that same pid can end up in both. I think we should discourage putting any info in the message as I mentioned earlier.

  3. It encourages continuing to pack meta into the message instead of promoting thinking about logs as structured data.

On the contrary. If we make it clear that we are not doing io_lib:format at all, it would encourage people to remove any printf-style formats from their messages. As you can see from the example (copied here from my previous comment), there are no formats in the message:

?LOG_ERROR("Compaction started", [
    {compactor_pid, Pid},
    {shard, Shard}
], [{user, #user_ctx.name}]).
  4. It requires inventing a new sprintf format and forcing everyone to learn it.

This has changed; see the ?LOG_ERROR("Compaction started", ... example above. I forgot to mention that my previous comment is an iteration of my original proposal with some elements from yours. As one way to discourage putting info into the message, we
could check whether the message contains any ~ for new-style invocations and abort the build with an error. Unfortunately that is impossible to implement in Erlang using macros; our only option here is parse_transforms, which I would prefer to avoid.

A side note on those macros and semi-automatic updates - I think we can use parse_transform to propagate auto-meta, iirc lager was doing that under the hood.

We could, but in that case I would propose bumping the supported Erlang to at least 18.0 so we can use merl. Writing a parse_transform using erl_syntax is ugly and tedious.

Overall, there is not a big difference between our proposals.

couch_log:error("Compaction started for ~p~n", ["shards/...."], [{pid, <1.2.3>}, {user, "foo"}]) ->
    {
         msg: "Compaction started shards/...\n",
         pid: "<1.2.3>",
         user: "foo"
    }
?LOG_ERROR("Compaction started", [{shard, Shard}, {compactor_pid, <1.2.3>}], [{user, "foo"}]) ->
    {
         msg: "Compaction started",
         args: {
             compactor_pid: "<1.2.3>"
         },
         meta: {
             user: "foo"
         }
    }

As you can see the differences are:

|                                 | eiri | iilyak |
| ------------------------------- | ---- | ------ |
| Presence of format markers      | *    |        |
| Embed data into message         | *    |        |
| Calls to io_lib:format          | *    | maybe  |
| Args available in JSON          |      | *      |
| Metadata available in JSON      | *    | *      |
| Ease of converting old calls    | *    | *      |
| Have to think where to put args |      | *      |

I agree with you that the biggest problem with my proposal is that the developer would have to think about where to put args: either into Args or into Meta. I would be OK with merging the two bins if we could find a way to detect key collisions. In that case the only differences between the proposals would be:

  • a key-value list instead of a plain list for Args
  • format markers in the message in the source

@eiri
Member

eiri commented Aug 17, 2018

No. The goal is to pass args as is in JSON (I'll be talking about JSON but it can be any format). In fact the more I think about it the more I am convinced that the message in log shouldn't have any embedded data at all.

Then you lost me again. If you are not using args to format the message, then why are you insisting on spreading data across two distinct namespaces? Your first answer was "Two bins are useful to avoid name clashes" and second: "The goal is to pass args as is in JSON". Those are not "why"; those are consequences of that decision. Both args and meta are data, a list of key-values or JSON as you call it - what benefits are there from arbitrarily splitting it?

As one way to discourage putting info into the message, we
could check whether the message contains any ~ for new-style invocations and abort the build with an error.

There is quite a difference between discouraging and crippling functionality. I don't think breaking basic expectations of how logging works will win it a lot of fans.

Overall there is not a big difference between our proposals.

Yes, except mine doesn't require an ideological revolution and a northern-river-reversal effort to implement. I know you said you don't think it's a big deal - well, I do. Massive rewriting of the codebase and forcing people to re-learn things happens, for example PSE, but we can avoid it here and get exactly the same result - structured logs.

In this case the only differences between proposals would be:

  • key-value list instead of plain list for Args

Actually the real difference here is that you think Args is a part of the structured data, and I see it as a part of the data-point "message".

It's about user-friendliness really. The purest form of a structured log is couch_log([{message, "msg"}, {level, "info"}, {key, val}, ...]). Since "message" and "level" are two required attributes, instead of writing:

Level = "info",
Msg = io_lib:format(Fmt, Args),
couch_log([{message, Msg}, {level, Level},..]).

We can write this as couch_log:Level(Fmt, Args, [{key, val}, ...]). This is easier to write and to find when scanning code, but it's just syntactic sugar.

@iilyak
Contributor

iilyak commented Aug 17, 2018

Yes, except mine not requiring an ideological revolution and northern river reversal effort to implement. I know you said you don't think it's a big deal - well, I do.

The implementation would be less than 300 lines (most likely less than 50).

Actually the real difference here is that you think Args is a part of structured data and I see it as a part of data-point "message".

It's about user-friendliness really. The purest form of a structured log is couch_log([{message, "msg"}, {level, "info"}, {key, val}, ...]). Since "message" and "level" are two required attributes, instead of writing:

What are the benefits of doing Msg = io_lib:format(Fmt, Args)?

@eiri
Member

eiri commented Aug 17, 2018

The implementation would be less than 300 lines (most likely less than 50).

I'm not talking about the implementation, I'm talking about the scale of change - all the code you are suggesting we modify and all the public API you are suggesting we redefine.

What are the benefits of doing Msg = io_lib:format(Fmt, Args)?

Not the benefits of doing it, but the benefits of allowing it to be done. In my opinion, software should be helpful to the programmer, not limiting and prescriptive.

@iilyak
Contributor

iilyak commented Aug 17, 2018

I'm not talking about implementation, I'm talking about scale of change - all the code you are
suggesting to modify and all public API you are suggesting to redefine.

I really like the idea of using a macro, because that way we would be able to easily embed additional metadata about the location. However, if this is one of the most important concerns for you, we can always implement the desired functionality via parse_transform.

What are the benefits of doing Msg = io_lib:format(Fmt, Args)?

Not the benefits of doing it, but the benefits of allowing it to be done; in my opinion software should be helpful
to the programmer, not limiting and prescriptive.

In my opinion, if we keep the current approach (where we call io_lib:format(Fmt, Args)
behind the scenes), we as developers will continue to embed data into the message.

@iilyak
Contributor

iilyak commented Aug 30, 2018

Compromise solution

All developers would agree on

  1. all new log statements should receive a proplist for Args.
  2. we should discourage the use of string format directives (such as ~p).
  3. the Message in the call shouldn't have any data embedded.

Implementation

  1. We would implement the couch_log:Level(Message, Args, Meta) interface using parse_transform and merl.
  2. The implementation would support both a plain list and a proplist for Args (for compatibility reasons).
  3. The io_lib:format call would be done in the formatters (which need it).
  4. In cases where keys supplied via Args collide with those automatically injected into Meta, Args takes precedence.
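Point 2 above might be sketched as follows; the module name is invented, and the proplist check is an explicit heuristic, since (as noted earlier in the thread) reliably distinguishing a key-value list from a plain list is tricky:

```erlang
%% Sketch for compatibility point 2: accept either a proplist or a
%% plain list for Args. Heuristic: if every element is a {Atom, _}
%% tuple, treat it as key-value pairs; otherwise number the args,
%% matching the arg_1 naming in the examples below.
-module(compromise_sketch).
-export([normalize_args/1]).

normalize_args(Args) ->
    IsKv = lists:all(fun({K, _}) -> is_atom(K);
                        (_)      -> false
                     end, Args),
    case IsKv of
        true  -> Args;
        false -> [{"arg_" ++ integer_to_list(N), V}
                  || {N, V} <- lists:zip(lists:seq(1, length(Args)), Args)]
    end.
```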

Examples

Old style logging call

couch_log:error("Compaction started for ~p~n", ["shards/...."])

line formatter output

Compaction started for shards/..... [pid=<0.11.0>, line=123, file=couch_db_updater.erl, arg_1="shards/..."]

json formatter output

{
    "message": "Compaction started for ~p~n",
    "line": 123,
    "file": "couch_db_updater.erl",
    "pid": "<0.11.0>",
    "arg_1": "shards/..." 
}

New style logging

couch_log:error("Compaction started", [{shard, "shards/..."}], [{user, "foo"}]).

line formatter output

Compaction started. [pid=<0.11.0>, line=123, file=couch_db_updater.erl, shard="shards/...", user="foo"]

json formatter output

{
    "message": "Compaction started",
    "line": 123,
    "file": "couch_db_updater.erl",
    "pid": "<0.11.0>",
    "shard": "shards/...",
    "user": "foo"
}

@mikerhodes
Contributor

I think a move to structured logging will be very helpful. As @eiri points out, at least in my experience it makes one think more deeply about useful log ... content, I'll call it 😄

FWIW I have tended to think of the "message" as the human-readable bit that will show up by default in tools like Kibana or Splunk. In this thinking, it should therefore be a useful one-liner to scan. So for the compaction example, I would likely choose to output the message "Compaction started for shards/....." rather than just "Compaction started.". I think this is my only main insight that I haven't seen in the awesome discussion on this ticket so far :)

While I think @eiri is right that the message is "just another field", most solutions appear to output a message field by default in the UI and have disclosure arrows to view other things, so I do think these messages need to be useful for a human reading the logs.

For the other fields, I've tended to think about what would be useful from a searching perspective. So "action: compaction_start" might be a useful one that these examples miss. Broadly: what should I include so that a search doesn't have to resort to free-text searches of the message field? In practice this usually means duplicating information between the message and the fields.

For the API specifics:

  • I side with @eiri that a single "bin" for the extra fields in the JSON is easier; it's not obvious to me whether there's a natural mapping. For the meta-info of line etc., perhaps the macro/parse transform needs to validate that there are no name clashes once it has validated the log level? The problem here is that you need good test coverage to make sure this doesn't explode (could a transform do this check at compile time?).

  • I like something like:

      couch_log:error(FormatString, FormatArgs, ExtraFields)
    

    which ends up like:

      couch_log:error("Compaction started for ~p", ["shards/..."], [{user, "foo"}, {shard, "shards/..."}])
    

    Note the duplication of the shard info, but I think this makes reading the logs and searching them simpler. Arguably you could have named bits in the string formatting that read fields from ExtraFields, to avoid needing the FormatArgs argument at all. I don't know if Erlang's formatting supports that, though.

  • I have always liked the "log4X" pattern where what to log is separate from the format of the log, and they are configured separately. At a scan, the Erlang 21 logger @kocolosk links to follows this pattern; I've always liked python's logger class too, though it's a bit long in the tooth.

@iilyak
Contributor

iilyak commented Aug 31, 2018

Thank you @mikerhodes.

So "action: compaction_start" might be a useful one that these examples miss out.

We definitely missed action field. How would you add it to API? Are you proposing to add into ExtraFields?

Arguably you could have named bit in the string formatting that could read fields from ExtraFields to avoid having to have the FormatArgs argument. I don't know if Erlang's formatting supports it though.

Unfortunately, Erlang's io_lib:format doesn't support named arguments or interpolation. However, your comment led me to an idea I would like to hear feedback on from a wider audience: use a map for the fields and pass a list of field names telling the formatter which ones to use, in their order of appearance in the format string.

couch_log:error("Compaction started for ~p", [shard], #{
     user => "foo", 
     shard => "shards/..."
}).
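
One way a formatter could resolve the positional arguments from the map is sketched below. `format_entry/3` is a hypothetical helper for illustration, not proposed couch_log API.

```erlang
%% Hypothetical helper: look up the positional format arguments by
%% name in the fields map, in the order they appear in the format
%% string, and keep the full map for the structured output.
format_entry(Format, ArgNames, Fields) ->
    Args = [maps:get(Name, Fields) || Name <- ArgNames],
    Message = lists:flatten(io_lib:format(Format, Args)),
    {Message, Fields}.

%% Example call matching the proposal above:
%% format_entry("Compaction started for ~p", [shard],
%%              #{user => "foo", shard => "shards/..."})
```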

@eiri
Member

eiri commented Sep 4, 2018

I think an action field is a good idea, with one caveat: without a formal naming strategy it's fairly easy to end up with inconsistencies, for example starting_compaction in one place and compaction_start in another. I can't think of a good way to solve that without defining some naming rules beforehand and then adding a compliance check as part of the review process.
Then again, I might be overthinking this and it doesn't matter as long as the actions are unique and greppable in the code.

On another note, there seems to be a general belief here that FormatArgs always represent data worth adding to the data portion of the logs. I personally don't think that's the case: with a message log("Starting compaction for shard ~p on update sequence ~p", [<<"shard">>, 12]) it would be helpful to be able to query by the shard field, much less so by the update_sequence field. Let's not forget that data points are not free; each one represents an index on the receiving side, which means storage size and log processing rate.

If we insist on treating FormatArgs and ExtraFields the same, then we are probably better off giving up on io_lib:format for message formatting and switching to proper templating like mustache or DTL; if we can't keep things simple, we may as well go the whole way.
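
For illustration, a minimal mustache-style interpolator over the fields map might look like this. This is a hypothetical sketch only (it assumes string values and well-formed `{{name}}` templates), not a proposal for actual couch_log code.

```erlang
%% Hypothetical minimal {{name}} interpolation over a fields map;
%% assumes string values and well-formed templates.
interpolate("{{" ++ Rest, Fields) ->
    %% Split off the field name at the first closing brace.
    {Name, "}}" ++ Tail} =
        lists:splitwith(fun(C) -> C =/= $} end, Rest),
    maps:get(list_to_atom(Name), Fields) ++ interpolate(Tail, Fields);
interpolate([C | Rest], Fields) ->
    [C | interpolate(Rest, Fields)];
interpolate([], _Fields) ->
    [].

%% interpolate("Compaction started for {{shard}}",
%%             #{shard => "shards/..."})
%% -> "Compaction started for shards/..."
```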

@mikerhodes
Contributor

@iilyak said:

So "action: compaction_start" might be a useful one that these examples miss out.

We definitely missed action field. How would you add it to API? Are you proposing to add into ExtraFields?

Yes, I'd have it in extra fields. Not all log messages need it. Make it convention rather than enforced.

@eiri said:

I think an action field is a good idea with one caveat that without formal naming strategy it's fairly simple to get a lot of inconsistencies in there and have for example starting_compaction in one place and compaction_start in another.

I can totally see this. I'd prefer to start with seeing if it's a problem in practice. Hopefully we are not starting the same process in many places 😄

I personally don't think that's the case, with a message log("Starting compaction for shard ~p on update sequence ~p", [<<"shard">>, 12]) it would be helpful to have an ability to query by shard field, much less so for update_sequence field.

That's a really good point. Looking at other things I've used in the past, they do split up the idea of the format args for the message from the extra structured fields args.

I think @iilyak's proposal takes that into account:

couch_log:error("Compaction started for ~p", [shard], #{
     user => "foo", 
     shard => "shards/..."
}).

I, at least, found this a very readable code stanza.
