Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logging of non-ASCII characters #49

Closed
z1atk0 opened this issue Feb 22, 2022 · 11 comments
Closed

Logging of non-ASCII characters #49

z1atk0 opened this issue Feb 22, 2022 · 11 comments
Milestone

Comments

@z1atk0
Copy link

z1atk0 commented Feb 22, 2022

Hi,

I'm not sure if this is actually a sysklogd issue, but I have to start somewhere, I guess. 🤷‍♂️ Short context: the recently released Slackware 15.0 comes with your sysklogd-2.3.0, while previous versions of Slackware (14.2 and lower) included the old infodrom.org sysklogd-1.5.1 that you forked off a while ago.

Now it looks like the logging of (the greater portion of) non-ASCII characters appears to be broken. I found a line in my logs which (when viewed with less, and $LESSCHARSET="utf-8") looks like this (please scroll right to the end of the line) ...

Feb 21 21:54:15 next gcr-prompter[2368]: GLib-GIO: _g_io_module_get_default: Found default implementation gvfs (GDaemonVfs) for <E2>M-^@M-^Xgio-vfs<E2>M-^@M-^Y

... while it is actually supposed to look like this (again, please scroll right to the end of the line):

Feb 21 21:54:15 next gcr-prompter[2368]: GLib-GIO: _g_io_module_get_default: Found default implementation gvfs (GDaemonVfs) for ‘gio-vfs’

After quite a bit of head scratching, some debugging, and building of a couple of sysklogd's released versions on Slackware 15.0, I came to the conclusion that 1.6 is the last version that logs non-ASCII characters properly. As of 2.0, which I understand was more or less almost a rewrite from scratch, non-ASCII characters turn up garbled in the on-disk logs.

This is with sysklogd-2.0 and later ... on one terminal:

[root@next:/usr/src/slackware/a]# /usr/sbin/syslogd -s -d
Starting.
nslookup 'none:syslog'
Created inet socket 4 for (null):syslog ...
Created inet socket 5 for (null):syslog ...
Created UNIX socket 6 ...
cfline(*.info;*.!warn;authpriv.none;cron.none;mail.none;news.none	-/var/log/messages)
symbolic name: info ==> 6
symbolic name: warn ==> 4
symbolic name: none ==> 16
symbolic name: authpriv ==> 80
symbolic name: none ==> 16
symbolic name: cron ==> 72
symbolic name: none ==> 16
symbolic name: mail ==> 16
symbolic name: none ==> 16
symbolic name: news ==> 56
leading char in action: /
filename: '/var/log/messages'
RFC3164 format enabled
cfline(*.warn;authpriv.none;cron.none;mail.none;news.none	-/var/log/syslog)
symbolic name: warn ==> 4
symbolic name: none ==> 16
symbolic name: authpriv ==> 80
symbolic name: none ==> 16
symbolic name: cron ==> 72
symbolic name: none ==> 16
symbolic name: mail ==> 16
symbolic name: none ==> 16
symbolic name: news ==> 56
leading char in action: /
filename: '/var/log/syslog'
RFC3164 format enabled
cfline(*.=debug						-/var/log/debug)
symbolic name: debug ==> 7
leading char in action: /
filename: '/var/log/debug'
RFC3164 format enabled
cfline(authpriv.*						-/var/log/secure)
symbolic name: * ==> 255
symbolic name: authpriv ==> 80
leading char in action: /
filename: '/var/log/secure'
RFC3164 format enabled
cfline(cron.*							-/var/log/cron)
symbolic name: * ==> 255
symbolic name: cron ==> 72
leading char in action: /
filename: '/var/log/cron'
RFC3164 format enabled
cfline(mail.*							-/var/log/maillog)
symbolic name: * ==> 255
symbolic name: mail ==> 16
leading char in action: /
filename: '/var/log/maillog'
RFC3164 format enabled
cfline(*.emerg							*)
symbolic name: emerg ==> 0
leading char in action: *
write-all
RFC3164 format enabled
Searching for /etc/syslog.d/*.conf ...
Opening /etc/syslog.d/dev-tty12.conf ...
Parsing /etc/syslog.d/dev-tty12.conf ...cfline(*.*	/dev/tty12)
symbolic name: * ==> 255
leading char in action: /
filename: '/dev/tty12'
RFC3164 format enabled
60 60  X 60 60 60 60  X 60  X  X 60 60 60 60 60 60 60 60 60 60 60 60 60 60 FILE: /var/log/messages
1F 1F  X 1F 1F 1F 1F  X 1F  X  X 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F FILE: /var/log/syslog
80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 FILE: /var/log/debug
 X  X  X  X  X  X  X  X  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/secure
 X  X  X  X  X  X  X  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/cron
 X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/maillog
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 WALL: 
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF TTY: /dev/tty12
syslogd v2.0: restart.
logmsg: syslog.info<46>, flags 0, from next, app-name syslogd procid 23020 msgid (null) sd (null) msg syslogd v2.0: restart.
Called fprintlog_first(), fmt3164(46, 0x00, next, syslogd, 23020, (null), (null), syslogd v2.0: restart.)logging to FILE /var/log/messages
Called fprintlog_first(), fmt3164(46, 0x00, next, syslogd, 23020, (null), (null), syslogd v2.0: restart.)logging to TTY /dev/tty12
syslogd: restarted.
Debugging disabled, SIGUSR1 to turn on debugging.
Message from UNIX socket #6: <15>1 2022-02-22T12:40:33.145243+01:00 next test - - - iso/latin - öäüÖÄÜ߀¢§ 
logmsg: user.debug<15>, flags 0, from next, app-name test procid (null) msgid (null) sd (null) msg iso/latin - öäü�M-^V�M-^D�M-^\�M-^_�M-^B�¢§ 
Called fprintlog_first(), fmt3164(15, 0x00, next, test, (null), (null), (null), iso/latin - öäü�M-^V�M-^D�M-^\�M-^_�M-^B�¢§ )logging to FILE /var/log/debug
Called fprintlog_first(), fmt3164(15, 0x00, next, test, (null), (null), (null), iso/latin - öäü�M-^V�M-^D�M-^\�M-^_�M-^B�¢§ )logging to TTY /dev/tty12
Message from UNIX socket #6: <15>1 2022-02-22T12:40:33.149393+01:00 next test - - - unicode - –…‘’• 
logmsg: user.debug<15>, flags 0, from next, app-name test procid (null) msgid (null) sd (null) msg unicode - �M-^@M-^S�M-^@��M-^@M-^X�M-^@M-^Y�M-^@� 
Called fprintlog_first(), fmt3164(15, 0x00, next, test, (null), (null), (null), unicode - �M-^@M-^S�M-^@��M-^@M-^X�M-^@M-^Y�M-^@� )logging to FILE /var/log/debug
Called fprintlog_first(), fmt3164(15, 0x00, next, test, (null), (null), (null), unicode - �M-^@M-^S�M-^@��M-^@M-^X�M-^@M-^Y�M-^@� )logging to TTY /dev/tty12
[...]

And meanwhile, on a terminal not far away:

[root@next:~]# pkill -USR1 syslogd
[root@next:~]# date; logger.sysklogd -p debug -s -t test "iso/latin - öäüÖÄÜ߀¢§"; logger.sysklogd -p debug -s -t test "unicode - –…‘’•"; tail -2 /var/log/debug
Di 22 Feb 2022 12:40:33 CET
Feb 22 2022 12:40:33 test: iso/latin - öäüÖÄÜ߀¢§ 
Feb 22 2022 12:40:33 test: unicode - –…‘’• 
Feb 22 12:40:33 next test: iso/latin - öäü�M-^V�M-^D�M-^\�M-^_�M-^B�¢§ 
Feb 22 12:40:33 next test: unicode - �M-^@M-^S�M-^@��M-^@M-^X�M-^@M-^Y�M-^@� 

Note the Message from UNIX socket lines in the debug output where the characters are still intact! This seems to suggest that syslogd is receiving all characters just fine, and they only get munched later on, somewhere on their way to disk.

And here's the same exercise with sysklogd-1.6, where everything works as expected. On one terminal:

[root@next:/usr/src/slackware/a]# /usr/sbin/syslogd -d
Allocated parts table for 1024 file descriptors.
Starting.
Called init.
Called allocate_log, nlogs = -1.
cfline(*.info;*.!warn;authpriv.none;cron.none;mail.none;news.none	-/var/log/messages)
symbolic name: info ==> 6
symbolic name: warn ==> 4
symbolic name: none ==> 16
symbolic name: authpriv ==> 80
symbolic name: none ==> 16
symbolic name: cron ==> 72
symbolic name: none ==> 16
symbolic name: mail ==> 16
symbolic name: none ==> 16
symbolic name: news ==> 56
leading char in action: /
filename: /var/log/messages
Called allocate_log, nlogs = 0.
cfline(*.warn;authpriv.none;cron.none;mail.none;news.none	-/var/log/syslog)
symbolic name: warn ==> 4
symbolic name: none ==> 16
symbolic name: authpriv ==> 80
symbolic name: none ==> 16
symbolic name: cron ==> 72
symbolic name: none ==> 16
symbolic name: mail ==> 16
symbolic name: none ==> 16
symbolic name: news ==> 56
leading char in action: /
filename: /var/log/syslog
Called allocate_log, nlogs = 1.
cfline(*.=debug						-/var/log/debug)
symbolic name: debug ==> 7
leading char in action: /
filename: /var/log/debug
Called allocate_log, nlogs = 2.
cfline(authpriv.*						-/var/log/secure)
symbolic name: * ==> 255
symbolic name: authpriv ==> 80
leading char in action: /
filename: /var/log/secure
Called allocate_log, nlogs = 3.
cfline(cron.*							-/var/log/cron)
symbolic name: * ==> 255
symbolic name: cron ==> 72
leading char in action: /
filename: /var/log/cron
Called allocate_log, nlogs = 4.
cfline(mail.*							-/var/log/maillog)
symbolic name: * ==> 255
symbolic name: mail ==> 16
leading char in action: /
filename: /var/log/maillog
Called allocate_log, nlogs = 5.
cfline(*.emerg							*)
symbolic name: emerg ==> 0
leading char in action: *
write-all
Called allocate_log, nlogs = 6.
cfline(include /etc/syslog.d/*.conf)
symbolic name: d/*.confCalled logerr, msg: unknown priority name "d/*.conf"
logmsg: syslog.err<43>, flags 4, from next, msg syslogd: unknown priority name "d/*.conf"
Called fprintlog, logging to CONSOLE /dev/console
Opened UNIX socket `/dev/log'.
 0: 60 60  X 60 60 60 60  X 60  X  X 60 60 60 60 60 60 60 60 60 60 60 60 60 60 FILE: /var/log/messages
 1: 1F 1F  X 1F 1F 1F 1F  X 1F  X  X 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F 1F FILE: /var/log/syslog
 2: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 FILE: /var/log/debug
 3:  X  X  X  X  X  X  X  X  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/secure
 4:  X  X  X  X  X  X  X  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/cron
 5:  X  X FF  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X  X FILE: /var/log/maillog
 6:  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 WALL: 
logmsg: syslog.info<46>, flags 4, from next, msg syslogd v1.6: restart.
Called fprintlog, logging to FILE /var/log/messages
syslogd: restarted.
Debugging disabled, SIGUSR1 to turn on debugging.
Select interrupted.
Calling select, active file descriptors (max 3): 3 

Successful select, descriptor count = 1, Activity on: 3 
Message from UNIX socket: #3
Message length: 61, File descriptor: 3.
logmsg: user.debug<15>, flags 2, from next, msg Feb 22 12:22:07 test: iso/latin - öäüÖÄÜ߀¢§
Called fprintlog, logging to FILE /var/log/debug
Calling select, active file descriptors (max 3): 3 

Successful select, descriptor count = 1, Activity on: 3 
Message from UNIX socket: #3
Message length: 53, File descriptor: 3.
logmsg: user.debug<15>, flags 2, from next, msg Feb 22 12:22:07 test: unicode - –…‘’•
Called fprintlog, logging to FILE /var/log/debug
Calling select, active file descriptors (max 3): 3 
^Csyslogd: exiting on signal 2
logmsg: syslog.info<46>, flags 4, from next, msg exiting on signal 2
Called fprintlog, logging to FILE /var/log/messages

And from the second terminal:

[root@next:~]# pkill -USR1 syslogd
[root@next:~]# date; logger.util-linux -p debug -s -t test "iso/latin - öäüÖÄÜ߀¢§"; logger.util-linux -p debug -s -t test "unicode - –…‘’•"; tail -2 /var/log/debug
Di 22 Feb 2022 12:22:07 CET
<15>Feb 22 12:22:07 test: iso/latin - öäüÖÄÜ߀¢§
<15>Feb 22 12:22:07 test: unicode - –…‘’•
Feb 22 12:22:07 next test: iso/latin - öäüÖÄÜ߀¢§
Feb 22 12:22:07 next test: unicode - –…‘’•

All characters end up in the logs ungarbled. In case you're wondering, logger.sysklogd is the logger binary from sysklogd itself, while logger.util-linux is the one from - yes, you guessed it! 😉 - util-linux, because sysklogd-1.6 does not ship a logger binary itself.

Soooooooo ... TBH, I really have no idea if this is actually a bug, or if it's actually known, to be expected, wanted, needed, desired, or even required or mandated behaviour, either by some RFC, or some other standards I don't know about. I'll leave that up to you to decide. Thanks for putting up with my lengthy bug report! 👍

@troglobit
Copy link
Owner

Hi, sorry for the late reply, been busy changing $DAYJOB and that seems to have taken its toll on me.

Now, I don't know of any specifics re:RFC and unicode, but one could easily argue that this is at least a regression in behavior compared to v1.6 (you're right, it was a HUGE redesign!)

I must thank you for the extensive report, it was very thorough and I appreciate the time you've put into it a LOT! ❤️

I'll have a look to see if I can create a little test case for this to reproduce, and then try to isolate the culprit.

@troglobit
Copy link
Owner

Well, that was quick!

https://github.com/troglobit/sysklogd/blob/e433051/src/syslogd.c#L817-L846

/*
 * Removes characters from log messages that are unsafe to display.
 * TODO: Permit UTF-8 strings that include a BOM per RFC 5424?
 */
static void
parsemsg_remove_unsafe_characters(const char *in, char *out, size_t outlen)
{
...

The culprit is a safety filter which was added when syncing with the FreeBSD sources to get RFC5424 support, https://cgit.freebsd.org/src/tree/usr.sbin/syslogd/syslogd.c#n967 so it appears FreeBSD has the same limitation.

Not really sure what to do about this. I'm not super keen on adding native support for parsing unicode to sysklogd, nor do I want to add an external dependency for this, and removing the check seems (?) to be unsafe ... I'm guessing they had a good reason for putting it in there 😕

I'll continue poking around a bit, looking at FreeBSD and a few other sources.

@troglobit
Copy link
Owner

RFC5424 seems to indicate that unicode support is optional;

"... a syslog application conforming to this specification may not
be able to ascertain that the information given to it from an
originator is encoded in UTF-8. If it cannot determine that with
certainty, the syslog application may choose to not incorporate the
BOM in the MSG."

From what I can tell though, the NetBSD syslogd does support unicode BOM deciphering (search for IS_BOM()), so I'll go ahead and see if I can graft that onto sysklogd.

@troglobit
Copy link
Owner

Ooh dear ... that was a hot mess! 😱 😨

(I need to get over my fear of the NetBSD code base, it holds the key to future signing and TLS support, but for now I'm going back to the FreeBSD version ...)


Anyway, FreeBSD has the following option, which we can easily add. I've tested it and it fixes your reported issue.

-8 Tells syslogd not to interfere with 8-bit data.  Normally syslogd replaces C1 control
   characters (ISO 8859 and Unicode characters) with their “M-x” equivalent.  Note, this
   option does not change the way syslogd alters control characters (see iscntrl(3)).  They
   are always replaced with their “^x” equivalent.

What do you think, @z1atk0, would that be OK as a workaround for now?

@troglobit troglobit added this to the v2.4.0 milestone Mar 5, 2022
@z1atk0
Copy link
Author

z1atk0 commented Mar 7, 2022

Wow, seems like I missed quite a flurry of activity over the weekend! 🙀 Sorry for not replying any earlier, $DAYJOB distractions over here as well. 😉

As for the solution/workaround you found: yes, the -8 workaround would be perfectly fine for my purposes, TYVM! 👍 😎

@troglobit
Copy link
Owner

Cool, no problem, I'll push it later tonight (CET) 😎✌️

@troglobit
Copy link
Owner

There, hope it works better for you now. Thanks again for taking the time to report this and writing such an awesome bug report! ❤️

@Exchie
Copy link

Exchie commented Jun 21, 2022

Hello,
there is an option in version 2.3.0 in the file src/syslogd.c:
    static int      mask_C1 = 1;          /* mask characters from 0x80 - 0x9F */
If to toggle it to 0, some problems with non-ascii chars will vanish. However,
i'm not sure if it's safe and if new hidden issues are not to come.

@troglobit
Copy link
Owner

@Exchie this is a closed issue with a fix in 6022d3c, which also was released as v2.4.0? The fix implements a command line option -8 that does exactly what you say, toggle mask_c1.

I'm not exactly sure what you want to say with your comment? So I'm going to assume you are new to GitHub and leave it at that.

@Exchie
Copy link

Exchie commented Jun 23, 2022

Sorry, I haven't looked at the new modified code for "-8" option @2.4.0 version. I had 2.3.0 version installed then, have found out that this flag mask_C1 works for me, but to be sure this doesn't violate security, asked the question.

@troglobit
Copy link
Owner

@Exchie OK, the lack of a question mark in your comment made it really difficult to tell if you meant it as a statement or a question. It looked like a statement, hence my confusing since the issue was already fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants