strptime discards timezone information parsed from %z specifier #359

Closed
Sharpie opened this issue Aug 21, 2020 · 4 comments
Labels: bug, go-port (Things which will be addressed in the Go port AKA Miller 6)

@Sharpie

Sharpie commented Aug 21, 2020

When parsing date strings, the strptime function appears to discard time zone information provided by the %z format specifier and always uses the time zone of the machine running mlr.

Reproduction Case

Round-tripping an RFC 3339 date through mlr results in midnight at 4 hours west of UTC (-0400) being printed as midnight UTC.

$ uname -a
Linux jumpbox-01 3.10.0-1062.1.2.el7.x86_64 #1 SMP Mon Sep 30 14:19:46 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

$ date +'%z'
+0000

$ mlr --version
Miller v5.9.0

$ printf 'time=2020-08-20T00:00:00-0400\n' |mlr put -S '$time=strftime(strptime($time, "%Y-%m-%dT%H:%M:%S%z"), "%Y-%m-%dT%H:%M:%S%z")'

time=2020-08-20T00:00:00+0000
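
The size of the shift can be checked without mlr. The following is a minimal sketch, assuming GNU date (coreutils) is available; the variable names are illustrative only. It compares the epoch seconds obtained when the -0400 offset is honored with those obtained when the same wall-clock time is read as UTC.

```shell
# Assumes GNU date; the -d option is not available in BSD/macOS date.
ts='2020-08-20T00:00:00-0400'

# Epoch seconds when the -0400 offset is honored.
honored=$(date -u -d "$ts" +%s)

# Epoch seconds when the offset is dropped and the wall-clock time is read as UTC.
dropped=$(date -u -d "${ts%-0400}+0000" +%s)

echo "offset honored: $honored"
echo "offset dropped: $dropped"
echo "difference:     $(( honored - dropped )) seconds"
```

The difference is 14400 seconds, which is the 4 hours missing from the mlr output above.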

Expected Behavior

mlr shifts the parsed date to UTC and prints 4 AM UTC, as date does:

$ date -d "2020-08-20T00:00:00-0400" +"%Y-%m-%dT%H:%M:%S%z"

2020-08-20T04:00:00+0000

Or, mlr retains the time zone information as Ruby does:

$ ruby -rdate -e 'puts DateTime.strptime("2020-08-20T00:00:00-0400", "%Y-%m-%dT%H:%M:%S%z").rfc3339'

2020-08-20T00:00:00-04:00

Workaround

Provided all timestamps being parsed are from the same time zone, running mlr with the TZ environment variable set to that time zone and using strptime_local will produce the expected results:

$ printf 'time=2020-08-20T00:00:00-0400\n' |env TZ='UTC+4' mlr put -S '$time=strftime(strptime_local($time, "%Y-%m-%dT%H:%M:%S%z"), "%Y-%m-%dT%H:%M:%S%z")'

time=2020-08-20T04:00:00+0000
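
Note that the sign in a POSIX-style TZ value is the reverse of the sign in the timestamp: 'UTC+4' means local time is 4 hours behind UTC, which matches -0400. The following is a small sketch of that convention, assuming GNU date and a libc that accepts POSIX TZ strings; it does not involve mlr, and the variable names are illustrative only.

```shell
# Assumes GNU date. The wall-clock string carries no offset, so TZ decides its meaning.
wall='2020-08-20 00:00:00'

# POSIX-style TZ: 'UTC+4' is 4 hours behind UTC, matching a -0400 offset.
west=$(TZ='UTC+4' date -d "$wall" +%s)

# 'UTC-4' is 4 hours ahead of UTC (+0400), which would be the wrong zone here.
east=$(TZ='UTC-4' date -d "$wall" +%s)

# Render both instants in UTC for comparison.
west_utc=$(date -u -d "@$west" +'%Y-%m-%dT%H:%M:%S%z')
east_utc=$(date -u -d "@$east" +'%Y-%m-%dT%H:%M:%S%z')

echo "TZ=UTC+4 -> $west_utc"
echo "TZ=UTC-4 -> $east_utc"
```

Only the 'UTC+4' setting yields 04:00 UTC, which is why the workaround uses it for timestamps written with -0400.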

@Sharpie
Author

Sharpie commented Aug 21, 2020

Noticed this while parsing some Apache logs with [%d/%b/%Y:%H:%M:%S %z] as the format specifier and plotting the results against other data. After much confusion, realized "this would all make sense if these webserver events occurred 4 hours later" and discovered the shift.

@johnkerl
Owner

This will be addressed by the Go port.

@johnkerl added the go-port label (Things which will be addressed in the Go port AKA Miller 6) on Sep 15, 2020
@johnkerl
Owner

@Sharpie sorry for the long delay!

The Go port has been a long process and I'm doing datetime mods as one of the final steps.

In Miller 6 this works as expected:

$ export TZ=America/Sao_Paulo

$ printf 'time=2020-08-20T00:00:00-0400\n' | mlr put '$time=strftime(strptime($time, "%Y-%m-%dT%H:%M:%S%z"), "%Y-%m-%dT%H:%M:%S%z")'
time=2020-08-20T04:00:00+0000

$ export TZ=Asia/Istanbul

$ printf 'time=2020-08-20T00:00:00-0400\n' | mlr put '$time=strftime(strptime($time, "%Y-%m-%dT%H:%M:%S%z"), "%Y-%m-%dT%H:%M:%S%z")'
time=2020-08-20T04:00:00+0000

$ export TZ=

$ printf 'time=2020-08-20T00:00:00-0400\n' | mlr put '$time=strftime(strptime($time, "%Y-%m-%dT%H:%M:%S%z"), "%Y-%m-%dT%H:%M:%S%z")'
time=2020-08-20T04:00:00+0000
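
The same independence from TZ can be cross-checked outside mlr. This is a minimal sketch, assuming GNU date; the loop and variable names are illustrative and say nothing about how Miller 6 is implemented. Because the timestamp carries an explicit offset, the resulting epoch seconds should not depend on the TZ setting.

```shell
# Assumes GNU date. An explicit -0400 offset fixes the instant regardless of TZ.
ts='2020-08-20T00:00:00-0400'
results=''

for tz in America/Sao_Paulo Asia/Istanbul ''; do
  epoch=$(TZ="$tz" date -d "$ts" +%s)
  echo "TZ='$tz' -> $epoch"
  results="$results$epoch "
done
```

All three iterations should report the same epoch value, corresponding to 2020-08-20T04:00:00+0000.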

@Sharpie
Author

Sharpie commented Oct 19, 2021

No worries about the delay! Thanks for building such an amazing tool --- mlr and jq do an outstanding job of filling in gaps left by awk.

@johnkerl removed the active label on Aug 31, 2023