Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

optimise check of UTC / GMT during parsing #286

Merged
merged 9 commits into from
Sep 9, 2020

Conversation

oxinabox
Copy link
Contributor

@oxinabox oxinabox commented Sep 4, 2020

The last remaining one from #282

This one is actually a fairly meaningful improvement over #282 (comment)

julia> @btime parse(ZonedDateTime, "2017-11-14 11:03:53 +0100", dateformat"yyyy-mm-dd HH:MM:SS zzzzz");
  700.641 ns (13 allocations: 592 bytes)

julia> @btime parse(ZonedDateTime, "2016-04-11 08:00 UTC", dateformat"yyyy-mm-dd HH:MM ZZZ");
  612.665 ns (13 allocations: 592 bytes)

julia> @btime parse(ZonedDateTime, "2000+00", dateformat"yyyyz");
  463.898 ns (13 allocations: 496 bytes)

julia> @btime parse(ZonedDateTime, Test.GenericString("2018-01-01 00:00 UTC"), dateformat"yyyy-mm-dd HH:MM ZZZ");
  6.772 μs (107 allocations: 3.55 KiB)

julia> @btime tryparse(ZonedDateTime, "2013-03-20 11:00:00+04:00", dateformat"y-m-d H:M:SSz");
  883.522 ns (14 allocations: 640 bytes)

julia> @btime tryparse(ZonedDateTime, "2016-04-11 08:00 EST", dateformat"yyyy-mm-dd HH:MM zzz");
  92.074 ns (0 allocations: 0 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006+0700");
  825.240 ns (14 allocations: 688 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006Z");
  627.679 ns (9 allocations: 416 bytes)

julia> @btime ZonedDateTime("2018-11-01-0600", dateformat"yyyy-mm-ddzzzz");
  611.247 ns (13 allocations: 544 bytes)

Much more than I expected.
I guess because it removed allocations.
I wonder if there are other allocations that can be removed.

@codecov-commenter
Copy link

codecov-commenter commented Sep 4, 2020

Codecov Report

Merging #286 into master will decrease coverage by 0.04%.
The diff coverage is 80.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #286      +/-   ##
==========================================
- Coverage   93.58%   93.53%   -0.05%     
==========================================
  Files          30       31       +1     
  Lines        1528     1532       +4     
==========================================
+ Hits         1430     1433       +3     
- Misses         98       99       +1     
Impacted Files Coverage Δ
src/compat.jl 75.00% <75.00%> (ø)
src/parse.jl 94.44% <100.00%> (ø)
src/types/variabletimezone.jl 100.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89da5b7...6d2ec18. Read the comment docs.

src/parse.jl Outdated Show resolved Hide resolved
@omus
Copy link
Member

omus commented Sep 4, 2020

Currently in master

julia> p1(name) = name in ("UTC", "GMT")
p1 (generic function with 1 method)

julia> @btime p1("ABC")
  15.051 ns (1 allocation: 32 bytes)
false

julia> @btime p1("UTC")
  9.519 ns (1 allocation: 32 bytes)
true

julia> @btime p1("GMT")
  12.639 ns (1 allocation: 32 bytes)
true

Current PR

julia> p2(name) = length(name) === 3 && @inbounds ((name[1] === 'U' && name[2]==='T' && name[3]==='C') || (name[1]==='G' && name[2]==='M' && name[3]==='T'))
p2 (generic function with 1 method)

julia> @btime p2("ABC")
  7.283 ns (0 allocations: 0 bytes)
false

julia> @btime p2("UTC")
  7.161 ns (0 allocations: 0 bytes)
true

julia> @btime p2("GMT")
  7.538 ns (0 allocations: 0 bytes)
true

Alternate

julia> p3(name) = name == "UTC" || name == "GMT"
p3 (generic function with 1 method)

julia> @btime p3("ABC")
  7.475 ns (0 allocations: 0 bytes)
false

julia> @btime p3("UTC")
  3.362 ns (0 allocations: 0 bytes)
true

julia> @btime p3("GMT")
  6.782 ns (0 allocations: 0 bytes)
true

src/parse.jl Outdated Show resolved Hide resolved
src/parse.jl Outdated Show resolved Hide resolved
Co-authored-by: Curtis Vogt <[email protected]>
@oxinabox
Copy link
Contributor Author

oxinabox commented Sep 4, 2020

I was not prepared for it to be in on a tuple of strings being the slow part.

@oxinabox
Copy link
Contributor Author

oxinabox commented Sep 4, 2020

Hmm rerunning the bencharm with the == in I am seeing it as much slower
In particular parse(ZonedDateTime, Test.GenericString("2018-01-01 00:00 UTC"), dateformat"yyyy-mm-dd HH:MM ZZZ"); is allocating more

julia> @btime parse(ZonedDateTime, "2017-11-14 11:03:53 +0100", dateformat"yyyy-mm-dd HH:MM:SS zzzzz");
  710.171 ns (13 allocations: 592 bytes)

julia> @btime parse(ZonedDateTime, "2016-04-11 08:00 UTC", dateformat"yyyy-mm-dd HH:MM ZZZ");
  625.136 ns (15 allocations: 704 bytes)

julia> @btime parse(ZonedDateTime, "2000+00", dateformat"yyyyz");
  470.918 ns (13 allocations: 496 bytes)

julia> @btime parse(ZonedDateTime, Test.GenericString("2018-01-01 00:00 UTC"), dateformat"yyyy-mm-dd HH:MM ZZZ");
  7.486 μs (127 allocations: 4.20 KiB)

julia> @btime tryparse(ZonedDateTime, "2013-03-20 11:00:00+04:00", dateformat"y-m-d H:M:SSz");
  872.264 ns (14 allocations: 640 bytes)

julia> @btime tryparse(ZonedDateTime, "2016-04-11 08:00 EST", dateformat"yyyy-mm-dd HH:MM zzz");
  90.553 ns (0 allocations: 0 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006+0700");
  836.514 ns (14 allocations: 688 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006Z");
  631.869 ns (9 allocations: 416 bytes)

julia> @btime ZonedDateTime("2018-11-01-0600", dateformat"yyyy-mm-ddzzzz");
  613.408 ns (13 allocations: 544 bytes)

Shall i reverse the commit that changed to ==?

@omus
Copy link
Member

omus commented Sep 4, 2020

Shall i reverse the commit that changed to ==?

Let's try one more thing. What if you make "UTC" and "GMT" global constants?

Edit: Actually, the in call may be fast too if we declare the Tuple{String,String} as a const

@oxinabox
Copy link
Contributor Author

oxinabox commented Sep 4, 2020

 Let's try one more thing. What if you make "UTC" and "GMT" global constants?

If that helps, we need to open a bug report on julia.

@oxinabox
Copy link
Contributor Author

oxinabox commented Sep 4, 2020

Let's try one more thing. What if you make "UTC" and "GMT" global constants?

No it doesn't help.

julia> @btime parse(ZonedDateTime, "2017-11-14 11:03:53 +0100", dateformat"yyyy-mm-dd HH:MM:SS zzzzz");
  702.759 ns (13 allocations: 592 bytes)

julia> @btime parse(ZonedDateTime, "2016-04-11 08:00 UTC", dateformat"yyyy-mm-dd HH:MM ZZZ");
  672.657 ns (15 allocations: 704 bytes)

julia> @btime parse(ZonedDateTime, "2000+00", dateformat"yyyyz");
  504.984 ns (13 allocations: 496 bytes)

julia> @btime parse(ZonedDateTime, Test.GenericString("2018-01-01 00:00 UTC"), dateformat"yyyy-mm-dd HH:MM ZZZ");
  7.339 μs (127 allocations: 4.20 KiB)

julia> @btime tryparse(ZonedDateTime, "2013-03-20 11:00:00+04:00", dateformat"y-m-d H:M:SSz");
  878.184 ns (14 allocations: 640 bytes)

julia> @btime tryparse(ZonedDateTime, "2016-04-11 08:00 EST", dateformat"yyyy-mm-dd HH:MM zzz");
  90.995 ns (0 allocations: 0 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006+0700");
  832.155 ns (14 allocations: 688 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006Z");
  637.614 ns (9 allocations: 416 bytes)

julia> @btime ZonedDateTime("2018-11-01-0600", dateformat"yyyy-mm-ddzzzz");
  611.971 ns (13 allocations: 544 bytes)

I think the problem is the bencharks you used in
#286 (comment)
were using Strings
where-as the real code
and the benchmark i did in #282
used SubStrings.
and infact I did actually est in the original benchmarks ==
it is under is_UTC_or_GMT_or (second result from top).
And there it found == was not much faster than in

@omus
Copy link
Member

omus commented Sep 4, 2020

Using SubString.

Currently in master

julia> p1(name) = name in ("UTC", "GMT")
p1 (generic function with 1 method)

julia> @btime p1($(SubString("ABC")))
  49.681 ns (5 allocations: 224 bytes)
false

julia> @btime p1($(SubString("UTC")))
  36.300 ns (3 allocations: 128 bytes)
true

julia> @btime p1($(SubString("GMT")))
  57.718 ns (5 allocations: 224 bytes)
true

Current PR

julia> p2(name) = length(name) === 3 && @inbounds ((name[1] === 'U' && name[2]==='T' && name[3]==='C') || (name[1]==='G' && name[2]==='M' && name[3]==='T'))
p2 (generic function with 1 method)

julia> @btime p2($(SubString("ABC")))
  18.315 ns (0 allocations: 0 bytes)
false

julia> @btime p2($(SubString("UTC")))
  21.570 ns (0 allocations: 0 bytes)
true

julia> @btime p2($(SubString("GMT")))
  31.627 ns (0 allocations: 0 bytes)
true

Alternate

julia> p3(name) = name == "UTC" || name == "GMT"
p3 (generic function with 1 method)

julia> @btime p3($(SubString("ABC")))
  43.316 ns (4 allocations: 192 bytes)
false

julia> @btime p3($(SubString("UTC")))
  29.094 ns (2 allocations: 96 bytes)
true

julia> @btime p3($(SubString("GMT")))
  49.995 ns (4 allocations: 192 bytes)
true

There is probably a general performance fix we can do for all SubString equality then.

@omus
Copy link
Member

omus commented Sep 4, 2020

The allocating seems to come from ==. I made an alternate == which removes the allocations:

function alt(a::AbstractString, b::AbstractString)
    a === b && return true
    ncodeunits(a) != ncodeunits(b) && return false
    for (c::AbstractChar, d::AbstractChar) in zip(a, b)
        c != d && return false
    end
    return true
end

Current PR (Julia 1.5.1)

julia> p2(name) = length(name) === 3 && @inbounds ((name[1] === 'U' && name[2]==='T' && name[3]==='C') || (name[1]==='G' && name[2]==='M' && name[3]==='T'))
p2 (generic function with 1 method)

julia> @btime p2($(SubString("ABC")))

  18.557 ns (0 allocations: 0 bytes)
false

julia> @btime p2($(SubString("UTC")))
  21.320 ns (0 allocations: 0 bytes)
true

julia> @btime p2($(SubString("GMT")))
  25.836 ns (0 allocations: 0 bytes)
true

Alternate (Julia 1.5.1)

julia> p4(name) = alt(name, "UTC") || alt(name, "GMT")
p4 (generic function with 1 method)

julia> @btime p4($(SubString("ABC")))

  13.796 ns (0 allocations: 0 bytes)
false

julia> @btime p4($(SubString("UTC")))
  13.799 ns (0 allocations: 0 bytes)
true

julia> @btime p4($(SubString("GMT")))
  19.570 ns (0 allocations: 0 bytes)
true

On Julia 1.6.0-DEV.842 (594ac899ff) it appears == doesn't allocate at all. I'll track down when it was fixed. For now I'd propose using the custom equality function which keeps the code easy to read.

@omus
Copy link
Member

omus commented Sep 4, 2020

Looks like Julia 1.6.0-DEV.722 (JuliaLang/julia#37192) was where the problem was fixed.

@omus
Copy link
Member

omus commented Sep 8, 2020

Current PR (Julia 1.5.1)

julia> p3(name) = name == "UTC" || name == "GMT"
p3 (generic function with 1 method)

julia> @btime p3($(SubString("ABC")))
  8.193 ns (0 allocations: 0 bytes)
false

julia> @btime p3($(SubString("UTC")))
  6.663 ns (0 allocations: 0 bytes)
true

julia> @btime p3($(SubString("GMT")))
  9.507 ns (0 allocations: 0 bytes)
true

@omus
Copy link
Member

omus commented Sep 8, 2020

Current master (70df06e)

julia> using TimeZones, BenchmarkTools, Dates, Test

julia> @btime parse(ZonedDateTime, "2017-11-14 11:03:53 +0100", dateformat"yyyy-mm-dd HH:MM:SS zzzzz");
  785.343 ns (13 allocations: 592 bytes)

julia> @btime parse(ZonedDateTime, "2016-04-11 08:00 UTC", dateformat"yyyy-mm-dd HH:MM ZZZ");
  746.769 ns (15 allocations: 704 bytes)

julia> @btime parse(ZonedDateTime, "2000+00", dateformat"yyyyz");
  520.869 ns (13 allocations: 496 bytes)

julia> @btime parse(ZonedDateTime, Test.GenericString("2018-01-01 00:00 UTC"), dateformat"yyyy-mm-dd HH:MM ZZZ");
  9.732 μs (136 allocations: 4.44 KiB)

julia> @btime tryparse(ZonedDateTime, "2013-03-20 11:00:00+04:00", dateformat"y-m-d H:M:SSz");
  1.044 μs (14 allocations: 640 bytes)

julia> @btime tryparse(ZonedDateTime, "2016-04-11 08:00 EST", dateformat"yyyy-mm-dd HH:MM zzz");
  103.878 ns (0 allocations: 0 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006+0700");
  946.227 ns (14 allocations: 688 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006Z");
  730.542 ns (9 allocations: 416 bytes)

julia> @btime ZonedDateTime("2018-11-01-0600", dateformat"yyyy-mm-ddzzzz");
  683.200 ns (13 allocations: 544 bytes)

Current PR:

julia> using TimeZones, BenchmarkTools, Dates, Test

julia> @btime parse(ZonedDateTime, "2017-11-14 11:03:53 +0100", dateformat"yyyy-mm-dd HH:MM:SS zzzzz");
  782.208 ns (13 allocations: 592 bytes)

julia> @btime parse(ZonedDateTime, "2016-04-11 08:00 UTC", dateformat"yyyy-mm-dd HH:MM ZZZ");
  675.058 ns (13 allocations: 592 bytes)

julia> @btime parse(ZonedDateTime, "2000+00", dateformat"yyyyz");
  520.414 ns (13 allocations: 496 bytes)

julia> @btime parse(ZonedDateTime, Test.GenericString("2018-01-01 00:00 UTC"), dateformat"yyyy-mm-dd HH:MM ZZZ");
  8.336 μs (127 allocations: 4.20 KiB)

julia> @btime tryparse(ZonedDateTime, "2013-03-20 11:00:00+04:00", dateformat"y-m-d H:M:SSz");
  979.769 ns (14 allocations: 640 bytes)

julia> @btime tryparse(ZonedDateTime, "2016-04-11 08:00 EST", dateformat"yyyy-mm-dd HH:MM zzz");
  103.364 ns (0 allocations: 0 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006+0700");
  941.826 ns (14 allocations: 688 bytes)

julia> @btime ZonedDateTime("2000-01-02T03:04:05.006Z");
  726.115 ns (9 allocations: 416 bytes)

julia> @btime ZonedDateTime("2018-11-01-0600", dateformat"yyyy-mm-ddzzzz");
  683.013 ns (13 allocations: 544 bytes)

src/compat.jl Outdated Show resolved Hide resolved
src/compat.jl Outdated Show resolved Hide resolved
Co-authored-by: Eric Davies <[email protected]>
@omus omus merged commit af8797c into JuliaTime:master Sep 9, 2020
@omus omus added the performance Changes that impact code performance label Sep 23, 2020
kpamnany pushed a commit to RelationalAI-oss/TimeZones.jl that referenced this pull request May 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Changes that impact code performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants