Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@time macro can have huge error #39760

Closed
ufechner7 opened this issue Feb 20, 2021 · 52 comments
Closed

@time macro can have huge error #39760

ufechner7 opened this issue Feb 20, 2021 · 52 comments
Labels

Comments

@ufechner7
Copy link

ufechner7 commented Feb 20, 2021

Steps to reproduce:

mkdir temp
cd temp
git clone https://github.com/ufechner7/KiteViewer.git
cd KiteViewer
julia --project
using Pkg
Pkg.instantiate()
exit()
./create_sys_image.sh

And then execute the test:

./runjulia
include("test/time_bug.jl")

Result:

# [ Info:  started timer at: 2021-02-20T13:58:23.814
#   0.668130 seconds (933.81 k allocations: 55.527 MiB, 1.70% gc time, 98.76% compilation time)
# [ Info:          2.896614005s: 2 seconds, 896 milliseconds

# julia> 2.8966/0.668
# 4.336227544910179

# @time reports 0.67 seconds, while the real execution time incl. compilation is more like 2.90 seconds

This means that the @time macro reports a time that is more than a factor of four lower then the real execution time.

The same problem happens without using a custom sysimage, but the relative error is smaller (8s vs 5s).
This are times that I can measure with my stopwatch, so I can confirm that the time reported by
tick and tock is much closer to the real time than the time reported by the @time macro.

I would consider any error of more then 10% as a bug, in particular if the measured time is above 1s.

Any comment?

@DilumAluthge
Copy link
Member

@IanButterworth

@IanButterworth
Copy link
Sponsor Member

Have you checked whether the same happens in 1.5?

If it does, it could be that execution time is dominated by interpretation, given @time's result doesn't include interpretation

@ufechner7
Copy link
Author

Well, measuring the time is simple: Record the start time and the end time and calculate the difference. If the @time macro is not doing that, then it has a major design flaw that should be fixed. Just my two cents...

@IanButterworth
Copy link
Sponsor Member

IanButterworth commented Feb 20, 2021

Yeah, it's a fair point. There has been some efforts to fix it #37938

It's not clear if your issue is specific to 1.6 though, and it's still just a guess that missing interpreter top level compilation time is the cause.
Perhaps an easier thing than testing 1.5.3 would be to just compare in 1.6 the result of @time vs. the number returned from @elapsed given that @elapsed didn't change in 1.6.

@ufechner7
Copy link
Author

ufechner7 commented Feb 21, 2021

More or less the same results when using @Elapsed:

# Output:
# julia> include("test/time2_bug.jl")
# 
# [ Info:  started timer at: 2021-02-20T13:58:23.814
#   0.670841 
# [ Info:          2.92531268s: 2 seconds, 925 milliseconds

# julia> 2.9253/0.6708
# 4.360912343470483

# @elapsed reports 0.67 seconds, while the real execution time incl. compilation is more like 2.9 seconds

@IanButterworth
Copy link
Sponsor Member

Ok. Seems like it's an issue of missing the top-level compilation, and not specific to 1.6
Perhaps it's worth renaming this issue to remove the 1.6rc1?

I may open an issue to specifically track @time missing top-level compilation time, given it's an ongoing issue that's been a bit buried in PR comments etc.

@KristofferC KristofferC changed the title @time macro can have huge error on 1.6rc1 @time macro can have huge error Feb 21, 2021
@timholy
Copy link
Sponsor Member

timholy commented Feb 21, 2021

Why do we force-compile @time blocks?

while false; end # compiler heuristic: compile this block (alter this if the heuristic changes)

So presumably what's happening is this:

  • tick starts
  • Julia starts compiling the @time block. Notably, this compiles things that can be inferred from the top-level @time code, the ex payload, & passed types. In this case, that's most of the time. This all happens before @time's own internal timers start, because compilation happens before execution.
  • @time starts its own timers
  • it executes the ex passed to time. Anything that needed runtime dispatch gets compiled now (any non-inferrable internal calls from the ex payload)
  • @time wraps up collecting data & returns
  • tock runs

So tick/tock include the time needed to compile this whereas @time's report does not.

It seems like commenting-out the while false; end might fix it but it would be nice to know why it's there. git blame suggests #35353 but that was just a code move. To me it seems that if we precompile all of @time's dependencies (which we do) then having this block run in compiled mode rather than interpreted mode is wrong. Maybe it was added to improve accuracy of @time for fast-running code? Isn't that what @btime is for?

@KristofferC
Copy link
Sponsor Member

git blame suggests #35353 but that was just a code move

Origin is #33717

@timholy
Copy link
Sponsor Member

timholy commented Feb 21, 2021

CC @vtjnash, any thoughts?

@ufechner7
Copy link
Author

I did the same test on Julia 1.5.3:

# On Julia 1.5.3:
# julia> include("test/time_bug.jl")
# [ Info:  started timer at: 2021-02-21T10:59:55.463
#   0.811249 seconds (1.94 M allocations: 103.003 MiB, 2.59% gc time)
# [ Info:          2.161581335s: 2 seconds, 161 milliseconds

# julia> 2.161/0.811
# 2.664611590628853

# @time reports 0.81 seconds, while the real execution time incl. compilation is more like 2.16 seconds

This means the error is a factor of 2.7 on Julia 1.5.3 and a factor of 4.33 on Julia 1.6.0rc1 .

@ufechner7
Copy link
Author

ufechner7 commented Feb 21, 2021

As far as I see there are two options to fix this issue:
a. merge #37938
b. delete the line while false; end from the @time macro

What are the advantages and disadvantages of these two possible fixes?

@IanButterworth
Copy link
Sponsor Member

IanButterworth commented Feb 21, 2021

Some testing on master

julia> @eval function f(a)  # Expensive function to compile. From https://github.com/JuliaLang/julia/pull/37678#discussion_r495515401
                  $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:100]...) end)
                  $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:100]...) end)
                  return a
              end
f (generic function with 1 method)

julia> @time f(2)
  0.000000 seconds
-3.706395705243192e209

julia> @time x = 1
  0.000000 seconds
1

julia> Revise.track(Base)

## `while false; end` removed from @time

julia> @time 1 # Let Revise recompile @time itself given change (the % overflow is excusable here)
  0.000001 seconds (30 allocations: 2.188 KiB, 196153.24% compilation time)
1

julia> @eval function f(a)   # Note the slight change to indices
                  $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:101]...) end)
                  $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:101]...) end)
                  return a
              end
f (generic function with 1 method)

julia> @time f(2)
  0.896492 seconds (1.54 M allocations: 65.063 MiB, 1.47% gc time, 100.00% compilation time)
-3.063136946481976e207

julia> @time x = 1
  0.000001 seconds (6 allocations: 192 bytes)
1

So it's necessary to make allocations accurate.

If we remove the while false; end can we just subtract the baseline (6 allocations: 192 bytes) from the report??
We could add a unit test to ensure that(6 allocations: 192 bytes) stays the baseline

@ufechner7
Copy link
Author

Sounds good. But this: 196153.24% compilation time also needs to be fixed.

@KristofferC
Copy link
Sponsor Member

I take it you missed the comment above it.

@ufechner7
Copy link
Author

I take it you missed the comment above it.

Indeed, I missed the comment. @time 1 is indeed something in practice nobody will execute, so I agree that the % overflow is excusable here. On the other hand: compilation_time/total_time should never be greater then one, or do I miss something?

@IanButterworth
Copy link
Sponsor Member

That's not the point. Recompiling @time itself (via revise) is a singular edge case we don't need to worry about

@IanButterworth
Copy link
Sponsor Member

can we just subtract the baseline (6 allocations: 192 bytes) from the report??

Maybe that's not going to work

julia> @time x = 1
  0.000001 seconds (5 allocations: 176 bytes)
1

julia> @time x = 1
  0.000001 seconds (6 allocations: 192 bytes)
1

julia> @time x = 1
  0.000001 seconds (5 allocations: 176 bytes)
1

julia> @time x = 1
  0.000001 seconds (6 allocations: 192 bytes)
1

julia> @time x = 1
  0.000001 seconds (6 allocations: 192 bytes)
1

@ufechner7
Copy link
Author

Well, we can subtract 6 and 192 and set it to zero if it is negative ...

@IanButterworth
Copy link
Sponsor Member

@ufechner7 would you mind trying out #39779 to see if that kind of report helps?

@ufechner7
Copy link
Author

Yes, but how can I check it out? I tried:

ufechner@TUD277255:~/repos/julia$ git checkout cf1226e55d033336f109ae0ffc06058e2f050d65
fatal: reference is not a tree: cf1226e55d033336f109ae0ffc06058e2f050d65
ufechner@TUD277255:~/repos/julia$ 

@IanButterworth
Copy link
Sponsor Member

git checkout ib/time_include_topleveloverhead

@ufechner7
Copy link
Author

ufechner7 commented Feb 22, 2021

Current branch master is up to date.
ufechner@TUD277255:~/repos/julia$ git checkout ib/time_include_topleveloverhead
error: pathspec 'ib/time_include_topleveloverhead' did not match any file(s) known to git.
ufechner@TUD277255:~/repos/julia$ 

@ufechner7
Copy link
Author

I did:

git fetch origin pull/39779/head:topleveloverhead
git checkout topleveloverhead
make clean
make -j4

updated the registry, instantiated the KiteViewer project,
started julia with

julia --project

and executed

julia> include("test/time_bug.jl")
[ Info:  started timer at: 2021-02-22T13:26:00.081
[ Info:          0.182874173s: 182 milliseconds
[ Info:  started timer at: 2021-02-22T13:26:00.285
  4.517629 seconds (10.18 M allocations: 604.539 MiB, 3.17% gc time, 12.87% compilation time)
[ Info:          7.926560815s: 7 seconds, 926 milliseconds

julia> 

Conclusion: Problem not solved. Did I check out the wrong branch/ pullrequest?

@simeonschaub
Copy link
Member

You probably need to call ./julia instead of julia, since the latter is likely just your global Julia installation.

@ufechner7
Copy link
Author

ufechner7 commented Feb 22, 2021

You probably need to call ./julia instead of julia, since the latter is likely just your global Julia installation.

No, just julia is good enough:

ufechner@TUD277255:~/repos/KiteViewer$ julia --project
[ Info:  started timer at: 2021-02-22T15:04:18.554
[ Info:          0.582521124s: 582 milliseconds
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.0-DEV.583 (2021-02-21)
 _/ |\__'_|_|_|\__'_|  |  topleveloverhead/cf1226e55d* (fork: 1 commits, 1 day)
|__/                   |

julia> 

@simeonschaub
Copy link
Member

Sorry, I stand corrected

@IanButterworth
Copy link
Sponsor Member

That branch is far from well tested, but if it is top-level compilation, I believe you should be seeing the + report

julia> using TickTock

julia> @eval function f(a)
                  $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:400]...) end)
                  $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:400]...) end)
                  return a
              end
f (generic function with 1 method)

julia> begin
       tick();tock()

       tick()
       @time f(2)
       tock()
       end
[ Info:  started timer at: 2021-02-22T09:30:57.712
[ Info:          0.000112848s: empty period
[ Info:  started timer at: 2021-02-22T09:30:57.713
  0.000000 seconds + 1.7 seconds of top-level compilation
[ Info:          0.000205973s: empty period

but notice that in this case top-level compilation isn't affecting the tick tock time, the compilation is happening before tick.

@ufechner7 perhaps something else is going on with your code. Could you try replacing tock() with println(tok()). I'm wondering whether in your case the problem is invalidation of the logging system.

Or perhaps replace tick() and tock() with before = Base.time_ns() and println((Base.time_ns() - before) / 1e9, " seconds")

@ufechner7
Copy link
Author

I tried your example after deleteing startup.jl.

julia> using TickTock

julia> @eval function f(a)
                         $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:400]...) end)
                         $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:400]...) end)
                         return a
                     end
f (generic function with 1 method)

julia> begin
             tick();tock()

             tick()
             @time f(2)
             tock()
             end
[ Info:  started timer at: 2021-02-22T18:07:51.596
[ Info:          0.280492316s: 280 milliseconds
[ Info:  started timer at: 2021-02-22T18:07:51.928
  0.000000 seconds
[ Info:          0.000108435s: empty period

julia> 

So I think we are not testing the same Julia. Can you please double check the way I checked out your code?

@IanButterworth
Copy link
Sponsor Member

Looks like you're on the correct commit hash. Your computer might just be faster than mine and doing the top-level compilation faster than 0.5 seconds, which is the cutoff in that PR (which may be a bad design decision given this kind of confusion!).

In the example function try increasing the 400 to 1000 or greater

@ufechner7
Copy link
Author

Not much difference:

ufechner@TUD277255:~/repos/KiteViewer$ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.0-DEV.583 (2021-02-21)
 _/ |\__'_|_|_|\__'_|  |  topleveloverhead/cf1226e55d* (fork: 1 commits, 1 day)
|__/                   |

julia> using TickTock

julia> @eval function f(a)
                         $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:2000]...) end)
                         $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:2000]...) end)
                         return a
                     end
f (generic function with 1 method)

julia> begin
             tick();tock()

             tick()
             @time f(2)
             tock()
             end
[ Info:  started timer at: 2021-02-22T18:42:59.924
[ Info:          0.288659268s: 288 milliseconds
[ Info:  started timer at: 2021-02-22T18:43:00.270
  0.000000 seconds
[ Info:            6.8015e-5s: empty period

julia> 

@ufechner7
Copy link
Author

ufechner7 commented Feb 22, 2021

I am not using Revise in the moment ...

In case it matters: I am using:

  • Ubuntu 18.04
  • CPU Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  • RAM: 32 GBytes
  • disk: PNY CS3030 1TB SSD (m.2 SSD)

@JeffBezanson
Copy link
Sponsor Member

As folks here figured out, this originated mostly in wanting accurate allocation counts, and because most people want @time f() to mean "how long does f take to run", not the latency around it. If you really want the surrounding latency, it's fairly easy to get with other tools like unix time or calling time_ns yourself. So I think what we have is the right definition. Another workaround is to do @time @eval f().

@oscardssmith
Copy link
Member

I disagree with this. In my opinion, @btime is what to use if you want to know how long a program takes. @time should be the preferred way of measuring the time of a single run including all associated compilation time.

@KristofferC
Copy link
Sponsor Member

It seems odd to me to want

@time foo()

to give a very different result from

function g()
    @time foo()
end
g()

@ericphanson
Copy link
Contributor

Is it that strange though? I think it would be nice to easily get both numbers, considering they represent different things. Moreover, I think it's pretty weird to see @time say it's really fast when your experience sitting behind the keyboard is very different.

@JeffBezanson
Copy link
Sponsor Member

It is easy to get both numbers: @time x and @time @eval x.

@IanButterworth
Copy link
Sponsor Member

I didn't know that trick! That seems like the solution..

Should we wrap it into @time_eval or just try to promote it in the docs?

@ufechner7
Copy link
Author

ufechner7 commented Feb 23, 2021

From my point of view it would then be more logical to have an @time macro that gives the accurate time and an @allocate macro that gives an accurate number of allocations. What is the use of an @time macro that prints a duration where nobody but very few experts understand the meaning of the duration that it measures?

From software engineering I like POLA: Principle of Least Astonishment. Don't violate that.

@ufechner7
Copy link
Author

ufechner7 commented Feb 23, 2021

I can confirm that @time @eval measures the correct time (well, within a small error margin) including the full compilation time:

include("../src/Utils.jl")
using TickTock
tick();tock()

tick()
@time @eval Utils.test(true);
tock()


# julia> include("test/time_bug3.jl")
# ...
# [ Info:  started timer at: 2021-02-23T02:00:01.993
#   2.870946 seconds (9.14 M allocations: 548.666 MiB, 9.90% gc time, 99.96% compilation time)
# [ Info:          2.910599207s: 2 seconds, 910 milliseconds

# @time @eval reports 2.87 seconds, which comes close to the execution time incl. compilation of 2.9s
#             measured by ticktock

# If executed repeatedly, the time is decreasing for a number of executions:
# julia> @time @eval Utils.test(true);
#   0.330632 seconds (495.98 k allocations: 27.351 MiB, 2.96% gc time, 82.00% compilation time)

# julia> @time @eval Utils.test(true);
#   0.004910 seconds (4.32 k allocations: 989.578 KiB)

# julia> @time @eval Utils.test(true);
#   0.001892 seconds (4.32 k allocations: 989.609 KiB)

# julia> @time @eval Utils.test(true);
#   0.002338 seconds (4.32 k allocations: 989.609 KiB)

# julia> @time @eval Utils.test(true);
#   0.001538 seconds (4.32 k allocations: 989.609 KiB)

The above was run on Julia 1.6rc1.

@ufechner7
Copy link
Author

ufechner7 commented Feb 23, 2021

But which time is @time alone currently measuring? Can anybody give a clear definition of that?

@JeffBezanson
Copy link
Sponsor Member

The elapsed time of the execution of the given expression, after the compiler has done whatever it can and after any other setup overhead. So the reported compilation time is extra compilation that had to be done at run time since it could not be predicted in advance. That's arguably useful, since it tells you something about how statically-compileable your code is.

@timholy
Copy link
Sponsor Member

timholy commented Feb 23, 2021

It's not terribly difficult to capture both with a single call:

macro toptime(ex)
    val = gensym("val")
    ex = quote
        while false; end # compiler heuristic: compile this block (alter this if the heuristic changes)
        local stats = Base.gc_num()
        local compile_elapsedtime = Base.cumulative_compile_time_ns_before()
        local elapsedtime = time_ns()
        $val = $(esc(ex))
        elapsedtime = time_ns() - elapsedtime
        compile_elapsedtime = Base.cumulative_compile_time_ns_after() - compile_elapsedtime
        local diff = Base.GC_Diff(Base.gc_num(), stats)
        Base.time_print(elapsedtime, diff.allocd, diff.total_time, Base.gc_alloc_count(diff), compile_elapsedtime, true)
    end
    touter = gensym("touter")
    Expr(:toplevel, :($touter = time_ns()), ex, quote
        $touter = time_ns() - $touter
        println("outer time: ", $touter/1e9)
        $val
    end)
end

But this can only be used from toplevel. Unless there's another way to split an expression that says "compile this block but not this one"?

@ufechner7
Copy link
Author

ufechner7 commented Feb 23, 2021

I tested the macro toptime:

julia> include("test/time_bug4.jl")
...
[ Info:  started timer at: 2021-02-23T21:04:42.278
  0.668911 seconds (934.21 k allocations: 55.547 MiB, 1.49% gc time, 99.37% compilation time)
outer time: 2.925239483
[ Info:          3.108777514s: 3 seconds, 108 milliseconds 

Printing could be approved, and not so sure about the accuracy. But in principle it works. @time @eval gives much higher allocations, though:

 2.870946 seconds (9.14 M allocations: 548.666 MiB, 9.90% gc time, 99.96% compilation time)

Also not clear to me: I am using already a custom sysimage and the same function Utils.test(true) was also called from PackageCompiler. Why is the outer time nevertheless so high?

@timholy
Copy link
Sponsor Member

timholy commented Feb 23, 2021

There does seem to be a bizarre lag. I can't take time to figure it out now; I like the idea of advertising the @eval trick in the @time docstring.

@IanButterworth
Copy link
Sponsor Member

I'd be up opening a PR on that.

Perhaps not worth worrying too much about 0.02% but it's the first time I've seen the comp time report be > 100% (and not be a overflow error)

julia> @eval function f(a)  # Expensive function to compile. (Thanks @yuyichao!)
                         $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:401]...) end)
                         $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:401]...) end)
                         return a + 3
                     end
f (generic function with 1 method)

julia> @time @eval f(2)
 13.801006 seconds (6.26 M allocations: 317.365 MiB, 3.65% gc time, 100.02% compilation time)

@ufechner7
Copy link
Author

ufechner7 commented Feb 24, 2021

I can reproduce your result for @time @eval f(2).
A value of more than 100% compilation time is an indication of an error in the algorithm that calculates this value.

@vchuravy
Copy link
Member

@ufechner7 Please use backticks to quote macros @time and @eval might not want to be pinged.

@ufechner7
Copy link
Author

Done.

@ufechner7
Copy link
Author

Bump! There is a pull request to improve the documentation. Can this become part of Julia 1.6, please?

@IanButterworth
Copy link
Sponsor Member

#39802 is now in 1.6.0.

Perhaps the issue can remain open to see how people get on with the @time vs. @time @eval trick

@goerch
Copy link

goerch commented Apr 30, 2022

We had a discussion here and the trick solved the mystery for me. Only unusual observation is that not all tester confirmed the problem. Could it be platform specific (in this case only Windows)?

@brenhinkeller
Copy link
Sponsor Contributor

I guess we can close this now -- the @time @eval trick is maybe not quite as well known as it should be, but it's about as prominently mentioned as it possibly could be in the @time docstring.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests