Add a donotdelete builtin #44036

Merged
Keno merged 1 commit into master from kf/dcebarrier on Feb 9, 2022

Conversation

@Keno
Member

Keno commented Feb 4, 2022

In #43852 we noticed that the compiler is getting good enough to
completely DCE a number of our benchmarks. We need to add some sort
of mechanism to prevent the compiler from doing so. This adds just
such an intrinsic. The intrinsic itself doesn't do anything, but
it is considered effectful by our optimizer, preventing it from
being DCE'd. At the LLVM level, it turns into a volatile store to
an alloca (or an llvm.sideeffect if the values passed to the
dcebarrier do not have any actual LLVM-level representation).

The docs for the new intrinsic are as follows:

    dcebarrier(args...)

This function prevents dead-code elimination (DCE) of itself and any arguments
passed to it, but is otherwise the lightest barrier possible. In particular,
it is not a GC safepoint, does not model an observable heap effect, does not expand
to any code itself and may be re-ordered with respect to other side effects
(though the total number of executions may not change).

A useful model for this function is that it hashes all memory `reachable` from
args and escapes this information through some observable side-channel that does
not otherwise impact program behavior. Of course that's just a model. The
function does nothing and returns `nothing`.

This is intended for use in benchmarks that want to guarantee that `args` are
actually computed. (Otherwise DCE may see that the result of the benchmark is
unused and delete the entire benchmark code).

**Note**: `dcebarrier` does not affect constant folding. For example, in
          `dcebarrier(1+1)`, no add instruction needs to be executed at runtime and
          the code is semantically equivalent to `dcebarrier(2)`.

# Examples

function loop()
    for i = 1:1000
        # The compiler must guarantee that there are 1000 program points (in the correct
        # order) at which the value of `i` is in a register, but otherwise has
        # total control over the program.
        dcebarrier(i)
    end
end

I believe the volatile store at the LLVM level is actually somewhat
stronger than what we want here. Ideally the dcebarrier would not
end up generating any machine code at all and would also be compatible
with optimizations like SROA and vectorization. However, I think this
is fine for now.
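
As a rough C++ analogue of the lowering described above (a volatile store to a stack slot), the following minimal sketch may help build intuition. The names `keep_alive` and `count_iterations` are hypothetical and are not part of Julia or LLVM; the sketch only illustrates the idea and is not the PR's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (an assumption for illustration): copy `value` into a
// volatile stack slot. Volatile accesses count as observable behavior in C++,
// so the compiler is expected to keep the store and the value feeding it.
inline void keep_alive(std::int64_t value) {
    volatile std::int64_t sink = value;
    (void)sink;
}

// Mirrors the docstring's loop: each of the n iterations must materialize `i`,
// even though nothing else in the program uses it.
std::int64_t count_iterations(std::int64_t n) {
    assert(n >= 0);
    std::int64_t executed = 0;
    for (std::int64_t i = 1; i <= n; ++i) {
        keep_alive(i);
        ++executed;
    }
    return executed;
}
```

Calling `count_iterations(1000)` returns 1000. As in the description above, the volatile store is stronger than strictly needed: it emits real machine code and can get in the way of vectorization.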

@tkf
Member

tkf commented Feb 4, 2022

This seems to be related to @vchuravy's JuliaCI/BenchmarkTools.jl#92, which included clobber() and escape(). IIUC, a similar ASM-level hack exists in google/benchmark, where it is called DoNotOptimize(...) and ClobberMemory().

Can we have a more intuitive name like google/benchmark's DoNotOptimize?

> I believe the volatile store at the LLVM level is actually somewhat
> stronger than what we want here.

Can we use `call void asm sideeffect "", "X,~{memory}"($name %0)`?

https://github.com/JuliaCI/BenchmarkTools.jl/pull/92/files#diff-7a1b40723def106eb9bf5c19254f26077a5b8e09741f69198e39365c306c99bcR57
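
For readers unfamiliar with the empty-asm trick mentioned above, here is a minimal C++ sketch using GNU-style inline assembly (GCC or Clang only). The helper names `escape` and `clobber` echo the names mentioned in this comment, but the code below is an illustrative assumption: it is not copied from BenchmarkTools.jl#92 or from google/benchmark, and `fill_buffer` is a hypothetical example function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if !defined(__GNUC__)
#error "This sketch relies on GNU-style inline assembly (GCC or Clang)."
#endif

// Tells the compiler that the pointer `p` is consumed by an opaque piece of
// assembly that may read or write any memory. No instructions are emitted.
inline void escape(const void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

// Tells the compiler that all memory may have been read or written,
// so pending stores to escaped objects cannot be dropped.
inline void clobber() {
    asm volatile("" : : : "memory");
}

// Hypothetical benchmark body: without escape/clobber, the optimizer could
// legally discard the writes because `buf` is never read.
std::size_t fill_buffer(std::size_t n) {
    assert(n > 0);
    std::vector<int> buf;
    buf.reserve(n);
    escape(buf.data());
    for (std::size_t i = 0; i < n; ++i) {
        buf.push_back(static_cast<int>(i));
        clobber();
    }
    return buf.size();
}
```

The `"memory"` clobber is what makes this approach heavy: it acts as a compiler-level barrier for all memory, which is part of why it can interfere with loop optimizations.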

@Keno
Member Author

Keno commented Feb 4, 2022

> Can we use `call void asm sideeffect "", "X,~{memory}"($name %0)`?

Yes, but that has the same optimizability challenges, and perhaps even more. I thought the volatile store might at least have some chance of not interfering with loop vectorization.

@Keno
Member Author

Keno commented Feb 4, 2022

> I believe the volatile store at the LLVM level is actually somewhat
> stronger than what we want here. Ideally the dcebarrier would not
> end up generating any machine code at all and would also be compatible
> with optimizations like SROA and vectorization.

@preames any thoughts on this?

@Keno
Member Author

Keno commented Feb 4, 2022

> Can we have a more intuitive name like google/benchmark's DoNotOptimize?

It's intended to be consistent with Base.inferencebarrier.

Keno added a commit to JuliaCI/BaseBenchmarks.jl that referenced this pull request Feb 4, 2022
actually computed. (Otherwise DCE may see that the result of the benchmark is
unused and delete the entire benchmark code).

**Note**: `dcebarrier` does not affect constant foloding. For example, in

Sponsor Member

Suggested change
- **Note**: `dcebarrier` does not affect constant foloding. For example, in
+ **Note**: `dcebarrier` does not affect constant folding. For example, in

@preames

preames commented Feb 4, 2022

There's some prior art on this type of thing in Java with JMH's Blackhole.consume. Naming wise, I would find something along those lines better than dcebarrier. As can already be seen in the discussion above, use of the word "barrier" gives the impression that the call has memory effects, whereas that seems not to be the intent per the draft wording.

Implementation wise, I would start by lowering to an external function call marked "inaccessiblememonly nounwind willreturn" at the LLVM level. This would have some cost - the actual call sequence - but should have minimal impact on optimization.

I would be leery of the volatile store to alloca lowering. Volatiles are generally not touched, but there is precedent for removing them if the location being touched is well understood. An alloca seems like an entirely reasonable location for the compiler to assume is not memory-mapped IO.

Once implemented with the external call, we could choose to add an LLVM intrinsic with the same meaning. I think this is a broadly reusable concept, and probably wouldn't be too hard to get upstream.

@Keno
Member Author

Keno commented Feb 4, 2022

blackhole(args...) is a pretty good name

@Keno
Member Author

Keno commented Feb 7, 2022

Upon discussion with @JeffBezanson and @vtjnash, they preferred a name that did not require a graduate course on the black hole information paradox to build the correct intuition about whether the optimizer is allowed to delete the value. We ultimately settled on donotdelete(args...). donotoptimize(args...) was considered bad because all kinds of optimization are generally allowed, as long as the value is still computed eventually.

Keno added a commit to JuliaCI/BenchmarkTools.jl that referenced this pull request Feb 8, 2022
@Keno
Member Author

Keno commented Feb 8, 2022

Alright, I guess we should merge the BenchmarkTools version first, then do a nanosoldier run here to see what the effect is (we expect regressions because it's a change in what's being benchmarked), just so we have a baseline.

@KristofferC
Sponsor Member

The new BenchmarkTools version also has to get deployed explicitly on Nanosoldier.

@Keno
Member Author

Keno commented Feb 8, 2022

> The new BenchmarkTools version also has to get deployed explicitly on Nanosoldier.

I've tagged BenchmarkTools 1.3 and according to @vtjnash, Nanosoldier will pick up the latest registered version, so we'll wait for that to go through. I'll rebase this in the meantime, since it's accumulated conflicts.

@oscardssmith changed the title from "Add a DCE barrier builtin" to "Add a donotdelete builtin" on Feb 8, 2022
@Keno
Member Author

Keno commented Feb 8, 2022

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Collaborator

Something went wrong when running your job:

NanosoldierError: error when preparing/pushing to report repo: failed process: Process(setenv(`git push`; dir="/nanosoldier/workdir/NanosoldierReports"), ProcessExited(1)) [1]

Unfortunately, the logs could not be uploaded.

@vtjnash
Sponsor Member

vtjnash commented Feb 9, 2022

Not too bad. None get faster (of course), but only a handful got badly affected: https://github.com/JuliaCI/NanosoldierReports/blob/master/benchmark/by_hash/95a9e7f_vs_60f414e/report.md

@Keno
Member Author

Keno commented Feb 9, 2022

Yep, pretty much as expected. The benchmarks that got affected are the scalar ones that are essentially trivial, so it's likely that LLVM was deleting them. Looks like this is working. Excellent.

@Keno merged commit a947fc7 into master on Feb 9, 2022
@Keno deleted the kf/dcebarrier branch on February 9, 2022 06:36
@DilumAluthge
Member

Is it possible that this PR broke Windows CI?

@Keno
Member Author

Keno commented Feb 9, 2022

So it did. Looks like the new test failed. Will fix.

Comment on lines +479 to +481
FnAttrs.addAttribute(C, Attribute::InaccessibleMemOnly);
FnAttrs.addAttribute(C, Attribute::WillReturn);
FnAttrs.addAttribute(C, Attribute::NoUnwind);

Sponsor Member

This is not an AttrBuilder. These calls do not have any effect and will be deleted.

/Users/jameson/julia1/src/codegen.cpp:479:5: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result]
    FnAttrs.addAttribute(C, Attribute::InaccessibleMemOnly);
    ^~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/jameson/julia1/src/codegen.cpp:480:5: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result]
    FnAttrs.addAttribute(C, Attribute::WillReturn);
    ^~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~
/Users/jameson/julia1/src/codegen.cpp:481:5: warning: ignoring return value of function declared with 'warn_unused_result' attribute [-Wunused-result]
    FnAttrs.addAttribute(C, Attribute::NoUnwind);
    ^~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~

Member

fixed by #44097
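
The warnings above are typical of an immutable-value API: an "add" method returns a new object instead of mutating the receiver, so discarding the result loses the change. The following self-contained C++ sketch illustrates that pitfall, assuming (as the reviewer's remark suggests) that `FnAttrs` behaves this way. The `AttrSet` class and both builder functions are hypothetical stand-ins, not LLVM's API and not the change made in #44097.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical immutable attribute container (an assumption for illustration).
class AttrSet {
public:
    // Returns a new set containing `name`; `*this` is left unchanged.
    [[nodiscard]] AttrSet with(const std::string& name) const {
        assert(!name.empty());
        AttrSet copy = *this;
        copy.names_.insert(name);
        return copy;
    }
    bool has(const std::string& name) const { return names_.count(name) != 0; }
    std::size_t size() const { return names_.size(); }

private:
    std::set<std::string> names_;
};

// Buggy pattern: the returned set is thrown away, so `attrs` stays empty.
// The (void) cast only silences the nodiscard warning.
AttrSet build_discarding() {
    AttrSet attrs;
    (void)attrs.with("nounwind");
    return attrs;
}

// Working pattern: reassign the result of every call.
AttrSet build_reassigning() {
    AttrSet attrs;
    attrs = attrs.with("inaccessiblememonly");
    attrs = attrs.with("willreturn");
    attrs = attrs.with("nounwind");
    return attrs;
}
```

Here `build_discarding()` yields an empty set while `build_reassigning()` yields all three attributes. A mutable builder object, which is what the reviewer's mention of AttrBuilder points at, avoids the reassignment boilerplate.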

@vtjnash
Sponsor Member

vtjnash commented Feb 13, 2022

backport?

antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request Feb 17, 2022
In JuliaLang#43852 we noticed that the compiler is getting good enough to
completely DCE a number of our benchmarks. We need to add some sort
of mechanism to prevent the compiler from doing so. This adds just
such an intrinsic. The intrinsic itself doesn't do anything, but
it is considered effectful by our optimizer, preventing it from
being DCE'd. At the LLVM level, it turns into a call to an external
varargs function.

The docs for the new intrinsic are as follows:
```
    donotdelete(args...)

This function prevents dead-code elimination (DCE) of itself and any arguments
passed to it, but is otherwise the lightest barrier possible. In particular,
it is not a GC safepoint, does not model an observable heap effect, does not expand
to any code itself and may be re-ordered with respect to other side effects
(though the total number of executions may not change).

A useful model for this function is that it hashes all memory `reachable` from
args and escapes this information through some observable side-channel that does
not otherwise impact program behavior. Of course that's just a model. The
function does nothing and returns `nothing`.

This is intended for use in benchmarks that want to guarantee that `args` are
actually computed. (Otherwise DCE may see that the result of the benchmark is
unused and delete the entire benchmark code).

**Note**: `donotdelete` does not affect constant folding. For example, in
          `donotdelete(1+1)`, no add instruction needs to be executed at runtime and
          the code is semantically equivalent to `donotdelete(2)`.

# Examples

function loop()
    for i = 1:1000
        # The compiler must guarantee that there are 1000 program points (in the correct
        # order) at which the value of `i` is in a register, but otherwise has
        # total control over the program.
        donotdelete(i)
    end
end
```
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
9 participants