Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FakeRead is semantically meaningful to miri (but gets optimized away) #97092

Open
nikomatsakis opened this issue May 16, 2022 · 13 comments
Open
Labels
A-mir-opt Area: MIR optimizations A-miri Area: The miri tool C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team

Comments

@nikomatsakis
Copy link
Contributor

nikomatsakis commented May 16, 2022

Given this sample program:

#![feature(generators)]

fn main() {
    let x = Some(22);
    let _a = match x {
        Some(y) if foo() => (),
        _ => (),
    };
}

fn foo() -> bool { 
    true
}

the desugaring of the match intentionally adds some borrows when executing the guard:

    bb3: {
        StorageLive(_6);                 // scope 1 at foo.rs:6:14: 6:15
        _6 = &((_1 as Some).0: i32);     // scope 1 at foo.rs:6:14: 6:15
        _4 = &shallow _1;                // scope 1 at foo.rs:5:20: 5:21
        StorageLive(_7);                 // scope 1 at foo.rs:6:20: 6:25
        _7 = foo() -> [return: bb4, unwind: bb8]; // scope 1 at foo.rs:6:20: 6:25
                                         // mir::Constant
                                         // + span: foo.rs:6:20: 6:23
                                         // + literal: Const { ty: fn() -> bool {foo}, val: Value(Scalar(<ZST>)) }
    }

    bb4: {
        switchInt(move _7) -> [false: bb6, otherwise: bb5]; // scope 1 at foo.rs:6:20: 6:25
    }

    bb5: {
        StorageDead(_7);                 // scope 1 at foo.rs:6:24: 6:25
        FakeRead(ForMatchGuard, _4);     // scope 1 at foo.rs:6:24: 6:25
        FakeRead(ForGuardBinding, _6);   // scope 1 at foo.rs:6:24: 6:25

The purposeof the FakeRead of _4and _6 is to ensure that the discriminant does not change while the guard executes -- i.e., that the guard doesn't (via unsafe code, say) mutate x to be None. But after optimizations those FakeReads are removed:


    bb0: {
        Deinit(_1);                      // scope 0 at foo.rs:4:13: 4:21
        ((_1 as Some).0: i32) = const 22_i32; // scope 0 at foo.rs:4:13: 4:21
        discriminant(_1) = 1;            // scope 0 at foo.rs:4:13: 4:21
        _4 = &((_1 as Some).0: i32);     // scope 1 at foo.rs:6:14: 6:15
        _5 = foo() -> bb1;               // scope 1 at foo.rs:6:20: 6:25
                                         // mir::Constant
                                         // + span: foo.rs:6:20: 6:23
                                         // + literal: Const { ty: fn() -> bool {foo}, val: Value(Scalar(<ZST>)) }
    }

    bb1: {
        switchInt(move _5) -> [false: bb3, otherwise: bb2]; // scope 1 at foo.rs:6:20: 6:25
    }

this means that foo() could trigger writes without causing UB. This seems bad!

UPDATE: This is overstating the case. It's ok for us to optimize FakeRead away, but tools like miri or some future sanitizers would still want to see them (for the reasons give above).

@nikomatsakis nikomatsakis added the C-bug Category: This is a bug. label May 16, 2022
@nikomatsakis
Copy link
Contributor Author

cc @RalfJung @rust-lang/wg-unsafe-code-guidelines and @eholk (with whom I found this)

@digama0
Copy link
Contributor

digama0 commented May 17, 2022

It should be fine for optimizations to remove UB, as long as Miri runs without said optimization. The UB will still occur in the AM version of the code, and Miri's ability to catch this UB depends on whether it runs before or after the optimization.

@RalfJung
Copy link
Member

Miri would (I think) currently ICE on FakeRead, so we probably are deleting them too early.

@nikomatsakis
Copy link
Contributor Author

After opening this issue, I was thinking that -- indeed -- I overstated the case, but indeed the point stands that a FakeRead is semantically meaningful. All of this (in my mind) points to the continued need for an "analysis MIR" that is used by miri, borrow check, and friends and where optimizations have not been done.

(I would also like to have some kind of sanitizer mode where we do UB detection, and in that mode, FakeRead wouldn't really be fake, but since that mode doesn't exist outside of miri, it's ok to optimize them away.)

@nikomatsakis nikomatsakis changed the title FakeRead should not be treated as dead code FakeRead is semantically meaningful to miri (but gets optimized away) May 17, 2022
@nikomatsakis
Copy link
Contributor Author

Updated title / OP.

@saethlin
Copy link
Member

The issue title makes it sound like you're trying to say that optimizations which remove FakeRead are buggy, but I don't think that's the case based on the other comments here? Then your latest comment makes it sound like this issue is actually a call for separating MIR for analysis from MIR for code generation. I guess I'm just confused because in an issue with only 5 comments it seems like the consensus on what change would close the issue has taken a huge turn.

Are you asking for a wholly separate IR? Or a compiler API to get an IR which can't be combined with optimizations? (because while Miri sets -Zmir-opt-level=0, I think a user could override that) Or are you suggesting that we should remove MIR optimizations which may hide UB?

@digama0
Copy link
Contributor

digama0 commented May 17, 2022

I think we should remove optimizations that hide UB at opt level 0. I assume that the only optimizations that run at opt level 0 are those that must run for the latter stages to not ICE, but ideally anything that hides UB should not be on that list.

@RalfJung
Copy link
Member

I think a user overriding the opt level in Miri is fine, as long as the user is aware that this might miss UB.

But indeed by default Miri should be run on MIR without any UB-removing optimizations?

@JakobDegen
Copy link
Contributor

So, I do agree that there is an issue here, but I don't think the issue is the one that is described. I for some reason forgot to include it in the MIR docs, but my working model of FakeRead has been that it is a nop at AM runtime. If that is the case, it is effectively just an analysis artifact and certainly sound to remove. However, this issue makes a good point - it seems like we want match exhaustiveness checking to be "sound" in the sense that in all match statements (in non-UB programs) accepted by the compiler, one of the arms must actually be taken. This leaves us with two problems:

  1. FakeRead does need to have an actual behavior at runtime, but it's not clear to me what that behavior should be. It certainly can't be an actual read, because then we might go reading into an UnsafeCell and Option<Mutex<u8>>::is_some would cause a data race if another thread currently holds the lock. What we actually want is to somehow assert that the reference is still valid at that point, without doing any reading. Possibly a regular old SB retag would suffice, maybe we need to insert a protector?

  2. As Ralf and I discovered on Zulip yesterday, even if we fix the above problem, shared references to enums are actually enough to change the active variant in some cases under current SB. This means that the above soundness theorem would be false even if FakeRead worked the way we wanted it to. This is proving to be somewhat hard to fix too. In any case, that should not necessarily block this issue, I did feel it was worth mentioning though.

@nikomatsakis
Copy link
Contributor Author

nikomatsakis commented Jun 6, 2022 via email

@RalfJung
Copy link
Member

RalfJung commented Jun 6, 2022

If so, it is meant to be equivalent to &x -- i.e., a logical read of all reachable content

&x would just be creating a reference; Stacked Borrows treats that as a logical read but that might change (for &mut and writes, this turned out to be problematic).

@JakobDegen
Copy link
Contributor

what should the official operational semantics of a FakeRead be?

Yes, that is what I am saying.

equivalent to &x -- i.e., a logical read of all reachable content [...] I'm not totally sure how stacked borrows manages that kind of thing at present

So, straying a little bit off-topic, but what I was getting at above is that under SB a "logical read" and a &x (ie a retag) are not the same, and actually cannot be, because of UnsafeCell. (&(foo: Mutex<u8>) cannot read into the Mutex). In any case though, I think the right idea for now is to make this equivalent to &*x (ie a reborrow). That will be enough to assert in SB (as it stands today) that the tag is still in the borrow stack, which should have all the desired semantics.


I would like to add an additional note that I mentioned on Zulip once but I think deserves a little more permanance: I mentioned above that we will likely want to prove a theorem that says that the match exhaustiveness checking that we do, together with the aliasing model and match guards that we insert, are enough to guarantee that all matches are indeed exhaustive. If we want to prove such a theorem though (and have our language semantics rely on it), that means we are now imposing lower bounds on the strength of the aliasing model.

In the conversation about allowing the 2PB future incompatibility lint, if I recall correctly it was decided that SB's inability to model it was non-blocking, because SB could always just do less strict alias analysis. However, if it is true that we require lower bounds on the strength of the aliasing model for the reasons described above, that means just doing less strict aliasing analysis is not always an option. I unfortunately cannot come up with a practical example here, and it may not be possible, but still I think this is a non-trivial concern.

@ChrisDenton ChrisDenton added the needs-triage-legacy Old issue that were never triaged. Remove this label once the issue has been sufficiently triaged. label Jul 16, 2023
@fmease fmease added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-miri Area: The miri tool A-mir-opt Area: MIR optimizations T-opsem Relevant to the opsem team and removed needs-triage-legacy Old issue that were never triaged. Remove this label once the issue has been sufficiently triaged. labels Jan 26, 2024
@RalfJung
Copy link
Member

I am inclined to close this as being a part of rust-lang/unsafe-code-guidelines#399 -- it's less about FakeRead specifically and more about the op.sem of all the operations that match desugars to, and what we want them all to collectively ensure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-mir-opt Area: MIR optimizations A-miri Area: The miri tool C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team
Projects
None yet
Development

No branches or pull requests

7 participants