From fd4b4cd769300cfde5d54865d227990b71b762d1 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 1 Nov 2018 12:24:08 +0100 Subject: [PATCH 01/23] RFC for an operator to take a raw reference --- text/0000-raw-reference-operator.md | 146 ++++++++++++++++++++++++++++ 1 file changed, 146 insertions(+) create mode 100644 text/0000-raw-reference-operator.md diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md new file mode 100644 index 00000000000..a74789d0e5a --- /dev/null +++ b/text/0000-raw-reference-operator.md @@ -0,0 +1,146 @@ +- Feature Name: raw_reference_operator +- Start Date: 2018-11-01 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +Introduce a new primitive operator on the MIR level: `&[mut|const] raw ` +to create a raw pointer to the given place (this is not surface syntax, it is +just how MIR might be printed). Desugar the surface syntax `&[mut] as +*[mut|const] _` to use this operator, instead of two statements (first take +normal reference, then cast). + +# Motivation +[motivation]: #motivation + +Currently, if one wants to create a raw pointer pointing to something, one has +no choice but to create a reference and immediately cast it to a raw pointer. +The problem with this is that there are some invariants that we want to attach +to references, that have to *always hold*. (This is not finally decided yet, +but true in practice because of annotations we emit to LLVM. It is also the +next topic of discussion in the +[Unsafe Code Guidelines](https://github.com/rust-rfcs/unsafe-code-guidelines/).) +In particular, references must be aligned and dereferencable, even when they are +created and never used. + +One consequence of these rules is that it becomes essentially impossible to +create a raw pointer pointing to an unaligned struct field: `&packed.field as +*const _` creates an immediate unaligned reference, triggering undefined +behavior because it is not aligned. Similarly, `&(*raw).field as *const _` is +not just computing an offset of the raw pointer `raw`, it also asserts that the +intermediate shared reference is aligned and dereferencable. In both cases, +that is likely not what the author of the code intended. + +To fix this, we propose to introduce a new primitive operation on the MIR level +that, in a single statement, creates a raw pointer to a given place. No +intermediate reference exists, so no invariants have to be adhered to. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +When working with unaligned or potentially dangling pointers, it is crucial that +you always use raw pointers and not references: References come with guarantees +that the compiler assumes are always upheld, and these guarantees include proper +alignment and not being dangling. Importantly, these guarantees must be +maintained even when the reference is created and never used! The following is +UB (assuming `packed` is a variable of packed type): + +```rust +let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior +``` + +There is no situation in which the above code is correct, and hence it is a hard +error to write this. + +The only way to create a pointer to an unaligned or dangling location without +triggering undefined behavior is to *immediately* cast it to a raw pointer: + +```rust +let x = unsafe { &packed.field as *const _ }; +``` + +These two operations (taking a reference, casting to a raw pointer) are actually +considered a single operation happening in one step, and hence the invariants +incurred by references do not come into play. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +When translating HIR to MIR, we recognize `&[mut] as *[mut|const] _` as +a special pattern and turn it into a single MIR `Rvalue` that takes the address +and produces it as a raw pointer -- a "take raw reference" operation. This +might be a variant of the existing `Ref` operation (say, a boolean flag for +whether this is raw), or a new `Rvalue` variant. The borrow checker should do +the usual checks on ``, but can just ignore the result of this operation +and the newly created "reference" can have any lifetime. (Currently this will +be some form of unbounded inference variable because the only use is a +cast-to-raw, the new "raw reference" operation can have the same behavior.) +When translating MIR to LLVM, nothing special has to happen as references and +raw pointers have the same LLVM type anyway; the new operation behaves like +`Ref`. + +When interpreting MIR in the miri engine, the engine will recognize that the +value produced by this `Rvalue` has raw pointer type, and hence must not satisfy +any special invariants. + +When doing unsafety checking, we make references to packed fields that do *not* +use this new "raw reference" operation a *hard error even in unsafe blocks* +(after a transition period). There is no situation in which this code is okay; +it creates a reference that violates basic invariants. Taking a raw reference +to a packed field, on the other hand, is a safe operation as the raw pointer +comes with no special promises. "Unsafety checking" is thus not even a good +term for this, maybe it should be a special pass dedicated to packed fields +traversing MIR, or this can happen when lowering HIR to MIR. This check has +nothing to do with whether we are in an unsafe block or not. + +# Drawbacks +[drawbacks]: #drawbacks + +It might be surprising that the following two pieces of code are not equivalent: +```rust +// Variant 1 +let x = unsafe { &packed.field }; // Undefined behavior! +let x = x as *const _; +// Variant 2 +let x = unsafe { &packed.field as *const _ }; +``` + +# Rationale and alternatives +[rationale-and-alternatives]: #rationale-and-alternatives + +This is a compromise: I see no reasonable way to translate the first variant +shown in the "Drawbacks" section to a raw reference operation, and the second +variant is so common that we likely do not want to rule it out. Hence the +proposal to make them not equivalent. + +One alternative to introducing a new primitive operation might be to somehow +exempt "references immediately cast to a raw pointer" from the invariant. +However, it is unclear how that information is supposed to be encoded in the +MIR, and how it is to be maintained by optimizations. We believe that the +semantics of a MIR program, including whether it has undefined behavior, should +be deducible by executing it one step at a time. + +Instead of compiling `&[mut] as *[mut|const] _` to a raw reference +operation, we could introduce new surface syntax and keep the existing HIR->MIR +lowering the way it is. However, that would make lots of carefully written +existing code dealing with packed structs have undefined behavior. (There is +likely also lots of code that forgets to cast to a raw pointer, but I see no way +to make that legal -- and the proposal would make such uses a hard error in the +long term, so we should catch many of these bugs.) Also, no good proposal for a +surface syntax has been made yet -- and if one comes up later, this proposal is +forwards-compatible with also having explicit syntax for taking a raw reference +(and deprecating the safe-ref-then-cast way of writing this operation). + +# Prior art +[prior-art]: #prior-art + +I am not aware of another language with both comparatively strong invariants for +its reference types, and raw pointers. The need for taking a raw reference only +arise because of Rust having both of these features. + +# Unresolved questions +[unresolved-questions]: #unresolved-questions + +None I can think of. From e250bed7f1e31e9534f753e1533739c5331248a2 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sun, 4 Nov 2018 13:06:54 +0100 Subject: [PATCH 02/23] there is some design space in terms of when and how we emit the new operation --- text/0000-raw-reference-operator.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index a74789d0e5a..4c25c8a1916 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -133,6 +133,14 @@ surface syntax has been made yet -- and if one comes up later, this proposal is forwards-compatible with also having explicit syntax for taking a raw reference (and deprecating the safe-ref-then-cast way of writing this operation). +We could be using the new operator in more cases, e.g. we could have some kind +of analysis which determines during desugaring if a reference is *only* used as +a raw pointer, and if yes, creates it as a raw pointer to begin with. This +would make more code use raw references, thus making more code defined. +However, if someone *relies* on this behavior there is a danger of accidentally +adding a non-raw-ptr use to a reference, which would then rather subtly make the +program have UB. + # Prior art [prior-art]: #prior-art @@ -143,4 +151,5 @@ arise because of Rust having both of these features. # Unresolved questions [unresolved-questions]: #unresolved-questions -None I can think of. +We could have different rules for when to take a raw reference (as opposed to a +safe one). From af78d46f92c15b195e900a16648a239fdd82061c Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sun, 4 Nov 2018 13:21:30 +0100 Subject: [PATCH 03/23] more examples by ubsan --- text/0000-raw-reference-operator.md | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index 4c25c8a1916..54a1789414c 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -9,7 +9,7 @@ Introduce a new primitive operator on the MIR level: `&[mut|const] raw ` to create a raw pointer to the given place (this is not surface syntax, it is just how MIR might be printed). Desugar the surface syntax `&[mut] as -*[mut|const] _` to use this operator, instead of two statements (first take +*[mut|const] _` to use this operator, instead of two MIR statements (first take normal reference, then cast). # Motivation @@ -34,7 +34,7 @@ intermediate shared reference is aligned and dereferencable. In both cases, that is likely not what the author of the code intended. To fix this, we propose to introduce a new primitive operation on the MIR level -that, in a single statement, creates a raw pointer to a given place. No +that, in a single MIR statement, creates a raw pointer to a given place. No intermediate reference exists, so no invariants have to be adhered to. # Guide-level explanation @@ -52,7 +52,23 @@ let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior ``` There is no situation in which the above code is correct, and hence it is a hard -error to write this. +error to write this. This applies to most ways of creating a reference, i.e., +all of the following are UB if `X` is not properly aligned and dereferencable: + +```rust +fn foo() -> &T { + &X +} + +fn bar(x: &T) {} +bar(&X); // this is UB at the call site, not in `bar` + +let &x = &X; // this is actually dereferencing the pointer, certainly UB +let _ = &X; // throwing away the value immediately changes nothing +&X; // different syntax for the same thing + +let x = &X as &T as *const T; // this is casting to raw "too late" +``` The only way to create a pointer to an unaligned or dangling location without triggering undefined behavior is to *immediately* cast it to a raw pointer: @@ -153,3 +169,9 @@ arise because of Rust having both of these features. We could have different rules for when to take a raw reference (as opposed to a safe one). + +What do we specify inference to do for cases like +```rust +let x: *const T = {&*null}; +``` +Does this take a raw reference, or a safe reference? From 1a62d8249e359f4189a5584d29c0869e1dd77afe Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Mon, 5 Nov 2018 11:07:00 +0100 Subject: [PATCH 04/23] coercions to raw ptrs also trigger the new operation --- text/0000-raw-reference-operator.md | 35 ++++++++++++++++------------- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index 54a1789414c..6cb00742050 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -71,31 +71,36 @@ let x = &X as &T as *const T; // this is casting to raw "too late" ``` The only way to create a pointer to an unaligned or dangling location without -triggering undefined behavior is to *immediately* cast it to a raw pointer: +triggering undefined behavior is to *immediately* turn it into a raw pointer. +All of the following are valid: ```rust let x = unsafe { &packed.field as *const _ }; +let y: *const _ = unsafe { &packed.field }; ``` -These two operations (taking a reference, casting to a raw pointer) are actually -considered a single operation happening in one step, and hence the invariants -incurred by references do not come into play. +The intention is to cover all cases where a reference, just created, is +immediately explicitly used as a value of raw pointer type. + +These two operations (taking a reference, casting/coercing to a raw pointer) are +actually considered a single operation happening in one step, and hence the +invariants incurred by references do not come into play. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation When translating HIR to MIR, we recognize `&[mut] as *[mut|const] _` as -a special pattern and turn it into a single MIR `Rvalue` that takes the address -and produces it as a raw pointer -- a "take raw reference" operation. This -might be a variant of the existing `Ref` operation (say, a boolean flag for -whether this is raw), or a new `Rvalue` variant. The borrow checker should do -the usual checks on ``, but can just ignore the result of this operation -and the newly created "reference" can have any lifetime. (Currently this will -be some form of unbounded inference variable because the only use is a -cast-to-raw, the new "raw reference" operation can have the same behavior.) -When translating MIR to LLVM, nothing special has to happen as references and -raw pointers have the same LLVM type anyway; the new operation behaves like -`Ref`. +well as coercions from `&[mut] ` to a raw pointer type as a special +pattern and turn them into a single MIR `Rvalue` that takes the address and +produces it as a raw pointer -- a "take raw reference" operation. This might be +a variant of the existing `Ref` operation (say, a boolean flag for whether this +is raw), or a new `Rvalue` variant. The borrow checker should do the usual +checks on ``, but can just ignore the result of this operation and the +newly created "reference" can have any lifetime. (Currently this will be some +form of unbounded inference variable because the only use is a cast-to-raw, the +new "raw reference" operation can have the same behavior.) When translating MIR +to LLVM, nothing special has to happen as references and raw pointers have the +same LLVM type anyway; the new operation behaves like `Ref`. When interpreting MIR in the miri engine, the engine will recognize that the value produced by this `Rvalue` has raw pointer type, and hence must not satisfy From 879fa5caaac7240a227715227faff0b2606a7109 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 20 Nov 2018 15:40:21 +0100 Subject: [PATCH 05/23] make a lint part of this RFC, clarify when we use a raw ref --- text/0000-raw-reference-operator.md | 73 ++++++++++++++++------------- 1 file changed, 40 insertions(+), 33 deletions(-) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index 6cb00742050..fe98abb59e3 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -35,7 +35,9 @@ that is likely not what the author of the code intended. To fix this, we propose to introduce a new primitive operation on the MIR level that, in a single MIR statement, creates a raw pointer to a given place. No -intermediate reference exists, so no invariants have to be adhered to. +intermediate reference exists, so no invariants have to be adhered to. We also +add a lint for cases that seem like the programmer wanted a raw reference, not a +safe one, but did not use the right syntax. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation @@ -75,8 +77,10 @@ triggering undefined behavior is to *immediately* turn it into a raw pointer. All of the following are valid: ```rust -let x = unsafe { &packed.field as *const _ }; -let y: *const _ = unsafe { &packed.field }; +let packed_cast = unsafe { &packed.field as *const _ }; +let packed_coercion: *const _ = unsafe { &packed.field }; +let null_cast: *const _ = unsafe { &*ptr::null() } as *const _; +let null_coercion: *const _ = unsafe { &*ptr::null() }; ``` The intention is to cover all cases where a reference, just created, is @@ -89,18 +93,19 @@ invariants incurred by references do not come into play. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -When translating HIR to MIR, we recognize `&[mut] as *[mut|const] _` as -well as coercions from `&[mut] ` to a raw pointer type as a special -pattern and turn them into a single MIR `Rvalue` that takes the address and -produces it as a raw pointer -- a "take raw reference" operation. This might be -a variant of the existing `Ref` operation (say, a boolean flag for whether this -is raw), or a new `Rvalue` variant. The borrow checker should do the usual -checks on ``, but can just ignore the result of this operation and the -newly created "reference" can have any lifetime. (Currently this will be some -form of unbounded inference variable because the only use is a cast-to-raw, the -new "raw reference" operation can have the same behavior.) When translating MIR -to LLVM, nothing special has to happen as references and raw pointers have the -same LLVM type anyway; the new operation behaves like `Ref`. +When translating HIR to MIR, we recognize `&[mut] as *[mut|const] ?T` +(where `?T` can be any type, also a partial one like `_`) as well as coercions +from `&[mut] ` to a raw pointer type as a special pattern and turn them +into a single MIR `Rvalue` that takes the address and produces it as a raw +pointer -- a "take raw reference" operation. This might be a variant of the +existing `Ref` operation (say, a boolean flag for whether this is raw), or a new +`Rvalue` variant. The borrow checker should do the usual checks on ``, +but can just ignore the result of this operation and the newly created +"reference" can have any lifetime. (Currently this will be some form of +unbounded inference variable because the only use is a cast-to-raw, the new "raw +reference" operation can have the same behavior.) When translating MIR to LLVM, +nothing special has to happen as references and raw pointers have the same LLVM +type anyway; the new operation behaves like `Ref`. When interpreting MIR in the miri engine, the engine will recognize that the value produced by this `Rvalue` has raw pointer type, and hence must not satisfy @@ -116,6 +121,14 @@ term for this, maybe it should be a special pass dedicated to packed fields traversing MIR, or this can happen when lowering HIR to MIR. This check has nothing to do with whether we are in an unsafe block or not. +Moreover, to prevent programmers from accidentally creating a safe reference +when they did not want to, we add a lint that identifies situations where the +programmer likely wants a raw reference, and suggest an explicit cast in that +case. One possible heuristic here would be: If a safe reference (shared or +mutable) is only ever used to create raw pointers, then likely it could be a raw +pointer to begin with. The details of this are best worked out in the +implementation phase of this RFC. + # Drawbacks [drawbacks]: #drawbacks @@ -138,12 +151,12 @@ proposal to make them not equivalent. One alternative to introducing a new primitive operation might be to somehow exempt "references immediately cast to a raw pointer" from the invariant. -However, it is unclear how that information is supposed to be encoded in the -MIR, and how it is to be maintained by optimizations. We believe that the -semantics of a MIR program, including whether it has undefined behavior, should -be deducible by executing it one step at a time. +However, we believe that the semantics of a MIR program, including whether it +has undefined behavior, should be deducible by executing it one step at a time. +Given that, it is unclear how a semantics that "lazily" checks references should +work, and how it could be compatible with the annotations we emit for LLVM. -Instead of compiling `&[mut] as *[mut|const] _` to a raw reference +Instead of compiling `&[mut] as *[mut|const] ?T` to a raw reference operation, we could introduce new surface syntax and keep the existing HIR->MIR lowering the way it is. However, that would make lots of carefully written existing code dealing with packed structs have undefined behavior. (There is @@ -154,13 +167,13 @@ surface syntax has been made yet -- and if one comes up later, this proposal is forwards-compatible with also having explicit syntax for taking a raw reference (and deprecating the safe-ref-then-cast way of writing this operation). -We could be using the new operator in more cases, e.g. we could have some kind -of analysis which determines during desugaring if a reference is *only* used as -a raw pointer, and if yes, creates it as a raw pointer to begin with. This -would make more code use raw references, thus making more code defined. -However, if someone *relies* on this behavior there is a danger of accidentally -adding a non-raw-ptr use to a reference, which would then rather subtly make the -program have UB. +We could be using the new operator in more cases, e.g. instead of having a smart +lint that tells people to insert casts, we could use that same analysis to infer +when to use a raw reference. This would make more code use raw references, thus +making more code defined. However, if someone *relies* on this behavior there +is a danger of accidentally adding a non-raw-ptr use to a reference, which would +then rather subtly make the program have UB. That's why we proposed this as a +lint instead. # Prior art [prior-art]: #prior-art @@ -174,9 +187,3 @@ arise because of Rust having both of these features. We could have different rules for when to take a raw reference (as opposed to a safe one). - -What do we specify inference to do for cases like -```rust -let x: *const T = {&*null}; -``` -Does this take a raw reference, or a safe reference? From 0b496eeb9d5c4a1e9efc29683af0c4c8c693a00e Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sun, 9 Dec 2018 20:39:18 +0100 Subject: [PATCH 06/23] expand description: this new operation entirely replaces ref-to-raw casts --- text/0000-raw-reference-operator.md | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index fe98abb59e3..041c6fbc849 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -97,15 +97,20 @@ When translating HIR to MIR, we recognize `&[mut] as *[mut|const] ?T` (where `?T` can be any type, also a partial one like `_`) as well as coercions from `&[mut] ` to a raw pointer type as a special pattern and turn them into a single MIR `Rvalue` that takes the address and produces it as a raw -pointer -- a "take raw reference" operation. This might be a variant of the -existing `Ref` operation (say, a boolean flag for whether this is raw), or a new -`Rvalue` variant. The borrow checker should do the usual checks on ``, -but can just ignore the result of this operation and the newly created -"reference" can have any lifetime. (Currently this will be some form of -unbounded inference variable because the only use is a cast-to-raw, the new "raw -reference" operation can have the same behavior.) When translating MIR to LLVM, -nothing special has to happen as references and raw pointers have the same LLVM -type anyway; the new operation behaves like `Ref`. +pointer -- a "take raw reference" operation. We also use this new `Rvalue` to +translate `x as *[mut|const] ?T`; before this RFC such code gets translated to +MIR as a reborrow followed by a cast. Once this is done, `Misc` casts from +reference to raw pointers can be removed from MIR, they are no longer needed. + +This new `Rvalue` might be a variant of the existing `Ref` operation (say, a +boolean flag for whether this is raw), or a new `Rvalue` variant. The borrow +checker should do the usual checks on ``, but can just ignore the result +of this operation and the newly created "reference" can have any lifetime. +(Before this RFC this will be some form of unbounded inference variable because +the only use is a cast-to-raw, the new "raw reference" operation can have the +same behavior.) When translating MIR to LLVM, nothing special has to happen as +references and raw pointers have the same LLVM type anyway; the new operation +behaves like `Ref`. When interpreting MIR in the miri engine, the engine will recognize that the value produced by this `Rvalue` has raw pointer type, and hence must not satisfy From 0b9df7aa390265727bf9212fa7ca1d9960820273 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sat, 22 Dec 2018 12:38:16 +0100 Subject: [PATCH 07/23] dereferencability --- text/0000-raw-reference-operator.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index 041c6fbc849..901bae52826 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -192,3 +192,9 @@ arise because of Rust having both of these features. We could have different rules for when to take a raw reference (as opposed to a safe one). + +Does the operator creating a raw pointer allow creating pointers that are not +dereferencable (with the size determined by `mem::size_of_val`)? It might turn +out to be useful to make dereferencability not part of the validity invariant, +but part of the alias model, so this is a separate question from whether the +pointer is aligned and non-NULL. From 61c2e8236ed77e46d849701589eddd8ad2dfd217 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sat, 22 Dec 2018 12:42:07 +0100 Subject: [PATCH 08/23] more on dereferencability --- text/0000-raw-reference-operator.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index 901bae52826..8b1118d9bf1 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -197,4 +197,7 @@ Does the operator creating a raw pointer allow creating pointers that are not dereferencable (with the size determined by `mem::size_of_val`)? It might turn out to be useful to make dereferencability not part of the validity invariant, but part of the alias model, so this is a separate question from whether the -pointer is aligned and non-NULL. +pointer is aligned and non-NULL. Notice that we use `getelementptr inbounds` +for field access, so we would require some amount of dereferencability anyway +(or we could change codegen to not emit `inbounds` when creating a raw +reference, but that might adversely affect performance). From 5f7e9eaf263df5f7d98be9a16d5879742405715e Mon Sep 17 00:00:00 2001 From: haslersn Date: Sun, 6 Jan 2019 17:28:36 +0100 Subject: [PATCH 09/23] RFC 2582: Fix typo (must not -> need not) --- text/0000-raw-reference-operator.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index 8b1118d9bf1..93d03855188 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -113,8 +113,8 @@ references and raw pointers have the same LLVM type anyway; the new operation behaves like `Ref`. When interpreting MIR in the miri engine, the engine will recognize that the -value produced by this `Rvalue` has raw pointer type, and hence must not satisfy -any special invariants. +value produced by this `Rvalue` has raw pointer type, and hence needs not +satisfy any special invariants. When doing unsafety checking, we make references to packed fields that do *not* use this new "raw reference" operation a *hard error even in unsafe blocks* From 35b7810a4a09a05b48aceb2ee657fad2fde9a570 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Tue, 22 Jan 2019 22:05:40 +0100 Subject: [PATCH 10/23] update summary and be more explicit --- text/0000-raw-reference-operator.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index 93d03855188..4f3a1e6c649 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -9,8 +9,9 @@ Introduce a new primitive operator on the MIR level: `&[mut|const] raw ` to create a raw pointer to the given place (this is not surface syntax, it is just how MIR might be printed). Desugar the surface syntax `&[mut] as -*[mut|const] _` to use this operator, instead of two MIR statements (first take -normal reference, then cast). +*[mut|const] _` as well as coercions from references to raw pointers to use this +operator, instead of two MIR statements (first take normal reference, then +cast). # Motivation [motivation]: #motivation @@ -73,8 +74,8 @@ let x = &X as &T as *const T; // this is casting to raw "too late" ``` The only way to create a pointer to an unaligned or dangling location without -triggering undefined behavior is to *immediately* turn it into a raw pointer. -All of the following are valid: +triggering undefined behavior is to *immediately* turn it into a raw pointer +using an explicit cast or an implicit coercion. All of the following are valid: ```rust let packed_cast = unsafe { &packed.field as *const _ }; @@ -112,7 +113,7 @@ same behavior.) When translating MIR to LLVM, nothing special has to happen as references and raw pointers have the same LLVM type anyway; the new operation behaves like `Ref`. -When interpreting MIR in the miri engine, the engine will recognize that the +When interpreting MIR in the Miri engine, the engine will recognize that the value produced by this `Rvalue` has raw pointer type, and hence needs not satisfy any special invariants. From 9a853d54c304e72612d51b763c9fbc79195550c3 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Wed, 23 Jan 2019 13:50:50 +0100 Subject: [PATCH 11/23] make example more concrete --- text/0000-raw-reference-operator.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-operator.md index 4f3a1e6c649..5264e481d26 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-operator.md @@ -48,9 +48,15 @@ you always use raw pointers and not references: References come with guarantees that the compiler assumes are always upheld, and these guarantees include proper alignment and not being dangling. Importantly, these guarantees must be maintained even when the reference is created and never used! The following is -UB (assuming `packed` is a variable of packed type): +UB: ```rust +#[repr(packed)] +struct Packed { + pad: u8, + field: u16, +} +let packed = Packed { pad: 0, field: 0 }; let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior ``` From 6cdac4d42527622f8450784170981b1b64303db2 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 24 Jan 2019 08:51:24 +0100 Subject: [PATCH 12/23] rename to raw_reference_mir_operator --- ...reference-operator.md => 0000-raw-reference-mir-operator.md} | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename text/{0000-raw-reference-operator.md => 0000-raw-reference-mir-operator.md} (99%) diff --git a/text/0000-raw-reference-operator.md b/text/0000-raw-reference-mir-operator.md similarity index 99% rename from text/0000-raw-reference-operator.md rename to text/0000-raw-reference-mir-operator.md index 5264e481d26..d68941e847a 100644 --- a/text/0000-raw-reference-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -1,4 +1,4 @@ -- Feature Name: raw_reference_operator +- Feature Name: raw_reference_mir_operator - Start Date: 2018-11-01 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) From 4b60fb1ae49a6323ab6b837378bf0acf895155f6 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sat, 26 Jan 2019 10:46:21 +0100 Subject: [PATCH 13/23] interaction with auto-deref --- text/0000-raw-reference-mir-operator.md | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index d68941e847a..04df7a04c62 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -97,6 +97,10 @@ These two operations (taking a reference, casting/coercing to a raw pointer) are actually considered a single operation happening in one step, and hence the invariants incurred by references do not come into play. +Notice that this only applies if no automatic call to `deref` or `deref_mut` got +inserted: those are regular function calls taking a reference, so in that case a +reference is created and it must satisfy the usual guarantees. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -104,10 +108,12 @@ When translating HIR to MIR, we recognize `&[mut] as *[mut|const] ?T` (where `?T` can be any type, also a partial one like `_`) as well as coercions from `&[mut] ` to a raw pointer type as a special pattern and turn them into a single MIR `Rvalue` that takes the address and produces it as a raw -pointer -- a "take raw reference" operation. We also use this new `Rvalue` to -translate `x as *[mut|const] ?T`; before this RFC such code gets translated to -MIR as a reborrow followed by a cast. Once this is done, `Misc` casts from -reference to raw pointers can be removed from MIR, they are no longer needed. +pointer -- a "take raw reference" operation. We do this *after* auto-deref, +meaning this pattern does not apply when a call to `deref` or `deref_mut` got +inserted. We also use this new `Rvalue` to translate `x as *[mut|const] ?T`; +before this RFC such code gets translated to MIR as a reborrow followed by a +cast. Once this is done, `Misc` casts from reference to raw pointers can be +removed from MIR, they are no longer needed. This new `Rvalue` might be a variant of the existing `Ref` operation (say, a boolean flag for whether this is raw), or a new `Rvalue` variant. The borrow @@ -208,3 +214,9 @@ pointer is aligned and non-NULL. Notice that we use `getelementptr inbounds` for field access, so we would require some amount of dereferencability anyway (or we could change codegen to not emit `inbounds` when creating a raw reference, but that might adversely affect performance). + +The interaction with auto-deref is a bit unfortunate. Maybe we can have a lint +to detect what seems to be unwanted cases of auto-deref -- namely, terms that +look like `&[mut] as *[mut|const] ?T` in the surface syntax but had a +method call inserted, thus manifesting a reference (with the associated +guarantees) where none might be expected. From 315bb3147a9dec630c7e4c189f04e94fa2818b1f Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 21 Feb 2019 11:39:24 +0100 Subject: [PATCH 14/23] propose we might have dedicated syntax --- text/0000-raw-reference-mir-operator.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index 04df7a04c62..af57b09b77d 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -203,6 +203,12 @@ arise because of Rust having both of these features. # Unresolved questions [unresolved-questions]: #unresolved-questions +It seems desirable to have a dedicated syntax for this, and to eventually lint +people to use that syntax), considering the special case introduced by this RFC +to be a deprecated pattern. This requires some work on figuring out an +unambiguous and meaningful syntax. A tentative proposal is `&raw const ` +to create a `*const _`, and `&raw mut ` to create a `*mut _`. + We could have different rules for when to take a raw reference (as opposed to a safe one). From cb2dd0fab8c7bbf4f89c489b3d69a00d805a1bba Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 21 Feb 2019 12:51:44 +0100 Subject: [PATCH 15/23] mention more drawbacks, unresolved questions and future possibilities --- text/0000-raw-reference-mir-operator.md | 45 +++++++++++++++++++------ 1 file changed, 35 insertions(+), 10 deletions(-) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index af57b09b77d..ede2bf8d2a4 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -110,10 +110,14 @@ from `&[mut] ` to a raw pointer type as a special pattern and turn them into a single MIR `Rvalue` that takes the address and produces it as a raw pointer -- a "take raw reference" operation. We do this *after* auto-deref, meaning this pattern does not apply when a call to `deref` or `deref_mut` got -inserted. We also use this new `Rvalue` to translate `x as *[mut|const] ?T`; -before this RFC such code gets translated to MIR as a reborrow followed by a -cast. Once this is done, `Misc` casts from reference to raw pointers can be -removed from MIR, they are no longer needed. +inserted. Redundant parentheses are ignored, but block expressions are not: +`{ &[mut] }` materializes a reference that must be valid, no matter +which coercions or casts follow outside the block. + +We also use this new `Rvalue` to translate `x as *[mut|const] ?T`; before this +RFC such code gets translated to MIR as a reborrow followed by a cast. Once +this is done, `Misc` casts from reference to raw pointers can be removed from +MIR, they are no longer needed. This new `Rvalue` might be a variant of the existing `Ref` operation (say, a boolean flag for whether this is raw), or a new `Rvalue` variant. The borrow @@ -145,7 +149,8 @@ programmer likely wants a raw reference, and suggest an explicit cast in that case. One possible heuristic here would be: If a safe reference (shared or mutable) is only ever used to create raw pointers, then likely it could be a raw pointer to begin with. The details of this are best worked out in the -implementation phase of this RFC. +implementation phase of this RFC. The lint should, at the very least, fire for +cases involving just a redundant block, such as `{ &mut } as *mut ?T`. # Drawbacks [drawbacks]: #drawbacks @@ -159,6 +164,11 @@ let x = x as *const _; let x = unsafe { &packed.field as *const _ }; ``` +If `as` ever becomes an operation that can be overloaded, the behavior of +`&packed.field as *const _` can *not* be obtained by dispatching to the +overloaded `as` operator. Calling that method would assert validity of the +reference. + # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives @@ -203,11 +213,9 @@ arise because of Rust having both of these features. # Unresolved questions [unresolved-questions]: #unresolved-questions -It seems desirable to have a dedicated syntax for this, and to eventually lint -people to use that syntax), considering the special case introduced by this RFC -to be a deprecated pattern. This requires some work on figuring out an -unambiguous and meaningful syntax. A tentative proposal is `&raw const ` -to create a `*const _`, and `&raw mut ` to create a `*mut _`. +Which level should the lint default to? Eventually it should likely be `deny` +because it hints as code that might be UB and that can be fixed by a small +change. For a transition period, `warn` seems more appropriate. We could have different rules for when to take a raw reference (as opposed to a safe one). @@ -226,3 +234,20 @@ to detect what seems to be unwanted cases of auto-deref -- namely, terms that look like `&[mut] as *[mut|const] ?T` in the surface syntax but had a method call inserted, thus manifesting a reference (with the associated guarantees) where none might be expected. + +# Future possibilities +[future-possibilities]: #future-possibilities + +It seems desirable to have a dedicated syntax for this, and to eventually lint +people to use that syntax), considering the special case introduced by this RFC +to be a deprecated pattern. This requires some work on figuring out an +unambiguous and meaningful syntax. A tentative proposal is `&raw const ` +to create a `*const _`, and `&raw mut ` to create a `*mut _`. + +If Rust's type ascriptions end up performing coercions, those coercions should +trigger the raw reference operator just like other coercions do. So +`&packed.field: *const _` would be `&raw const packed.field`. + +If Rust ever gets type ascriptions with coercions for binders, likewise these +coercions would be subject to these rules in cases like +`match &packed.field { x: *const _ => x }`. From 1f1e6909ad6a632446e4a57bdc705ca2b134b11d Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 21 Mar 2019 19:50:02 +0100 Subject: [PATCH 16/23] half-rewrite RFC to center around a proposed syntax for raw references --- text/0000-raw-reference-mir-operator.md | 285 ++++++++++-------------- 1 file changed, 118 insertions(+), 167 deletions(-) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index ede2bf8d2a4..8a3c407a8b5 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -6,49 +6,43 @@ # Summary [summary]: #summary -Introduce a new primitive operator on the MIR level: `&[mut|const] raw ` -to create a raw pointer to the given place (this is not surface syntax, it is -just how MIR might be printed). Desugar the surface syntax `&[mut] as -*[mut|const] _` as well as coercions from references to raw pointers to use this -operator, instead of two MIR statements (first take normal reference, then -cast). +Introduce new variants of the `&` operator: `&raw mut ` to create a `*mut `, and `&raw const ` to create a `*const `. +This creates a raw pointer directly, as opposed to the already existing `&mut as *mut _`/`& as *const _`, which create a temporary reference and then cast that to a raw pointer. +As a consequence, the existing expressions ` as *mut ` and ` as *const ` where `` has reference type are equivalent to `&raw mut *` and `&raw const *`, respectively. +Moreover, add a lint to existing code that could use the new operator, and treat existing code that creates a reference and immediately casts or coerces it to a raw pointer as if it had been written with the new syntax. + +As an option (referred to as [SUGAR] below), we could treat `&mut as *mut _`/`& as *const _` as if they had been written with `&raw` to avoid creating temporary references when that was likely not the intention. # Motivation [motivation]: #motivation -Currently, if one wants to create a raw pointer pointing to something, one has -no choice but to create a reference and immediately cast it to a raw pointer. -The problem with this is that there are some invariants that we want to attach -to references, that have to *always hold*. (This is not finally decided yet, -but true in practice because of annotations we emit to LLVM. It is also the -next topic of discussion in the -[Unsafe Code Guidelines](https://github.com/rust-rfcs/unsafe-code-guidelines/).) -In particular, references must be aligned and dereferencable, even when they are -created and never used. - -One consequence of these rules is that it becomes essentially impossible to -create a raw pointer pointing to an unaligned struct field: `&packed.field as -*const _` creates an immediate unaligned reference, triggering undefined -behavior because it is not aligned. Similarly, `&(*raw).field as *const _` is -not just computing an offset of the raw pointer `raw`, it also asserts that the -intermediate shared reference is aligned and dereferencable. In both cases, -that is likely not what the author of the code intended. - -To fix this, we propose to introduce a new primitive operation on the MIR level -that, in a single MIR statement, creates a raw pointer to a given place. No -intermediate reference exists, so no invariants have to be adhered to. We also -add a lint for cases that seem like the programmer wanted a raw reference, not a -safe one, but did not use the right syntax. +Currently, if one wants to create a raw pointer pointing to something, one has no choice but to create a reference and immediately cast it to a raw pointer. +The problem with this is that there are some invariants that we want to attach to references, that have to *always hold*. +The details of this are not finally decided yet, but true in practice because of annotations we emit to LLVM. +It is also the next topic of discussion in the [Unsafe Code Guidelines](https://github.com/rust-rfcs/unsafe-code-guidelines/). +In particular, references must be aligned and dereferencable, even when they are created and never used. + +One consequence of these rules is that it becomes essentially impossible to create a raw pointer pointing to an unaligned struct field: +`&packed.field as *const _` creates an intermediate unaligned reference, triggering undefined behavior because it is not aligned. + +If one wants to avoid creating a reference to uninitialized data (which might or might not become part of the invariant that must be always upheld), it is also currently not possible to create a raw pointer to a field of an uninitialized struct: +again, `&mut uninit.field as *mut _` would create an intermediate reference to uninitialized data. + +Another issue people sometimes run into is computing the address/offset of a field without asserting that there is any memory allocated there. +This actually has two problems; first of all creating a reference asserts that the memory it points to is allocated, and secondly the offset computation is performed using `getelementptr inbounds`, meaning that the result of the computation is `poison` if it is not in-bounds of the allocation it started in. +This RFC just solves the first problem, but it also provides an avenue for the second (see "Future possibilities"). + +To avoid making too many assumptions by creating a reference, this RFC proposes to introduce a new primitive operation that directly creates a raw pointer to a given place. +No intermediate reference exists, so no invariants have to be adhered to: the pointer may be unaligned and/or dangling. +We also add a lint for cases that seem like the programmer unnecessarily created an intermediate reference, suggesting they reduce the assumptions their code is making by creating a raw pointer instead. # Guide-level explanation [guide-level-explanation]: #guide-level-explanation -When working with unaligned or potentially dangling pointers, it is crucial that -you always use raw pointers and not references: References come with guarantees -that the compiler assumes are always upheld, and these guarantees include proper -alignment and not being dangling. Importantly, these guarantees must be -maintained even when the reference is created and never used! The following is -UB: +When working with unaligned or potentially dangling pointers, it is crucial that you always use raw pointers and not references: +references come with guarantees that the compiler assumes are always upheld, and these guarantees include proper alignment and not being dangling. +Importantly, these guarantees must be maintained even when the reference is created and never used! +The following is UB: ```rust #[repr(packed)] @@ -60,9 +54,8 @@ let packed = Packed { pad: 0, field: 0 }; let x = unsafe { &packed.field }; // `x` is not aligned -> undefined behavior ``` -There is no situation in which the above code is correct, and hence it is a hard -error to write this. This applies to most ways of creating a reference, i.e., -all of the following are UB if `X` is not properly aligned and dereferencable: +There is no situation in which the above code is correct, and hence it is a hard error to write this (after a transition period). +This applies to most ways of creating a reference, i.e., all of the following are UB if `X` is not properly aligned and dereferencable: ```rust fn foo() -> &T { @@ -76,86 +69,62 @@ let &x = &X; // this is actually dereferencing the pointer, certainly UB let _ = &X; // throwing away the value immediately changes nothing &X; // different syntax for the same thing -let x = &X as &T as *const T; // this is casting to raw "too late" +let x = &X as *const T; // this is casting to raw but "too late", an intermediate reference has been created (only if we do no adapt [SUGAR]) +let x = &X as &T as *const T; // this is casting to raw but "too late" even if we adapt [SUGAR] +``` + +The only way to create a pointer to an unaligned or dangling location without triggering undefined behavior is to use `&raw`, which creates a raw pointer without an intermediate reference. +The following is valid: + +```rust +let packed_cast = &raw const packed.field; ``` -The only way to create a pointer to an unaligned or dangling location without -triggering undefined behavior is to *immediately* turn it into a raw pointer -using an explicit cast or an implicit coercion. All of the following are valid: +As an optional extension ([SUGAR]) to keep existing code working and to provide a way for projects to adjust to these rules before the syntax bikeshed is finished, and to do so in a way that they do not have to drop support for old Rust versions, we could also treat all of the following as if they had been written using `&raw const` instead of `&`: ```rust -let packed_cast = unsafe { &packed.field as *const _ }; +let packed_cast = unsafe { &raw const packed.field as *const _ }; let packed_coercion: *const _ = unsafe { &packed.field }; let null_cast: *const _ = unsafe { &*ptr::null() } as *const _; let null_coercion: *const _ = unsafe { &*ptr::null() }; ``` -The intention is to cover all cases where a reference, just created, is -immediately explicitly used as a value of raw pointer type. +The intention is to cover all cases where a reference, just created, is immediately explicitly used as a value of raw pointer type. -These two operations (taking a reference, casting/coercing to a raw pointer) are -actually considered a single operation happening in one step, and hence the -invariants incurred by references do not come into play. +Notice that this only applies if no automatic call to `deref` or `deref_mut` got inserted: +those are regular function calls taking a reference, so in that case a reference is created and it must satisfy the usual guarantees. -Notice that this only applies if no automatic call to `deref` or `deref_mut` got -inserted: those are regular function calls taking a reference, so in that case a -reference is created and it must satisfy the usual guarantees. # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -When translating HIR to MIR, we recognize `&[mut] as *[mut|const] ?T` -(where `?T` can be any type, also a partial one like `_`) as well as coercions -from `&[mut] ` to a raw pointer type as a special pattern and turn them -into a single MIR `Rvalue` that takes the address and produces it as a raw -pointer -- a "take raw reference" operation. We do this *after* auto-deref, -meaning this pattern does not apply when a call to `deref` or `deref_mut` got -inserted. Redundant parentheses are ignored, but block expressions are not: -`{ &[mut] }` materializes a reference that must be valid, no matter -which coercions or casts follow outside the block. - -We also use this new `Rvalue` to translate `x as *[mut|const] ?T`; before this -RFC such code gets translated to MIR as a reborrow followed by a cast. Once -this is done, `Misc` casts from reference to raw pointers can be removed from -MIR, they are no longer needed. - -This new `Rvalue` might be a variant of the existing `Ref` operation (say, a -boolean flag for whether this is raw), or a new `Rvalue` variant. The borrow -checker should do the usual checks on ``, but can just ignore the result -of this operation and the newly created "reference" can have any lifetime. -(Before this RFC this will be some form of unbounded inference variable because -the only use is a cast-to-raw, the new "raw reference" operation can have the -same behavior.) When translating MIR to LLVM, nothing special has to happen as -references and raw pointers have the same LLVM type anyway; the new operation -behaves like `Ref`. - -When interpreting MIR in the Miri engine, the engine will recognize that the -value produced by this `Rvalue` has raw pointer type, and hence needs not -satisfy any special invariants. - -When doing unsafety checking, we make references to packed fields that do *not* -use this new "raw reference" operation a *hard error even in unsafe blocks* -(after a transition period). There is no situation in which this code is okay; -it creates a reference that violates basic invariants. Taking a raw reference -to a packed field, on the other hand, is a safe operation as the raw pointer -comes with no special promises. "Unsafety checking" is thus not even a good -term for this, maybe it should be a special pass dedicated to packed fields -traversing MIR, or this can happen when lowering HIR to MIR. This check has -nothing to do with whether we are in an unsafe block or not. - -Moreover, to prevent programmers from accidentally creating a safe reference -when they did not want to, we add a lint that identifies situations where the -programmer likely wants a raw reference, and suggest an explicit cast in that -case. One possible heuristic here would be: If a safe reference (shared or -mutable) is only ever used to create raw pointers, then likely it could be a raw -pointer to begin with. The details of this are best worked out in the -implementation phase of this RFC. The lint should, at the very least, fire for -cases involving just a redundant block, such as `{ &mut } as *mut ?T`. +Rust contains two operators that perform place-to-rvalue conversion (matching `&` in C): one to create a reference (with some given mutability) and one to create a raw pointer (with some given mutability). +In the MIR, this is reflected as either a distinct `Rvalue` or a flag on the existing `Ref` variant. +The borrow checker should do the usual checks on the place used in `&raw`, but can just ignore the result of this operation and the newly created "reference" can have any lifetime. +When translating MIR to LLVM, nothing special has to happen as references and raw pointers have the same LLVM type anyway; the new operation behaves like `Ref`. +When interpreting MIR in the Miri engine, the engine will know not to enforce any invariants on the raw pointer created by `&raw`. + +For the [SUGAR] option, when translating HIR to MIR, we recognize `&[mut] as *[mut|const] ?T` (where `?T` can be any type, also a partial one like `_`) as well as coercions from `&[mut] ` to a raw pointer type as a special pattern and treat them as if they would have been written `&raw [mut|const] `. +We do this *after* auto-deref, meaning this pattern does not apply when a call to `deref` or `deref_mut` got inserted. +Redundant parentheses are ignored, but block expressions are not: +`{ &[mut] }` materializes a reference that must be valid, no matter which coercions or casts follow outside the block. + +When doing unsafety checking, we make references to packed fields that do *not* use this new "raw reference" operation a *hard error even in unsafe blocks* (after a transition period). +There is no situation in which this code is okay; it creates a reference that violates basic invariants. +Taking a raw reference to a packed field, on the other hand, is a safe operation as the raw pointer comes with no special promises. +"Unsafety checking" is thus not even a good term for this, maybe it should be a special pass dedicated to packed fields traversing MIR, or this can happen when lowering HIR to MIR. +This check has nothing to do with whether we are in an unsafe block or not. + +Moreover, to prevent programmers from accidentally creating a safe reference when they did not want to, we add a lint that identifies situations where the programmer likely wants a raw reference, and suggest an explicit cast in that case. +One possible heuristic here would be: If a safe reference (shared or mutable) is only ever used to create raw pointers, then likely it could be a raw pointer to begin with. +The details of this are best worked out in the implementation phase of this RFC. +The lint should, at the very least, fire for the cases covered by [SUGAR] if we do *not* adopt that option, and it should fire when the factor that prevents this matching [SUGAR] is just a redundant block, such as `{ &mut } as *mut ?T`. # Drawbacks [drawbacks]: #drawbacks -It might be surprising that the following two pieces of code are not equivalent: +If we adapt [SUGAR], it might be surprising that the following two pieces of code are not equivalent: + ```rust // Variant 1 let x = unsafe { &packed.field }; // Undefined behavior! @@ -164,90 +133,72 @@ let x = x as *const _; let x = unsafe { &packed.field as *const _ }; ``` -If `as` ever becomes an operation that can be overloaded, the behavior of -`&packed.field as *const _` can *not* be obtained by dispatching to the -overloaded `as` operator. Calling that method would assert validity of the -reference. +Notice, however, that the lint should fire in variant 1. + +If `as` ever becomes an operation that can be overloaded, the behavior of `&packed.field as *const _` with [SUGAR] can *not* be obtained by dispatching to the overloaded `as` operator. +Calling that method would assert validity of the reference. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -This is a compromise: I see no reasonable way to translate the first variant -shown in the "Drawbacks" section to a raw reference operation, and the second -variant is so common that we likely do not want to rule it out. Hence the -proposal to make them not equivalent. - -One alternative to introducing a new primitive operation might be to somehow -exempt "references immediately cast to a raw pointer" from the invariant. -However, we believe that the semantics of a MIR program, including whether it -has undefined behavior, should be deducible by executing it one step at a time. -Given that, it is unclear how a semantics that "lazily" checks references should -work, and how it could be compatible with the annotations we emit for LLVM. - -Instead of compiling `&[mut] as *[mut|const] ?T` to a raw reference -operation, we could introduce new surface syntax and keep the existing HIR->MIR -lowering the way it is. However, that would make lots of carefully written -existing code dealing with packed structs have undefined behavior. (There is -likely also lots of code that forgets to cast to a raw pointer, but I see no way -to make that legal -- and the proposal would make such uses a hard error in the -long term, so we should catch many of these bugs.) Also, no good proposal for a -surface syntax has been made yet -- and if one comes up later, this proposal is -forwards-compatible with also having explicit syntax for taking a raw reference -(and deprecating the safe-ref-then-cast way of writing this operation). - -We could be using the new operator in more cases, e.g. instead of having a smart -lint that tells people to insert casts, we could use that same analysis to infer -when to use a raw reference. This would make more code use raw references, thus -making more code defined. However, if someone *relies* on this behavior there -is a danger of accidentally adding a non-raw-ptr use to a reference, which would -then rather subtly make the program have UB. That's why we proposed this as a -lint instead. +[SUGAR] is a compromise to keep the analysis that affects code generation simple. +Detecting "Variant 1" (from the "Drawbacks" section) would need a much less local analysis. +Hence the proposal to make them not equivalent. + +A drawback of not adapting [SUGAR] is that we will have to wait longer (namely, until stabilization) until people can finally write UB-free versions of code that handles dangling or unaligned raw pointers. + +One alternative to introducing a new primitive operation might be to somehow exempt "references immediately cast to a raw pointer" from the invariant. +(Basically, a "dynamic" version of the static analysis performed by the lint.) +However, I believe that the semantics of a MIR program, including whether it as undefined behavior, should be deducible by executing it one step at a time. +Given that, it is unclear how a semantics that "lazily" checks references should work, and how it could be compatible with the annotations we emit for LLVM. # Prior art [prior-art]: #prior-art -I am not aware of another language with both comparatively strong invariants for -its reference types, and raw pointers. The need for taking a raw reference only -arise because of Rust having both of these features. +I am not aware of another language with both comparatively strong invariants for its reference types, and raw pointers. +The need for taking a raw reference only arise because of Rust having both of these features. # Unresolved questions [unresolved-questions]: #unresolved-questions -Which level should the lint default to? Eventually it should likely be `deny` -because it hints as code that might be UB and that can be fixed by a small -change. For a transition period, `warn` seems more appropriate. - -We could have different rules for when to take a raw reference (as opposed to a -safe one). - -Does the operator creating a raw pointer allow creating pointers that are not -dereferencable (with the size determined by `mem::size_of_val`)? It might turn -out to be useful to make dereferencability not part of the validity invariant, -but part of the alias model, so this is a separate question from whether the -pointer is aligned and non-NULL. Notice that we use `getelementptr inbounds` -for field access, so we would require some amount of dereferencability anyway -(or we could change codegen to not emit `inbounds` when creating a raw -reference, but that might adversely affect performance). +Which level should the lint default to? +I do not know of precedence to make such a lint `deny`. +But the lint should be worded clearly about the risk of undefined behavior that is involved here. -The interaction with auto-deref is a bit unfortunate. Maybe we can have a lint -to detect what seems to be unwanted cases of auto-deref -- namely, terms that -look like `&[mut] as *[mut|const] ?T` in the surface syntax but had a -method call inserted, thus manifesting a reference (with the associated -guarantees) where none might be expected. +The interaction with auto-deref is a bit unfortunate. +Maybe the lint should also cover cases that look like `&[mut] as *[mut|const] ?T` in the surface syntax but had a method call inserted, thus manifesting a reference (with the associated guarantees). +The lint as described would not fire because the reference actually gets used as such (being passed to `deref`). +However, what would the lint suggest to do instead? +There just is no way to write this code without creating a reference. # Future possibilities [future-possibilities]: #future-possibilities -It seems desirable to have a dedicated syntax for this, and to eventually lint -people to use that syntax), considering the special case introduced by this RFC -to be a deprecated pattern. This requires some work on figuring out an -unambiguous and meaningful syntax. A tentative proposal is `&raw const ` -to create a `*const _`, and `&raw mut ` to create a `*mut _`. +With [SUGAR], if Rust's type ascriptions end up performing coercions, those coercions should trigger the raw reference operator just like other coercions do. +So `&packed.field: *const _` would be `&raw const packed.field`. +If Rust ever gets type ascriptions with coercions for binders, likewise these coercions would be subject to these rules in cases like `match &packed.field { x: *const _ => x }`. + +It has been suggested to [remove `static mut`][static-mut] because it is too easy to accidentally create references with lifetime `'static`. +With `&raw` we could instead restrict `static mut` to only allow taking raw pointers (`&raw [mut|const] STATIC`) and entirely disallow creating references (`&[mut] STATIC`) even in safe code (in a future edition, likely; with lints in older editions). + +As mentioned above, expressions such as `&raw mut x.field` still trigger more UB than might be expected---as witnessed by a [couple of attempts found in the wild of people implementing `offsetof`][offset-of] with something like: + +```rust +let x: *mut Struct = NonNull::dangling().as_ptr(); +let field: *mut Field = &mut x.field; +``` + +The lint as described in this RFC would nudge people to instead write + +```rust +let x: *mut Struct = NonNull::dangling().as_ptr(); +let field: *mut Field = &raw mut x.field; +``` -If Rust's type ascriptions end up performing coercions, those coercions should -trigger the raw reference operator just like other coercions do. So -`&packed.field: *const _` would be `&raw const packed.field`. +which is better, but still UB: we emit a `getelementptr inbounds` for the `.field` offset computation. +It might be a good idea to just not do that -- we know that references are fine, but we could decide that when raw pointers are involved that might be dangling, we do not want to assert anything from just the fact that an offset is being computed. +A plain `getelementptr` still provides some information to the alias analysis, and if we derive the `inbounds` entirely from the fact that the pointer is created from or turned into a reference, we would be able to not have any clause in the language semantics that calls out the offset computation as potentially triggering UB +If people just hear "`&raw` means my pointer can be dangling" they might think the second version above is actually okay, forgetting that the field access itself has its own subtle rule; getting rid of that rule would remove one foot-gun for unsafe code authors to worry about. -If Rust ever gets type ascriptions with coercions for binders, likewise these -coercions would be subject to these rules in cases like -`match &packed.field { x: *const _ => x }`. +[static-mut]: https://github.com/rust-lang/rust/issues/53639 +[offset-of]: https://github.com/rust-lang/rfcs/pull/2582#issuecomment-467629986 From 8eea0bfccfc3c39533ad0be27b887a67d532cf6d Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 21 Mar 2019 19:50:34 +0100 Subject: [PATCH 17/23] clarify --- text/0000-raw-reference-mir-operator.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index 8a3c407a8b5..55a190f238f 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -145,7 +145,7 @@ Calling that method would assert validity of the reference. Detecting "Variant 1" (from the "Drawbacks" section) would need a much less local analysis. Hence the proposal to make them not equivalent. -A drawback of not adapting [SUGAR] is that we will have to wait longer (namely, until stabilization) until people can finally write UB-free versions of code that handles dangling or unaligned raw pointers. +A drawback of not adapting [SUGAR] is that we will have to wait longer (namely, until stabilization of the new syntax) until people can finally write UB-free versions of code that handles dangling or unaligned raw pointers. One alternative to introducing a new primitive operation might be to somehow exempt "references immediately cast to a raw pointer" from the invariant. (Basically, a "dynamic" version of the static analysis performed by the lint.) From 24aad9e76f2f834296d48bb2740dee9f3149d4d2 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Thu, 21 Mar 2019 19:52:30 +0100 Subject: [PATCH 18/23] mention copying around --- text/0000-raw-reference-mir-operator.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index 55a190f238f..94d2a36d76e 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -24,6 +24,16 @@ In particular, references must be aligned and dereferencable, even when they are One consequence of these rules is that it becomes essentially impossible to create a raw pointer pointing to an unaligned struct field: `&packed.field as *const _` creates an intermediate unaligned reference, triggering undefined behavior because it is not aligned. +Instead, code currently has to copy values around to aligned locations if pointers need to be passed, e.g., to FFI, as in: + +```rust +#[derive(Default)] struct A(u8, i32); + +let mut a: A = Default::default(); +let mut local = a.1; // copy struct field to stack +unsafe { ffi_mod(&mut local as *mut _) }; // pass pointer to local to FFI +a.1 = local; // copy local to struct back +``` If one wants to avoid creating a reference to uninitialized data (which might or might not become part of the invariant that must be always upheld), it is also currently not possible to create a raw pointer to a field of an uninitialized struct: again, `&mut uninit.field as *mut _` would create an intermediate reference to uninitialized data. From 9889619d90aa18f643778182614807d508582583 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Sat, 30 Mar 2019 14:38:39 +0100 Subject: [PATCH 19/23] adjust lint level open question --- text/0000-raw-reference-mir-operator.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index 94d2a36d76e..39cddb27ea8 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -171,9 +171,9 @@ The need for taking a raw reference only arise because of Rust having both of th # Unresolved questions [unresolved-questions]: #unresolved-questions -Which level should the lint default to? -I do not know of precedence to make such a lint `deny`. -But the lint should be worded clearly about the risk of undefined behavior that is involved here. +With [SUGAR], should the lint apply to cases that are covered by the special desugaring or not? +Also, if not, should the lint become `deny` eventually (maybe only on some editions)? +(Without [SUGAR], the lint clearly must apply to `&mut as *mut _`/`& as *const _`, and that pattern is common enough that the cost of `deny` is too high.) The interaction with auto-deref is a bit unfortunate. Maybe the lint should also cover cases that look like `&[mut] as *[mut|const] ?T` in the surface syntax but had a method call inserted, thus manifesting a reference (with the associated guarantees). From 6a803f180c1e8285e4f11e70967226b279950785 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Fri, 26 Jul 2019 20:55:44 +0200 Subject: [PATCH 20/23] move SUGAR to its own subsection under future possibilities --- text/0000-raw-reference-mir-operator.md | 91 +++++++++++-------------- 1 file changed, 41 insertions(+), 50 deletions(-) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index 39cddb27ea8..6e2c0793796 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -9,9 +9,7 @@ Introduce new variants of the `&` operator: `&raw mut ` to create a `*mut `, and `&raw const ` to create a `*const `. This creates a raw pointer directly, as opposed to the already existing `&mut as *mut _`/`& as *const _`, which create a temporary reference and then cast that to a raw pointer. As a consequence, the existing expressions ` as *mut ` and ` as *const ` where `` has reference type are equivalent to `&raw mut *` and `&raw const *`, respectively. -Moreover, add a lint to existing code that could use the new operator, and treat existing code that creates a reference and immediately casts or coerces it to a raw pointer as if it had been written with the new syntax. - -As an option (referred to as [SUGAR] below), we could treat `&mut as *mut _`/`& as *const _` as if they had been written with `&raw` to avoid creating temporary references when that was likely not the intention. +Moreover, add a lint to existing code that could use the new operator. # Motivation [motivation]: #motivation @@ -79,8 +77,7 @@ let &x = &X; // this is actually dereferencing the pointer, certainly UB let _ = &X; // throwing away the value immediately changes nothing &X; // different syntax for the same thing -let x = &X as *const T; // this is casting to raw but "too late", an intermediate reference has been created (only if we do no adapt [SUGAR]) -let x = &X as &T as *const T; // this is casting to raw but "too late" even if we adapt [SUGAR] +let x = &X as *const T; // this is casting to raw but "too late", an intermediate reference has been created ``` The only way to create a pointer to an unaligned or dangling location without triggering undefined behavior is to use `&raw`, which creates a raw pointer without an intermediate reference. @@ -90,19 +87,6 @@ The following is valid: let packed_cast = &raw const packed.field; ``` -As an optional extension ([SUGAR]) to keep existing code working and to provide a way for projects to adjust to these rules before the syntax bikeshed is finished, and to do so in a way that they do not have to drop support for old Rust versions, we could also treat all of the following as if they had been written using `&raw const` instead of `&`: - -```rust -let packed_cast = unsafe { &raw const packed.field as *const _ }; -let packed_coercion: *const _ = unsafe { &packed.field }; -let null_cast: *const _ = unsafe { &*ptr::null() } as *const _; -let null_coercion: *const _ = unsafe { &*ptr::null() }; -``` - -The intention is to cover all cases where a reference, just created, is immediately explicitly used as a value of raw pointer type. - -Notice that this only applies if no automatic call to `deref` or `deref_mut` got inserted: -those are regular function calls taking a reference, so in that case a reference is created and it must satisfy the usual guarantees. # Reference-level explanation @@ -114,11 +98,6 @@ The borrow checker should do the usual checks on the place used in `&raw`, but c When translating MIR to LLVM, nothing special has to happen as references and raw pointers have the same LLVM type anyway; the new operation behaves like `Ref`. When interpreting MIR in the Miri engine, the engine will know not to enforce any invariants on the raw pointer created by `&raw`. -For the [SUGAR] option, when translating HIR to MIR, we recognize `&[mut] as *[mut|const] ?T` (where `?T` can be any type, also a partial one like `_`) as well as coercions from `&[mut] ` to a raw pointer type as a special pattern and treat them as if they would have been written `&raw [mut|const] `. -We do this *after* auto-deref, meaning this pattern does not apply when a call to `deref` or `deref_mut` got inserted. -Redundant parentheses are ignored, but block expressions are not: -`{ &[mut] }` materializes a reference that must be valid, no matter which coercions or casts follow outside the block. - When doing unsafety checking, we make references to packed fields that do *not* use this new "raw reference" operation a *hard error even in unsafe blocks* (after a transition period). There is no situation in which this code is okay; it creates a reference that violates basic invariants. Taking a raw reference to a packed field, on the other hand, is a safe operation as the raw pointer comes with no special promises. @@ -128,35 +107,16 @@ This check has nothing to do with whether we are in an unsafe block or not. Moreover, to prevent programmers from accidentally creating a safe reference when they did not want to, we add a lint that identifies situations where the programmer likely wants a raw reference, and suggest an explicit cast in that case. One possible heuristic here would be: If a safe reference (shared or mutable) is only ever used to create raw pointers, then likely it could be a raw pointer to begin with. The details of this are best worked out in the implementation phase of this RFC. -The lint should, at the very least, fire for the cases covered by [SUGAR] if we do *not* adopt that option, and it should fire when the factor that prevents this matching [SUGAR] is just a redundant block, such as `{ &mut } as *mut ?T`. +The lint should, at the very least, fire for the cases covered by the syntactic sugar extension (see [Future possibilities][future-possibilities]), and it should fire when the factor that prevents this matching the sugar is just a redundant block, such as `{ &mut } as *mut ?T`. # Drawbacks [drawbacks]: #drawbacks -If we adapt [SUGAR], it might be surprising that the following two pieces of code are not equivalent: - -```rust -// Variant 1 -let x = unsafe { &packed.field }; // Undefined behavior! -let x = x as *const _; -// Variant 2 -let x = unsafe { &packed.field as *const _ }; -``` - -Notice, however, that the lint should fire in variant 1. - -If `as` ever becomes an operation that can be overloaded, the behavior of `&packed.field as *const _` with [SUGAR] can *not* be obtained by dispatching to the overloaded `as` operator. -Calling that method would assert validity of the reference. +This introduces new clauses into our grammar for a niche operation. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -[SUGAR] is a compromise to keep the analysis that affects code generation simple. -Detecting "Variant 1" (from the "Drawbacks" section) would need a much less local analysis. -Hence the proposal to make them not equivalent. - -A drawback of not adapting [SUGAR] is that we will have to wait longer (namely, until stabilization of the new syntax) until people can finally write UB-free versions of code that handles dangling or unaligned raw pointers. - One alternative to introducing a new primitive operation might be to somehow exempt "references immediately cast to a raw pointer" from the invariant. (Basically, a "dynamic" version of the static analysis performed by the lint.) However, I believe that the semantics of a MIR program, including whether it as undefined behavior, should be deducible by executing it one step at a time. @@ -171,11 +131,6 @@ The need for taking a raw reference only arise because of Rust having both of th # Unresolved questions [unresolved-questions]: #unresolved-questions -With [SUGAR], should the lint apply to cases that are covered by the special desugaring or not? -Also, if not, should the lint become `deny` eventually (maybe only on some editions)? -(Without [SUGAR], the lint clearly must apply to `&mut as *mut _`/`& as *const _`, and that pattern is common enough that the cost of `deny` is too high.) - -The interaction with auto-deref is a bit unfortunate. Maybe the lint should also cover cases that look like `&[mut] as *[mut|const] ?T` in the surface syntax but had a method call inserted, thus manifesting a reference (with the associated guarantees). The lint as described would not fire because the reference actually gets used as such (being passed to `deref`). However, what would the lint suggest to do instead? @@ -184,10 +139,46 @@ There just is no way to write this code without creating a reference. # Future possibilities [future-possibilities]: #future-possibilities -With [SUGAR], if Rust's type ascriptions end up performing coercions, those coercions should trigger the raw reference operator just like other coercions do. +## "Syntactic sugar" extension + +We could treat `&mut as *mut _`/`& as *const _` as if they had been written with `&raw` to avoid creating temporary references when that was likely not the intention. +We could also do this when `&mut `/`& ` is used in a coercion site and gets coerced to a raw pointer. + +```rust +let x = &X as *const T; // this is fine now +let x: *const T; // this is fine if we also apply the "sugar" for coercions +let x = &X as &T as *const T; // this is casting to raw but "too late" even if we adapt [SUGAR] +let x = { &X } as *const T; // this is likely also too late (but should be covered by the lint) +let x: *const T = if b { &X } else { &Y }; // this is likely also too late (and hopefully covered by the lint) +``` + +Notice that this only applies if no automatic call to `deref` or `deref_mut` got inserted: +those are regular function calls taking a reference, so in that case a reference is created and it must satisfy the usual guarantees. + +The point of this to keep existing code working and to provide a way for projects to adjust to these rules before stabilization. +Another good reason for this extension is that code could be adjusted without having to drop support for old Rust versions. + +However, it might be surprising that the following two pieces of code are not equivalent: + +```rust +// Variant 1 +let x = unsafe { &packed.field }; // Undefined behavior! +let x = x as *const _; +// Variant 2 +let x = unsafe { &packed.field as *const _ }; // good code +``` + +This is at least partially mitigated by the fact that the lint should fire in variant 1. + +Another problem is that if `as` ever becomes an operation that can be overloaded, the behavior of `&packed.field as *const _` can *not* be obtained by dispatching to the overloaded `as` operator. +Calling that method would assert validity of the reference. + +In the future, if Rust's type ascriptions end up performing coercions, those coercions should trigger the raw reference operator just like other coercions do. So `&packed.field: *const _` would be `&raw const packed.field`. If Rust ever gets type ascriptions with coercions for binders, likewise these coercions would be subject to these rules in cases like `match &packed.field { x: *const _ => x }`. +## Other + It has been suggested to [remove `static mut`][static-mut] because it is too easy to accidentally create references with lifetime `'static`. With `&raw` we could instead restrict `static mut` to only allow taking raw pointers (`&raw [mut|const] STATIC`) and entirely disallow creating references (`&[mut] STATIC`) even in safe code (in a future edition, likely; with lints in older editions). From 92042f68ed0aa7fb80fb61283e07f9437f47bf20 Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Fri, 26 Jul 2019 21:06:30 +0200 Subject: [PATCH 21/23] editing, and moving some other things to future possibilities --- text/0000-raw-reference-mir-operator.md | 31 ++++++++++++++----------- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index 6e2c0793796..7ed63e916d1 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -9,7 +9,7 @@ Introduce new variants of the `&` operator: `&raw mut ` to create a `*mut `, and `&raw const ` to create a `*const `. This creates a raw pointer directly, as opposed to the already existing `&mut as *mut _`/`& as *const _`, which create a temporary reference and then cast that to a raw pointer. As a consequence, the existing expressions ` as *mut ` and ` as *const ` where `` has reference type are equivalent to `&raw mut *` and `&raw const *`, respectively. -Moreover, add a lint to existing code that could use the new operator. +Moreover, emit a lint for existing code that could use the new operator. # Motivation [motivation]: #motivation @@ -87,27 +87,20 @@ The following is valid: let packed_cast = &raw const packed.field; ``` - - # Reference-level explanation [reference-level-explanation]: #reference-level-explanation -Rust contains two operators that perform place-to-rvalue conversion (matching `&` in C): one to create a reference (with some given mutability) and one to create a raw pointer (with some given mutability). +Rust contains two operators that perform place-to-value conversion (matching `&` in C): one to create a reference (with some given mutability) and one to create a raw pointer (with some given mutability). In the MIR, this is reflected as either a distinct `Rvalue` or a flag on the existing `Ref` variant. +Lowering to MIR should *not* insert an implicit reborrow of `` in `&raw mut `; that reborrow would assert validity and thus defeat the entire point. The borrow checker should do the usual checks on the place used in `&raw`, but can just ignore the result of this operation and the newly created "reference" can have any lifetime. When translating MIR to LLVM, nothing special has to happen as references and raw pointers have the same LLVM type anyway; the new operation behaves like `Ref`. When interpreting MIR in the Miri engine, the engine will know not to enforce any invariants on the raw pointer created by `&raw`. -When doing unsafety checking, we make references to packed fields that do *not* use this new "raw reference" operation a *hard error even in unsafe blocks* (after a transition period). -There is no situation in which this code is okay; it creates a reference that violates basic invariants. -Taking a raw reference to a packed field, on the other hand, is a safe operation as the raw pointer comes with no special promises. -"Unsafety checking" is thus not even a good term for this, maybe it should be a special pass dedicated to packed fields traversing MIR, or this can happen when lowering HIR to MIR. -This check has nothing to do with whether we are in an unsafe block or not. - Moreover, to prevent programmers from accidentally creating a safe reference when they did not want to, we add a lint that identifies situations where the programmer likely wants a raw reference, and suggest an explicit cast in that case. One possible heuristic here would be: If a safe reference (shared or mutable) is only ever used to create raw pointers, then likely it could be a raw pointer to begin with. The details of this are best worked out in the implementation phase of this RFC. -The lint should, at the very least, fire for the cases covered by the syntactic sugar extension (see [Future possibilities][future-possibilities]), and it should fire when the factor that prevents this matching the sugar is just a redundant block, such as `{ &mut } as *mut ?T`. +The lint should, at the very least, fire for the cases covered by the [syntactic sugar extension][future-possibilities], and it should fire when the factor that prevents this matching the sugar is just a redundant block, such as `{ &mut } as *mut ?T`. # Drawbacks [drawbacks]: #drawbacks @@ -177,12 +170,21 @@ In the future, if Rust's type ascriptions end up performing coercions, those coe So `&packed.field: *const _` would be `&raw const packed.field`. If Rust ever gets type ascriptions with coercions for binders, likewise these coercions would be subject to these rules in cases like `match &packed.field { x: *const _ => x }`. -## Other +## Encouraging / requiring `&raw` in situations where references are often/definitely incorrect + +We could make references to packed fields that do *not* use this new "raw reference" operation a *hard error even in unsafe blocks* (after a transition period). +There is no situation in which this code is okay; it creates a reference that violates basic invariants. +Taking a raw reference to a packed field, on the other hand, is a safe operation as the raw pointer comes with no special promises. It has been suggested to [remove `static mut`][static-mut] because it is too easy to accidentally create references with lifetime `'static`. With `&raw` we could instead restrict `static mut` to only allow taking raw pointers (`&raw [mut|const] STATIC`) and entirely disallow creating references (`&[mut] STATIC`) even in safe code (in a future edition, likely; with lints in older editions). -As mentioned above, expressions such as `&raw mut x.field` still trigger more UB than might be expected---as witnessed by a [couple of attempts found in the wild of people implementing `offsetof`][offset-of] with something like: +## Other + +**Lowering of casts.** Currently, `mut_ref as *mut _` has a reborrow inserted, i.e., it gets lowered to `&mut *mut_ref as *mut _`. +It seems like a good idea to lower this to `&raw mut *mut_ref` instead to avoid any effects the reborrow might have in terms of permitted aliasing. + +**`offsetof` woes.** As mentioned above, expressions such as `&raw mut x.field` still trigger more UB than might be expected---as witnessed by a [couple of attempts found in the wild of people implementing `offsetof`][offset-of] with something like: ```rust let x: *mut Struct = NonNull::dangling().as_ptr(); @@ -198,7 +200,8 @@ let field: *mut Field = &raw mut x.field; which is better, but still UB: we emit a `getelementptr inbounds` for the `.field` offset computation. It might be a good idea to just not do that -- we know that references are fine, but we could decide that when raw pointers are involved that might be dangling, we do not want to assert anything from just the fact that an offset is being computed. -A plain `getelementptr` still provides some information to the alias analysis, and if we derive the `inbounds` entirely from the fact that the pointer is created from or turned into a reference, we would be able to not have any clause in the language semantics that calls out the offset computation as potentially triggering UB +However, there are concerns that a plain `getelementptr` will not be sufficiently optimized because it also permits arithmetic that wraps around the end of the address space. +LLVM currently does not support a `getelementptr nowrap` that disallows wrapping but permits cross-allocation arithmetic, but if that could be added, using it for raw pointers could save us from having to talk about the "no outofbounds arithmetic" rule in the semantics of field access (the UB triggered by creating dangling references would be enough). If people just hear "`&raw` means my pointer can be dangling" they might think the second version above is actually okay, forgetting that the field access itself has its own subtle rule; getting rid of that rule would remove one foot-gun for unsafe code authors to worry about. [static-mut]: https://github.com/rust-lang/rust/issues/53639 From e95df5c564a17d7738b88cef51f3b9bdfb7ce56f Mon Sep 17 00:00:00 2001 From: Ralf Jung Date: Fri, 26 Jul 2019 21:09:07 +0200 Subject: [PATCH 22/23] no more ref-to-ptr casts in MIR --- text/0000-raw-reference-mir-operator.md | 1 + 1 file changed, 1 insertion(+) diff --git a/text/0000-raw-reference-mir-operator.md b/text/0000-raw-reference-mir-operator.md index 7ed63e916d1..83f6f0e2008 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/0000-raw-reference-mir-operator.md @@ -183,6 +183,7 @@ With `&raw` we could instead restrict `static mut` to only allow taking raw poin **Lowering of casts.** Currently, `mut_ref as *mut _` has a reborrow inserted, i.e., it gets lowered to `&mut *mut_ref as *mut _`. It seems like a good idea to lower this to `&raw mut *mut_ref` instead to avoid any effects the reborrow might have in terms of permitted aliasing. +This has the side-effect of being able to entirely remove reference-to-pointer-*casts* from the MIR; that conversion would be done by a "raw reborrow" instead (which is consistent with the pointer-to-reference situation). **`offsetof` woes.** As mentioned above, expressions such as `&raw mut x.field` still trigger more UB than might be expected---as witnessed by a [couple of attempts found in the wild of people implementing `offsetof`][offset-of] with something like: From 9942dadfed1d947d1f35681847925259bf46eb13 Mon Sep 17 00:00:00 2001 From: Mazdak Farrokhzad Date: Sun, 15 Sep 2019 18:14:12 +0200 Subject: [PATCH 23/23] RFC 2582 --- ...e-mir-operator.md => 2582-raw-reference-mir-operator.md} | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) rename text/{0000-raw-reference-mir-operator.md => 2582-raw-reference-mir-operator.md} (98%) diff --git a/text/0000-raw-reference-mir-operator.md b/text/2582-raw-reference-mir-operator.md similarity index 98% rename from text/0000-raw-reference-mir-operator.md rename to text/2582-raw-reference-mir-operator.md index 83f6f0e2008..227455706a5 100644 --- a/text/0000-raw-reference-mir-operator.md +++ b/text/2582-raw-reference-mir-operator.md @@ -1,7 +1,7 @@ -- Feature Name: raw_reference_mir_operator +- Feature Name: `raw_ref_op` - Start Date: 2018-11-01 -- RFC PR: (leave this empty) -- Rust Issue: (leave this empty) +- RFC PR: [rust-lang/rfcs#2582](https://github.com/rust-lang/rfcs/pull/2582) +- Rust Issue: [rust-lang/rust#](https://github.com/rust-lang/rust/issues/) # Summary [summary]: #summary