-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix two instances of undefined behavior in codec crate #7751
base: master
Are you sure you want to change the base?
Conversation
It is UB to construct out-of-bounds pointers. Signed-off-by: Brian Anderson <[email protected]>
Previous code creates aliasing mutable slices. Signed-off-by: Brian Anderson <[email protected]>
components/codec/src/byte.rs
Outdated
return Err(ErrorInner::eof().into()); | ||
} | ||
|
||
let src_ptr_next = src_ptr.add(MEMCMP_GROUP_SIZE + 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get it. How can it be an undefined behavior? I think accessing the data it points to is undefined, but get the pointer address is pure a math calculation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the rules enough to say for sure, but here's what miri says about this code: https://gist.github.com/brson/26d2111ce4c1a555d0e9a817c37ff646
I take this to mean that even creating the pointer is UB.
I can't find language in https://doc.rust-lang.org/beta/reference/behavior-considered-undefined.html to support the idea that even doing pointer math to create out-of-bounds pointers is undefined. The strongest it says is that dereferencing dangling pointers is undefined, which is obvious to me.
cc @oli-obk @RalfJung sorry for pinging you so much, but maybe you can explain why it's not ok to do pointer math that produces a bad pointer here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah doing pointer arithmetic on out of bounds pointers is UB in LLVM, see for example this blog post: https://blog.regehr.org/archives/1518 - "it is UB to create a pointer that lies more than one element outside of an allocated object. It is UB merely to create such a pointer, it does not need to be dereferenced. Also, it is still UB even if the overflowed pointer happens to refer to some other allocated object.". This is about C, but applies equally to Rust
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get it. So it's an undefined behavior because the calculation can produce an overflow value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find language in https://doc.rust-lang.org/beta/reference/behavior-considered-undefined.html to support the idea that even doing pointer math to create out-of-bounds pointers is undefined.
That list only enumerates UB caused by language primitives. When you call a method, the method docs mention the UB conditions for that method. In the case of add
, it says:
Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object. Note that in Rust, every (stack-allocated) variable is considered a separate allocated object.
So no, it's not just overflow that is a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are asking "why does add
say this is UB", the short answer is "because this is what LLVM requires for getelementptr inbounds
, and the long answer is that this helps the compiler analyze which pointers could potentially point where.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should report targetted (to the intrinsic) errors instead of generic errors when they are caused by the intrinsic's shim?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intrinsic is an unstable internal implementation detail though -- how it is called and which intrinsic is used for what can change any time. Seems a bit strange to surface that in error messages?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right :/ the calling function's docs are important, not the bare intrinsic. Yea not sure what we can do to improve the miri error then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Brian Anderson <[email protected]>
This also marks the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, two comments about comments inline
Signed-off-by: Brian Anderson <[email protected]>
No release note, Please follow https://github.com/pingcap/community/blob/master/contributors/release-note-checker.md |
/rebuild |
/merge |
Your auto merge job has been accepted, waiting for:
|
/run-all-tests |
@brson merge failed. |
/merge |
/run-all-tests |
@brson merge failed. |
/merge |
/run-all-tests |
@@ -1048,12 +1071,11 @@ mod tests { | |||
let output_len = unsafe { | |||
let src_ptr = buffer.as_mut_ptr().add(encoded_prefix_len); | |||
let slice_len = buffer.len() - encoded_prefix_len; | |||
let src = std::slice::from_raw_parts(src_ptr, slice_len); | |||
let dest = std::slice::from_raw_parts_mut(src_ptr, slice_len); | |||
let buf = std::slice::from_raw_parts_mut(src_ptr, slice_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is on purpose as L1069 says. There should either be some misunderstanding or the case should be removed.
/cc @breeswish
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, how about calling try_decode_first_internal
instead?
Waiting on further review from @breeswish |
|
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
What problem does this PR solve?
Fix two instances of undefined behavior in codec crate, one in
MemComparableBytesCodec::try_decode_first_internal
and one in its tests.Case 1: This function constructs a pointer called
src_ptr_next
to the next place it is going to need to look for data to decode. In the error case, this pointer points beyond its allocation, which is not allowed, and is undefined behavior. This patch changes that code path to use theoffset_from
function to instead calculate whether the difference from the current, valid pointer, to the next location, will be in-bounds.Case 2: A test case is constructing two overlapping slices, one of them mutable, in order to test the behavior of the decoder for overlapping slices. This is undefined behavior. I've changed the test to use the
in_place
versions of the functions, which don't require overlapping slices.Release note