Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix two instances of undefined behavior in codec crate #7751

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 12 additions & 6 deletions components/codec/src/byte.rs
Original file line number Diff line number Diff line change
Expand Up @@ -348,11 +348,18 @@ impl MemComparableByteCodec {
let src_ptr_end = src_ptr.add(src_len);

loop {
let src_ptr_next = src_ptr.add(MEMCMP_GROUP_SIZE + 1);
if std::intrinsics::unlikely(src_ptr_next > src_ptr_end) {
// Make sure the source buffer is the expected length before
// constructing the next source pointer. Note that it is UB to
// even _create_ a pointer that points beyond its allocation.
let ptrdiff = src_ptr_end.offset_from(src_ptr);
let ptroob = ptrdiff < MEMCMP_GROUP_SIZE as isize + 1;

if std::intrinsics::unlikely(ptroob) {
return Err(ErrorInner::eof().into());
}

let src_ptr_next = src_ptr.add(MEMCMP_GROUP_SIZE + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get it. How can it be an undefined behavior? I think accessing the data it points to is undefined, but get the pointer address is pure a math calculation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the rules enough to say for sure, but here's what miri says about this code: https://gist.github.com/brson/26d2111ce4c1a555d0e9a817c37ff646

I take this to mean that even creating the pointer is UB.

I can't find language in https://doc.rust-lang.org/beta/reference/behavior-considered-undefined.html to support the idea that even doing pointer math to create out-of-bounds pointers is undefined. The strongest it says is that dereferencing dangling pointers is undefined, which is obvious to me.

cc @oli-obk @RalfJung sorry for pinging you so much, but maybe you can explain why it's not ok to do pointer math that produces a bad pointer here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah doing pointer arithmetic on out of bounds pointers is UB in LLVM, see for example this blog post: https://blog.regehr.org/archives/1518 - "it is UB to create a pointer that lies more than one element outside of an allocated object. It is UB merely to create such a pointer, it does not need to be dereferenced. Also, it is still UB even if the overflowed pointer happens to refer to some other allocated object.". This is about C, but applies equally to Rust

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get it. So it's an undefined behavior because the calculation can produce an overflow value.

Copy link

@RalfJung RalfJung May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find language in https://doc.rust-lang.org/beta/reference/behavior-considered-undefined.html to support the idea that even doing pointer math to create out-of-bounds pointers is undefined.

That list only enumerates UB caused by language primitives. When you call a method, the method docs mention the UB conditions for that method. In the case of add, it says:

Both the starting and resulting pointer must be either in bounds or one byte past the end of the same allocated object. Note that in Rust, every (stack-allocated) variable is considered a separate allocated object.

So no, it's not just overflow that is a problem.

Copy link

@RalfJung RalfJung May 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are asking "why does add say this is UB", the short answer is "because this is what LLVM requires for getelementptr inbounds, and the long answer is that this helps the compiler analyze which pointers could potentially point where.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should report targetted (to the intrinsic) errors instead of generic errors when they are caused by the intrinsic's shim?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intrinsic is an unstable internal implementation detail though -- how it is called and which intrinsic is used for what can change any time. Seems a bit strange to surface that in error messages?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right :/ the calling function's docs are important, not the bare intrinsic. Yea not sure what we can do to improve the miri error then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the help @RalfJung @oli-obk


// Copy `MEMCMP_GROUP_SIZE` bytes any way. However we will truncate the returned
// length according to padding size if it is the last block.
std::ptr::copy(src_ptr, dest_ptr, MEMCMP_GROUP_SIZE);
Expand Down Expand Up @@ -1048,12 +1055,11 @@ mod tests {
let output_len = unsafe {
let src_ptr = buffer.as_mut_ptr().add(encoded_prefix_len);
let slice_len = buffer.len() - encoded_prefix_len;
let src = std::slice::from_raw_parts(src_ptr, slice_len);
let dest = std::slice::from_raw_parts_mut(src_ptr, slice_len);
let buf = std::slice::from_raw_parts_mut(src_ptr, slice_len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is on purpose as L1069 says. There should either be some misunderstanding or the case should be removed.

/cc @breeswish

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, how about calling try_decode_first_internal instead?

if is_desc {
MemComparableByteCodec::try_decode_first_desc(src, dest).unwrap()
MemComparableByteCodec::try_decode_first_in_place_desc(buf).unwrap()
} else {
MemComparableByteCodec::try_decode_first(src, dest).unwrap()
MemComparableByteCodec::try_decode_first_in_place(buf).unwrap()
}
};
assert_eq!(output_len.0, encoded_len);
Expand Down