-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support opaque pointers in crucible-llvm
and uc-crux-llvm
#1075
Comments
I agree on changing the equality perspective for overrides. |
The dependency on pointer types is deeply, deeply ingrained into UC-Crux-LLVM. UC-Crux attempts to uncover function preconditions by repeatedly executing a function starting from "minimal" symbolic inputs and attempting to "fix" any undefined behavior it encounters (see this post). In particular, when allocating a pointer in the input, it uses the type of that pointer to determine the size of the allocation, and later, to deduce reasonable preconditions that might be applicable to the backing memory (e.g., if it's a pointer to an array of integers, some of those integers might have some maximum bound). Excising the use of pointer types would be a substantial undertaking. |
I certainly believe you when you say that this would be a substantial undertaking. That being said, I am curious to know how
At first glance, this approach seems to be compatible with opaque pointers. Even if pointer types are themselves opaque, their allocation sites are always annotated with their type. For example, an The part where this gets tricky is when LLVM reinterprets the underlying memory at a different type, be it through |
Except for heap allocations 🙂 But anyway, the problem is that UC-Crux attempts to execute individual functions from the program, and must allocate memory itself to do so. Consider this function: int f(int *x) {
return x[5] + 1;
} UC-Crux would first execute this with an "intial pointer" (see the blog post). It would fail when reading from These preconditions are stored in a To handle opaque pointers, this logic would all need to be based on offsets. Additionally, UC-Crux would either have to make shot-in-the-dark guesses about allocation sizes, or would have to instrument
It gives up a lot! It can simply fail to find adequate preconditions. |
Thanks, this is helpful to know. For the time being, I suppose we could simply not support opaque pointers in |
Yeah for sure!
We can always replace pointer types with a pointer to an unbounded array of bytes, UC-Crux will just give you bad results (and give up often). The original UC-symex paper had a lazy memory model that let them avoid this issue around allocation sizes, but again, that's a big project. |
For now, we simply ignore the explicit `Type` field. In a subsequent commit, we will use this type rather than inspecting the pointee type of the pointer argument. See #1075.
For now, we simply ignore the explicit `Type` fields. In a subsequent commit, we will use these types rather than inspecting the pointee type of the pointer arguments. See #1075.
This adds a `PtrOpaqueType` data constructor to `MemType`, which encodes opaque pointer types in `crucible-llvm`. There also adds cases for `PtrOpaqueType` in several functions. See #1075. There is more work to be done to robustly support opaque pointers, but I will split this out into separate commits.
This patch: 1. Makes the LLVM override quasiquoter aware of opaque pointer types, which are parsed like LLVM's `ptr` type. 2. Makes the LLVM override matching machinery treat opaque pointers as equal to non-opaque pointers. This means that most of the existing overrides that use non-opaque pointers will now also work for opaque pointers... 3. ...well, for _most_ overrides, that is. An exception to this rule are certain classes of polymorphic LLVM intrinsics, such as `llvm.lifetime.start.*`. Such an intrinsic would be named `llvm.lifetime.start.p0i8` with a non-opaque pointer, where the argument type is an `i8*`, but it would be named `llvm.lifetime.start.p0` with an opaque pointer, where the argument type is a `ptr`. We could devise some clever string hackery to match both `llvm.lifetime.start.p0i8` and `llvm.lifetime.start.p0` with the same override, but that seems like overkill. There aren't _too_ many intrinsics of this sort, so I've opted to just make copies of these intrinsics specialized to opaque pointers where appropriate. See #1075.
This tweaks the translator code to avoid inspecting pointers' pointee types, an approach that won't work for opaque pointers. See #1075. The following instructions' translations are adjusted in this patch: * `getelementptr` * `load` * `store` * `cmpxchg` * `atomicrmw`
This adds a `PtrOpaqueType` data constructor to `MemType`, which encodes opaque pointer types in `crucible-llvm`. There also adds cases for `PtrOpaqueType` in several functions. See #1075. There is more work to be done to robustly support opaque pointers, but I will split this out into separate commits.
Currently, `uc-crux-llvm` does not support opaque pointers (see the discussion at #1075). For now, we make `uc-crux-llvm` error if given an opaque pointer as input. We also instruct the `uc-crux-llvm` test suite to invoke Clang in such a way that it will not produce opaque pointers.
This patch: 1. Makes the LLVM override quasiquoter aware of opaque pointer types, which are parsed like LLVM's `ptr` type. 2. Makes the LLVM override matching machinery treat opaque pointers as equal to non-opaque pointers. This means that most of the existing overrides that use non-opaque pointers will now also work for opaque pointers... 3. ...well, for _most_ overrides, that is. An exception to this rule are certain classes of polymorphic LLVM intrinsics, such as `llvm.lifetime.start.*`. Such an intrinsic would be named `llvm.lifetime.start.p0i8` with a non-opaque pointer, where the argument type is an `i8*`, but it would be named `llvm.lifetime.start.p0` with an opaque pointer, where the argument type is a `ptr`. We could devise some clever string hackery to match both `llvm.lifetime.start.p0i8` and `llvm.lifetime.start.p0` with the same override, but that seems like overkill. There aren't _too_ many intrinsics of this sort, so I've opted to just make copies of these intrinsics specialized to opaque pointers where appropriate. See #1075.
This tweaks the translator code to avoid inspecting pointers' pointee types, an approach that won't work for opaque pointers. See #1075. The following instructions' translations are adjusted in this patch: * `getelementptr` * `load` * `store` * `cmpxchg` * `atomicrmw`
Currently, `uc-crux-llvm` does not support opaque pointers (see the discussion at #1075). For now, we make `uc-crux-llvm` error if given an opaque pointer as input. We also instruct the `uc-crux-llvm` test suite to invoke Clang in such a way that it will not produce opaque pointers.
This patch: 1. Makes the LLVM override quasiquoter aware of opaque pointer types, which are parsed like LLVM's `ptr` type. 2. Makes the LLVM override matching machinery treat opaque pointers as equal to non-opaque pointers. This means that most of the existing overrides that use non-opaque pointers will now also work for opaque pointers... 3. ...well, for _most_ overrides, that is. An exception to this rule are certain classes of polymorphic LLVM intrinsics, such as `llvm.lifetime.start.*`. Such an intrinsic would be named `llvm.lifetime.start.p0i8` with a non-opaque pointer, where the argument type is an `i8*`, but it would be named `llvm.lifetime.start.p0` with an opaque pointer, where the argument type is a `ptr`. We could devise some clever string hackery to match both `llvm.lifetime.start.p0i8` and `llvm.lifetime.start.p0` with the same override, but that seems like overkill. There aren't _too_ many intrinsics of this sort, so I've opted to just make copies of these intrinsics specialized to opaque pointers where appropriate. See #1075.
This tweaks the translator code to avoid inspecting pointers' pointee types, an approach that won't work for opaque pointers. See #1075. The following instructions' translations are adjusted in this patch: * `getelementptr` * `load` * `store` * `cmpxchg` * `atomicrmw`
For now, we simply ignore the explicit `Type` field. In a subsequent commit, we will use this type rather than inspecting the pointee type of the pointer argument. See #1075.
For now, we simply ignore the explicit `Type` fields. In a subsequent commit, we will use these types rather than inspecting the pointee type of the pointer arguments. See #1075.
This adds a `PtrOpaqueType` data constructor to `MemType`, which encodes opaque pointer types in `crucible-llvm`. There also adds cases for `PtrOpaqueType` in several functions. See #1075. There is more work to be done to robustly support opaque pointers, but I will split this out into separate commits.
Currently, `uc-crux-llvm` does not support opaque pointers (see the discussion at #1075). For now, we make `uc-crux-llvm` error if given an opaque pointer as input. We also instruct the `uc-crux-llvm` test suite to invoke Clang in such a way that it will not produce opaque pointers.
This patch: 1. Makes the LLVM override quasiquoter aware of opaque pointer types, which are parsed like LLVM's `ptr` type. 2. Makes the LLVM override matching machinery treat opaque pointers as equal to non-opaque pointers. This means that most of the existing overrides that use non-opaque pointers will now also work for opaque pointers... 3. ...well, for _most_ overrides, that is. An exception to this rule are certain classes of polymorphic LLVM intrinsics, such as `llvm.lifetime.start.*`. Such an intrinsic would be named `llvm.lifetime.start.p0i8` with a non-opaque pointer, where the argument type is an `i8*`, but it would be named `llvm.lifetime.start.p0` with an opaque pointer, where the argument type is a `ptr`. We could devise some clever string hackery to match both `llvm.lifetime.start.p0i8` and `llvm.lifetime.start.p0` with the same override, but that seems like overkill. There aren't _too_ many intrinsics of this sort, so I've opted to just make copies of these intrinsics specialized to opaque pointers where appropriate. See #1075.
This tweaks the translator code to avoid inspecting pointers' pointee types, an approach that won't work for opaque pointers. See #1075. The following instructions' translations are adjusted in this patch: * `getelementptr` * `load` * `store` * `cmpxchg` * `atomicrmw`
I think another, and possibly more sound approach would be to use the type specified at the usage points rather than the function declaration. Using the previous example:
the resulting llvm bitcode (from clang 17 with only opaque pointers) is:
From the above, uc-crux-llvm should still be able to infer that the pointer can be interpreted as an array of at least 5 The soundness aspect I referred to earlier is that if the pointer is dereferenced with different types, uc-crux-llvm should be able to identify preconditions for all of the type uses rather than just those for the declared input type. I realize this is probably a significant extension of the current capabilities of uc-crux-llvm (or perhaps not: I haven't looked at the implementation details), but I think it is probably a reasonable path forward. Let me know if I've oversimplified or overlooked something here, however. |
Note that pointer type information was static, so UC-Crux had an idea of what shape to make arguments before executing the function. While in this simple example one could reasonably propagate the type at the use-site back up to the argument, in general that would require heap-aware data-flow analysis: for example, the pointer coming from the function argument could be stored to the heap, only later to be read out and used in some callee of a callee of the function under analysis. We could consider dynamic tracking of pointer usages, but this gets weird if e.g., we add a symbolic offset to a pointer before casting it to a particular type. Regardless, anything based on GEP types would only work until they get rid of GEP. IMO the correct solution is to view memory as untyped, and to expand allocations when the function under analysis hits a memory error that indicates that an allocation was too small. |
After upgrading
llvm-pretty
andllvm-pretty-bc-parser
to support LLVM's opaqueptr
type (see GaloisInc/llvm-pretty#102 and GaloisInc/llvm-pretty-bc-parser#177, respectively), we will also need to upgradecrucible-llvm
anduc-crux-llvm
to handle them.crucible-llvm
To a large extent,
crucible-llvm
is already designed to handle opaque pointers, as theLLVMPtr
type intentionally does not include a pointee type. That being said, there are still various places incrucible-llvm
that inspect pointee types of LLVM pointers, which will need to be changed to support theptr
type. Here are all the places that I have found:The
PtrType
data constructor forMemType
includes a pointee type:crucible/crucible-llvm/src/Lang/Crucible/LLVM/MemType.hs
Line 118 in ad4a553
In the spirit of having opaque and non-opaque pointers co-exist, I think we will also need a corresponding
PtrOpaqueType
data constructor that does not have a pointee type.Similarly, the LLVM override function parser machinery includes a non-opaque pointer type, but not an opaque pointer type:
crucible/crucible-llvm/src/Lang/Crucible/LLVM/QQ.hs
Line 48 in ad4a553
We will want to make the latter available when the user writes
ptr
in anllvmOvr
.Speaking of overrides, we have quite a few overrides that are written with non-opaque-pointer types, such as the
memcpy
override:crucible/crucible-llvm/src/Lang/Crucible/LLVM/Intrinsics/Libc.hs
Line 86 in ad4a553
We could write a copy of each of these overrides that replaces non-opaque pointers with opaque ones (e.g.,
memcpy(ptr, ptr, size_t)
, but this would be extremely tedious to maintain. Better yet would be to tweak the override matcher functionality to accept both opaque and non-opaque pointers alike. That is, in this code:crucible/crucible-llvm/src/Lang/Crucible/LLVM/Intrinsics/Common.hs
Lines 256 to 267 in ad4a553
Change
(==)
(which usesType
'sEq
instance) toeqTypeModuloOpaquePtrs
.Even with this change, there are still a handful of overrides that will need special, opaque-pointer-aware versions. An example of this is
llvm.memcpy.p0i8.p0i8.i32
:crucible/crucible-llvm/src/Lang/Crucible/LLVM/Intrinsics/LLVM.hs
Line 303 in ad4a553
With opaque pointers, the name of this intrinsic becomes
llvm.memcpy.p0.p0.i32
. We could, I suppose, devise some clever string hackery to match bothllvm.memcpy.p0.p0.i32
andllvm.memcpy.p0i8.p0i8.i32
with the same override, but that might be overkill. I don't expect there will be too many intrinsics of this sort, so I think it will be fine to just make copies of them.The translation of several LLVM instructions inspects pointee types. Here are all the ones that I have found:
getelementptr
:crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Constant.hs
Lines 162 to 182 in ad4a553
load
:crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs
Lines 158 to 160 in ad4a553
crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs
Lines 1541 to 1545 in ad4a553
store
:crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs
Lines 1557 to 1561 in ad4a553
cmpxchg
:crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs
Lines 1700 to 1704 in ad4a553
atomicrmw
:crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs
Lines 217 to 219 in ad4a553
crucible/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs
Lines 1727 to 1731 in ad4a553
uc-crux-llvm
The largest challenge that I can foresee is figuring out how to deal with
PtrShape
:crucible/uc-crux-llvm/src/UCCrux/LLVM/Shape.hs
Lines 85 to 88 in ad4a553
If I'm reading this correctly,
Shape
is indexed by the pointee type to some degree, which won't work in an opaque pointer setting. That being said, I'm not quite sure how this code works, so I might be misunderstanding what is going on here.The text was updated successfully, but these errors were encountered: