Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support opaque pointers in crucible-llvm and uc-crux-llvm #1075

Open
RyanGlScott opened this issue Apr 10, 2023 · 8 comments
Open

Support opaque pointers in crucible-llvm and uc-crux-llvm #1075

RyanGlScott opened this issue Apr 10, 2023 · 8 comments
Labels
crucible llvm uc-crux Issues specific to under-constrained crux

Comments

@RyanGlScott
Copy link
Contributor

After upgrading llvm-pretty and llvm-pretty-bc-parser to support LLVM's opaque ptr type (see GaloisInc/llvm-pretty#102 and GaloisInc/llvm-pretty-bc-parser#177, respectively), we will also need to upgrade crucible-llvm and uc-crux-llvm to handle them.

crucible-llvm

To a large extent, crucible-llvm is already designed to handle opaque pointers, as the LLVMPtr type intentionally does not include a pointee type. That being said, there are still various places in crucible-llvm that inspect pointee types of LLVM pointers, which will need to be changed to support the ptr type. Here are all the places that I have found:

  • The PtrType data constructor for MemType includes a pointee type:

    In the spirit of having opaque and non-opaque pointers co-exist, I think we will also need a corresponding PtrOpaqueType data constructor that does not have a pointee type.

  • Similarly, the LLVM override function parser machinery includes a non-opaque pointer type, but not an opaque pointer type:

    We will want to make the latter available when the user writes ptr in an llvmOvr.

  • Speaking of overrides, we have quite a few overrides that are written with non-opaque-pointer types, such as the memcpy override:

    [llvmOvr| i8* @memcpy( i8*, i8*, size_t ) |]

    We could write a copy of each of these overrides that replaces non-opaque pointers with opaque ones (e.g., memcpy(ptr, ptr, size_t), but this would be extremely tedious to maintain. Better yet would be to tweak the override matcher functionality to accept both opaque and non-opaque pointers alike. That is, in this code:

    isMatchingDeclaration requested provided = and
    [ L.decName requested == L.decName provided
    , matchingArgList (L.decArgs requested) (L.decArgs provided)
    , L.decRetType requested == L.decRetType provided
    -- TODO? do we need to pay attention to various attributes?
    ]
    where
    matchingArgList [] [] = True
    matchingArgList [] _ = L.decVarArgs requested
    matchingArgList _ [] = L.decVarArgs provided
    matchingArgList (x:xs) (y:ys) = x == y && matchingArgList xs ys

    Change (==) (which uses Type's Eq instance) to eqTypeModuloOpaquePtrs.

    Even with this change, there are still a handful of overrides that will need special, opaque-pointer-aware versions. An example of this is llvm.memcpy.p0i8.p0i8.i32:

    [llvmOvr| void @llvm.memcpy.p0i8.p0i8.i32( i8*, i8*, i32, i32, i1 ) |]

    With opaque pointers, the name of this intrinsic becomes llvm.memcpy.p0.p0.i32. We could, I suppose, devise some clever string hackery to match both llvm.memcpy.p0.p0.i32 and llvm.memcpy.p0i8.p0i8.i32 with the same override, but that might be overkill. I don't expect there will be too many intrinsics of this sort, so I think it will be fine to just make copies of them.

  • The translation of several LLVM instructions inspects pointee types. Here are all the ones that I have found:

uc-crux-llvm

The largest challenge that I can foresee is figuring out how to deal with PtrShape:

data PtrShape m tag (ft :: FullType m)
= ShapeUnallocated
| ShapeAllocated !Int
| ShapeInitialized !(Seq (Shape m tag ft))

If I'm reading this correctly, Shape is indexed by the pointee type to some degree, which won't work in an opaque pointer setting. That being said, I'm not quite sure how this code works, so I might be misunderstanding what is going on here.

@RyanGlScott RyanGlScott added llvm crucible uc-crux Issues specific to under-constrained crux labels Apr 10, 2023
@kquick
Copy link
Member

kquick commented Apr 10, 2023

I agree on changing the equality perspective for overrides.

@langston-barrett
Copy link
Contributor

The dependency on pointer types is deeply, deeply ingrained into UC-Crux-LLVM. UC-Crux attempts to uncover function preconditions by repeatedly executing a function starting from "minimal" symbolic inputs and attempting to "fix" any undefined behavior it encounters (see this post). In particular, when allocating a pointer in the input, it uses the type of that pointer to determine the size of the allocation, and later, to deduce reasonable preconditions that might be applicable to the backing memory (e.g., if it's a pointer to an array of integers, some of those integers might have some maximum bound). Excising the use of pointer types would be a substantial undertaking.

@RyanGlScott
Copy link
Contributor Author

I certainly believe you when you say that this would be a substantial undertaking. That being said, I am curious to know how uc-crux-llvm's current approach to tracking pointee types works, if only to better my understanding of how much work would be required:

In particular, when allocating a pointer in the input, it uses the type of that pointer to determine the size of the allocation, and later, to deduce reasonable preconditions that might be applicable to the backing memory (e.g., if it's a pointer to an array of integers, some of those integers might have some maximum bound).

At first glance, this approach seems to be compatible with opaque pointers. Even if pointer types are themselves opaque, their allocation sites are always annotated with their type. For example, an alloca <ty> instruction will return something of type ptr, but the <ty> should tell you everything you need to know about size, preconditions, etc.

The part where this gets tricky is when LLVM reinterprets the underlying memory at a different type, be it through bitcasts, getelementptr, or some other mechanism. For example, see here for a tricky example where a pointer is allocated with a "pointee type" of [2 x [2 x i8]], but a subsequent store instruction reinterprets the underlying memory at type i32. How does uc-crux-llvm deal with this sort of situation?

@langston-barrett
Copy link
Contributor

Even if pointer types are themselves opaque, their allocation sites are always annotated with their type.

Except for heap allocations 🙂 But anyway, the problem is that UC-Crux attempts to execute individual functions from the program, and must allocate memory itself to do so. Consider this function:

int f(int *x) {
    return x[5] + 1;
}

UC-Crux would first execute this with an "intial pointer" (see the blog post). It would fail when reading from x, because there wouldn't be any allocation backing it. On the next attempt, UC-Crux would use the function's signature to deduce that x is a pointer to an array of integers, and allocate an array of integers with length 6. Executing the function this time results in a signed integer overflow, so UC-Crux would deduce that the integer at index 5 must be signed-less-than INT_MAX. After this modification, execution would succeed.

These preconditions are stored in a Shape, as you highlighted above. The Shape is tied to (and type-indexed by) the LLVM type of the function argument, and the logic for deducing the preconditions depends on the type/shape involved. Shapes are indexed by Cursors, which also use high-level notions like struct and array types, rather than byte offsets.

To handle opaque pointers, this logic would all need to be based on offsets. Additionally, UC-Crux would either have to make shot-in-the-dark guesses about allocation sizes, or would have to instrument load/store/getelementptr instructions to track the types used and relate them back to the corresponding part of the input being loaded from.

How does uc-crux-llvm deal with this sort of situation?

It gives up a lot! It can simply fail to find adequate preconditions.

@RyanGlScott
Copy link
Contributor Author

Thanks, this is helpful to know.

For the time being, I suppose we could simply not support opaque pointers in uc-crux-llvm, and we could have uc-crux-llvm always invoke Clang with the necessary options to disable opaque pointers in LLVM bitcode. Unfortunately, this is not a viable long-term strategy, as LLVM 17 will drop support for non-opaque pointers entirely. We will need to have some kind of coping strategy if excising uc-crux-llvm of pointee types proves too difficult.

@langston-barrett
Copy link
Contributor

Yeah for sure!

We will need to have some kind of coping strategy if excising uc-crux-llvm of pointee types proves too difficult.

We can always replace pointer types with a pointer to an unbounded array of bytes, UC-Crux will just give you bad results (and give up often).

The original UC-symex paper had a lazy memory model that let them avoid this issue around allocation sizes, but again, that's a big project.

RyanGlScott added a commit that referenced this issue Apr 23, 2023
For now, we simply ignore the explicit `Type` field. In a subsequent commit, we
will use this type rather than inspecting the pointee type of the pointer
argument. See #1075.
RyanGlScott added a commit that referenced this issue Apr 23, 2023
For now, we simply ignore the explicit `Type` fields. In a subsequent commit,
we will use these types rather than inspecting the pointee type of the pointer
arguments. See #1075.
RyanGlScott added a commit that referenced this issue Apr 23, 2023
This adds a `PtrOpaqueType` data constructor to `MemType`, which encodes opaque
pointer types in `crucible-llvm`. There also adds cases for `PtrOpaqueType` in
several functions. See #1075.

There is more work to be done to robustly support opaque pointers, but I will
split this out into separate commits.
RyanGlScott added a commit that referenced this issue Apr 23, 2023
This patch:

1. Makes the LLVM override quasiquoter aware of opaque pointer types, which are
   parsed like LLVM's `ptr` type.
2. Makes the LLVM override matching machinery treat opaque pointers as equal to
   non-opaque pointers. This means that most of the existing overrides that use
   non-opaque pointers will now also work for opaque pointers...
3. ...well, for _most_ overrides, that is. An exception to this rule are certain
   classes of polymorphic LLVM intrinsics, such as `llvm.lifetime.start.*`. Such
   an intrinsic would be named `llvm.lifetime.start.p0i8` with a non-opaque
   pointer, where the argument type is an `i8*`, but it would be named
   `llvm.lifetime.start.p0` with an opaque pointer, where the argument type is
   a `ptr`.

   We could devise some clever string hackery to match both
   `llvm.lifetime.start.p0i8` and `llvm.lifetime.start.p0` with the same
   override, but that seems like overkill. There aren't _too_ many intrinsics
   of this sort, so I've opted to just make copies of these intrinsics
   specialized to opaque pointers where appropriate.

See #1075.
RyanGlScott added a commit that referenced this issue Apr 23, 2023
This tweaks the translator code to avoid inspecting pointers' pointee types, an
approach that won't work for opaque pointers. See #1075. The following
instructions' translations are adjusted in this patch:

* `getelementptr`
* `load`
* `store`
* `cmpxchg`
* `atomicrmw`
RyanGlScott added a commit that referenced this issue May 4, 2023
This adds a `PtrOpaqueType` data constructor to `MemType`, which encodes opaque
pointer types in `crucible-llvm`. There also adds cases for `PtrOpaqueType` in
several functions. See #1075.

There is more work to be done to robustly support opaque pointers, but I will
split this out into separate commits.
RyanGlScott added a commit that referenced this issue May 4, 2023
Currently, `uc-crux-llvm` does not support opaque pointers (see the discussion
at #1075). For now, we make `uc-crux-llvm` error if given an opaque pointer as
input. We also instruct the `uc-crux-llvm` test suite to invoke Clang in such a
way that it will not produce opaque pointers.
RyanGlScott added a commit that referenced this issue May 4, 2023
This patch:

1. Makes the LLVM override quasiquoter aware of opaque pointer types, which are
   parsed like LLVM's `ptr` type.
2. Makes the LLVM override matching machinery treat opaque pointers as equal to
   non-opaque pointers. This means that most of the existing overrides that use
   non-opaque pointers will now also work for opaque pointers...
3. ...well, for _most_ overrides, that is. An exception to this rule are certain
   classes of polymorphic LLVM intrinsics, such as `llvm.lifetime.start.*`. Such
   an intrinsic would be named `llvm.lifetime.start.p0i8` with a non-opaque
   pointer, where the argument type is an `i8*`, but it would be named
   `llvm.lifetime.start.p0` with an opaque pointer, where the argument type is
   a `ptr`.

   We could devise some clever string hackery to match both
   `llvm.lifetime.start.p0i8` and `llvm.lifetime.start.p0` with the same
   override, but that seems like overkill. There aren't _too_ many intrinsics
   of this sort, so I've opted to just make copies of these intrinsics
   specialized to opaque pointers where appropriate.

See #1075.
RyanGlScott added a commit that referenced this issue May 4, 2023
This tweaks the translator code to avoid inspecting pointers' pointee types, an
approach that won't work for opaque pointers. See #1075. The following
instructions' translations are adjusted in this patch:

* `getelementptr`
* `load`
* `store`
* `cmpxchg`
* `atomicrmw`
RyanGlScott added a commit that referenced this issue May 4, 2023
Currently, `uc-crux-llvm` does not support opaque pointers (see the discussion
at #1075). For now, we make `uc-crux-llvm` error if given an opaque pointer as
input. We also instruct the `uc-crux-llvm` test suite to invoke Clang in such a
way that it will not produce opaque pointers.
RyanGlScott added a commit that referenced this issue May 4, 2023
This patch:

1. Makes the LLVM override quasiquoter aware of opaque pointer types, which are
   parsed like LLVM's `ptr` type.
2. Makes the LLVM override matching machinery treat opaque pointers as equal to
   non-opaque pointers. This means that most of the existing overrides that use
   non-opaque pointers will now also work for opaque pointers...
3. ...well, for _most_ overrides, that is. An exception to this rule are certain
   classes of polymorphic LLVM intrinsics, such as `llvm.lifetime.start.*`. Such
   an intrinsic would be named `llvm.lifetime.start.p0i8` with a non-opaque
   pointer, where the argument type is an `i8*`, but it would be named
   `llvm.lifetime.start.p0` with an opaque pointer, where the argument type is
   a `ptr`.

   We could devise some clever string hackery to match both
   `llvm.lifetime.start.p0i8` and `llvm.lifetime.start.p0` with the same
   override, but that seems like overkill. There aren't _too_ many intrinsics
   of this sort, so I've opted to just make copies of these intrinsics
   specialized to opaque pointers where appropriate.

See #1075.
RyanGlScott added a commit that referenced this issue May 4, 2023
This tweaks the translator code to avoid inspecting pointers' pointee types, an
approach that won't work for opaque pointers. See #1075. The following
instructions' translations are adjusted in this patch:

* `getelementptr`
* `load`
* `store`
* `cmpxchg`
* `atomicrmw`
RyanGlScott added a commit that referenced this issue May 30, 2023
For now, we simply ignore the explicit `Type` field. In a subsequent commit, we
will use this type rather than inspecting the pointee type of the pointer
argument. See #1075.
RyanGlScott added a commit that referenced this issue May 30, 2023
For now, we simply ignore the explicit `Type` fields. In a subsequent commit,
we will use these types rather than inspecting the pointee type of the pointer
arguments. See #1075.
RyanGlScott added a commit that referenced this issue May 30, 2023
This adds a `PtrOpaqueType` data constructor to `MemType`, which encodes opaque
pointer types in `crucible-llvm`. There also adds cases for `PtrOpaqueType` in
several functions. See #1075.

There is more work to be done to robustly support opaque pointers, but I will
split this out into separate commits.
RyanGlScott added a commit that referenced this issue May 30, 2023
Currently, `uc-crux-llvm` does not support opaque pointers (see the discussion
at #1075). For now, we make `uc-crux-llvm` error if given an opaque pointer as
input. We also instruct the `uc-crux-llvm` test suite to invoke Clang in such a
way that it will not produce opaque pointers.
RyanGlScott added a commit that referenced this issue May 30, 2023
This patch:

1. Makes the LLVM override quasiquoter aware of opaque pointer types, which are
   parsed like LLVM's `ptr` type.
2. Makes the LLVM override matching machinery treat opaque pointers as equal to
   non-opaque pointers. This means that most of the existing overrides that use
   non-opaque pointers will now also work for opaque pointers...
3. ...well, for _most_ overrides, that is. An exception to this rule are certain
   classes of polymorphic LLVM intrinsics, such as `llvm.lifetime.start.*`. Such
   an intrinsic would be named `llvm.lifetime.start.p0i8` with a non-opaque
   pointer, where the argument type is an `i8*`, but it would be named
   `llvm.lifetime.start.p0` with an opaque pointer, where the argument type is
   a `ptr`.

   We could devise some clever string hackery to match both
   `llvm.lifetime.start.p0i8` and `llvm.lifetime.start.p0` with the same
   override, but that seems like overkill. There aren't _too_ many intrinsics
   of this sort, so I've opted to just make copies of these intrinsics
   specialized to opaque pointers where appropriate.

See #1075.
RyanGlScott added a commit that referenced this issue May 30, 2023
This tweaks the translator code to avoid inspecting pointers' pointee types, an
approach that won't work for opaque pointers. See #1075. The following
instructions' translations are adjusted in this patch:

* `getelementptr`
* `load`
* `store`
* `cmpxchg`
* `atomicrmw`
@kquick
Copy link
Member

kquick commented Mar 21, 2024

I think another, and possibly more sound approach would be to use the type specified at the usage points rather than the function declaration. Using the previous example:

int f(int *x) {
  return x[5] + 1;
}

the resulting llvm bitcode (from clang 17 with only opaque pointers) is:

define i32 @f(ptr nocapture noundef readonly %0) ... {
  %2 = getelementptr i32, ptr %0, i64 5
  ...
}

From the above, uc-crux-llvm should still be able to infer that the pointer can be interpreted as an array of at least 5 i32 elements and the value of that 5th element may have constraints.

The soundness aspect I referred to earlier is that if the pointer is dereferenced with different types, uc-crux-llvm should be able to identify preconditions for all of the type uses rather than just those for the declared input type.

I realize this is probably a significant extension of the current capabilities of uc-crux-llvm (or perhaps not: I haven't looked at the implementation details), but I think it is probably a reasonable path forward. Let me know if I've oversimplified or overlooked something here, however.

@langston-barrett
Copy link
Contributor

langston-barrett commented Mar 21, 2024

From the above, uc-crux-llvm should still be able to infer that the pointer can be interpreted as an array of at least 5 i32 elements and the value of that 5th element may have constraints.

Note that pointer type information was static, so UC-Crux had an idea of what shape to make arguments before executing the function. While in this simple example one could reasonably propagate the type at the use-site back up to the argument, in general that would require heap-aware data-flow analysis: for example, the pointer coming from the function argument could be stored to the heap, only later to be read out and used in some callee of a callee of the function under analysis.

We could consider dynamic tracking of pointer usages, but this gets weird if e.g., we add a symbolic offset to a pointer before casting it to a particular type.

Regardless, anything based on GEP types would only work until they get rid of GEP.

IMO the correct solution is to view memory as untyped, and to expand allocations when the function under analysis hits a memory error that indicates that an allocation was too small.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
crucible llvm uc-crux Issues specific to under-constrained crux
Projects
None yet
Development

No branches or pull requests

3 participants