Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Map-based data structure for LLVM allocation histories. #550

Merged
merged 8 commits into from
Nov 9, 2020

Conversation

brianhuffman
Copy link
Contributor

Instead of recording allocations sequentially, we now store
collections of allocations together in a map indexed by block
ID. This makes lookups much faster in the common case where
the block ID we are checking is concrete.

Fixes #549.

Copy link
Contributor

@robdockins robdockins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments about the data structures involved, but this otherwise looks OK to me.


instance Semigroup (MemAllocs sym) where
(MemAllocs lhs_allocs) <> (MemAllocs rhs_allocs)
| Just (lhs_head_allocs, Allocations lhs_m) <- List.unsnoc lhs_allocs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this semigroup operation is called a lot, it may be worth optimizing access to the tail of the list by using some double-sided data structure, like Data.Sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice, I believe this semigroup operation is really only used to cons new operations onto the front, so a list probably isn't too bad. Changing the datatype would also require changing all the functions that traverse MemAllocs, which currently use pattern matching on the list constructors. On the other hand, making that traversal code more abstract wouldn't be a bad thing.

In either case, if we switch MemAllocs to use Data.Sequence, we should probably do MemWrites as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favor, generally, of making those traversals and such more abstract to provide us with more flexibility in the future with representation choices. I don't know if it should be part of this PR or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revision 151e1f9 makes MemAllocs into an abstract datatype, so now all the traversals are done using the abstract interface provided by Lang.Crucible.LLVM.MemModel.MemLog.

Copy link
Contributor

@andreistefanescu andreistefanescu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a substantial improvement to how allocation info is stored, let's take this opportunity to add a few tests

crucible-llvm/src/Lang/Crucible/LLVM/MemModel/Generic.hs Outdated Show resolved Hide resolved
@brianhuffman
Copy link
Contributor Author

I could use some advice/suggestions for how to make some proper tests for this code.

@brianhuffman
Copy link
Contributor Author

It looks like the macos CI builds are failing in the same way that the saw-script builds were doing recently:

inplace-ghc8.6.5.dylib: unknown file type, first eight bytes: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

I think @jared-w managed to fix the problem for saw-script by changing some cache settings; we probably need to do the same for the crucible repo.

@hazelweakly
Copy link
Contributor

Unfortunately the only reliable thing I've been able to do so far is just remove the cache by changing the cache key in the github actions file and seeing if that does anything. I've gotten the cache as narrow as I can get it at this point.

One thing I have next on my list to try is to use cabal-cache which has more advanced caching logic than just grabbing all of dist-newbuild.

@andreistefanescu
Copy link
Contributor

Regarding tests, take a look at

testMemWritesIndexed = testCase "indexed memory writes" $ withMem BigEndian $ \sym mem0 -> do
. Let's add a few tests using doAlloc and/or the allocation API.

@robdockins
Copy link
Contributor

I agree that adding tests for this is worthwhile, but I don't think we should let this PR linger unmerged.

Brian Huffman added 8 commits November 6, 2020 18:11
Instead of recording allocations sequentially, we now store
collections of allocations together in a map indexed by block
ID. This makes lookups much faster in the common case where
the block ID we are checking is concrete.

Fixes #549.
All operations that traverse a MemAllocs log are now implemented
within the Lang.Crucible.LLVM.MemModel.MemLog module.
Internal invariants of this abstract datatype are moved to an
ordinary comment because they are not relevant to the exported API.
@brianhuffman
Copy link
Contributor Author

9a304ce adds a few simple tests alongside testMemWritesIndexed, as suggested.

@brianhuffman brianhuffman merged commit 4ec5674 into master Nov 9, 2020
@travitch travitch deleted the MemAllocs branch December 16, 2021 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

isAllocated is a performance bottleneck
4 participants