Implement SHA256 SIMD intrinsics on x86 #3752

Kixunil · 2024-07-17T14:19:51Z

Disclaimer: this is my first contribution to miri's code. It's quite possible I'm missing something. This code works but may not be the cleanest/best possible.

It'd be useful to be able to verify code implementing SHA256 using SIMD since such code is a bit more complicated and at some points requires use of pointers. Until now miri didn't support x86 SHA256 intrinsics. This commit implements them.

Kixunil · 2024-07-17T14:29:50Z

How do I properly convey that I'm sure side effects don't happen? Just allow the lint?

RalfJung · 2024-07-18T09:47:27Z

Thanks for the PR! There are some other PRs we have to review before we get to this one, so unfortunately it may take a bit -- sorry for that.

From a quick glance, it seems like this PR actually implements sha256 in Miri. Ideally I'd like to avoid being responsible for cryptographic code. Is there a small crate, akin to aes, that has a suitable license for us to be able to use it here as a sha256 implementation?

If we do carry a sha256 implementation, I would prefer one that was not written by ChatGPT, but by a human that actually understands sha256. Maybe there's a larger Rust crypto crate that we can copy an implementation from.

apoelstra · 2024-07-18T15:55:42Z

@RalfJung there is the bitcoin_hashes crate which you're welcome to steal code from (it is CC0 licensed) which is written by professional human cryptographers. You probably don't want to depend on it because it does a bunch of other stuff (and you'll get complaints about having bitcoin in a dependency name).

There is also the sha2 crate written by a different set of professional human cryptographers, which is MIT/Apache licensed. And you could use it as a direct dependency (there is a good chance it is already in your dep tree somewhere) since it's narrowly focused and maintained. Though again, it does more than SHA256 which is really the only thing you want.

apoelstra · 2024-07-18T15:59:39Z

I'll also add, that while your aversion to carrying/maintaining crypto code is well-placed, hashes are a low-burden thing to maintain for a few reasons. By their nature you don't need to worry about constant-timeness so you can write them once and forget about the code forever (especially in something like Miri where you aren't going to want to go microoptimizing this). And also by their nature it's super hard to have subtle correctness bugs; pretty-much any bug will result in every single output being visibly totally wrong. Finally hash functions are pretty small self-contained pure algorithms.

RalfJung · 2024-07-18T17:06:04Z

Thanks, those are helpful comments!

Kixunil · 2024-07-18T17:11:00Z

There are some other PRs we have to review before we get to this one, so unfortunately it may take a bit -- sorry for that.

No problem! There's no rush, we can run my fork for now.

it seems like this PR actually implements sha256 in Miri

Not really, just the three primitives that need to be used together with a bunch of other code to actually implement sha256. I also suspect it's not trivial to just pull out these from existing impls - at least I tried and the code looked so different that I couldn't do it. That's why I gave up and translated the Intel's psueudocode which literally describe how the specific instructions work.

Ideally I'd like to avoid being responsible for cryptographic code.

Isn't miri only supposed to be used for testing (or rather sanitizing) the code? I don't believe anyone would ever run this in production where it would actually matter. The SHA256 tests that take under a second to run in debug mode run several minutes under miri. Further we can add a big fat warning about this.

Anyway, everything @apoelstra said is true. The code is constant-time and any single tiny mistake would just completely break it - as I experienced this first hand. It also happens to make debugging the code incredibly hard. (I had to put dbg! statements both into miri and into the tested code to figure out what was wrong.)

For reference this is what the actual sha256 implementation using SIMD looks like: https://github.com/rust-bitcoin/rust-bitcoin/blob/8804fa63b49dc5ee639e768b73650b46a5e0a8ab/hashes/src/sha256.rs#L494-L722

RalfJung · 2024-07-21T18:28:26Z

Not really, just the three primitives that need to be used together with a bunch of other code to actually implement sha256. I also suspect it's not trivial to just pull out these from existing impls - at least I tried and the code looked so different that I couldn't do it. That's why I gave up and translated the Intel's psueudocode which literally describe how the specific instructions work.

So the sha2 crate wouldn't be useful? That seems like the closest thing.
Cc @newpavlov -- could I ask for your help with this PR?

Isn't miri only supposed to be used for testing (or rather sanitizing) the code? I don't believe anyone would ever run this in production where it would actually matter. The SHA256 tests that take under a second to run in debug mode run several minutes under miri. Further we can add a big fat warning about this.

Yes, but that doesn't change the fact that Miri has a small set of maintainers, and we can't just ship code we all have no clue about -- that's not living up to our responsibility as maintainers. It's not about performance, it's about the fact that I can't check whether your code makes any sense and I don't have the time to become an expert in every problem domain that is covered by an Intel intrinsic.

"Code was translated by Chat GPT" also raises some very big alarm bells, since it seems to indicate that you also didn't check whether the code makes any sense. At the very least I would expect a link to the Intel docs, and a statement by you as PR author that you are sure that the Chat GPT translation is correct. Someone from the Miri team will have to double-check this before we can land this (if we can't find an alternative to carrying this code ourselves), but it's not reasonable to ask us to be the first humans to check this.

And of course the PR needs to come with some tests.

src/shims/x86/sha.rs

newpavlov · 2024-07-22T05:48:07Z

We probably could handle emulation of cryptographic intrinsics on the RustCrypto side. For example, we have a similar request for AES: RustCrypto/block-ciphers#389

But, personally, I would prefer to do it in separate crates, since having an emulation of SHA2 hardware intrinsics in the sha2 crate is not worthwhile in my opinion.

@tarcieri WDYT?

Kixunil · 2024-07-22T07:22:49Z

So the sha2 crate wouldn't be useful?

Looking at its code it does actually have the same primitives inside so copying them would be arguably better but they still need a bit of wiring code - the same there is at the beginning that just shuffles values in arrays.

I can't check whether your code makes any sense and I don't have the time to become an expert in every problem domain that is covered by an Intel intrinsic.

OK, fair. How can we solve it then? The code should exactly (bug-for-bug) emulate the Intel's intrinsic and the intrinsic has documented pseudo code implementation should just comparing the implementation be enough? Or is there anything else I can do?

"Code was translated by Chat GPT" also raises some very big alarm bells, since it seems to indicate that you also didn't check whether the code makes any sense.

I did test it by running bitcoin_hashes test suite in miri which passed perfectly and due to the property of hashes that any tiny mistake causes all the tests to reliably fail I'm confident it has the same behavior as the real instructions. The only thing I didn't yet check is if there are performance optimizations left. I thought this is not urgent.

If this is not sufficient please suggest specific steps I could do and I'll do it.

but it's not reasonable to ask us to be the first humans to check this.

That's absolutely something I wouldn't make you do, I'm sorry if it looked like I did. That's what I meant by "this code works". The disclaimer is about the style of the code. The things I'm unsure about is the miri part - does the way I do loads and stores make sense? I have no clue about miri internals, I just looked how other intrinsics do it and copied it. This is something I need guidance in.

And of course the PR needs to come with some tests.

Of course, I first wanted to see if my approach makes sense so that I wouldn't need to refactor the test (running SHA256 is the ultimate test). I'm also not sure what kind of testing strategy would be most appropriate. I kinda wish to just run the real instructions on real hardware with random inputs and compare the results but I don't know if this is a viable solution. Or just bake in the real SHA256 and run it on a few inputs. Of course I can also grab a bunch of random vectors and bake them in.

RalfJung · 2024-07-22T09:31:47Z

Looking at its code it does actually have the same primitives inside so copying them would be arguably better but they still need a bit of wiring code - the same there is at the beginning that just shuffles values in arrays.

Sounds like a good idea.

OK, fair. How can we solve it then? The code should exactly (bug-for-bug) emulate the Intel's intrinsic and the intrinsic has documented pseudo code implementation should just comparing the implementation be enough? Or is there anything else I can do?

Manually comparing Intel's pseudocode with the Miri code would be a way to get additional confidence on top of the testing.

The things I'm unsure about is the miri part - does the way I do loads and stores make sense? I have no clue about miri internals, I just looked how other intrinsics do it and copied it. This is something I need guidance in.

We don't usually load things into arrays on the host side, we just process them element-by-element, but here that doesn't seem to make sense so what you did looks reasonable.

Of course, I first wanted to see if my approach makes sense so that I wouldn't need to refactor the test (running SHA256 is the ultimate test). I'm also not sure what kind of testing strategy would be most appropriate. I kinda wish to just run the real instructions on real hardware with random inputs and compare the results but I don't know if this is a viable solution. Or just bake in the real SHA256 and run it on a few inputs. Of course I can also grab a bunch of random vectors and bake them in.

We should definitely have some test vectors directly checking the intrinsic, yes. If there are any corner cases, they should be covered here.

How big is a (not performance-oriented) sha256 implementation on top of these intrinsics? If that's small enough to put in a test, then we can also have that in addition.

src/shims/x86/sha.rs

Kixunil · 2024-07-22T12:06:16Z

OK, I'll look into it later, I have currently some urgent things to do.

RalfJung · 2024-08-17T14:30:08Z

@rustbot author
based on the last comment

Regarding the decision about how to implement these intrinsics; @rust-lang/miri see the discussion above -- what do you think?

Kixunil · 2024-08-17T15:47:59Z

@rustbot ready

Funny how you've looked at it at around the same time I did. :)

I've pushed an updated version that copies stuff from RustCrypto. I have addressed all your review comments including adding tests. The tests are somewhat large so LMK if I should remove anything.

RalfJung · 2024-08-17T15:55:04Z

Thanks!

Regarding the tests, it seems I should have been more clear -- what I meant is tests in tests/pass that invoke the intrinsics from a Rust program that is interpreted by Miri. We want to know that the intrinsic actually works, after all. :) Unit tests inside src are not really necessary.

Kixunil · 2024-08-17T18:10:16Z

I felt like something is off... Anyway, it's moved now.

src/shims/x86/sha.rs

saethlin · 2024-08-20T03:27:58Z

@rust-lang/miri see the discussion above -- what do you think?

I am similarly opposed to checking in any code that comes from ChatGPT. In addition to the issues with its accuracy and our ability to read the code we are checking in, it seems like the project has no regard for the licensing of its training set.

As to reviewing the implementation, I also cannot vouch for its correctness. I might be able to, given some time to learn SHA256.

That being said, I think there's a pretty low chance that the implementation here is subtly broken if it passes a few tests. The implementation has no branches and I don't see any edge cases, so just a handful of test cases should provide full coverage.

Kixunil · 2024-08-20T05:15:34Z

I am similarly opposed to checking in any code that comes from ChatGPT.

This is no longer relevant, the implementation now copies sha2 crate.

As to reviewing the implementation, I also cannot vouch for its correctness.

Since it's a copy of sha2 we can just rely on them.

That being said, I think there's a pretty low chance that the implementation here is subtly broken if it passes a few tests.

It's pretty much impossible to write an incorrect sha256 since any mistake will produce a completely different output. It's the same property as for the input.

RalfJung

LGTM, I just got some nits. :)

src/shims/x86/sha.rs

RalfJung · 2024-08-20T10:22:56Z

src/shims/x86/sha.rs

+ add(v0, sigma0x4(sha256load(v0, v1)))
+}
+
+fn sha256msg2_epu32(v4: [u32; 4], v3: [u32; 4]) -> [u32; 4] {


Please rename to sha256msg2.

src/shims/x86/sha.rs

tests/pass/shims/x86/intrinsics-sha.rs

It'd be useful to be able to verify code implementing SHA256 using SIMD since such code is a bit more complicated and at some points requires use of pointers. Until now `miri` didn't support x86 SHA256 intrinsics. This commit implements them.

Kixunil · 2024-08-20T13:05:48Z

Addressed the nits.

RalfJung · 2024-08-20T13:22:31Z

Looks good, thanks. :)

@bors r+

bors · 2024-08-20T13:22:34Z

📌 Commit 7949e92 has been approved by RalfJung

It is now in the queue for this repository.

bors · 2024-08-20T13:23:41Z

⌛ Testing commit 7949e92 with merge dcd2112...

bors · 2024-08-20T13:48:52Z

☀️ Test successful - checks-actions
Approved by: RalfJung
Pushing dcd2112 to master...

Kixunil · 2024-08-20T14:24:49Z

I can't find any information about the release schedule. Will this be available in the next nightly or will I have to wait longer?

RalfJung · 2024-08-20T14:34:35Z

It will be available once Miri get synced into the rustc repo. I usually do this each weekend, though this weekend I am traveling so it might happen later.

Kixunil mentioned this pull request Jul 17, 2024

Add software impls of sha intrinsics for miri rust-bitcoin/rust-bitcoin#3066

Closed

RalfJung reviewed Jul 21, 2024

View reviewed changes

src/shims/x86/sha.rs Outdated Show resolved Hide resolved

src/shims/x86/sha.rs Outdated Show resolved Hide resolved

RalfJung reviewed Jul 22, 2024

View reviewed changes

src/shims/x86/sha.rs Outdated Show resolved Hide resolved

rustbot added the S-waiting-on-author Status: Waiting for the PR author to address review comments label Aug 17, 2024

Kixunil force-pushed the simd-sha256 branch 3 times, most recently from de228a3 to 9f737d0 Compare August 17, 2024 15:18

rustbot added S-waiting-on-review Status: Waiting for a review to complete and removed S-waiting-on-author Status: Waiting for the PR author to address review comments labels Aug 17, 2024

Kixunil force-pushed the simd-sha256 branch from 9f737d0 to 99f66a6 Compare August 17, 2024 17:45

RalfJung reviewed Aug 18, 2024

View reviewed changes

src/shims/x86/sha.rs Show resolved Hide resolved

Kixunil force-pushed the simd-sha256 branch from 99f66a6 to 2151e4a Compare August 18, 2024 12:19

RalfJung reviewed Aug 20, 2024

View reviewed changes

Implement SHA256 SIMD intrinsics on x86

7949e92

It'd be useful to be able to verify code implementing SHA256 using SIMD since such code is a bit more complicated and at some points requires use of pointers. Until now `miri` didn't support x86 SHA256 intrinsics. This commit implements them.

Kixunil force-pushed the simd-sha256 branch from 2151e4a to 7949e92 Compare August 20, 2024 13:05

bors merged commit dcd2112 into rust-lang:master Aug 20, 2024
8 checks passed

Kixunil deleted the simd-sha256 branch August 20, 2024 14:19

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement SHA256 SIMD intrinsics on x86 #3752

Implement SHA256 SIMD intrinsics on x86 #3752

Kixunil commented Jul 17, 2024

Kixunil commented Jul 17, 2024 •

edited

Loading

RalfJung commented Jul 18, 2024

apoelstra commented Jul 18, 2024

apoelstra commented Jul 18, 2024

RalfJung commented Jul 18, 2024

Kixunil commented Jul 18, 2024 •

edited

Loading

RalfJung commented Jul 21, 2024

newpavlov commented Jul 22, 2024 •

edited

Loading

Kixunil commented Jul 22, 2024 •

edited

Loading

RalfJung commented Jul 22, 2024

Kixunil commented Jul 22, 2024

RalfJung commented Aug 17, 2024

Kixunil commented Aug 17, 2024

RalfJung commented Aug 17, 2024

Kixunil commented Aug 17, 2024

saethlin commented Aug 20, 2024

Kixunil commented Aug 20, 2024

RalfJung left a comment

RalfJung Aug 20, 2024

Kixunil commented Aug 20, 2024

RalfJung commented Aug 20, 2024

bors commented Aug 20, 2024

bors commented Aug 20, 2024

bors commented Aug 20, 2024

Kixunil commented Aug 20, 2024 •

edited

Loading

RalfJung commented Aug 20, 2024

Implement SHA256 SIMD intrinsics on x86 #3752

Implement SHA256 SIMD intrinsics on x86 #3752

Conversation

Kixunil commented Jul 17, 2024

Kixunil commented Jul 17, 2024 • edited Loading

RalfJung commented Jul 18, 2024

apoelstra commented Jul 18, 2024

apoelstra commented Jul 18, 2024

RalfJung commented Jul 18, 2024

Kixunil commented Jul 18, 2024 • edited Loading

RalfJung commented Jul 21, 2024

newpavlov commented Jul 22, 2024 • edited Loading

Kixunil commented Jul 22, 2024 • edited Loading

RalfJung commented Jul 22, 2024

Kixunil commented Jul 22, 2024

RalfJung commented Aug 17, 2024

Kixunil commented Aug 17, 2024

RalfJung commented Aug 17, 2024

Kixunil commented Aug 17, 2024

saethlin commented Aug 20, 2024

Kixunil commented Aug 20, 2024

RalfJung left a comment

Choose a reason for hiding this comment

RalfJung Aug 20, 2024

Choose a reason for hiding this comment

Kixunil commented Aug 20, 2024

RalfJung commented Aug 20, 2024

bors commented Aug 20, 2024

bors commented Aug 20, 2024

bors commented Aug 20, 2024

Kixunil commented Aug 20, 2024 • edited Loading

RalfJung commented Aug 20, 2024

Kixunil commented Jul 17, 2024 •

edited

Loading

Kixunil commented Jul 18, 2024 •

edited

Loading

newpavlov commented Jul 22, 2024 •

edited

Loading

Kixunil commented Jul 22, 2024 •

edited

Loading

Kixunil commented Aug 20, 2024 •

edited

Loading