
Optimize Rust impls #108

Merged · 2 commits · Apr 8, 2024
Conversation

ChillFish8
Contributor

Related to #107

Optimizes the native implementations so that the compiler can actually vectorize them despite the IEEE floating-point rules.

Although it is not the simplest version, it is more representative of a 'native' implementation written for maximum speed, closer to what you would get by going down to intrinsics like AVX.
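A minimal sketch of the idea (not the exact code in this PR, which may differ in lane count and tail handling): summing into several independent accumulators breaks the serial dependency chain that strict IEEE evaluation order would otherwise impose, so the compiler is free to map each accumulator onto its own SIMD lane.

```rust
// Hypothetical multi-accumulator cosine similarity, illustrating the
// unrolling trick. Because the four accumulators never depend on one
// another, the compiler can vectorize the loop without being asked to
// relax floating-point semantics.
fn cosine_unrolled(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = ([0.0f32; 4], [0.0f32; 4], [0.0f32; 4]);
    let chunks = a.len() / 4;
    for i in 0..chunks {
        for l in 0..4 {
            let (x, y) = (a[i * 4 + l], b[i * 4 + l]);
            dot[l] += x * y;
            na[l] += x * x;
            nb[l] += y * y;
        }
    }
    // Scalar tail for lengths not divisible by 4.
    for i in chunks * 4..a.len() {
        dot[0] += a[i] * b[i];
        na[0] += a[i] * a[i];
        nb[0] += b[i] * b[i];
    }
    let (d, x, y) = (
        dot.iter().sum::<f32>(),
        na.iter().sum::<f32>(),
        nb.iter().sum::<f32>(),
    );
    d / (x.sqrt() * y.sqrt())
}
```

The key point is that a naive `iter().zip().map().sum()` produces one long chain of dependent additions, which the compiler must execute in order under default floating-point rules.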

@ashvardanian
Owner

Hi @ChillFish8! Thanks for your contribution!
Indeed, your loop-unrolled variant is much faster than the naive Rust approach, even the procedural code.

     Running rust/benches/cosine.rs (target/release/deps/cosine-e0cccefbe212a606)
Gnuplot not found, using plotters backend
SIMD Cosine/SimSIMD/0   time:   [91.178 ns 91.296 ns 91.444 ns]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  4 (4.00%) high mild
  4 (4.00%) high severe
SIMD Cosine/Rust Procedural/0
                        time:   [793.02 ns 796.96 ns 802.25 ns]
Found 17 outliers among 100 measurements (17.00%)
  5 (5.00%) high mild
  12 (12.00%) high severe
SIMD Cosine/Rust Functional/0
                        time:   [794.70 ns 797.24 ns 801.14 ns]
Found 12 outliers among 100 measurements (12.00%)
  4 (4.00%) high mild
  8 (8.00%) high severe
SIMD Cosine/Rust Unrolled/0
                        time:   [208.64 ns 209.64 ns 211.12 ns]
Found 14 outliers among 100 measurements (14.00%)
  5 (5.00%) high mild
  9 (9.00%) high severe

I am mostly working on recent CPUs, and on Intel Sapphire Rapids SimSIMD currently wins thanks to AVX-512 support. I wouldn't expect much difference for f32 on AVX2-only machines. For other types, it may be noticeable. Maybe it makes sense to add benchmarks for i8; the wins can be very noticeable there 🤗

@ashvardanian ashvardanian merged commit 508e7a0 into ashvardanian:main-dev Apr 8, 2024
36 checks passed
ashvardanian pushed a commit that referenced this pull request Apr 8, 2024
# [4.3.0](v4.2.2...v4.3.0) (2024-04-08)

### Add

* `toBinary` for JavaScript ([1f1fd3a](1f1fd3a))

### Improve

* Procedural Rust benchmarks ([e01ec6c](e01ec6c))
* Unrolled Rust benchmarks (#108) ([508e7a0](508e7a0)), closes [#108](#108)
@ashvardanian
Owner

🎉 This PR is included in version 4.3.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

@ChillFish8
Contributor Author

@ashvardanian Do you have a rough idea of the performance difference between Intel Sapphire Rapids with AVX-512 and something like an AMD 7700 or EPYC chip? Just curious, since I develop mostly on AMD CPUs, which makes it a bit difficult to predict how performance goes on Intel chipsets.

@ashvardanian
Owner

@ChillFish8 on Zen4 most of AVX-512 is available, except for the FP16 extensions. Everything other than that should work great.

If you are on Zen3 or older, SimSIMD will use F16C extensions for FMA. They are quite slow, but still much better than serial code for half-precision, as modern compilers can't handle that type well. For single-precision you may not get any gains on older CPUs.

For int8, SimSIMD should work great on both old and new CPUs. That type is often used in heavily-quantized embedding models.
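An i8 kernel also illustrates why the gains can be large there: products and partial sums fit comfortably in i32, and integer addition is associative, so the compiler can reorder and vectorize the reduction freely without any of the floating-point ordering constraints. A hypothetical sketch (not SimSIMD's actual API or implementation):

```rust
// Hypothetical i8 cosine similarity. Accumulating in i32 avoids
// overflow (127 * 127 * len stays in range for realistic vector
// lengths), and since integer addition is associative the compiler
// may vectorize this reduction even without unrolling by hand.
fn cosine_i8(a: &[i8], b: &[i8]) -> f32 {
    assert_eq!(a.len(), b.len());
    let (mut dot, mut na, mut nb) = (0i32, 0i32, 0i32);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as i32, y as i32);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    dot as f32 / ((na as f32).sqrt() * (nb as f32).sqrt())
}
```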
