✨ support f16 + 🧹 some minor refactoring #1

jvdd · 2022-11-05T09:15:51Z

This PR does the following;

P.S. f16 is supported by converting it to (what I call) i16ord, which is an ordinal (i.e., monotonic) mapping of f16 to i16:

ord_transform(v: i16) = ((v >> 15) & 0x7FFF) ^ v

(to apply this to an f16, just transmute the f16 to i16 first)

🙌 As ordinality is preserved, we can use fast built-in i16 (SIMD) instructions for comparison.
↔️ As the transformation is symmetric we can - as long as we don't change the i16(ord) values - transform the outcome back to f16 without needing a lookup table.
⚡ (bonus): transformation only performs binary (bitwise) operations, ensuring minimal overhead
=> these operations can easily be implemented in SIMD instructions 🎉
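
A minimal scalar sketch of the transformation described above, written against raw IEEE 754 binary16 bit patterns (`u16`) so it needs no external crate. The helper names `f16_bits_to_ord` and `ord_to_f16_bits` are hypothetical and only for illustration; they are not necessarily what this PR uses. The sketch relies on `>>` being an arithmetic (sign-extending) shift for `i16` in Rust.

```rust
/// The ord_transform formula from the PR description.
fn ord_transform(v: i16) -> i16 {
    // (v >> 15) is 0 for non-negative v and -1 (all ones) for negative v.
    // Masking with 0x7FFF leaves the sign bit untouched, so the XOR flips
    // only the lower 15 bits, and only for negative inputs.
    ((v >> 15) & 0x7FFF) ^ v
}

/// Hypothetical helper: reinterpret f16 bits as i16, then map to i16ord.
fn f16_bits_to_ord(bits: u16) -> i16 {
    ord_transform(bits as i16)
}

/// Hypothetical helper: map an i16ord value back to f16 bits.
/// The same transform is applied again, no lookup table needed.
fn ord_to_f16_bits(ord: i16) -> u16 {
    ord_transform(ord) as u16
}
```

For example, the bit patterns `0xC000` (-2.0), `0xBC00` (-1.0) and `0x3C00` (1.0) map to increasing i16 values, and `ord_to_f16_bits(f16_bits_to_ord(bits))` returns `bits` unchanged.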

Visualization of the transformation

Illustration of ord_transform on all possible float16 numbers.
You can observe the monotonic rising slope 🥳

Illustration of the symmetry property.
When applying the ord_transform twice on the same value, we get back the original value!!
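
Both properties can also be checked exhaustively, since there are only 65536 bit patterns. The sketch below is an illustration under stated assumptions, not code from this PR: `f16_bits_to_f32` is a hypothetical std-only decoder for binary16 bit patterns, and NaN patterns are excluded from the monotonicity check because they have no order.

```rust
fn ord_transform(v: i16) -> i16 {
    ((v >> 15) & 0x7FFF) ^ v
}

/// Hypothetical helper: decode an IEEE 754 binary16 bit pattern to f32.
fn f16_bits_to_f32(bits: u16) -> f32 {
    let sign = if bits & 0x8000 != 0 { -1.0_f32 } else { 1.0_f32 };
    let exp = ((bits >> 10) & 0x1F) as i32;
    let frac = (bits & 0x03FF) as f32;
    let mag = match exp {
        // Subnormals and zero: frac * 2^-10 * 2^-14.
        0 => frac * 2.0_f32.powi(-24),
        // All-ones exponent: infinity or NaN.
        0x1F => if frac == 0.0 { f32::INFINITY } else { f32::NAN },
        // Normal numbers: implicit leading one, exponent bias 15.
        _ => (1.0 + frac / 1024.0) * 2.0_f32.powi(exp - 15),
    };
    sign * mag
}

/// Sort every bit pattern by its i16ord value and check that the decoded
/// (non-NaN) floats never decrease.
fn is_monotonic() -> bool {
    let mut all: Vec<u16> = (0..=u16::MAX).collect();
    all.sort_by_key(|&b| ord_transform(b as i16));
    let vals: Vec<f32> = all
        .iter()
        .map(|&b| f16_bits_to_f32(b))
        .filter(|x| !x.is_nan())
        .collect();
    vals.windows(2).all(|w| w[0] <= w[1])
}

/// Applying the transform twice must return the original value.
fn is_involution() -> bool {
    (i16::MIN..=i16::MAX).all(|v| ord_transform(ord_transform(v)) == v)
}
```

Note that -0.0 and +0.0 map to adjacent ordinal values (-1 and 0), which is consistent with the ordering because the two compare equal as floats.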

The f16 support that leverages the ord_transform:

f16 SIMD ~ 2x faster than f32 SIMD 🔥
f16 scalar ~ 1.25x slower than f32 scalar (😮‍💨)
- 🐎 ~10x faster than generic scalar code (which f32 uses) on half::f16
- 🤯 ~3x faster than f32 upcasting (i.e., replacing ord_transform with to_f32 in the implementation)
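
To make the idea concrete, here is a rough scalar sketch of how an argmin/argmax over f16 data could compare i16ord values instead of floats. It is not the implementation from this PR (which also has SIMD paths): the function names are hypothetical, the input is taken as raw binary16 bit patterns, and it assumes the data contains no NaNs.

```rust
fn ord_transform(v: i16) -> i16 {
    ((v >> 15) & 0x7FFF) ^ v
}

/// Hypothetical scalar argmin over f16 bit patterns (assumes no NaNs).
/// Only integer comparisons are used; returns None for empty input.
fn argmin_f16_bits(bits: &[u16]) -> Option<usize> {
    bits.iter()
        .enumerate()
        .min_by_key(|&(_, &b)| ord_transform(b as i16))
        .map(|(i, _)| i)
}

/// Hypothetical scalar argmax over f16 bit patterns (assumes no NaNs).
fn argmax_f16_bits(bits: &[u16]) -> Option<usize> {
    bits.iter()
        .enumerate()
        .max_by_key(|&(_, &b)| ord_transform(b as i16))
        .map(|(i, _)| i)
}
```

With the input `[0x3C00, 0xC000, 0xBC00, 0x4000]` (1.0, -2.0, -1.0, 2.0), the argmin is index 1 and the argmax is index 3.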

jvdd added 6 commits October 6, 2022 11:14

♻️ faster scalar implementation + no Option output ⚡

633793f

🔥 support f16 efficiently

6e1a3f6

🧹 rename scalar

6b6e14d

🧹

768a6df

♻️ format

ba60c71

🤖 add CI-CD

f7e3097

jvdd merged commit f2d036f into main Nov 5, 2022

jvdd deleted the refactoring branch November 16, 2022 20:22

This was referenced Feb 4, 2023

🚧 POC - support NaNs for SSE & AVX2 f32 #18

Closed

💪 handle NaNs #16

Closed

jvdd mentioned this pull request Feb 26, 2023

🚀 float NaN handling #21

Merged

23 tasks