Hmmm... I think this argument is solid. Albeit biased from GMP's perspective, bu...

serentty · on Dec 14, 2021

> RISC-V has a bunch of competing vector instructions.

There is only one standard V extension. Alibaba made a chip with a prerelease version of that V extension, which is thus incompatible with the final version. In practice that just means the vector unit on that chip goes unused because it is incompatible, not that there are now competing standards.

zik · on Dec 2, 2021

GMP is basically a worst-case example since it uses a lot of overflow. The RISC-V architecture has been extensively studied and for most cases it's a little more dense than (say) ARM when compared like-for-like.
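To make "uses a lot of overflow" concrete: the RISC-V base integer ISA has no carry flag, so a multi-word add has to recover each carry with an unsigned compare (`sltu`). Here is a rough Python model of one limb step, assuming 64-bit limbs. The function name is made up for illustration, and the comments only indicate which instruction each line roughly corresponds to.

```python
MASK64 = (1 << 64) - 1  # assumed limb width: 64 bits

def add_limb_no_flags(a, b, carry_in):
    """One limb of a bignum add on an ISA without a carry flag (illustrative)."""
    s1 = (a + b) & MASK64          # add
    c1 = int(s1 < a)               # sltu: wrapped sum is smaller than an operand
    s2 = (s1 + carry_in) & MASK64  # add the incoming carry
    c2 = int(s2 < s1)              # sltu: did adding the carry wrap?
    return s2, c1 | c2             # or: at most one of c1, c2 can be set

print(add_limb_no_flags(MASK64, 0, 1))
```

That is several ALU operations per limb where a flags-based ISA gets by with a single `adc`.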

throwaway81523 · on Dec 3, 2021

> So 7-instructions to perform 512-bits of bignum addition is 73-bits-per-clock cycle, far superior in speed to the 32-bits-per-clock cycle from add + adc (the 64-bit code with implicit condition codes).

add+adc should still be 64 bits per cycle. adc doesn't just add the carry bit, it's an add instruction which includes the usual operands, plus the carry bit from the previous add or adc.
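A quick Python model of what I mean, assuming 64-bit limbs stored least-significant first (the helper names are just for illustration). Every step consumes a full 64-bit limb from each operand plus the carry, so each `adc` retires 64 bits of the sum:

```python
MASK64 = (1 << 64) - 1  # assumed limb width: 64 bits

def adc(a, b, carry_in):
    """Model of add-with-carry: returns (sum mod 2**64, carry_out)."""
    total = a + b + carry_in
    return total & MASK64, total >> 64

def add_limbs(x, y):
    """Add two equal-length little-endian lists of 64-bit limbs."""
    result, carry = [], 0  # carry starts at 0, like the leading plain add
    for a, b in zip(x, y):
        s, carry = adc(a, b, carry)
        result.append(s)
    return result, carry

print(add_limbs([MASK64, 0], [1, 0]))
```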

Teknoman117 · on Dec 2, 2021

Can you treat the whole vector register as a single bignum on x86? If so, I totally missed that.

dragontamer · on Dec 2, 2021

No.

Which is why I'm sure add / adc will still win at 128-bits, or 256-bits.

The main issue is that the vector-add instructions are missing carry-out entirely, so recreating the carry will be expensive. But carry propagation across n limbs can be parallelized into about log2(n) steps, so for a big enough bignum (maybe 1024 bits) SIMD will probably be more efficient.
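Here is a sketch of that log2(n) carry trick in plain Python, with lists standing in for vector lanes. It is a Kogge-Stone style prefix over per-lane "generate" and "propagate" bits, not real AVX code; the function name and the 64-bit lane width are my assumptions.

```python
MASK64 = (1 << 64) - 1  # assumed lane width: 64 bits

def simd_style_add(x, y):
    """Add two equal-length little-endian limb lists, resolving carries
    with a parallel-prefix pass. Returns (limbs, carry_out, prefix_steps)."""
    n = len(x)
    if n == 0:
        return [], 0, 0
    # Lane-wise add with no carry, which is all a vector add gives you.
    sums = [(a + b) & MASK64 for a, b in zip(x, y)]
    gen = [s < a for s, a in zip(sums, x)]   # lane overflowed by itself
    prop = [s == MASK64 for s in sums]       # lane would pass a carry along
    steps, d = 0, 1
    while d < n:                             # about log2(n) rounds
        new_gen, new_prop = gen[:], prop[:]
        for i in range(d, n):                # each round is lane-parallel
            new_gen[i] = gen[i] or (prop[i] and gen[i - d])
            new_prop[i] = prop[i] and prop[i - d]
        gen, prop = new_gen, new_prop
        d *= 2
        steps += 1
    # gen[i] is now the carry out of lane i; shift it up by one lane.
    carry_in = [0] + [int(g) for g in gen[:-1]]
    result = [(s + c) & MASK64 for s, c in zip(sums, carry_in)]
    return result, int(gen[-1]), steps

print(simd_style_add([MASK64] * 8, [1] + [0] * 7))
```

For 8 lanes (512 bits) the prefix pass takes 3 rounds, but each round is several vector ops, which is why I'd expect the plain add / adc chain to stay ahead at small sizes.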

expnkx · on Dec 5, 2021

even AVX512 dies