Can't generate instructions for ARMv7 NEON with GCC 7.4.1 #2276

Closed
zambony opened this issue Jul 22, 2024 · 4 comments

Comments

@zambony

zambony commented Jul 22, 2024

I'm new to SIMD usage, so I'm a bit unsure if I've missed something.

I am trying to cross-compile an application from my x86 Linux desktop to an SoC using ARMv7, but Highway's target detection seems to be selecting the scalar target (judging by the assembly output).

The SoC's info, from cat /proc/cpuinfo:

processor       : 0
model name      : ARMv7 Processor rev 0 (v7l)
BogoMIPS        : 666.66
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc09
CPU revision    : 0

processor       : 1
model name      : ARMv7 Processor rev 0 (v7l)
BogoMIPS        : 666.66
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x3
CPU part        : 0xc09
CPU revision    : 0

Hardware        : Xilinx Zynq Platform
Revision        : 0003

I'm using this in my CMake to enable NEON:

target_compile_options(application PRIVATE -march=armv7-a -mfpu=neon -mfloat-abi=hard)

My compiler:

arm-linux-gnueabihf-g++ (Linaro GCC 7.4-2019.02) 7.4.1 20181213 [linaro-7.4-2019.02 revision 56ec6f6b99cc167ff0c2f8e1a2eed33b1edc85d4]
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I have enabled -DHWY_CMAKE_ARM7:BOOL=ON, too.

However, HWY_NAMESPACE seems to resolve to N_SCALAR, as seen in the assembly:

simd::N_SCALAR::myFunction(...)

The performance of the SIMD code is also worse than the hand-written scalar version, so I am pretty sure it's not working correctly.

Have I missed some setup? Is this a limitation of the old version of GCC I'm using?

@johnplatts
Contributor

The HWY_NEON_WITHOUT_AES target on Armv7 requires support for VFPv4 in addition to NEON.
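Checked against the cpuinfo output in the report, the Features line lists neon and vfpv3 but not vfpv4. A minimal sketch of that check follows. The function names and the constant are hypothetical helpers for illustration, not part of Highway, and this is not how Highway itself performs detection.

```cpp
#include <sstream>
#include <string>

// Features value copied from the /proc/cpuinfo output in this issue.
const char kReportedFeatures[] =
    "half thumb fastmult vfp edsp neon vfpv3 tls vfpd32";

// Returns true if `flag` appears as a whole whitespace-separated token in a
// cpuinfo "Features" value. Whole-token matching matters here: "vfp" must
// not be treated as a match for "vfpv3".
bool HasCpuFeature(const std::string& features, const std::string& flag) {
  std::istringstream tokens(features);
  std::string token;
  while (tokens >> token) {
    if (token == flag) return true;
  }
  return false;
}

// Assumption for illustration: the requirement stated above is modeled as
// "both the neon and vfpv4 flags are present".
bool HasNeonAndVfpv4(const std::string& features) {
  return HasCpuFeature(features, "neon") && HasCpuFeature(features, "vfpv4");
}
```

Calling `HasNeonAndVfpv4(kReportedFeatures)` returns false for this SoC, because `vfpv4` is missing even though `neon` is present.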

@zambony
Author

zambony commented Jul 22, 2024

Oh, darn. Is this a limitation of Highway? As in, could I use the intrinsics manually? I'm trying to operate on a large dataset of doubles, so I was hoping to somehow leverage SIMD. If I have to write it by hand I guess I'll do it.

@jan-wassenberg
Member

You could try removing the check for HWCAP_VFPv4 in targets.cc and seeing if it works for your application.
I think the main things missing in VFPv3 are fp16 and FMA support. FMA is quite important, and I have seen VFPv3 so rarely that I doubt we'd remove the requirement. Also, Armv7 NEON does not have vectorized f64 instructions, which are exactly what you need - I think the SoC is too old for this use case :)
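To see what a given set of compiler flags actually enables, one option is to inspect the ACLE feature macros the compiler predefines. The sketch below is a rough illustration: the struct and function names are made up for this example, and Highway's own detection logic may use different conditions. It relies on `__ARM_NEON`, `__ARM_FEATURE_FMA`, and `__aarch64__`, treating the last as a stand-in for "vector f64 arithmetic is available", since `float64x2_t` arithmetic intrinsics exist only on AArch64.

```cpp
#include <string>

// Compile-time view of the SIMD features implied by the current flags.
// Names are illustrative only and not part of Highway.
struct ArmSimdSupport {
  bool neon;         // Advanced SIMD (NEON) intrinsics are available.
  bool fma;          // Fused multiply-add is available (VFPv4 on Armv7).
  bool f64_vectors;  // Vector double arithmetic (AArch64 only).
};

constexpr ArmSimdSupport DetectArmSimdSupport() {
  return ArmSimdSupport{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
      true,
#else
      false,
#endif
#if defined(__ARM_FEATURE_FMA)
      true,
#else
      false,
#endif
#if defined(__aarch64__)
      true
#else
      false
#endif
  };
}

// Formats the result as "neon=<0|1> fma=<0|1> f64_vectors=<0|1>".
std::string Describe(const ArmSimdSupport& s) {
  return std::string("neon=") + (s.neon ? "1" : "0") +
         " fma=" + (s.fma ? "1" : "0") +
         " f64_vectors=" + (s.f64_vectors ? "1" : "0");
}
```

Built with the flags from the report (`-march=armv7-a -mfpu=neon -mfloat-abi=hard`), `fma` and `f64_vectors` should both come out as 0, since plain `-mfpu=neon` selects a VFPv3-based FPU. Passing an FPU option that includes VFPv4 would change the macros but not the hardware, so it is not a fix for a CPU that lacks VFPv4.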

@zambony
Author

zambony commented Jul 23, 2024

> Also, ArmV7 does not have vectorized f64 instructions, which is exactly what you want - I think the SoC is too old for this use case :)

I see :( I'll have to come up with a different solution then. Thanks for the responses.

@zambony zambony closed this as completed Jul 23, 2024