Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Starch config: Add aarch64 #108

Merged
merged 2 commits into from
Feb 9, 2021

Conversation

gtjoseph
Copy link

@gtjoseph gtjoseph commented Feb 8, 2021

  • Added aarch64 to dsp/starchgen.py and Makefile.
  • Regenerated files

* Added aarch64 to dsp/starchgen.py and Makefile.
* Regenerated files
dsp/starchgen.py Outdated Show resolved Hide resolved
@gtjoseph
Copy link
Author

gtjoseph commented Feb 8, 2021

When run in 64 bit mode, all you get for features is fp asimd evtstrm crc32 cpuid as opposed to the half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 you get in 32 bit mode. There's no neon or vfpv4 and no -mfpu option to gcc.

The arches available are armv8-a armv8.1-a armv8.2-a armv8.3-a and the default is the lowest compatibility (armv8-a) so I think the -march isn't needed.

@mutability
Copy link

Well, if there's nothing special needed to compile for aarch64 and it doesn't enable any features, then we don't need a special flavor for it, generic will be fine. (Maybe we should move -ffast-math to the generic flavor, it should be fine in all cases, we don't care about the non-exactness it can cause)

However I don't think that's true - aarch64 / armv8 support implies neon support, so we should have a flavor that enables the neon intrinsics in that case. (I have not tested the neon stuff under aarch64 at all - it may need some tweaking - but AFAIK the A64 neon instruction set is a superset of the A32 neon instruction set so I doubt any intrinsics will be missing)

@mutability
Copy link

The neon stuff at least compiles OK (runtime not tested) with -march=armv8-a+simd -mfpu=neon-fp-armv8

@mutability
Copy link

mutability commented Feb 8, 2021

So I think the way to go here is to have an armv8_neon flavor with:

features = ['neon']
compile_flags = ['-march=armv8-a+simd', '-mfpu=neon-fp-armv8', '-ffast-math']
test_function = 'cpu_supports_armv8_simd'

where cpu_supports_armv8_simd should test for Aarch64Info.features.asimd

and an aarch64 mix that includes generic + armv8_neon

@gtjoseph
Copy link
Author

gtjoseph commented Feb 8, 2021

Works for me. Coming up.

@gtjoseph
Copy link
Author

gtjoseph commented Feb 8, 2021

Well, almost. As I said, there are no -mfpu options for aarch64 so it'll just have to be -march=armv8-a+simd. I'm also going to try a flavor with sve2.

@gtjoseph
Copy link
Author

gtjoseph commented Feb 8, 2021

I think we've got a catch-22 situation here. Let's say you use -march=armv8-a+sve2 in starchgen and you add a test in cpu.c to test for aarch64_info()->features.sve2, on a processor that doesn't support it, the compile is going to fail because -march=armv8-a+sve2 was included in the compile command line. Doesn't the test for flags have to happen in a module that is compiled without the flags?

@mutability
Copy link

-mfpu works just fine on my (Raspbian buster) gcc here (and in fact it is necessary when using neon intrinsics)

pi@piaware:~ $ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/8/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Raspbian 8.3.0-6+rpi1' --with-bugurl=file:https:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp --with-float=hard --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1) 
pi@piaware:~ $ echo '#include <arm_neon.h>' >test.c
pi@piaware:~ $ gcc -march=armv8-a+simd -mfpu=neon-fp-armv8 -c test.c
pi@piaware:~ $ 

@mutability
Copy link

mutability commented Feb 8, 2021

I think we've got a catch-22 situation here.

There's no catch-22. Only the stuff in dsp/generated is compiled with special flags. cpu.c and the bulk of dump1090 itself are compiled without any -march etc.

The assumption is that for a given mix, the compiler is capable of generating all code for the flavors that make up the mix, even if the current CPU can't execute that code. (which is fine, given that gcc's code generation isn't affected by the choice of host machine, only by choice of the target)

@mutability
Copy link

Maybe this is an aarch64-native vs armv8-on-32-bit target issue. I do notice that gcc seems to be producing 32-bit object files even in armv8 mode. You may need to experiment yourself to find the correct set of compiler flags to get neon intrinsics etc working; I don't have an aarch64 system on hand to try it on.

@gtjoseph
Copy link
Author

gtjoseph commented Feb 8, 2021

gcc running natively on aarch64 does NOT support -mfpu.

gcc10 supports -march=armv8+sve2 but gcc8 only supports -march=armv8+sve. I added a flavor for armv8_sve2 with a -march=armv8-a+sve2 option along with the armv8_neon_simd flavor. When I compile on my rpi4 with native aarch64 and gcc8, I get...

cc -I. -DMODES_DUMP1090_VERSION=\"unknown\" -DMODES_DUMP1090_VARIANT=\"dump1090-fa\" -D_DEFAULT_SOURCE -DENABLE_CPUFEATURES -Icpu_features/include -DENABLE_RTLSDR -DSTARCH_MIX_AARCH64 -std=c11 -O3 -g -Wall -Wmissing-declarations -Werror -W -D_DEFAULT_SOURCE -fno-common -I/ -c -DSTARCH_MIX_AARCH64 -march=armv8-a+simd -ffast-math dsp/generated/flavor.armv8_neon_simd.c -o dsp/generated/flavor.armv8_neon_simd.o
cc -I. -DMODES_DUMP1090_VERSION=\"unknown\" -DMODES_DUMP1090_VARIANT=\"dump1090-fa\" -D_DEFAULT_SOURCE -DENABLE_CPUFEATURES -Icpu_features/include -DENABLE_RTLSDR -DSTARCH_MIX_AARCH64 -std=c11 -O3 -g -Wall -Wmissing-declarations -Werror -W -D_DEFAULT_SOURCE -fno-common -I/ -c -DSTARCH_MIX_AARCH64 -march=armv8-a+sve2 -ffast-math dsp/generated/flavor.armv8_sve2.c -o dsp/generated/flavor.armv8_sve2.o
cc1: error: invalid feature modifier in ‘-march=armv8-a+sve2’
make: *** [dsp/generated/makefile.aarch64:28: dsp/generated/flavor.armv8_sve2.o] Error 1

It's attempting to compile all the flavors before it even knows what flavors are valid.

@mutability
Copy link

mutability commented Feb 8, 2021

Well, yeah, that's how it works. If you tell starch to build a given mix, it'll build the flavors associated with the mix, that's how it's designed to work. If you need to build different combinations of flavors depending on the compiler in use, those would need to be separate mixes. See what I said above about the assumption that the compiler can build all flavors in the mix you request.

I deliberately did not put any sort of compiler/architecture detection into starch because it is a real can of worms; those decisions need to be made in the surrounding makefiles when selecting a mix to use.

The flavor is now armv8_neon_simd
@gtjoseph
Copy link
Author

gtjoseph commented Feb 8, 2021

Gotcha. Updated to use simd anyway.

@gtjoseph
Copy link
Author

gtjoseph commented Feb 8, 2021

Just FYI... It's almost impossible to rebase when there are starch changes because of conflicts in the generated files.

Copy link

@mutability mutability left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just need a wisdom.aarch64.

There's a lot of noise changes in the generated code that are just changes in iteration ordering, I'll take a look at making that more deterministic.

Comment on lines +57 to +61
gen.add_mix(name = 'aarch64',
description = 'AARCH64',
flavors = ['armv8_neon_simd', 'generic'],
wisdom_file = 'wisdom.aarch64')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a suitable wisdom.aarch64 to add?

@mutability
Copy link

Just FYI... It's almost impossible to rebase when there are starch changes because of conflicts in the generated files.

Maybe the the thing to do here is don't include the starch generated code changed in your PR; I can regenerate after merging.

@mutability mutability merged commit 5600d3f into flightaware:dev Feb 9, 2021
@mutability
Copy link

I went ahead and merged this since I have some starch changes about to land that would have caused a bunch of conflicts

@gtjoseph
Copy link
Author

gtjoseph commented Feb 9, 2021

I was going to ask you about how you wanted to handle wisdom generation. I did add new wisdom.aarch64.pi4b and wisdom.aarch64.tegra to the wisdom directory but wasn't sure if you wanted me to put the pi4b one in the top level directory.

@gtjoseph gtjoseph deleted the dev-fa-starch-aarch64 branch February 9, 2021 12:51
@mutability
Copy link

The files in the wisdom subdir are just for reference, they're not directly used.

wisdom.arm etc in the top level dir are read by starchgen to provide the default ordering of functions in the generated code, if no custom wisdom is provided at runtime. So we want a good default there. For arm (32-bit) I essentially took the top pi4 armv7-specific result and the top pi0w generic result and used those.

I put together a wisdom.aarch64 for now based on the Pi 4 aarch64 results, with the Pi 0W results as a fallback if for some reason the armv8/neon detection says that it isn't supported (I don't think this is possible with the armv8-a profile, but..). If you want you can run a generic-flavors-only benchmark (-F generic) on aarch64 to generate something more representative for the fallback case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants