Build failure in OpenBLAS #44517

Closed
fxcoudert opened this issue Mar 8, 2022 · 12 comments · Fixed by #45409 or #45391
Labels
domain:building (Build system, or building Julia or its dependencies); external dependencies (Involves LLVM, OpenBLAS, or other linked libraries); system:apple silicon (Affects Apple Silicon only (Darwin/ARM64), e.g. M1 and other M-series chips)

Comments

@fxcoudert
Contributor

Default build of OpenBLAS on macOS ARM (Apple Silicon) within Julia:

make prefix=/tmp/toto VERBOSE=1 USE_BINARYBUILDER=0 PYTHON=python3 MACOSX_VERSION_MIN=12 USE_SYSTEM_LLVM=1 USE_SYSTEM_LIBUNWIND=1

gives the following build error:

gfortran -mmacosx-version-min=12 -mcpu=apple-a12 -Wa,-q -O2 -fPIC  -fdefault-integer-8 -O2 -fdefault-integer-8 -Wall -frecursive -fno-optimize-sibling-calls -fPIC -march=armv8-a  -c -o spotrf2.o spotrf2.f
f951: Error: unknown value 'apple-a12' for '-mcpu'
f951: note: valid arguments are: cortex-a34 cortex-a35 cortex-a53 cortex-a57 cortex-a72 cortex-a73 thunderx thunderxt88p1 thunderxt88 octeontx octeontx81 octeontx83 thunderxt81 thunderxt83 emag xgene1 falkor qdf24xx exynos-m1 phecda thunderx2t99p1 vulcan thunderx2t99 cortex-a55 cortex-a75 cortex-a76 cortex-a76ae cortex-a77 cortex-a78 cortex-a78ae cortex-a78c cortex-a65 cortex-a65ae cortex-x1 ares neoverse-n1 neoverse-e1 octeontx2 octeontx2t98 octeontx2t96 octeontx2t93 octeontx2f95 octeontx2f95n octeontx2f95mm a64fx tsv110 thunderx3t110 zeus neoverse-v1 saphira neoverse-n2 cortex-a57.cortex-a53 cortex-a72.cortex-a53 cortex-a73.cortex-a35 cortex-a73.cortex-a53 cortex-a75.cortex-a55 cortex-a76.cortex-a55 cortex-r82 generic
make[4]: *** [spotrf2.o] Error 1
make[3]: *** [lapacklib] Error 2
make[2]: *** [netlib] Error 2
*** Clean the OpenBLAS build with 'make -C deps clean-openblas'. Rebuild with 'make OPENBLAS_USE_THREAD=0' if OpenBLAS had trouble linking libpthread.so, and with 'make OPENBLAS_TARGET_ARCH=NEHALEM' if there were errors building SandyBridge support. Both these options can also be used simultaneously. ***

I tried passing OPENBLAS_TARGET_ARCH=VORTEX to make as an additional parameter, which is passed down to openblas as make TARGET=VORTEX. But that does not appear to solve the issue.

@giordano added the domain:building and external dependencies labels on Mar 8, 2022
@ViralBShah added the system:apple silicon label on Mar 8, 2022
@giordano
Contributor

giordano commented Mar 8, 2022

I presume apple-a12 is coming from

julia/Make.inc

Line 905 in f731c38

MCPU:=apple-a12
It's unfortunate GCC doesn't support (yet?) the same set of CPU names as LLVM.
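
To illustrate the mismatch described above, here is a minimal sketch of filtering a flag list before it reaches a GCC-based compiler. The function name `strip_apple_mcpu` is hypothetical and is not part of Julia's or OpenBLAS's build system; it only assumes that `-mcpu=apple-*` values are accepted by LLVM-based compilers and rejected by the gfortran shown in the log.

```shell
# Hypothetical helper, for illustration only: drop -mcpu=apple-* tokens
# (LLVM CPU names that the gfortran in the log rejects) and keep the rest.
strip_apple_mcpu() {
    out=""
    for flag in "$@"; do
        case "$flag" in
            -mcpu=apple-*) ;;                   # skip LLVM-only CPU names
            *) out="${out:+$out }$flag" ;;      # append, space-separated
        esac
    done
    printf '%s\n' "$out"
}

# Flags taken from the failing gfortran command line above.
strip_apple_mcpu -mmacosx-version-min=12 -mcpu=apple-a12 -O2 -fPIC -march=armv8-a
# prints: -mmacosx-version-min=12 -O2 -fPIC -march=armv8-a
```

The remaining `-march=armv8-a` is a value GCC does accept, so the compiler still gets a target hint. Whether dropping the flag or mapping it to a GCC name is the better fix is a separate design question.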

@fxcoudert
Contributor Author

Even if the -mcpu option is forcibly removed, the build of openblas from source on macOS does not succeed:

/private/tmp/julia/usr/bin/objconv @objconv.def ../libopenblas64__armv8p-r0.3.20.a ../libopenblas64__armv8p-r0.3.20.a.osx.renamed
make[3]: /private/tmp/julia/usr/bin/objconv: No such file or directory
make[3]: *** [../libopenblas64__armv8p-r0.3.20.a.osx.renamed] Error 1
make[2]: *** [shared] Error 2
*** Clean the OpenBLAS build with 'make -C deps clean-openblas'. Rebuild with 'make OPENBLAS_USE_THREAD=0' if OpenBLAS had trouble linking libpthread.so, and with 'make OPENBLAS_TARGET_ARCH=NEHALEM' if there were errors building SandyBridge support. Both these options can also be used simultaneously. ***
make[1]: *** [scratch/openblas-0b678b19dc03f2a999d6e038814c4c50b9640a4e/build-compiled] Error 1
make: *** [julia-deps] Error 2

The Julia makefile passes OBJCONV=/private/tmp/julia/usr/bin/objconv to make, but that file does not exist. There is a deps/scratch/objconv/objconv, which is then copied to usr/tools/objconv, but not to usr/bin/objconv 🤷
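
A quick way to confirm where the binary actually landed is to probe the candidate directories named above. This is a sketch only: `find_tool` is a hypothetical helper, and the relative paths assume the script is run from the top of the Julia source tree.

```shell
# Hypothetical diagnostic helper: print the first candidate directory that
# holds an executable file with the given name; return 1 if none does.
find_tool() {
    tool=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$tool" ] && [ -x "$dir/$tool" ]; then
            printf '%s\n' "$dir/$tool"
            return 0
        fi
    done
    return 1
}

# Candidate locations are the ones mentioned in this comment.
find_tool objconv usr/bin usr/tools deps/scratch/objconv \
    || echo "objconv not found in any candidate directory" >&2
```

With the layout described in this comment, the expectation is that the helper reports the `usr/tools` copy rather than `usr/bin`.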

@giordano
Contributor

giordano commented May 5, 2022

@fxcoudert I understand your frustration and much appreciate your willingness to help, but I'd just like to ask you to bear in mind that all the problems you're facing are because the build from source is barely tested, if at all, by developers and users, as you have probably figured out by now. Apple Silicon in particular is even more problematic because only a handful of developers have access to this system. We prefer instead to reuse the binaries built in Yggdrasil, precisely so that nobody, user or developer, has to jump through all the hoops of building complicated libraries on dozens of different platforms and even more configurations. With BinaryBuilder we build only once, in a single controlled environment, for all supported platforms, and then no one else needs to fight with stubborn build systems, niche operating systems, disappearing versions of python and such. [Side note: it isn't advertised anywhere because we don't have strong guarantees, but in practice builds with BinaryBuilder are usually reproducible and verifiable independently by other users using the same tool, which at least should relieve people worried about what nasty bits we may put into those prebuilt binaries.] This also considerably cuts down CI time for the project, as we don't need to rebuild large binary dependencies like LLVM for every single commit pushed to the master branch or to pull requests.

Supporting a full build from source would be great, but we don't have enough resources to keep up with CI for that code path, which hopefully explains all of your issues and why no one seems to be worried (which isn't the case!). We'll eventually sort all these problems out, but it'll take some time. I'm not even that familiar myself with Julia's build system (and I'm quite stretched with my non-Julia job until next week), I'm learning with you 🙂

@fxcoudert
Contributor Author

fxcoudert commented May 6, 2022

@giordano I do understand the current state of things, and I thank you for your help in my various bug reports.

I'd just like to point out that there are many users (distros, etc.) for whom installing binaries compiled on a black-box build system somewhere is not acceptable. The nice thing about open source software is the possibility for users to build from source, tune, or, even if they install binaries, know that they come from a trusted source, independent from individual package authors.

That's why I find it surprising that build-from-source is considered a second-class citizen, and I hope that by raising issues and proposing patches (when I can) I can help — I am not even a Julia user myself, despite investing a significant amount of time to try and get it in buildable shape. I know as a maintainer of my own open source stuff how useful it is to have well-documented, reproducible bug reports with debug info, so I try to provide those in the clearest fashion.

@fxcoudert
Contributor Author

For this specific issue, I think it was introduced by #42538
@staticfloat could you have a look?

@ViralBShah
Member

ViralBShah commented May 7, 2022

@DilumAluthge it may perhaps be nice to test the full source only build every night on CI. Maybe just Linux may be a good start. Is such a thing possible?

@ViralBShah
Member

ViralBShah commented May 7, 2022

@fxcoudert Broadly we do agree with you, and are happy for your contributions. As @giordano explained, it is a matter of dev time, but we would love for the non-BinaryBuilder build to be more robust.

M1 is just also too new and it will take some time to shake issues out on that architecture.

@fxcoudert
Contributor Author

> M1 is just also too new and it will take some time to shake issues out on that architecture.

I understand, but note that most of the issues raised occur on macOS Intel as well.

@DilumAluthge
Member

> @DilumAluthge it may perhaps be nice to test the full source only build every night on CI. Maybe just Linux may be a good start. Is such a thing possible?

We actually already do this. Once per day, we build and test Julia with USE_BINARYBUILDER=0 on x86_64-linux-gnu. Here are the logs from the most recent such run, which passed:

@fxcoudert
Contributor Author

fxcoudert commented May 19, 2022

The objconv issue above can be fixed by this patch:

diff --git a/deps/openblas.mk b/deps/openblas.mk
index a025580bcc..770ca978de 100644
--- a/deps/openblas.mk
+++ b/deps/openblas.mk
@@ -29,7 +29,7 @@ endif
 ifeq ($(USE_BLAS64), 1)
 OPENBLAS_BUILD_OPTS += INTERFACE64=1 SYMBOLSUFFIX="$(OPENBLAS_SYMBOLSUFFIX)" LIBPREFIX="libopenblas$(OPENBLAS_LIBNAMESUFFIX)"
 ifeq ($(OS), Darwin)
-OPENBLAS_BUILD_OPTS += OBJCONV=$(abspath $(build_bindir)/objconv)
+OPENBLAS_BUILD_OPTS += OBJCONV=$(abspath $(build_depsbindir)/objconv)
 $(BUILDDIR)/$(OPENBLAS_SRC_DIR)/build-compiled: | $(build_prefix)/manifest/objconv
 endif
 endif

Looking at the other dependencies and tools, it looks like build_depsbindir is more appropriate than build_bindir. Even the objconv.mk has this path in its clean-objconv target:

deps/objconv.mk:	-rm -f $(BUILDDIR)/objconv/build-compiled $(build_depsbindir)/objconv

Does that seem right?
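
The consistency check described above can be sketched as a small grep over the dependency makefiles. The function name `objconv_path_refs` is hypothetical, and the file names are the two mentioned in this comment; the sketch assumes it is run from the top of the Julia source tree.

```shell
# Hypothetical helper: list every line in the given makefiles that refers to
# objconv through $(build_bindir) or $(build_depsbindir).
# /dev/null is appended so grep always prints the file name, even for one file.
objconv_path_refs() {
    grep -n -E 'build_(deps)?bindir\)/objconv' "$@" /dev/null
}

# Only run against the real files when they are present.
if [ -f deps/openblas.mk ] && [ -f deps/objconv.mk ]; then
    objconv_path_refs deps/openblas.mk deps/objconv.mk
fi
```

If the lines reported for the two files disagree on the variable, that points to the kind of mismatch the patch above addresses.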

@ViralBShah
Member

This may be a @staticfloat question.

@fxcoudert
Contributor Author

I've opened a PR for the objconv part: #45391

4 participants