Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PolarFire SoC - C++ Exception Exits Before Catch #11912

Open
sastel opened this issue Mar 13, 2024 · 20 comments
Open

PolarFire SoC - C++ Exception Exits Before Catch #11912

sastel opened this issue Mar 13, 2024 · 20 comments

Comments

@sastel
Copy link

sastel commented Mar 13, 2024

Summary

On the PolarFire Icicle board, when building from master with a config based on icicle:nsh (plus config and flag changes detailed below), the cxxtest built-in app exception test fails. It throws the exception and then exits without catching the exception.

I've tried this on both the Icicle board and a PolarFire Renode simulation. I'm building with riscv-none-elf-* from xPack: gcc version 13.2.0 (xPack GNU RISC-V Embedded GCC x86_64).

I'm hoping someone knows what the issue might be as I'm not familiar with the behind the scenes of exception handling. Meanwhile I'll keep working on this.

Config

Changed optimization to -Og for debugging and set -mstrict-align to prevent a crash due to misaligned instructions.

Ran make savedefconfig and the following. The task size configs are set because I had to increase CONFIG_DEFAULT_TASK_STACKSIZE to prevent a stack overflow during exception unwinding.

$ diff boards/risc-v/mpfs/icicle/configs/nsh/defconfig  defconfig
9a10,12
> # CONFIG_NSH_DISABLE_MB is not set
> # CONFIG_NSH_DISABLE_MH is not set
> # CONFIG_NSH_DISABLE_MW is not set
21a25,26
> CONFIG_CXX_EXCEPTION=y
> CONFIG_CXX_RTTI=y
25a31
> CONFIG_DEFAULT_TASK_STACKSIZE=32768
30a37
> CONFIG_HAVE_CXX=y
34a42,43
> CONFIG_LIBCXX=y
> CONFIG_LIBCXXABI=y
37a47
> CONFIG_LIBM=y
50a61
> CONFIG_POSIX_SPAWN_DEFAULT_STACKSIZE=2048
51a63
> CONFIG_PTHREAD_STACK_DEFAULT=2048
58a71
> CONFIG_SCHED_HPWORKSTACKSIZE=2048
59a73
> CONFIG_SCHED_LPWORKSTACKSIZE=2048
66a81
> CONFIG_SYSTEM_NSH_STACKSIZE=2048
68a84
> CONFIG_TESTING_CXXTEST=y
69a86
> CONFIG_TESTING_GETPRIME_STACKSIZE=2048
70a88,89
> CONFIG_TESTING_OSTEST_FPUSTACKSIZE=2048
> CONFIG_TLS_NELEM=4
@sastel
Copy link
Author

sastel commented Mar 13, 2024

Scripts for Debugging with GDB and Renode

Here is what I've been using to debug this:

These scripts go in nuttxspace/nuttx.

$ cat gdb.sc
target remote :3333

break cxa_exception.cpp:279
break exit.c:55
$ cat polarfire-nuttx.resc
:name: PolarFire NuttX
:description: This is a sample script running NuttX on Icicle Kit with PolarFire SoC

emulation SetGlobalSerialExecution true
emulation SetGlobalQuantum "0.0008"

using sysbus
mach create

machine LoadPlatformDescription @platforms/cpus/polarfire-soc.repl
showAnalyzer mmuart1

macro reset
"""
    sysbus LoadELF @./nuttx
"""
runMacro $reset

e51 IsHalted true
#u54_1 IsHalted true
u54_2 IsHalted true
u54_3 IsHalted true
u54_4 IsHalted true

machine StartGdbServer 3333

Backtraces

image

@acassis
Copy link
Contributor

acassis commented Mar 13, 2024

@jrosberg did you face similar issue with using C++ on PolarFire?

@sastel
Copy link
Author

sastel commented Mar 14, 2024

@acassis @jrosberg the eh_frame and eh_frame_hdr sections are missing from the linker script boards/risc-v/mpfs/icicle/scripts/ld.script. That's part of the problem, though there seems to be other things missing because I was not able to fix it by adding those sections.

@acassis
Copy link
Contributor

acassis commented Mar 14, 2024

@sastel maybe you can look some other board with C++ support to trace the differences, probably there some other details. Also if you could try in some ARM (like STM32F4Discovery board) it could help to see if everything still working as expected.

@acassis
Copy link
Contributor

acassis commented Mar 17, 2024

Nice @jrosberg is not around, maybe @pussuw could help us! Ville, do you have some idea?

@pussuw
Copy link
Contributor

pussuw commented Mar 17, 2024

I don't use exceptions so no idea what's going on

@acassis
Copy link
Contributor

acassis commented Mar 17, 2024

Thanks @pussuw

@sastel
Copy link
Author

sastel commented Mar 18, 2024

@acassis I tried STM32F4 Discovery NSH build from 12.4 in Renode today, but it crashed on startup. I don't have the physical board.

I cloned gcc and compared the libgcc source with the asm from GDB as I stepped through the exception handling. It is clear that __cxa_throw calls _Unwind_RaiseException from gcc/libgcc/unwind.inc. That calls uw_init_context_1 from unwind-dw2.c which is calling abort() because it can't find an FDE (the call to uw_frame_state_for returns _URC_END_OF_STACK after _Unwind_Find_FDE returns NULL. I suspect that the FDE's are not being initialized correctly. I have to do some reading to understand how this is supposed to work.

Here's a great resource: https://maskray.me/blog/2020-12-12-c++-exception-handling-abi

@acassis
Copy link
Contributor

acassis commented Mar 18, 2024

Hi @sastel that is very interesting investigation. If you can, please write down about it in some blog, probably it will help more people in the future.

@sastel
Copy link
Author

sastel commented Mar 20, 2024

I made some progress on this today. _Unwind_Find_FDE was returning NULL because the unseen_objects linked list wasn't initialized. This linked list needs to be initialized at startup, and then during exception handling it has nodes added and moved into seen_objects. The list is used for finding an FDE by program counter.

Looks like users of libgcc can either link with crtbegin.o or write their own calls to libgcc __register_frame_info* function. The crtstuff.c file provides a frame_dummy() function which is supposed to be placed in the .init_array section by the linker, so that it gets invoked by the initialization logic pre-main. This frame_dummy function does the registration using __EH_FRAME_BEGIN__. The function was getting called by NuttX lib_cxx_initialize() but, __EH_FRAME_BEGIN__ was NULL so initialization didn't happen.

I hacked up the Toolchain.defs file to link crtbegin.o and crtend.o so that I get frame_dummy. With a new .eh_frame section in the linker script, __EH_FRAME_BEGIN__ points to the beginning of .eh_frame and initialization works.

image

image

The next problem is that during libgcc FDE searching, I get a RISC-V load address misaligned crash. I'm not sure what to do about this yet.

@acassis
Copy link
Contributor

acassis commented Mar 20, 2024

@pussuw do you have some suggestion how it could be integrated back into mainline?
Maybe we can use CONFIG_CXX_EXCEPTION to decide if we want to include crtbegin.o and crtend.o right?

@sastel
Copy link
Author

sastel commented Mar 20, 2024

@acassis to be clear, exceptions still aren't being caught - the load address misalignment crash is happening during the _Unwind_Find_FDE call from the throw.

I think we should use CONFIG_CXX_EXCEPTION, but whether we want to link with crtbegin.o or crtend.o versus extern void __register_frame(void* begin) and call it in our own init function is another question. Including .eh_frame also uses a bunch of extra memory, so we won't want to include it unless it is used for CXX exceptions.

@sastel
Copy link
Author

sastel commented Mar 21, 2024

@acassis one of my colleagues built the gcc toolchain (including libgcc.a) for RISC-V with -mstrict-align. That fixed the crash. Now there's another problem! 🤣

libc++abi: terminating due to uncaught exception of type std::runtime_error: runtime error

I'll chase this one down next.

@acassis
Copy link
Contributor

acassis commented Mar 21, 2024

Hi @sastel nice to know you guys found a solution! And too sad you are facing a new issue. This is why I think it could be a good idea to document all the process to help other people facing similar issue, maybe on other arch in the future.

@sastel
Copy link
Author

sastel commented Mar 21, 2024

This is why I think it could be a good idea to document all the process to help other people facing similar issue, maybe on other arch in the future.

It's a good idea. I'll have to solve all of the issues before I'm remotely qualified to write the blog post though. 😄

@sastel
Copy link
Author

sastel commented Mar 21, 2024

libc++abi: terminating due to uncaught exception of type std::runtime_error: runtime error

I have a solution to this. Exceptions are being thrown and caught now in cxxtest. The GCC toolchain rebuild is not necessary. The LSDA (.gcc_except_table) was not being initialized for some reason. An edit to the linker script fixed it. Will open a pull request including an Icicle board cxxtest defconfig if you're interested.

The exception handling stack usage is quite high (I needed to set the stack size for the task which throws the exception to 9 kB), but at least exceptions work. I'm investigating what is using all of that stack. It appears to be mostly consumed by the libgcc exception stack unwinding functions.

A big thank you to my colleague @sholford for his support on this!

@acassis
Copy link
Contributor

acassis commented Mar 21, 2024

Congratulations @sastel, it was a quick journey and it succeed! Kudos!!!

Now you are qualified to write a blog exampling to others how to fix this issue!

I like to document all the steps when I'm fixing an issue, because I know it will help others.

For instance, a post I created almost two years ago: https://acassis.wordpress.com/2022/10/31/trying-to-fix-a-mqtt-communication-issue-with-nuttx-and-tagoio/ helped a friend of mine today. Yes was trying to use Wireshark to work as MITM with his embedded board.

I hope you remember all the steps you took! ;-)

@royfengsss
Copy link
Contributor

Hi @sastel.

I have a solution to this. Exceptions are being thrown and caught now in cxxtest. The GCC toolchain rebuild is not necessary. The LSDA (.gcc_except_table) was not being initialized for some reason. An edit to the linker script fixed it. Will open a pull request including an Icicle board cxxtest defconfig if you're interested.

Could you kindly share your pull request on fixing this issue? I am facing a similar exception issue on ESP32s3. I hope your solution on icicle board could be a good reference of fixing on ESP board.

Thanks a lot.

@sastel
Copy link
Author

sastel commented Jun 28, 2024

@royfengsss this patch summarizes it.

diff --git a/arch/risc-v/src/common/Toolchain.defs b/arch/risc-v/src/common/Toolchain.defs
index 3d0e5296a9..e37598e2c4 100644
--- a/arch/risc-v/src/common/Toolchain.defs
+++ b/arch/risc-v/src/common/Toolchain.defs
@@ -285,6 +285,10 @@ endif
 # Add the builtin library

 EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-libgcc-file-name))
+ifeq ($(CONFIG_CXX_EXCEPTION),y)
+  EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crtbegin.o))
+  EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=crtend.o))
+endif

 ifeq ($(CONFIG_LIBM_TOOLCHAIN),y)
   EXTRA_LIBS += $(wildcard $(shell $(CC) $(ARCHCPUFLAGS) --print-file-name=libm.a))
diff --git a/boards/risc-v/mpfs/icicle/configs/cxxtest/defconfig b/boards/risc-v/mpfs/icicle/configs/cxxtest/defconfig
new file mode 100644
index 0000000000..f536046464
--- /dev/null
+++ b/boards/risc-v/mpfs/icicle/configs/cxxtest/defconfig
@@ -0,0 +1,90 @@
+#
+# This file is autogenerated: PLEASE DO NOT EDIT IT.
+#
+# You can use "make menuconfig" to make any modifications to the installed .config file.
+# You can then do "make savedefconfig" to generate a new defconfig file that includes your
+# modifications.
+#
+# CONFIG_DISABLE_OS_API is not set
+# CONFIG_NSH_DISABLE_LOSMART is not set
+# CONFIG_NSH_DISABLE_MB is not set
+# CONFIG_NSH_DISABLE_MH is not set
+# CONFIG_NSH_DISABLE_MW is not set
+CONFIG_ARCH="risc-v"
+CONFIG_ARCH_BOARD="icicle"
+CONFIG_ARCH_BOARD_COMMON=y
+CONFIG_ARCH_BOARD_ICICLE_MPFS=y
+CONFIG_ARCH_CHIP="mpfs"
+CONFIG_ARCH_CHIP_MPFS250T_FCVG484=y
+CONFIG_ARCH_CHIP_MPFS=y
+CONFIG_ARCH_INTERRUPTSTACK=2048
+CONFIG_ARCH_RISCV=y
+CONFIG_ARCH_STACKDUMP=y
+CONFIG_BOARD_LOOPSPERMSEC=54000
+CONFIG_BUILTIN=y
+CONFIG_CXX_EXCEPTION=y
+CONFIG_CXX_RTTI=y
+CONFIG_DEBUG_ASSERTIONS=y
+CONFIG_DEBUG_FEATURES=y
+CONFIG_DEBUG_FULLOPT=y
+CONFIG_DEBUG_SYMBOLS=y
+CONFIG_DEFAULT_TASK_STACKSIZE=16384
+CONFIG_DEV_ZERO=y
+CONFIG_EXPERIMENTAL=y
+CONFIG_FS_PROCFS=y
+CONFIG_FS_ROMFS=y
+CONFIG_FS_SHMFS=y
+CONFIG_HAVE_CXX=y
+CONFIG_IDLETHREAD_STACKSIZE=2048
+CONFIG_INIT_ENTRYPOINT="nsh_main"
+CONFIG_INIT_STACKSIZE=3072
+CONFIG_INTELHEX_BINARY=y
+CONFIG_LIBCXX=y
+CONFIG_LIBCXXABI=y
+CONFIG_LIBC_HOSTNAME="icicle"
+CONFIG_LIBC_PERROR_STDOUT=y
+CONFIG_LIBC_STRERROR=y
+CONFIG_LIBM=y
+CONFIG_MEMSET_64BIT=y
+CONFIG_MEMSET_OPTSPEED=y
+CONFIG_MPFS_ENABLE_DPFPU=y
+CONFIG_MPFS_UART1=y
+CONFIG_NSH_ARCHINIT=y
+CONFIG_NSH_BUILTIN_APPS=y
+CONFIG_NSH_DISABLE_IFUPDOWN=y
+CONFIG_NSH_DISABLE_MKDIR=y
+CONFIG_NSH_DISABLE_RM=y
+CONFIG_NSH_DISABLE_RMDIR=y
+CONFIG_NSH_DISABLE_UMOUNT=y
+CONFIG_NSH_LINELEN=160
+CONFIG_NSH_STRERROR=y
+CONFIG_POSIX_SPAWN_DEFAULT_STACKSIZE=2048
+CONFIG_PREALLOC_TIMERS=4
+CONFIG_PTHREAD_STACK_DEFAULT=2048
+CONFIG_RAM_SIZE=1048576
+CONFIG_RAM_START=0x80200000
+CONFIG_RAW_BINARY=y
+CONFIG_READLINE_CMD_HISTORY=y
+CONFIG_READLINE_TABCOMPLETION=y
+CONFIG_RR_INTERVAL=200
+CONFIG_SCHED_HPWORK=y
+CONFIG_SCHED_HPWORKSTACKSIZE=2048
+CONFIG_SCHED_LPWORK=y
+CONFIG_SCHED_LPWORKSTACKSIZE=2048
+CONFIG_SCHED_WAITPID=y
+CONFIG_SERIAL_NPOLLWAITERS=2
+CONFIG_STACK_COLORATION=y
+CONFIG_START_MONTH=4
+CONFIG_START_YEAR=2021
+CONFIG_SYSTEM_CLE_CMD_HISTORY=y
+CONFIG_SYSTEM_NSH=y
+CONFIG_SYSTEM_NSH_STACKSIZE=2048
+CONFIG_SYSTEM_TIME64=y
+CONFIG_TASK_NAME_SIZE=20
+CONFIG_TESTING_CXXTEST=y
+CONFIG_TESTING_GETPRIME=y
+CONFIG_TESTING_GETPRIME_STACKSIZE=2048
+CONFIG_TESTING_OSTEST=y
+CONFIG_TESTING_OSTEST_FPUSTACKSIZE=2048
+CONFIG_TLS_NELEM=4
+CONFIG_UART1_SERIAL_CONSOLE=y
diff --git a/boards/risc-v/mpfs/icicle/scripts/ld.script b/boards/risc-v/mpfs/icicle/scripts/ld.script
index 85dc8503f8..503371e3a7 100644
--- a/boards/risc-v/mpfs/icicle/scripts/ld.script
+++ b/boards/risc-v/mpfs/icicle/scripts/ld.script
@@ -18,6 +18,8 @@
  *
  ****************************************************************************/

+#include <nuttx/config.h>
+
 MEMORY
 {
     progmem (rx) : ORIGIN = 0x80000000, LENGTH = 2M    /* w/ cache */
@@ -45,11 +47,26 @@ SECTIONS
         *(.glue_7)
         *(.glue_7t)
         *(.got)
-        *(.gcc_except_table)
         *(.gnu.linkonce.r.*)
         _etext = ABSOLUTE(.);
     } > progmem

+#if defined(CONFIG_HAVE_CXX)
+    .gcc_except_table : ALIGN(4) {
+        *(.gcc_except_table)
+        *(.gcc_except_table.*)
+    } > progmem
+
+#if defined(CONFIG_CXX_EXCEPTION)
+    .eh_frame_hdr : ALIGN(4) {
+        *(.eh_frame_hdr)
+    } > progmem
+    .eh_frame : ALIGN(4) {
+        KEEP(*(.eh_frame))
+    } > progmem
+#endif
+#endif
+
     .init_section : ALIGN(4) {
         _sinit = ABSOLUTE(.);
         KEEP(*(SORT_BY_INIT_PRIORITY(.init_array.*) SORT_BY_INIT_PRIORITY(.ctors.*)))

@royfengsss
Copy link
Contributor

@sastel Many thanks your patch. I will try based on your changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants