Programe exited after call pthread_completejoin() many times on ESP32 #12096

Closed
HongxiaWangSSSS opened this issue Apr 8, 2024 · 7 comments · Fixed by #12106

Comments

@HongxiaWangSSSS

Hi, @anchao

While running a multi-threaded stress test, I found that after #11898 was introduced, my program exits with status code 12.

The test log is below:

nsh> iwasm --max-threads=12 stress_test_threads_creation_wasi.aot
Spawning stress test is 10% finished
Spawning stress test is 20% finished
exit status=12
exit status=12
nsh>

After debugging, I found that the function pthread_findjoininfo allocates some memory but does not free it, for example: https://github.com/apache/nuttx/blob/master/sched/pthread/pthread_completejoin.c#L100C1-L101C1 (there are other similar places).

So I'm a little curious about the reason, and I hope you can help me~ Thank you!

BR,
Hongxia

@HongxiaWangSSSS changed the title from "Programe exit after called pthread_completejoin() many times on ESP32" to "Programe exited after call pthread_completejoin() many times on ESP32" on Apr 8, 2024
@anchao
Contributor

anchao commented Apr 8, 2024

Hi, @HongxiaWangSSSS
Do you have other test cases that can reproduce this issue?

joininfo will not be released until the task group is destroyed:
https://github.com/apache/nuttx/blob/master/sched/pthread/pthread_release.c#L60-L80

@anchao
Contributor

anchao commented Apr 8, 2024

I ran the test code as native C, and the test case works as expected:

https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/libraries/lib-wasi-threads/stress-test/stress_test_threads_creation.c#L52

  1. replace the test code:
$ git status
HEAD detached at origin/master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   examples/hello/hello_main.c

no changes added to commit (use "git add" and/or "git commit -a")
  2. remove the wasi api:

https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/libraries/lib-wasi-threads/stress-test/stress_test_threads_creation.c#L67-L71

  3. run the test demo:

./tools/configure.sh sabre-6quad/nsh

$ qemu-system-arm -semihosting -M sabrelite -m 1024 -smp 4 -nographic -kernel ./nuttx

NuttShell (NSH) NuttX-10.4.0
nsh> hello
Spawning stress test is 10% finished
Spawning stress test is 20% finished
Spawning stress test is 30% finished
Spawning stress test is 40% finished
Spawning stress test is 50% finished
Spawning stress test is 60% finished
Spawning stress test is 70% finished
Spawning stress test is 80% finished
Spawning stress test is 89% finished
Spawning stress test finished successfully executed 50000 threads with retry ratio 1.000160
nsh> free
                   total       used       free    maxused    maxfree  nused  nfree
        Umem: 1065190036      10132 1065179904    1247684 1065166264     24      3
nsh> hello
Spawning stress test is 10% finished
Spawning stress test is 20% finished
Spawning stress test is 30% finished
Spawning stress test is 40% finished
Spawning stress test is 50% finished
Spawning stress test is 60% finished
Spawning stress test is 70% finished
Spawning stress test is 80% finished
Spawning stress test is 89% finished
Spawning stress test finished successfully executed 100000 threads with retry ratio 1.000120
nsh> free
                   total       used       free    maxused    maxfree  nused  nfree
        Umem: 1065190036      10132 1065179904    1247684 1065166264     24      3
nsh> 

@HongxiaWangSSSS
Author

Hi,

Thank you for the test!!! 🙇
Yes, I also found that the exit and the memory leak only occur when this sample is built as WASM/AOT. At run time it keeps allocating memory without ever releasing it, until the program exits.
Maybe I should investigate the pthread_join implementation on the WASI-LIBC side.
Anyway, thank you for your explanation~ I will post an update if there is any progress.

BR,
Hongxia

@anchao
Contributor

anchao commented Apr 8, 2024

@HongxiaWangSSSS

I think I know the root cause. After #11898 was introduced, a detached thread no longer destroys its joininfo. Your test case creates a large number of pthreads with the detached attribute, so many joininfo entries stay pending in the task group. Each joininfo is kept so that pthread_join() can return the correct value:

joining a detached/canceled thread should return EINVAL, not ESRCH

apache/nuttx-apps#2329

https://github.com/apache/nuttx/blob/master/sched/pthread/pthread_join.c#L92-L102

Since that many joininfo entries consume too much memory, I think this is not friendly to embedded MCU devices. Maybe we can keep the semantics from before #11898 was introduced. @xiaoxiang781216, what do you think?
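
For readers following along, the pattern described here (many short-lived threads created with the detached attribute) can be sketched roughly as below. This is a simplified illustration under assumptions, not the WAMR stress test itself: `worker`, `spawn_detached`, `wait_finished` and `g_finished` are hypothetical names, and retrying on EAGAIN is an assumed way of handling a temporary thread limit.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical counter: number of workers that have run to completion. */
static atomic_int g_finished;

/* Hypothetical worker: does no real work, only records that it ran. */
static void *worker(void *arg)
{
  (void)arg;
  atomic_fetch_add(&g_finished, 1);
  return NULL;
}

/* Create `count` threads with the detached attribute.
 * Nobody will ever call pthread_join() on them.
 * Returns 0 on success, or the errno value of the failing call.
 */
int spawn_detached(int count)
{
  pthread_attr_t attr;
  int ret = pthread_attr_init(&attr);

  if (ret != 0)
    {
      return ret;
    }

  ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

  for (int i = 0; ret == 0 && i < count; i++)
    {
      pthread_t tid;

      /* Assumption: EAGAIN means a temporary resource limit, so retry. */
      do
        {
          ret = pthread_create(&tid, &attr, worker, NULL);
          if (ret == EAGAIN)
            {
              sched_yield();
            }
        }
      while (ret == EAGAIN);
    }

  pthread_attr_destroy(&attr);
  return ret;
}

/* Busy-wait (yielding) until at least `expected` workers have finished. */
void wait_finished(int expected)
{
  while (atomic_load(&g_finished) < expected)
    {
      sched_yield();
    }
}
```

Calling `spawn_detached(50000)` followed by `wait_finished(50000)` approximates the load in question. If per-thread join bookkeeping is kept after each detached thread exits, memory use grows with the number of threads ever created rather than with the number alive at once.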

@xiaoxiang781216
Contributor

A thread should release its joininfo after detaching, since nobody will call pthread_join on it.
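
The difference between the two policies can be illustrated with a toy model. This is only a sketch under assumptions and is not the NuttX implementation or the change made in #12106: `struct join_record`, `struct task_group`, `record_create`, `on_thread_exit` and `group_release` are hypothetical stand-ins for the real bookkeeping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-thread join bookkeeping (not the NuttX structure). */
struct join_record
{
  struct join_record *next;
  int tid;
  bool detached;
};

/* Hypothetical owner of all join records of one task. */
struct task_group
{
  struct join_record *records;
  size_t nrecords;
};

/* Allocate a record for a new thread; returns NULL if out of memory. */
struct join_record *record_create(struct task_group *g, int tid,
                                  bool detached)
{
  struct join_record *r = calloc(1, sizeof(*r));

  if (r != NULL)
    {
      r->tid = tid;
      r->detached = detached;
      r->next = g->records;
      g->records = r;
      g->nrecords++;
    }

  return r;
}

/* Model of thread exit. With release_detached set, a detached thread's
 * record is unlinked and freed at once; otherwise it stays in the group.
 */
void on_thread_exit(struct task_group *g, int tid, bool release_detached)
{
  struct join_record **pp = &g->records;

  while (*pp != NULL)
    {
      struct join_record *r = *pp;

      if (r->tid == tid)
        {
          if (r->detached && release_detached)
            {
              *pp = r->next;
              free(r);
              g->nrecords--;
            }

          return;
        }

      pp = &r->next;
    }
}

/* Model of task group destruction: free every remaining record. */
void group_release(struct task_group *g)
{
  while (g->records != NULL)
    {
      struct join_record *r = g->records;

      g->records = r->next;
      free(r);
    }

  g->nrecords = 0;
}
```

In this model, exiting 1000 detached threads with `release_detached` set to false leaves 1000 records allocated until `group_release()` runs, while setting it to true leaves none. That is the growth pattern that matters on a memory-constrained MCU.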

@anchao
Contributor

anchao commented Apr 9, 2024

@HongxiaWangSSSS please try PR #12106

@HongxiaWangSSSS
Author

Yes, it worked!!
Thank you @anchao for the quick action.👍
Let me close the issue~
