Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OS_API_Init() failure due to 0 stack size #525

Closed
adam-stamand opened this issue Jun 26, 2020 · 18 comments · Fixed by #528 or #529
Closed

OS_API_Init() failure due to 0 stack size #525

adam-stamand opened this issue Jun 26, 2020 · 18 comments · Fixed by #528 or #529
Assignees
Labels
Milestone

Comments

@adam-stamand
Copy link

Describe the bug
OS_API_Init() fails on generic-linux due to a stack size of 0 being used for the console task.

To Reproduce

  1. Build the provided example using the 'generic-linux' BSP.
  2. Execute the provided example.

Expected behavior
The three test tasks should execute.

Actual behavior
OS_API_Init() fails with the following error message (debug messages enabled):

OS_Posix_InternalTaskCreate_Impl():473:pthread_attr_setstacksize error in OS_TaskCreate: Invalid argument

Code snips
The error occurs on the following call to pthread stack size in OS_Posix_InternalTaskCreate_Impl():

return_code = pthread_attr_setstacksize(&custom_attr, stacksz);

The reason it fails is because the stacksz is set to zero in OS_ConsoleCreate_Impl():

return_code = OS_Posix_InternalTaskCreate_Impl(&consoletask, OS_CONSOLE_TASK_PRIORITY, 0, OS_ConsoleTask_Entry, local_arg.opaque_arg);

System observed on:

  • Hardware: Dell Precision 7540 Laptop
  • OS: WSL2 Ubuntu 18.04.4 LTS
  • Versions: OSAL master

Additional context
This issue is resolved by using a stack size of PTHREAD_STACK_MIN instead of 0:

return_code = OS_Posix_InternalTaskCreate_Impl(&consoletask, OS_CONSOLE_TASK_PRIORITY, PTHREAD_STACK_MIN, OS_ConsoleTask_Entry, local_arg.opaque_arg);

Reporter Info
Adam St. Amand

@skliper skliper added the bug label Jun 29, 2020
@jphickey jphickey self-assigned this Jun 29, 2020
@skliper
Copy link
Contributor

skliper commented Jun 29, 2020

Related to #508. Both console and timebase:

return_code = OS_Posix_InternalTaskCreate_Impl(&local->handler_thread, 0, 0, OS_TimeBasePthreadEntry, arg.opaque_arg);

return_code = OS_Posix_InternalTaskCreate_Impl(&consoletask, OS_CONSOLE_TASK_PRIORITY, 0,
OS_ConsoleTask_Entry, local_arg.opaque_arg);

@jphickey
Copy link
Contributor

I will look into this..... Not clear why this didn't show up when testing #506.

@jphickey
Copy link
Contributor

Might have to rethink the previous change which removed the stack size tweaking .... Reviewing the manpage of pthread_attr_setstacksize again, there are a couple caveats listed here:

pthread_attr_setstacksize() can fail with the following error:
       EINVAL The stack size is less than PTHREAD_STACK_MIN (16384) bytes.
       On some systems, pthread_attr_setstacksize() can fail with the error EINVAL if stacksize is not a multiple of the system page size.

So while I still think we should NOT be accepting zero as a stack size, probably cannot just pass a nonzero value straight through due to the additional requirements of pthread stack size.

@skliper
Copy link
Contributor

skliper commented Jun 29, 2020

I will look into this..... Not clear why this didn't show up when testing #506.

Yeah, I don't follow why it didn't show up either. Just rebuilt the unit tests and was digging in a bit but no answer yet. A bit concerning...

@jphickey
Copy link
Contributor

I will look into this..... Not clear why this didn't show up when testing #506.

Yeah, I don't follow why it didn't show up either. Just rebuilt the unit tests and was digging in a bit but no answer yet. A bit concerning...

I have an explanation, actually...

The reason the previous testing missed this is because we run everything as non-root using the "permissive" mode setting when running CI, as well as myself with my own local testing. Only when running as "root" in a dedicated VM does the error show up. Enhanced CI should include this mode of operation....

@jphickey
Copy link
Contributor

The stack size setting code is actually in the same block as the scheduling policy setting. So if you run as non-root with permissive mode, it skips setting the stack size too (which might itself be a different bug - seems like stack size should still be applicable even if non-root).

@skliper
Copy link
Contributor

skliper commented Jun 29, 2020

Wouldn't be that hard to add one non-permissive test to the matrix in travis also.

@jphickey
Copy link
Contributor

Wouldn't be that hard to add one non-permissive test to the matrix in travis also.

Actually doesn't really need to change the matrix - technically do not need to rebuild without OSAL_CONFIG_DEBUG_PERMISSIVE_MODE - only need to run it as a user that has permission to set task priority. The permissive mode is just a fallback, in that it doesn't error out if the priorty setting fails.

Clearly, the admins of Travis CI has not granted random users this permission (understandably) so this will have to be limited to our own local testing environment.

@skliper
Copy link
Contributor

skliper commented Jun 29, 2020

Sudo is enabled on the Travis Ubuntu environments. We did this in the past.

@joelsherrill
Copy link

Strictly following POSIX, pthread_attr_setstacksize() will fail if stack_size is less than the minimum stack size.

FWIW RTEMS doesn't follow this philosophy on at least the Classic API and will give the thread a stack of minimum size if the call is less than or equal to the minimum. We judged the ability to set a minimum stack size in configuration and have all the thread stacks increase in size without touching code as a desirable behavior.

Personally, I think the POSIX requirement is a portability barrier. Your code could specify a low requested stack size but the pthreads implementation could have something large and your code would break. It should be treated as a floor on the allocated stack size.

@skliper
Copy link
Contributor

skliper commented Jun 29, 2020

And yes, we should add to the matrix. Changing everything to a user with elevated privileges would then miss the case without elevated privileges. It's not clear to me why we'd drop the non-elevated case.

@jphickey
Copy link
Contributor

Sudo is enabled on the Travis Ubuntu environments. We did this in the past.

Hmm, then maybe it can be as simple as executing at least one of the existing matix builds in "sudo" - worth a shot at least. My suggestion would be to run the debug build as a normal non-root user, and release build as sudo. Prefer not to rebuild the code again if we don't have to as this consumes resources and extends the test time.

@skliper
Copy link
Contributor

skliper commented Jun 29, 2020

Sudo is enabled on the Travis Ubuntu environments. We did this in the past.

Hmm, then maybe it can be as simple as executing at least one of the existing matix builds in "sudo" - worth a shot at least. My suggestion would be to run the debug build as a normal non-root user, and release build as sudo. Prefer not to rebuild the code again if we don't have to as this consumes resources and extends the test time.

Adding to the matrix does not extend test time. Unless the test added is longer than all the rest... That's the point of the matrix.

@jphickey
Copy link
Contributor

Adding to the matrix does not extend test time. Unless the test added is longer than all the rest... That's the point of the matrix.

Aren't projects limited with respect to the total time they can consume for the CI scripts running in the cloud? That was my understanding of the free-to-use cloud environment we are using, anyway.

Running the tests again in parallel may not consume extra wall time but will consume extra CPU time, especially if another code rebuild is involved. That would be the main reason for pursuing a lighter-weight approach at least, but maybe it is not a concern.

@skliper
Copy link
Contributor

skliper commented Jun 29, 2020

Adding to the matrix does not extend test time. Unless the test added is longer than all the rest... That's the point of the matrix.

Aren't projects limited with respect to the total time they can consume for the CI scripts running in the cloud? That was my understanding of the free-to-use cloud environment we are using, anyway.

Running the tests again in parallel may not consume extra wall time but will consume extra CPU time, especially if another code rebuild is involved. That would be the main reason for pursuing a lighter-weight approach at least, but maybe it is not a concern.

There's a jobs limit, and a time limit per job. The elevated privileges build addition does not put us at risk of exceeding either of these.

jphickey added a commit to jphickey/osal that referenced this issue Jun 29, 2020
The pthread_attr_setstacksize() function stipulates that it
may fail if the user-supplied stack size is not at least
PTHREAD_STACK_MIN and also possibly a multiple of page size.

This partially reverts previous PR nasa#508 and adds back rounding
up to PTHREAD_STACK_MIN and also adds rounding up to a system
page size.  However the check for zero stack still remains
at the shared level so attempts to create a task with zero
stack will still fail.  This allows internal helper threads
to be created with a default minimum stack size, however.
@mlouielu
Copy link

mlouielu commented Jun 30, 2020

I didn't saw sudo: true in travis.yml: https://travis-ci.com/github/nasa/osal/jobs/355530024/config

Also, not sure why the test will pass when running with no-root user but the setschedparam failed (I expected the test should failed by this):

$ ctest -R bin-sem-flush-test --verbose
...
45: [ PASS] 01.017 bin-sem-flush-test.c:267 - Task 3 failures = 0
45: [ PASS] 01.018 bin-sem-flush-test.c:279 - Task 1 delete Rc=0
45: [ PASS] 01.019 bin-sem-flush-test.c:282 - Task 2 delete Rc=0
45: [ PASS] 01.020 bin-sem-flush-test.c:285 - Task 3 delete Rc=0
45: [  END] 01 BinSemFlushTest      TOTAL::20    PASS::20    FAIL::0      MIR::0      TSF::0      N/A::0   
45: 
45: COMPLETE: 1 tests Segment(s) executed
45: 
45: OS_BSP_Initialize():Maximum user msg queue depth = 10
45: OS_Posix_GetSchedulerParams():188:Policy 1: available, min-max: 1-99
45: OS_Posix_GetSchedulerParams():188:Policy 2: available, min-max: 1-99
45: OS_Posix_TaskAPI_Impl_Init():374:Selected policy 2 for RT tasks, root task = 99
45: OS_Posix_TaskAPI_Impl_Init():389:Could not setschedparam in main thread: Operation not permitted (1)
1/1 Test #45: bin-sem-flush-test ...............   Passed    6.05 sec

But if using sudo ctest ...:

$ sudo ctest -R bin-sem-flush-test --verbose
...
45: [  TTF] 01.019 bin-sem-flush-test.c:282 - Task 2 delete Rc=-16
45: [  TTF] 01.020 bin-sem-flush-test.c:285 - Task 3 delete Rc=-16
45: [  END] 01 BinSemFlushTest      TOTAL::20    PASS::11    FAIL::3      MIR::0      TSF::3      N/A::0   
45: 
45: COMPLETE: 1 tests Segment(s) executed
45: 
45: OS_Posix_GetSchedulerParams():188:Policy 1: available, min-max: 1-99
45: OS_Posix_GetSchedulerParams():188:Policy 2: available, min-max: 1-99
45: OS_Posix_TaskAPI_Impl_Init():374:Selected policy 2 for RT tasks, root task = 99
45: OS_Posix_InternalTaskCreate_Impl():470:MINIMUM: 16384                                         # OS_DEBUG added
45: OS_Posix_InternalTaskCreate_Impl():471:Stack size: 0
45: OS_Posix_InternalTaskCreate_Impl():475:pthread_attr_setstacksize error in OS_TaskCreate: Invalid argument
45: OS_Posix_InternalTaskCreate_Impl():470:MINIMUM: 16384
45: OS_Posix_InternalTaskCreate_Impl():471:Stack size: 4096
45: OS_Posix_InternalTaskCreate_Impl():475:pthread_attr_setstacksize error in OS_TaskCreate: Invalid argument
45: OS_Posix_InternalTaskCreate_Impl():470:MINIMUM: 16384
45: OS_Posix_InternalTaskCreate_Impl():471:Stack size: 4096
45: OS_Posix_InternalTaskCreate_Impl():475:pthread_attr_setstacksize error in OS_TaskCreate: Invalid argument
45: OS_Posix_InternalTaskCreate_Impl():470:MINIMUM: 16384
45: OS_Posix_InternalTaskCreate_Impl():471:Stack size: 4096
45: OS_Posix_InternalTaskCreate_Impl():475:pthread_attr_setstacksize error in OS_TaskCreate: Invalid argument
1/1 Test #45: bin-sem-flush-test ...............***Failed    6.00 sec

0% tests passed, 1 tests failed out of 1

@mlouielu
Copy link

So, for non-root user, it will failed at OS_Posix_TaskAPI_Impl_Init(), so the EnableTaskPriorities will be false.

When doing OS_Posix_InternalTaskCreate_Impl(), since non-root user's EnableTaskPriorities is false, it will not set the pthread attributes. But root user will setup pthread attributes, so it failed at this moment.

jphickey added a commit to jphickey/osal that referenced this issue Jun 30, 2020
The pthread_attr_setstacksize() function stipulates that it
may fail if the user-supplied stack size is not at least
PTHREAD_STACK_MIN and also possibly a multiple of page size.

This partially reverts previous PR nasa#508 and adds back rounding
up to PTHREAD_STACK_MIN and also adds rounding up to a system
page size.  However the check for zero stack still remains
at the shared level so attempts to create a task with zero
stack will still fail.  This allows internal helper threads
to be created with a default minimum stack size, however.
@jphickey
Copy link
Contributor

@mlouielu there is a configuration setting OSAL_CONFIG_DEBUG_PERMISSIVE_MODE that allows operation to continue even if certain privileged items fail, like setting the scheduling priority. The intent of this feature is to make debugging easier by not requiring admin privileges to run the code. This setting is enabled when building with sample config for "native" arch, hence why it doesn't fail when running as non-root. The fact that stack size was mixed with this was incorrect (fixed in #528).

astrogeco added a commit that referenced this issue Jul 2, 2020
Fix #525, ensure POSIX stack size meets requirements
@astrogeco astrogeco added this to the v5.1.0 milestone Sep 21, 2020
jphickey pushed a commit to jphickey/osal that referenced this issue Aug 10, 2022
jphickey pushed a commit to jphickey/osal that referenced this issue Aug 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
6 participants