Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #470, Binary sem task delete issues #472

Merged
merged 2 commits into from
May 26, 2020

Conversation

jphickey
Copy link
Contributor

@jphickey jphickey commented May 19, 2020

Describe the contribution
Corrects issue when a task waiting on a binary semaphore is deleted, it left the mutex in a locked state preventing other tasks from using the mutex.

This has 3 parts:

  1. A functional test enhancement to bin-sem-test which replicates the specific conditions for the observed bug to occur. It deletes the task calling OS_BinSemTake() and then attempts to use the semaphore after this. (without the rest of this change, it fails).
  2. Employ a pthread "cleanup handler" to handle the situation where a task is canceled during the pthread_cond_wait() call. This ensures that the mutex is unlocked as part of the cleanup, so other tasks may continue using the semaphore. This is the main fix as it directly addresses the root cause of the issue, which is that the mutex was left in a locked state.
  3. Change all initial mutex locking to be a finite "timed" wait rather than an infinite wait. In all cases, the condition variable is only held for brief periods of time and should be readily available; tasks should almost never block on this initial lock. If a task blocks for a long time, this considers the mutex "broken" and aborts, thereby avoiding deadlock. This is a "contingency" fix in that if an exception or signal or other unknown/unhandled async event occurs that leaves the mutex permanently locked, it at least does not deadlock the system and allows it to be restarted.

Fixes #470
Fixes nasa/cFE#701

Testing performed
Build and run all unit tests.

Confirm that including only the unit test change (item 1 above) reliably reproduces the failure. In this mode, bin-sem-test will get stuck when attempting to give the semaphore after task deletion.

Confirm that the bin sem changes (items 2 and 3 above) correct the issue. Note each one taken individually will avoid deadlock in a different way. Item 3 (timed mutex) alone will cause the subsequent calls to return OS_SEM_FAILURE rather than deadlock, so shutdown will continue and the test exits with a failed status rather than pending forever.

Sanity check CFE operation and CTRL+C handling - all works OK.

Expected behavior changes
Binary semaphores after task deletion continue to work as expected and are usable by other tasks.

System(s) tested on
Ubuntu 20.04

Additional context
Add any other context about the contribution here.

Third party code
If included, identify any third party code and provide text file of license

Contributor Info - All information REQUIRED for consideration of pull request
Joseph Hickey, Vantage Systems, Inc.

astrogeco and others added 2 commits May 13, 2020 11:49
Integration Candidate COMBINED 2020-04-29 and 2020-05-06
Corrects issue when a task waiting on a binary semaphore is
deleted, it left the mutex in a locked state preventing other
tasks from using the mutex.
@jphickey jphickey added the CCB:Ready Pull request is ready for discussion at the Configuration Control Board (CCB) label May 19, 2020
@jphickey
Copy link
Contributor Author

@excaliburtb this should resolve nasa/cFE#701

@jphickey jphickey requested a review from skliper May 19, 2020 14:43
@skliper skliper added the bug label May 20, 2020
@astrogeco
Copy link
Contributor

CCB 20200520 - APPROVED

@skliper skliper added CCB-20200520 and removed CCB:Ready Pull request is ready for discussion at the Configuration Control Board (CCB) labels May 20, 2020
@astrogeco astrogeco changed the base branch from master to integration-candidate May 26, 2020 13:46
@astrogeco astrogeco added IC-20200520 CCB:Approved Indicates code review and approval by community CCB labels May 26, 2020
@astrogeco astrogeco merged commit 22e1465 into nasa:integration-candidate May 26, 2020
@skliper skliper added this to the 5.1.0 milestone Jun 1, 2020
@jphickey jphickey deleted the fix-470-binsem-lock branch June 19, 2020 16:13
@astrogeco astrogeco removed the bug label Sep 22, 2020
jphickey pushed a commit to jphickey/osal that referenced this pull request Aug 10, 2022
- only missing two small sections which are not at all possible to cover
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CCB:Approved Indicates code review and approval by community CCB
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Binary Semaphore locked after thread cancellation cfe/SCH deadlocks on exit on Linux
3 participants