Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions / dependencies between CFE core apps #73

Closed
skliper opened this issue Sep 30, 2019 · 14 comments
Closed

Race conditions / dependencies between CFE core apps #73

skliper opened this issue Sep 30, 2019 · 14 comments
Labels
Milestone

Comments

@skliper
Copy link
Contributor

skliper commented Sep 30, 2019

The "core" applications have significant dependencies between them that need to be more pro-actively satisfied. There are some race conditions during the startup phase that can pose some serious problems if things are not executed in the right order.

The summary of what happened is below, but here is a list of the basic problems:

  • Start up code should synchronize at least the "core" applications and ensure that each one has reached it's respective "RunLoop" before starting the next one, regardless of what the platform config sets the priority to (likely depends on CFE ES "StartupSyncSemaphore" subject to multiple race conditions #71).
  • EVS_IsFiltered should range check before doing the table lookups based on passed-in values
  • CFE SB and EVS (at least) populate different values into their own "AppID" global variable before initialization. SB does nothing (0 by default, which is in fact a valid AppID for a different app) but EVS initializes this to 0xFFFFFFFFF, which has very ill-effects if actually used for something, and nothing really checks for this.

For those interested, here are the details of the specific sequence of events discovered when debugging application startup on the Microblaze processor used by the EVA team at GRC:

  1. As dictated by the table within "cfe_es_objtab.c", the CFE core applications are started (Tasks Created) in the order of EVS, SB, ES, TIME, BL.
  2. In the default/example platform configuration, these have respective priorities of 61(EVS), 64(SB), 68(ES), 60(TIME), and 70(TBL).
  3. TIME task will run it's TaskMain first even though it is 4th in the start sequence.
  4. As part of this init sequence, it calls CFE_SB_CreatePipe() which in turn calls CFE_EVS_SendEventWithAppID() in several places (for errors as well as an unconditional "debug event" at the end). The AppId supplied is "CFE_SB.AppId" which is uninitialized since SB has not executed yet. In this case the value used is actually 0.
  5. In turn this eventually calls EVS_NotRegistered() (since CFE_EVS_TaskInit has not run) and then EVS_SendEvent() as part of that.
  6. EVS_SendEvent() calls EVS_IsFiltered() with the contents of CFE_EVS_GlobalData.EVS_AppId, which is also uninitialized but set to "0xFFFFFFFF", not zero like CFE_SB.AppId.
  7. This appID value is not range-checked by EVS_IsFiltered and ultimately segfaults and crashes CFE core.
@skliper skliper added this to the 6.4.2 milestone Sep 30, 2019
@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Imported from trac issue 42. Created by jphickey on 2015-05-06T17:04:36, last modified: 2016-08-11T15:39:40

@skliper skliper self-assigned this Sep 30, 2019
@skliper skliper added the bug label Sep 30, 2019
@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jphickey on 2015-05-07 10:41:17:

Working on a fix for this. There are two possible approaches to solving this, some CCB discussion might be good to decide which is best:

  1. Move more of the critical initialization bits into the "EarlyInit" calls. Some of this initialization work might be in the task itself due to the fact that it is dependent on the AppId (e.g. CFE_ES_GetAppID() needs to work). However, for the core applications, they are always present and always get the same AppIDs, these could be changed to a fixed enumeration for Early Init purposes so it would not depend on CFE_ES_GetAppId.

  2. Keep the initialization as part of the task itself, but use something like the existing startup sync semaphore (after fixing it) to wait for each CORE task to reach the run loop before starting the next task. This would only need to be done for the core tasks -- the other tasks already call WaitForStartupSync if they depend on another task already being initialized for part of their initialization.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jphickey on 2015-05-08 13:39:51:

DEPENDENCY NOTE: Trac #71 implements the first part of this fix, which is to synchronize the start up of the core applications.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jphickey on 2015-05-08 16:58:43:

Related PSP ticket: [https://babelfish.arc.nasa.gov/trac/cfs_psp/ticket/23]

I created this ticket in the PSP trac to capture the issue brought up by Chris Monaco during today's CCB meeting while discussing this ticket. In this case the PSP is possibly calling into the Timer 1Hz function before CFE TIME has been initialized.

This issue can be tracked/resolved separately. If any changes are necessary for this one, they will likely be done in the PSP.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jphickey on 2015-05-11 09:54:07:

Pushed commit [changeset:0c00114] for remainder of this fix

This commit makes two additional minor modifications to EVS:

  • During EVS_SendEvent, confirm that the global "EVS_AppID" value is within range before calling EVS_IsFiltered. Otherwise do nothing.
  • Set the EVS_AppID value as the last thing in CFE_EVS_TaskInit, so it can be assured that once the AppID is valid that all other initialization is also done.

Associated branch name is "trac-42-rangecheck-evs-appid"

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sstrege on 2015-05-15 16:47:47:

Approve

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by acudmore on 2015-05-18 15:40:11:

Approve

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by jwilmot on 2015-05-19 10:15:17:

Approve

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sduran on 2015-05-19 13:46:30:

approve

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by glimes on 2015-05-19 13:47:46:

approve

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by glimes on 2015-05-19 13:52:53:

Note that you can use ![cfs_psp:23] to link to [cfs_psp:23] -- yay InterTrac ;)

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by glimes on 2015-05-22 10:51:16:

Commit [changeset:e08eada] has this fix, backported to master.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by wmoleski on 2015-06-16 13:37:12:

cFE 6.4.2.0 was tested using an MCP750 running vxworks 6.4. All build verification tests normally executed on a cFE release were successfully executed. An additional test was performed in an attempt to verify that the race conditions identified did not appear. The UART (minicom) output for the test is attached to trac #81.

@skliper
Copy link
Contributor Author

skliper commented Sep 30, 2019

Trac comment by sstrege on 2016-08-11 15:39:18:

Merged to development on Wed May 20 09:39:24 2015 -0400

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant