Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential for recursive loop if Event TLM MsgId is incorrect #1952

Closed
jphickey opened this issue Sep 16, 2021 · 4 comments · Fixed by #1954 or #2001
Closed

Potential for recursive loop if Event TLM MsgId is incorrect #1952

jphickey opened this issue Sep 16, 2021 · 4 comments · Fixed by #1954 or #2001
Assignees
Milestone

Comments

@jphickey
Copy link
Contributor

jphickey commented Sep 16, 2021

Describe the bug
If the software bus CFE_SB_TransmitMsg() fails to send a message due to a validation failure, it will send an event through EVS.

Event Services, in turn, generates a message (LongEventTlm/ShortEventTlm) which is broadcast via software bus.

However if the Event Telemetry MID value (CFE_EVS_LONG_EVENT_MSG_MID) is not set correctly (or some other EVS config is bad) such that EVS tries to send event messages which do not validate, a recursive loop ensues and the software eventually segfaults.

To Reproduce
(Mis)configure CFE_EVS_LONG_EVENT_MSG_MID to a value which will not pass the CFE_SB_TransmitMsgValidate tests. Run CFE, it will get in a recursive loop and eventually segfault/crash as soon as any app sends an event.

Expected behavior
Should not do a recursive loop

System observed on:
Ubuntu

Additional context
Should be protection against recursive event loops like this, where if an event fails to send, it should not cause another event to be sent. This protection appears this is not working correctly right now, at least not for EVS messages.

NOTE: To be absolutely clear - the issue described here is a mis-configuration issue. It will not happen in a properly configured system, so long as EVS generates messages which are "transmittable". BUT - there are other events that might be triggered by a CFE_SB_TransmitMsg call, such as a MsgLim error, and its not clear of a similar recursive loop might be possible there (have not tested/investigated).

Reporter Info
Joseph Hickey, Vantage Systems, Inc.

@jphickey jphickey changed the title Potential for recursive loop of Event TLM MsgId is incorrect Potential for recursive loop if Event TLM MsgId is incorrect Sep 16, 2021
@jphickey jphickey added the bug label Sep 16, 2021
@jphickey
Copy link
Contributor Author

It seems the "recursive event" protection is only enabled for CFE_SB_SEND_NO_SUBS_EID. Not sure if this is a historical thing (i.e. if there is a reason this was only done for that particular event) or something recently introduced/oversight with the caelum changes.

See Here:

/* Determine if event can be sent without causing recursive event problem */
if (CFE_SB_RequestToSendEvent(TskId, CFE_SB_SEND_NO_SUBS_EID_BIT) == CFE_SB_GRANTED)
{
CFE_EVS_SendEventWithAppID(CFE_SB_SEND_NO_SUBS_EID, CFE_EVS_EventType_INFORMATION,
CFE_SB_Global.AppId, "No subscribers for MsgId 0x%x,sender %s",
(unsigned int)CFE_SB_MsgIdToValue(*MsgIdPtr),
CFE_SB_GetAppTskName(TskId, FullName));
/* clear the bit so the task may send this event again */
CFE_SB_FinishSendEvent(TskId, CFE_SB_SEND_NO_SUBS_EID_BIT);
} /* end if */

However the same "RequestToSendEvent" is not done for the other events that this function might generate.

For a short term fix, any+all events that might be generated from the CFE_SB_TransmitMsg path should be wrapped with this same check.

However, from an architectural standpoint, it doesn't make a whole lot of sense for this recursion protection to be in SB at all. It's really an EVS problem, and SB seems to be protecting EVS from itself here.

Rather, EVS should track its per-task state, and if a task tries to send the same event ID while in the process of sending that event ID, it should drop it itself, to break the loop.

@jphickey
Copy link
Contributor Author

A skim through the other functions in this path (CFE_SB_TransmitMsg, CFE_SB_BroadcastBufferToRoute) show that other events also do have this protection:

  • CFE_SB_GET_BUF_ERR_EID
  • CFE_SB_MSGID_LIM_ERR_EID
  • CFE_SB_Q_FULL_ERR_EID
  • CFE_SB_Q_WR_ERR_EID

So the problem is likely limited to these Event IDs specifically from CFE_SB_TransmitMsgValidate:

  • CFE_SB_SEND_BAD_ARG_EID
  • CFE_SB_SEND_INV_MSGID_EID
  • CFE_SB_MSG_TOO_BIG_EID

All of these EIDs point to a configuration issue as opposed to a runtime issue, which reduces the severity of this issue.

@jphickey
Copy link
Contributor Author

But important to note in a longer term/system level viewpoint, since Caelum allows components to be overridden/modified, it is not really appropriate for SB to "know" that EVS calls CFE_SB_TransmitMsg() as part of its work and therefore specifically protect those EIDs from recursion. For example, EVS also calls CFE_MSG_Init() and CFE_ES_GetAppName() (just to name a couple). If any of these generate an event, the same issue exists.

It really calls for EVS to protect itself from recursion in a more generic sense, to ensure that ANY event from ANY app will not get itself into a loop and crash the system.

jphickey added a commit to jphickey/cFE that referenced this issue Sep 16, 2021
Adds CFE_SB_RequestToSendEvent/CFE_SB_FinishSendEvent wrappers around
all events generated by CFE_SB_TransmitMsgValidate.  This is avoids
the potential for a recursive loop if configured improperly.
@astrogeco astrogeco added this to the cFS-Draco milestone Sep 16, 2021
@skliper
Copy link
Contributor

skliper commented Sep 19, 2021

I'm in favor of treating the current PR as an optional patch if folks want it (since it's a misconfiguration issue), but plan for protecting from within EVS in a future release.

jphickey added a commit to jphickey/cFE that referenced this issue Sep 29, 2021
Adds CFE_SB_RequestToSendEvent/CFE_SB_FinishSendEvent wrappers around
all events generated by CFE_SB_TransmitMsgValidate.  This is avoids
the potential for a recursive loop if configured improperly.
jphickey added a commit to jphickey/cFE that referenced this issue Sep 29, 2021
Adds CFE_SB_RequestToSendEvent/CFE_SB_FinishSendEvent wrappers around
all events generated by CFE_SB_TransmitMsgValidate.  This is avoids
the potential for a recursive loop if configured improperly.
@astrogeco astrogeco self-assigned this Oct 28, 2021
astrogeco added a commit to astrogeco/cFE that referenced this issue Oct 29, 2021
astrogeco added a commit to astrogeco/cFE that referenced this issue Nov 2, 2021
jphickey added a commit to jphickey/cFE that referenced this issue Nov 2, 2021
Adds CFE_SB_RequestToSendEvent/CFE_SB_FinishSendEvent wrappers around
all events generated by CFE_SB_TransmitMsgValidate.  This is avoids
the potential for a recursive loop if configured improperly.
astrogeco added a commit to astrogeco/cFE that referenced this issue Nov 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants