Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libc/semaphore:sem_init change defult protocol #5070

Merged
merged 4 commits into from
Oct 22, 2022

Conversation

anjiahao1
Copy link
Contributor

@anjiahao1 anjiahao1 commented Dec 24, 2021

Semaphores are generally used in two ways, one is to count resources or notify event, the other is to use as a mutual exclusion lock
In Nuttx, the default behavior of semaphores is mutual exclusion lock. I think it is better to change the default behavior to count resources or notify event, and it will be more general in Linux programs.

Why would I do this:
In porting some programs which confirm POSIX standard, the semaphore is used to notify event, but its default behavior is a mutual exclusion lock(nuttx): One thread post the semaphore to indicate something happen, another thread gets the semaphore and exits. when the first thread post the semaphore again which will cause a crash when CONFIG_PRIORITY_INHERITANCE is enable, because htcb field in semholder_s point to a freed memory.

so, it's better to change the default sem behavior to no priority inheritance.

@anjiahao1 anjiahao1 force-pushed the sem_init branch 2 times, most recently from 44f2d72 to 9d14eb6 Compare December 24, 2021 09:13
@anjiahao1 anjiahao1 force-pushed the sem_init branch 2 times, most recently from b61bc35 to b48cac1 Compare December 24, 2021 09:23
sched/semaphore/sem_holder.c Outdated Show resolved Hide resolved
libs/libc/semaphore/sem_getprotocol.c Outdated Show resolved Hide resolved
@anjiahao1 anjiahao1 force-pushed the sem_init branch 2 times, most recently from 9638e8f to 076bf25 Compare December 24, 2021 10:29
@davids5 davids5 self-requested a review December 24, 2021 14:39
Copy link
Contributor

@davids5 davids5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a massive breaking change. Please do not merge this until there is time after the holidays to get proper feed back from the reviews listed.

@xiaoxiang781216
Copy link
Contributor

This is a massive breaking change. Please do not merge this until there is time after the holidays to get proper feed back from the reviews listed.

Yes, this is a partial change to show the issue and potential modification. If the community agree the fix after review, all [nx]sem_init need review and modify.

@xiaoxiang781216 xiaoxiang781216 marked this pull request as draft December 24, 2021 15:10
@xiaoxiang781216 xiaoxiang781216 added the Standards NuttX application interfaces must compy to standards label Dec 24, 2021
@hartmannathan
Copy link
Contributor

hartmannathan commented Dec 24, 2021

This will also break many things for downstream projects using NuttX.

We need to make sure the wider community notices this proposed change so it can be reviewed properly. I will post on the mailing list, but there are other venues, so to everyone frequenting them, please help draw attention to this there, too.

To be clear, I am in favor of standards-compliance, provided this is in line with NuttX's Inviolables:

  • If it is standard POSIX behavior to do as this change proposes, then we should do so, but only after making sure it is reviewed properly.
  • If this is Linux behavior and is in contradiction to POSIX, then we should not do it.

Also, if this change is implemented, we need to increment the major version number. It will have to be NuttX 11.x.

include/semaphore.h Outdated Show resolved Hide resolved
Copy link
Contributor

@hartmannathan hartmannathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned elsewhere, there are likely other parts of our tree that need review/changes.

Regarding whether we should do this or not, I cannot offer an opinion until I re-read POSIX details regarding semaphores to see if this is correct per our Inviolable Principles.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Dec 24, 2021

This will also break many things for downstream projects using NuttX.

Only when the project enable CONFIG_PRIORITY_INHERITANCE, otherwise no difference before and after apply this patch.

We need to make sure the wider community notices this proposed change so it can be reviewed properly. I will post on the mailing list, but there are other venues, so to everyone frequenting them, please help draw attention to this there, too.

Sure.

To be clear, I am in favor of standards-compliance, provided this is in line with NuttX's Inviolables:

  • If it is standard POSIX behavior to do as this change proposes, then we should do so, but only after making sure it is reviewed properly.

As far as I know, POSIX never mention the priority inheritance in spec, so it's the special behavior implemented by NuttX to improve the real time capability. But, sem_init without nxsem_set_protocol(SEM_PRIO_NONE) activate the priority inheritance by default which make the follow simple POSIX compliant program crash after the second sem_post:

#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t g_sem;

static void *thread1_cb(void *arg)
{
  sem_wait(&g_sem);
  return NULL;
}

static void *thread2_cb(void *arg)
{
  sleep(2);
  sem_wait(&g_sem);
  return NULL;
}

static void *thread3_cb(void *arg)
{
  sleep(1);
  sem_post(&g_sem);
  sleep(2);
  sem_post(&g_sem);
  return NULL;
}

int main(int argc, char *argv[])
{
  sem_init(&g_sem);

  thread1 = pthread_create(thread1_cb...);
  thread2 = pthread_create(thread2_cb...);
  thread3 = pthread_create(thread3_cb...);

  pthread_join(&thread1);
  pthread_join(&thread2);
  pthread_join(&thread3);

  sem_destroy(&g_sem);
  return 0;
}

So, it's very dangerous to enable priority inheritance without explicit indicator(e.g. a function call or an initial flag).

  • If this is Linux behavior and is in contradiction to POSIX, then we should not do it.

Of course.

Also, if this change is implemented, we need to increment the major version number. It will have to be NuttX 11.x.

Sure.

As mentioned elsewhere, there are likely other parts of our tree that need review/changes.

Regarding whether we should do this or not, I cannot offer an opinion until I re-read POSIX details regarding semaphores to see if this is correct per our Inviolable Principles.

Ok, let's wait your research.

@davids5
Copy link
Contributor

davids5 commented Dec 24, 2021

Drivers that use the semaphores for signaling should not be involved with priority inheritance (when enabled). That use case was assumed to be the non default.

It seams to me this change can be done with adding an Kconfig option for the default and leave it as it was. But allow the default to be changed if someone is using Linux code semantics.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Dec 25, 2021

Drivers that use the semaphores for signaling should not be involved with priority inheritance (when enabled). That use case was assumed to be the non default.

Why non default? semaphore use as signaling is a normal practice in POSIX. On the other hand, most program use POSIX pthread mutex as lock, so semaphore seldom use as lock except NuttX kernel.
I have to say that it's very bad choice to enable priority inheritance by default when priority inheritance get implemented, because:

  1. It's an optional feature, which mean that the code which use sem_t but not call nxsem_set_protocol(NuttX specific), the behavior will change with/without enable this feature.
  2. Since priority inheritance is the default setting, which mean the wrong usage will crash the system. It's more serious than the priority inversion.

It seams to me this change can be done with adding an Kconfig option for the default and leave it as it was.

Kconfig is good to disable/enable the optional feature, but it isn't a good solution here. Do you want sem_t change it's behavior by an option? And all place which call sem_init/nxsem_init need check the option and call nxsem_set_protocol with the different value.

But allow the default to be changed if someone is using Linux code semantics.

First, it isn't Linux semantics. Second, my demo show that isn't a minor issue because the program will crash the system.

@pkarashchenko
Copy link
Contributor

pkarashchenko commented Dec 29, 2021

Seems that disabling priority inheritance by default is the right way of doing things. My arguments are that by default priority inheritance is used to struggle with resource guarding issue (mutual exclusion use case) and does not have much sense in case of event signaling use case. The POSIX semaphores however have have more wide range then mutual exclusion use case and it is more a coincidence that phtread_mutexs are implemented on top of POSIX semaphores in NuttX (I just want to emphasize that phtread_mutexs could have their own implementation as an alternative and are not tight together with semaphores in general). pthread_mutexs will still have priority inheritance enabled by default and that is fine because I truly believe that when NuttX user enables priority inheritance that is the exactly use case that needs to be handled by this feature.
Summarizing above: my vote is to change the default protocol for POSIX semaphores and proceed with this change.

@zhaoxiu-zeng
Copy link
Contributor

This will also break many things for downstream projects using NuttX.

Only when the project enable CONFIG_PRIORITY_INHERITANCE, otherwise no difference before and after apply this patch.

We need to make sure the wider community notices this proposed change so it can be reviewed properly. I will post on the mailing list, but there are other venues, so to everyone frequenting them, please help draw attention to this there, too.

Sure.

To be clear, I am in favor of standards-compliance, provided this is in line with NuttX's Inviolables:

  • If it is standard POSIX behavior to do as this change proposes, then we should do so, but only after making sure it is reviewed properly.

As far as I know, POSIX never mention the priority inheritance in spec, so it's the special behavior implemented by NuttX to improve the real time capability. But, sem_init without nxsem_set_protocol(SEM_PRIO_NONE) activate the priority inheritance by default which make the follow simple POSIX compliant program crash after the second sem_post:

#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t g_sem;

static void *thread1_cb(void *arg)
{
  sem_wait(&g_sem);
  return NULL;
}

static void *thread2_cb(void *arg)
{
  sleep(2);
  sem_wait(&g_sem);
  return NULL;
}

static void *thread3_cb(void *arg)
{
  sleep(1);
  sem_post(&g_sem);
  sleep(2);
  sem_post(&g_sem);
  return NULL;
}

int main(int argc, char *argv[])
{
  sem_init(&g_sem);

  thread1 = pthread_create(thread1_cb...);
  thread2 = pthread_create(thread2_cb...);
  thread3 = pthread_create(thread3_cb...);

  pthread_join(&thread1);
  pthread_join(&thread2);
  pthread_join(&thread3);

  sem_destroy(&g_sem);
  return 0;
}

After successfully sem_wait, the task becomes the semphore's holder. Then the task exit, but not be removed from the semphore's holder list, so it will crash if visit this holder in the future.
I will do a commit to fix this problem.

So, it's very dangerous to enable priority inheritance without explicit indicator(e.g. a function call or an initial flag).

  • If this is Linux behavior and is in contradiction to POSIX, then we should not do it.

Of course.

Also, if this change is implemented, we need to increment the major version number. It will have to be NuttX 11.x.

Sure.

As mentioned elsewhere, there are likely other parts of our tree that need review/changes.
Regarding whether we should do this or not, I cannot offer an opinion until I re-read POSIX details regarding semaphores to see if this is correct per our Inviolable Principles.

Ok, let's wait your research.

@pkarashchenko
Copy link
Contributor

@zhaoxiu-zeng how do you plan to fix this? I was looking into the code as well and was looking for a solution, but didn't found a lightweight solution. The code only idea that I had was to create a global list of semaphores that have at least one holder and iterate that list on task exit in order to find a semaphore that contain task in the holder list or create a list of semaphores in task control block and add semaphore to that list when task is added as a semaphore holder. But both seems to me quite "heavy" in terms of both memory and performance.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Dec 30, 2021

@anjiahao1 has a simple fix which change tcb_s to pid, so we can detect the abnormal case and remove the orphan node easily. But, we want to fix the root cause first, and then add the safe guard in case of the wrong usage.

@pkarashchenko
Copy link
Contributor

pkarashchenko commented Dec 30, 2021

@xiaoxiang781216 changing tcb_s to pid adds at least some reliability, but still seems to be not bulletproof. In such case I would suggest to store both tcb_s and pid, so we can use next mechanism to improve validity check: if (pholder->htcb == nxsched_get_tcb(pholder->pid)), so the probability that same memory block will be allocated for a task and same pid will be assigned is relatively low. In this case we can insert a sanity check and "dead" nodes clean-up into each semaphore operation API call.

@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Oct 19, 2022

this is the last patchset to fix the default value of priority inheritance, please review them, @hartmannathan @davids5 @masayuki2009 @pkarashchenko . All related changes are recorded in #5184.

arch/arm/src/kinetis/kinetis_i2c.c Show resolved Hide resolved
include/nuttx/mutex.h Outdated Show resolved Hide resolved
@anjiahao1 anjiahao1 force-pushed the sem_init branch 2 times, most recently from d9b38f7 to 3f79ae6 Compare October 19, 2022 15:59
Copy link
Contributor

@hartmannathan hartmannathan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before, mutual exclusion semaphores had priority inheritance enabled by default; now it is opt-in.

So I wonder if the opposite of commit 48d8383 ("sem:remove sem default protocl") should be done also: Locate all mutual exclusion semaphores and call (nx)sem_set_protocol(..., SEM_PRIO_INHERIT)?

Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
Documentation/reference/os/mutex.rst Outdated Show resolved Hide resolved
@xiaoxiang781216
Copy link
Contributor

xiaoxiang781216 commented Oct 20, 2022

Before, mutual exclusion semaphores had priority inheritance enabled by default; now it is opt-in.

So I wonder if the opposite of commit 48d8383 ("sem:remove sem default protocl") should be done also: Locate all mutual exclusion semaphores and call (nx)sem_set_protocol(..., SEM_PRIO_INHERIT)?

@hartmannathan it was done by converting all mutual exclusion semaphores to mutex lock in this PR #6320.

@anjiahao1
Copy link
Contributor Author

anjiahao1 commented Oct 20, 2022

#6965 this patch repaced all sem use as lock with mutex

@hartmannathan
Copy link
Contributor

#6965 this patch repaced all sem use as lock with mutex

Great! This is a much cleaner design. Thank you!

@anjiahao1 anjiahao1 force-pushed the sem_init branch 2 times, most recently from b3a62ee to 315ee8f Compare October 20, 2022 07:31
@xiaoxiang781216 xiaoxiang781216 merged commit 1c416b1 into apache:master Oct 22, 2022
@masayuki2009
Copy link
Contributor

masayuki2009 commented Oct 22, 2022

@xiaoxiang781216
I think we need to merge apache/nuttx-apps#1358 as well.
Otherwise, qemu-intel64:ostest would fail.

@hartmannathan
Copy link
Contributor

I have documented the mitigation entry for the next release's Release Notes here: https://cwiki.apache.org/confluence/display/NUTTX/NuttX+11.1.0#NuttX11.1.0-DefaultSemaphoreProtocolChanged

Please check and let me know if there are any mistakes!

@xiaoxiang781216
Copy link
Contributor

The description looks good, @hartmannathan thanks.

@jerpelea jerpelea added this to To-Add in Release Notes - 12.0.0 Dec 29, 2022
@jerpelea jerpelea moved this from To-Add to Added in Release Notes - 12.0.0 Jan 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change This change requires a mitigation entry in the release notes. Standards NuttX application interfaces must compy to standards
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Replace all place which use semaphore as lock with mutex wrapper
7 participants