Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sched: Unify the main thread and pthread behaivour #1187

Merged
merged 2 commits into from
Jun 15, 2020

Conversation

xiaoxiang781216
Copy link
Contributor

@xiaoxiang781216 xiaoxiang781216 commented Jun 3, 2020

Summary

1.Make pthread_cleanup_[push|pop] callable from the main thread
2.Robust mutex also support in main thread now

Impact

Testing

nxstyle error can be ignored(mixed case and file rename):

include/stdlib.h:166:19: error: Mixed case identifier found
sched/task/task_atexit.c: fatal: Failed to open
sched/task/task_onexit.c: fatal: Failed to open

@xiaoxiang781216 xiaoxiang781216 changed the title sched: nxsig_abnormal_termination should terminate the whole task sched/group: Change group_kill_children's argument from task_tcb_s to tcb_s Jun 3, 2020
@xiaoxiang781216 xiaoxiang781216 changed the title sched/group: Change group_kill_children's argument from task_tcb_s to tcb_s sched: Main thread should support cleanup and robust mutex too Jun 5, 2020
@xiaoxiang781216
Copy link
Contributor Author

@patacongo could you give some feedback? so other committer can merge the code.

@xiaoxiang781216 xiaoxiang781216 changed the title sched: Main thread should support cleanup and robust mutex too sched: Unify the main thread and pthread behaivour Jun 7, 2020
@xiaoxiang781216
Copy link
Contributor Author

Does anyone can help review the change?

@btashton btashton self-requested a review June 10, 2020 04:38
Copy link
Contributor

@yamt yamt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR is complex.
can you consider to separate quick_exit into another PR?

* immediately, interrupting the thread with its processing."
*
* REVISIT: Does this rule apply to equally to both SIGKILL and
* SIGINT?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't understand this REVISIT comment. how signals are related here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is copied from sig_default, I will remove it.

@@ -85,6 +76,15 @@ int pthread_cancel(pthread_t thread)
pthread_exit(PTHREAD_CANCELED);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean to make pthread_cancel a cancellation point?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behaviour is same as before, the change is just an optimizition. If you look at nxnotify_cancellation, the main job is call nxsem_wait_irq/nxsig_wait_irq/nxmq_wait_irq but the code unnecessary if some thread call pthread_cancel on self because the thread is impossible in the wait state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think pthread_cancel(self) is supposed to terminate the thread in the case of deferred cancellation.
after your changes, it seems to do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After re-reading the related document, you are right we need defer the termination even pthread_cancel self. I will drop this patch from PR.

****************************************************************************/

#define EXIT_NORMAL 0
#define EXIT_BYPASS 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you document what this bypasses?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

* task group if this_task is a pthread.
*/

group_kill_children(rtcb);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this necessary?
why is this rtcb, not dtcb?

Copy link
Contributor Author

@xiaoxiang781216 xiaoxiang781216 Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this necessary?

I think that It's dangerous to terminate the main thread but leave other threads in the task keep running. And other similar function also do the same thing(e.g. task_restart, nxsig_abnormal_termination, exit...).

why is this rtcb, not dtcb?

good catch, I will fix it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it dangerous? it's something i believe the most of pthread implementations allow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most people call task_delete mean to kill the whole task/process, just like you kill the process from task manager windows on Linux/Windows, but now this function just delete the main thread but let other threads in the same task keep running. And worse you can't kill other threads in this task by task_delete, then we don't have any API to terminate all threads in a task.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR summary does not does not describe the changes of all of the 11 commits. This must be broken into separately traceable PRs.

Because nobody review my PR more than 7 days and some late patch depend on the previous one, I have to accumulate the patch in this PR. Anyway, I will remove the new patch from the list and send new PR when this PR get merged.

@@ -173,8 +165,7 @@ void pthread_cleanup_push(pthread_cleanup_t routine, FAR void *arg)
*/

sched_lock();
if ((tcb->cmn.flags & TCB_FLAG_TTYPE_MASK) == TCB_FLAG_TTYPE_PTHREAD &&
tcb->tos < CONFIG_PTHREAD_CLEANUP_STACKSIZE)
if (tcb->tos < CONFIG_PTHREAD_CLEANUP_STACKSIZE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not bailing out here when called with a non pthread type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because main/kernel thread type is TCB_FLAG_TTYPE_TASK/TCB_FLAG_TTYPE_KERNEL, but both types should be able to call pthread_cleanup_push/pthread_cleanup_pop. The major target of whole patch set is ensure the main/kernel thread can call all most pthread API like pthread.

Copy link
Contributor

@patacongo patacongo Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because main/kernel thread type is TCB_FLAG_TTYPE_TASK/TCB_FLAG_TTYPE_KERNEL, but both types should be able to call pthread_cleanup_push/pthread_cleanup_pop. The major target of whole patch set is ensure the main/kernel thread can call all most pthread API like pthread.

But kernel threads cannot create pthreads. pthreads reside only in user space.

What would that even mean in a kernal build. It makes no sense. Which user space?

No pthread APIs should be availble to kernel threads. That must be forbidden by the OS. I believe that would result in severe, catastrophic breakage if permitted.

Copy link
Contributor

@patacongo patacongo Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe that use any pthread APIs should be permitted within the OS. Nor do I believe that the kernel thread TCB should carry the burden of the pthread data structures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tcb_s is shared by three entities(main task, kernel thread and pthread). To confirm POSIX standard, both main thread and pthread MUST support pthread_cleanup_[push|pop] and robust mutex. For kernel thread, it's not bad to support the cleanup stack. Yes, the robust mutex isn't need. But we only waste four byes for the late case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it is a totally unacceptable partitioning of user space and kernel interfaces. I am very opposed to supported this. It is wrong. It is a serious architectural mistake. Let's not go that way... it is very sickening design concept. Let's keep logic in place to prohibit the use of any pthread APIs from kernel threads. Please. Don't shit on the OS! We are entrusted you to maintain good functional partitioning and not to trash all of the interfaces.

I am super opposed to the concept. I am very disappointed in what you are doing to the OS. Very disappointed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My long term goals include:

  1. To separate kernel threads from all groups. To eliminate streams and file descriptors from strings,
  2. Move all pthread logic possible into user space as will all other Unix system.

You are ruining that.

Copy link
Contributor

@patacongo patacongo Jun 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly I will never merge this code. I believe that it is an abminination to the concept of a properly modularized Unix system. But I leave that to the good sense of others if they want it or not. (God, I hope there is no one out there so foolish).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a shoddy, incorrect solution to an important issue. The correct solution that does not contaminate the kernel thread is significantly more complex. I cut corners and implemented a bad simple solution that is bad of the OS and must not be merged. You are doing one of the "Enemies of Inviolables" in INVIOLABLES.txt:

The Enemies
===========

No Short Cuts
-------------

  o Doing things the easy way instead of the correct way.
  o Reducing effort at the expense of Quality, Portability, or
    Consistency.
  o Focus on the values of the organization, not the values of the Open
    Source project.  Need to support both.
  o It takes work to support the Inviolables.  There are no shortcuts.

That is unacceptable.

Copy link
Contributor Author

@xiaoxiang781216 xiaoxiang781216 Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it is a totally unacceptable partitioning of user space and kernel interfaces. I am very opposed to supported this. It is wrong. It is a serious architectural mistake. Let's not go that way... it is very sickening design concept. Let's keep logic in place to prohibit the use of any pthread APIs from kernel threads.

Ok, we can make the change like this:

DEBUGASSERT((tcb->flags & TCB_FLAG_TTYPE_MASK) !=
                TCB_FLAG_TTYPE_KTHREAD);

or

if ((rtcb->flags & TCB_FLAG_TTYPE_MASK) != TCB_FLAG_TTYPE_KTHREAD)
{
...
}

So the kernel thread will fail or no-op for pthread API, but main/pthread can work correctly.

Please. Don't shit on the OS! We are entrusted you to maintain good functional partitioning and not to trash all of the interfaces.

I am super opposed to the concept. I am very disappointed in what you are doing to the OS. Very disappointed.

My long term goals include:

  1. To separate kernel threads from all groups. To eliminate streams and file descriptors from strings,
  2. Move all pthread logic possible into user space as will all other Unix system.

You are ruining that.

This change doesn't ruining your goal, but a path to your final target:
1.Unify the main and thread behaviour:
a.main thread can call pthread_cleanup_[push|pop]
b.robust mutex work with main thread too
c.pthread_exit in main doesn't terminate the whole process
b.cancellation point in main doesn't terminate the whole process
2.The next step is reorg tcb_s maybe like this:

            tcb_s
              |
      +-------+--------+
      |                |
tcb_kernel_s     tcb_user_s
                    |
            +-------+--------+
            |                |
        tcb_task_s     tcb_pthread_s

3.Finally merge tls_info_s into tcb_user_s and place userspace tcb at the beginning of stack
You can see the first step help this path because it make main thread same as pthread which is always required.

@xiaoxiang781216
Copy link
Contributor Author

this PR is complex.

Yes, it's better to review the patch one by one in commit tab.

can you consider to separate quick_exit into another PR?

It's too difficult because quick_exit depend the previous patch which clean up the exit logic. If need, I can remove quick_exit patch from the PR and send a new PR once this PR get merged.

@davids5
Copy link
Contributor

davids5 commented Jun 10, 2020

@xiaoxiang781216 why is this needed? Why did you take the approach of removing the union? Where are the test cases with all the permutations of the CONFIG_* that control the changes?

@patacongo
Copy link
Contributor

In any event, this should not be merged until the 9.1 release has branched. This is far to risky to go into a release unverified.

@Ouss4
Copy link
Member

Ouss4 commented Jun 10, 2020

In any event, this should not be merged until the 9.1 release has branched

We are expecting to branch for 9.1 in 5 days, it should be reasonable to wait.

@jerpelea
Copy link
Contributor

+1 for the wait

@patacongo
Copy link
Contributor

The PR summary does not does not describe the changes of all of the 11 commits. This must be broken into separately traceable PRs.

@xiaoxiang781216
Copy link
Contributor Author

xiaoxiang781216 commented Jun 10, 2020

@xiaoxiang781216 why is this needed? Why did you take the approach of removing the union?

You mean the union inside exitinfo_s, this struct is added by me in commit:
843ad79
And find that the logic can be simplified more by removing the union.

Where are the test cases with all the permutations of the CONFIG_* that control the changes?

Now we just have ostest. I run the change with ostest, no problem found so far. Yes, the test case is very little. Can you improve in this area?
If your concern is whether the change can build successfully with all possible CONFIG_* combination, I think the github action and apache jenkins already build all most board config.

@davids5
Copy link
Contributor

davids5 commented Jun 10, 2020

@xiaoxiang781216 why is this needed? Why did you take the approach of removing the union?

You mean the union inside exitinfo_s, this struct is added by me in commit:
843ad79
And find that the logic can be simplified more by removing the union.

Where are the test cases with all the permutations of the CONFIG_* that control the changes?

Now we just have ostest. I run the change with ostest, no problem found so far. Yes, the test case is very little. Can you improve in this area?
If your concern is whether the change can build successfully with all possible CONFIG_* combination, I think the github action and apache jenkins already build all most board config.

I just ran into this myself, I was thinking dma preflight was lit is the some of the test cases it was not. So it did not break the build, but it also was never tested.

Signed-off-by: Xiang Xiao <[email protected]>
Change-Id: Ifefccda6cb7e2335e11976dcec74e308d64c7f5e
Signed-off-by: Xiang Xiao <[email protected]>
Change-Id: I2d91154572805699237cfc028202021c8f8eee40
@patacongo
Copy link
Contributor

patacongo commented Jun 10, 2020

In general, I think these changes are pretty ill-conceived. But I have not looked at all of the details; that is just my impression. There are many good ideas mish-mashed together with some not so good ideas. I don't think it should go to master.

It should be broken up into multiple PRs that can be reviewed, approved, and merged. It should never come into master in this form (and we still do not want to perturb the code base prior the branching off 9.1!).

@xiaoxiang781216
Copy link
Contributor Author

xiaoxiang781216 commented Jun 10, 2020

I already split the PR into small one, please give your comment and suggestion there. Thanks.

uint8_t tos;
struct pthread_cleanup_s stack[CONFIG_PTHREAD_CLEANUP_STACKSIZE];
#endif

Copy link
Contributor

@patacongo patacongo Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to move these into TLS? They should only be used by user space applications and, hence, should be accessed in the same way that the errno variable is.

We should move as much as pthread logic as possible out of the operating system. Over time, the entire pthread implementation (other than a a few OS hooks) should be moved into the C library. That is a proper and standard roadmap for pthread support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@patacongo I will provide a patch to move them to TLS when I finish upgrade libcxx to the latest version.

@patacongo patacongo merged commit fd5fc2a into apache:master Jun 15, 2020
@xiaoxiang781216 xiaoxiang781216 deleted the sched branch June 15, 2020 13:47
@btashton btashton added this to To-Add in Release Notes - 10.0.0 Oct 14, 2020
@btashton btashton moved this from To-Add to Added in Release Notes - 10.0.0 Oct 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

Main thread should have the same functionality as background thread
7 participants