Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arm64 smp hangs on the sched_unlock call in nx_start function #11915

Closed
github-xiaodong opened this issue Mar 14, 2024 · 7 comments
Closed

arm64 smp hangs on the sched_unlock call in nx_start function #11915

github-xiaodong opened this issue Mar 14, 2024 · 7 comments

Comments

@github-xiaodong
Copy link

github-xiaodong commented Mar 14, 2024

For arm64 smp, in nx_start function, after nx_smp_start and nx_bringup execute, the cpu will hang on up_testset when they call sched_unlock.

CONFIG_SMP=y
CONFIG_SMP_NCPUS=2

In nx_start function, if change g_idletcb[i].cmn.affinity = (cpu_set_t)(CONFIG_SMP_DEFAULT_CPUSET & SCHED_ALL_CPUS) to g_idletcb[i].cmn.affinity = (cpu_set_t)(1<< i); then smp can run normally.

I want to ask whether the value of g_idletcb[i].cmn.affinity should be the bit of the current core or the bit of all cores.
When the comment says No, this IDLE thread can only run on its assigned CPU.

#ifdef CONFIG_SMP
      g_idletcb[i].cmn.flags = (TCB_FLAG_TTYPE_KERNEL | TCB_FLAG_CPU_LOCKED);
      g_idletcb[i].cmn.cpu   = i;

      /* Set the affinity mask to allow the thread to run on all CPUs.  No,
       * this IDLE thread can only run on its assigned CPU.  That is
       * enforced by the TCB_FLAG_CPU_LOCKED which overrides the affinity
       * mask.  This is essential because all tasks inherit the affinity
       * mask from their parent and, ultimately, the parent of all tasks is
       * the IDLE task.
       */

      g_idletcb[i].cmn.affinity =
        (cpu_set_t)(CONFIG_SMP_DEFAULT_CPUSET & SCHED_ALL_CPUS);
#else
      g_idletcb[i].cmn.flags = TCB_FLAG_TTYPE_KERNEL;
#endif

The arm64 up is normal.
If enable smp and choose CONFIG_SMP_NCPUS =1 , is normal too.

@acassis
Copy link
Contributor

acassis commented Mar 14, 2024

@hnwxd since this code is generic for all SMP chips, I think it is fine. What board are you using?

@xiaoxiang781216 @anchao did you find this issue on your side?

@github-xiaodong
Copy link
Author

github-xiaodong commented Mar 15, 2024

  1. We are adapting our own A53 quad-core chip, and at present, imx8 and rk3399 under arm64 do not support smp.
    Is there any other board that actually runs smp under arm64 ?

  2. When I made the changes described above, all four cores started properly, and the smp call test in the ostest application passed correctly.

  3. By the way, I am developing on version releases/12.4

微信截图_20240315092718
微信截图_20240315092736
微信截图_20240315092817

@xiaoxiang781216
Copy link
Contributor

  1. We are adapting our own A53 quad-core chip, and at present, imx8 and rk3399 under arm64 do not support smp.
    Is there any other board that actually runs smp under arm64 ?

qemu and fvp support arm64, you can compare the difference with your hardware.
With your change, all threads but idle will pin to CPU0, which may the reason why you pass the test.

@github-xiaodong
Copy link
Author

  1. We are adapting our own A53 quad-core chip, and at present, imx8 and rk3399 under arm64 do not support smp.
    Is there any other board that actually runs smp under arm64 ?

qemu and fvp support arm64, you can compare the difference with your hardware. With your change, all threads but idle will pin to CPU0, which may the reason why you pass the test.

You're right. It is not right for all the results debug to be CPU0 at the end of smp call test in previous picture.

It's my problem, because the implementation of arm64_gic_send_sgi function does not match our chip, because we need to use Group 1.

sgi

I found this problem when I used smp call test yesterday, because core 0 failed to wake up core 1, after changing to Group 1(ICC_SGI1R), core 1 successfully woke up.

The g_idletcb[i].cmn.affinity = (cpu_set_t)(CONFIG_SMP_DEFAULT_CPUSET & SCHED_ALL_CPUS); in nx_start function is no problem, it's been restored to the original code.

The smp call test and coremark is ok.

call
coremark

Thank you for your help! @acassis @xiaoxiang781216

@acassis
Copy link
Contributor

acassis commented Mar 15, 2024

@hnwxd I'm happy you should it! Please consider submitting your work on imx8 and rk3399, there are many people willing to use it with NuttX ;-)

@github-xiaodong
Copy link
Author

github-xiaodong commented Mar 18, 2024

@hnwxd I'm happy you should it! Please consider submitting your work on imx8 and rk3399, there are many people willing to use it with NuttX ;-)

I just changed the arm64_gic_send_sgi function.
The original implementation is,

  ARM64_DSB();

  regval = getreg32(IGROUPR(base, 0));

  if (regval & BIT(sgi_id))
    {
      write_sysreg(sgi_val, ICC_SGI1R);     /* Group 1 */
    }
  else
    {
      write_sysreg(sgi_val, ICC_SGI0R_EL1); /* Group 0 */
    }

  ARM64_ISB();

I delete this judgment and use write_sysreg(sgi_val, ICC_SGI1R) directly.

  ARM64_DSB();

  write_sysreg(sgi_val, ICC_SGI0R_EL1); /* Group 0 */
    
  ARM64_ISB();

Just like the implementation in the arm_gic_send_sgi function.

  ARM_DSB();
  CP15_SET64(ICC_SGI1R, sgi_val);
  ARM_ISB();

Why is the arm_gic_send_sgi function written ICC_SGI1R directly, but the arm64_gic_send_sgi function needs to make this judgment ? I'm trying to find out why.

@github-xiaodong
Copy link
Author

The pr "Add support for FIQ interrupts" made this change.

#10888

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants