Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RX1 window Timer event never reached #38

Open
slavendam opened this issue Jan 24, 2024 · 18 comments
Open

RX1 window Timer event never reached #38

slavendam opened this issue Jan 24, 2024 · 18 comments
Assignees
Labels
bug Something isn't working

Comments

@slavendam
Copy link

slavendam commented Jan 24, 2024

Describe the bug
endPacket() is never finished after 4:40h of properly working. I've tested 3 times and it blocked at same time (4:40h).
2 times with 10min sending period between (both times it blocked at counter 29) and once tested with 5min sending period (same time period passed, but now it blocked on counter 57).
I did debugging by adding multiple print logs in header files.

Sending is done successfuly, message arrives on gateway - and than it stops, like it is in infinite loop.
RX1 window should be opened, but it seems that this part is never done.
After LoRaMacProcess() function is done once (for sending), function OnRxWindow1TimerEvent() should be called - but it is never called.

I'm using RAK3172 module, EU868 region.
I'm using STM32 Low Power library to send device in DeepSleep, and I'm using RTC Alarm timed wakeup.
Even if it is stuck between TX and RX windows, this alarms work properly (I can see alarm triggers later).
I tried to disable DeepSleep and alarms, and use delay instead, but result is the same.

Transmitted payload has 6 bytes.

  • OS: Windows
  • Arduino IDE version: 2.2.1
  • STM32 core version: 2.7.1
  • Library version: 0.2.0
  • Upload method: Serial
  • Board Name: RAK3172 (without TXCO)

To Reproduce
Set device to send messages every 10min and wait for 29th message. Or set with 5min time period and wait for 57th message.
That last message will be transmitted, endPacket() function will not be finished.

Expected behavior
This is part of logs how it should looks like when it works correctly (note that 2nd section is missing later):

LoRaMacProcess 1
LoRaMacHandleIrqEvents 1
LoRaMacHandleIrqEvents 2
ProcessRadioTxDone
...
TIMER_IF_StartTimer Start timer: time=4154841, alarm=4156109
GetTimerTicks 1
GetTimerTicks 2: -4154845, 4154844
TIMER_IF_GetTimerElapsedTime: 6
TIMER_IF_GetTime
GetTimerTicks 1
GetTimerTicks 2: -4154847, 4154846
TIMER_IF_BkUp_Read_MSBticks
LoRaMacProcess 2
LoRaMacProcess 3
LoRaMacProcess 13
LoRaMacProcess 14

TIMER_IF_SetTimerContext
GetTimerTicks 1
GetTimerTicks 2: -4156110, 4156109
OnRxWindow1TimerEvent
RxWindowSetup 1
RxWindowSetup 2
RxWindowSetup 3
GetTimerTicks 1
GetTimerTicks 2: -4156114, 4156113
TIMER_IF_GetTimerElapsedTime: 4
GetTimerTicks 1
GetTimerTicks 2: -4156116, 4156115
TIMER_IF_GetTimerElapsedTime: 6
GetTimerTicks 1
GetTimerTicks 2: -4156118, 4156117
TIMER_IF_GetTimerElapsedTime: 8
GetTimerTicks 1
GetTimerTicks 2: -4156120, 4156119
TIMER_IF_StartTimer Start timer: time=4156119, alarm=4156371
OnRadioRxTimeout
OnMacProcessNotify
... [logs continue]

This is log when it stopped:

LoRaMacProcess 1
LoRaMacHandleIrqEvents 1
LoRaMacHandleIrqEvents 2
ProcessRadioTxDone
...
TIMER_IF_StartTimer Start timer: time=4308483, alarm=4309750
GetTimerTicks 1
GetTimerTicks 2: -4308487, 4308486
TIMER_IF_GetTimerElapsedTime: 6
TIMER_IF_GetTime
GetTimerTicks 1
GetTimerTicks 2: -4308489, 4308488
TIMER_IF_BkUp_Read_MSBticks
LoRaMacProcess 2
LoRaMacProcess 3
LoRaMacProcess 13
LoRaMacProcess 14
[logs stopped]

Has anyone had the same problem, or do you have idea about possible solution?

Thanks!

@matthijskooijman
Copy link
Collaborator

@slavendam I had seen this issue in my mailbox and I was intrigued, but had not found any time to look at it yet. Now you've pointed out that Lora-net/LoRaMac-node#1349 might be relevant, and it might very well be. I have commented at that issue, with some debugging suggestions as well. Maybe best to report the results here, to not pollute that issue to much?

@slavendam
Copy link
Author

Thanks for reaching back!

I've already printed around line
TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset );

Full code is this:

// Setup timers
    printf("ProcessRadioTxDone 0 - START\n"); // CHECKPOINT 1
    CRITICAL_SECTION_BEGIN( );
    uint32_t offset = TimerGetCurrentTime( ) - TxDoneParams.CurTime;
    TimerSetValue( &MacCtx.RxWindowTimer1, MacCtx.RxWindow1Delay - offset );
    TimerStart( &MacCtx.RxWindowTimer1 );
    printf("TimerStart RxWindowTimer1 - DONE\n"); // CHECKPOINT 2
    TimerSetValue( &MacCtx.RxWindowTimer2, MacCtx.RxWindow2Delay - offset );
    TimerStart( &MacCtx.RxWindowTimer2 );
    printf("TimerStart RxWindowTimer2 - DONE\n");  // CHECKPOINT 3
    CRITICAL_SECTION_END( );
    printf("ProcessRadioTxDone 1: TimerGetCurrentTime=%d, CurTime=%d, offset=%d \n",TimerGetCurrentTime(), TxDoneParams.CurTime, offset);
    printf("ProcessRadioTxDone 2: RxWindow1Delay=%d, delay-off=%d\n", MacCtx.RxWindow1Delay, MacCtx.RxWindow1Delay - offset);
    printf("ProcessRadioTxDone 3: rxWindT1=%d\n", MacCtx.RxWindowTimer1);
    printf("ProcessRadioTxDone 4: RxWindow2Delay=%d, delay-off=%d\n", MacCtx.RxWindow2Delay, MacCtx.RxWindow2Delay - offset);
    printf("ProcessRadioTxDone 5: rxWindT2=%d\n", MacCtx.RxWindowTimer2);

Left are debug prints when everything WORK, right is when it DOESN'T WORK.

image
image

I'm also attaching debug log file if you want to take a look (it is big, but you can start/look from end).
E.g. Anything between "ProcessRadioTxDone 0 - START" and "TimerStart RxWindowTimer1 - DONE" is connected with setting timer for RX1.
I even did some printings from "UTIL_TIMER_Start()".

I didn't find anything obivously wrong in this part, but I suspect that something behind this can be issue. Maybe some RTC value or something inside of IRQ handler where timing values are summed up.

Reason why I think this is timing issue is because halt happens every time after 4:40h (no matter if you send with 5min period or 10min period). Like some overflow happens (leaving it for another 4:40h doesn't release it).

Do you know if code use some other time source except GetTimerTicks()?

serial_text10.txt

@slavendam
Copy link
Author

I don't have "modem.maintain()" call directly in my code, but as I can see it is called from endPacket() function as maintainUntilIdle(); which has do { maintain(); } while(busy());
And I can confirm that this is called because LoRaMacProcess(); function is called only inside of maintain() and you can see multiple "LoRaMacProcess" logs.

@matthijskooijman
Copy link
Collaborator

So your prints show offset=28 in both cases, so the potential offset underflow issue I mentioned in the other issue (which could result from not calling maintain() quickly enough) is not the culprit here (it probably exists, but is not your issue).

I do not have any suggestions right away, but I'm a bit too tired now to really dive in. I'll see if I can find a bit of time to look at your debug log tomorrow (not guarantees, though...).

@slavendam
Copy link
Author

Thanks for taking the time to think about the problem.
If you will have any ideas what to try, let me know, because I'm out of ideas now :)

@slavendam
Copy link
Author

slavendam commented Feb 2, 2024

I think we have lucky winner :)

In this RTC_StartAlarm line alarm is set up based on subSecond (ms) value.

RTC_StartAlarm(RTC_ALARM_B, 0, 0, 0, 0, (timeout * 1000 / MS_TO_TICK + 1), RTC_HOURFORMAT12_PM, 31UL);

According to this: https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.h#L195
subSeconds are uint32_t type which gives max value of 4 294 967 295.

As you can see from my logs, issue happens when processor tries to setup alarm with value of 4 310 527 ticks, and that value is multiplied by 1000. That should give value of 4 310 527 000 which is bigger than max value. That creates overflow in calulation so end result is wrong and alarm is set to some time value which RTC will not reach when it should.

This is commit which made this issue: 54882c8

@slavendam
Copy link
Author

slavendam commented Feb 2, 2024

UPDATE
I've confirmed that issue is in subSeconds value.
Test code:

  uint32_t subloop = 0;
  for (uint32_t i = 3500000; i < 4800000; i = i + 100000){
    subloop = (i * 1000 / MS_TO_TICK + 1);
    printf("TIMER_IF_StartTimer for loop timeout to subseconds: timeout=%d, subseconds=%d\n\r", i, subloop);
  }

Test results:


TIMER_IF_StartTimer MS_TO_TICK=256

TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3500000, subseconds=13671876
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3600000, subseconds=14062501
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3700000, subseconds=14453126
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3800000, subseconds=14843751
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=3900000, subseconds=15234376
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4000000, subseconds=15625001
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4100000, subseconds=16015626
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4200000, subseconds=16406251
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4300000, subseconds=19660
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4400000, subseconds=410285
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4500000, subseconds=800910
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4600000, subseconds=1191535
TIMER_IF_StartTimer for loop timeout to subseconds: timeout=4700000, subseconds=1582160

@slavendam
Copy link
Author

slavendam commented Feb 5, 2024

During solving of this problem, I found new one.
As mentioned earlier, problem with blocking every 4:40h was overflow in this line:

RTC_StartAlarm(RTC_ALARM_B, 0, 0, 0, 0, (timeout * 1000 / MS_TO_TICK + 1), RTC_HOURFORMAT12_PM, 31UL);

Solution for this problem is:

uint64_t subSecond_tmp = ((uint64_t)((uint64_t)timeout * (uint64_t)(1000))) / MS_TO_TICK + 1;
uint32_t subSeconds_start = (uint32_t)subSecond_tmp;
RTC_StartAlarm(RTC_ALARM_B, 0, 0, 0, 0, subSeconds_start, RTC_HOURFORMAT12_PM, 31UL);

We are using variable with bigger range because during calculations, our values can overflow in uint32_t.
This solution is tested and ti works.

Second problem I found is "bomb" which will trigger in ~50 days and will block code execution.

Logic of processor is based on number of ticks which value can be between 0 and UINT32_MAX (4294967295).
RTC_StartAlarm function (STM32RTC) require subseconds as parameter:

https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.c#L801

That is reason why we are doing calculation number_of_ticks(1000/MS_TO_TICK)* and this second part is equal to ~3.9.
As mentioned earlier, number of ticks can get value up to UINT32_MAX.
In that case subSeconds=UINT32_MAX * 3.9 which can't fit in uin32_t variable!

Example:
number_of_ticks=4 290 000 000
subSeconds = number_of_ticks*1000/256 = 16 757 812 500 (bigger than UINT32_MAX)

In RTC_StartAlarm function ti will send only 32 bits of value, not all 64 bits.
So currently biggest value of ticks we can send is UINT32_MAX/3.9=1.101.273.665 (because subSeconds variable can't receive correctly values larger that UINT32_MAX)

In RTC_StartAlarm it is calculating subSeconds back to ticks:

https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.c#L906

As it got wrong subSecond value, ticks will be calculated wrong and alarmB will be set wrongly.

Solution is to add uint64_t parameter to RTC_StartAlarm function so we can forward correct value of subSeconds (and ticks, because they are connected).

I added new line in rtc.h next to this line:
https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.h#L195

as it follows:
void RTC_StartAlarm2(alarm_t name, uint8_t day, uint8_t hours, uint8_t minutes, uint8_t seconds, uint32_t subSeconds, hourAM_PM_t period, uint8_t mask, uint64_t ticksToSubSeconds);
Note: Added uint64_t ticksToSubSeconds parameter

In rtc.c I changed this line:

https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.c#L801

With this:

void RTC_StartAlarm(alarm_t name, uint8_t day, uint8_t hours, uint8_t minutes, uint8_t seconds, uint32_t subSeconds, hourAM_PM_t period, uint8_t mask)
{
  RTC_StartAlarm2(name, day, hours, minutes, seconds, subSeconds, period, mask, NULL);
}

void RTC_StartAlarm2(alarm_t name, uint8_t day, uint8_t hours, uint8_t minutes, uint8_t seconds, uint32_t subSeconds, hourAM_PM_t period, uint8_t mask, uint64_t ticksToSubSeconds)
{ 

This logic enables that all library users who implemented this RTC_StartAlarm function don't need to change anything (no need to add new parameters). Maybe it is possible to set "...uint64_t ticksToSubSeconds=NULL)", without declaration of new function, but I didn't test that part.

And this line:
https://github.com/stm32duino/STM32RTC/blob/90414bcb87c41307b9a93b84a4ae589d6d621325/src/rtc.c#L906

is changed with this:

      if(ticksToSubSeconds != NULL){
        uint64_t tmp = ((uint64_t)ticksToSubSeconds*(predivSync+1)) / (uint64_t)1000;
        RTC_AlarmStructure.AlarmTime.SubSeconds = (uint32_t)UINT32_MAX -  (uint32_t)tmp;
        printf("STM32RTC rtc.c RTC_StartAlarm in ELSE() 2-12, alarm=%lu\n", (uint32_t)RTC_AlarmStructure.AlarmTime.SubSeconds);
      }
      else{
        RTC_AlarmStructure.AlarmTime.SubSeconds = UINT32_MAX - (subSeconds * (predivSync + 1)) / 1000;
        printf("STM32RTC rtc.c RTC_StartAlarm in ELSE() 2-3, subSeconds= %lu, ticks=%lu\n", subSeconds, RTC_AlarmStructure.AlarmTime.SubSeconds);
      }

It is checking if 64 bits value is sent. If it is, it will use it, and if not, than regular 32 bit subSecond variable will be used.

And finally when I changed STM32RTC library, I made change in STM32LoRaWAN.
This line:

RTC_StartAlarm(RTC_ALARM_B, 0, 0, 0, 0, (timeout * 1000 / MS_TO_TICK + 1), RTC_HOURFORMAT12_PM, 31UL);

Is changed with this:

uint64_t subSecond_tmp = ((uint64_t)((uint64_t)timeout * (uint64_t)(1000))) / MS_TO_TICK + 1;
uint32_t subSeconds_start = (uint32_t)subSecond_tmp;
RTC_StartAlarm2(RTC_ALARM_B, 0, 0, 0, 0, subSeconds_start, RTC_HOURFORMAT12_PM, 31UL, subSecond_tmp);

Now we are calling RTC_StartAlarm2 (and not RTC_StartAlarm) because it can receive uint64_t value. subSeconds_start value is not important because it will not be used.

I've tested this solution with hardcoded value of 4 290 000 000 ticks.
Here are printf from RTC_StartAlarm function:

STM32RTC rtc.c RTC_StartAlarm in ELSE() 2-13, tmp=4289999872
STM32RTC rtc.c RTC_StartAlarm in ELSE() 2-12, alarm=4967423

"tmp" is result of calculating received subseconds (ticksToSubSeconds) back to ticks, and "alarm" is new alarm.
Difference between 4 290 000 000 and 4289999872 is because of precision I think.
And UINT32_MAX-4289999872 really gives 4967423 as result.

EDIT: this difference in result 429... vs 428.. was due to wrong order of calculation (during debuging I changed to first divide by 1000, than multiply by 256). It should first multiply, than divide, as it was earlier.

@fpistm probably your assistance will be necessary in solving this here and in STM32RTC library. :)

@fpistm
Copy link
Member

fpistm commented Feb 5, 2024

@fpistm probably your assistance will be necessary in solving this here and in STM32RTC library. :)

Hi @slavendam,
yes I will check that, thanks for the detailed explanation.

@FRASTM
Copy link
Contributor

FRASTM commented Mar 12, 2024

Hi @slavendam,
Based on your proposal I raised

The main difference is that a SubSecond parameter is expressed on 64bit, instead of passing the uint32_t subSeconds plus the uint32_t ticksToSubSeconds.

Could you please check the conversion and if this fix the issue in the same way as yours

      if (subSeconds > (uint64_t)UINT32_MAX) {
        uint64_t tmp = ((uint64_t)subSeconds * (predivSync + 1)) / (uint64_t)1000;
        RTC_AlarmStructure.AlarmTime.SubSeconds = (uint32_t)UINT32_MAX - (uint32_t)tmp;
      } else {
        RTC_AlarmStructure.AlarmTime.SubSeconds = (uint32_t)((uint32_t)UINT32_MAX - (uint32_t)(subSeconds * (predivSync + 1)) / 1000);
      }

@fpistm
Copy link
Member

fpistm commented Apr 3, 2024

@slavendam
Any feedback on this, please ?

@slavendam
Copy link
Author

Sorry for delay. I'll give you feedback in day or two

@snh
Copy link

snh commented Jun 2, 2024

Thank you for all your work on this issue, I am experiencing the same issue and was relieved to find such a detailed investigation had already been undertaken. ❤

Unfortunately the PRs you mentioned above @FRASTM (#39 and stm32duino/STM32RTC#109) don't appear to resolve this issue for me. After applying these patches locally I continue to find myself stuck in STM32LoRaWAN::maintainUntilIdle at approximately the 4 hour 40 minute mark.

I am a little stuck as to the next steps to try and resolve this, but happy to test other proposed fixes if there are any suggestions.

@janvrska
Copy link

janvrska commented Jun 3, 2024

Thank you for all your work on this issue, I am experiencing the same issue and was relieved to find such a detailed investigation had already been undertaken. ❤

Unfortunately the PRs you mentioned above @FRASTM (#39 and stm32duino/STM32RTC#109) don't appear to resolve this issue for me. After applying these patches locally I continue to find myself stuck in STM32LoRaWAN::maintainUntilIdle at approximately the 4 hour 40 minute mark.

I am a little stuck as to the next steps to try and resolve this, but happy to test other proposed fixes if there are any suggestions.

I have the same problem as you mentioned: getting stuck at maintainUntilIdle after a wakeup from deepsleep which lasted 12 hours. This is really bad because I need the chip to go to sleep after sending data. I tried to workaround it by creating a function with a timeout that forces the chip to sleep if it gets stuck at maintainUntilIdle. However, this didn't solve the problem because, on the next wakeup, the data is not sent. I am really frustrated by this issue, as it is the last problem I need to resolve to finish my personal project.

@janvrska
Copy link

Any progress @fpistm @FRASTM ?

@beneman6
Copy link

beneman6 commented Jul 8, 2024

I also ran into this problem with my work project where I created a MBus to LoRa WAN Gateway. I have tried to apply multiple patches mentioned here but it seems nothing really works and testing all of this is really hard because I always have to wait almost five hours. I hope there will be some progress on this soon and I am curious if there is a different way to testing solutions than waiting.

@fpistm
Copy link
Member

fpistm commented Jul 8, 2024

Currently no progress. As we are not able to reproduce with Nucleo WL55. Seems related to other hardware setup.

@beneman6
Copy link

beneman6 commented Jul 8, 2024

Okay thank you for the update. I am using a Seeed Studio LoRa E5 mini Board wit the STM32WLE5JC Module on Board witch seems to be very similar to the STM32WL55JC1 Chip but maybe I am missing something here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants