
Queued downlinks are not received by node #1228

Closed
elsalahy opened this issue Aug 22, 2019 · 17 comments
Assignees
Labels
bug Something isn't working

Comments


elsalahy commented Aug 22, 2019

Summary

Scheduled downlink messages are not received by the end node.

Steps to Reproduce

  1. Establish communication with an end node.
  2. Publish any downlink message using your preferred method.
    Similar to here
  3. Observe if the downlink is received by the end node.

Please Note:

  • We tried restarting the stack.
  • We tried restarting the gateway used.
  • We tried two gateways, a Tektelic and a TTIG.
  • The TTIG was not able to achieve a successful OTAA join.
  • The Tektelic allows a successful OTAA join but fails to send any scheduled downlinks.
  • This might be a problem with Basic Station or the gateways used.

What do you see now?

I see the queued message on the server-side as seen below

/xxxx/down/queued', ... (366 bytes))
{"end_device_ids":{"device_id":"xxx","application_ids":{"application_id":"xxx-xxx"}},"correlation_ids":["as:conn:xxxxx","as:downlink:xxxx"],"downlink_queued":{"f_port":15,"frm_payload":"vu8=","priority":"NORMAL","correlation_ids":["xxxxxxx"]}}

but the message is not received by the end node.

What do you want to see instead?

The node receives the scheduled message.

Environment

  • Tektelic gateway (end node joins correctly)
  • Demo environment

How do you propose to implement this?

Don't know.

Can you do this yourself and submit a Pull Request?

None

@elsalahy elsalahy added bug Something isn't working prio/medium labels Aug 22, 2019
@johanstokking johanstokking added this to the August 2019 milestone Aug 22, 2019
@elsalahy
Author

More notes related to issue testing:

  • A generic node is used for testing.
  • Class A is used for testing.
  • Uplinks were tested at 1-minute intervals.
  • Uplinks were tested at 20-second intervals.
  • In both cases, no downlinks were received by the node, but the uplinks were registered and seen by the server.

@KrishnaIyer
Member

@elsalahy: Could you please cross-check with a Things Uno? You can run the Downlink sketch.
I'm assuming the gateway is connected to our demo server?

@elsalahy
Author

@KrishnaIyer Good idea, but @johanstokking also tried the demo environment just now, and the downlink is not working for him either.

@johanstokking johanstokking added c/network server This is related to the Network Server prio/high blocking release This is blocking a release and removed prio/medium c/network server This is related to the Network Server blocking release This is blocking a release labels Aug 23, 2019
@johanstokking
Member

  • What are your RX2 settings?
  • Please try to reproduce this locally with a local stack and provide debug server output from when you schedule the downlink message

@johanstokking johanstokking modified the milestones: August 2019, Next Up Aug 23, 2019
@elsalahy
Author

elsalahy commented Aug 23, 2019

What are your RX2 settings?

Current RX2 settings:
Rx2 window delay after TX -> 6088 ms
Window length -> 179.8 ms
Max preamble length -> 8

Also, for added information:
Rx1 window delay after TX -> 5039 ms

Please try to reproduce this locally with a local stack and provide debug server output from when you schedule the downlink message

Will set this up and report back. Could you please be more specific about what you mean by debug server output? I'm not familiar with the terms yet; do you mean the MQTT server subscription output?

@johanstokking
Member

What are your RX2 settings?

I meant data rate and frequency.

could you please be more specific about what you mean with debug server output, I'm not familiar with terms yet, do you mean MQTT server subscription output?

I mean the server logs; you'll see them flowing when you have a local setup.

@johanstokking
Member

Please reopen with reproduction instructions

@rvolosatovs
Contributor

ff1705 (class C device) on 510704c has no issues whatsoever:

Joining network...
[DBG ][LSTK]: Transmission completed
[DBG ][LMAC]: RX1 slot open, Freq = 868500000
[DBG ][LSTK]: OTAA Connection OK!
Joined network
[DBG ][LMAC]: Changing device class to -> CLASS_C
[DBG ][LMAC]: RX2 slot open, Freq = 869525000
Switched to class C
[INFO][LMAC]: RTS = 0 bytes, PEND = 0, Port: 1
[DBG ][LMAC]: Frame prepared to send at port 1
[DBG ][LMAC]: TX: Channel=3, TX DR=5, RX1 DR=5
Sending 0 bytes
[DBG ][LSTK]: Transmission completed
[DBG ][LMAC]: RX2 slot open, Freq = 869525000
[DBG ][LMAC]: RX1 slot open, Freq = 867100000
[DBG ][LMAC]: RX2 slot open, Freq = 869525000
Message sent
[DBG ][LSTK]: Packet Received 4 bytes, Port=2
[DBG ][LSTK]: Sending empty uplink message...
Message sent
Received message on port 2 (4 bytes, flags = 2): 33 25 60 5d 
Synchronized clock to Fri Aug 23 17:41:07 2019
[INFO][LMAC]: RTS = 0 bytes, PEND = 0, Port: 2
[DBG ][LMAC]: Acking to NS
[DBG ][LMAC]: Frame prepared to send at port 0
[DBG ][LMAC]: TX: Channel=7, TX DR=5, RX1 DR=5
[DBG ][LSTK]: Transmission completed
[DBG ][LSTK]: Awaiting ACK
[DBG ][LMAC]: RX2 slot open, Freq = 869525000
[DBG ][LMAC]: RX1 slot open, Freq = 867900000
[DBG ][LMAC]: RX2 slot open, Freq = 869525000
[DBG ][LSTK]: Ack=OK, NbTrials=0
[DBG ][LSTK]: Packet Received 2 bytes, Port=1
Received message on port 1 (2 bytes, flags = 1): 11 11 
[DBG ][LSTK]: Packet Received 3 bytes, Port=120
Received message on port 120 (3 bytes, flags = 1): 33 33 33 

@rvolosatovs rvolosatovs removed their assignment Aug 23, 2019
@rvolosatovs
Contributor

@elsalahy if you already use direnv and are not afraid of trying new stuff, but want to get things up and running quickly and consistently, then you can try this:

  1. Install nix (if you don't have it already): https://nixos.org/nix/download.html
  2. curl -sL https://github.com/rvolosatovs/lorawan-stack/commit/75d9a900d60bb70901b8448bba4030eaa65c5969.patch | git am
  3. Modify the EUIs and keys in shell.nix (e.g. rvolosatovs/lorawan-stack@75d9a90#diff-5204cbe23b218cae9de0b1aedd4edcd2R180)
  4. Run start-clean-stack in one terminal; that's where you're going to see the logs.
  5. Run test-class-a-device <model> (e.g. test-class-a-device uno); it will tell you what to do (e.g. trigger an uplink on the device).

The test-class-{a,c}-device scripts are what I use for testing; that is, for example, how I got #1228 (comment). (Note that test-class-c-device uses https://github.com/TheThingsNetwork/lorawan-stack-example-clock, which must be cloned first to work.)

@elsalahy
Author

Thanks @rvolosatovs, I will try to run things locally and come back to you if there seems to be a problem. For now, it seems it was just the demo environment at the office that was causing the issues, so no need to worry about this issue right now, but I appreciate the hints 😉.
If I find any issue I will try to provide as many logs as possible to reduce the time needed for debugging.

@elsalahy
Author

Update on the current issue:
The issue still persists with a local v3 setup.

More on the reproduction steps:
The issue is quite general and simple: I can't seem to receive downlinks with a class A device.
The reproduction steps can be ignored if downlinks work in testing with other class A devices.
If other devices connected to stack 3.1.0 are capable of receiving downlinks, it might be an issue with the device, the environment, or the MQTT method of scheduling downlinks.

Some points requested earlier

  • Rx2 Freq = 869525000
  • I'm currently testing a class A device, and the node joins successfully every time.
  • The node's uplinks are always received and published on the MQTT server; see trace 1.
  • The local server logs are in trace 2.
  • I can't seem to receive queued messages (if I queue a message after each join, it sometimes passes through). The node trace is shown in trace 3.

Some extra information that may be helpful

  • I'm using a MultiTech Conduit gateway.
  • My downlinks are sent and published using the command below, repeated again without any change.
mosquitto_pub -h localhost -t 'v3/ahmed-fuota/devices/otaa-device/down/push' -u ahmed-fuota -P 'NNSXS.BXTKCY6NLKFH52UY32WLIQ3UT6W34X3NIYGCR3Y.RU6ATV3RJBBYZZVMFT4GIBS2CPRS5EAHBZM24IMNJKXRMT2TEMHQ' -m '{"downlinks":[{"f_port": 15,"frm_payload":"vu8=","priority": "NORMAL"}]}' -d --cafile /Users/ahmedelsalahy/desktop/localv3/cert.pem
  • I noticed this in the server logs when I publish to the MQTT server:
stack_1      |   WARN Send upstream message failed             application_uid=ahmed-fuota error=EOF namespace=applicationserver/io/mqtt remote_addr=172.18.0.1:44250
  • Finally, I tested the exact same SW on the node side with V2 and was able to receive scheduled downlinks without any apparent issues.

trace1.txt
trace2.txt
trace3.txt

@elsalahy
Author

elsalahy commented Aug 28, 2019

@johanstokking Please help me assign someone who can assist with this issue, keeping in mind the new logs and information.

@rvolosatovs
Contributor

rvolosatovs commented Aug 28, 2019

You're sending uplinks after queueing the downlinks, right?
Since it's class A, the NS needs an uplink in order to send a downlink.
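A minimal sketch of that class A timing, assuming the default V3 RX1 delay of 5 s discussed in this thread (with RX2 opening 1 s after RX1):

```shell
# Class A devices only open receive windows after an uplink, so the NS
# can only deliver a queued downlink in the RX1 or RX2 slot that follows.
rx1_delay_ms=5000                      # assumed V3 default RX1 delay
rx2_delay_ms=$((rx1_delay_ms + 1000))  # RX2 opens 1 s after RX1
echo "t=0 ms: uplink transmission ends"
echo "t=${rx1_delay_ms} ms: RX1 window opens"
echo "t=${rx2_delay_ms} ms: RX2 window opens"
```

Until the next uplink arrives, a queued downlink simply waits in the queue.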

I can't seem to receive queued messages (after each join if I queue a message, sometimes it would pass through)

So are you queueing multiple downlinks or a single downlink? It's very probable that immediately after the join the NS has MAC commands to send, and if they do not fit in FOpts, the downlink slot is used for them.

Did you try to push the downlink via the CLI directly? (i.e. ttn-lw-cli device downlink push ahmed-fuota otaa-device ...)

Please send stack logs with the debug logging level on.

Can you also check whether the queued downlinks are actually there? (ttn-lw-cli device downlink list ahmed-fuota otaa-device)

@elsalahy
Author

elsalahy commented Aug 28, 2019

@rvolosatovs, sorry, I was testing some things with Johan.
Here are two debug-on traces:
debugon1.txt shows a trace where the node received the downlink.
debugon2.txt shows a trace where the node didn't receive the downlink.
debugon1.txt
debugon2.txt

If I push using the CLI directly, do I have to sync with the uplink, or will it be scheduled like the MQTT command? Anyhow, I will try.

Can you also check if queued downlinks are actually there? (ttn-lw-cli device downlink list ahmed-fuota otaa-device)

I will double-check, but I see it in the MQTT log, as seen in trace1.txt, and it seems to be sent in debugon2.txt.

@rvolosatovs
Contributor

rvolosatovs commented Aug 28, 2019

No, ttn-lw-cli device downlink push simply adds the downlink to the queue. You can execute it at any time.
Please use the CLI directly; let's try to find the source of the problem by eliminating the MQTT step first.

Regarding logs, I would like to see the whole sequence, from the uplink first logged on the GS to the downlink scheduled by the GS in response to it.
In trace 1 you actually have two downlinks being scheduled.
After the device joins, push a downlink via the CLI, then send an uplink and wait for the downlink to be scheduled by the GS.

@elsalahy
Author

After some exhaustive debugging it seems other devices work as expected on the local setup.
This means there might be an issue with the generic node or the gateways connected in the setup.
Will close this issue and, if needed, open a more relevant issue that better describes the behaviour.

@elsalahy
Author

For housekeeping and reference purposes,
The issue is resolved by adjusting the error margins of the Rx windows on the end-device side:

  • Default Rx1 delay in V3 being 5 seconds makes the generic node miss the Rx slots.
  • Default Rx1 delay in V2 being 1 second makes the generic node catch the Rx slots with no issues.
  • The generic node misses the RX slots by a 20 ms margin. The time drift is caused by the use of the generic node's internal oscillator, and the issue is now tracked on the generic node development repo.
  • V3 provides the flexibility to adjust the RX setting (to 1 second) using something like

ttn-lw-cli dev set appname devicename --mac-settings.desired-rx1-delay=RX_DELAY_1

which allows devices that drift in time and work on V2 to catch the downlinks correctly.
The command doesn't solve the time-drift issue, but it hides it.
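The numbers above can be sanity-checked with a quick calculation; the ~4000 ppm clock-error figure below is an assumption back-derived from the reported 20 ms miss over the 5 s delay:

```shell
# The receive-window offset grows linearly with the RX1 delay, so a clock
# error that is harmless at V2's 1 s delay misses the window at V3's 5 s.
drift_ppm=4000  # assumed: 20 ms miss / 5 s delay = 4000 ppm
for delay_s in 5 1; do
  offset_ms=$((delay_s * drift_ppm / 1000))  # s * ppm / 1000 = ms
  echo "RX1 delay ${delay_s} s -> window offset ~${offset_ms} ms"
done
```

Shortening the desired RX1 delay to 1 second shrinks the offset fivefold, which is why the command above hides, but does not fix, the drift.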
