ESP32-WIFI packet clogging #12446
Comments
@casaroli, I tried to reproduce the issue here but I wasn't able to reproduce the same behavior: Eventually I got the 1st ping lost (I can't see it on
Can you share your binaries, please? (don't forget to include the bootloader and the partition table binaries)
@acassis AFAIK, on the current
I can confirm that after enabling it
I will try and share the binaries, config and outcome.
started from a clean tree
Then I reboot the device and try the following commands. I got the issue on the 3rd try:
See that I waited for the ARP table to fill before I sent the first echo. On the 3rd echo, the 1st went through also: This is with I shared the binaries here:
Note: when I sent the packet before the ARP table is populated, and without
@casaroli I think Greg @patacongo explained this issue in the past; it was caused by some initialization that was done during the first packet processing.
So here is with
Files are here, bootloader and partition table are the same: https://drive.google.com/drive/folders/1UElulLX0Qk_H43KSePl_OYuzLUVJBnQE?usp=sharing
I could trace the packet down to I can also provide an image with
Here are logs with info enabled:
This line I added, it shows the
It is interesting also this line:
which I think is related to the reply to the first echo (id=1), which we are not expecting. The packet capture looks the same: files: https://drive.google.com/drive/folders/1cdxsfDrmnlf4ZhX6-Uq_N_CJ-hBixfEv?usp=drive_link
Thank you very much!
This seems to be a different issue. I only recall explaining packet loss with CONFIG_ARP_SEND=n. CONFIG_ARP_SEND is one of those mandatory options. Select 'n' for a broken system, select 'y' for normal behavior. I am in favor of removing mandatory options.
I think that instead of removing it, it could be enabled by default; this way we avoid bloating NuttX and it will work by default.
But, please notice that the reported issue happens even with
Was it delayed? Or sent twice? Notice that ECHO seq=3 is sent twice. That should not happen and if the sending were under software control, then the sequence number should have been incremented. This suggests that some driver related buffering issue may be at fault. Could a stale packet buffer have been sent?
Sorry. I don't see it being sent twice. I see the packet with id=1 being sent just before the id=3 (the id=3 packet unclogged the packet with id=1). All the sequence numbers are 0/0 because I used So this is not a retransmission and this is not an ARP table problem. I think it is something inside the wifi driver or the controller, not inside NuttX itself.
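For readers following the id versus seq discussion, here is a minimal sketch of how an ICMP echo request carries both fields. It is not taken from NuttX or from the captures in this thread. The helper names (`inet_checksum`, `build_echo_request`, `parse_echo`) are hypothetical. It only illustrates that two requests with the same sequence number but different identifiers are distinct packets, which is why a capture showing id=1 and id=3 would not indicate a retransmission.

```python
import struct


def inet_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        # Fold the carry bits back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type 8, code 0, checksum, id, seq."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    csum = inet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, ident, seq) + payload


def parse_echo(packet: bytes) -> dict:
    """Decode the 8-byte ICMP echo header and verify the checksum."""
    icmp_type, code, _csum, ident, seq = struct.unpack("!BBHHH", packet[:8])
    return {
        "type": icmp_type,
        "code": code,
        "id": ident,
        "seq": seq,
        # A valid packet checksums to zero when the checksum field is included.
        "checksum_ok": inet_checksum(packet) == 0,
    }


# Same sequence number, different identifiers: two different packets.
first = build_echo_request(ident=1, seq=0)
third = build_echo_request(ident=3, seq=0)
```

Calling `parse_echo(first)` and `parse_echo(third)` gives the same `seq` but different `id` values, matching what the reporter describes seeing on the wire.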
Hi @casaroli, I couldn't reproduce the issue even when using your firmware (I tested the first firmware you sent, with Can you test with another network?
But shouldn't the ID have changed from 3?
I send 3 different
First one (id=1) to 8.8.8.8 gets stuck somewhere (and we get a timeout).
Second one (id=2) goes fine to 8.8.4.4.
Then the third one goes fine to 8.8.8.8, but the first one (id=1) also gets sent at this time.
This means the first ICMP echo request got stuck somewhere and then later it gets unstuck. This is the weird behavior I am reporting.
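The sequence described above can be checked mechanically by comparing when each echo request was issued with when it showed up in the capture. The following is only a sketch: the `Echo` type, the `find_delayed` helper, and all timestamps are hypothetical and chosen to mirror the three pings described here. They are not values from the actual tcpdump output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Echo:
    ident: int   # ICMP identifier
    dst: str     # destination address
    t: float     # time in seconds (illustrative values only)


def find_delayed(issued, captured, max_delay=1.0):
    """Classify each issued request by when, or whether, it appeared on air."""
    on_air = {e.ident: e for e in captured}
    report = {}
    for req in issued:
        seen = on_air.get(req.ident)
        if seen is None:
            report[req.ident] = "never seen on air"
        elif seen.t - req.t > max_delay:
            report[req.ident] = f"delayed {seen.t - req.t:.1f}s"
        else:
            report[req.ident] = "on time"
    return report


# Hypothetical timeline: id=1 only reaches the air when id=3 is sent.
issued = [
    Echo(1, "8.8.8.8", 0.0),
    Echo(2, "8.8.4.4", 10.0),
    Echo(3, "8.8.8.8", 20.0),
]
captured = [
    Echo(2, "8.8.4.4", 10.0),
    Echo(1, "8.8.8.8", 20.0),
    Echo(3, "8.8.8.8", 20.0),
]
report = find_delayed(issued, captured)
```

With a real capture, the `captured` list would be filled from the tcpdump timestamps, and a "delayed" entry for id=1 would correspond to the clogging behavior reported in this issue.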
Yes, I will test on a different network. I think I can also have a wifi monitor capturing lower level wifi signalling frames that might be useful for debugging. Will share the results soon.
Note: if I send the packet before the ARP table gets filled, the problem does not happen (well, the packet is still lost as expected, but it is not transmitted later, which is also expected).
Enable logic to send ARP requests if the target IP address mapping does not appear in the ARP table. Please check the comment in apache#12446 (comment)
After switching to a more reliable access point, I could see the packets arriving and I can confirm the problem does not happen with ICMP. However, the original problem with TCP is still happening, but since it requires a lot more context, I will open a separate issue. Thank you for the very fast response time. Sorry for the noise.
Hello,
I am having a strange issue and I am not sure if this is in esp32 wifi driver or anything else in NuttX, so I describe here to see if anyone else can reproduce or point me where to look next.
I am using the NuttX and apps master branches, configuration esp32-devkitc:wifi, and building with xtensa-esp-elf-gcc (crosstool-NG esp-13.2.0_20240305) 13.2.0.
So after booting, I connect to a managed wifi network which I can monitor with tcpdump(8):
In nsh, we connect to the wifi network.
I can see it worked because there are some DHCP and ARP in tcpdump:
So then I send an ICMP echo from NuttX:
OK, so the packet is lost, not a big deal; however, it is not even captured by tcpdump (i.e. it was not even sent).
So I ping a different address:
So now it works and tcpdump shows it:
So far so good, nothing weird apart from the fact that the first packet was not even sent over the air. Let's try the ping to the original address again:
Now it worked but tcpdump shows the following:
WAT???
It looks like our first ICMP echo request (id 1) was clogged somewhere, and it was unclogged when we sent another ICMP echo request (id 3) to the same address. And both packets were transmitted (and got a reply). If I keep sending ICMP echos to 8.8.4.4 (like the one with id 2), it will not unclog. I could only manage to make the packet go whenever a new packet is sent to that same IP address.
In fact, this is an extreme simplification of the problem, as the real problem actually happens with TCP connections (and SYN packets); however, to simplify, I could reproduce it with ICMP. So I believe it might be the same problem (for both TCP and ICMP).
I could not reproduce this in sim.
Well, AFAIK there is nothing keeping different queues for packets with different destination addresses, so I have no idea where the problem can be.
Does anyone have a clue?