Dropped RX packets - RPi 1, 2, and 3 #1954

Open
stamster opened this issue Apr 8, 2017 · 87 comments
Assignees
Labels
Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator.

Comments

@stamster

stamster commented Apr 8, 2017

Hello,
I have multiple RPis running at two locations. I noticed that each of them reports dropped packets, but only in the RX direction.

RPi 3:

eth0      Link encap:Ethernet  HWaddr b8:27:eb:xx:xx:xx  
          inet addr:192.168.100.20  Bcast:192.168.100.255  Mask:255.255.255.0
          inet6 addr: fe80::906e:f3:xxxx:xxxx/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:275242 errors:0 dropped:15336 overruns:0 frame:0
          TX packets:71301 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:38165404 (36.3 MiB)  TX bytes:23844353 (22.7 MiB)

RPi 2:
Just rebooted after the latest rpi-firmware patch, and with only 4 minutes of uptime I'm already getting: RX packets:689 errors:0 dropped:5 overruns:0 frame:0

RPi 1 model B:
RX packets:647 errors:0 dropped:28 overruns:0 frame:0

RPi 1:
RX packets:209844 errors:0 dropped:14892 overruns:0 frame:0

The only RPi that does not seem to be affected by this issue is another RPi 1 model B, on which I didn't upgrade the firmware.
So if it's a firmware-related issue, the last known good firmware/kernel without RX drops is this one:

4.1.19+ #858 Tue Mar 15 15:52:03 GMT 2016 armv6l GNU/Linux

Mar 15 2016 14:48:20 
Copyright (c) 2012 Broadcom
version 1bf9a9a77026af9128a339c82d72e331d3532ee4 (clean) (release)

10 days uptime, and not a single drop:

eth0      Link encap:Ethernet  HWaddr b8:27:eb:xx:xx:xx  
          inet addr:192.168.100.30  Bcast:192.168.100.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1335268 errors:0 dropped:0 overruns:0 frame:0
          TX packets:721742 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:81649608 (77.8 MiB)  TX bytes:185224562 (176.6 MiB)

RPi's behave exactly the same on other networks, so it's not LAN related, since other PC's and devices do not have any drops with uptime of 250 days.
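
To compare the boards above on equal terms, the raw counters can be turned into a drop ratio. This is a minimal sketch, assuming the old-style ifconfig line format shown in this report ("RX packets:N errors:N dropped:N"); the helper name rx_drop_ratio is made up for illustration, and newer ifconfig versions print a different layout that this pattern will not match.

```python
import re

# Matches the old net-tools layout quoted in this thread, e.g.
# "RX packets:275242 errors:0 dropped:15336 overruns:0 frame:0"
RX_LINE = re.compile(r"RX packets:(\d+) errors:(\d+) dropped:(\d+)")


def rx_drop_ratio(ifconfig_text):
    """Return dropped / RX packets for the first RX line found in the text."""
    match = RX_LINE.search(ifconfig_text)
    if match is None:
        raise ValueError("no old-style 'RX packets:' line found")
    packets, _errors, dropped = (int(group) for group in match.groups())
    return dropped / packets if packets else 0.0


if __name__ == "__main__":
    # Counter lines copied from the report above.
    reports = {
        "RPi 3": "RX packets:275242 errors:0 dropped:15336 overruns:0 frame:0",
        "RPi 2": "RX packets:689 errors:0 dropped:5 overruns:0 frame:0",
        "RPi 1 (old firmware)": "RX packets:1335268 errors:0 dropped:0 overruns:0 frame:0",
    }
    for name, line in reports.items():
        print(f"{name}: {rx_drop_ratio(line):.2%} of RX packets dropped")
```

The ratio is only a rough yardstick: it depends on how much of the traffic on the LAN is of a kind the Pi discards, not just on the Pi itself.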

@JamesH65
Contributor

JamesH65 commented May 18, 2017

Any relevant messages in syslog? Also, any idea when this problem started to occur?

@JamesH65 JamesH65 added Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator. Assigned for implementation/action labels May 18, 2017
@JamesH65 JamesH65 self-assigned this May 19, 2017
@JamesH65 JamesH65 removed the Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator. label May 19, 2017
@JamesH65
Contributor

A quick glance and some minor debugging at the driver level seems to indicate that it is not the driver itself that is dropping the packet - the code that increments the rx_dropped counter is not called. So presumably somewhere else in the stack is deciding to drop the packet. I'm not sure how to find out where in the stack though! Will investigate further.

@JamesH65
Contributor

It's quite possible this is harmless. The dropped packet count is not only used for errors, but also flagging up packets that are dropped because the linux stack doesn't handle them (some IPv6 stuff if you don't have IPv6 enabled for example). This might be one of those messages. I will try and find out which it is.

@stamster
Author

stamster commented May 21, 2017 via email

@JamesH65
Contributor

OK, not IPv6, but that was only given as an example of some packets that are simply discarded by the network stack. Dropped packets are not, necessarily, an error. If the stack gets a correctly formatted packet it doesn't implement, then it is dropped. It's not an error as such. So it is perfectly reasonable to be getting dropped packets and not worry about them. However, in this case I would like to know whether this is the case here.

@JamesH65
Contributor

JamesH65 commented May 22, 2017

Here is some (probably quite old) text on why a packet may be dropped.

Also worth noting that even if IPv6 is disabled on the Pi, it could still receive IPv6 packets from other devices on the network, which will cause dropped packets.

Softnet backlog full -- (Measured from /proc/net/softnet_stat)

Bad / Unintended VLAN tags

Unknown / Unregistered protocols

IPv6 frames when the server is not configured for IPv6

If any frames meet those conditions, they are dropped before the protocol stack and the rx_dropped counter is incremented.
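
The first item in that list can be checked directly. The following is a rough sketch, assuming the usual /proc/net/softnet_stat layout of one row per CPU with hexadecimal columns, where the first column counts processed packets, the second counts packets dropped because the backlog queue was full, and the third counts time squeezes. The function name and the sample text are made up for illustration.

```python
def parse_softnet_stat(text):
    """Parse /proc/net/softnet_stat text into one dict per row.

    Only the first three hex columns are decoded; later columns vary
    between kernel versions and are ignored here.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        processed, dropped, time_squeeze = (int(f, 16) for f in fields[:3])
        rows.append(
            {"processed": processed, "dropped": dropped, "time_squeeze": time_squeeze}
        )
    return rows


if __name__ == "__main__":
    try:
        with open("/proc/net/softnet_stat") as handle:
            stats = parse_softnet_stat(handle.read())
    except OSError:
        # Not on Linux: fall back to an invented two-row sample.
        stats = parse_softnet_stat(
            "0004a3b2 00000000 00000003 00000000\n"
            "00012f00 0000000a 00000000 00000000\n"
        )
    for index, row in enumerate(stats):
        print(f"row {index}: {row}")
```

If the second column stays at zero while the ifconfig dropped counter keeps climbing, the backlog is probably not the cause and one of the other reasons in the list is more likely.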

@JamesH65
Contributor

After much faffing, I have built a utility I found called dropwatch, rebuilt the kernel to turn on a particular form of net logging, and now have a list of addresses where packets are being dropped. However, cross-referencing those addresses to the kernel isn't giving valid results, so I suspect all the drops are in modules.

What is worth noting is that the ifconfig dropped packets counter is a small subset of the actual number of packets dropped. I can see which address in my list is causing the ifconfig drops; I just need to figure out what code corresponds to that location.

@JamesH65
Contributor

OK, thanks to a timely intervention from @pelwell I do have some idea of the code that is dropping the packets. The ifconfig dropped packet counter appears to be incremented in the __netif_receive_skb_core function, approx line 4214. Still some effort required in backtracking from there to determine the reason for the dropped packets.

@network-shark

network-shark commented May 25, 2017

I also see these drops on my Pi 3.

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 4.9.13-v7+ #974 SMP Wed Mar 1 20:09:48 GMT 2017 armv7l GNU/Linux

pi@raspberrypi:~ $ ifconfig
eth0      Link encap:Ethernet  HWaddr b8:27:eb:05:2e:7f
          inet addr:192.168.10.11  Bcast:192.168.10.255  Mask:255.255.255.0
          inet6 addr: fe80::xxxxxxx/64 Scope:Link
          inet6 addr: 2003:xxxxxx/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:14465964 errors:0 dropped:276326 overruns:0 frame:0

If you need more info, feel free to ask!

@stamster
Author

stamster commented May 26, 2017

@JamesH65 so can we conclude that the issue is located in the kernel core? On some cloud providers I also experience dropped packets in the RX direction.

@JamesH65
Contributor

I don't think any conclusion can be made since I don't yet know the cause of the dropped packets - it could be entirely benign.

I think everyone sees the dropped packets on the Pi. I certainly have been seeing them for years, but ignored them.

@stamster
Author

stamster commented May 26, 2017

Well, it started to happen only after a kernel/firmware update a year ago or so.

On this version I have zero dropped packets:

4.1.19+ #858 Tue Mar 15 15:52:03 GMT 2016 armv6l GNU/Linux

Mar 15 2016 14:48:20 
Copyright (c) 2012 Broadcom
version 1bf9a9a77026af9128a339c82d72e331d3532ee4 (clean) (release)

Heh, it might be that this old version has a bug of not incrementing counter 😆

@JamesH65
Contributor

Or newer version added more places where the counter could be incremented...

@popcornmix
Collaborator

stamster, can you identify the exact update which caused this. See:
https://github.com/Hexxeh/rpi-firmware/commits/master

If you click on each commit the end of the url contains a git hash. Run
sudo rpi-update <hash>
to revert back to that version. Report the first version with the packets dropped error.

(I suspect it will be one of the major bumps - e.g. the first 4.4 kernel)
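
Stepping through every commit one by one works, but a binary search over the commit list needs far fewer reboots. Binary search is not something asked for above; it is offered here only as a possible shortcut. This sketch assumes the commits are ordered oldest to newest and that once a version shows drops, every later version does too. The hashes and the is_bad callback are hypothetical placeholders: in practice, is_bad would be the manual step of running sudo rpi-update <hash>, rebooting, and checking the dropped counter.

```python
def first_bad(commits, is_bad):
    """Return the index of the first commit for which is_bad(commit) is true.

    Assumes commits run oldest to newest and that badness is monotonic:
    every commit after the first bad one is also bad.
    Returns None if the list is empty or even the newest commit is good.
    """
    if not commits or not is_bad(commits[-1]):
        return None
    low, high = 0, len(commits) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid  # first bad commit is at mid or earlier
        else:
            low = mid + 1  # first bad commit is after mid
    return low


if __name__ == "__main__":
    # Placeholder identifiers, NOT real rpi-firmware hashes.
    history = [f"hash{n:02d}" for n in range(40)]
    tested = []

    def simulated_check(commit):
        # Stand-in for: flash this version, reboot, look at RX dropped.
        tested.append(commit)
        return commit >= "hash23"

    index = first_bad(history, simulated_check)
    print(f"first bad: {history[index]} after {len(tested)} checks")
```

If the drops turn out not to be monotonic across versions, the result of a search like this is not trustworthy and a linear walk is the safer option.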

@stamster
Author

I've been super busy recently - I'll try to experiment with down-grades this weekend.

@JamesH65
Contributor

I was doing some testing today with the latest Raspbian release, 5GB of data transferred with no dropped packets. This was on ethernet. If doing testing check the latest release first.

@stamster
Author

Well, I got that kernel 5 days ago and still there were dropped packets.
Now I've fetched:

*** Updating kernel modules
 *** depmod 4.9.37-v7+
 *** depmod 4.9.37+

Let's see...

@koppenho

Let me add my stats: RX drop was about 4% with unpatched raspbian-jessie.
After updating Jessie to 4.9.37-v7+ the drop count "dropped" to 0.005%.
I think this is an improvement.

@stamster
Author

stamster commented Jul 17, 2017

I still have drops, even with latest kernel.

4.9.37+ #1017 Thu Jul 13 11:14:43 BST 2017 armv6l GNU/Linux
RX packets:62278 errors:0 dropped:4203 overruns:0 frame:0

@JamesH65
Contributor

JamesH65 commented Jul 20, 2017

Hmm, just been doing some testing on this, and I'm not seeing dropped packets on the Ethernet at all. Is it possible that this is environmental?

Linux raspberrypi 4.9.35-v7+ #1014 SMP Fri Jun 30 14:47:43 BST 2017 armv7l GNU/Linux

RX packets:828974 errors:0 dropped:0 overruns:0 frame:0
TX packets:503029 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:1142250236 (1.0 GiB)  TX bytes:43037541 (41.0 MiB)

@JamesH65
Contributor

A similar issue has just come up on the Linux networking list, unrelated to the Pi. The suggestion there is an RX queueing problem. I'm not sure how to go further here, since I cannot see any dropped packets on the Ethernet connection at all. This is a continuation of the previous stats, after leaving it running over the weekend, though without anything hammering the connection.

RX packets:1685234 errors:0 dropped:0 overruns:0 frame:0
TX packets:515057 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000 
RX bytes:1242494098 (1.1 GiB)  TX bytes:44171593 (42.1 MiB)

@JamesH65
Contributor

@stamster Could you give me some idea of how your network is set up? Since I (and others) am seeing no errors on the onboard Ethernet at all, it would be interesting to know if you have some odd setup that might be causing an issue.

@JamesH65 JamesH65 added the Waiting for external input Waiting for a comment from the originator of the issue, or a collaborator. label Jul 27, 2017
@JamesH65
Contributor

ping @stamster Are you still seeing this? If so, are there any strange things on your network that might have some effect?

Also, I'm currently using Stretch with a 4.13 kernel and seeing no drops. It would be interesting, if you have the ability to build your own kernel and install it, to see if you still see errors with the latest kernel code.

@JamesH65
Contributor

I tested with over 10GB of data sent back and forth with no dropped packets at all. This is on Stretch with a 4.13 kernel (admittedly not yet a release kernel, but it will be released eventually). Unless I hear otherwise, I'm inclined to close this issue.

@tonygunter

I have seen discussions elsewhere that seemed to indicate that the ethernet connection is the most power hungry, and low power or inconsistent power to the RPi results in RX errors as the most obvious symptom. You may want to verify that the correct power is being supplied.

@raspberrypi raspberrypi deleted a comment from alvaroslm Oct 27, 2021
@raspberrypi raspberrypi deleted a comment from JamesH65 Oct 28, 2021
@raspberrypi raspberrypi deleted a comment from alvaroslm Oct 28, 2021
@raspberrypi raspberrypi deleted a comment from alvaroslm Oct 28, 2021
@raspberrypi raspberrypi deleted a comment from JamesH65 Oct 28, 2021
@raspberrypi raspberrypi deleted a comment from alvaroslm Oct 28, 2021
@raspberrypi raspberrypi deleted a comment from alvaroslm Oct 28, 2021
@raspberrypi raspberrypi deleted a comment from JamesH65 Oct 28, 2021
@pelwell
Contributor

pelwell commented Oct 28, 2021

[ Unconstructive diversion deleted - I don't want to lock this thread ]

@alvaroslm

You may as well lock it since you're not addressing the issue anyway. That's how things work around here...

@JamesH65
Contributor

No, it really isn't how things work around here. If we close the issue, we no longer have it visible, so it gets forgotten and will never get fixed. Right now, it's visible, and when time allows someone will look at it. However, we are a tiny team, with a lot to do, and since this only reduces bandwidth, rather than killing anything completely, it's lower priority than many other issues.

@6by9
Contributor

6by9 commented Oct 28, 2021

Not having a reliable way to reproduce an issue also makes it incredibly hard to work on.

Reports of it being seen on a Pi4 make it even stranger, as Pi3B+ and Pi4 have totally different Ethernet interfaces to Pi1/2/3B, which makes it seem more systemic than just the Ethernet chip driver. Pi4 is over a totally different interface too (inbuilt as opposed to USB).

@stamster
Author

@6by9
But there seems to be a reliable way to reproduce it - just install a newer kernel than this one:

#1954 (comment)

@hostingnuggets

I have this issue with RX dropped packets on all of my Raspberry Pi 4 8GB boards running Ubuntu 20.04 LTS Server, as you can see below:

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether e4:5f:01:xx:xx:xx brd ff:ff:ff:ff:ff:ff
    RX: bytes  packets  errors  dropped overrun mcast   
    227473191  501667   0       8487    0       2842    
    TX: bytes  packets  errors  dropped carrier collsns 
    705780728  743859   0       0       0       0       

Are there any workarounds or fixes planned?

@bfreis

bfreis commented Sep 24, 2024

Adding a bit of info to this.

  1. The issue happens on both my Pi 4b and my Pi 5 (the 4b I bought a few years back, the 5 is brand new).
  2. It happens whether they are connected via my wi-fi or wired to my switch.
  3. It happens on any network I put them on.
  4. Other devices in my network don't show this behavior.
  5. I'm getting roughly 1 RX packet drop every 5-10 seconds on the Pi 4, regardless of volume of packets or data. This is incredibly consistent: I enabled monitoring via Grafana and I can see it's stable at a one-minute average of 133m packets/s dropped (i.e. 0.133 packets/s ~= 1/7.5).
  6. One apparent correlation is that, if I have an SSH connection established to the Pi, it will systematically drop within tens of seconds to 1 or 2 minutes, with a "connection corrupted" error message, and showing an error of a packet with unreasonably large length; these issues seem to happen at exactly the same instant when there's an RX packet dropped, although I couldn't scientifically validate this. SSH is so unusable that I have to use mosh instead. One thread I found online suggested changing ciphers, but it happens regardless of the cipher used.
  7. I don't see any system logs that correlate with either the SSH connection drops, or the RX packet drops.

@alvaroslm

alvaroslm commented Sep 25, 2024 via email

@pelwell
Contributor

pelwell commented Sep 25, 2024

The issue happens on both my Pi 4b and my Pi 5

This is very fishy - 4B and 5 have nothing in common when it comes to Ethernet except the PHY. And this thread is about Pis 1-3, which are different again - their Ethernet was attached via USB, and the OTG USB host controller has well-known limitations. Pi 4 uses the on-board GENET controller, while Pi 5 uses the MAC on RP1.

One apparent correlation is that, if I have an SSH connection established to the Pi, it will systematically drop within tens of seconds to 1 or 2 minutes, with a "connection corrupted" error message, and showing an error of a packet with unreasonably large length

I've never seen any kind of unreliability on Ethernet, other than a few issues with Energy Efficient Ethernet when the link is being brought up.

This will never be fixed or acknowledged.

There are no secret problems we can't/don't talk about - there are some which are widely experienced and easily reproduced, and others which seem to only affect a handful of people and are probably the result of unknown environmental factors.

They can't ban you from github but they will ban you from their forums if you point out the flaws and make too many questions.

Comments which are abusive have been, and will continue to be, deleted. Feel free to ask awkward questions, particularly if they include new information - "me too" posts don't help anyone.

@bfreis

bfreis commented Sep 25, 2024

Hi @pelwell - thanks for the incredibly quick reply, I really appreciate it!

The issue happens on both my Pi 4b and my Pi 5

This is very fishy - 4B and 5 have nothing in common when it comes to Ethernet except the PHY. And this thread is about Pis 1-3, which are different again - their Ethernet was attached via USB, and the OTG USB host controller has well-known limitations. Pi 4 uses the on-board GENET controller, while Pi 5 uses the MAC on RP1.

Indeed - I didn't know these details, but assumed they were very different. I was actually very surprised to see the same issue on the 5!

One apparent correlation is that, if I have an SSH connection established to the Pi, it will systematically drop within tens of seconds to 1 or 2 minutes, with a "connection corrupted" error message, and showing an error of a packet with unreasonably large length

I've never seen any kind of unreliability on Ethernet, other than a few issues with Energy Efficient Ethernet when the link is being brought up.

Yup, it's all very strange!

By reading through the thread, it seems that you folks are having a hard time being able to repro the issue, and I'd love to provide more data in case that could help.

Do you have any suggestions of what else I could try to diagnose? Do you think some screen recording from my SSH client could be useful?

Let me know if you can think of any diagnostic steps beyond what I shared above. FWIW, these Pi 4b and 5 aren't really in any kind of "production" situation, so I'm happy to try things out and provide data, even if they're on the riskier side.

Thanks again!

@pelwell
Contributor

pelwell commented Sep 25, 2024

i. What is at the other end of your Ethernet cable(s)?
ii. Have you tried changing cables, switch ports, etc?
iii. Does ifconfig eth0 show any errors?

@JamesH65
Contributor

Many years ago I had a problem with ethernet dropping out when I used the Pi camera. It was a power supply issue. When the camera kicked in, the voltage dropped enough for the ethernet to drop out, but everything else continued normally. Could power be an issue?

@bfreis

bfreis commented Sep 25, 2024

i. What is at the other end of your Ethernet cable(s)?

My network is eero-based. A gateway eero connects to my ISP and to a switch. The switch then goes to 2 more eeros, and to a few other devices (Xbox, AVR, a CalDigit TS3 where I dock my laptop). And there's a bunch of devices on the wifi. Pretty standard home network.

Specifically regarding the Pis: on Ethernet, I tried connecting them to the switch as well as directly to the eeros, and I also tried wifi. In all cases I'm seeing the same RX drop issue.

ii. Have you tried changing cables, switch ports, etc?

Yup, different cables, different ports of the eero, different eeros, etc... 😔

iii. Does ifconfig eth0 show any errors?

No errors, just drops (ie RX errors 0 dropped 4843 overruns 0 frame 0). Fun fact - the drop rate is so incredibly consistent that I can pretty much measure uptime by it. 4843*7.5 = 36322, and the Pi is showing an uptime of 10h11min 🤯
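
The uptime estimate above is easy to sanity-check. This short sketch uses only the figures quoted in the comment (4843 drops, an assumed average of 7.5 s between drops, and a reported uptime of 10h11min); the variable names are made up for illustration.

```python
# Figures quoted in the comment above.
drops = 4843
assumed_interval_s = 7.5          # average seconds between drops
uptime_s = 10 * 3600 + 11 * 60    # reported uptime: 10h11min

# Uptime predicted from the drop counter alone.
estimated_uptime_s = drops * assumed_interval_s

# Average interval implied by the real uptime.
implied_interval_s = uptime_s / drops

# How far the estimate is from the reported uptime, as a fraction.
relative_error = abs(uptime_s - estimated_uptime_s) / uptime_s

print(f"estimated uptime: {estimated_uptime_s:.1f} s, reported: {uptime_s} s")
print(f"implied interval: {implied_interval_s:.2f} s between drops")
print(f"relative error:   {relative_error:.2%}")
```

The estimate lands within about one percent of the reported uptime, which supports the idea that the drops are driven by something periodic rather than by traffic volume.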

@bfreis

bfreis commented Sep 25, 2024

Many years ago I had a problem with ethernet dropping out when I used the Pi camera. It was a power supply issue. When the camera kicked in, the voltage dropped enough for the ethernet to drop out, but everything else continued normally. Could power be an issue?

Interesting! Just had one thought now that you mention.

For context -

The Pi 5 is the one from CanaKit with the M.2 HAT+ and NVMe. It also has an active fan. Nothing else AFAIK. I've tried both the 45W PD adapter that came with it, as well as a 140W MacBook charger.

The Pi 4b is simpler: just the Pi, a MicroSD card (tried both an older, slower 8GB one and a newer, faster 128GB one), and a fan (connected to GND and +5V). I tried with a 5.1V 3A adapter, as well as the 140W MacBook charger.

The one interesting thing I noticed on the 4b is that if I connect vs disconnect the fan the behavior does change a bit. I'll grab some data on this. Also, I haven't done any such experiments with the 5.

I'll share the data tomorrow.

Thanks for the ideas, folks!

@pelwell
Contributor

pelwell commented Sep 25, 2024

From what I can see reading the code, dropped packets have been abandoned by the network stack, either after they have been received or before they are transmitted, because it can't cope with them at that time - possibly as a result of memory exhaustion.

@pelwell
Contributor

pelwell commented Sep 25, 2024

Thinking about memory exhaustion:
iv. How much RAM does your 4B have?
v. What config.txt settings do you have?
vi. What software are you running? Include the OS version and any active apps.

@pelwell
Contributor

pelwell commented Sep 25, 2024

vii. Is it possible that something on the network is generating "interesting" broadcast traffic every second?

@bfreis

bfreis commented Sep 25, 2024

A few more things I noticed:

  • the RX dropped count increases extremely regularly at 5s then 10s, alternating. I.e., 5s wait, +1, 10s wait, +1, 5s wait, +1, 10s wait, +1. That's the 7.5s average I had seen before.
  • If I run tcpdump, the issue completely disappears (!!!) — I got the idea to try this from here: https://serverfault.com/a/601186
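
The alternating cadence in the first bullet can be pulled out of a series of counter readings. This is a minimal sketch, assuming the counter is polled once per second and stored as (time, rx_dropped) pairs. The function name and the synthetic readings are invented for illustration; they mimic the 5 s / 10 s pattern described above rather than reproduce real measurements.

```python
def increment_intervals(samples):
    """Return the gaps, in seconds, between successive counter increments.

    samples is a list of (time_seconds, rx_dropped) pairs in time order.
    """
    increment_times = [
        now_t
        for (_, prev_count), (now_t, now_count) in zip(samples, samples[1:])
        if now_count > prev_count
    ]
    return [later - earlier for earlier, later in zip(increment_times, increment_times[1:])]


if __name__ == "__main__":
    # Synthetic readings: one drop at each of these times (5 s, 10 s, 5 s, ...).
    drop_times = [5, 15, 20, 30, 35, 45, 50, 60, 65]
    readings = [(t, sum(1 for d in drop_times if d <= t)) for t in range(0, 66)]

    gaps = increment_intervals(readings)
    mean_gap = sum(gaps) / len(gaps)
    print("gaps between drops:", gaps)
    print(f"mean gap: {mean_gap} s, rate: {1 / mean_gap:.3f} drops/s")
```

A strictly repeating gap pattern like this points at a timer somewhere on the network rather than at load on the Pi, since load-related drops would not be expected to keep such a regular rhythm.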

Answering your questions @pelwell:

iv. How much RAM does your 4B have?

2GB total, barely any used:

pi@raspberrypi:~ $ free -m
               total        used        free      shared  buff/cache   available
Mem:            1846         290         952           9         679        1556
Swap:              0           0           0

v. What config.txt settings do you have?

This is the contents currently: https://gist.github.com/bfreis/82b19ad44e0e6d08d74a969ce870a52f
IIRC, I haven't manually edited it since installing Raspbian fresh recently.

vi. What software are you running? Include the OS version and any active apps.

pi@raspberrypi:~ $ uname -a
Linux raspberrypi 6.6.51-v8+ #1798 SMP PREEMPT Tue Sep 24 12:28:56 BST 2024 aarch64 GNU/Linux

I put here some info, including the output of systemctl | grep running, and raspinfo: https://gist.github.com/bfreis/8f66dd1a45892c18e454c52f59a07620

vii. Is it possible that something on the network is generating "interesting" broadcast traffic every second?

Maybe the eeros? I'm not sure. I'll investigate and report back. The 5s and 10s extremely regular cadence seems really suspicious.

@bfreis

bfreis commented Sep 26, 2024

Ok, I think we're getting closer now!

Some new info:

  • I ran tcpdump -i eth0 ether type 0x9104, and I can see those 0x9104 broadcast packets coming from each of my eeros.
  • At the same time, I ran watch -n 0.1 ifconfig to see the increase in RX dropped. On eth0 there's no increase (since tcpdump is running), but on wlan0 I can see increases in lockstep with the 0x9104 packets.

So I think we can assume that all those RX dropped are caused by those 0x9104 broadcast from the eeros! I wonder if the other folks reporting this similar issue above also had something broadcasting weird packets in their network.
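
For anyone wanting to check their own captures for similar frames, the EtherType sits in bytes 12-13 of an untagged Ethernet frame. This is a small sketch, assuming untagged frames (no 802.1Q VLAN header) and a deliberately short table of common EtherTypes. The frame built at the bottom is invented for illustration, and whether a given kernel has a handler for a protocol depends on its configuration, so "no common handler" here is a hint, not a verdict.

```python
import struct

# A few widely used EtherTypes; deliberately incomplete.
COMMON_ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x86DD: "IPv6",
}


def ethertype(frame):
    """Return the EtherType of an untagged Ethernet frame (bytes 12-13)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    (value,) = struct.unpack("!H", frame[12:14])
    return value


def describe(frame):
    """Name the protocol if it is in the table, otherwise flag it."""
    value = ethertype(frame)
    name = COMMON_ETHERTYPES.get(value)
    return name if name else f"0x{value:04x} (no common handler)"


if __name__ == "__main__":
    # Invented example frame: broadcast destination, made-up source MAC,
    # EtherType 0x9104 as seen in this thread, dummy payload.
    broadcast = bytes.fromhex("ffffffffffff")
    source = bytes.fromhex("020000000001")
    frame = broadcast + source + struct.pack("!H", 0x9104) + b"\x00" * 46
    print(describe(frame))
```

Frames that fall into the "no common handler" bucket match the "Unknown / Unregistered protocols" reason listed earlier in this thread, which would explain a dropped counter that ticks in lockstep with them.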

I'm gonna see what I can correlate with the SSH issue, and share any findings. Thanks for the ideas!

@bfreis

bfreis commented Sep 26, 2024

Ok, I figured it out!

I used mosh to connect to one of the Pis, and from there I SSH'ed into the other. There was no SSH failure.

My investigation then led me to the issue: my MacBook! It seems that the connection was being closed on my end, rather than on the Pi's end. Apparently there's some sort of bug in the firewall stack on macOS Sequoia that randomly kills SSH and SSL connections. And it gets even stranger: it seems to only affect my connections when using iTerm2, but it works fine when using Apple's Terminal app! It's impacting Chrome, etc. Another fun fact: if I try to download a large file via HTTPS with curl, it will always complete successfully in Apple's Terminal app, and it will hang forever in iTerm2 (e.g. curl https://ash-speed.hetzner.com/1GB.bin -o /dev/null)

@pelwell
Contributor

pelwell commented Sep 26, 2024

Congratulations - you did most of the heavy lifting.
