Linux Network Performance Ultimate Guide

betaby · 2024-07-27T13:53:28

"net.core.wmem_max: the upper limit of the TCP send buffer size. Similar to net.core.rmem_max (but for transmission)."

and then we have `net.ipv4.tcp_wmem`, which brings up two questions: 1. why is there no IPv6 equivalent, and 2. what's the difference from `net.core.wmem_max`?

adrian_b · 2024-07-27T14:11:39

net.core.wmem_max is a maximum value, as its name says.

net.ipv4.tcp_wmem is a triple of values: minimum, default and maximum. The maximum given here cannot exceed net.core.wmem_max.

TCP is a protocol that should behave the same regardless of whether it is carried over IPv4 or IPv6.

See e.g.

https://docs.redhat.com/en/documentation/red_hat_data_grid/7...
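To look at the two settings side by side, here is a minimal Python sketch (an illustration, not from the article) that reads them from the standard Linux procfs paths `/proc/sys/net/core/wmem_max` and `/proc/sys/net/ipv4/tcp_wmem` and parses the min/default/max triple. The function names are hypothetical; parsing is kept separate from file access so it can be tried without a live system.

```python
from __future__ import annotations

from pathlib import Path

# Standard procfs paths for the two sysctls (Linux only).
CORE_WMEM_MAX = Path("/proc/sys/net/core/wmem_max")
TCP_WMEM = Path("/proc/sys/net/ipv4/tcp_wmem")


def parse_tcp_wmem(text: str) -> tuple[int, int, int]:
    """Parse tcp_wmem's whitespace-separated 'min default max' triple (bytes)."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 values in tcp_wmem, got {text!r}")
    minimum, default, maximum = (int(f) for f in fields)
    return minimum, default, maximum


def compare_wmem(core_text: str, tcp_text: str) -> dict[str, int | bool]:
    """Put the single global value next to the TCP triple."""
    core_max = int(core_text.strip())
    tcp_min, tcp_default, tcp_max = parse_tcp_wmem(tcp_text)
    return {
        "core_wmem_max": core_max,
        "tcp_wmem_min": tcp_min,
        "tcp_wmem_default": tcp_default,
        "tcp_wmem_max": tcp_max,
        # Simply reports the numeric relationship; no policy is implied.
        "tcp_max_exceeds_core_max": tcp_max > core_max,
    }


def read_live() -> dict[str, int | bool]:
    """Read both values from a Linux host's /proc/sys (not called here)."""
    return compare_wmem(CORE_WMEM_MAX.read_text(), TCP_WMEM.read_text())
```

On a Linux host, `read_live()` returns the current values; `sysctl net.core.wmem_max net.ipv4.tcp_wmem` shows the same information from the shell.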

betaby · 2024-07-27T14:25:45

So `net.ipv4.tcp_wmem` applies to both IPv4 and IPv6? If so, that's absolutely not obvious.

c0l0 · 2024-07-27T08:53:26

This would have been such a great resource for me just a few weeks ago!

We finally wanted to encrypt the L2 links between our DCs and got quotes from a number of providers for hardware appliances. I was like, "no WAY this ought to cost that much!", and went off to try to build something myself that hauled Ethernet frames over a WireGuard overlay network at 10 Gbps using COTS hardware. I did pull it off after ten days of work or so, undercutting the cheapest offer by about 70% (and the most expensive one by about 95% or so...), but there was a lot of intricate reading and experimentation involved.

I am looking forward to validating my understanding against the content of this article; it looks very promising and comprehensive at first and second glance! Thanks for creating and posting it.

pgraf · 2024-07-27T11:57:08

If I may ask, what is your use case so that a L3 tunnel does not suffice?

freedomben · 2024-07-27T09:01:49

Are you able to share your code? I'd be fascinated to see how you would do that.

jasonjayr · 2024-07-27T11:50:59

I just shared this a moment ago in another comment, but:

https://github.com/m13253/VxWireguard-Generator

https://gitlab.com/NickCao/RAIT

Both build a set of WireGuard configurations so you can set up an L2 mesh and then run whatever routing protocol you want on top of it (Babel, BGP, etc.).

(Not the OP, but I use the first one in my own multi-site network mesh between DO, AWS, two physical DCs, and our office.)
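For the WireGuard layer alone, the core idea behind such generators can be sketched as: for each node, emit a wg-quick style config that lists every other node as a peer. This is a minimal, hypothetical Python illustration, not how either linked tool works internally. The `Node` fields, the /24 overlay prefix, and the placeholder keys are assumptions, and the L2 part (for example VXLAN on top of the tunnels) is not shown.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    host: str          # public address other nodes dial (hypothetical)
    listen_port: int   # UDP port WireGuard listens on
    overlay_ip: str    # this node's address inside the mesh
    private_key: str   # placeholder; real keys come from `wg genkey`
    public_key: str    # placeholder; real keys come from `wg pubkey`


def render_config(me: Node, nodes: list[Node]) -> str:
    """Render a wg-quick style config that peers `me` with every other node."""
    lines = [
        "[Interface]",
        f"PrivateKey = {me.private_key}",
        f"Address = {me.overlay_ip}/24",  # assumed overlay prefix
        f"ListenPort = {me.listen_port}",
    ]
    for peer in nodes:
        if peer.name == me.name:
            continue  # a node never lists itself as a peer
        lines += [
            "",
            f"# {peer.name}",
            "[Peer]",
            f"PublicKey = {peer.public_key}",
            f"Endpoint = {peer.host}:{peer.listen_port}",
            # Only the peer's own overlay address is routed to this peer.
            f"AllowedIPs = {peer.overlay_ip}/32",
        ]
    return "\n".join(lines) + "\n"


def render_mesh(nodes: list[Node]) -> dict[str, str]:
    """Return one config per node; n nodes yield n*(n-1)/2 tunnels in total."""
    return {node.name: render_config(node, nodes) for node in nodes}
```

Because every node peers with every other node, the number of tunnels grows quadratically, which is one reason such setups then run a routing protocol over the mesh rather than relying on static routes alone.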

hyperman1 · 2024-07-27T10:30:35

I wonder if it's worth it, with this many tunables, to write software that tunes them automatically, gradient-descent style: choose a parameter from a whitelist at random and slightly increase or decrease it within a permitted range. Measure performance for a while, then undo the change if things got worse, or push further if things got better.
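The loop described above can be sketched in a few lines of Python. Strictly speaking, it is random-perturbation hill climbing rather than gradient descent, since no gradient is estimated. Everything here is hypothetical: `measure` stands in for whatever applies the settings and runs a benchmark (for example, writing sysctls and running a throughput test), and the whitelist format `name -> (low, high, step)` is an assumption.

```python
from __future__ import annotations

import random
from collections.abc import Callable

# Hypothetical whitelist: parameter name -> (low, high, step).
Whitelist = dict[str, tuple[int, int, int]]


def hill_climb(
    params: dict[str, int],
    whitelist: Whitelist,
    measure: Callable[[dict[str, int]], float],
    rounds: int,
    rng: random.Random,
) -> tuple[dict[str, int], float]:
    """Randomly nudge one whitelisted parameter per round; keep only improvements."""
    current = dict(params)  # never mutate the caller's settings
    best_score = measure(current)
    for _ in range(rounds):
        name = rng.choice(sorted(whitelist))
        low, high, step = whitelist[name]
        candidate = dict(current)
        # Move one step up or down, clamped to the permitted range.
        candidate[name] = min(high, max(low, current[name] + rng.choice((-step, step))))
        if candidate[name] == current[name]:
            continue  # clamped at a bound: nothing new to try this round
        score = measure(candidate)
        if score > best_score:
            current, best_score = candidate, score  # keep the better setting
        # Otherwise the candidate is simply dropped, i.e. the change is "undone".
    return current, best_score
```

Rejecting a candidate just means not adopting it, which plays the role of "undo". Since real throughput measurements are noisy, a practical version would likely need repeated measurements before accepting a change.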

dakiol · 2024-07-27T08:23:08

I find this cool, but as a software engineer I rarely get the chance to run any of the commands mentioned in the article. The reason: our systems run in containers that are stripped-down versions of some Linux, and I don’t have shell access to production systems (and reproducing a bug in a dev or QA environment is usually useless because they are very different from prod in terms of load and the like).

So the only chance I get to run any of the commands in the article is when playing around with my own systems. I guess they would also be useful if I worked as a platform engineer.

znpy · 2024-07-27T12:29:41

Most of the low-level stuff wouldn’t work or would be useless anyway, as most container network interface implementations make you work with veth pairs and do many userspace monstrosities.

This is one of the things I don’t like much about Kubernetes: the networking model assumes you only have one NIC (like 99.99999% of cloud instances from cloud providers) and that your application is dumb enough not to need knowledge of anything beneath it.

The whole networking model could really get a 2020-era overhaul for simplification and improvement.

Emigre_ · 2024-07-27T08:54:59

If you have a staging environment that is as similar as possible to production, you can experiment and analyze things in a production-like environment where you do have access. Depending on the situation, this could help.

totallyunknown · 2024-07-27T09:32:45

What's missing a bit here is debugging and tuning for >100 Gbps throughput. Serving HTTP at that scale often requires kTLS because the first bottleneck that appears is memory bandwidth. Tools like AMD μProf are very helpful for debugging this. eBPF-based continuous profiling is also helpful to understand exactly what's happening in the kernel and user-space. But overall, a good read!

rjgonza · 2024-07-27T09:50:19

This seems pretty cool, thanks for sharing. So far, at least in my career, whenever we need "performance" we start with kernel bypass.