Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove iptables rule for SCTP checksum fixup #47952

Open
robmry opened this issue Jun 11, 2024 · 7 comments
Open

Remove iptables rule for SCTP checksum fixup #47952

robmry opened this issue Jun 11, 2024 · 7 comments
Assignees
Labels
area/networking/d/bridge area/networking kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/0-triage

Comments

@robmry
Copy link
Contributor

robmry commented Jun 11, 2024

Description

@akerouanton noted in #47871 (comment) that this code for SCTP checksum fixup can now be removed ...

if b.Proto == types.SCTP {
// Linux kernel v4.9 and below enables NETIF_F_SCTP_CRC for veth by
// the following commit.
// This introduces a problem when combined with a physical NIC without
// NETIF_F_SCTP_CRC. As for a workaround, here we add an iptables entry
// to fill the checksum.
//
// https://github.com/torvalds/linux/commit/c80fafbbb59ef9924962f83aac85531039395b18
args = []string{
"-p", b.Proto.String(),
"--sport", strconv.Itoa(int(b.Port)),
"-j", "CHECKSUM",
"--checksum-fill",
}
rule := iptRule{ipv: ipv, table: iptables.Mangle, chain: "POSTROUTING", args: args}
if err := programChainRule(rule, "MASQUERADE", enable); err != nil {
return err
}
}

@robmry robmry added status/0-triage kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. area/networking area/networking/d/bridge labels Jun 11, 2024
@robmry robmry self-assigned this Jun 11, 2024
@Hoernchen
Copy link

It's quite the coincidence that I ended up finding this ticket, because my Issue is that this specific rule basically prevents using the same sctp port for local (docker nw) and remote (->reachable via physical nw if) clients at the same time, the sctp conn for local clients basically times out after INIT/INIT_ACK

This specifically breaks having both remote and local "sw only/zmq" enbs at the same time with the https://github.com/herlesupreeth/docker_open5gs setup.

@robmry
Copy link
Contributor Author

robmry commented Jul 5, 2024

Having looked at this a bit more closely, I'm not confident about removing the rule ...

The comment says "Linux kernel v4.9 and below enables NETIF_F_SCTP_CRC for veth by the following commit" ... the commit adds NETIF_F_SCTP_CRC to VETH_FEATURES, in kernel 4.9 and later.

I think that means the net/sctp code won't add its own checksum if the packet's transmitted on a veth device.

Perhaps the issue is that the veth device doesn't actually add the checksum. In which case, if the packet ends up being transmitted via a NIC that doesn't add the checksum either, there's no checksum ... and this mangle rule is to fix it. If that's right, the mangle rule would be needed as a workaround for kernel 4.9 and above (rather than for kernels older than 4.9).

Needs further investigation / testing ...

@Hoernchen
Copy link

Yes, the comment is misleading, it is actually for newer kernels that claim hw checksum support. See https://patchwork.ozlabs.org/project/netdev/patch/[email protected]/ for a slightly more recent discussion that kind of explains the issue. I am not certain that the rule is still required tho, 4.9 was a long time ago, and it was apparently very easy to hit that (missing) checksum issue...

But I guess the safe fix here would be to at least adjust the rule to ensure it does not apply to internal traffic between containers - internal sctp traffic currently works fine for any other port, but ends up with a wrong checksum when hitting that rule.

@laf0rge
Copy link

laf0rge commented Jul 5, 2024

Just to clarify: NETIF_F_SCTP_CRC was introduced to veth in

commit c80fafbbb59ef9924962f83aac85531039395b18
Author: Xin Long <[email protected]>
Date:   Thu Aug 25 13:21:49 2016 +0800

    veth: sctp: add NETIF_F_SCTP_CRC to device features

which is present in 4.9-rc1 and newer (up to todays torvalds' master)

@laf0rge
Copy link

laf0rge commented Jul 5, 2024

What I find a bit confusing is that the CHECKSUM target (xt_CHECKSUM.c) calls skb_checksum_help, which is a function caring about the classic 1-complement checksum, while the NETIF_F_SCTP_CRC flag is about the CRC32 checksum of SCTP.

So I don't really understand how using the iptables CHECKSUM target should ever have fixed something elated to the CRC32 of SCTP.

@laf0rge
Copy link

laf0rge commented Jul 5, 2024

Furthermore, since kernel v4.19, the xt_CHECKSUM.c explicitly states it should only be used for UDP and only in the OUTPUT chain - while docker is using it for SCTP in the PREROUTING chain:

pr_warn_once("CHECKSUM should be avoided.  If really needed, restrict with \"-p udp\" and only use in OUTPUT\n");

@robmry
Copy link
Contributor Author

robmry commented Jul 9, 2024

Thank you @Hoernchen and @laf0rge, that's really helpful.

We just discussed this on the moby n/w maintainers call (@corhere, @akerouanton) ... as the code doesn't really make sense to any of us, we'll disable it. Not expecting anything to break, but we'll include an environment-variable escape hatch / way to re-enable it for a release or-so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/networking/d/bridge area/networking kind/enhancement Enhancements are not bugs or new features but can improve usability or performance. status/0-triage
Projects
None yet
Development

No branches or pull requests

3 participants