
current rate limiting is ineffective #4359

Closed
marten-seemann opened this issue Mar 10, 2024 · 9 comments · Fixed by #4362

@marten-seemann

The current rate limiting only works for well-behaved clients, but fails against an active attacker. The reason is that the current implementation imposes a concurrency limit. However, the concurrency is attacker-controlled: an attacker can send a ClientHello, immediately followed by an invalid frame (potentially in the same packet). The server then starts the TLS handshake and incurs the CPU cost, only to immediately close the connection, which is then no longer counted towards the limit.

What we need to do instead is impose a limit that's independent of what the attacker is doing. One option would be token-bucket-style rate limiting, enforcing that we don't handle more than X (unvalidated / total) handshakes per second, with a burst budget of Y.

x/time/rate provides a token-bucket rate limiter, which we might be able to use in the API: https://pkg.go.dev/golang.org/x/time/rate

@marten-seemann

We can actually make the API more general by taking a function instead of a rate.Limiter:

RejectUnverifiedConnection(rcvTime time.Time) bool
RejectConnection(rcvTime time.Time) bool

If unset, we’d use the AllowN function of a rate.Limiter, initialized with reasonable limits.

It is expected that most users would use a rate.Limiter (with limits different from the defaults), but this API would allow users to go fancy and implement their own rate limiting logic.

@marten-seemann

marten-seemann commented Mar 11, 2024

Here's an updated proposal. We actually only need a single callback:

// VerifySourceAddress decides if a connection attempt originating from unvalidated source
// addresses first needs to go through source address validation using QUIC's Retry mechanism,
// as described in RFC 9000 section 8.1.2.
// Validating the source address adds one additional network roundtrip to the handshake,
// and should therefore only be used if a suspiciously high number of incoming connections is recorded.
// If unset, a rate limit of 128 unvalidated connection attempts per second will be applied.
// For most use cases, wrapping the AllowN function of a rate.Limiter will be a reasonable
// implementation of this callback.
VerifySourceAddress func(receiveTime time.Time) bool

Question: Is it reasonable to set a default value here?

It doesn't make sense to have a separate RejectConnection callback, since:

  • If it's called before VerifySourceAddress, an attacker can simply flood us with packets (potentially with spoofed source addresses) until this limit is reached.
  • That means the only option is to call it after VerifySourceAddress. However, then you might as well do the rate limiting in GetConfigForClient, which is also called after VerifySourceAddress, just before we're about to create a new connection struct. Together with #4358 (adding an AddrVerified field to the ClientHelloInfo), this will allow the application to:
    1. Impose a rate limit on the total number of handshakes. Note that this only protects against getting overloaded by new connection attempts (i.e. it prevents an attacker from bringing the CPU load to 100%). An attacker can still consume the entire rate limit by just spamming us with connection attempts, from a single machine.
    2. Perform blacklisting based on the remote address. This should only be done for verified addresses, otherwise an attacker would be able to prevent legitimate clients from connecting by spoofing their source addresses. This means an attacker has to control a large number of IPs to saturate our limit.

That means that a reasonable implementation would look something like this:

limiter := rate.NewLimiter(limit, burst) // e.g. rate.Limit(128), 256
quicConf := &quic.Config{
    GetConfigForClient: func(info *quic.ClientHelloInfo) (*quic.Config, error) {
        if !limiter.Allow() {
            return nil, errors.New("rate limited")
        }
        if info.AddrVerified {
            // do IP-address based rate limiting based on info.RemoteAddr
        }
        return &quic.Config{ /* your config here */ }, nil
    },
}

@MarcoPolo

I like the API as a greenfield design. I'm worried that reusing GetConfigForClient will prevent quic-go from introducing a reasonable default, and thus rely on all users to implement the rate limit logic in their own GetConfigForClient. That said, I'm not sure what a reasonable default would even be, since it depends on the use case. And I do appreciate having a smaller API surface. So I think this is an okay tradeoff.

Does it make sense to allow verified IP addresses to have their own rate limit? If another node went through the process of verifying their address, should they still get blocked if a malicious actor is consuming the whole rate limit? I'm thinking something like this:

unverifiedLimiter := rate.NewLimiter(unverifiedLimit, unverifiedBurst)
verifiedLimiter := rate.NewLimiter(verifiedLimit, verifiedBurst)
quicConf := &quic.Config{
    GetConfigForClient: func(info *quic.ClientHelloInfo) (*quic.Config, error) {
        if info.AddrVerified {
            if !verifiedLimiter.Allow() {
                return nil, errors.New("rate limited")
            }
            // do IP-address based rate limiting based on info.RemoteAddr
        } else if !unverifiedLimiter.Allow() {
            return nil, errors.New("rate limited")
        }
        return &quic.Config{ /* your config here */ }, nil
    },
}

@marten-seemann

> I like the API as a greenfield design. I'm worried that reusing GetConfigForClient will prevent quic-go from introducing a reasonable default, and thus rely on all users to implement the rate limit logic in their own GetConfigForClient. That said, I'm not sure what a reasonable default would even be since it depends on the use case. And I do appreciate having a smaller api surface. So I think this is an okay tradeoff.

Thanks! I really appreciate that you're evaluating the tradeoffs here.

> Does it make sense to allow verified IP addresses to have their own rate limit? If another node went through the process of verifying their address, should they still get blocked if a malicious actor is consuming the whole rate limit?

I think the flow is a little bit different here, since it only takes 1 RTT to convert an unverified into a verified client. Here's how I imagine the layers of defence to work:

  1. If the rate of incoming connections is low, no client has to verify its source address. Handshakes complete in 1 RTT. This is the common case (if limits are set correctly).
  2. If the rate of incoming connections is higher, all connections above the rate limiter's steady state are forced into Retry, incurring a 1 RTT penalty to verify their address.
  3. If the rate gets even higher and threatens to overwhelm the CPU capacity, a hard rate limit applies. Applications will need to come up with some kind of blacklisting of IP addresses / ranges, to make sure their capacity is not used up by connections coming from a single IP.
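The escalation could be sketched like this. All types and names below are hypothetical stand-ins for the quic-go API discussed in this thread, and the fixed-window counters stand in for real rate.Limiters; nothing here is final:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-in for quic-go's ClientHelloInfo.
type clientHelloInfo struct{ addrVerified bool }

// window is a fixed-size budget standing in for a rate.Limiter.
type window struct{ used, limit int }

func (w *window) allow() bool {
	if w.used >= w.limit {
		return false
	}
	w.used++
	return true
}

func main() {
	retryBudget := &window{limit: 2} // layers 1→2: above this, force a Retry
	hardBudget := &window{limit: 3}  // layer 3: above this, reject outright

	// Layers 1/2: decide whether a connection attempt must first validate
	// its source address via Retry (true = send a Retry).
	verifySourceAddress := func() bool { return !retryBudget.allow() }

	// Layer 3: hard limit, applied just before creating the connection.
	getConfigForClient := func(info *clientHelloInfo) error {
		if !hardBudget.allow() {
			return errors.New("rate limited")
		}
		// info.addrVerified would additionally gate per-IP blacklisting here.
		_ = info.addrVerified
		return nil
	}

	for i := 0; i < 4; i++ {
		needsRetry := verifySourceAddress()
		err := getConfigForClient(&clientHelloInfo{addrVerified: needsRetry})
		fmt.Println(needsRetry, err)
	}
}
```

The first two attempts complete without Retry, the third pays the extra round trip, and the fourth is rejected outright once the hard budget is exhausted.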

Does this make sense, @MarcoPolo?

@sukunrt

sukunrt commented Mar 12, 2024

Do you think it would be useful to provide the source address here:

VerifySourceAddress func(addr net.Addr, receiveTime time.Time) bool

@marten-seemann

How would you act upon the source address here? It's not verified yet, so you'd be using spoofed information, right in the mechanism designed to prevent this spoofing.

@sukunrt

sukunrt commented Mar 13, 2024

I am not sure how useful this is in the wild, but you can whitelist intranet addresses when you know that packets from outside presenting an intranet source IP will be rejected at the ELB/gateway etc.

@marten-seemann

Good point. We just need to be very clear that the source address cannot be trusted.

@MarcoPolo

> I think the flow is a little bit different here, since it only takes 1 RTT to convert an unverified into a verified client. Here's how I imagine the layers of defence to work:
>
>   1. If the rate of incoming connections is low, no client has to verify its source address. Handshakes complete in 1 RTT. This is the common case (if limits are set correctly).
>   2. If the rate of incoming connections is higher, all connections above the rate limiter's steady state are forced into Retry, incurring a 1 RTT penalty to verify their address.
>   3. If the rate is getting even higher, and threatening to overwhelm the CPU capacity, a hard rate limiting applies. Applications will need to come up with some kind of blacklisting of IP addresses / ranges, to make sure their capacity is not used up by connections coming from a single IP.
>
> Does this make sense, @MarcoPolo?

Yeah, that makes sense. I think I need to figure out an ergonomic way to define rate limits for libp2p users.
