-
Notifications
You must be signed in to change notification settings - Fork 776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
plugins/meta/bandwidth: traffic shaping plugin #96
plugins/meta/bandwidth: traffic shaping plugin #96
Conversation
build.sh
Outdated
@@ -19,7 +19,7 @@ export GOPATH=${PWD}/gopath | |||
mkdir -p "${PWD}/bin" | |||
|
|||
echo "Building plugins" | |||
PLUGINS="plugins/meta/* plugins/main/* plugins/ipam/* plugins/sample" | |||
PLUGINS="plugins/meta/* plugins/main/* plugins/ipam/* plugins/sample plugins/tc/*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool. you'll need to add it to plugins/linux_only.txt
also to fix the AppVeyor (Windows CI) build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! I left a few small suggestions.
Also, I was hoping it would be possible to pull in some of the integration test coverage that we developed for the Silk CNI traffic-shaping, i.e. the tests that actually measure and assert on bandwidth for ingress and egress. Perhaps by chaining this plugin after ptp
or bridge
and asserting on the resulting state?
plugins/tc/tbf/main.go
Outdated
// limitations under the License. | ||
|
||
// This is a sample chained plugin that supports multiple CNI versions. It | ||
// parses prevResult according to the cniVersion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either remove this comment or make it more descriptive
plugins/tc/tbf/main.go
Outdated
|
||
// PluginConf is whatever you expect your configuration json to be. This is whatever | ||
// is passed in on stdin. Your plugin may wish to expose its functionality via | ||
// runtime args, see CONVENTIONS.md in the CNI spec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look like they were copied from the sample. Not sure if they are are helpful here.
plugins/tc/tbf/main.go
Outdated
} | ||
err = nonContainerNetNs.Do(func(netNS ns.NetNS) error { | ||
nonContainerIfaceName := getRoutableHostIF(cip.Address.IP) | ||
if nonContainerIfaceName != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you can assume that this plugin is being called from the "host" namespace. In other words, the current NS when the plugin starts is the place to put host devices. That should remove the need for these nested loop and conditionals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right - plugins are always called from the host.
plugins/tc/tbf/main.go
Outdated
} | ||
|
||
if conf.Rate == 0 && conf.Burst == 0 { | ||
return types.PrintResult(conf.PrevResult, conf.CNIVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a comment here, e.g. no traffic shaping was requested, so just no-op and quit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so a second pass showed some more serious issues that will require changes. let me know if you have questions about this.
plugins/tc/tbf/main.go
Outdated
err = nonContainerNetNs.Do(func(netNS ns.NetNS) error { | ||
nonContainerIfaceName := getRoutableHostIF(cip.Address.IP) | ||
if nonContainerIfaceName != "" { | ||
ifbDeviceName := os.Getenv("CNI_IFNAME") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, also, this is not correct. According to the CNI spec, CNI_IFNAME
is the desired name for the primary network interface inside the container, i.e. eth0
. It is guaranteed unique only inside the container network namespace, not the host. You'll need to generate a unique name for the host-side interface -- perhaps by hashing the (network name, container id)
pair (see here for some considerations), using a random name (see here), or by using an auto-incrementing integer on the system. You may need to store this name on the local filesystem in order to properly clean it up when DEL
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of generating a deterministic unique name for the host side interface. I have pushed a change to compute the sha1 of the (network name, container id) and only using the first 15 characters of the hex digest.
Thus on cmdDel i am able to determine the device name to delete.
On the off chance that the sha1 conflicts with an interface that already exists, this plugin will fail... If we think this scenario is likely enough to cause issues for people, I can change to the random name approach + stored state. (i.e. if the random name conflicts, generate a new random name until no conflict)
plugins/tc/tbf/main.go
Outdated
} | ||
_ = conf | ||
|
||
for _, prevIface := range conf.PrevResult.Interfaces { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I'm afraid PrevResult
will not be populated on a DEL
call. The runtime will execute the plugins in reverse order. (also, to top it off, most plugins don't return any data on a successful DEL
).
So to know the interface name to teardown, you may need some other tricks. One way is that ADD
may store some state (i.e. the interface name) on the filesystem, and DEL
reads it back out. See the other comment about the IFB device name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dealt with this by generating a deterministic unique name (sha1 of network name and container id).
I like the idea of adding a measure test and asserting ingress/egress shaping. I can chain it with some plugins too. I can work on that next. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great progress, left a few more comments but I think it's really coming together.
Two other high-level things:
- consider renaming: maybe something like
plugins/meta/bandwidth
? - please a README! explain how this works, link to relevant docs.
plugins/tc/tbf/main.go
Outdated
return "", err | ||
} | ||
|
||
return fmt.Sprintf("%x", hash.Sum(nil))[:15], nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is ok, but I'd like other @containernetworking/cni-maintainers to weigh in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd prefer a short prefix (<= 4 characters). That should still leave enough entropy (>=2^44) even just using the basic hex encoding.
plugins/tc/tbf/main.go
Outdated
} | ||
|
||
if len(containerIPConfigs) == 0 { | ||
return fmt.Errorf("got no container IPs") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why bother with this check? I don't see containerIPConfigs
used later in this function.
plugins/tc/tbf/main.go
Outdated
|
||
for _, prevIface := range conf.PrevResult.Interfaces { | ||
if prevIface.Sandbox != "" { | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. I wonder about readability of this code though.
One option is to extract a private function with a signature like
func getHostInterface(ifaces []*current.Interface)` (*current.Interface, error) { ... }
You could keep the Sandbox
emptiness check in there, and do stuff like have it error if no matching interfaces are found.
That would make it more clear what this code is doing. It would also have the side benefit of removing the for-loop indentation from the main code-path here, so your top-level function would read
hostInterfaceface, err := getHostInterface(conf.PrevResult.Interfaces)
if err != nil {
return err
}
err = CreateIngressQdisc(conf.Rate, conf.Burst, hostInterface.Name)
...
plugins/tc/tbf/main.go
Outdated
if err != nil { | ||
return err | ||
} | ||
_ = conf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy-pasta from the plugin template example :-(
plugins/tc/tbf/tbf_linux_test.go
Outdated
}) | ||
}) | ||
|
||
Expect(runtimeWithLimit).To(BeNumerically(">", runtimeWithoutLimit+1000*time.Millisecond)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome. I love this.
plugins/tc/tbf/tbf_linux_test.go
Outdated
], | ||
"routes": [] | ||
} | ||
}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I see you've got a more integration-type test with the PTP plugin below. Is there a reason you want to test with a literal result JSON as well? (Not opposed, just curious.)
plugins/tc/tbf/tbf_suite_test.go
Outdated
})).To(Succeed()) | ||
} | ||
|
||
func startTcpServerInNS(containerNS ns.NetNS, laddr string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At a glance, this sounds a lot like the echosvr
we use in the portmap
integration tests.
Perhaps there's an opportunity for code-reuse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for pointing this out. I like the idea of code-reuse.
I am finding using echosvr a bit tricky to use for this particular test. Mainly because the echosvr tries to read a fixed amount of bytes and then continues to write back what it read and close the connection. However in the measure case I want to ensure that the entire message is sent and read by the server. I did this previously by sending a message with a trailing new-line character as a signal to the server that the entire message had been sent, and that it could then write back a response so the client could know that the server received the entire message.
I have tried to get this to work by tweaking the num of packets sent to the 'echo server' and tbf's burst/limit sent to fit into the buffer size of 512. However this results in the time taken with/without tbf to be negligible (~100 millisecond difference)
I have tried to increase the echosvr
buffer size to allow sending/reading a larger payload, but conn.Read
seems like it reads as much as the read syscall
is set to (which when I run is consistently 61232 bytes). If i send < 61232 to the echosvr
it results in a broken pipe error
. Haven't looked too much into it but I suspect that since ingress and egress traffic are both shaped, that when the echo svr
tries to write back a large payload it gets shaped, and it closes the connection prematurely.
Your thoughts are much appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy with replacing the echoserver with this code. It really should be in another process, if only because Go namespace crappiness has caused us a lot of grief and flaky tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thanks for the reply. will do
Thanks for the comments.
|
plugins/meta/bandwidth/README.md
Outdated
"burst": 456 | ||
} | ||
``` | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful to have a more fleshed out example of chaining, e.g. with bridge first and then this.
plugins/meta/bandwidth/README.md
Outdated
|
||
This plugin configures a token bucket filter (tbf) queing discipline (qdisc) on both ingress and egress traffic. Resulting in traffic being shaped when reading / writing. | ||
|
||
Due to limitations on tc shaping rules for ingress, this plugin creates an Intermediate Functional Block device (ifb) to redirect packets from the target interface. tc tbf is then applied to the ifb device. The packets that were redirected to the ifb devices, are written OUT (and shaped) to the target interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You say "target" interface but it may be more clear to distinguish between host-side and container-side interfaces here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with a couple tiny README nits.
Also please update the commit message with new location and a brief title, e.g.
plugins/meta/bandwidth: traffic shaping plugin
<more detail here>
fyi
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
plugins/meta/bandwidth/README.md
Outdated
|
||
This plugin provides a way to use and configure Linux's Traffic control (tc) subystem. tc encompasses the sets of mechanisms and operations by which packets are queued for transmission/reception on a network interface. | ||
|
||
This plugin configures a token bucket filter (tbf) queing discipline (qdisc) on both ingress and egress traffic. Resulting in traffic being shaped when reading / writing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queing -> queuing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch. fixed typo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks pretty good. Just a few small things.
plugins/meta/bandwidth/README.md
Outdated
"name": "mynet", | ||
"plugins": [ | ||
{ | ||
"name": "my-net", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "name" field here (and below) isn't needed.
plugins/meta/bandwidth/README.md
Outdated
``` | ||
|
||
|
||
1. The following is the json payload the container runtime would need to perform during the ADD action: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably don't need to explain plugin chaining in this README.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed plugin chaining examples. pushed.
plugins/meta/bandwidth/main.go
Outdated
PrevResult *current.Result `json:"-"` | ||
|
||
// Add plugin-specifc flags here | ||
Rate int `json:"rate"` //Bandwidth rate in Kbps for traffic through container. 0 for no limit. If rate is set, burst must also be set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add separate ingress and egress rates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done. pushed.
plugins/meta/bandwidth/main.go
Outdated
switch { | ||
case conf.Burst < 0 || conf.Rate < 0: | ||
return nil, fmt.Errorf("rate and burst must be a positive integer") | ||
case conf.Burst == 0 && conf.Rate != 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather infer when burst == 0
that burst = rate
. But this is subjective.
plugins/meta/bandwidth/main.go
Outdated
|
||
conf.PrevResult.Interfaces = append(conf.PrevResult.Interfaces, ¤t.Interface{ | ||
Name: ifbDeviceName, | ||
Sandbox: hostInterface.Sandbox, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sandbox should not be set, since the IFB device isn't in the sandbox / container netns.
@@ -0,0 +1,165 @@ | |||
package bandwidth | |||
|
|||
// Copyright 2015 CNI authors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2018 :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy new year :-)
fyi @matthewdupre @squeed the requested changes have been pushed. |
plugins/meta/bandwidth/main.go
Outdated
// removed (though you may wish to error if a non-chainable plugin is | ||
// chained. | ||
// If you need to modify the result before returning it, you will need | ||
// to actually convert it to a concrete versioned struct. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be removed - it is advice to the plugin author.
It could be replaced with a comment about how you resolved the decisions.
plugins/meta/bandwidth/main.go
Outdated
RawPrevResult *map[string]interface{} `json:"prevResult"` | ||
PrevResult *current.Result `json:"-"` | ||
|
||
// Add plugin-specifc flags here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should be removed - it is advice to the plugin author.
plugins/meta/bandwidth/main.go
Outdated
continue | ||
} | ||
|
||
return prevIface, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the maintainers meeting someone pointed out that this will break for the bridge plugin, since bridge returns two non-sandbox interfaces: the bridge itself, and the container veth, in that order.
We're considering how best to resolve the issue: perhaps by adding a peerIndex
field to the result structure so that the prior plugin can specify which interface is the container's peer veth.
Alternately, you may be able to detect whether a particular interface is a peer of the sandbox one, cc @dcbw and @bboreham for possible tricks here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rosenhouse implementing the "get peer index" library code for #75 brought up the following question: what do we consider a "peer"? Of course we know what that should be for veth devices. But what other device types does this plugin support?
eg, does it support ipvlan or macvlan? In that case, there is only a "parent" device that may have multiple "child" ipvlan/macvlan devices attached. The child devices all have the IFLA_LINK attribute (which is what veth sets to indicate its peer) set to the parent device that they all share.
We can obviously define a peerIndex
field however we like in the CNI spec, but it would be a shame if that was veth-only or relied on some implied behavior of the LInux kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, lets do the hack to detect the peer on the host side, and then update the readme to call out that this only works with plugins that put a peer veth on the host.
. "github.com/onsi/gomega" | ||
"github.com/onsi/gomega/gexec" | ||
"github.com/vishvananda/netlink" | ||
"net" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nits:
move "net" and "time" right below "fmt"?
Hi all, Any chance to push it forward? I have been thinking about supporting traffic shaping for kubernetes cni and this work relies on it. |
Yeah, since #75 is still not merged, I would suggest we copy over the peer-discovery function from there into here. I can probably get to that this week if @DennisDenuto doesn't beat me too it. |
apologies for the delay, i have been a bit preoccupied. @rosenhouse If you have the peer-discovery function, please feel free to add it. I won't be able to look into it this week. I would probably have time on the weekend. |
I've pushed a couple commits here to address the nits, and opened #120 so we can use that library function from this PR. |
#120 has been merged so this can get rebased and updated to use GetVethPeerIfindex(). |
Friendly ping @DennisDenuto ~ |
Dennis, feel free to pick this up if you want, but if not, I can pair with @chinangela and team on it next week. |
Looking forward for this to be merged. 👍 |
@rosenhouse yeah that would be great if you could pair on it with the c2c team. Thanks |
Add chained plugin to add a tbf qdisc to shape ingress/egress traffic
Coming to this a bit late so apologies if it's already been discussed, but have folks considered using the capabilities mechanism + convention to allow runtime configuration of the plugin, similar to how the portmap plugin works today? |
@DennisDenuto / @rosenhouse is this ready to go? |
I can be the volunteer to implement what @caseydavenport want in followup when this PR gets in, if you are happy with that :) |
@m1093782566 Cool. I personally think submitting that as a separate PR makes sense. |
Agree! Sorry for keeping making noise, but please keep in mind that kubelet cni traffic shaping relies on this feature :) Thanks! |
When chained with a plugin that returns multiple devices, the bandwidth plugin chooses the host veth device. Signed-off-by: Tyler Schultz <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good besides that tiny thing with err
being nil.
plugins/meta/bandwidth/main.go
Outdated
} | ||
|
||
return nil, errors.New("no host interface found") | ||
return nil, errors.New("no host interface found: " + err.Error()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the loop never runs, then err
won't be set, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, that would have nil pointed. Fixed.
|
||
It("supports add and del with bridge + bandwidth", func() { | ||
basicAssertion("chained-bridge-bandwidth", "10.11.2.") | ||
By(fmt.Sprintf("adding %s to %s\n\n", "basic-bridge", contNS2.ShortName()), func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I slightly prefer the By(...)
syntax where there isn't another nested func()
, e.g. https://onsi.github.io/ginkgo/#documenting-complex-its-by
This makes it so you don't have to do weird things like lift variables outside.
But no big deal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We removed the nesting and localized some vars.
Format plugin code Signed-off-by: Aidan Obley <[email protected]>
@squeed I think this is ready to go. WDYT? |
It looks pretty good now and can meet the basic needs of kubelet. |
} | ||
}, | ||
{ | ||
"name": "slowdown", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: this name
isn't needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for Casey's nit, I think we should merge and fix with follow-ups.
As mentioned in #94
An attempt to port
github.com/cloudfoundry/cf-networking-release
traffic control tbf (ingress/egress) logic into a chained plugin