Using veth and namespaces instead perhaps #59

Open
dtaht opened this issue Sep 4, 2022 · 7 comments

@dtaht
Collaborator

dtaht commented Sep 4, 2022

This is a proof-of-concept example that uses veth interfaces instead of complicated filters and rules. I don't know whether it would be better than how libreqos works today, and I don't think it would be faster, but it has a few potential advantages. It allows for routing in addition to bridging. There's been some recent work on preserving the tx timestamp from ingress to egress even through namespaces (kernel 5.18 and later; Cilium has a blog entry on what you do). You don't need to use tc-mirred, you can use the cake integral shaper on the customer interfaces, you can name interfaces after customers, and iptables firewall rules etc. just work.

Downsides: I don't know how to bridge it properly without thinking hard about it, you end up with multiple route tables with sometimes mysterious side effects, and there's an extra "hop" in the network. I also personally find it hard to wrap my head around how namespaces work in general. Anyway, a quick and dirty example:

#!/bin/sh

ip netns add test

# you'd have separate namespaces I think for each ap
#ip netns add in
#ip netns add out
#ip netns add ap

ip link add h1-eth0 type veth peer name h2-eth0 netns test
ip link set dev h1-eth0 up
ip netns exec test ip link set dev h2-eth0 up

ip addr add 10.0.1.1/24 dev h1-eth0
ip netns exec test ip addr add 10.0.1.2/24 dev h2-eth0
ip netns exec test ip addr add 240.0.0.1/32 dev h2-eth0

tc qdisc replace dev h1-eth0 root cake bandwidth 60mbit
ip netns exec test tc qdisc replace dev h2-eth0 root cake bandwidth 6mbit
ip netns exec test netserver -N

ip route add 240.0.0.0/4 dev h1-eth0
netperf -H 240.0.0.1
qdisc cake 8197: dev h1-eth0 root refcnt 2 bandwidth 60Mbit diffserv3 triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw overhead 0
 Sent 75543078 bytes 52632 pkt (dropped 0, overlimits 54955 requeues 0) 
 backlog 0b 0p requeues 0
 memory used: 120032b of 4Mb
 capacity estimate: 60Mbit
 min/max network layer size:           42 /    1514
 min/max overhead-adjusted size:       42 /    1514
 average network hdr offset:           14

                   Bulk  Best Effort        Voice
  thresh       3750Kbit       60Mbit       15Mbit
  target          5.0ms        5.0ms        5.0ms
  interval      100.0ms      100.0ms      100.0ms
  pk_delay          0us        104us         11us
  av_delay          0us         16us          0us
  sp_delay          0us          2us          0us
  backlog            0b           0b           0b
  pkts                0        52627            5
  bytes               0     75542868          210
  way_inds            0            0            0
  way_miss            0           27            1
  way_cols            0            0            0
  drops               0            0            0
  marks               0            1            0
  ack_drop            0            0            0
  sp_flows            0            2            0
  bk_flows            0            1            0
  un_flows            0            0            0
  max_len             0        33308           42
  quantum           300         1514          457


@dtaht
Collaborator Author

dtaht commented Sep 5, 2022

This is another futuristic thing. I have to note that this method only works on containers and namespaces, and only for locally sourced TCP stacks, not for routing packets, and I don't think they've ever tried to run an rrul test with this design. They have regular meetings on Wednesdays at 8 AM PST, and I might attend one day.

https://isovalent.com/blog/post/addressing-bandwidth-exhaustion-with-cilium-bandwidth-manager/

@dtaht dtaht added this to the v1.4 milestone Nov 13, 2022
@dtaht
Collaborator Author

dtaht commented Nov 13, 2022

We're going to end up using namespaces for #153 anyway...

@dtaht dtaht self-assigned this Jan 12, 2023
@dtaht dtaht modified the milestones: v1.4, v1.5 Jan 12, 2023
@dtaht
Collaborator Author

dtaht commented Jan 16, 2023

I attempted to use namespaces to simulate 1k users. That didn't go well with dynamic routing in play (1k babel daemons). Without dynamic routing it needs a bit more setup, and I haven't gotten to it.
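
For the case without dynamic routing, one way the extra setup could look is a static /30 and a default route per simulated user, so that no routing daemon runs inside the namespaces. This is a sketch under assumptions and has not been tried at 1k scale: the `add_user` helper, the `userN` / `uN-h` / `uN-c` names, the 10.64.0.0/16 pool and the `DRY_RUN` switch are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: names, address pool and DRY_RUN are assumptions for illustration.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it (needs root).
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# add_user <index> <download_rate> <upload_rate>
# Carves one /30 per user out of 10.64.0.0/16 (index 0..16383).
add_user() {
    i=$1
    down=$2
    up=$3
    ns="user$i"
    off=$(( i * 4 ))
    o3=$(( off / 256 ))
    o4=$(( off % 256 ))
    host_ip="10.64.$o3.$(( o4 + 1 ))"
    user_ip="10.64.$o3.$(( o4 + 2 ))"

    run ip netns add "$ns"
    # Interface names must stay under 16 characters.
    run ip link add "u$i-h" type veth peer name "u$i-c" netns "$ns"
    run ip addr add "$host_ip/30" dev "u$i-h"
    run ip link set dev "u$i-h" up
    run ip netns exec "$ns" ip link set dev lo up
    run ip netns exec "$ns" ip addr add "$user_ip/30" dev "u$i-c"
    run ip netns exec "$ns" ip link set dev "u$i-c" up
    # Static default route instead of a routing daemon per namespace.
    run ip netns exec "$ns" ip route add default via "$host_ip"
    # Shape each direction at the sending end of the pair.
    run tc qdisc replace dev "u$i-h" root cake bandwidth "$down"
    run ip netns exec "$ns" tc qdisc replace dev "u$i-c" root cake bandwidth "$up"
}
```

Something like `for i in $(seq 1 1000); do add_user "$i" 60mbit 6mbit; done` would then create the whole set. For the users to reach each other or the outside, the host would also need `net.ipv4.ip_forward=1`.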

@thebracket
Collaborator

I do this daily, and we've deployed it as a means for working with bonded interfaces.

@rchac rchac modified the milestones: v1.5 Beta, v1.6 May 30, 2024
@thebracket
Collaborator

Here's my script:

#!/bin/bash

# Set the number of rx/tx queues to create
NUM_QUEUES=1

## This script creates two `veth` devices, each in their own namespace.
## Each is assigned an address (192.168.66.1/30 and 192.168.66.2/30)
## They won't be able to ping each other until a bridge is made available.
## The idea is to simulate a middle-box setup (like `lqosd`), allowing
## `iperf` and other tests between the two.

#######################################################################
#
# USAGE:
#
# ./testbed.sh params
# Params can be multiples of the following:
# q <num_queues>         -- Sets the number of rx/tx queues to create (pass before setup)
# setup                  -- Creates the veth devices and namespaces
# bridge                 -- Creates a Linux bridge (br0) and adds the veth devices to it
#                           This is for base-line setup, or if you need a complex setup
# cleanup                -- Deletes the veth devices and namespaces
# iperf_server           -- Runs iperf server in ns_external
# iperf_client           -- Runs iperf client in ns_internal
# iperf_kill_server      -- Kills iperf server in ns_external
# checksum               -- Disables checksum calculation with ethtool.
#                           You need this for AF_XDP bridges with veth.
# lossy <delay> <jitter> <loss>
#                        -- Add simulation to the network to make it suck.
# nat                    -- Adds a route between veth_external and main with NAT
# offload                -- Disables VLAN, GSO, TSO, LRO, SG and GRO offloads on both veths
#######################################################################

function setup_testbed() {
    sudo ip netns add ns_external
    sudo ip netns add ns_internal

    if ((NUM_QUEUES > 1)); then
        sudo ip link add veth_external numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 123 type veth peer name veth_toexternal numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 124
        sudo ip link add veth_internal numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 125 type veth peer name veth_tointernal numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 126
    else
        echo "(warning) Creating single queue veths"
        sudo ip link add veth_external type veth peer name veth_toexternal
        sudo ip link add veth_internal type veth peer name veth_tointernal
    fi

    sudo ip link set veth_external netns ns_external
    sudo ip link set veth_internal netns ns_internal

    sudo ip netns exec ns_external ip addr add 192.168.66.1/30 dev veth_external
    sudo ip netns exec ns_internal ip addr add 192.168.66.2/30 dev veth_internal

    sudo ip netns exec ns_external ip link set veth_external up
    sudo ip netns exec ns_internal ip link set veth_internal up
    sudo ip link set veth_toexternal up
    sudo ip link set veth_tointernal up
}

function setup_nat() {
    # Create a routed interface to carry data from ns_external back to the main network    
    sudo ip link add veth_route_main numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 120 type veth peer name veth_route_ext numrxqueues $NUM_QUEUES numtxqueues $NUM_QUEUES index 121
    sudo ip link set veth_route_ext netns ns_external
    sudo ip link set veth_route_main up
    sudo ip netns exec ns_external ip link set veth_route_ext up
    sudo ip netns exec ns_external ip addr add 192.168.65.2/30 dev veth_route_ext
    sudo ip addr add 192.168.65.1/30 dev veth_route_main
    sudo ip route add 192.168.66.0/30 via 192.168.65.2
    sudo ip netns exec ns_external ip route add 0.0.0.0/0 via 192.168.65.1
    sudo ip netns exec ns_internal ip route add 0.0.0.0/0 via 192.168.66.1

    # Enable routing
    sudo sysctl -w net.ipv4.ip_forward=1
    # NOTE: wlo1 is a hardcoded uplink interface name; change it to match your system
    sudo iptables -t nat -A POSTROUTING -o wlo1 -j MASQUERADE
    sudo iptables -A FORWARD -i veth_route_main -o veth_external -m state --state RELATED,ESTABLISHED -j ACCEPT

    # Inside the ns_internal, mount some things so we can run stuff
    sudo ip netns exec ns_internal mount -t cgroup2 cgroup2 /sys/fs/cgroup
    sudo ip netns exec ns_internal mount -t securityfs securityfs /sys/kernel/security/
}

function setup_bridge() {
    echo "Setting up the bridge"
    sudo ip link add name br0 type bridge
    sudo ip link set veth_toexternal master br0
    sudo ip link set veth_tointernal master br0
    sudo ip link set br0 up
}

function no_checksums() {
    sudo ip netns exec ns_internal ethtool -K veth_internal tx off
    sudo ip netns exec ns_external ethtool -K veth_external tx off
}

function cleanup_testbed() {
    sudo ip link del br0
    sudo ip link del veth_toexternal
    sudo ip link del veth_tointernal
    sudo ip link del veth_route_main
    sudo ip netns del ns_external
    sudo ip netns del ns_internal
}

function iperf_server() {
    sudo ip netns exec ns_external iperf -s &
}

function iperf_client() {
    sudo ip netns exec ns_internal iperf -c 192.168.66.1
}

function iperf_kill_server() {
    sudo killall iperf
}

for i in "$@"; do
    case $i in
        setup)
            setup_testbed
            shift # past argument=value
            ;;
        bridge)
            setup_bridge
            shift # past argument=value
            ;;
        nat)
            setup_nat
            shift # past argument=value
            ;;
        cleanup)
            cleanup_testbed
            shift # past argument=value
            ;;
        iperf_server)
            iperf_server
            shift # past argument=value
            ;;
        iperf_client)
            iperf_client
            shift # past argument=value
            ;;
        iperf_kill_server)
            iperf_kill_server
            shift # past argument=value
            ;;
        q)
            NUM_QUEUES=$2
            shift # past argument=value
            shift # Since we're reading two
            ;;
        lossy)
            DELAY1=$2
            DELAY2=$3
            LOSS=$4
            sudo ip netns exec ns_external tc qdisc replace dev veth_external root netem delay ${DELAY1}ms ${DELAY2}ms loss ${LOSS}%
            sudo ip netns exec ns_internal tc qdisc replace dev veth_internal root netem delay ${DELAY1}ms ${DELAY2}ms loss ${LOSS}%
            shift
            shift # Delay
            shift # Delay 2
            shift # Loss
            ;;
        checksum)
            no_checksums
            shift # past argument=value
            ;;
        offload)
            sudo ip netns exec ns_internal ethtool -K veth_internal rxvlan off
            sudo ip netns exec ns_internal ethtool -K veth_internal txvlan off
            sudo ip netns exec ns_internal ethtool -K veth_internal gso off
            sudo ip netns exec ns_internal ethtool -K veth_internal tso off
            sudo ip netns exec ns_internal ethtool -K veth_internal lro off
            sudo ip netns exec ns_internal ethtool -K veth_internal sg off
            sudo ip netns exec ns_internal ethtool -K veth_internal gro off

            sudo ip netns exec ns_external ethtool -K veth_external rxvlan off
            sudo ip netns exec ns_external ethtool -K veth_external txvlan off
            sudo ip netns exec ns_external ethtool -K veth_external gso off
            sudo ip netns exec ns_external ethtool -K veth_external tso off
            sudo ip netns exec ns_external ethtool -K veth_external lro off
            sudo ip netns exec ns_external ethtool -K veth_external sg off
            sudo ip netns exec ns_external ethtool -K veth_external gro off
            shift # past argument=value
            ;;
    esac
done

@thebracket
Collaborator

This setup works pretty well, especially if you want to provide other services on your box. I think we should focus on documenting "recipes" for using it.
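
As a starting point for such recipes, here is a sketch of two invocation sequences, derived from reading the script above rather than from any documented workflow. It assumes the script is saved as `./testbed.sh` (the name in its usage comment); the `recipe_baseline` and `recipe_lossy` helpers, the `TESTBED` variable and the `DRY_RUN` switch are hypothetical. Note that `q` has to come before `setup` on the command line, because `NUM_QUEUES` is read when the veths are created.

```shell
#!/bin/sh
# Sketch only: TESTBED, DRY_RUN and the recipe_* names are assumptions.
TESTBED=${TESTBED:-./testbed.sh}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Baseline: plain Linux bridge between the two namespaces, then one iperf run.
# Optional argument: number of rx/tx queues (defaults to 1).
recipe_baseline() {
    queues=${1:-1}
    run "$TESTBED" q "$queues" setup bridge
    run "$TESTBED" iperf_server
    run sleep 1    # give the server a moment to start listening
    run "$TESTBED" iperf_client
    run "$TESTBED" iperf_kill_server cleanup
}

# Same path with netem impairment: delay (ms), jitter (ms), loss (%).
recipe_lossy() {
    run "$TESTBED" setup bridge lossy "$1" "$2" "$3"
    run "$TESTBED" iperf_server
    run sleep 1
    run "$TESTBED" iperf_client
    run "$TESTBED" iperf_kill_server cleanup
}
```

For example, `recipe_lossy 20 5 1` prints (or, with `DRY_RUN=0`, runs) a bridged test with 20 ms delay, 5 ms jitter and 1% loss applied on both veths. Replacing `bridge` with the middle-box under test is the obvious variation.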

@thebracket thebracket added the documentation Improvements or additions to documentation label Jul 9, 2024
@dtaht
Collaborator Author

dtaht commented Jul 30, 2024

Nice script!
