PacketMill is a system for optimizing software packet processing, which (i) introduces a new model (X-Change) to efficiently manage packet metadata and (ii) employs code-optimization techniques to better utilize commodity hardware. PacketMill grinds the whole packet processing stack, from the high-level network function configuration file to the low-level userspace network (specifically Data Plane Development Kit or DPDK) drivers, to mitigate inefficiencies and produce a customized binary for a given network function.
Some of the symbols used in this image are modified versions of vectors published by www.freevector.com.
PacketMill is composed of three main components:
- X-Change: developed as an Application Programming Interface (API) within DPDK, which realizes customized buffers when using DPDK, rather than relying on the generic rte_mbuf structure;
- Source-code modifications: implemented on top of a resurrected and modified click-devirtualize, exploiting the information defining an NF (i.e., the Click configuration file) to mitigate virtual calls and improve constant propagation and constant folding; and
- IR-based modifications: implemented as LLVM optimization passes applied to the complete program's IR bitcode as extracted from Link-Time Optimization (LTO), which reorder the commonly used data structures (i.e., the Packet class in FastClick).
For more information, please refer to PacketMill's paper. You can also check out the extended abstract we submitted to ASPLOS'21.
We have developed/tested PacketMill for/on FastClick, but our techniques could be adapted to other packet processing frameworks.
We modified the MLX5 PMD used by Mellanox NICs in DPDK. However, X-Change is applicable to other drivers, as other (e.g., Intel) drivers are implemented similarly and have the same inefficiencies. Although we have only tested X-Change with FastClick, other DPDK-based packet processing frameworks (e.g., BESS and VPP) could equally benefit from X-Change, as our proposed model only modifies DPDK userspace drivers.
We implemented our source-code optimizations on top of click-devirtualize, but it is possible to apply the same optimization to other packet processing frameworks such as BESS and VPP.
This repository contains information/source code to use PacketMill and reproduce some of the results presented in our ASPLOS'21 paper.
Our paper's results are also available at paper-results/.
The experiments are located at experiments/. The folder has a Makefile and README.md that can be used to run the experiments.
Note: Before running the experiments, you need to prepare your testbed according to the following guidelines.
Our experiments mainly require npf, X-Change, FastClick, the LLVM toolchain, perf, and pmu-tools. There is a simple bash script (setup_repo.sh) that can help you clone/compile the different repositories, but you should mainly rely on this README.md.
You can install npf via the following command:
python3 -m pip install --user npf
Do not forget to add export PATH=$PATH:~/.local/bin to ~/.bashrc or ~/.zshrc. Otherwise, you cannot run the npf-compare and npf-run commands.
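To confirm that the PATH change took effect, you can check that both commands resolve. The following is only a minimal sketch; the helper name check_cmds is hypothetical and not part of NPF.

```shell
# check_cmds is a hypothetical helper (not part of NPF): it reports whether
# each named command can be found on the current PATH.
check_cmds() {
    status=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "missing: $cmd"
            status=1
        fi
    done
    return "$status"
}

# Prints a hint if either NPF command is missing from PATH.
check_cmds npf-run npf-compare || echo "Add ~/.local/bin to PATH and open a new shell."
```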
NPF will look for cluster/ and repo/ in your current working/testie directory. We have included the required repo for our experiments and a sample cluster template, available at experiments/. To set up your cluster, please check the guidelines for our previous paper. Additionally, you can check the NPF README file.
To build X-Change with clang (LTO), you can run the following commands:
git clone https://github.com/tbarbette/xchange.git
cd xchange
make install T=x86_64-native-linux-clanglto
After building X-Change, you have to define XCHG_SDK and XCHG_TARGET. To do so, run something similar to:
export XCHG_SDK=/home/alireza/packetmill/xchange/
export XCHG_TARGET=x86_64-native-linux-clanglto
We also use normal DPDK v20.02 in some scenarios. To build it, you can run the following commands:
git clone https://github.com/tbarbette/xchange.git dpdk
cd dpdk
git checkout v20.02
make install T=x86_64-native-linux-gcc
make install T=x86_64-native-linux-clang
After building DPDK, you have to define RTE_SDK and RTE_TARGET. To do so, run something similar to:
export RTE_SDK=/home/alireza/packetmill/dpdk/
export RTE_TARGET=x86_64-native-linux-gcc
Make sure to define RTE_SDK and XCHG_SDK based on the location of the dpdk and xchange directories.
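Before building FastClick, it can help to verify that both variables are set and point to real directories. This is a minimal sketch under that assumption; check_sdk_dir is a hypothetical helper, not something shipped with DPDK, X-Change, or this repository.

```shell
# check_sdk_dir is a hypothetical helper: it verifies that the environment
# variable named by $1 is set and points to an existing directory.
check_sdk_dir() {
    name=$1
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name=$value is not a directory"
        return 1
    fi
    echo "$name=$value"
}

# Report the state of both SDK variables without aborting on failure.
for var in RTE_SDK XCHG_SDK; do
    check_sdk_dir "$var" || true
done
```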
Note that NPF requires all three builds to perform the experiments. It uses the gcc build (of DPDK v20.02) for packet generation (i.e., default case) and the other two (i.e., X-Change with clanglto and DPDK v20.02 with clang) for other scenarios at the server side.
For more information, please check the X-Change repository.
We used LLVM 10.0.0 in our paper. To install it, run the following commands:
chmod +x llvm-clang.sh
sudo ./llvm-clang.sh 10
This command will also create some links to different LLVM tools and clang commands. Check the script (llvm-clang.sh) for more details.
We use perf and pmu-tools to gather microarchitectural metrics. To install them, run the following commands:
sudo apt-get install linux-cloud-tools-$(uname -r) linux-tools-$(uname -r)
git clone https://github.com/andikleen/pmu-tools.git
NPF automatically clones and builds FastClick for the experiments (based on the testie/npf file), but if you want to compile/build it manually with the X-Change repo, you can run the following commands:
git clone --branch packetmill https://github.com/tbarbette/fastclick.git
cd fastclick
./configure --disable-linuxmodule --enable-userlevel --enable-user-multithread --enable-etherswitch --disable-dynamic-linking --enable-local --enable-dpdk=$XCHG_SDK --enable-research --disable-task-stats --enable-flow --enable-cpu-load --prefix $(pwd)/build/ --enable-intel-cpu --enable-dpdk-pool --enable-rand-align RTE_TARGET=x86_64-native-linux-clanglto CXX="clang++ -flto -fno-access-control" CC="clang -flto" CXXFLAGS="-std=gnu++14 -O3" LDFLAGS="-flto -fuse-ld=lld -Wl,-plugin-opt=save-temps" RANLIB="/bin/true" LD="ld.lld" READELF="llvm-readelf" AR="llvm-ar" --disable-bound-port-transfer --enable-dpdk-pool --enable-dpdk-xchg --disable-dpdk-packet --disable-dpdk-softqueue
make
sudo make uninstall
sudo make install
make install requires some Perl packages. For instance, you might need to run the following command if make install fails.
sudo cpan File::Which
Note: if you have already exported the X-Change (or DPDK) environment variables, you do not need to pass RTE_SDK and/or RTE_TARGET in the configure line. However, using --enable-dpdk=$RTE_SDK is mandatory if you have a globally installed DPDK (e.g., using apt or ninja install); it forces FastClick to use the right DPDK (e.g., X-Change).
Building FastClick with this configuration uses X-Change by default, i.e., it provides the Packet class to the DPDK PMD (MLX5). However, it is possible to use other metadata management techniques. The following list summarizes the required compilation flags for different metadata management models.
- X-Change: --enable-dpdk-xchg --disable-dpdk-packet
- Copying: --disable-dpdk-packet
- Overlaying: --enable-dpdk-packet
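When scripting builds for several models, the mapping above can be kept in one place. The sketch below is only an illustration; metadata_flags and the lowercase model names are hypothetical and are not part of FastClick's build system.

```shell
# metadata_flags is a hypothetical helper: it prints the FastClick configure
# flags listed above for a given metadata management model.
metadata_flags() {
    case "$1" in
        xchange)    echo "--enable-dpdk-xchg --disable-dpdk-packet" ;;
        copying)    echo "--disable-dpdk-packet" ;;
        overlaying) echo "--enable-dpdk-packet" ;;
        *)          echo "unknown model: $1" >&2; return 1 ;;
    esac
}

# Example: print the flags to use in the ./configure line for Copying.
metadata_flags copying
```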
For more information, please refer to PacketMill's paper.
We used binutils 2.32. To install/update binutils, please refer to here. Beware that binutils 2.30, which comes with Ubuntu 18.04, has a bug with AVX512. See DPDK commit e19c6de3.
We developed an optimization pass (via LLVM) that reorders the variables/fields of the Packet class (i.e., the metadata structure in FastClick) based on the access pattern of the input binary. As our source-code modifications customize the binary and remove the unused elements (source code), applying our pass together with other optimizations ultimately results in a customized data structure for the input NF configuration.
Our pass parses the whole-program IR code produced by LTO (clang) to find the references (made by the NF) to different variables/fields of the Packet class, sorts these variables based on the estimated number of accesses to them, and fixes (repairs) the references to the Packet class made by LLVM's GetElementPtrInst (GEPI) instructions.
After applying the pass, the output IR code can be relinked with the dynamic libraries to produce a new binary for FastClick.
Note that recompiling/relinking the IR code requires stripping the module flags; we provide a simple module pass that performs this task.
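The sorting and index-repair idea can be illustrated outside LLVM with a small sketch. It is not the pass itself: it assumes a hypothetical text input with one line per field (original index, field name, estimated access count) and prints the old-to-new index mapping that a GEP rewrite would need. The helper name reorder_fields, the field names, and the counts are all made up for illustration.

```shell
# reorder_fields is a hypothetical illustration, not PacketMill's LLVM pass.
# stdin:  "<old_index> <field_name> <access_count>", one field per line
# stdout: "<old_index> <new_index> <field_name>", most-accessed field first
reorder_fields() {
    # Sort by access count (descending); break ties by original index.
    sort -k3,3nr -k1,1n |
        awk '{ printf "%d %d %s\n", $1, NR - 1, $2 }'
}

# Made-up access counts for three made-up fields.
printf '%s\n' '0 anno 2' '1 data 40' '2 length 15' | reorder_fields
```

In this toy input, the most frequently accessed field (old index 1) moves to index 0, so every reference that used index 1 would have to be rewritten to index 0, which corresponds to the repair step described above.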
- Building: To compile our passes, run the following commands:
cd LLVM
mkdir build
cd build
cmake ..
make
- Using: Compiling FastClick & X-Change with LTO (clang) while using the plugin-opt=save-temps flag produces four IR bitcode files (i.e., whole-program IR code):
click.0.0.preopt.bc
click.0.2.internalize.bc
click.0.4.opt.bc
click.0.5.precodegen.bc
or
embedclick.0.0.preopt.bc
embedclick.0.2.internalize.bc
embedclick.0.4.opt.bc
embedclick.0.5.precodegen.bc
You can use the llvm-dis tool to convert them into human-readable LLVM assembly language. For example, try:
llvm-dis click.0.5.precodegen.bc -o click.ll
or
llvm-dis embedclick.0.5.precodegen.bc -o embedclick.ll
You can apply the passes to click.ll (or embedclick.ll) via the following commands:
cd /home/alireza/fastclick/userlevel/
opt -S -load /home/alireza/packetmill/LLVM/build/class-stripmoduleflags-pass/libClassStripModuleFlagsPass.so -strip-module-flags click.ll -o click.ll
opt -S -load /home/alireza/packetmill/LLVM/build/class-handpick-pass/libClassHandpickPass.so -handpick-packet-class click.ll -o click.ll
opt -S -O3 click.ll -o click.ll
make click-opt
or
cd /home/alireza/fastclick/userlevel/
opt -S -load /home/alireza/packetmill/LLVM/build/class-stripmoduleflags-pass/libClassStripModuleFlagsPass.so -strip-module-flags embedclick.ll -o embedclick.ll
opt -S -load /home/alireza/packetmill/LLVM/build/class-handpick-pass/libClassHandpickPass.so -handpick-packet-class embedclick.ll -o embedclick.ll
opt -S -O3 embedclick.ll -o embedclick.ll
make embedclick-opt
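Since the two command sequences above differ only in the target name, they can be generated from a single template. The sketch below only prints the commands and does not run them; print_pass_cmds is a hypothetical helper, and the build path in the example call is a placeholder.

```shell
# print_pass_cmds is a hypothetical helper: it prints (does not run) the
# command sequence shown above for a given target and pass build directory.
# $1: target name, click or embedclick
# $2: path to the LLVM/build directory of this repository
print_pass_cmds() {
    target=$1
    build=$2
    echo "opt -S -load $build/class-stripmoduleflags-pass/libClassStripModuleFlagsPass.so -strip-module-flags $target.ll -o $target.ll"
    echo "opt -S -load $build/class-handpick-pass/libClassHandpickPass.so -handpick-packet-class $target.ll -o $target.ll"
    echo "opt -S -O3 $target.ll -o $target.ll"
    echo "make $target-opt"
}

# The build path below is a placeholder; replace it with your own checkout.
print_pass_cmds embedclick /path/to/packetmill/LLVM/build
```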
- Adrian Sampson blog post: https://www.cs.cornell.edu/~asampson/blog/llvm.html
- Adrian Sampson Skeleton pass: https://github.com/sampsyo/llvm-pass-skeleton
- Writing an LLVM Pass: https://llvm.org/docs/WritingAnLLVMPass.html
If you use PacketMill or X-Change in any context, please cite our paper:
@inproceedings{farshin-packetmill,
author = {Farshin, Alireza and Barbette, Tom and Roozbeh, Amir and {Maguire Jr.}, Gerald Q. and Kosti\'{c}, Dejan},
title = {{PacketMill: Toward per-Core 100-Gbps Networking}},
year = {2021},
isbn = {9781450383172},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3445814.3446724},
doi = {10.1145/3445814.3446724},
abstract = {We present PacketMill, a system for optimizing software packet processing, which (i) introduces a new model to efficiently manage packet metadata and (ii) employs code-optimization techniques to better utilize commodity hardware. PacketMill grinds the whole packet processing stack, from the high-level network function configuration file to the low-level userspace network (specifically DPDK) drivers, to mitigate inefficiencies and produce a customized binary for a given network function. Our evaluation results show that PacketMill increases throughput (up to 36.4 Gbps -- 70%) & reduces latency (up to 101 us -- 28%) and enables nontrivial packet processing (e.g., router) at ~100 Gbps, when new packets arrive >10\texttimes{} faster than main memory access times, while using only one processing core.},
booktitle = {Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems},
pages = {1–17},
numpages = {17},
keywords = {DPDK, Full-Stack Optimization, FastClick, 100-Gbps Networking, Packet Processing, Commodity Hardware, PacketMill, Compiler Optimizations, LLVM, Metadata Management, Middleboxes, X-Change},
location = {Virtual, USA},
series = {ASPLOS 2021}
}
If you have any questions regarding our code or the paper, you can contact Alireza Farshin (farshin at kth.se) and/or Tom Barbette (barbette at kth.se).