
KIT Artifact Evaluation

This is the artifact of "KIT: Testing OS-level Virtualization for Functional Interference Bugs" in ASPLOS'23.

Kick the tires!

Installation

⏰Estimated time: 30 machine minutes + 5 human minutes.

First, run the following command to download the source code:

git clone --recurse-submodules [email protected]:rssys/kit-artifact.git
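
If you do not have a GitHub SSH key set up, the HTTPS URL works just as well (standard git behavior, nothing artifact-specific):

git clone --recurse-submodules https://github.com/rssys/kit-artifact.git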

Then, check the build section in README.md to build KIT.

Next, we will set up the test suite for evaluation. Run the following commands. The script new_bugs/setup.sh will build the Linux v5.13 kernel with kernel memory accesses instrumented, build a Debian stretch image (requires root), and download the test programs; the script known_bugs/setup.sh will download and decompress the kernels and VM images required to reproduce the known bugs.

./new_bugs/setup.sh
./known_bugs/setup.sh

Basic test

⏰Estimated time: 20 machine minutes + 5 human minutes.

Run the script new_bugs/basic_test.sh (a sample invocation follows the list below). This script automatically runs the whole KIT pipeline using a mini test program corpus as input. In particular, the script will:

  • profile the kernel memory access for each test program;
  • generate the test case prediction file based on the kernel data flow trace and cluster test cases using instruction address strategy (DF-IA);
  • execute test cases;
  • aggregate the test results;
  • analyze the statistics of test results.
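
One convenient way to invoke the script while keeping a log of the run (the log filename is arbitrary):

./new_bugs/basic_test.sh 2>&1 | tee basic_test.log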

The mini test program corpus contains the test programs required to trigger all new bugs found by KIT (Table 2). At the end of the execution, you should see output similar to the following:

Bug ID    Sender call                                                           Receiver call                                                         Test report
1         socket$packet                                                         read$FUSE:fd$syz_open_procfs(/proc/net/ptype)|gid|pid|uid             27060_93663.json
2         setsockopt$inet6_IPV6_FLOWLABEL_MGR:sock_in6                          sendmsg$inet6:sock_l2tp6                                              39965_79183.json
3         bind$inet:sock_sctp                                                   bind$inet:sock_sctp                                                   82929_82929.json
4         setsockopt$inet6_IPV6_FLOWLABEL_MGR:sock_in6                          connect$inet6:sock_dccp6                                              39965_12463.json
5         socket$inet_smc:sock_tcp                                              read$FUSE:fd$syz_open_procfs(/proc/net/sockstat)|gid|pid|uid          35657_10120.json
6         getsockopt$SO_COOKIE:sock_udp6                                        getsockopt$SO_COOKIE:sock_udp6                                        3467_3467.json
7         sendto$inet:sock_sctp                                                 getsockopt$inet_sctp_SCTP_SOCKOPT_CONNECTX3:sock_sctp                 75793_45898.json
7         getsockopt$inet_sctp_SCTP_SOCKOPT_CONNECTX3:sock_sctp                 getsockopt$inet_sctp_SCTP_SOCKOPT_CONNECTX3:sock_sctp                 45898_45898.json
8         syz_emit_ethernet                                                     read$FUSE:fd$syz_open_procfs(/proc/net/sockstat)|gid|pid|uid          5960_10120.json
9         sendto$inet:sock_tcp                                                  read$FUSE:fd$syz_open_procfs(/proc/net/protocols)|gid|pid|uid         39728_85119.json

                                 Bug ID
                  1   2   3   4   5   6   7   8   9
Filtered reports  2   1   1   1   2   1   4   1   1
AGG-RS groups     1   1   1   1   1   1   4   1   1
AGG-R groups      1   1   1   1   1   1   1   1   1

                  Total
Filtered reports  13
AGG-RS groups     11
AGG-R groups      8

This output is the analysis of the test report aggregation results, similar to Table 6. For the basic test, there is at least one test report for each bug. The output also provides an overview of selected test reports that indicate the functional interference bugs in Table 2.

⚠️Warning: the statistics generated by the script new_bugs/aggregate_stats.py might introduce false negatives by missing some true test reports. The script decides whether a group belongs to a certain bug simply by checking whether the names of the culprit sender and receiver system calls are in its database, which we collected from our prior analysis. However, there might be 'out-of-band' test reports that are triggered by one of the nine bugs we found but use sender and receiver system calls beyond our database.

📝Note: you might notice that, for bug #3, the sender and receiver system calls are both bind$inet:sock_sctp, i.e., the bind system call invoked on an SCTP socket. However, if you take a look at the test program (e.g., new_bugs/workdir/prog/82929), you will see that the file descriptor is created via an SCTP socket system call but later turns into an RDS socket via a dup3 system call.

🐛Known issue: sometimes creating the VM snapshot (at the very beginning of the profiling stage) fails with the following output; just re-run the script if you run into this case.

2222/22/22 11:22:33 cannot create snapshot: loopback test fail: cannot read from comm 1: read ...new_bugs/workdir/basic_test/run/instance-0/virtio_pipe_1.out: i/o timeout
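
If the flake keeps recurring, a crude retry loop is an option; note this is only a sketch and retries on any failure, not just the snapshot one:

until ./new_bugs/basic_test.sh; do echo "run failed; retrying"; done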

Evaluation

Find Functional Interference Bugs

⏰Estimated time: 1 machine day + 30 human minutes.

🎯This section aims to reproduce the results in Table 2, Table 4, Table 5 and Table 6.

📝Note: the performance of KIT mainly depends on how many VMs are spawned in parallel. The estimated time here is based on one of our past experiments, where we used a machine with about 50 hardware threads and KIT spawned 40 VMs in parallel.

Table 2. To date, there are two upstream patches for the bugs found by KIT. Please check the patch for bug #1 and the patch for bug #2 & #4.

Run the following command:

VM_COUNT=<number_of_vm> new_bugs/test_df_ia.sh

The script will automatically:

  • run kernel memory access profiling on a corpus that consists of ~90000 test programs;
  • generate the test case prediction file based on the kernel data flow trace and cluster test cases using instruction address strategy (DF-IA);
  • execute test cases by repeatedly iterating over the clusters, picking one test case to execute from each cluster on each visit.

The environment variable VM_COUNT specifies the number of VMs to spawn. If you do not set this variable, the script runs KIT with $(nproc)*5/6 VMs by default.
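
For reference, the default can be computed with plain shell arithmetic, mirroring the expression above:

echo $(( $(nproc) * 5 / 6 ))   # default number of VMs on this machine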

The test case execution will not stop until all test cases are consumed, which would take an impractically long time. Thus, you need to terminate the testing manually (e.g., type Ctrl-C). In our experiment, we stopped the test case execution after all clusters had been visited once. To reproduce the results, terminate the test case execution when about 1.13M test cases have been executed (Table 4). The statTest field of the test manager log shows how many test cases have been executed so far:

2222/03/44 02:33:44 exec=10, test=797654, timeout=0, hanged=0, duration=6h20m51s, result=1234, totResult=5678, throughput=0.00(test/s), [generator]: statTest=797732, numCls=1430736, clsIdx=1102300, clsScore=36, testIdx=0, testScore=36.000000...
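
To follow just that counter instead of the full log, a one-liner along these lines works (the log path is illustrative; point tail at wherever your run writes the test manager log):

tail -f <test_manager_log> | grep -o 'statTest=[0-9]*'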

📝Note: you might wonder, since there are X clusters in total (you can find the number of clusters at the end of the test case prediction file generation), whether you could simply terminate after X test cases have been executed so that every cluster has been visited once. In fact, you should terminate earlier than that: the number of test cases executed after going over all clusters is less than the number of clusters. For instance, in our experiment there were 1.4M clusters in total, but KIT only executed 1.1M test cases to go over every cluster. This is because KIT memorizes executed test cases and skips them in the future; KIT will not pick any test case from a cluster if all of its test cases have been executed previously.

Since the test takes quite a long time, we highly recommend running the script inside tmux so you can leave it running in the background.
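
A minimal tmux workflow, assuming tmux is installed (the session name is arbitrary):

tmux new -s kit                                 # start a session
VM_COUNT=<number_of_vm> new_bugs/test_df_ia.sh
# Detach with Ctrl-b d; re-attach later with: tmux attach -t kit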

After terminating the test case execution, run the following commands to generate test report aggregation result and the statistics of the aggregation result:

source ./ae_common.sh
# Aggregate test reports
result_dir="$(cat $AE_NEWBUGS_DF_IA_TEST_RUN_DIR/result_dir)"
result_cluster=$AE_NEWBUGS_DF_IA_TEST_DIR/aggregate.json
$MAIN_HOME/bin/resanalyze \
        -prog_dir $AE_NEWBUGS_DF_IA_TEST_PROG_DIR \
        -result_dir $result_dir \
        -result_cluster $result_cluster
# Print statistics of the test report aggregation
python3 ./aggregate_stats.py $result_cluster

Table 5. The totResult field of the test case execution log is the number of initial test reports without non-determinism filtering, which corresponds to the Initial reports row in Table 5; the result field corresponds to the After non-det filtering row. The After non-det + resource filtering row should be the same as the Total filtered reports in Table 6, so we omit it here.

Table 2 and Table 6. The last command will output three tables similar to the ones shown in the basic test section. To further check the difference in the receiver system call trace between the two executions, open a test report in $result_dir and check the fields v_prog_sctrace_pre and v_prog_sctrace, which contain the receiver system call trace without and with the sender program, respectively. The 2nd and 3rd tables should contain results similar to Table 6 in the paper, which demonstrates how KIT aggregates test reports. The 1st table displays results similar to Table 2.
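
To compare the two traces for a given report, a small sketch like the following can help (requires jq; the report filename placeholder follows the <...> convention used elsewhere in this README):

jq '{without_sender: .v_prog_sctrace_pre, with_sender: .v_prog_sctrace}' \
        "$result_dir/<test_report>.json"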

Detect Known Isolation Bugs

⏰Estimated time: 10 machine minutes + 5 human minutes.

🎯This section aims to reproduce the results in Table 3.

Each directory in known_bugs/src contains the C source code, test configurations, etc. required to trigger the corresponding bug. Feel free to inspect them. Below is the detailed testing configuration for each bug:

ID  Directory                    GCC  Kernel        Image
A   known_bugs/src/prio          4.8  de90a6bcaede  stretch
B   known_bugs/src/uevent        4.8  949db153b646  wheezy
C   known_bugs/src/ipvs          4.8  c5cc0c697149  wheezy
D   known_bugs/src/nf_conntrack  5    d0febd81ae77  stretch
E   known_bugs/src/io_uring      5    4c46bef2e96a  buster

📝Note: we provide pre-built kernels and VM images for your convenience, since building them is time-consuming and generally requires considerable effort to set up a compatible compiler toolchain. The kernels are built inside an Ubuntu 16.04 Docker image.

To reproduce all tests, simply run:

known_bugs/run.sh

Table 3. The test results are saved in known_bugs/<bug_name>/result/result-<timestamp>, which contains the receiver system call results collected when the receiver system call runs with and without the sender program.
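
To quickly locate the newest result for a given bug, something like this works (plain ls, newest first):

ls -dt known_bugs/<bug_name>/result/result-* | head -n 1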

Build old kernel from scratch (optional)

While we already provide the pre-built kernels, you can also try to build them from scratch. We provide the utilities we used to build the old kernels.

📝Note: as a prerequisite, Docker must be installed, and root privileges are required.

Run the following command to build the Docker image. Feel free to choose the image name.

cd known_bugs/kernel_build/docker && sudo docker build -t <name_of_docker_image> .

Run the following command to build the kernel for reproducing a known bug. Note that $KERNEL is the path to a Linux source directory that you cloned with git, so the script can switch to any historic version of the kernel. The first argument to the script is the path to a directory in known_bugs/kernel_build; for instance, to build the kernel for the ipvs bug, pass known_bugs/kernel_build/ipvs.

DOCKER_IMAGE=<name_of_docker_image> \
KERNEL=<path_to_linux_src> \
known_bugs/kernel_build/build.sh known_bugs/kernel_build/<bug_name>

After the build finishes, the kernel should be at $KERNEL/arch/x86/boot/bzImage.
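
Putting the pieces together for bug C (ipvs) as an illustrative end-to-end run; the Docker image name kit-kernel-build is arbitrary:

# Build the Docker image once, then build the kernel for the ipvs bug.
cd known_bugs/kernel_build/docker && sudo docker build -t kit-kernel-build . && cd -
DOCKER_IMAGE=kit-kernel-build \
KERNEL=<path_to_linux_src> \
known_bugs/kernel_build/build.sh known_bugs/kernel_build/ipvs
# The kernel image should then be at <path_to_linux_src>/arch/x86/boot/bzImage.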

Publications

  • Congyu Liu, Sishuai Gong, Pedro Fonseca. KIT: Testing OS-level Virtualization for Functional Interference Bugs. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Vancouver, Canada, 2023.
