This is an implementation of translation validation for GCC (similar to LLVM's Alive2), used to find bugs in the compiler.
The main functionality is in a plugin, which is passed to GCC when compiling:
gcc -O3 -fplugin=smtgcc-tv file.c
This plugin checks the GCC IR (Intermediate Representation) before and after each optimization pass and reports an error if the IR after a pass is not a refinement of the input IR (i.e., the optimized code does not behave the same as the input source code, indicating that GCC has miscompiled the program). While the tool has some limitations, it has already discovered several bugs in GCC. A partial list of bugs found includes: 106513, 106523, 106744, 106883, 106884, 106990, 108625, 109626, 110434, 110487, 110495, 110554, 110760, 111257, 111280, 111494, 112736, 113588, 113590, 113630, 113703, 114032, 114056, 114090, 116120, 116355, 117186, 117688, 117690, 117692.
The implementation is described in a series of blog posts. The first posts describe a previous version of this tool (pysmtgcc), but the general ideas are the same for both tools:
- Writing a GCC plugin in Python
- Verifying GCC optimizations using an SMT solver
- Memory representation
- Address calculations
- Pointer alignment
- Problems with pointers
- Uninitialized memory
- Control flow
You must have the Z3 SMT solver installed. For example, as
sudo apt install libz3-dev
Configuring and building smtgcc
is done by configure
and make
, and you must specify the target compiler for which to build the GCC plugins
./configure --with-target-compiler=/path/to/install/bin/gcc
make
smtgcc-tv compares the IR before/after each GIMPLE pass and complains if the resulting IR is not a refinement of the input (i.e. if the GIMPLE pass miscompiled the program).
For example, compiling the function foo
from PR 111494
int a[32];
int foo(int n) {
int sum = 0;
for (int i = 0; i < n; i++)
sum += a[i];
return sum;
}
with a compiler where the bug is not fixed (for example, the current trunk GCC) using smtgcc-tv.so
gcc -O3 -fno-strict-aliasing -c -fplugin=/path/to/smtgcc-tv.so pr111494.c
gives us the output
pr111494.c: In function 'foo':
pr111494.c:2:5: note: ifcvt -> dce: Transformation is not correct (UB)
.param0 = #x00000005
.memory = (let ((a!1 (store (store (store ((as const (Array (_ BitVec 64) (_ BitVec 8)))
[...]
telling us that the output IR of the dce pass is not a refinement of the input that comes from ifcvt (in this case the error is in the vectorizer pass, but we are treating vect followed by dce as one pass because of PR 111257) because the result has more UB than the original, and the tool give us an example for n = 5
where this happens.
Limitations in the current version:
- Function calls are not implemented.
- Exceptions are not implemented.
- Only tested on C and C++ source code.
- Only little endian targets are supported.
- Irreducible loops are not handled.
- Memory semantics is not correct:
- Strict aliasing does not work, so you must pass
-fno-strict-aliasing
to the compiler. - Handling of pointer provenance is too restrictive.
- ...
- Strict aliasing does not work, so you must pass
smtgcc-tv-backend compares the IR from the last GIMPLE pass with the generated assembly code and reports an error if the resulting assembly code is not a refinement of the IR (i.e., if the backend has miscompiled the program).
The plugin supports the RISC-V RV32G and RV64G base profiles and the Zba, Zbb, Zbc, and Zbs extensions. It can be invoked as follows:
riscv64-elf-gcc -O3 -march=rv64gc_zba_zbb_zbs -fno-section-anchors -fno-strict-aliasing -c -fplugin=/path/to/smtgcc-tv-backend.so file.c
The limitations of smtgcc-tv-backend include all those listed for smtgcc-tv, with the following additional limitations:
- You must pass
-fno-section-anchors
to the compiler. - Address allocation of local variables differs between the IR and the generated assembly. As a result, the plugin disables checks when a local address is written to memory or cast to an integer type to avoid false positives.
smtgcc-check-refine requires the translation unit to consist of two functions named src
and tgt
, and it verifies that tgt
is a refinement of src
.
For example, testing changing the order of signed addition
int src(int a, int b, int c)
{
return a + c + b;
}
int tgt(int a, int b, int c)
{
return a + b + c;
}
by compiling as
gcc -O3 -fno-strict-aliasing -c -fplugin=/path/to/smtgcc-check-refine.so example.c
gives us the output
example.c: In function 'tgt':
example.c:6:5: note: Transformation is not correct (UB)
.param2 = #x9620d6eb
.param0 = #x7edbb92a
.param1 = #x062be612
telling us that tgt
invokes undefined behavior in cases where src
does not,
and gives us an example of input where this happens (the values are, unfortunately, written as unsigned values. In this case, it means [c = -1776232725, a = 2128329002, b = 103540242]
).
Note: smtgcc-check-refine works on the IR from the ssa pass, i.e., early enough that the compiler has not done many optimizations. But GCC does peephole optimizations earlier (even when compiling as -O0
), so we need to prevent that from happening when testing such optimizations. The pre-GIMPLE optimizations are done one statement at a time, so we can disable the optimization by splitting the optimized pattern into two statements. For example, to check the following optimization
-(a - b) -> b - a
we can write the test as
int src(int a, int b)
{
int t = a - b;
return -t;
}
int tgt(int a, int b)
{
return b - a;
}
Another way to verify such optimizations is to write the test in GIMPLE and pass the -fgimple
flag to the compiler.
It is good practice to check with -fdump-tree-ssa
that the IR used by the tool looks as expected.
smtgcc-check-refine has the same limitations as smtgcc-tv.
SMTGCC_VERBOSE
— Print debug information while running. Valid value 0-2, higher value prints more information (Default: 0)SMTGCC_TIMEOUT
— SMT solver timeout (Default: 120000)SMTGCC_MEMORY_LIMIT
— SMT solver memory use limit in megabytes (Default: 10240)