There is a race in the TVM/TIR optimization passes when several distinct Python threads are each compiling an operator. The race occurs in the common subexpression elimination pass, which uses a cache of expression/statement maps that is declared static.
The cache should be attached to the build context, or at the very least declared thread_local. The faulty cache declaration is located here: https://github.com/apache/tvm/blob/v0.16.0/src/tir/transforms/common_subexpr_elim_tools.h#L115
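For illustration, here is a minimal C++ sketch of the pattern and of the proposed fix. The types and members below are simplified stand-ins modeled on the linked header, not the exact TVM declarations; the point is that thread_local must appear both at the in-class declaration and at the out-of-class definition point.

```cpp
#include <unordered_map>

// Simplified stand-in: in TVM this holds hash tables mapping
// expressions/statements to the computations they contain.
struct ComputationCache {
  std::unordered_map<int, int> table_;
};

class ComputationsDoneBy {
 public:
  static int Count(int expr) {
    // Unsynchronized insert into a process-wide table: two threads running
    // the CSE pass concurrently can race on iteration/insertion and crash.
    return cache_.table_[expr]++;
  }

 private:
  // Faulty form: one static cache shared by every compiling thread.
  //   static ComputationCache cache_;
  // Fixed form: one cache per thread.
  static thread_local ComputationCache cache_;
};

// The definition point (in the .cc file) must carry thread_local as well,
// otherwise the program is ill-formed.
thread_local ComputationCache ComputationsDoneBy::cache_;
```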
Expected behavior
Non-faulty code generation on operator build when several Python threads are each compiling different modules/operators.
Actual behavior
When the creation and build of operators are launched in parallel, for instance from a thread pool, one may encounter a segfault on hash-table insertion (a race between iteration and insertion). This bug is flaky, as it is highly dependent on the machine, the number of threads, and the compiled workload.
Environment
TVM: v0.16.0
Target device: llvm host cpu
LLVM: llvm-config-12
Kernel: Linux 5.10.0-27-amd64
Distro: Debian 5.10.205-2 (2023-12-31) x86_64 GNU/Linux
Arch: 52-core Intel(R) Xeon(R) Gold 6230R CPU @ 2.10GHz
Steps to reproduce
The problem arises due to a statically declared cache in: https://github.com/apache/tvm/blob/v0.16.0/src/tir/transforms/common_subexpr_elim_tools.h#L115
A simple fix is to declare the cache thread_local, both at this declaration and at the definition point. There may be more elegant fixes, however, such as replacing the static cache with a per-compilation-context cache.
Attached is a test script that reproduces the issue by massively launching parallel builds of a matmul operator: multithreaded-bug.py.gz
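The attached archive is the actual reproducer; the snippet below is only a hypothetical sketch of the same idea (the build_matmul helper, matrix sizes, worker and task counts are illustrative, not taken from the attachment), using the TVM v0.16 te/tvm.build API:

```python
import sys
from concurrent.futures import ThreadPoolExecutor

import tvm
from tvm import te

# Shorten the interpreter's context-switch interval to provoke the race.
sys.setswitchinterval(0.00001)

def build_matmul(n):
    # Each task creates and builds an independent matmul operator for the
    # llvm host CPU target, running the full TIR pass pipeline (incl. CSE).
    A = te.placeholder((n, n), name="A")
    B = te.placeholder((n, n), name="B")
    k = te.reduce_axis((0, n), name="k")
    C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    s = te.create_schedule(C.op)
    return tvm.build(s, [A, B, C], target="llvm")

# Massively parallel builds: every worker thread runs the TIR passes, and
# all of them touch the same static expression cache.
with ThreadPoolExecutor(max_workers=52) as pool:
    futures = [pool.submit(build_matmul, 64 + 8 * (i % 4)) for i in range(1000)]
    for f in futures:
        f.result()
```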
Launch it under setarch -R (which disables address-space randomization), in an attempt to make runs more reproducible; this was run on the machine described above (52 cores). Note that the bug is flaky: if it does not reproduce, try varying the number of parallel threads and the total number of tasks.
One can also play with the sys.setswitchinterval(0.00001) call in the script, lowering or raising the context-switch interval. After applying the simple thread_local fix, the bug is no longer visible. See the attached patch file for the fix: 0001-Bugfix-TIR-Fix-race-on-ComputationCache.patch.gz
Comments
Using synchronization may be more costly and error-prone than the simple thread-local cache proposed above; you may want to experiment with both.
That said, I did not understand the rationale for making the expression cache a global. Shouldn't the cache be local to some function scope, and hence a plain member of some parent object rather than global state? Perhaps some experiments were conducted at the time of this addition?
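For comparison, here is a minimal sketch of the synchronization alternative mentioned above, again with simplified stand-in names: a single shared cache guarded by a mutex. This is correct but puts lock traffic on the compilation hot path, which is the cost alluded to.

```cpp
#include <mutex>
#include <unordered_map>

// Sketch of the synchronized alternative (simplified names): one shared
// cache guarded by a mutex instead of one cache per thread.
class SynchronizedComputationCache {
 public:
  int Count(int expr) {
    // Every lookup/insert takes the lock, serializing all compiling
    // threads through this single cache.
    std::lock_guard<std::mutex> lock(mutex_);
    return table_[expr]++;
  }

 private:
  std::mutex mutex_;
  std::unordered_map<int, int> table_;
};
```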