ipc_transport_structured: Investigate: Msg_out - move ctor impl mystery (past instability). #20
Labels: from-akamai-pre-open (Issue origin is Akamai, before opening source); question (Further information is requested)
Filed by @ygoldfeld pre-open-source:
Description follows. Environment (the only one, with many others tried!) where problem was actually observed:
Linux 5.4.... machine
CMake-built open-source ipc meta-project
clang-17 installed via apt-get = /usr/bin/clang => clang due to PATH
libc++ (LLVM-10) installed via apt-get (as opposed to using GNU libstdc++, which is also installed but not used here)
compiler selection:
-DCMAKE_CXX_COMPILER=clang++
compiler flags:
-DCMAKE_CXX_FLAGS_RELEASE='-stdlib=libc++ -I/usr/lib/llvm-10/include/c++/v1 -O3 -DNDEBUG -fno-omit-frame-pointer'
linker flags:
-DCMAKE_EXE_LINKER_FLAGS_RELEASE="-L/usr/lib/llvm-10/lib -lc++ -lc++abi -fno-omit-frame-pointer -O3"
-DCMAKE_BUILD_TYPE=Release
Plus CMake settings:
CFG_ENABLE_TEST_SUITE:BOOL=ON
CFG_NO_LTO:BOOL=OFF <-- Important
Lastly (important):
Problem reproduced without ASAN (-fsanitize=address in CMAKE_*_FLAGS_RELEASE)
Problem not reproduced with it (+ ASAN detects no errors)
Sufficient to build just libipc_unit_test.exec:
make -j32 libipc_unit_test.exec
then go (from build dir) to test/suite/unit_test
and run the test in question: ./libipc_unit_test.exec --gtest_filter=Shm_session_test.In_process_array
Expected behavior: it passes; no seg-fault. Observed behavior as-is: it passes; no seg-fault.
The thing to investigate: change the Msg_out move ctor from its current hand-written impl to:
= default;
Expected behavior: it passes; no seg-fault. Observed behavior (with the above code change): it seg-faults in the move ctor. (In the backtrace seen by running gdb libipc_unit_test.exec core, it is shown as being within Client_session_impl::mdt_builder() due to inlining.)
(The mdt_builder() code tickling the problem is the moving of Msg_out req_msg into the Master_channel_req member => seg-fault.)
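For readers unfamiliar with the pattern, here is a minimal sketch of what "moving a message into a struct member" looks like, and of the two move ctor variants being compared. Everything in it is hypothetical: Toy_msg and Toy_req are stand-ins invented for illustration and are not the real Msg_out or Master_channel_req, and the hand-written variant shown (default-construct, then delegate to move assignment) is only one common way to write such a ctor, not necessarily what the committed code does. It will not reproduce the seg-fault; it only shows the shape of the code path.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a move-only message type; NOT the real Msg_out.
class Toy_msg
{
public:
  Toy_msg() = default;
  explicit Toy_msg(std::string payload) :
    m_payload(std::make_unique<std::string>(std::move(payload)))
  {}

  // Variant A (the kind of change under investigation) would read:
  //   Toy_msg(Toy_msg&&) = default;
  // Variant B (hand-written): members start empty, then we delegate
  // to move assignment.
  Toy_msg(Toy_msg&& src) noexcept
  {
    operator=(std::move(src));
  }

  Toy_msg& operator=(Toy_msg&& src) noexcept
  {
    if (this != &src)
    {
      m_payload = std::move(src.m_payload); // src.m_payload becomes null.
    }
    return *this;
  }

  Toy_msg(const Toy_msg&) = delete;
  Toy_msg& operator=(const Toy_msg&) = delete;

  bool empty() const { return !m_payload; }
  const std::string& payload() const { return *m_payload; }

private:
  std::unique_ptr<std::string> m_payload;
};

// Hypothetical stand-in for a request struct holding the message as a member.
struct Toy_req
{
  Toy_msg m_req_msg;
};

// Moves a local message into the struct member, mirroring the description above.
inline Toy_req make_req()
{
  Toy_msg req_msg("hello");
  Toy_req req{ std::move(req_msg) }; // Invokes Toy_msg's move ctor.
  assert(req_msg.empty());           // Moved-from source no longer owns the payload.
  return req;
}
```

With either variant the observable behavior of this toy should be identical, which is exactly why a crash appearing only with the defaulted variant (and only with LTO on and ASAN off) is surprising.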
Now as to WTF this is all about, the comment in the code should explain it.
Please note: ASAN passes, as of this writing, throughout this unit test and all known tests (unit_test ones or transport_test ones or otherwise). It "feels" like there could be a gremlin from another thread, but further experiments and work seem to suggest otherwise. So one way or another it's a matter of investigating as noted in the above comment.
(Aside: It's true that as of the time of that writing no sanitizer had been run on the code. As pointed out here, though, since then it absolutely has -- and passed. So the plot thickens a little.)
That said, there are no known functional or correctness problems in the actual committed code -- merely the mystery of why the change had to be made. Still I don't like such mysteries.