Tags: DynamoRIO/dynamorio
i#6822 unscheduled: Add start-unscheduled support (#6851)

Adds support for threads starting out in an "unscheduled" state. This is accomplished by always reading ahead in each input and looking for a TRACE_MARKER_TYPE_SYSCALL_UNSCHEDULE marker *before* the first instruction. Normally such a marker indicates the invocation of a system call and is after the system call instruction; for start-unscheduled threads it is present at the system call exit at the start of the trace.

Changes the scheduler's virtual method process_next_initial_record() to make the booleans on finding certain markers input-and-output parameters and moves filetype marker handling and timestamp recording into the function. This also fixes a problem where an input's initial next_timestamp was replaced with the 2nd timestamp if a subclass read ahead.

The extra readahead causes complexities elsewhere which are addressed:
+ The reader caches the last cpuid to use for synthetic records on skipping.
+ Generalizes the existing scheduler handling of readahead (the "recorded_in_schedule" field in input_info_t) to store a count of pre-read instructions, which will generally be either 0 or 1. Adds a new internal interface get_instr_ordinal() to get the input reader's instruction ordinal minus the pre-read count.

Changes raw2trace's virtual function process_marker_additionally() to process_marker() and moves all marker processing (including timestamps, which are not markers in the raw format) there, to better support subclasses inserting start-unscheduled markers and deciding whether to insert new markers either before or after pre-existing markers.

Adds a scheduler test for the new feature.

Issue: #6822
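The readahead logic described in the #6851 entry can be illustrated with a small Python sketch. This is not the scheduler's C++ code: the `(kind, value)` record tuples, the `InputInfo` class, and its field and method names other than `get_instr_ordinal` are hypothetical stand-ins chosen for illustration. It shows the two ideas from the commit message: scanning for the unschedule marker before the first instruction, and subtracting a pre-read count from the reader's instruction ordinal.

```python
from dataclasses import dataclass

# Hypothetical record encoding for this sketch: (kind, value) tuples.
UNSCHEDULE = "TRACE_MARKER_TYPE_SYSCALL_UNSCHEDULE"


def starts_unscheduled(records):
    """Return True if the unschedule marker appears before the first instruction."""
    for kind, value in records:
        if kind == "instr":
            # Reached the first instruction without seeing the marker.
            return False
        if kind == "marker" and value == UNSCHEDULE:
            return True
    return False


@dataclass
class InputInfo:
    """Hypothetical stand-in for the scheduler's per-input bookkeeping."""
    reader_instr_ordinal: int = 0  # instructions the reader has consumed so far
    pre_read_instrs: int = 0       # consumed by readahead but not yet delivered

    def read_ahead_instr(self):
        # Readahead advances the reader past an instruction that the
        # scheduler has not yet handed to any output.
        self.reader_instr_ordinal += 1
        self.pre_read_instrs += 1

    def consume_pre_read(self):
        # The pre-read instruction is now actually delivered.
        if self.pre_read_instrs > 0:
            self.pre_read_instrs -= 1

    def get_instr_ordinal(self):
        # Reader ordinal minus the pre-read count, as the commit describes.
        return self.reader_instr_ordinal - self.pre_read_instrs
```

With one instruction pre-read, `get_instr_ordinal()` still reports 0 until that instruction is delivered, which is why the count is generally either 0 or 1.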
i#6662 public traces, part 5: func_id_filter_t (#6820)

Adds a new filter, `func_id_filter_t`, to record_filter, which filters TRACE_MARKER_TYPE_FUNC_ markers based on the function ID. The filter is enabled by `-filter_keep_func_ids` followed by a list of integers that represent the function IDs bound to TRACE_MARKER_TYPE_FUNC_ markers to keep in the trace.

Specifically, whenever we encounter a TRACE_MARKER_TYPE_FUNC_ID marker whose marker value is in the list, we set a per-shard flag to indicate that all TRACE_MARKER_TYPE_FUNC_[ID | ARG | RETVAL | RETADDR] markers related to that function ID need to be preserved. We remove the TRACE_MARKER_TYPE_FUNC_ markers related to functions whose ID is not in the list.

This filter can be invoked with:
```
drrun -t drmemtrace -tool record_filter -filter_keep_func_ids 1,2,3,4 -indir path/to/input/trace -outdir path/to/output/trace
```
This preserves the TRACE_MARKER_TYPE_FUNC_ markers related to functions with IDs 1, 2, 3, and 4, and removes the TRACE_MARKER_TYPE_FUNC_ markers for all other ID values.

We use this filter to preserve markers related to SYS_futex functions in the public release of traces.

Issue: #6662
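The per-shard flag described in the #6820 entry can be sketched in Python as follows. This is an illustration only, not the `func_id_filter_t` implementation: the `(kind, value)` record tuples, the shortened marker names, and the `filter_func_markers` function are hypothetical, and the sketch assumes each group of function markers begins with its FUNC_ID marker.

```python
# Hypothetical short names for the TRACE_MARKER_TYPE_FUNC_* marker kinds.
FUNC_MARKERS = {"FUNC_ID", "FUNC_ARG", "FUNC_RETVAL", "FUNC_RETADDR"}


def filter_func_markers(records, keep_ids):
    """Keep function markers only for IDs in keep_ids; pass other records through."""
    keep_ids = set(keep_ids)
    keeping = False  # per-shard flag: is the current function ID in the keep list?
    output = []
    for kind, value in records:
        if kind == "FUNC_ID":
            # A FUNC_ID marker decides the fate of the markers that follow it.
            keeping = value in keep_ids
        if kind in FUNC_MARKERS and not keeping:
            continue  # drop markers for functions that are not in the list
        output.append((kind, value))
    return output
```

Records that are not function markers, such as instructions, are never dropped by this sketch; only the marker groups for unlisted IDs are removed.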
Support subclassing drmemtrace syscall_mix data (#6834) Adds a virtual destructor to the drmemtrace tool syscall_mix_t::shard_data_t, to support subclassing that struct for extended usage such as tracking callstacks for each syscall.
i#6814: Fix stack overflow on signal delivery to mid-detach thread (#6815)

Fixes two stack overflow scenarios that occur when DR delivers an app signal to the native signal handler for a thread that is mid-detach.

First case: when a thread is handling the suspend signal and is waiting for the detacher thread to wake it up and tell it to continue detaching. Currently, DR unblocks signals before starting the wait. If the signal is delivered at this point, execute_native_handler() incorrectly delivers the signal to the native handler on DR's own signal stack. To fix this, we no longer unblock signals during this wait, as doing so complicates native signal delivery; this also helps with the second case described below. Additionally, for a detaching thread, we no longer explicitly restore the app's sigblocked mask; DR already restores the mask on the signal frame, so it is restored automatically when the thread returns from the DR detach signal handler. This avoids another case where the app may be on DR's signal stack when the native signal is delivered.

Second case: when the thread has been woken up by the detacher thread, executed sig_detach, and reinstated the app signal stack (if available). If the signal is delivered at this point, execute_native_handler() adds a new signal frame on top of DR's own signal frame on the app stack and invokes the native signal handler. This sometimes takes too much stack space and causes a stack overflow, as observed on an internal app with frequent profiling signals that use the stack-intensive libunwind to get a stack trace for all threads. To fix this, we reuse the same signal frame for delivering the signal to the native signal handler when the app doesn't need a non-RT frame.

The new code is exercised by the existing detach_signal test. Also modified the test to have some threads with a very small sigstack, which helps reproduce the crash originally seen on a real app. (There was already a note in the detach_signal test about using a large sigstack to avoid this stack overflow.)

Tested on an internal app, where failures dropped from ~136/4000 to ~1/4000.

Issue: #6814
i#6662 public traces, part 4: view tool (#6816)

Modifies the view tool to handle OFFLINE_FILE_TYPE_ARCH_REGDEPS traces, leveraging the disassembly of DR_ISA_REGDEPS instructions.

When visualizing DR_ISA_REGDEPS instructions, the view tool still prints the instruction length and PC, which for OFFLINE_FILE_TYPE_ARCH_REGDEPS traces are the same as those in the original trace. Then, after the PC, the instruction encoding, categories, operation size, and registers are printed following the disassembly format of DR_ISA_REGDEPS instructions (xref: #6799).

DR_ISA_REGDEPS instructions printed by the view tool look as follows:
```
[...] ifetch 10 byte(s) @ 0x00007f86ef03d107 00001931 04020204 load store [4byte] %rv0 %rv2 %rv36 -> %rv0
[...] 00000026
```

We also fix a formatting bug in DR_ISA_REGDEPS instruction disassembly, where we were missing a new line when the instruction encoding spills into a second line.

Issue: #6662
i#3544 RV64: Optimize private memcpy and memset (#6800)

1. Optimize private memcpy and memset for RV64.
2. Add a test to compare private and libc memset.
3. Compare private memcpy with libc memcpy on more small sizes.
4. Fix a bug in core/CMakeLists.txt. For unit_tests, to compare private and libc memcpy, we should link unit_tests to drmemfuncs but not link to libc.

The results below compare the original private memcpy and memset, the optimized private memcpy and memset, and the glibc memcpy and memset.

Test command:
```
./bin64/unit_tests
```

With the original memcpy and memset, the output is:
```
our_memcpy_time: size=1 time=0
libc_memcpy_time: size=1 time=2
our_memcpy_time: size=4 time=2
libc_memcpy_time: size=4 time=2
our_memcpy_time: size=128 time=16
libc_memcpy_time: size=128 time=4
our_memcpy_time: size=512 time=57
libc_memcpy_time: size=512 time=7
our_memcpy_time: size=8192 time=824
libc_memcpy_time: size=8192 time=79
our_memcpy_time: size=20480 time=2080
libc_memcpy_time: size=20480 time=183
our_memset_time: 4129
libc_memset_time: 292
io all done testing string done testing string
```

With the optimized memcpy and memset, the output is:
```
our_memcpy_time: size=1 time=1
libc_memcpy_time: size=1 time=2
our_memcpy_time: size=4 time=1
libc_memcpy_time: size=4 time=3
our_memcpy_time: size=128 time=2
libc_memcpy_time: size=128 time=3
our_memcpy_time: size=512 time=7
libc_memcpy_time: size=512 time=7
our_memcpy_time: size=8192 time=72
libc_memcpy_time: size=8192 time=69
our_memcpy_time: size=20480 time=184
libc_memcpy_time: size=20480 time=175
our_memset_time: 307
libc_memset_time: 302
io all done testing string done testing string
```

Issue: #3544
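To make the size of the improvement in the #6800 entry easier to read, the following Python sketch computes before/after ratios from the `our_memcpy_time` and `our_memset_time` values reported in that entry. The dictionary and function names are hypothetical, the time unit is not stated in the source, and the two smallest sizes are left out because their timings (0 to 2) are too coarse for a meaningful ratio.

```python
# Timings copied from the reported unit_tests output (unit not stated).
original_memcpy = {128: 16, 512: 57, 8192: 824, 20480: 2080}
optimized_memcpy = {128: 2, 512: 7, 8192: 72, 20480: 184}
original_memset, optimized_memset = 4129, 307


def speedup(before, after):
    """Ratio of the original time to the optimized time."""
    return before / after


# Per-size speedup of the private memcpy after the optimization.
memcpy_speedups = {
    size: speedup(original_memcpy[size], optimized_memcpy[size])
    for size in original_memcpy
}
memset_speedup = speedup(original_memset, optimized_memset)

for size, ratio in memcpy_speedups.items():
    print(f"memcpy size={size}: {ratio:.1f}x faster")
print(f"memset: {memset_speedup:.1f}x faster")
```

These ratios describe a single reported run, so they are best read as a rough indication that the optimized private routines are close to the glibc timings rather than as precise measurements.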
Update incorrect docs about scheduler window ids (#6781) Updates the drmemtrace scheduler regions_of_interest docs which incorrectly stated the window id markers were not inserted between back-to-back regions: they are inserted, as the code confirms (with an explicit comment) and the unit tests check.