Skip to content

State of art reference

Loic Jaquemet edited this page Jan 9, 2017 · 11 revisions

In Q4 2016, reddit finally helped me identify the relevant academic field. below is a short list of what I found and find relevant.

As of 2016, the state of art is best described by Lin, Zhiqiang, Juan Caballero, IMDEA Software Institute

Identification of data structure in memory dump of executed sample

  • Laika: Digging For Data Structures, 2008
    Keywords: Bayesian unsupervised learning, typed pointer
    Parse the whole memory pages without precise identification of memory chunks. Basic type heuristics. Introduce the indea of typed pointer, where same-size records with pointers values in same offset would probably be the same type.
    Applied to malware triage using bayesian learning.
    NB: Concept tested in haystack circa 2011-2012
    ***TODO: look for follow-ups. heap walker anyone ? Sigpath, ***

  • SigGraph: Brute Force Scanning of Kernel Data Structure Instances Using Graph-based Signatures, 2011
    Keywords: data structure isomorphism, signature creation, Kernel rootkit detection, kernel version inference
    Detection of kernel structures in memory dump. Compiler based approach. Generate signatures based on source code (AST).
    NB: Concept very similar to haystack but in kernel space

  • DIMSUM: Discovering Semantic Data of Interest from Un-mappable Memory with Confidence (2012) Keywords: unmapped memory, probabilistic inference, freed data recovery.
    Recovery of data structures from unmapped memory(Linux, Android), post process termination remnants. No mappings information required. Search for known structure definitions.
    Probabilistic inference and constraint solving. Use primitive, pointer, structural, same-page, semantic, and staged constraints.
    *** Todo: check constraint algo for float/double and pointer/same page constraints.***

  • sigpath: A Memory Graph Based Approach for Program Data Introspection and Modication Keywords: path signature, sequence of snapshots, read and write, game hacking, memory graphing, structural memory diffing Generate memory walking paths to efficiently locate specific data structure in subsequent execution.
    Use process allocation metadata to extract allocation chunks.
    *** similar to haystack-watch ***

  • MACE: High-Coverage and Robust Memory Analysis For Commodity Operating Systems Keywords: kernel structures, rootkits detection. Limit constraints to pointers, on purpose. Full structure reverse graph of kernel objects.
    Type inference on pointer field and function pointers.
    First generation graph model, from known memory baseline, then applies to same-os target snapshot.
    Probabilistic model allows for wrong pointer values.
    Is there a public volatility module ?
    Can be applied in volatility to detect hidden object, hooked pointers.
    Could be interesting to bind MACE to libvirt for real time analysis.
    *** vaguely similar to haystack 2012 pointer signature but in kernel space + known definitions as anchor points***

  • KOP

  • MAS
    Microsoft efforts at rootkit detection in kernel memory dumps.
    Massive data source.
    Based on known type definition. See MACE and Siggraph.

Identification of type in (static|dynamic) reverse analysis of samples

  • TIE: Principled reverse engineering of types in binary programs, 2011
    Keywords: Type reconstruction, code analysis, constraint solving
    Apply constraints to variables and record members based on code semantics. Turn constraints into types.

Identification of data structures in dynamic execution of samples

  • ASI: Aggregate Structure Identification and its Application to Program Analysis, 1999 Keywords: atomisation of aggregates
    Itemize the members of structures by analysis of access patterns.

  • VSA: Value Set analysis ?

  • Daikon: The Daikon system for dynamic detection of likely invariants Keywords: value set analysis, signature creation.
    Could potentially be used for constraints generation.

  • RDS: Recursive data structure profiling, 2005
    Keywords: profiling, shape graph analysis, Allocation dynamic analysis
    It creates a graph of relation between records based on pointer values. Note: I'm not sure I understand the finality of using dynamic analysis for this

  • DDT: design and evaluation of a dynamic program analysis for optimizing data structure usage, 2009
    Keywords: data structure identification, Allocation dynamic analysis, interface detection, memory graph
    Create a memory graph. Takes into account the allocation-site to type a memory chunk. Allow performance analysis of linked data structure (lists, tree).

  • Rewards: Automatic reverse engineering of data structures from binary execution, 2010
    Keywords: allocation dynamic analysis, type sink, memory graph
    Builds on ASI ? Infer primitive data types in static code section. Using semantics to infer record type argument of known function. Resolve variable type live and offline based on type propagation. Relies on C, python and Intel PIN. Can be applied to a memory dump (binary in dump)

  • Howard: a dynamic excavator for reverse engineering data structures, 2011
    Keywords: allocation dynamic analysis, type sink, array detection Analysis access pattern to determine field size. Uses allocation-locality, types sink and typed pointer to type records. Array detection using loop unrolling and access analysis (linked list {prev,next} pointers). Generation of new debug symbols. Based on KLEE for instrumentation. Focus of work seems to be performance and memory protection. NB: CodeSurfer/x86 is a combination of VSA and ASI.

  • ARTISTE: Automatic Generation of Hybrid Data Structure Signatures from Binary Code Executions (2012) Keywords: type reversing/structure signature, shape analysis, type sink, dynamic analysis.
    Reverse structure type/signature based on memory shape analysis and dynamic analysis.

  • MemPick: High-Level Data Structure Detection in C/C++ Binaries, 2013
    Keywords: Shape analysis, allocation dynamic analysis, overlay
    Uses Intel PIN to instrument allocation primitives. Creates a graph based on pointer values? Type analysis. Overlays detection. Classify linked list, trees shapes. Performance assessment of tree usage.
    Overall, conclusion seems focused on detection of linked list structures (hint at performance domain).

Terminology

  • Shape analysis: Discovers and verifies properties of linked data structure to verify high level correctness of programs.
  • Allocation dynamic analysis: Instrumentation of sample to identify allocated memory block. Not heap walking. Require one or multiple run of the sample.
  • Heap walking: Identify allocated memory block by following the static memory dump heap management structure.
  • Low-level data structure identification: Detection of primitive types or records.
  • High-level data structure identification: Recursive data structure and linked (pointer) structures.