
DonggeLiu/Legion


Principes

Second version of Legion, with progress notes and TODOs

TODO

Version control

  1. Independent repository
  2. An online doc

Runner Optimisation

  1. Test tracejump
  2. Replace QEMU with tracejump
  3. tracejump optimisation:
    • Investigate the difference between tracejump instrumentation and SIMGR

Tracer optimisation

  1. Check into constraints() to see how constraints are collected
  2. In expansion stage, run tracer starting from the node selected in tree policy, instead of from the root.
    • Call step() on states:
      • Cannot tell which successor to choose
    • simgr.explore():
      • Cannot use it together with tracer
    • simgr.run():
      • Runs into a dead-end state
      • Uses step() internally
    • Fixed the logic to choose successors
  3. Run on pre-instrumentation binary
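The successor-selection step in item 2 above could be sketched as follows. This is a minimal illustration and not Legion's actual code: `pick_successor` is a hypothetical helper, and it assumes that each successor exposes an `addr` attribute (as angr states do) and that the concrete trace is a list of block addresses.

```python
from typing import Iterable, Optional, Sequence


def pick_successor(successors: Iterable, trace: Sequence[int], depth: int) -> Optional[object]:
    """Return the successor whose address matches trace[depth], or None.

    `successors` is any iterable of objects with an `addr` attribute
    (assumption: the successors produced by stepping a state).
    """
    if depth >= len(trace):
        return None  # the concrete trace has no entry at this depth
    expected = trace[depth]
    for succ in successors:
        if succ.addr == expected:
            return succ
    return None  # no successor lies on the trace: tracer and trace diverged
```

Returning `None` rather than raising lets the caller decide whether a divergence is a dead end or a mismatch worth logging.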

Program Under Test

  1. Program with loops:
    • Why are constraints missing?
      • Cause: repeated bytes recorded by tracejump are not recorded by SIMGR
    • match the bytes recorded by tracejump with the ones in SIMGR
  2. CGC programs
  3. LAVA-M programs
  4. Four-byte-word sample PUT
  5. Replace QEMU with tracejump

Solver optimisation

  1. Quick Sampler
  2. Keep $\delta$ instead of constraints?

Experiments

  1. Compare time: Legion - tracejump ?= random - tracejump:
    • Legion is much slower on one-byte inputs
    • Test on inputs with more bytes (choke-point)
  2. simpler loop:
    • simple_while.c:
    • check assembly, make sure loops are not simplified away
    • for loops

Progress

  1. study tracejump
  2. fix bugs in tracejump
  3. sample PUT triggers the difference between tracejump & SIMGR:
    • If any:
      • caused by repeated bytes that are not recorded by SIMGR
    • Load the assembly or the binary in GDB and step through it.
    • Fixing the mismatch
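One way to locate the tracejump/SIMGR mismatch described above is to compare the two traces entry by entry. The sketch below is an illustration under assumptions: both traces are modelled as plain lists of integers, `first_mismatch` and `collapse_repeats` are hypothetical helpers, and collapsing consecutive repeats is only one possible way to reconcile the traces, not necessarily the fix used in Legion.

```python
from itertools import groupby
from typing import List, Optional, Sequence


def collapse_repeats(trace: Sequence[int]) -> List[int]:
    """Drop consecutive duplicates, e.g. [1, 1, 2, 2, 1] becomes [1, 2, 1]."""
    return [entry for entry, _ in groupby(trace)]


def first_mismatch(tj_trace: Sequence[int], simgr_trace: Sequence[int]) -> Optional[int]:
    """Return the first index where the traces differ, or None if they agree."""
    for i, (a, b) in enumerate(zip(tj_trace, simgr_trace)):
        if a != b:
            return i
    if len(tj_trace) != len(simgr_trace):
        # One trace is a strict prefix of the other.
        return min(len(tj_trace), len(simgr_trace))
    return None
```

For example, `first_mismatch([1, 1, 2], [1, 2])` reports index 1, while comparing `collapse_repeats([1, 1, 2])` against `[1, 2]` reports no mismatch.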

Next

  1. Correct the names in Pie Chart
  2. Correct the counters in the algorithm
  3. Test on inputs with more bytes
  4. Test on inputs with for loops
  5. Optimisation: avoid executing the binary on inputs that showed up before
  6. Fixing the mismatch between instrumentation and tracer
  7. Mark a node as exhausted if quick sampler cannot find any new in_str from it
  8. An automatic program to compare the performance of Legion against a given benchmark
  9. Fix back-propagation: assign rewards according to the in_str generated
  10. Version-control Angr
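Items 5 and 7 above (skipping inputs that were already executed, and marking a node as exhausted when sampling yields nothing new) could share one small cache. This is a sketch under assumptions: `InputCache` and `run_binary` are hypothetical names, inputs are `bytes`, and a node is treated as exhausted when `new_inputs` returns an empty list for its samples.

```python
from typing import Callable, Dict, Iterable, List


class InputCache:
    """Remembers the result of running the binary on each input."""

    def __init__(self, run_binary: Callable[[bytes], object]):
        self._run = run_binary              # hypothetical runner callback
        self._seen: Dict[bytes, object] = {}

    def execute(self, in_str: bytes):
        """Run the binary only if this input has not been executed before."""
        if in_str not in self._seen:
            self._seen[in_str] = self._run(in_str)
        return self._seen[in_str]

    def new_inputs(self, candidates: Iterable[bytes]) -> List[bytes]:
        """Return the candidates not executed yet, without duplicates."""
        fresh: List[bytes] = []
        for cand in candidates:
            if cand not in self._seen and cand not in fresh:
                fresh.append(cand)
        return fresh
```

A caller could mark a node as exhausted when `cache.new_inputs(samples)` is empty for the samples the solver produced from that node.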

Important notes:

  1. Cannot keep symbolic execution states with preconstraints in an MCTS tree node; otherwise, future symbolic execution will be limited to that input.
  2. Five kinds of nodes:
    1. White: In TraceJump + Not sure if in Angr + check Symbolic state later + may have simulation child
    2. Red: In TraceJump + Confirmed in Angr + has Symbolic state + has Simulation child
    3. Black: In TraceJump + Confirmed not in Angr + No Symbolic state + No Simulation child
    4. Gold: Not in TraceJump + Not in Angr + Same Symbolic state as parent + is a Simulation child
    5. Purple: Unknown TJ path + SymEx found in Angr + has Symbolic state + is a Phantom Node
  3. Installation order: Angr -> Cle -> Claripy
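The node kinds in note 2 can be written down as a small lookup table. The encoding below is only a sketch: `NodeColour`, `NodeFacts`, and `FACTS` are hypothetical names, and `None` is used here to mean "not yet known", which is an assumption about how the "not sure" and "unknown" entries would be represented.

```python
from enum import Enum
from typing import Dict, NamedTuple, Optional


class NodeColour(Enum):
    WHITE = "white"
    RED = "red"
    BLACK = "black"
    GOLD = "gold"
    PURPLE = "purple"


class NodeFacts(NamedTuple):
    in_tracejump: Optional[bool]  # None: unknown TraceJump path
    in_angr: Optional[bool]       # None: not yet checked in Angr


# Transcribed from the list above; None marks "not sure" / "unknown".
FACTS: Dict[NodeColour, NodeFacts] = {
    NodeColour.WHITE: NodeFacts(in_tracejump=True, in_angr=None),
    NodeColour.RED: NodeFacts(in_tracejump=True, in_angr=True),
    NodeColour.BLACK: NodeFacts(in_tracejump=True, in_angr=False),
    NodeColour.GOLD: NodeFacts(in_tracejump=False, in_angr=False),
    NodeColour.PURPLE: NodeFacts(in_tracejump=None, in_angr=True),
}
```

Keeping the facts in one table makes it easy to assert, for instance, that only white nodes still need their symbolic state checked.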

Changes to dependencies

  1. Angr: Fixed the loggers of Angr so that they do not affect importers
  2. Claripy:
    • Added a new approximate constraint solver backend: Quick Sampler
    • An assertion on the length of exprs