A basic implementaiton of using SGD over path-patching hypotheses to automatically localize behaviors. Inspired by work I did while at Redwood.
This is very much a work in progress! I hope to have a markdown file that more clearly summarizes current results soon. In the meantime, I reccomend looking at this file for the best summary.
- Finish basic implimentation
- Sanity checks + naive experiments
- Allow hypotheses about direct paths direct paths to the output.