
JingfengYang/geca

 
 


Good-Enough Data Augmentation

A simple rule-based data augmentation scheme aimed at encouraging generalization in sequence-to-sequence models.

Jacob Andreas, ACL 2020. https://arxiv.org/abs/1904.09545

Experiments:

Look in the exp folder. Experiments labeled retrieval use GECA for data augmentation.

Data:

To use on a new dataset:

  1. Point torchdec at https://github.com/jacobandreas/torchdec.
  2. Create a new data loader under data (look at data/scan.py for a simple example).
  3. Update get_dataset in train.py to use the new loader.
  4. Run the experiment pipeline (look at exp/scan_jump/retrieval/run.sh for an example).
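As a rough illustration of step 2, the parsing part of a loader for SCAN-style data might look like the sketch below. It assumes each line has the form `IN: <command tokens> OUT: <action tokens>`. The function names `parse_scan_line` and `load_pairs` are hypothetical and are not taken from data/scan.py; the interface that train.py actually expects from a loader should be checked against that file.

```python
# Hypothetical sketch: parse SCAN-style lines into (input, output) token lists.
# This is NOT the loader in data/scan.py; names and structure are assumptions.

def parse_scan_line(line):
    """Split one 'IN: ... OUT: ...' line into (input_tokens, output_tokens)."""
    line = line.strip()
    if not line.startswith("IN:") or "OUT:" not in line:
        raise ValueError("expected a line of the form 'IN: ... OUT: ...'")
    # Drop the leading 'IN:' marker, then split once on the 'OUT:' marker.
    inp, out = line[len("IN:"):].split("OUT:", 1)
    return inp.split(), out.split()


def load_pairs(lines):
    """Parse every non-empty line of an iterable of strings."""
    return [parse_scan_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    example = ["IN: jump twice OUT: I_JUMP I_JUMP", ""]
    for src, tgt in load_pairs(example):
        print(src, "->", tgt)
```

A real loader would additionally need to build vocabularies and train/test splits in whatever form the rest of the pipeline consumes.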

The wug_size and wug_count flags (defined in data/builder.py) determine the size and number, respectively, of the fragments that will be extracted from each template. The template_sim flag determines whether the whole string or a fixed-size window is used to evaluate template similarity; sim_window_size sets the window size. The number and diversity of generated templates can be further controlled with the variants and n_sample flags.
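To give a feel for what a fragment size limit and a fragment count limit control, the sketch below enumerates every way of cutting up to `wug_count` non-overlapping fragments, each at most `wug_size` tokens long, out of a token sequence. It is a simplified illustration and not the code in data/builder.py: the `<wug>` placeholder token, the function names, and the exhaustive enumeration are all assumptions made for the example.

```python
# Illustrative sketch only; not the implementation in data/builder.py.
from itertools import combinations

WUG = "<wug>"  # hypothetical placeholder marking where a fragment was removed


def candidate_spans(n, wug_size):
    """All contiguous (start, end) spans of length 1..wug_size in n tokens."""
    return [(i, j)
            for i in range(n)
            for j in range(i + 1, min(i + wug_size, n) + 1)]


def extract_templates(tokens, wug_size=2, wug_count=1):
    """Return (template, fragments) pairs for a token sequence."""
    results = []
    spans = candidate_spans(len(tokens), wug_size)
    for k in range(1, wug_count + 1):
        # combinations() preserves the sorted order of `spans`, so checking
        # neighbouring spans is enough to rule out any overlap.
        for chosen in combinations(spans, k):
            if any(a[1] > b[0] for a, b in zip(chosen, chosen[1:])):
                continue
            template, fragments, pos = [], [], 0
            for start, end in chosen:
                template.extend(tokens[pos:start])  # keep text before the gap
                template.append(WUG)                # mark the gap
                fragments.append(tuple(tokens[start:end]))
                pos = end
            template.extend(tokens[pos:])           # keep the remaining text
            results.append((tuple(template), tuple(fragments)))
    return results


if __name__ == "__main__":
    for template, fragments in extract_templates(["walk", "twice"], wug_size=1):
        print(" ".join(template), "|", fragments)
```

Raising either limit increases the number of candidate (template, fragment) pairs, since longer spans and larger combinations of spans become eligible.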
