tjunlp-lab/TGEA

TGEA 2.0

Datasets and code for the paper "TGEA 2.0: A Large-Scale Diagnostically Annotated Dataset with Benchmark Tasks for Text Generation of Pretrained Language Models".

Data License

The data is released under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. (License URL: https://creativecommons.org/licenses/by-sa/4.0/)

Quick Start

Data Preprocessing

Convert the raw data to the format required by each task:

unzip data.zip
python data/convert_raw_data_to_benchmarks.py 
python data/convert_gec_format.py
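
The same three steps can also be driven from Python, which makes it easier to stop early when the archive is missing or a converter fails. The following is only a sketch: the `preprocess` function and the `CONVERTERS` list are hypothetical names introduced here, and it assumes the scripts are meant to be run from the repository root, as the commands above suggest.

```python
import subprocess
import sys
import zipfile
from pathlib import Path

# Script paths are the ones listed above; the list itself is illustrative.
CONVERTERS = [
    "data/convert_raw_data_to_benchmarks.py",
    "data/convert_gec_format.py",
]


def preprocess(repo_root=".", archive="data.zip"):
    """Extract the archive, then run each converter in order (hypothetical helper)."""
    root = Path(repo_root)
    zip_path = root / archive
    if not zip_path.is_file():
        raise FileNotFoundError(f"{zip_path} not found; run from the repository root")

    # Equivalent of `unzip data.zip`: extract into the repository root.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(root)

    for script in CONVERTERS:
        # check=True raises CalledProcessError at the first failing converter.
        subprocess.run([sys.executable, script], cwd=root, check=True)
```

Calling `preprocess()` from the repository root is intended to mirror the shell commands above; nothing in it depends on the internals of the conversion scripts.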

Benchmarks

  1. Erroneous Text Detection
sh Diagnosis_tasks/train_b1.sh
  2. MiSEW Extraction
sh Diagnosis_tasks/train_b2.sh
  3. Erroneous Span Location
sh Diagnosis_tasks/train_b3.sh
  4. Error Type Classification
sh Diagnosis_tasks/train_b4.sh
  5. Error Correction
sh Diagnosis_tasks/train_b5.sh

m2scorer is used to evaluate the results of error correction.

  6. Generation Pathology Mitigation
sh Generation_Pathology_Mitigation/train_b6.sh
python Generation_Pathology_Mitigation/evaluate.py
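
To run the benchmarks back to back, the commands listed above can be collected in a small driver. This is a minimal sketch rather than part of the repository: `BENCHMARK_COMMANDS` and `run_benchmarks` are hypothetical names, the task labels are only used for logging, and the m2scorer evaluation is left out because its invocation is not specified here. The default is a dry run that prints each command without executing it.

```python
import subprocess
from pathlib import Path

# Commands copied from the benchmark list; labels are for logging only.
BENCHMARK_COMMANDS = [
    ("Erroneous Text Detection", ["sh", "Diagnosis_tasks/train_b1.sh"]),
    ("MiSEW Extraction", ["sh", "Diagnosis_tasks/train_b2.sh"]),
    ("Erroneous Span Location", ["sh", "Diagnosis_tasks/train_b3.sh"]),
    ("Error Type Classification", ["sh", "Diagnosis_tasks/train_b4.sh"]),
    ("Error Correction", ["sh", "Diagnosis_tasks/train_b5.sh"]),
    ("Generation Pathology Mitigation (train)",
     ["sh", "Generation_Pathology_Mitigation/train_b6.sh"]),
    ("Generation Pathology Mitigation (evaluate)",
     ["python", "Generation_Pathology_Mitigation/evaluate.py"]),
]


def run_benchmarks(repo_root=".", dry_run=True):
    """Print each command and, unless dry_run is set, execute it in order."""
    root = Path(repo_root)
    handled = []
    for name, cmd in BENCHMARK_COMMANDS:
        print(f"[{name}] {' '.join(cmd)}")
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, cwd=root, check=True)
        handled.append(cmd)
    return handled


if __name__ == "__main__":
    run_benchmarks(dry_run=True)
```

Passing `dry_run=False` executes the commands sequentially from `repo_root`; since each training script may take a long time, running a single task by hand is often the more practical choice.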
