Data for canonical name translation experiments
- Full dump of ParaNames used to create parallel data for experiments
  - TSV formatted (full_paranames_dump.tsv.tar.gz)
  - DuckDB database (full_paranames_dump_duckdb.tar.gz, experimental)
- Parallel data used in experiments (parallel_data_for_experiments.tar.gz)
- Metadata (metadata.tar.gz)
  - Wikidata ID => train/dev/test split mapping
  - Sizes of each language in each split
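
To get a first look at the TSV dump without unpacking the whole archive, something like the following minimal sketch may help. It assumes the archive contains at least one member whose name ends in `.tsv` and that the file is UTF-8 encoded; neither is stated in this listing, and the helper name `preview_tsv_archive` is hypothetical. No column names are assumed, so rows come back as plain lists of strings.

```python
import csv
import io
import tarfile


def preview_tsv_archive(archive_path, n_rows=5):
    """Return the first n_rows rows of the first .tsv member in a .tar.gz archive.

    Assumptions (not confirmed by the listing): the archive holds a member
    whose name ends in ".tsv", and its contents are UTF-8 text.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if member.isfile() and member.name.endswith(".tsv"):
                # Stream the member directly instead of extracting it to disk.
                raw = archive.extractfile(member)
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                # QUOTE_NONE treats quote characters in names as literal text.
                reader = csv.reader(text, delimiter="\t", quoting=csv.QUOTE_NONE)
                return [row for _, row in zip(range(n_rows), reader)]
    # No .tsv member was found.
    return []
```

A call such as `preview_tsv_archive("full_paranames_dump.tsv.tar.gz", n_rows=3)` would return up to three rows, the first of which may or may not be a header line.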