This is the documentation for a sequence-to-sequence model on 2 datasets (Spider, ViText2SQL) with 3 methods each (basic, attention, attention + copy), using masking and fine-tuned GloVe embeddings.
This code is based on Google seq2seq v0.1 and the code by Catherine Finegan-Dollak.
- Python 2.7.14 :: Anaconda custom (64-bit)
- tensorflow-gpu: 1.5.0
- numpy: 1.13.3
- matplotlib: 2.1.0
- `bin/` contains the entry points to the model, including train.py and infer.py, and other tool code
- `experimental_configs/` contains experimental configurations in YAML format
- `data/` contains the folders `pre_process`, `glove`, and `datasets` (which holds the 2 datasets)
- `seq2seq/` contains the main model code
- `config_builder.py` creates a new model directory and writes its configuration into bash files
Download glove.6B.100d.txt into `data/glove/`.
Put the original data in the folders
- `data/datasets/data`
- `data/datasets/data_radn_split`

Then generate the processed data into
- `data/datasets/data_processed`
- `data/datasets/data_radn_split_processed`

by changing the `infiles` and `prefix` variables in `data/pre_process/utils.py` and running `data/pre_process/generate_vocab.py`.
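The vocabulary step can be sketched as follows. This is a minimal illustration of building a token vocabulary from whitespace-tokenized lines, not the actual logic of `generate_vocab.py`, which may differ:

```python
from collections import Counter

def build_vocab(lines, min_count=1):
    """Count whitespace tokens and keep those seen at least min_count times."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    # Most frequent first; ties broken alphabetically for determinism
    return [tok for tok, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
            if c >= min_count]

# Example: vocabulary over two toy target-side SQL queries
vocab = build_vocab(["select name from city", "select count ( * ) from city"])
print(vocab[:3])  # ['city', 'from', 'select']
```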
We run 6 experiments. The configuration YAML files are in the `experimental_configs` folder:
- `attn_copying_tune_data_radn_split.yaml` # attention + copy, data_radn_split
- `attn_tune_data.yaml` # attention, data
- `attn_copying_tune_data.yaml` # attention + copy, data
- `basic_tune_data_radn_split.yaml` # basic, data_radn_split
- `attn_tune_data_radn_split.yaml` # attention, data_radn_split
- `basic_tune_data.yaml` # basic, data
In the configuration YAML files, change the data directories:
- `data_directories`: `data/datasets/data_processed/`
- `embedding.file`: `data/glove/glove.6B.100d.txt`
Use `config_builder.py` to generate a model folder with configuration and bash files from the configs in `experimental_configs/`:
python config_builder.py [configuration_yaml_file]
This produces 6 model folders:
InputAttentionCopyingSeq2Seq_tune_model_data/
InputAttentionCopyingSeq2Seq_tune_model_data_radn_split/
BasicSeq2Seq_tune_model_data/
BasicSeq2Seq_tune_model_data_radn_split/
AttentionSeq2Seq_tune_model_data/
AttentionSeq2Seq_tune_model_data_radn_split/
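The folder-generation step can be sketched roughly as below. This is a hypothetical illustration, not the real `config_builder.py`: the folder-name pattern matches the list above, but the generated `experiment.sh` contents are assumed:

```python
import os
import tempfile

def build_model_folder(model_class, dataset_tag, root):
    """Create <model_class>_tune_model_<dataset_tag>/ containing a stub experiment.sh."""
    folder = os.path.join(root, "%s_tune_model_%s" % (model_class, dataset_tag))
    os.makedirs(folder)
    with open(os.path.join(folder, "experiment.sh"), "w") as f:
        f.write("#!/bin/bash\n")
        # Hypothetical training invocation; the real script is generated
        # from the chosen YAML configuration by config_builder.py
        f.write("python bin/train.py experimental_configs/attn_tune_data.yaml\n")
    return folder

# Example: reproduce one of the six folder names in a temporary directory
root = tempfile.mkdtemp()
folder = build_model_folder("AttentionSeq2Seq", "data", root)
print(os.path.basename(folder))  # AttentionSeq2Seq_tune_model_data
```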
For training:
./[model_folder]/experiment.sh
For testing:
./[model_folder]/experiment_infer.sh
with the following outputs:
- train:
[model_folder]/output_train.txt
- dev:
[model_folder]/output.txt
- test:
[model_folder]/output_test.txt
Check `output_[data_split].txt` and keep only the lines relevant for comparison, then evaluate with:
python evaluation.py --gold [gold file] --pred [predicted file] --etype [evaluation type] --db [database dir] --table [table file]
arguments:
- `[gold file]`: gold.sql file where each line is `a gold SQL \t db_id`
- `[predicted file]`: predicted SQL file where each line is a predicted SQL query
- `[evaluation type]`: "match" for exact set matching score, "exec" for execution score, "all" for both
- `[database dir]`: directory containing sub-directories where each SQLite3 database is stored
- `[table file]`: table.json file which includes the foreign key info of each database
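As a rough intuition for the "match" score, a drastically simplified string-level stand-in is shown below. The real `evaluation.py` parses each SQL query and compares clause sets, so it is far more forgiving than this sketch:

```python
def normalize(sql):
    """Lowercase and collapse whitespace in a SQL string."""
    return " ".join(sql.lower().split())

def exact_match(gold, pred):
    """Simplified matching: true when the normalized strings agree."""
    return normalize(gold) == normalize(pred)

def match_score(gold_sqls, pred_sqls):
    """Fraction of predictions that match their gold query."""
    hits = sum(exact_match(g, p) for g, p in zip(gold_sqls, pred_sqls))
    return hits / float(len(gold_sqls))

print(match_score(["SELECT name FROM city"], ["select name  from city"]))  # 1.0
```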
Dongxu Wang, Rui Zhang