The data comes from the Yelp review challenge, which is a document classification task: predict how many stars each review received.
It has been sampled and processed as follows:
words.txt -> vocabulary, one word per line
documents.txt -> dataset, one document per line. Sentences are separated by |&|, and words are already tokenized
labels.txt -> int label for each document
tf_multi_sequence_dataset.py -> standalone code showing how to store sequences of sequences in a tf.data.Dataset
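As an illustration of the documents.txt layout described above, here is a minimal sketch of turning one line into a list of sentences, each a list of word ids. It is not taken from tf_multi_sequence_dataset.py: the helper names (`parse_document`, `build_vocab`, `encode`) are hypothetical, the toy vocabulary is made up, and it assumes tokens are separated by whitespace and that a word's id is its line index in words.txt.

```python
SENTENCE_SEP = "|&|"  # sentence delimiter used in documents.txt


def parse_document(line):
    """Split one line of documents.txt into sentences, each a list of tokens."""
    sentences = []
    for sentence in line.strip().split(SENTENCE_SEP):
        tokens = sentence.split()  # assumption: tokens are whitespace-separated
        if tokens:  # skip empty sentences
            sentences.append(tokens)
    return sentences


def build_vocab(words):
    """Map each word to its position (assumed to be its line index in words.txt)."""
    return {word: index for index, word in enumerate(words)}


def encode(sentences, vocab, unk_id):
    """Replace every token by its id, falling back to unk_id for unknown words."""
    return [[vocab.get(token, unk_id) for token in sentence] for sentence in sentences]


# Toy data standing in for words.txt and one line of documents.txt.
vocab = build_vocab(["the", "food", "was", "great", "service", "slow"])
doc = parse_document("the food was great|&|service was slow\n")
ids = encode(doc, vocab, unk_id=len(vocab))

print(doc)  # [['the', 'food', 'was', 'great'], ['service', 'was', 'slow']]
print(ids)  # [[0, 1, 2, 3], [4, 2, 5]]
```

Each document becomes a ragged list of lists, which is the "sequence of sequences" structure the script is concerned with.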
python: 3.6.4
tensorflow: 1.8
$ python tf_multi_sequence_dataset.py
(32, 28, 55)
(32, 57, 112)
True
True
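The two printed shapes differ in their second and third dimensions, which would be consistent with each batch of 32 documents being padded to its own longest document (in sentences) and longest sentence (in words). That reading is an interpretation of the output, not something stated here. Under that assumption, a minimal pure-Python sketch of per-batch padding looks like this; `pad_batch` and `pad_id` are hypothetical names.

```python
def pad_batch(batch, pad_id=0):
    """Pad a batch of documents (lists of sentences of word ids) to a dense 3-D block.

    Returns the padded nested list and its shape
    (batch size, max sentences per document, max words per sentence).
    """
    max_sents = max(len(doc) for doc in batch)
    max_words = max(len(sent) for doc in batch for sent in doc)

    padded = []
    for doc in batch:
        # Pad every sentence to the longest sentence in the batch.
        rows = [sent + [pad_id] * (max_words - len(sent)) for sent in doc]
        # Pad the document with all-padding sentences.
        rows += [[pad_id] * max_words for _ in range(max_sents - len(doc))]
        padded.append(rows)
    return padded, (len(batch), max_sents, max_words)


# Two toy documents: one with two sentences, one with a single one-word sentence.
batch = [
    [[1, 2, 3], [4, 5]],
    [[6]],
]
padded, shape = pad_batch(batch)

print(shape)  # (2, 2, 3)
```

In TensorFlow, `tf.data.Dataset.padded_batch` provides comparable per-batch padding for variable-length elements; whether the script relies on it is not stated here.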