Iam #2658
This file was deleted.
@@ -93,12 +93,12 @@ END

 cut -d' ' -f2- data/train/text > data/local/train_data.txt
 cat data/local/phones.txt data/local/train_data.txt | \
-local/prepend_words.py | \
+utils/lang/bpe/prepend_words.py | \
 utils/lang/bpe/learn_bpe.py -s 700 > data/local/bpe.txt
Did you have a chance to check learn_bpe.py to see if it has an option to include all singletons?
Sorry, I tried going through the code but was not able to finish. I will do it now.
I checked the options and docs; there is no option to include singletons.

 for set in test train val train_aug; do
 cut -d' ' -f1 data/$set/text > data/$set/ids
 cut -d' ' -f2- data/$set/text | \
-local/prepend_words.py | utils/lang/bpe/apply_bpe.py -c data/local/bpe.txt \
+utils/lang/bpe/prepend_words.py | utils/lang/bpe/apply_bpe.py -c data/local/bpe.txt \
 | sed 's/@@//g' > data/$set/bpe_text
 mv data/$set/text data/$set/text.old
 paste -d' ' data/$set/ids data/$set/bpe_text > data/$set/text
Could you please check to see if there is an option that we can pass to learn_bpe.py to make sure all singleton characters are included (so that you don't need to create local/phones.txt)?
OK, thank you. I will check it.
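For context on the singleton question, here is a minimal sketch of how one might check which single characters occur in the training text but are absent from a character inventory such as data/local/phones.txt. It is only an illustration under assumptions: `list_chars` and `missing_chars` are hypothetical helpers, not part of Kaldi or of learn_bpe.py, and the inventory is simply assumed to be a plain-text file whose characters are the ones of interest.

```shell
# Hypothetical helper (not part of Kaldi): print each distinct
# non-whitespace character of a file, one per line, in byte order.
list_chars() {
  LC_ALL=C grep -o '[^[:space:]]' "$1" | LC_ALL=C sort -u
}

# Hypothetical helper: print the characters that occur in the text file $1
# but not in the inventory file $2.
missing_chars() {
  tmp_text=$(mktemp)
  tmp_inv=$(mktemp)
  list_chars "$1" > "$tmp_text"
  list_chars "$2" > "$tmp_inv"
  # comm -23 keeps the lines that appear only in the first (sorted) file
  LC_ALL=C comm -23 "$tmp_text" "$tmp_inv"
  rm -f "$tmp_text" "$tmp_inv"
}
```

Under those assumptions, `missing_chars data/local/train_data.txt data/local/phones.txt` would print nothing when every character of the training text is already covered by the inventory. Because of `LC_ALL=C`, the helpers work byte by byte, so they are only meaningful for single-byte (e.g. ASCII) text.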