🚀 Feature
A minor extension to the torchtext Dataset class, allowing it to extract bzip2-compressed files after downloading them.
Motivation
bzip2 is less common than zip or gzip, but it is certainly not unheard of. Moreover, Dataset already uses the tarfile module to extract gzip-compressed tar archives, and tarfile can read bzip2-compressed archives too, so this would be a very minor change.
My particular use case involves the WikiSmall and WikiLarge datasets (see here).
Pitch
I'm happy to implement this feature myself. I would just add an extra branch to this conditional statement.
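A rough sketch of what the extra branch could look like. The conditional in Dataset.download is not reproduced here; `extract_download` is a hypothetical stand-in that dispatches on the file extension, and its zip/gzip branches are assumptions about the existing behaviour. The two bzip2 branches are the proposed additions: one for `.tar.bz2` archives (handled by `tarfile` with mode `"r:bz2"`) and one for a single bzip2-compressed file (handled by the standard `bz2` module).

```python
import bz2
import os
import shutil
import tarfile
import zipfile


def extract_download(path, root):
    """Hypothetical stand-in for the extraction conditional in Dataset.download.

    Extracts the archive at `path` into the directory `root`.
    """
    if path.endswith(".zip"):
        with zipfile.ZipFile(path, "r") as zf:
            zf.extractall(root)
    elif path.endswith((".tgz", ".tar.gz")):
        with tarfile.open(path, "r:gz") as tar:
            tar.extractall(root)
    # Proposed branch 1: bzip2-compressed tar archives.
    # Checked before the plain ".bz2" branch because ".tar.bz2" also ends in ".bz2".
    elif path.endswith((".tbz2", ".tar.bz2")):
        with tarfile.open(path, "r:bz2") as tar:
            tar.extractall(root)
    # Proposed branch 2: a single bzip2-compressed file (no tar container).
    elif path.endswith(".bz2"):
        out_name = os.path.basename(path)[: -len(".bz2")]
        with bz2.open(path, "rb") as src, open(os.path.join(root, out_name), "wb") as dst:
            shutil.copyfileobj(src, dst)
```

Branch 1 is essentially the existing gzip branch with a different `tarfile` mode; whether branch 2 is also needed depends on how the target dataset is packaged.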
Other thoughts
Not sure whether this belongs in a separate issue, but perhaps the conditional linked to above should also have an else clause that warns the user when their data is left unextracted.
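A minimal sketch of the suggested else clause, assuming the same extension-based dispatch. `extract_or_warn` is a hypothetical name, and returning a boolean is an assumption added so callers can tell whether extraction happened; the warning itself uses the standard `warnings` module.

```python
import tarfile
import warnings
import zipfile


def extract_or_warn(path, root):
    """Hypothetical sketch: extract known archive types, warn about anything else.

    Returns True if `path` was extracted into `root`, False otherwise.
    """
    if path.endswith(".zip"):
        with zipfile.ZipFile(path, "r") as zf:
            zf.extractall(root)
    elif path.endswith((".tgz", ".tar.gz", ".tbz2", ".tar.bz2")):
        # Mode "r:*" lets tarfile detect gzip or bzip2 compression on its own.
        with tarfile.open(path, "r:*") as tar:
            tar.extractall(root)
    else:
        # The proposed else clause: tell the user the download was left as-is.
        warnings.warn(
            f"{path!r} was downloaded but not extracted: unrecognized archive type",
            UserWarning,
        )
        return False
    return True
```

A warning rather than an exception keeps datasets that ship uncompressed files working as before.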
Thanks, I hadn't seen that before. I can't look at it carefully right now, but it seems to me the entire extraction procedure in Dataset.download should simply call (a patched version of) extract_archive.
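Along the lines of this comment, Dataset.download could delegate all extraction to one helper. The sketch below does not reproduce torchtext's own `extract_archive`; the name `extract_archive_patched`, its arguments, and its return value (a list of extracted paths) are assumptions. Unlike the extension-based conditional, it detects the format from the file contents: `tarfile.is_tarfile` tries gzip, bzip2 and xz compression transparently, and a bare bzip2 stream is recognised by its `BZh` magic bytes.

```python
import bz2
import os
import shutil
import tarfile
import zipfile


def extract_archive_patched(from_path, to_path):
    """Hypothetical patched extractor; returns the paths of extracted files."""
    # tarfile's default read mode detects gzip, bzip2 and xz compression itself.
    if tarfile.is_tarfile(from_path):
        with tarfile.open(from_path, "r") as tar:
            tar.extractall(to_path)
            return [os.path.join(to_path, name) for name in tar.getnames()]
    if zipfile.is_zipfile(from_path):
        with zipfile.ZipFile(from_path) as zf:
            zf.extractall(to_path)
            return [os.path.join(to_path, name) for name in zf.namelist()]
    # A bare bzip2 stream (not a tar archive) starts with the magic bytes b"BZh".
    with open(from_path, "rb") as f:
        is_bz2 = f.read(3) == b"BZh"
    if is_bz2:
        name = os.path.basename(from_path)
        # Strip ".bz2"; otherwise pick a new name so the source is not overwritten.
        name = name[: -len(".bz2")] if name.endswith(".bz2") else name + ".decompressed"
        out_path = os.path.join(to_path, name)
        with bz2.open(from_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        return [out_path]
    raise NotImplementedError(f"unsupported archive format: {from_path!r}")
```

Sniffing the contents avoids depending on URL suffixes, at the cost of opening each file once more before extracting it.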