Cannot unzip vqa_data.zip file #68

Closed
wenliangdai opened this issue Apr 12, 2022 · 1 comment

@wenliangdai
[image: screenshot of the failed unzip]

The provided vqa_data.zip file is too large, and unzipping it fails (as shown in the figure above).

We tried to repair the archive with zip -F file.zip --out file-large.zip, but that did not work.

Could you please provide a better way to obtain and process the data files? Thanks!

yangapku self-assigned this Apr 12, 2022
@yangapku
Member

Hi, since vqa_data.zip is too large, we now also provide vqa_train.tsv and vqa_test.tsv as chunked parts for download. You can use the following snippet to download the data files:

dist_source=https://ofa-beijing.oss-cn-beijing.aliyuncs.com/datasets/vqa_data
mkdir -p dataset/vqa_data/ && cd dataset/vqa_data/
# training set: chunks 00 through 36
for idx in $(seq -w 00 36); do wget "${dist_source}/vqa_train_${idx}.tsv"; done
# test set: chunks 00 through 08
for idx in $(seq -w 00 08); do wget "${dist_source}/vqa_test_${idx}.tsv"; done
wget "${dist_source}/vqa_val.tsv"
wget "${dist_source}/trainval_ans2label.pkl"
wget "${dist_source}/vqa_data_md5.txt" # for checking the integrity
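
Since the chunks are large, an interrupted download can be costly. As a rough sketch (not part of the official instructions), the loops above could be wrapped in a helper that resumes partial files with wget -c. The function names chunk_names and fetch_chunks are hypothetical and used only for illustration; the URL is the one given above.

```shell
#!/usr/bin/env bash
# Sketch: resumable download of the chunked VQA files.
# chunk_names and fetch_chunks are hypothetical helpers, not part of the OFA repository.

dist_source=https://ofa-beijing.oss-cn-beijing.aliyuncs.com/datasets/vqa_data

# Print the chunk file names <prefix>_00.tsv ... <prefix>_<last>.tsv, one per line.
chunk_names() {
  local prefix=$1 last=$2 idx
  for idx in $(seq 0 "$last"); do
    printf '%s_%02d.tsv\n' "$prefix" "$idx"
  done
}

# Download every chunk; -c continues a partially downloaded file
# instead of starting over. Stops at the first failed download.
fetch_chunks() {
  local prefix=$1 last=$2 name
  for name in $(chunk_names "$prefix" "$last"); do
    wget -c "${dist_source}/${name}" || return 1
  done
}

# Example usage (left commented out because the downloads are large):
# fetch_chunks vqa_train 36
# fetch_chunks vqa_test 8
```

If a download is interrupted, calling the same function again should continue from the partial file, provided the server supports range requests.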

Each chunk is a TSV file of 50K lines (around 3GB). You can verify the integrity of each file with md5sum, using the downloaded vqa_data_md5.txt. Concatenating the chunks yields the complete vqa_train.tsv and vqa_test.tsv files:

cat vqa_train_* > vqa_train.tsv
cat vqa_test_* > vqa_test.tsv # you can also check the md5 using vqa_data_md5.txt
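
The integrity check and the concatenation can be combined so that corrupted chunks are never merged. The following is a minimal sketch under two assumptions: that vqa_data_md5.txt uses the standard md5sum layout (one "<hash>  <filename>" pair per line), which is not confirmed here, and that GNU md5sum is available. The function names verify_chunks and join_chunks are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: verify the downloaded chunks, then concatenate them.
# Assumption: the checksum file follows the standard md5sum layout
# ("<hash>  <filename>" per line); adjust if the real file differs.

# Check every listed file that exists locally. --ignore-missing (GNU coreutils)
# skips listed files that are absent, e.g. a merged file not built yet.
# Returns non-zero if any present file fails its checksum.
verify_chunks() {
  local md5_file=$1
  md5sum -c --ignore-missing "$md5_file"
}

# Concatenate <prefix>_NN.tsv into <prefix>.tsv. The glob expands in
# lexical order, so the zero-padded indices keep the chunks in sequence.
join_chunks() {
  local prefix=$1
  cat "${prefix}"_[0-9][0-9].tsv > "${prefix}.tsv"
}

# Example usage, run inside dataset/vqa_data/:
# verify_chunks vqa_data_md5.txt && join_chunks vqa_train && join_chunks vqa_test
```

Chaining with && means the merge only runs when every present chunk passes its checksum.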

If you still have problems downloading the files, please do not hesitate to ask.
