Questions about refcocog datasets split. #7

zzzzzz0407 · 2020-09-16T04:54:04Z

Hello, thanks for your wonderful work.
In your paper, you state that you use the UNC partition for the RefCOCOg dataset.

However, your DATA_PREP only provides the umd split.

If I use the default umd split, the total number of expressions for train + val + test is 95010, far fewer than the original 104560.

Could you help me figure out what's wrong? Thanks very much!

luogen1996 · 2020-09-16T08:13:56Z

Thank you for pointing out this typo. The description in the paper should probably be 'umd' instead of 'unc'. In fact, the original RefCOCOg (google) has 104,560 expressions and contains only 'train' and 'val' splits. When it was re-split into the 'umd' partition, some expressions were filtered out.
See the details in the dataset paper: https://arxiv.org/pdf/1608.00525.pdf
“Our training partition contains 23199 images with 67996 objects. Some objects have multiple referring expressions and hence the total number of referring expressions is 85,408. The validation partition contains 2600 images with 7623 objects and 9602 referring expressions.”
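
The quoted figures can be checked against the totals in the question with a little arithmetic. This is a minimal sketch that uses only numbers stated in this thread; the variable names are illustrative.

```python
# Expression counts from the passage quoted above.
paper_train = 85408
paper_val = 9602

# Totals reported in the question.
umd_total_reported = 95010
google_total = 104560

umd_total = paper_train + paper_val
filtered_out = google_total - umd_total

print(umd_total)     # 95010
print(filtered_out)  # 9550
```

The two quoted partitions add up to the 95010 expressions counted in the question, which leaves 9550 of the original 104,560 expressions outside this split.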

zzzzzz0407 · 2020-09-16T08:28:27Z

Thanks for the quick reply. However, when I count the numbers with the umd split, the results are as follows.
RefCOCOg: 25799 images / 49822 objects / 95010 expressions (~8.4 words)
train: 42226 / 80512, val: 2573 / 4896, test: 5023 / 9602 (objects / expressions)
The script is as follows:
`file_path = "/data00/home/zhangrufeng1/projects/mcn/data/anns/refcocog/test.txt"
with open(file_path, "r") as f:
    lines = f.readlines()

num_sen = 0    # total number of expressions
num_token = 0  # total number of whitespace-separated tokens
for line in lines:
    # Fields are separated by "~"; everything after the first field is an expression.
    ann = line.strip().split("~")[1:]
    num_sen += len(ann)

    for tokens in ann:
        num_token += len(tokens.split())

ave_token = num_token / num_sen
print("Number Sentences: {}, Ave Tokens: {}".format(num_sen, ave_token))`

Anyway, thanks very much for resolving my concern.
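
For reference, the per-file count can be wrapped in a function and applied to every split. This is a minimal sketch, assuming each annotation line describes one object, with fields separated by `~` and every field after the first being one expression. `count_split` and the sample lines are hypothetical; only `test.txt` is a file name confirmed in this thread.

```python
def count_split(lines):
    """Return (objects, expressions, tokens) for one split's annotation lines."""
    num_obj = num_sen = num_token = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the first field is not an expression, the rest are.
        expressions = line.split("~")[1:]
        num_obj += 1
        num_sen += len(expressions)
        num_token += sum(len(e.split()) for e in expressions)
    return num_obj, num_sen, num_token


# Made-up sample lines in the assumed format, not real dataset entries.
sample = [
    "obj_a~the man in the red shirt~man on the left",
    "obj_b~a dog lying on the couch",
]
objects, sentences, tokens = count_split(sample)
print(objects, sentences, tokens)

# Per-split (objects, expressions) counts reported in this thread.
reported = {"train": (42226, 80512), "val": (2573, 4896), "test": (5023, 9602)}
total_objects = sum(o for o, _ in reported.values())
total_expressions = sum(e for _, e in reported.values())
print(total_objects, total_expressions)  # 49822 95010
```

The reported per-split counts sum to the 49822 objects and 95010 expressions given above. Train + val comes to 85408 expressions and test to 9602, which line up with the training and validation figures quoted from the dataset paper.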
