Can't reproduce Robotcar results #45

Open
RuotongWANG opened this issue Nov 21, 2021 · 11 comments

@RuotongWANG

Hi, I tried to reproduce your result on the RobotCar Seasons V2 test set by submitting to the challenge submission server. I used the released performance-focused model pre-trained on the MSLS dataset, but I got this incorrect result:
[screenshot of benchmark results]
I also tried the model pre-trained on Pitts30k; those results are not correct either.
[screenshot of benchmark results]
Besides, the results on other datasets are normal. Is the model version I used wrong? Could you possibly release the model state that achieves the RobotCar results shown in the paper, or provide the test-set results split by condition, as in Supplementary Table 1? Thank you so much.

Best regards,

@Tobias-Fischer
Contributor

Hi,

Could you please let us know the complete process you used to obtain these results? In particular, how do you map the best match to a pose?

Best, Tobias

@RuotongWANG
Author

RuotongWANG commented Nov 22, 2021

I directly used the pose of the best-matched reference image as the estimated pose of the query. I also evaluated the SuperGlue method with the same procedure and got a normal result:
[screenshot of benchmark results]
So I think there might be something wrong with the configuration or the model state that I used.
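For reference, here is a minimal sketch of that pose-transfer procedure (the `transfer_poses` helper, file names, and pose-tuple layout are hypothetical, not from this repo):

```python
# Sketch: assign each query the absolute pose of its top-1 retrieved reference.
# The pose layout (qw qx qy qz tx ty tz) and all names here are assumptions.

def transfer_poses(top1_matches, ref_poses):
    """For each query, copy the absolute pose of its best-matched reference."""
    return {query: ref_poses[ref] for query, ref in top1_matches.items()}

# Hypothetical example with one reference pose and one retrieved match:
ref_poses = {"rear/0001.jpg": (1.0, 0.0, 0.0, 0.0, 10.0, 2.0, 0.5)}
top1 = {"query/0042.jpg": "rear/0001.jpg"}
print(transfer_poses(top1, ref_poses))
```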

@Tobias-Fischer
Contributor

Ok - @StephenHausler - let's sit together at some point to find where the culprit lies.

@marialeyvallina

Hi @StephenHausler, @Tobias-Fischer, some days ago I ran the Pittsburgh_WPCA4096 and MSLS_WPCA4096 models on RobotCar Seasons and obtained the following results with the NetVLAD retrieval:

Pittsburgh_WPCA4096:
day-all: 7.3 / 29.2 / 91.3, night-all: 0.9 / 2.6 / 2.4 -> overall 5.9 / 23.3 / 73.9
In the paper you report 7.0 / 24.9 / 76.6 for NetVLAD.

MSLS_WPCA4096:
day-all: 6.2 / 23.1 / 83.5, night-all: 0 / 0.5 / 4.2 -> overall 5.0 / 18.58 / 67.8

I calculate the overall as the weighted mean of the day and night numbers, weighted by the number of images taken at day and at night: overall = (day * 9300 + night * 2634) / (9300 + 2634)

For the Pittsburgh model the difference from the reported numbers seems reasonable to me (comparable to the variation between two different training runs), so I think the model is probably fine and the problem lies in the Patch-NetVLAD feature extraction part. I hope this info helps with the issue.
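The weighted-mean arithmetic as a quick sketch, with the query counts hard-coded from this comment (note the correction later in this thread that these are the v1 counts):

```python
def overall(day, night, n_day=9300, n_night=2634):
    """Recall averaged over day/night, weighted by the number of queries."""
    return (day * n_day + night * n_night) / (n_day + n_night)

print(round(overall(7.3, 0.9), 1))   # 5.9  (Pittsburgh_WPCA4096, tightest threshold)
print(round(overall(29.2, 2.6), 1))  # 23.3 (middle threshold)
```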

@Tobias-Fischer changed the title from "Can't reproduce Robotcar redults" to "Can't reproduce Robotcar results" on Dec 2, 2021
@HeartbreakSurvivor

> Hi @StephenHausler, @Tobias-Fischer, some days ago I ran the Pittsburgh_WPCA4096 and MSLS_WPCA4096 models on RobotCar Seasons and obtained the following results with the NetVLAD retrieval: […]

Hi, could you please tell me whether the dataset you ran the Pittsburgh_WPCA4096 model on is RobotCar Seasons V1 or V2?

@marialeyvallina

Hi @HeartbreakSurvivor, I ran RobotCar Seasons V2.

@HeartbreakSurvivor

> Hi @HeartbreakSurvivor, I ran RobotCar Seasons V2.

Hi, the thing is that RobotCar Seasons v1 has 9300 + 2634 = 11934 query images while v2 has 1872 query images, yet you said you ran RobotCar Seasons V2 and still calculated the overall with:

overall = (day * 9300 + night * 2634) / (9300 + 2634)

I don't know why, but it doesn't matter much.

What I really wonder is how you got these results. Did you just follow the QuickStart in the README.md file?
I also ran the Pittsburgh_WPCA4096 model on RobotCar Seasons V2 but got wrong results and don't know why. I just ran feature_extract.py and feature_match.py to get 'PatchNetVLAD_predictions.txt', took the pose of the best-matched database image as the estimated pose for each query image, and submitted the result to the benchmark website, but got wrong answers.
So I hope you could tell me how you obtained your results, which seem reasonable. Thanks.
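For concreteness, this is roughly how I pick the top-1 match from the predictions file. I am assuming each non-comment line has the form 'query_path reference_path' with matches listed best-first, so please verify this against the actual file layout:

```python
def top1_from_predictions(path):
    """Keep only the first (best-ranked) reference listed for each query.
    Assumes '<query_path> <reference_path>' lines, best match first, and
    '#'-prefixed comment lines -- check this against the real file."""
    top1 = {}
    with open(path) as f:
        for line in f:
            if not line.strip() or line.startswith('#'):
                continue
            query, ref = line.split()[:2]
            top1.setdefault(query, ref)  # only the first occurrence per query
    return top1

top1 = top1_from_predictions('PatchNetVLAD_predictions.txt')
```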

@marialeyvallina

Hi again @HeartbreakSurvivor

> Hi, the thing is that RobotCar Seasons v1 has 9300 + 2634 = 11934 query images while v2 has 1872 query images, yet you said you ran RobotCar Seasons V2 and still calculated the overall with: […]

Thank you very much for pointing this out; it seems I indeed mixed up the two versions. The overall should instead be calculated as:
overall = (day * 1443 + night * 429) / (1443 + 429)
The day/night distribution is very similar between v1 and v2, so the results do not change much:
Pittsburgh_WPCA4096: day-all: 7.3 / 29.2 / 91.3, night-all: 0.9 / 2.6 / 2.4 -> overall 5.8 | 23.1 | 73.2
MSLS_WPCA4096: day-all: 6.2 / 23.1 / 83.5, night-all: 0 / 0.5 / 4.2 -> overall 4.8 | 17.9 | 65.3

I do indeed use feature_extract.py and feature_match.py, and then use the NetVLAD_predictions.txt file (I have not evaluated Patch-NetVLAD yet, only NetVLAD). You have to be careful with the format of the poses, as explained in the dataset readme, but the retrieval itself should be fine.
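On the pose format: as far as I know, the benchmark expects each submission line as 'name qw qx qy qz tx ty tz' for the world-to-camera transform, so camera-to-world reference poses must be inverted first. A hedged sketch of that conversion (treat the dataset readme as the authoritative convention, not this snippet):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def submission_line(name, T_cam_to_world):
    """Format one submission line, assuming the benchmark wants the
    world-to-camera pose as 'name qw qx qy qz tx ty tz' -- double-check
    against the dataset readme before submitting."""
    T = np.linalg.inv(T_cam_to_world)  # camera-to-world -> world-to-camera
    qx, qy, qz, qw = Rotation.from_matrix(T[:3, :3]).as_quat()  # scipy returns (x, y, z, w)
    tx, ty, tz = T[:3, 3]
    return f"{name} {qw} {qx} {qy} {qz} {tx} {ty} {tz}"

print(submission_line("rear/0042.jpg", np.eye(4)))  # identity-pose example
```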

@HeartbreakSurvivor

> Thank you very much for pointing this out; it seems I indeed mixed up the two versions. […] You have to be careful with the format of the poses, as explained in the dataset readme, but the retrieval itself should be fine.

Thank you very much for the reply, I will check my code.

@HeartbreakSurvivor

Hi again @marialeyvallina, it seems I have the same problem as you. I ran the Pittsburgh_WPCA4096 model on RobotCar Seasons V1 and obtained the following results with the NetVLAD retrieval:

day-all: 6.3 / 25.4 / 87.6, night-all: 0.8 / 2.5 / 16.5

which seem reasonable to me.
But when I use the Patch-NetVLAD retrieval, the results seem wrong:

day-all: 2.1 / 8.3 / 36.7, night-all: 0.1 / 1.3 / 13.9

I have tested Pittsburgh_WPCA4096 on RobotCar Seasons V1 twice just in case, but got the same result, shown below:
[screenshot of benchmark results]

So I agree with your point; the problem may lie in the Patch-NetVLAD feature extraction or feature matching part.
Hi @Tobias-Fischer, any hints about this issue? Did you test on the RobotCar Seasons V1 dataset? If so, could you please provide the test results?

@Tobias-Fischer
Contributor

Hi, @StephenHausler and I will be looking at this. However, the holiday season is coming up and we're tied up with other projects.

We haven't ever checked V1 as far as I remember.

I'm assuming you are aware that lower scores are better for NetVLAD (distances), but higher scores are better for Patch-NetVLAD (number of inliers)? So you need an argmax instead of an argmin to get the top-1 match.
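In code terms, given the scores of all references against one query (a minimal sketch with made-up numbers):

```python
import numpy as np

netvlad_distances = np.array([0.31, 0.12, 0.58])  # lower is better (descriptor distance)
patch_inlier_counts = np.array([140, 512, 87])    # higher is better (spatial inliers)

best_netvlad = int(np.argmin(netvlad_distances))  # index 1
best_patch = int(np.argmax(patch_inlier_counts))  # index 1
```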
