AnyLoc-VLAD-DINO : "ViT-S8 layer 9 key facet features and 128 clusters for VLAD." #12

Open
Waycn opened this issue Oct 10, 2023 · 1 comment

@Waycn

Waycn commented Oct 10, 2023

If I use "AnyLoc-VLAD-DINO", how do I load the model and download the vocabulary?
[Screenshot 2023-10-10 18-14-56]
[Screenshot 2023-10-09 10-25-33]

@TheProjectsGuy
Collaborator

Hey @Waycn, thanks for taking an interest in our work.

AnyLoc-VLAD-DINO

If you specifically need AnyLoc-VLAD-DINO (which uses features from DINOv1, not DINOv2), note that we haven't created a simple demo script to extract VLAD descriptors from a set of images. The following files could help:

  • dino_global_vocab_vlad.py: VLAD with cluster centers obtained from a set of datasets. You'll need to download the datasets and wait for the program to generate the cache files. A sample command can be found by referring to the ablations.
  • dino_extractor.py: The wrapper for extracting ViT features from DINOv1 is in class ViTExtractor. This is similar to DinoV2ExtractFeatures for DINOv2 (see section on DINOv2 below).

Note for #7: We can consider this for a future release.

In the meantime, I suggest you use DINOv2 (the better-performing variant) included in our demos.

AnyLoc-VLAD-DINOv2

As you highlighted in your first image, you can load the model from scratch (using torch.hub). However, as done in our demo to extract global descriptors, I suggest you use our DinoV2ExtractFeatures class from utilities.py. This will allow you to extract features from any facet of any block/layer in the ViT. AnyLoc-VLAD-DINOv2 uses ViT-G/14, layer 31, value facet.
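
To illustrate what a "facet" means here, below is a minimal NumPy sketch. It assumes the attention block computes query, key and value with a single fused linear layer whose output is laid out as [query | key | value] along the last dimension, as in the public DINOv2 ViT code. The function name `split_qkv_facets` is hypothetical and is not part of utilities.py; DinoV2ExtractFeatures may be implemented differently.

```python
import numpy as np


def split_qkv_facets(qkv_out):
    """Split a fused qkv activation into its three facets.

    qkv_out: array of shape (batch, tokens, 3 * dim), assumed to be the
             output of a fused qkv linear layer (hypothetical input).
    Returns a dict with "query", "key" and "value", each (batch, tokens, dim).
    """
    if qkv_out.shape[-1] % 3 != 0:
        raise ValueError("last dimension must be divisible by 3")
    dim = qkv_out.shape[-1] // 3
    return {
        "query": qkv_out[..., :dim],
        "key": qkv_out[..., dim:2 * dim],
        "value": qkv_out[..., 2 * dim:],
    }


# Toy activation: 2 images, 5 tokens, dim 4 (so the fused width is 12).
toy_qkv = np.arange(2 * 5 * 12, dtype=np.float32).reshape(2, 5, 12)
toy_facets = split_qkv_facets(toy_qkv)
value_features = toy_facets["value"]  # the facet AnyLoc-VLAD-DINOv2 uses
```

With a real model, the input would be the activation captured at the chosen layer (for example through a forward hook) rather than a toy array.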

We provide a VLAD class in our utilities. You'll need to download the vocabulary (as you have already done in image 2). As long as the cluster centers file is found, the script should automatically detect it and let you generate VLAD (global/image) descriptors using the cached vocabulary.
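
For intuition about what the vocabulary is used for, here is a minimal NumPy sketch of VLAD aggregation with hard assignment. It is not the VLAD class from our utilities: the function name `vlad_descriptor`, the hard assignment, and the normalization steps (per-cluster, then global L2) are assumptions made for illustration, and the cluster centers here are random stand-ins for a downloaded vocabulary.

```python
import numpy as np


def vlad_descriptor(features, centers):
    """Aggregate local features into one VLAD vector (illustrative sketch).

    features: (num_features, dim) local descriptors of one image.
    centers:  (num_clusters, dim) cluster centers (the "vocabulary").
    Returns a flat vector of length num_clusters * dim.
    """
    # Hard-assign every feature to its nearest cluster center.
    sq_dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dists.argmin(axis=1)

    num_clusters, dim = centers.shape
    vlad = np.zeros((num_clusters, dim), dtype=np.float64)
    for k in range(num_clusters):
        members = features[labels == k]
        if len(members) > 0:
            # Sum of residuals between the members and their center.
            vlad[k] = (members - centers[k]).sum(axis=0)

    # Normalize each cluster's residual sum, then the whole vector.
    row_norms = np.linalg.norm(vlad, axis=1, keepdims=True)
    vlad = vlad / np.maximum(row_norms, 1e-12)
    flat = vlad.ravel()
    return flat / max(np.linalg.norm(flat), 1e-12)


# Toy usage: 50 random 8-dim features and a random 4-cluster vocabulary.
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(50, 8))
toy_centers = rng.normal(size=(4, 8))
toy_descriptor = vlad_descriptor(toy_features, toy_centers)
```

The descriptor length is the number of clusters times the feature dimension, which is why the same vocabulary must be used for every image you want to compare.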

Hope this helps clarify your doubts. Let me know if there are any further issues.
