
How to generate class embedding files. #8

Closed
wjhmike95 opened this issue Mar 10, 2021 · 3 comments
@wjhmike95

Hi, thanks for your great work.
I'm just confused about how you generated the class embedding files (fastText, GloVe).
How does the index in the class embedding files map to the class ID?
Could you provide a little more detail about generating the class embedding files?
Thanks!

@nasir6
Owner

nasir6 commented Mar 12, 2021

@wjhmike95 the class embeddings shared with the codebase are sorted in the order of the classes defined here.

@zhongxiangzju

Hi, did you figure out how to generate the 300-dimensional vector for each label?
I tried to generate word embeddings for each label in the COCO dataset using gensim, but got different results.

import numpy as np
import gensim.downloader 
print(list(gensim.downloader.info()['models'].keys()))
# ['fasttext-wiki-news-subwords-300', 'conceptnet-numberbatch-17-06-300', 'word2vec-ruscorpora-300', 'word2vec-google-news-300', 'glove-wiki-gigaword-50', 'glove-wiki-gigaword-100', 'glove-wiki-gigaword-200', 'glove-wiki-gigaword-300', 'glove-twitter-25', 'glove-twitter-50', 'glove-twitter-100', 'glove-twitter-200', '__testing_word2vec-matrix-synopsis']
word_vectors = gensim.downloader.load('word2vec-google-news-300')
person_embedding = word_vectors['person']
person_embedding  = person_embedding / np.linalg.norm(person_embedding)
print(person_embedding)
# 0.1208263, -0.1084009, 0.00755164, 0.07369547, -0.06384084, 0.07026777 ...

which differs from the first row of ./zero_shot_detection/MSCOCO/word_w2v.txt:
0.092629, 0.013665, 0.037897, 0.034125, 0.015237, 0.034970 ...
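For what it's worth, given the owner's note that the shipped embeddings are sorted in class order, the overall shape of such a script is straightforward even if the exact source model differs. Below is a minimal sketch (not the authors' script): it builds a matrix whose row index matches the class ID by iterating a class list in order and L2-normalizing each vector. A toy lookup table stands in for a real model such as `gensim.downloader.load('word2vec-google-news-300')`, and the output filename `word_embeddings.npy` is hypothetical.

```python
import numpy as np

# Subset of a class list, in class-id order. In practice this would be the
# full COCO class list referenced by the repo owner above.
classes = ['person', 'bicycle', 'car']

# Toy stand-in for a real embedding model; a gensim KeyedVectors object
# supports the same `model[word]` lookup.
toy_model = {
    'person':  np.array([1.0, 0.0, 0.0]),
    'bicycle': np.array([0.0, 2.0, 0.0]),
    'car':     np.array([0.0, 0.0, 3.0]),
}

rows = []
for name in classes:                      # row i of the matrix <-> class id i
    vec = toy_model[name].astype(np.float64)
    vec = vec / np.linalg.norm(vec)       # L2-normalize, as in the snippet above
    rows.append(vec)

embeddings = np.stack(rows)               # shape: (num_classes, dim)
np.save('word_embeddings.npy', embeddings)  # hypothetical output filename
```

This reproduces the indexing scheme (row = class ID), but not necessarily the exact values in word_w2v.txt, which depend on which pretrained model the authors used.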

@GuangyuanLiu1999

> (quoting @zhongxiangzju's comment above)

Hi, have you figured out how to solve this problem? Do you know how to generate the fasttext.npy file?
