
About calling the model via huggingface #111

Open

allendred opened this issue Nov 25, 2023 · 1 comment

Comments

@allendred

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('m3e-base/')
model = AutoModel.from_pretrained('m3e-base/')
model.eval()

def get_sentence_embedding(sentence, tokenizer, model):
    input_ids = tokenizer.encode(sentence, return_tensors='pt')
    with torch.no_grad():
        outputs = model(input_ids)
    last_hidden_state = outputs[0]
    # Unmasked mean over every token position, including special tokens
    sentence_embedding = torch.mean(last_hidden_state[0], dim=0)
    return sentence_embedding.numpy()

Is there anything wrong with calling the model this way? The results differ from sentence-transformers.

@wangyuxinwhy
Owner

Is the difference fairly small? In general, mean pooling needs to account for padding tokens, so you need to apply the attention mask.
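The masked mean the maintainer describes can be sketched as follows. This is a minimal illustration, not the project's own code; `masked_mean_pooling` is a hypothetical helper name, and it assumes you pass the model's `last_hidden_state` together with the tokenizer's `attention_mask`:

```python
import torch

def masked_mean_pooling(last_hidden_state, attention_mask):
    # Expand the mask to the hidden dimension so padding positions
    # contribute nothing to either the sum or the token count.
    mask = attention_mask.unsqueeze(-1).float()          # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)       # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)             # avoid divide-by-zero
    return summed / counts

# Toy check: the padded third position must not affect the mean.
hidden = torch.tensor([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1, 0]])
print(masked_mean_pooling(hidden, mask))  # tensor([[2., 3.]])
```

With `torch.mean(last_hidden_state[0], dim=0)` as in the snippet above, the `[100., 100.]` padding row would be averaged in, which is the kind of discrepancy against sentence-transformers being discussed here.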

2 participants