# LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
To use LLM2Vec, first install the llm2vec package from PyPI:

```bash
pip install llm2vec
```
You can also install it from source by cloning the repository and running:

```bash
pip install -e .
```
LLM2Vec is a generic wrapper that takes a `model` and a `tokenizer`. First, we load the model and tokenizer using the `transformers` library:
```python
import torch
from transformers import AutoTokenizer, AutoModel, AutoConfig

# Load the configuration, weights, and tokenizer from the Hugging Face Hub.
config = AutoConfig.from_pretrained(
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp", trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp",
    trust_remote_code=True,
    config=config,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp")
```
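Optionally, you can move the model to a GPU before encoding. This is a minimal sketch; the CUDA device here is an assumption about your setup:

```python
# Assumes a CUDA-capable GPU is available; falls back to CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
```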
Then, we define our LLM2Vec model as follows:
```python
from llm2vec import LLM2Vec

l2v = LLM2Vec(model, tokenizer)
```
This model now returns the text embedding for any input in the form `[[instruction, text]]`.
```python
inputs = [
    # Each input pairs an instruction with a text; use an empty string for no instruction.
    ["Retrieve duplicate questions from StackOverflow forum", "Python (Numpy) array sorting"],
    ["", "Sort a list in python"],
    ["", "Sort an array in Java"],
]
reps = l2v.encode(inputs, convert_to_tensor=True)

# The instructed query is more similar to the Python question than to the Java one.
sim_pos = torch.nn.functional.cosine_similarity(reps[0].unsqueeze(0), reps[1].unsqueeze(0))  # tensor([0.5987])
sim_neg = torch.nn.functional.cosine_similarity(reps[0].unsqueeze(0), reps[2].unsqueeze(0))  # tensor([0.5585])
```
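To score more than one pair at a time, the embeddings can be compared jointly. The sketch below (our addition, not part of the llm2vec API) normalizes the embeddings to unit length and computes the full cosine-similarity matrix with a single matrix multiplication:

```python
import torch.nn.functional as F

# Normalize every embedding to unit length; the matrix product of the
# normalized embeddings then gives all pairwise cosine similarities.
normalized = F.normalize(reps, p=2, dim=1)
similarity_matrix = normalized @ normalized.T  # shape: (3, 3)
print(similarity_matrix)
```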
Training code will be available soon.
If you have any questions about the code, feel free to email Parishad ([email protected]) and Vaibhav ([email protected]).