Embeddings models? #43

Closed
walking-octopus opened this issue Mar 21, 2023 · 7 comments

@walking-octopus

Is it possible to use GGML for faster, more portable computation of sentence embeddings? That might make for a useful offline text search tool.
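
To make the search idea concrete, here is a minimal sketch of ranking documents against a query by cosine similarity. It assumes the sentence embeddings have already been computed by some model and are plain lists of floats; `normalize` and `top_k` are hypothetical helper names, not part of GGML.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k documents most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for idx, doc in enumerate(doc_vecs):
        d = normalize(doc)
        # The dot product of unit vectors equals their cosine similarity.
        scored.append((sum(a * b for a, b in zip(q, d)), idx))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]
```

For a small corpus a linear scan like this is probably enough; an approximate nearest-neighbor index would only matter at much larger scale.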

walking-octopus changed the title from "Embeddings model?" to "Embeddings models?" on Mar 21, 2023
@rekola

rekola commented Mar 22, 2023

I'm interested in this too. Meanwhile I'm using Facebook StarSpace for sentence embeddings. It's very fast and easy to use. The official version, however, has some issues which are fixed here: https://github.com/rekola/StarSpace

@ggerganov
Owner

See the work here: ggerganov/llama.cpp#282
Could be relevant

@walking-octopus
Author

That's cool! But does LLaMA have a tiny version similar to OpenAI Ada to avoid wasting resources? I don't think most use-cases need anything more than BERT, whose inference would be quite cool to see in GGML.

@walking-octopus
Author

> I'm interested in this too. Meanwhile I'm using Facebook StarSpace for sentence embeddings. It's very fast and easy to use. The official version, however, has some issues which are fixed here: https://github.com/rekola/StarSpace

Sounds interesting. Does StarSpace have a pretrained general model? One use-case I had was converting Whisper VTT files into paragraph-split transcripts, which worked by comparing the similarity of each sentence to the previous one, inserting two line-breaks if a threshold is met. Maybe this could even become an official Whisper demo if something like this were available.
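
A rough sketch of that paragraph-splitting step, assuming the sentence embeddings are already available as lists of floats and that "a threshold is met" means the cosine similarity between neighboring sentences drops below a cutoff. The function names and the default `threshold=0.5` are placeholders for illustration, not values from my script.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def split_into_paragraphs(sentences, embeddings, threshold=0.5):
    """Join sentences into paragraphs, breaking where neighbors are dissimilar."""
    if len(sentences) != len(embeddings):
        raise ValueError("need exactly one embedding per sentence")
    if not sentences:
        return ""
    paragraphs = [[sentences[0]]]
    for i in range(1, len(sentences)):
        similarity = cosine_similarity(embeddings[i - 1], embeddings[i])
        if similarity < threshold:
            # Assumed rule: low similarity to the previous sentence starts a new paragraph.
            paragraphs.append([sentences[i]])
        else:
            paragraphs[-1].append(sentences[i])
    # Two line breaks between paragraphs, as described above.
    return "\n\n".join(" ".join(p) for p in paragraphs)
```

The cutoff would need tuning per embedding model, since similarity scores are not comparable across models.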

@rekola

rekola commented Mar 23, 2023

> > I'm interested in this too. Meanwhile I'm using Facebook StarSpace for sentence embeddings. It's very fast and easy to use. The official version, however, has some issues which are fixed here: https://github.com/rekola/StarSpace

> Sounds interesting. Does StarSpace have a pretrained general model?

It doesn't. So far, I've trained a Finnish model using social media data, and I will be testing a multi-lingual model next.

> One use-case I had was converting Whisper VTT files into paragraph-split transcripts, which worked by comparing the similarity of each sentence to the previous one, inserting two line-breaks if a threshold is met.

That's interesting. I'll have to try that when I need paragraph vectors.

@skeskinen
Contributor

> That's cool! But does LLaMA have a tiny version similar to OpenAI Ada to avoid wasting resources? I don't think most use-cases need anything more than BERT, whose inference would be quite cool to see in GGML.

I've implemented BERT in ggml here: https://github.com/skeskinen/bert.cpp

@walking-octopus
Author

> > That's cool! But does LLaMA have a tiny version similar to OpenAI Ada to avoid wasting resources? I don't think most use-cases need anything more than BERT, whose inference would be quite cool to see in GGML.

> I've implemented BERT in ggml here: https://github.com/skeskinen/bert.cpp

It works amazingly! Since it's such a tiny model, I've always wondered whether the reason it was so sluggish on my laptop was just a ton of Python bloat. Turns out that guess was indeed correct! bert.cpp could go through every sentence in a full 15-minute video transcript in less than 3 seconds, while the Python version spent that amount of time on a single sentence. Thanks a lot!

Perhaps it could be integrated into whisper.cpp? I was planning on using it in my tiny Python script that transcribes a video through Whisper, generates vector embeddings, compares each pair of sentences to each other, and, if they're different enough, splits them into paragraphs. Perhaps something like this can become an official example? Either using this naive method or this more accurate one that I've found in a Medium article.

I think going from a video to a formatted blog post without sending a byte of data into the cloud could help a ton of people and would make for a cool demo.
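
For the first stage of such a pipeline, here is a minimal sketch of turning Whisper's VTT output into a flat list of sentences that could then be embedded. It assumes plain WebVTT cues with no styling tags, and the punctuation-based sentence split is a simplification; `vtt_to_sentences` is a hypothetical helper, not something from whisper.cpp or bert.cpp.

```python
import re

def vtt_to_sentences(vtt_text):
    """Extract cue text from a WebVTT string and split it into sentences."""
    text_lines = []
    normalized = vtt_text.replace("\r\n", "\n").strip()
    # Cues are separated by blank lines; blocks without a timing line
    # (the WEBVTT header, NOTE blocks) are skipped.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        for idx, line in enumerate(lines):
            if "-->" in line:
                # Everything after the timing line is the cue's text.
                text_lines.extend(l.strip() for l in lines[idx + 1:] if l.strip())
                break
    joined = " ".join(text_lines)
    # Simplified rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", joined) if s]
```

Joining cue text before splitting matters because Whisper cues often end mid-sentence, so per-cue embeddings would compare fragments rather than sentences.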

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this issue Dec 18, 2023