Embeddings models? #43

walking-octopus · 2023-03-21T17:06:41Z

Is it possible to use GGML for faster and more portable computation of sentence embeddings? That might make for a useful offline text search tool.
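To make the offline search idea concrete, here is a minimal sketch of ranking documents by cosine similarity to a query. It assumes the embeddings have already been computed by some model and are available as plain lists of floats; the names `normalize` and `search` are hypothetical and not part of GGML or any other library.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def search(query_vec, doc_vecs, top_k=3):
    """Return up to top_k (index, score) pairs, best match first.

    The score is the cosine similarity between the query embedding and
    each document embedding. How the embeddings are produced is left open.
    """
    q = normalize(query_vec)
    scored = []
    for idx, vec in enumerate(doc_vecs):
        d = normalize(vec)
        score = sum(a * b for a, b in zip(q, d))
        scored.append((idx, score))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

A brute-force scan like this is usually enough for a few thousand sentences; larger collections would likely want an approximate nearest-neighbour index instead.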

rekola · 2023-03-22T11:58:51Z

I'm interested in this too. Meanwhile I'm using Facebook StarSpace for sentence embeddings. It's very fast and easy to use. The official version, however, has some issues which are fixed here: https://github.com/rekola/StarSpace

ggerganov · 2023-03-22T20:18:39Z

See the work here: ggerganov/llama.cpp#282
Could be relevant

walking-octopus · 2023-03-22T21:02:30Z

That's cool! But does LLaMA have a tiny version similar to OpenAI Ada to avoid wasting resources? I don't think most use-cases need anything more than BERT, whose inference would be quite cool to see in GGML.

walking-octopus · 2023-03-22T21:06:43Z

> I'm interested in this too. Meanwhile I'm using Facebook StarSpace for sentence embeddings. It's very fast and easy to use. The official version, however, has some issues which are fixed here: https://github.com/rekola/StarSpace

Sounds interesting. Does StarSpace have a pretrained general model? One use-case I had was converting Whisper VTT files into paragraph-split transcripts, which worked by comparing the similarity of each sentence to the previous one and inserting two line breaks when the similarity dropped below a threshold. Maybe this could even become an official Whisper demo if something like this were available.
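A minimal sketch of that paragraph-splitting idea, assuming each sentence already has an embedding stored as a plain list of floats. The function names and the default threshold of 0.5 are illustrative assumptions, not taken from the original script.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def split_into_paragraphs(sentences, embeddings, threshold=0.5):
    """Join sentences into text, starting a new paragraph on topic shifts.

    Each sentence is compared with the previous one. If the similarity is
    below `threshold` (an assumed, tunable value), two line breaks are
    inserted; otherwise the sentences are joined with a space.
    """
    if len(sentences) != len(embeddings):
        raise ValueError("need exactly one embedding per sentence")
    if not sentences:
        return ""
    parts = [sentences[0]]
    for i in range(1, len(sentences)):
        sim = cosine_similarity(embeddings[i - 1], embeddings[i])
        parts.append("\n\n" if sim < threshold else " ")
        parts.append(sentences[i])
    return "".join(parts)
```

The threshold depends heavily on the embedding model, so it would need tuning on a few real transcripts.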

rekola · 2023-03-23T18:22:18Z

> > I'm interested in this too. Meanwhile I'm using Facebook StarSpace for sentence embeddings. It's very fast and easy to use. The official version, however, has some issues which are fixed here: https://github.com/rekola/StarSpace
>
> Sounds interesting. Does StarSpace have a pretrained general model?

It doesn't. So far, I've trained a Finnish model using social media data, and I will be testing a multi-lingual model next.

> One use-case I had was converting Whisper VTT files into paragraph-split transcripts, which worked by comparing the similarity of each sentence to the previous one and inserting two line breaks when the similarity dropped below a threshold.

That's interesting. I'll have to try that when I need paragraph vectors.

skeskinen · 2023-04-27T13:50:41Z

> That's cool! But does LLaMA have a tiny version similar to OpenAI Ada to avoid wasting resources? I don't think most use-cases need anything more than BERT, whose inference would be quite cool to see in GGML.

I've implemented BERT in ggml here: https://github.com/skeskinen/bert.cpp

walking-octopus · 2023-04-27T22:08:19Z

> > That's cool! But does LLaMA have a tiny version similar to OpenAI Ada to avoid wasting resources? I don't think most use-cases need anything more than BERT, whose inference would be quite cool to see in GGML.
>
> I've implemented BERT in ggml here: https://github.com/skeskinen/bert.cpp

It works amazingly! Since BERT is such a tiny model, I had always wondered whether the reason it was so sluggish on my laptop was just a ton of Python bloat. Turns out that guess was correct! bert.cpp could go through each sentence in a full 15-minute video transcript in less than 3 seconds, while the Python version spent that much time on a single sentence. Thanks a lot!

Perhaps it could be integrated within whisper.cpp? I was planning on using it in my tiny Python script that transcribes a video through Whisper, generates vector embeddings, compares each pair of sentences, and, if they're different enough, splits them into paragraphs. Perhaps something like this can become an official example? Either using this naive method or this more accurate one that I found in a Medium article.

I think going from a video to a formatted blog post without sending a byte of data into the cloud could help a ton of people and would make for a cool demo.

walking-octopus changed the title ~~Embeddings model?~~ Embeddings models? Mar 21, 2023

walking-octopus mentioned this issue Apr 27, 2023

Implementing Roberta #56

Open

walking-octopus closed this as completed May 17, 2023

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this issue Dec 18, 2023

Reduce model loading time (ggerganov#43)

63fd76f

* Use buffering * Use vector * Minor --------- Co-authored-by: Georgi Gerganov <[email protected]>
