llama.cpp ./embedding #17

Open · Philipp-Sc opened this issue Oct 25, 2023 · 4 comments
Labels: question (Further information is requested)

Comments

@Philipp-Sc

Is there no rust binding to get the embeddings?

Using llama.cpp one would use:

./embedding -m ./path/to/model --log-disable -p "Hello World!" 2>/dev/null
@mdrokz (Owner) commented Oct 25, 2023

> Is there no rust binding to get the embeddings?
>
> Using llama.cpp one would use:
>
> ./embedding -m ./path/to/model --log-disable -p "Hello World!" 2>/dev/null

Have you tried the `pub fn embeddings(...)` method? It will get you the embeddings for the prompt.
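
Something along these lines, as a minimal sketch. I'm assuming here that `embeddings` takes the prompt together with a `PredictOptions` and returns a `Vec<f32>`; check the signature in the crate for the exact types:

```rust
use llama_cpp_rs::{
    options::{ModelOptions, PredictOptions},
    LLama,
};

fn main() {
    // embeddings must be enabled on the model options, otherwise
    // the embeddings() call has nothing to return.
    let model_options = ModelOptions {
        embeddings: true,
        ..Default::default()
    };

    let llama = LLama::new("./path/to/model".into(), &model_options).unwrap();

    // Assumed signature: prompt plus PredictOptions, returning Vec<f32>.
    let embd = llama
        .embeddings("Hello World!".into(), PredictOptions::default())
        .unwrap();

    println!("{:?}", embd);
}
```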

@mdrokz added the question label on Oct 25, 2023
@Philipp-Sc (Author)

@mdrokz thank you for your response.

I tried it before falling back to the llama.cpp ./embedding executable directly.

The function would always return an empty vector:

[]

I tried multiple configurations but could not fix the issue.

@mdrokz (Owner) commented Oct 26, 2023

> @mdrokz thank you for your response.
>
> I tried it before falling back to the llama.cpp ./embedding executable directly.
>
> The function would always return an empty vector:
>
> []
>
> I tried multiple configurations but could not fix the issue.

Alright, I will test on my end and see what's happening. Thanks.

@Philipp-Sc (Author) commented Oct 27, 2023

I was using zephyr-7B-alpha-GGUF with:

context_size: 8192 
n_batch: 512 
embeddings: true

without any GPU assistance.
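
In code, that setup looks roughly like this (a sketch; I'm assuming the names above map 1:1 onto the crate's `ModelOptions` fields):

```rust
use llama_cpp_rs::options::ModelOptions;

fn main() {
    // Options as listed above; everything else stays at its default,
    // including n_gpu_layers, so nothing is offloaded to the GPU.
    let model_options = ModelOptions {
        context_size: 8192,
        n_batch: 512,
        embeddings: true,
        ..Default::default()
    };
    let _ = model_options; // passed to LLama::new(...) in the real code
}
```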


Note:
There was also some strange behavior involving n_batch and n_token, where a longer prompt (still well below the context length) led to an unexpected error:

GGML_ASSERT: n_token <= n_batch

Presumably the whole prompt is evaluated as a single batch, so any prompt that tokenizes to more than n_batch tokens trips this assert.

Right now my workaround is a Rust wrapper (Command::new) around the ./embedding binary that reads stdout into a string and parses the float values into a vector. The only parameters I set are --ctx-size 8192 and --mlock.
I imagine this is less efficient, as the model needs to be reloaded for each call.
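
The wrapper looks roughly like this (a sketch with placeholder paths, not my exact code):

```rust
use std::process::Command;

// Shell out to the llama.cpp embedding binary and parse the
// whitespace-separated floats it prints to stdout.
fn embed(prompt: &str) -> Result<Vec<f32>, Box<dyn std::error::Error>> {
    let output = Command::new("./embedding")
        .args([
            "-m", "./path/to/model",
            "--log-disable",
            "--ctx-size", "8192",
            "--mlock",
            "-p", prompt,
        ])
        .output()?;

    let stdout = String::from_utf8(output.stdout)?;

    // Keep only the tokens that parse as f32; anything else on
    // stdout is ignored.
    Ok(stdout
        .split_whitespace()
        .filter_map(|tok| tok.parse::<f32>().ok())
        .collect())
}

fn main() {
    let embd = embed("Hello World!").unwrap();
    println!("dim = {}", embd.len());
}
```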
