
Nomic Embed integration via llama.cpp BERT implementation #2086

Merged: 41 commits into main, Mar 13, 2024

Conversation

@cebtenzzre (Member) commented on Mar 7, 2024

Backend changes

  • Removed bert.cpp
  • Implemented LLamaModel::embed on top of the new llama.cpp BERT implementation, which intelligently splits and batches inputs depending on their length and n_ctx
  • Added BERT and Nomic BERT to llama.cpp arch whitelist
  • Explicitly blacklist old MiniLM GGUF
  • Dynamic supportsCompletion/supportsEmbedding based on model arch
  • Implemented compatibility with llama.cpp PR "llama : fix embeddings" (ggerganov/llama.cpp#5796)
  • Implemented Matryoshka dimensionality reduction for nomic-embed-text-v1.5 (see the sketch after this list)
  • Set n_batch to the trained context length for embedding models
  • Implemented compatibility with Nomic Atlas API:
    • Built-in support for a prefix added to each embedding input
    • Hardcoded map of model names to query/storage prefixes
    • Changed the overlap to 8 tokens
    • Used the "nomic empty" placeholder when embedding an empty string
    • Added a mode where the input is truncated instead of averaging the embeddings
    • Added optional length limits that match Atlas
    • Applied L2 normalization before and after averaging embeddings
  • Returned proper error strings or printed warnings in certain cases, instead of failing opaquely or producing incorrect results:
    • Error when setting dimensionality on a model that isn't nomic-embed-text-v1.5
    • Error when setting dimensionality larger than n_embd
    • Error when using a prefix that is not known to the model (if the model is in the hardcoded map)
    • Error when n_ctx minus the prefix length is smaller than the overlap
    • Warning when text tokenizes to an empty result
  • Added MiniLM, Nomic Embed v1, and Nomic Embed v1.5 to models3.json
  • Implemented an embeddingModel key in models3.json so embedding models can be identified before downloading
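
The chunk-averaging, normalization, and Matryoshka steps above can be summarized in a short sketch. This is illustrative Python, not the PR's C++ code; `combine_embeddings` and `chunk_embeddings` are hypothetical names standing in for logic that lives in LLamaModel::embed.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length; return it unchanged if it is all zeros."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0.0 else v

def combine_embeddings(chunk_embeddings: list[np.ndarray], dimensionality: int) -> np.ndarray:
    # L2 normalize each chunk embedding *before* averaging, per the Atlas bullets.
    mean = np.mean([l2_normalize(e) for e in chunk_embeddings], axis=0)
    # Matryoshka reduction: keep only the leading components. This is only
    # meaningful for models trained for it (nomic-embed-text-v1.5), and
    # dimensionality larger than n_embd is one of the new error cases.
    reduced = mean[:dimensionality]
    # Normalize again *after* averaging and truncation.
    return l2_normalize(reduced)
```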

Python bindings changes

  • Embed4All now accepts a list of texts to embed
  • Added prefix, dimensionality, do_mean, and atlas parameters to Embed4All (usage sketched after this list)
  • Dimensionality argument is pre-checked for sanity
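
For illustration, hypothetical usage of the updated bindings. The parameter names come from the list above; the model filename and prefix value are assumptions.

```python
from gpt4all import Embed4All

embedder = Embed4All("nomic-embed-text-v1.5.f16.gguf")  # filename is an assumption

embeddings = embedder.embed(
    ["first document", "second document"],  # a list of texts is now accepted
    prefix="search_document",  # task prefix prepended to each input
    dimensionality=256,        # Matryoshka reduction; pre-checked for sanity
    do_mean=True,              # average chunk embeddings instead of truncating the input
    atlas=False,               # opt in to Atlas-compatible length limits
)
```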

Chat UI changes

  • Identified embedding models based on GGUF metadata, and removed the special-casing of the MiniLM GGUF (see the sketch after this list)
  • Used the appropriate task type automatically in the chat UI for any known embedding model
  • Restricted local embeddings to MiniLM for now, but the whitelist can easily be removed
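
The metadata-based identification amounts to a whitelist check on the GGUF architecture. A minimal Python sketch of that logic (the chat UI does this in C++, and read_gguf_architecture is a hypothetical helper):

```python
EMBEDDING_ARCHES = {"bert", "nomic-bert"}  # the llama.cpp arch whitelist from this PR

def is_embedding_model(model_path: str) -> bool:
    # read_gguf_architecture is a hypothetical helper that returns the
    # general.architecture value from the file's GGUF metadata.
    return read_gguf_architecture(model_path) in EMBEDDING_ARCHES
```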

TODO

  • Update models3.json based on the GPT4All release this will be included in
  • Add example usage to the Python docs
  • Implement Nomic Python client integration with local/dynamic mode

Commit notes:

  • Fix a few bugs in the implementation so that it actually works.
  • The intended value of n_batch is the trained context length.
  • We need to be able to set embeddingModel in models3.json; otherwise we cannot reliably know whether a model is an embedding model, because we cannot open it to check its architecture (see the example entry below).
  • This is a warning in the Nomic client, so it should be a warning here too.
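A hypothetical models3.json entry illustrating where the embeddingModel key fits; the other fields are representative, not copied from the shipped file:

```json
{
  "name": "Nomic Embed Text v1.5",
  "filename": "nomic-embed-text-v1.5.f16.gguf",
  "embeddingModel": true
}
```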
@cebtenzzre marked this pull request as ready for review on March 12, 2024 at 17:02
Review threads on gpt4all-backend/llamamodel.cpp, gpt4all-chat/modellist.cpp, and gpt4all-chat/modellist.h (all resolved).
@manyoso (Collaborator) left a review: Please address comments.

@cebtenzzre linked an issue on Mar 12, 2024 that may be closed by this pull request
@cebtenzzre merged commit 406e88b into main on Mar 13, 2024
6 of 19 checks passed
@cebtenzzre deleted the new-bert branch on March 13, 2024 at 22:09
cebtenzzre added a commit that referenced this pull request Mar 14, 2024
cebtenzzre added a commit that referenced this pull request Mar 15, 2024
Key changes:
* honor empty system prompt argument
* current_chat_session is now read-only and defaults to None (see the sketch below)
* deprecate fallback prompt template for unknown models
* fix mistakes from #2086

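A hypothetical sketch of the bindings behavior that commit describes; GPT4All and chat_session exist in the gpt4all package, but the model filename and exact output shape are assumptions.

```python
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # filename is an assumption
print(model.current_chat_session)  # read-only; None outside a chat session

with model.chat_session(system_prompt=""):  # an empty system prompt is honored as-is
    model.generate("hello", max_tokens=16)
    print(model.current_chat_session)  # the session's message history
```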
cebtenzzre added a commit that referenced this pull request Jul 3, 2024
Successfully merging this pull request may close these issues.

Crash on loading embedded model.