
Nomic Embed integration via llama.cpp BERT implementation #2086

Merged: 41 commits into main, Mar 13, 2024

Conversation

@cebtenzzre (Member) commented on Mar 7, 2024

Backend changes

  • Removed bert.cpp
  • Implemented LLamaModel::embed on top of the new llama.cpp BERT implementation, which intelligently splits and batches inputs depending on their length and n_ctx
  • Added BERT and Nomic BERT to llama.cpp arch whitelist
  • Explicitly blacklist old MiniLM GGUF
  • Dynamic supportsCompletion/supportsEmbedding based on model arch
  • Implemented compatibility with llama.cpp PR "llama : fix embeddings" (ggerganov/llama.cpp#5796)
  • Implemented Matryoshka dimensionality reduction for nomic-embed-text-v1.5 (see the sketch after this list)
  • Set n_batch to the trained context length for embedding models
  • Implemented compatibility with Nomic Atlas API:
    • Built-in support for a prefix added to each embedding input
    • Hardcoded map of model names to query/storage prefixes
    • Changed the overlap to 8 tokens
    • Used the "nomic empty" placeholder when embedding an empty string
    • Added a mode where the input is truncated instead of averaging the embeddings
    • Added optional length limits that match Atlas
    • Applied L2 normalization before and after averaging embeddings
  • Returned proper error strings or printed warnings in certain cases, instead of failing opaquely or producing incorrect results:
    • Error when setting dimensionality on a model that isn't nomic-embed-text-v1.5
    • Error when setting dimensionality larger than n_embd
    • Error when using a prefix that is not known to the model (if the model is in the hardcoded map)
    • Error when n_ctx minus the prefix length is smaller than the overlap
    • Warning when text tokenizes to an empty result
  • Added MiniLM, Nomic Embed v1, and Nomic Embed v1.5 to models3.json
  • Implemented an embeddingModel key in models3.json so embedding models can be identified before downloading
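
The chunk-averaging, normalization, and Matryoshka steps above can be summarized in a short sketch. This is illustrative Python, not the PR's C++ code; `combine_embeddings` and `chunk_embeddings` are hypothetical names standing in for logic that lives in LLamaModel::embed.

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length; return it unchanged if it is all zeros."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0.0 else v

def combine_embeddings(chunk_embeddings: list[np.ndarray], dimensionality: int) -> np.ndarray:
    # L2 normalize each chunk embedding *before* averaging, per the Atlas bullets.
    mean = np.mean([l2_normalize(e) for e in chunk_embeddings], axis=0)
    # Matryoshka reduction: keep only the leading components. This is only
    # meaningful for models trained for it (nomic-embed-text-v1.5), and
    # dimensionality larger than n_embd is one of the new error cases.
    reduced = mean[:dimensionality]
    # Normalize again *after* averaging and truncation.
    return l2_normalize(reduced)
```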

Python bindings changes

  • Embed4All now accepts a list of texts to embed
  • Added prefix, dimensionality, do_mean, and atlas parameters to Embed4All (usage sketched after this list)
  • Dimensionality argument is pre-checked for sanity
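
For illustration, hypothetical usage of the updated bindings. The parameter names come from the list above; the model filename and prefix value are assumptions.

```python
from gpt4all import Embed4All

embedder = Embed4All("nomic-embed-text-v1.5.f16.gguf")  # filename is an assumption

embeddings = embedder.embed(
    ["first document", "second document"],  # a list of texts is now accepted
    prefix="search_document",  # task prefix prepended to each input
    dimensionality=256,        # Matryoshka reduction; pre-checked for sanity
    do_mean=True,              # average chunk embeddings instead of truncating the input
    atlas=False,               # opt in to Atlas-compatible length limits
)
```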

Chat UI changes

  • Identified embedding models based on GGUF metadata, and removed the special-casing of the MiniLM GGUF (see the sketch after this list)
  • Used the appropriate task type automatically in the chat UI for any known embedding model
  • Restricted local embeddings to MiniLM for now, but the whitelist can easily be removed
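
The metadata-based identification amounts to a whitelist check on the GGUF architecture. A minimal Python sketch of that logic (the chat UI does this in C++, and read_gguf_architecture is a hypothetical helper):

```python
EMBEDDING_ARCHES = {"bert", "nomic-bert"}  # the llama.cpp arch whitelist from this PR

def is_embedding_model(model_path: str) -> bool:
    # read_gguf_architecture is a hypothetical helper that returns the
    # general.architecture value from the file's GGUF metadata.
    return read_gguf_architecture(model_path) in EMBEDDING_ARCHES
```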

TODO

  • Update models3.json based on the GPT4All release this will be included in
  • Add example usage to the Python docs
  • Implement Nomic Python client integration with local/dynamic mode

Commit notes:

  • Fix a few bugs in the implementation so that it actually works.
  • The intended value of n_batch is the trained context length.
  • We need to be able to set embeddingModel in models3.json; otherwise we cannot reliably know whether a model is an embedding model, because we cannot open it to check its architecture (see the example entry below).
  • This is a warning in the Nomic client, so it should be a warning here too.
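A hypothetical models3.json entry illustrating where the embeddingModel key fits; the other fields are representative, not copied from the shipped file:

```json
{
  "name": "Nomic Embed Text v1.5",
  "filename": "nomic-embed-text-v1.5.f16.gguf",
  "embeddingModel": true
}
```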
@cebtenzzre marked this pull request as ready for review on March 12, 2024 at 17:02
Review threads on gpt4all-backend/llamamodel.cpp, gpt4all-chat/modellist.cpp, and gpt4all-chat/modellist.h (all resolved).
@manyoso (Collaborator) left a review: Please address comments.

@cebtenzzre linked an issue on Mar 12, 2024 that may be closed by this pull request
@cebtenzzre merged commit 406e88b into main on Mar 13, 2024
6 of 19 checks passed
@cebtenzzre deleted the new-bert branch on March 13, 2024 at 22:09
cebtenzzre added a commit that referenced this pull request Mar 14, 2024
cebtenzzre added a commit that referenced this pull request Mar 15, 2024
Key changes:
* honor empty system prompt argument
* current_chat_session is now read-only and defaults to None (see the sketch below)
* deprecate fallback prompt template for unknown models
* fix mistakes from #2086

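A hypothetical sketch of the bindings behavior that commit describes; GPT4All and chat_session exist in the gpt4all package, but the model filename and exact output shape are assumptions.

```python
from gpt4all import GPT4All

model = GPT4All("orca-mini-3b-gguf2-q4_0.gguf")  # filename is an assumption
print(model.current_chat_session)  # read-only; None outside a chat session

with model.chat_session(system_prompt=""):  # an empty system prompt is honored as-is
    model.generate("hello", max_tokens=16)
    print(model.current_chat_session)  # the session's message history
```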
cebtenzzre added a commit that referenced this pull request Jul 3, 2024
Successfully merging this pull request may close these issues.

Crash on loading embedded model.