Update Nvidia integration to support new endpoints #701
Conversation
the concept of NIM is a service + model that can be deployed anywhere. we host some on https://build.nvidia.com and users can export them to run on their own infrastructure. we happen to use NVIDIA Cloud Functions (nvcf) to serve the hosted NIMs.

the generator NIMs follow a `/chat/completion` api (e.g. https://docs.api.nvidia.com/nim/reference/meta-llama3-8b-infer), the embedding NIMs follow https://docs.api.nvidia.com/nim/reference/nvidia-embedding-2b-infer
- api keys
  - we aim to let users provide param `nvidia_api_key` -> param `api_key` -> env `NVIDIA_API_KEY`, where `nvidia_api_key` takes precedence over `api_key`, which takes precedence over `NVIDIA_API_KEY`
  - a local NIM does not require an api key
  - it is recommended to use the environment for keys
- mode switching for hosted and local NIM with `mode()`
  - parameters are a mode name, `Literal["nvidia", "nim"]`, an api key for nvidia mode, and `base_url` for nim mode
  - nvidia is the default mode
  - e.g. `NVIDIAGenerator().mode("nim", base_url="http://.../v1")` - we're including `/v1`, but arguably shouldn't. we're not including `/chat/completion` or `/embedding` because the service should provide `/models` at the same base url
- models from https://catalog.ngc.nvidia.com/ai-foundation-models are deprecated, e.g. `nvolveqa_40k`; users should use models from https://build.nvidia.com, e.g. https://docs.api.nvidia.com/nim/reference/nvidia-embed-qa-4
- Embedder `prefix`/`suffix`: these aren't common for NIM embedding models, are they idiomatic for Haystack?
- embedding models support a `truncate` parameter for service-side truncation, see https://docs.api.nvidia.com/nim/reference/nvidia-embedding-2b-infer
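The mode-switching proposal above could look like the following sketch. This is illustrative only: `mode()` and its parameters come from the comment, but the class internals and the default model name are assumptions, not the PR's actual implementation.

```python
import os
from typing import Literal, Optional


class NVIDIAGenerator:
    """Sketch of the proposed mode() API; internals are hypothetical."""

    def __init__(self, model: str = "meta/llama3-8b-instruct"):  # default model is an assumption
        self.model = model
        # "nvidia" (hosted on build.nvidia.com) is the default mode
        self._mode: Literal["nvidia", "nim"] = "nvidia"
        self.base_url: Optional[str] = None
        self.api_key: Optional[str] = None

    def mode(
        self,
        name: Literal["nvidia", "nim"],
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
    ) -> "NVIDIAGenerator":
        if name == "nvidia":
            # hosted mode needs a key; fall back to the environment
            self.api_key = api_key or os.environ.get("NVIDIA_API_KEY")
            if self.api_key is None:
                raise ValueError("nvidia mode requires an API key")
        elif name == "nim":
            # a locally deployed NIM needs a base url but no key
            if base_url is None:
                raise ValueError("nim mode requires base_url")
            self.base_url = base_url
        self._mode = name
        return self  # allow NVIDIAGenerator().mode(...) chaining


gen = NVIDIAGenerator().mode("nim", base_url="http://localhost:8000/v1")
```

Returning `self` keeps the chained construction style (`NVIDIAGenerator().mode(...)`) shown in the comment.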
Just a couple of minor changes.
Resolved (outdated) review threads on:
- integrations/nvidia/src/haystack_integrations/components/embedders/nvidia/_nim_backend.py
- integrations/nvidia/src/haystack_integrations/components/generators/nvidia/_nim_backend.py (two threads)
Co-authored-by: Madeesh Kannan <[email protected]>
@mattf Thanks very much for your comments. A few questions/comments from my side:
Could you expand on that a bit more? Why is there a need for two parameters that accept an authentication secret?
Correct me if I'm wrong, but it sounds like you're suggesting the above to make the …
Okay, so we can safely remove support for that "backend" then? We call it the …
This depends on how the embedding model is trained. Some wrap their inputs in special markers/meta tokens to indicate their semantics, i.e., is the text part of a query or a passage. As such, those parameters are standard in Haystack embedding components.
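For illustration, E5-style models (see the huggingface link below) expect a marker prefix on each input; a minimal sketch of how Haystack-style `prefix`/`suffix` parameters would be applied — the helper name `apply_affixes` is hypothetical, not Haystack's actual code:

```python
from typing import List


def apply_affixes(texts: List[str], prefix: str = "", suffix: str = "") -> List[str]:
    """Wrap each input in the marker tokens some embedding models were trained with."""
    return [f"{prefix}{t}{suffix}" for t in texts]


# E5-style models distinguish queries from passages via a prefix
queries = apply_affixes(["what is a NIM?"], prefix="query: ")
passages = apply_affixes(["NIM is a service + model."], prefix="passage: ")
```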
Thanks for the tip; we can add support for that.
the recommendation is for people to use the `NVIDIA_API_KEY` env variable, which makes key handling transparent and easy to switch, as well as more secure by avoiding keys in code. if we only had one way, this should be it. other connectors are using `api_key` as a param, so we're aligning with that. `nvidia_api_key` is an expectation we set early on in other frameworks; even today many docs about using NVIDIA APIs still reference `nvidia_api_key`, though not for Haystack. having more than one way to do this definitely adds cognitive load.
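The precedence described above can be sketched as a small resolver. `resolve_api_key` is a hypothetical helper for illustration, not the integration's actual code:

```python
import os
from typing import Optional


def resolve_api_key(
    nvidia_api_key: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Optional[str]:
    """nvidia_api_key > api_key > NVIDIA_API_KEY env var; None is fine for a local NIM."""
    return nvidia_api_key or api_key or os.environ.get("NVIDIA_API_KEY")


os.environ["NVIDIA_API_KEY"] = "env-key"
key = resolve_api_key(api_key="param-key")  # an explicit param beats the env var
```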
+1 immutable. what's the idiomatic way to do this in Haystack?
we have users of the deprecated models in other communities. our approach is to raise deprecation warnings. depending on what you think the usage is, you could do the same.
is https://huggingface.co/intfloat/e5-large#faq an example of that? for NIMs, the service side handles this based on the
```diff
@@ -17,6 +18,7 @@ def __init__(
         api_key: Secret,
         model_kwargs: Optional[Dict[str, Any]] = None,
     ):
+        warnings.warn("Nvidia NGC is deprecated, use Nvidia NIM instead.", DeprecationWarning, stacklevel=2)
```
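The warning added in the diff can be exercised like this; the class below is a standalone stand-in, not the integration's real backend:

```python
import warnings


class NvidiaCloudFunctionsBackend:
    """Stand-in for the deprecated NGC backend; the real class lives in the integration."""

    def __init__(self):
        warnings.warn(
            "Nvidia NGC is deprecated, use Nvidia NIM instead.",
            DeprecationWarning,
            stacklevel=2,
        )


# capture the warning instead of printing it, as a test would
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    NvidiaCloudFunctionsBackend()
```

`stacklevel=2` points the warning at the caller's constructor call rather than at the `warnings.warn` line itself.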
I'll add a link to the documentation after it's published.
Fixes #696.
This PR updates both Nvidia generator and embedders to support the new API catalog endpoints.
These new endpoints require a different API key from the previous ones, so I opted to use an `NVIDIA_CATALOG_API_KEY` env var in tests to differentiate it from the previous one.