
Update Nvidia integration to support new endpoints #701

Merged: 13 commits merged into main from nvidia-api-catalog on May 7, 2024

Conversation

silvanocerza (Contributor) commented:

Fixes #696.

This PR updates both the Nvidia generator and the embedders to support the new API catalog endpoints.

These new endpoints require a different API key from the previous ones, so I opted to use an NVIDIA_CATALOG_API_KEY env var in tests to differentiate it from the previous one.

@silvanocerza silvanocerza self-assigned this Apr 30, 2024
@silvanocerza silvanocerza requested a review from a team as a code owner April 30, 2024 09:20
@silvanocerza silvanocerza requested review from shadeMe and removed request for a team April 30, 2024 09:20
mattf (Contributor) left a comment:

The concept of a NIM is a service + model that can be deployed anywhere. We host some on https://build.nvidia.com, and users can export them to run on their own infrastructure. We happen to use NVIDIA Cloud Functions (NVCF) to serve the hosted NIMs.

The generator NIMs follow a /chat/completions API (e.g. https://docs.api.nvidia.com/nim/reference/meta-llama3-8b-infer); the embedding NIMs follow https://docs.api.nvidia.com/nim/reference/nvidia-embedding-2b-infer.

  1. API keys
    • We aim to let users provide param nvidia_api_key -> param api_key -> env NVIDIA_API_KEY, where nvidia_api_key takes precedence over api_key, which takes precedence over NVIDIA_API_KEY.
    • A local NIM does not require an API key.
    • It is recommended to use the environment variable for keys.
  2. Mode switching for hosted and local NIMs with mode()
    • Parameters are a mode name, Literal["nvidia", "nim"], an API key for nvidia mode, and base_url for nim mode.
    • nvidia is the default mode.
    • e.g. NVIDIAGenerator().mode("nim", base_url="http://.../v1"); we're including /v1, but arguably shouldn't. We're not including /chat/completions or /embedding because the service should provide /models at the same base URL.
  3. The https://catalog.ngc.nvidia.com/ai-foundation-models models are deprecated (e.g. nvolveqa_40k); users should use models from https://build.nvidia.com, e.g. https://docs.api.nvidia.com/nim/reference/nvidia-embed-qa-4.
  4. Embedder.prefix/suffix: these aren't common for NIM embedding models; are they idiomatic for Haystack?
  5. Embedding models support a truncate parameter for service-side truncation; see https://docs.api.nvidia.com/nim/reference/nvidia-embedding-2b-infer.
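The key-precedence proposal in item 1 can be sketched as a small resolver. This is a hedged sketch: `resolve_api_key` is a hypothetical helper, not the integration's actual API.

```python
import os

def resolve_api_key(nvidia_api_key=None, api_key=None):
    """Hypothetical resolver for the proposed precedence:
    param nvidia_api_key > param api_key > env NVIDIA_API_KEY.
    Returning None is acceptable for a local NIM, which requires no key."""
    return nvidia_api_key or api_key or os.environ.get("NVIDIA_API_KEY")
```

Because the env var sits at the bottom of the chain, setting only `NVIDIA_API_KEY` (the recommended path) works transparently, while either explicit parameter overrides it.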

shadeMe (Contributor) left a comment:

Just a couple of minor changes.

@github-actions github-actions bot added the type:documentation Improvements or additions to documentation label May 3, 2024
shadeMe (Contributor) commented on May 3, 2024:

@mattf Thanks very much for your comments. A few questions/comments from my side:

> 1. API keys
>    * we aim to let users provide param `nvidia_api_key` -> param `api_key` -> env `NVIDIA_API_KEY`, where `nvidia_api_key` takes precedence over `api_key`, which takes precedence over `NVIDIA_API_KEY`
>    * local NIM does not require an API key
>    * it is recommended to use the environment for keys

Could you expand on that a bit more? Why is there a need for two parameters that accept an authentication secret?

> 2. Mode switching for hosted and local NIM w/ `mode()`
>    * parameters are a mode name, `Literal["nvidia", "nim"]`, an API key for nvidia mode, and `base_url` for nim mode
>    * nvidia is the default mode
>    * e.g. `NVIDIAGenerator().mode("nim", base_url="http://.../v1")`; we're including `/v1`, but arguably shouldn't. We're not including `/chat/completions` or `/embedding` because the service should provide `/models` at the same base URL

Correct me if I'm wrong, but it sounds like you're suggesting the above to make the `api_key` and `base_url` parameters mutually exclusive? Haystack components don't modify their (user-facing) state after initialization, so a `mode()` method would not be suitable here.

> 3. [catalog.ngc.nvidia.com/ai-foundation-models](https://catalog.ngc.nvidia.com/ai-foundation-models) are deprecated, e.g. `nvolveqa_40k`; users should use models from [build.nvidia.com](https://build.nvidia.com), e.g. [docs.api.nvidia.com/nim/reference/nvidia-embed-qa-4](https://docs.api.nvidia.com/nim/reference/nvidia-embed-qa-4)

Okay, so we can safely remove support for that "backend" then? We call it the `NvCfBackend` in our code.

> 4. `Embedder.prefix`/`suffix`: these aren't common for NIM embedding models; are they idiomatic for Haystack?

This depends on how the embedding model is trained. Some wrap their inputs in special markers/meta tokens to indicate their semantics, i.e., whether the text is part of a query or a passage. As such, those parameters are standard in Haystack embedding components.

> 5. Embedding models support a `truncate` parameter for service-side truncation; see [docs.api.nvidia.com/nim/reference/nvidia-embedding-2b-infer](https://docs.api.nvidia.com/nim/reference/nvidia-embedding-2b-infer)

Thanks for the tip; we can add support for that.
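For reference, a hedged sketch of what a request body with service-side truncation might look like. The field names (`input`, `model`, `input_type`, `truncate`) follow the NIM embedding reference linked above, but `build_embedding_payload` itself is a hypothetical helper, not part of the integration.

```python
def build_embedding_payload(texts, model, truncate="NONE"):
    # truncate controls service-side truncation of over-long inputs;
    # the linked reference lists "NONE", "START", and "END" as values.
    return {
        "input": texts,
        "model": model,
        "input_type": "query",
        "truncate": truncate,
    }
```

Delegating truncation to the service avoids the client needing the model's tokenizer just to compute input lengths.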

mattf (Contributor) commented on May 3, 2024:

> @mattf Thanks very much for your comments. A few questions/comments from my side:
>
> > 1. API keys
> >    * we aim to let users provide param `nvidia_api_key` -> param `api_key` -> env `NVIDIA_API_KEY`, where `nvidia_api_key` takes precedence over `api_key`, which takes precedence over `NVIDIA_API_KEY`
> >    * local NIM does not require an API key
> >    * it is recommended to use the environment for keys
>
> Could you expand on that a bit more? Why is there a need for two parameters that accept an authentication secret?

The recommendation is for people to use the `NVIDIA_API_KEY` env variable, which makes key handling transparent and easy to switch, as well as being more secure by keeping keys out of code. If we only had one way, this should be it.

Other connectors use `api_key` as a param, so we're aligning with that.

The `nvidia_api_key` param is an expectation we set early on in other frameworks; even today, many docs about using NVIDIA APIs still reference `nvidia_api_key`, though not for Haystack.

Having more than one way to do this definitely adds cognitive load.

> > 2. Mode switching for hosted and local NIM w/ `mode()`
> >    * parameters are a mode name, `Literal["nvidia", "nim"]`, an API key for nvidia mode, and `base_url` for nim mode
> >    * nvidia is the default mode
> >    * e.g. `NVIDIAGenerator().mode("nim", base_url="http://.../v1")`; we're including `/v1`, but arguably shouldn't. We're not including `/chat/completions` or `/embedding` because the service should provide `/models` at the same base URL
>
> Correct me if I'm wrong, but it sounds like you're suggesting the above to make the `api_key` and `base_url` parameters mutually exclusive? Haystack components don't modify their (user-facing) state after initialization, so a `mode()` method would not be suitable here.

+1 on immutable.

What's the idiomatic way to do this in Haystack?
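One idiomatic answer for immutable components is to fix the backend at construction time rather than mutating it later. A hedged sketch under that assumption; the class and parameter names here are illustrative, not the final integration API:

```python
class NVIDIAGeneratorSketch:
    """Illustrative only: the backend is chosen once, in __init__."""

    def __init__(self, model, api_base_url=None, api_key=None):
        # No base URL -> hosted catalog ("nvidia" mode);
        # a base URL -> self-hosted NIM ("nim" mode), where no key is required.
        self.backend = "nim" if api_base_url else "nvidia"
        self.model = model
        self.api_base_url = api_base_url
        self.api_key = api_key
```

This keeps the component's user-facing state fixed after initialization while still supporting both hosted and local deployments.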

> > 3. [catalog.ngc.nvidia.com/ai-foundation-models](https://catalog.ngc.nvidia.com/ai-foundation-models) are deprecated, e.g. `nvolveqa_40k`; users should use models from [build.nvidia.com](https://build.nvidia.com), e.g. [docs.api.nvidia.com/nim/reference/nvidia-embed-qa-4](https://docs.api.nvidia.com/nim/reference/nvidia-embed-qa-4)
>
> Okay, so we can safely remove support for that "backend" then? We call it the `NvCfBackend` in our code.

We have users of the deprecated models in other communities; our approach is to raise deprecation warnings.

Depending on what you think the usage is, you could do the same.

> > 4. `Embedder.prefix`/`suffix`: these aren't common for NIM embedding models; are they idiomatic for Haystack?
>
> This depends on how the embedding model is trained. Some wrap their inputs in special markers/meta tokens to indicate their semantics, i.e., whether the text is part of a query or a passage. As such, those parameters are standard in Haystack embedding components.

Is https://huggingface.co/intfloat/e5-large#faq an example of that?

For NIMs, the service side handles this based on the `input_type` parameter. If a user specifies `prefix="query: "` and runs `NVIDIATextEmbedder`, they'll effectively get embeddings for `"query: query: {text}"`.
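The double-prefixing pitfall above can be made concrete with a small sketch. Both functions are illustrative stand-ins: the first mimics the client-side prefix a Haystack embedder applies, the second simulates the instruction a hosted NIM prepends based on `input_type`.

```python
def apply_client_prefix(text, prefix=""):
    # What an embedder's prefix parameter does before sending the request.
    return f"{prefix}{text}"

def simulate_service_prefix(text, input_type="query"):
    # Stand-in for the service-side instruction keyed off input_type.
    return f"{input_type}: {text}"

sent = apply_client_prefix("what is a NIM?", prefix="query: ")
embedded_as = simulate_service_prefix(sent)
# The model ends up embedding "query: query: what is a NIM?"
```

Hence the suggestion to rely on `input_type` alone for NIM backends and leave `prefix` empty.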

@@ -17,6 +18,7 @@ def __init__(
api_key: Secret,
model_kwargs: Optional[Dict[str, Any]] = None,
):
warnings.warn("Nvidia NGC is deprecated, use Nvidia NIM instead.", DeprecationWarning, stacklevel=2)
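The diff above adds a deprecation warning at construction time. A minimal self-contained sketch of the same pattern, showing how callers can observe the warning (the `nvcf_backend_init` name is hypothetical; only the `warnings.warn` line mirrors the diff):

```python
import warnings

def nvcf_backend_init():
    # Same pattern as the diff: warn that the NGC/NVCF backend is
    # deprecated in favor of the NIM endpoints. stacklevel=2 points
    # the warning at the caller, not at this function.
    warnings.warn("Nvidia NGC is deprecated, use Nvidia NIM instead.", DeprecationWarning, stacklevel=2)

# Capture the warning to verify it fires (tests can assert on it):
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    nvcf_backend_init()
```

Note that `DeprecationWarning` is hidden by default outside of `__main__` and test runners, which is why the capture block sets an "always" filter.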
silvanocerza (Contributor, Author) replied:

I'll add a link to the documentation after it's published.

@silvanocerza silvanocerza requested a review from shadeMe May 6, 2024 14:21
@silvanocerza silvanocerza merged commit 3c14c52 into main May 7, 2024
10 checks passed
@silvanocerza silvanocerza deleted the nvidia-api-catalog branch May 7, 2024 15:03
Labels: integration:nvidia, topic:CI, type:documentation

Successfully merging this pull request may close: Add support for new NVidia APIs (#696)

3 participants