Update Nvidia integration to support new endpoints #701
Conversation
the concept of NIM is a service + model that can be deployed anywhere. we host some on https://build.nvidia.com and users can export them to run on their own infrastructure. we happen to use NVIDIA Cloud Functions (nvcf) to serve the hosted NIMs.

the generator NIMs follow a `/chat/completion` api (e.g. https://docs.api.nvidia.com/nim/reference/meta-llama3-8b-infer), the embedding NIMs follow https://docs.api.nvidia.com/nim/reference/nvidia-embedding-2b-infer
- api keys
  - we aim to let users provide param `nvidia_api_key` -> param `api_key` -> env `NVIDIA_API_KEY`, where `nvidia_api_key` takes precedence over `api_key`, which takes precedence over `NVIDIA_API_KEY`
  - a local NIM does not require an api key
  - it is recommended to use the environment for keys
- mode switching for hosted and local NIM with `mode()`
  - parameters are a mode name, `Literal["nvidia", "nim"]`, an api key for nvidia mode, and `base_url` for nim mode
  - nvidia is the default mode
  - e.g. `NVIDIAGenerator().mode("nim", base_url="http://.../v1")` - we're including `/v1`, but arguably shouldn't. we're not including `/chat/completion` or `/embedding` because the service should provide `/models` at the same base url
- models from https://catalog.ngc.nvidia.com/ai-foundation-models are deprecated, e.g. `nvolveqa_40k`; users should use models from https://build.nvidia.com, e.g. https://docs.api.nvidia.com/nim/reference/nvidia-embed-qa-4
- Embedder `prefix`/`suffix`: these aren't common for NIM embedding models, are they idiomatic for Haystack?
- embedding models support a `truncate` parameter for service-side truncation, see https://docs.api.nvidia.com/nim/reference/nvidia-embedding-2b-infer
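The mode-switching proposal above could look like the following sketch. This is illustrative only: `mode()` and its parameters come from the comment, but the class internals and the default model name are assumptions, not the PR's actual implementation.

```python
import os
from typing import Literal, Optional


class NVIDIAGenerator:
    """Sketch of the proposed mode() API; internals are hypothetical."""

    def __init__(self, model: str = "meta/llama3-8b-instruct"):  # default model is an assumption
        self.model = model
        # "nvidia" (hosted on build.nvidia.com) is the default mode
        self._mode: Literal["nvidia", "nim"] = "nvidia"
        self.base_url: Optional[str] = None
        self.api_key: Optional[str] = None

    def mode(
        self,
        name: Literal["nvidia", "nim"],
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
    ) -> "NVIDIAGenerator":
        if name == "nvidia":
            # hosted mode needs a key; fall back to the environment
            self.api_key = api_key or os.environ.get("NVIDIA_API_KEY")
            if self.api_key is None:
                raise ValueError("nvidia mode requires an API key")
        elif name == "nim":
            # a locally deployed NIM needs a base url but no key
            if base_url is None:
                raise ValueError("nim mode requires base_url")
            self.base_url = base_url
        self._mode = name
        return self  # allow NVIDIAGenerator().mode(...) chaining


gen = NVIDIAGenerator().mode("nim", base_url="http://localhost:8000/v1")
```

Returning `self` keeps the chained construction style (`NVIDIAGenerator().mode(...)`) shown in the comment.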
Just a couple of minor changes.
Resolved (outdated) review threads on:
- integrations/nvidia/src/haystack_integrations/components/embedders/nvidia/_nim_backend.py
- integrations/nvidia/src/haystack_integrations/components/generators/nvidia/_nim_backend.py (two threads)
Co-authored-by: Madeesh Kannan <[email protected]>
@mattf Thanks very much for your comments. A few questions/comments from my side:
Could you expand on that a bit more? Why is there a need for two parameters that accept an authentication secret?
Correct me if I'm wrong, but it sounds like you're suggesting the above to make the …
Okay, so we can safely remove support for that "backend" then? We call it the …
This depends on how the embedding model is trained. Some wrap their inputs in special markers/meta tokens to indicate their semantics, i.e., is the text part of a query or a passage. As such, those parameters are standard in Haystack embedding components.
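For illustration, E5-style models (see the huggingface link below) expect a marker prefix on each input; a minimal sketch of how Haystack-style `prefix`/`suffix` parameters would be applied — the helper name `apply_affixes` is hypothetical, not Haystack's actual code:

```python
from typing import List


def apply_affixes(texts: List[str], prefix: str = "", suffix: str = "") -> List[str]:
    """Wrap each input in the marker tokens some embedding models were trained with."""
    return [f"{prefix}{t}{suffix}" for t in texts]


# E5-style models distinguish queries from passages via a prefix
queries = apply_affixes(["what is a NIM?"], prefix="query: ")
passages = apply_affixes(["NIM is a service + model."], prefix="passage: ")
```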
Thanks for the tip; we can add support for that.
the recommendation is for people to use the `NVIDIA_API_KEY` env variable, which makes key handling transparent and easy to switch, as well as more secure by avoiding keys in code. if we only had one way, this should be it. other connectors are using `api_key` as a param, so we're aligning with that. `nvidia_api_key` is an expectation we set early on in other frameworks; even today many docs about using NVIDIA APIs still reference `nvidia_api_key`, though not for Haystack. having more than one way to do this definitely adds cognitive load.
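The precedence described above can be sketched as a small resolver. `resolve_api_key` is a hypothetical helper for illustration, not the integration's actual code:

```python
import os
from typing import Optional


def resolve_api_key(
    nvidia_api_key: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Optional[str]:
    """nvidia_api_key > api_key > NVIDIA_API_KEY env var; None is fine for a local NIM."""
    return nvidia_api_key or api_key or os.environ.get("NVIDIA_API_KEY")


os.environ["NVIDIA_API_KEY"] = "env-key"
key = resolve_api_key(api_key="param-key")  # an explicit param beats the env var
```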
+1 immutable. what's the idiomatic way to do this in Haystack?
we have users of the deprecated models in other communities. our approach is to raise deprecation warnings. depending on what you think the usage is, you could do the same.
is https://huggingface.co/intfloat/e5-large#faq an example of that? for NIMs, the service side handles this based on the
```diff
@@ -17,6 +18,7 @@ def __init__(
         api_key: Secret,
         model_kwargs: Optional[Dict[str, Any]] = None,
     ):
+        warnings.warn("Nvidia NGC is deprecated, use Nvidia NIM instead.", DeprecationWarning, stacklevel=2)
```
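The warning added in the diff can be exercised like this; the class below is a standalone stand-in, not the integration's real backend:

```python
import warnings


class NvidiaCloudFunctionsBackend:
    """Stand-in for the deprecated NGC backend; the real class lives in the integration."""

    def __init__(self):
        warnings.warn(
            "Nvidia NGC is deprecated, use Nvidia NIM instead.",
            DeprecationWarning,
            stacklevel=2,
        )


# capture the warning instead of printing it, as a test would
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    NvidiaCloudFunctionsBackend()
```

`stacklevel=2` points the warning at the caller's constructor call rather than at the `warnings.warn` line itself.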
I'll add a link to the documentation after it's published.
Fixes #696.
This PR updates both Nvidia generator and embedders to support the new API catalog endpoints.
These new endpoints require a different API key from the previous ones, so I opted to use an `NVIDIA_CATALOG_API_KEY` env var in tests to differentiate it from the previous one.