
InferenceEndpoint Generator fails for AWS because of missing Authorization header #722

Open
erikinfo opened this issue Jun 4, 2024 · 4 comments
Labels: generators (Interfaces with LLMs)

erikinfo commented Jun 4, 2024

The Authorization request header is necessary to establish a connection to the endpoint hosted on AWS SageMaker.

The InferenceEndpoint inside huggingface.py should therefore have a new field for the auth header sent with the request, similar to how API keys are handled via environment variables.

leondz (Owner) commented Jun 5, 2024

Thanks, this is useful to know. We'd like to fix it. For clarity, I assume this is about huggingface.InferenceEndpoint and not one of the other InferenceEndpoint classes in garak - please correct this if needed.

@erikinfo Do you know how we can get a sample endpoint for testing & validation?

@jmartin-tech I think an approach similar to how payload is interpreted in nvcf.NvcfChat & .NvcfCompletion could work well here, instead of a custom generator. What do you think?

@leondz added the generators (Interfaces with LLMs) label on Jun 5, 2024
erikinfo (Author) commented Jun 5, 2024

Yep, huggingface.InferenceEndpoint.

Unfortunately, I think there is no way besides hosting one yourself. I could also try and help with testing.
If you have an AWS account, you can test the creation of a valid request object using this method:

import json
import boto3
from botocore.auth import SigV4Auth
from botocore.awsrequest import AWSRequest

# Initialize a Boto3 session
session = boto3.Session()

# Retrieve AWS credentials
credentials = session.get_credentials().get_frozen_credentials()

# Define your AWS region and service
region = 'us-east-1'
service = 'sagemaker'

# Define the endpoint URL
endpoint_url = 'https://<your-endpoint>.us-east-1.amazonaws.com/endpoints/<your-endpoint-name>/invocations'

# Define the headers
headers = {
    'Content-Type': 'application/json'
}

# Define the payload
payload = {
    "key1": "value1",
    "key2": "value2"
    # Add other payload data as needed
}

# Create an AWSRequest
request = AWSRequest(method='POST', url=endpoint_url, data=json.dumps(payload), headers=headers)

# Sign the request using SigV4Auth
SigV4Auth(credentials, service, region).add_auth(request)

print(request.method)         # POST
print(request.url)            # e.g. https://<your-endpoint>.us-east-1.amazonaws.com/endpoints/<your-endpoint-name>/invocations
print(dict(request.headers))  # signed headers, including Authorization
print(request.body)           # JSON payload

Please note that this method is just a helper to pinpoint the header attributes that are needed.

The headers attribute in huggingface.InferenceEndpoint should therefore be updated. Please allow the following header attributes to be added:

  • 'X-Amz-Date': the current UTC date/time, formatted as `t = datetime.datetime.utcnow(); amzdate = t.strftime('%Y%m%dT%H%M%SZ')`
  • 'X-Amz-Security-Token'
  • 'Authorization' (a non-Bearer, AWS token)
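For illustration, the X-Amz-Date value described above can be produced with a small stdlib-only helper (the function name `amz_date` is hypothetical, not part of garak or botocore):

```python
import datetime

def amz_date(t: datetime.datetime) -> str:
    # X-Amz-Date uses ISO 8601 basic format in UTC, e.g. 20240605T123000Z
    return t.strftime('%Y%m%dT%H%M%SZ')

print(amz_date(datetime.datetime(2024, 6, 5, 12, 30, 0)))  # 20240605T123000Z
```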

jmartin-tech (Collaborator) commented Jun 5, 2024

I think the payload pattern in NVCF is reasonable; however, there may be an option to avoid handling the raw request by providing something like a _send_request method, keeping the payload more huggingface-specific while utilizing the AWS-provided request wrappers. It might look something like this (obviously some syntax is not exact):

class AWSInferenceEndpoint(InferenceEndpoint):

    supports_multiple_generations = False

    def __init__(self, name="", generations=10, config_root=_config):
        super().__init__(name, generations=generations, config_root=config_root)
        # gather AWS details here and set on `self` if not provided by `_config`

    def _send_request(self, payload):
        headers = {"Content-Type": "application/json"}
        request = AWSRequest(method=self.method, url=self.uri, data=json.dumps(payload), headers=headers)
        SigV4Auth(self.aws_credentials, self.aws_service, self.aws_region).add_auth(request)
        # dispatch the signed request, e.g. via requests
        return requests.request(request.method, request.url, headers=dict(request.headers), data=request.body)

leondz (Owner) commented Jun 6, 2024

@erikinfo Thanks tons for this detailed example, it should be really helpful. Also thanks for volunteering to help test. I hope we can get started using this guide, https://huggingface.co/docs/sagemaker/inference

@jmartin-tech This could work. Two separate generator classes for one named product (https://huggingface.co/docs/inference-endpoints/index) seems a little unintuitive to me, but if it reduced tech debt in exchange for a reasonably-sized dependency, that's a win. Maybe something for or closely after the generators.huggingface refactor?

@jmartin-tech self-assigned this on Jun 6, 2024