Retry on connection disconnect #4178

Merged · 32 commits · Jun 14, 2024

Commits
d45db9a
fix(http_handler.py): retry on httpx connection errors
krrishdholakia Jun 13, 2024
46d5752
fix(http_handler.py): add retry logic on httpx.remoteprotocolerror
krrishdholakia Jun 13, 2024
e93727b
test(test_router_debug_logs.py): fix test
krrishdholakia Jun 14, 2024
ba88264
fix - fix redacting messages litellm
ishaan-jaff Jun 13, 2024
a166db1
fix config
ishaan-jaff Jun 13, 2024
cb0639d
fix - redacting messages
ishaan-jaff Jun 13, 2024
6a674b5
test test_redact_msgs_from_logs
ishaan-jaff Jun 13, 2024
ec027b7
docs(alerting.md): add microsoft teams alerting to docs
krrishdholakia Jun 13, 2024
4408c78
docs(alerting.md): add expected response teams alerting image to docs
krrishdholakia Jun 13, 2024
ddb60ed
fix(caching.py): Stop throwing constant spam errors on every single S…
Manouchehri Jun 13, 2024
1a9ab1d
fix - clean up swagger spend endpoints
ishaan-jaff Jun 13, 2024
330df10
feat - add remaining team budget gauge
ishaan-jaff Jun 13, 2024
8b7b2ee
feat - add remaining budget for key on prometheus
ishaan-jaff Jun 13, 2024
0f94133
docs - budget metrics litellm
ishaan-jaff Jun 13, 2024
6d44c56
fix bug when updating team
ishaan-jaff Jun 13, 2024
f6d3865
fix - ui show correct team budget when budget = 0.0
ishaan-jaff Jun 13, 2024
86d64c3
build(ui/teams.tsx): allow resetting teams budget
krrishdholakia Jun 13, 2024
1450a77
build: allow resetting customer budget weekly + edit customer budget …
krrishdholakia Jun 13, 2024
b37e471
feat(__init__.py): allow setting drop_params as an env
krrishdholakia Jun 13, 2024
9598209
doc - setting team budgets
ishaan-jaff Jun 13, 2024
62ff15f
fix /team/update
ishaan-jaff Jun 13, 2024
397a3d8
doc - team based budgets
ishaan-jaff Jun 13, 2024
ecdb817
doc - setting team budgets
ishaan-jaff Jun 14, 2024
17b8a31
update swagger for /team endpoints
ishaan-jaff Jun 14, 2024
39c0f29
fix - update team
ishaan-jaff Jun 14, 2024
bf04085
bump: version 1.40.10 → 1.40.11
ishaan-jaff Jun 14, 2024
80aa492
doc fix creating team budgets
ishaan-jaff Jun 14, 2024
17556cf
llama 3
themrzmaster Jun 14, 2024
8498600
feat(proxy/utils.py): allow budget duration in months
krrishdholakia Jun 13, 2024
acdd8f1
build(ui): new build
krrishdholakia Jun 14, 2024
7117e56
bump: version 1.40.11 → 1.40.12
krrishdholakia Jun 14, 2024
64f50c0
docs(team_budgets.md): fix docs
krrishdholakia Jun 14, 2024
50 changes: 49 additions & 1 deletion docs/my-website/docs/proxy/alerting.md
@@ -1,3 +1,5 @@
import Image from '@theme/IdealImage';

# 🚨 Alerting / Webhooks

Get alerts for:
@@ -15,6 +17,11 @@ Get alerts for:
- **Spend** Weekly & Monthly spend per Team, Tag


Works across:
- [Slack](#quick-start)
- [Discord](#advanced---using-discord-webhooks)
- [Microsoft Teams](#advanced---using-ms-teams-webhooks)

## Quick Start

Set up a Slack alert channel to receive alerts from the proxy.
@@ -108,6 +115,48 @@ AlertType = Literal[
```


## Advanced - Using MS Teams Webhooks

MS Teams provides a Slack-compatible webhook URL that you can use for alerting.

##### Quick Start

1. [Get a webhook url](https://learn.microsoft.com/en-us/microsoftteams/platform/webhooks-and-connectors/how-to/add-incoming-webhook?tabs=newteams%2Cdotnet#create-an-incoming-webhook) for your Microsoft Teams channel

2. Add it to your .env

```bash
SLACK_WEBHOOK_URL="https://berriai.webhook.office.com/webhookb2/...6901/IncomingWebhook/b55fa0c2a48647be8e6effedcd540266/e04b1092-4a3e-44a2-ab6b-29a0a4854d1d"
```

3. Add it to your litellm config

```yaml
model_list:
  - model_name: "azure-model"
    litellm_params:
      model: "azure/gpt-35-turbo"
      api_key: "my-bad-key" # 👈 bad key

general_settings:
  alerting: ["slack"]
  alerting_threshold: 300 # sends alerts if requests hang for 5min+ and responses take 5min+
```

4. Run health check!

Call the proxy `/health/services` endpoint to test whether your alerting connection is correctly set up.

```bash
curl --location 'http://0.0.0.0:4000/health/services?service=slack' \
--header 'Authorization: Bearer sk-1234'
```


**Expected Response**

<Image img={require('../../img/ms_teams_alerting.png')}/>
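If the health check fails, you can also test the webhook directly by posting a Slack-style payload to it. A minimal sketch, assuming your Teams webhook URL is exported as `SLACK_WEBHOOK_URL` (Slack-compatible webhooks accept a simple `{"text": ...}` JSON body):

```python
import os

import requests

# Assumes SLACK_WEBHOOK_URL holds the MS Teams incoming-webhook URL from step 2.
webhook_url = os.environ["SLACK_WEBHOOK_URL"]

# Slack-compatible webhooks accept a JSON body with a "text" field.
resp = requests.post(webhook_url, json={"text": "LiteLLM alerting test 🚨"})
resp.raise_for_status()  # a non-2xx status means the URL is wrong or revoked
print("Webhook accepted the test message:", resp.status_code)
```

If this test message shows up in your Teams channel but proxy alerts do not, the problem is in the proxy config rather than the webhook itself.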

## Advanced - Using Discord Webhooks

Discord provides a Slack-compatible webhook URL that you can use for alerting.
@@ -139,7 +188,6 @@ environment_variables:
SLACK_WEBHOOK_URL: "https://discord.com/api/webhooks/1240030362193760286/cTLWt5ATn1gKmcy_982rl5xmYHsrM1IWJdmCL1AyOmU9JdQXazrp8L1_PYgUtgxj8x4f/slack"
```

That's it! You're ready to go!

## Advanced - [BETA] Webhooks for Budget Alerts

9 changes: 8 additions & 1 deletion docs/my-website/docs/proxy/prometheus.md
@@ -1,4 +1,4 @@
# Grafana, Prometheus metrics [BETA]
# 📈 Prometheus metrics [BETA]

LiteLLM exposes a `/metrics` endpoint for Prometheus to poll

@@ -54,6 +54,13 @@ http://localhost:4000/metrics
| `litellm_total_tokens` | input + output tokens per `"user", "key", "model", "team", "end-user"` |
| `litellm_llm_api_failed_requests_metric` | Number of failed LLM API requests per `"user", "key", "model", "team", "end-user"` |

### Budget Metrics
| Metric Name | Description |
|----------------------|--------------------------------------|
| `litellm_remaining_team_budget_metric` | Remaining budget for a team (a team created on LiteLLM) |
| `litellm_remaining_api_key_budget_metric` | Remaining budget for an API key (a key created on LiteLLM) |


## Monitor System Health

To monitor the health of LiteLLM-adjacent services (Redis / Postgres), do:
123 changes: 123 additions & 0 deletions docs/my-website/docs/proxy/team_budgets.md
@@ -0,0 +1,123 @@
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# 💰 Setting Team Budgets

Track spend and set budgets for your internal teams.

## Setting Monthly Team Budgets

### 1. Create a team
- Set `max_budget=0.000000001` (the $ value the team is allowed to spend)
- Set `budget_duration="1d"` (how frequently the budget should reset)


Create a new team and set `max_budget` and `budget_duration`
```shell
curl -X POST 'http://0.0.0.0:4000/team/new' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{
  "team_alias": "QA Prod Bot",
  "max_budget": 0.000000001,
  "budget_duration": "1d"
}'
```

Response
```shell
{
  "team_alias": "QA Prod Bot",
  "team_id": "de35b29e-6ca8-4f47-b804-2b79d07aa99a",
  "max_budget": 1e-09,
  "budget_duration": "1d",
  "budget_reset_at": "2024-06-14T22:48:36.594000Z"
}
```



Possible values for `budget_duration`

| `budget_duration` | When Budget will reset |
| --- | --- |
| `budget_duration="1s"` | every 1 second |
| `budget_duration="1m"` | every 1 min |
| `budget_duration="1h"` | every 1 hour |
| `budget_duration="1d"` | every 1 day |
| `budget_duration="1mo"` | start of every month |


### 2. Create a key for the `team`

Create a key for `team_id="de35b29e-6ca8-4f47-b804-2b79d07aa99a"` from Step 1

💡 **The budget for team "QA Prod Bot" will apply to requests made with this key**

```shell
curl -X POST 'http://0.0.0.0:4000/key/generate' \
-H 'Authorization: Bearer sk-1234' \
-H 'Content-Type: application/json' \
-d '{"team_id": "de35b29e-6ca8-4f47-b804-2b79d07aa99a"}'
```

Response

```shell
{"team_id":"de35b29e-6ca8-4f47-b804-2b79d07aa99a", "key":"sk-5qtncoYjzRcxMM4bDRktNQ"}
```


### 3. Test It

Use the key from step 2 and run this request twice
```shell
curl -X POST 'http://0.0.0.0:4000/chat/completions' \
-H 'Authorization: Bearer sk-5qtncoYjzRcxMM4bDRktNQ' \
-H 'Content-Type: application/json' \
-d '{
  "model": "llama3",
  "messages": [
    {
      "role": "user",
      "content": "hi"
    }
  ]
}'
```

On the second request, expect to see the following exception:

```shell
{
  "error": {
    "message": "Budget has been exceeded! Current cost: 3.5e-06, Max budget: 1e-09",
    "type": "auth_error",
    "param": null,
    "code": 400
  }
}
```
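Client code sees this as a normal HTTP error from an OpenAI-compatible endpoint. A minimal sketch of handling it with the OpenAI Python SDK pointed at the proxy, using the team-scoped key from step 2:

```python
import openai

# Point the OpenAI SDK at the LiteLLM proxy; the key is the one from step 2.
client = openai.OpenAI(
    api_key="sk-5qtncoYjzRcxMM4bDRktNQ",
    base_url="http://0.0.0.0:4000",
)

try:
    client.chat.completions.create(
        model="llama3",
        messages=[{"role": "user", "content": "hi"}],
    )
except openai.APIStatusError as e:
    # Once the team budget is exhausted, the proxy returns the
    # "Budget has been exceeded!" body shown above with a 400 status.
    print("Request rejected:", e.status_code)
```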

## Advanced

### Prometheus metrics for `remaining_budget`

[More info about Prometheus metrics here](https://docs.litellm.ai/docs/proxy/prometheus)

You'll need the following in your proxy config.yaml

```yaml
litellm_settings:
  success_callback: ["prometheus"]
  failure_callback: ["prometheus"]
```

Expect to see this metric on Prometheus, tracking the remaining budget for the team:

```shell
litellm_remaining_team_budget_metric{team_alias="QA Prod Bot",team_id="de35b29e-6ca8-4f47-b804-2b79d07aa99a"} 9.699999999999992e-06
```
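To verify the gauge programmatically, you can scrape and parse the `/metrics` endpoint yourself. A sketch, assuming the `prometheus_client` package is installed:

```python
import requests
from prometheus_client.parser import text_string_to_metric_families

# Scrape the LiteLLM proxy's Prometheus endpoint directly.
raw = requests.get("http://0.0.0.0:4000/metrics").text

for family in text_string_to_metric_families(raw):
    if family.name.startswith("litellm_remaining"):
        for sample in family.samples:
            # e.g. litellm_remaining_team_budget_metric{team_alias=...} 9.7e-06
            print(sample.name, sample.labels, sample.value)
```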


Binary file added docs/my-website/img/ms_teams_alerting.png
3 changes: 2 additions & 1 deletion docs/my-website/sidebars.js
@@ -44,6 +44,7 @@ const sidebars = {
"proxy/self_serve",
"proxy/users",
"proxy/customers",
"proxy/team_budgets",
"proxy/billing",
"proxy/user_keys",
"proxy/virtual_keys",
@@ -54,6 +55,7 @@
items: ["proxy/logging", "proxy/streaming_logging"],
},
"proxy/ui",
"proxy/prometheus",
"proxy/email",
"proxy/multiple_admins",
"proxy/team_based_routing",
@@ -70,7 +72,6 @@
"proxy/pii_masking",
"proxy/prompt_injection",
"proxy/caching",
"proxy/prometheus",
"proxy/call_hooks",
"proxy/rules",
"proxy/cli",
2 changes: 1 addition & 1 deletion litellm/__init__.py
@@ -73,7 +73,7 @@
)
telemetry = True
max_tokens = 256 # OpenAI Defaults
drop_params = False
drop_params = bool(os.getenv("LITELLM_DROP_PARAMS", False))
modify_params = False
retry = True
### AUTH ###
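One caveat on the `drop_params` change above: `bool(os.getenv("LITELLM_DROP_PARAMS", False))` is truthy for any non-empty string, including `"False"`. A stricter parse, shown here as an illustrative sketch rather than the behavior shipped in this PR, would compare against explicit spellings:

```python
import os

def env_flag(name: str, default: bool = False) -> bool:
    """Parse an env var as a boolean; note that bool("False") would be True."""
    value = os.getenv(name)
    if value is None:
        return default
    # Accept only explicit truthy spellings instead of raw string truthiness.
    return value.strip().lower() in ("1", "true", "yes", "on")

drop_params = env_flag("LITELLM_DROP_PARAMS")
```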
2 changes: 1 addition & 1 deletion litellm/caching.py
@@ -1192,7 +1192,7 @@ def get_cache(self, key, **kwargs):
return cached_response
except botocore.exceptions.ClientError as e:
if e.response["Error"]["Code"] == "NoSuchKey":
verbose_logger.error(
verbose_logger.debug(
f"S3 Cache: The specified key '{key}' does not exist in the S3 bucket."
)
return None
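The change above demotes a routine S3 cache miss from `error` to `debug`, since a missing key is expected behavior for a cache. A standalone sketch of the same pattern with boto3 (the bucket name is a placeholder):

```python
import logging

import boto3
import botocore.exceptions

logger = logging.getLogger("s3_cache")
s3 = boto3.client("s3")

def get_cached(key: str, bucket: str = "my-cache-bucket"):
    try:
        obj = s3.get_object(Bucket=bucket, Key=key)
        return obj["Body"].read()
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchKey":
            # A cache miss is normal operation; log quietly and return None.
            logger.debug("S3 cache miss for key %r", key)
            return None
        raise  # anything else (auth, throttling) is a real error
```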
30 changes: 15 additions & 15 deletions litellm/exceptions.py
@@ -26,7 +26,7 @@ def __init__(
num_retries: Optional[int] = None,
):
self.status_code = 401
self.message = message
self.message = "litellm.AuthenticationError: {}".format(message)
self.llm_provider = llm_provider
self.model = model
self.litellm_debug_info = litellm_debug_info
@@ -72,7 +72,7 @@ def __init__(
num_retries: Optional[int] = None,
):
self.status_code = 404
self.message = message
self.message = "litellm.NotFoundError: {}".format(message)
self.model = model
self.llm_provider = llm_provider
self.litellm_debug_info = litellm_debug_info
@@ -117,7 +117,7 @@ def __init__(
num_retries: Optional[int] = None,
):
self.status_code = 400
self.message = message
self.message = "litellm.BadRequestError: {}".format(message)
self.model = model
self.llm_provider = llm_provider
self.litellm_debug_info = litellm_debug_info
@@ -162,7 +162,7 @@ def __init__(
num_retries: Optional[int] = None,
):
self.status_code = 422
self.message = message
self.message = "litellm.UnprocessableEntityError: {}".format(message)
self.model = model
self.llm_provider = llm_provider
self.litellm_debug_info = litellm_debug_info
@@ -204,7 +204,7 @@ def __init__(
request=request
) # Call the base class constructor with the parameters it needs
self.status_code = 408
self.message = message
self.message = "litellm.Timeout: {}".format(message)
self.model = model
self.llm_provider = llm_provider
self.litellm_debug_info = litellm_debug_info
@@ -241,7 +241,7 @@ def __init__(
num_retries: Optional[int] = None,
):
self.status_code = 403
self.message = message
self.message = "litellm.PermissionDeniedError: {}".format(message)
self.llm_provider = llm_provider
self.model = model
self.litellm_debug_info = litellm_debug_info
@@ -280,7 +280,7 @@ def __init__(
num_retries: Optional[int] = None,
):
self.status_code = 429
self.message = message
self.message = "litellm.RateLimitError: {}".format(message)
self.llm_provider = llm_provider
self.model = model
self.litellm_debug_info = litellm_debug_info
@@ -328,7 +328,7 @@ def __init__(
litellm_debug_info: Optional[str] = None,
):
self.status_code = 400
self.message = message
self.message = "litellm.ContextWindowExceededError: {}".format(message)
self.model = model
self.llm_provider = llm_provider
self.litellm_debug_info = litellm_debug_info
@@ -368,7 +368,7 @@ def __init__(
litellm_debug_info: Optional[str] = None,
):
self.status_code = 400
self.message = message
self.message = "litellm.RejectedRequestError: {}".format(message)
self.model = model
self.llm_provider = llm_provider
self.litellm_debug_info = litellm_debug_info
@@ -411,7 +411,7 @@ def __init__(
litellm_debug_info: Optional[str] = None,
):
self.status_code = 400
self.message = message
self.message = "litellm.ContentPolicyViolationError: {}".format(message)
self.model = model
self.llm_provider = llm_provider
self.litellm_debug_info = litellm_debug_info
@@ -452,7 +452,7 @@ def __init__(
num_retries: Optional[int] = None,
):
self.status_code = 503
self.message = message
self.message = "litellm.ServiceUnavailableError: {}".format(message)
self.llm_provider = llm_provider
self.model = model
self.litellm_debug_info = litellm_debug_info
@@ -501,7 +501,7 @@ def __init__(
num_retries: Optional[int] = None,
):
self.status_code = 500
self.message = message
self.message = "litellm.InternalServerError: {}".format(message)
self.llm_provider = llm_provider
self.model = model
self.litellm_debug_info = litellm_debug_info
@@ -552,7 +552,7 @@ def __init__(
num_retries: Optional[int] = None,
):
self.status_code = status_code
self.message = message
self.message = "litellm.APIError: {}".format(message)
self.llm_provider = llm_provider
self.model = model
self.litellm_debug_info = litellm_debug_info
@@ -589,7 +589,7 @@ def __init__(
max_retries: Optional[int] = None,
num_retries: Optional[int] = None,
):
self.message = message
self.message = "litellm.APIConnectionError: {}".format(message)
self.llm_provider = llm_provider
self.model = model
self.status_code = 500
@@ -626,7 +626,7 @@ def __init__(
max_retries: Optional[int] = None,
num_retries: Optional[int] = None,
):
self.message = message
self.message = "litellm.APIResponseValidationError: {}".format(message)
self.llm_provider = llm_provider
self.model = model
request = httpx.Request(method="POST", url="https://api.openai.com/v1")
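Taken together, the changes in `exceptions.py` prefix every exception message with its fully qualified class name, which makes logs and error traces easier to grep. A minimal sketch of what callers see after this PR (the bad key is deliberate):

```python
import litellm

try:
    litellm.completion(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hi"}],
        api_key="bad-key",  # deliberately invalid to trigger the error
    )
except litellm.AuthenticationError as e:
    # After this PR the message carries the class-name prefix, e.g.
    # "litellm.AuthenticationError: ..."
    print(e.message)
```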