
LLM router basic template #203

Closed

wants to merge 1,502 commits into from

Conversation

@anmscale anmscale (Contributor) commented May 7, 2024

Implementing:

  • basic flow for data labeling with GPT-4 as a judge
  • basic flow for evaluation

To do:

  • add 1-2 simple baselines (frequency, BoW classifier)
  • iterate on the explanation and overall story

In a following commit:

  • fine-tune the LLM router

@kouroshHakha kouroshHakha (Contributor) left a comment

Overall I think it's going in the right direction; let's continue. It's half-baked towards the end, so I didn't put feedback there.

A few things I like so far that I think we should make standard for all the templates:

  • README.ipynb should not have implementation details in code. All the code implementation should be abstracted into a separate module / function that gets imported and simply used (see the sketch after this list).

  • We should use diagrams when they cut the need for more words.
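
A minimal sketch of that pattern for a notebook cell (module and function names below, like `router_utils` and `prepare_dataset`, are hypothetical placeholders; only `visualize_label_distribution` appears in this PR):

```python
# The README.ipynb cell only imports and calls helpers; all implementation
# details live in separate modules (names here are illustrative placeholders).
from router_utils import prepare_dataset
from viz import visualize_label_distribution

train_df, validation_df = prepare_dataset("data/queries.csv")
visualize_label_distribution(train_df, key="label")
```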

"source": [
"# Background\n",
"\n",
"Whenever we use an LLM we would like to get the highest response quality but are often restricted to a limited cost budget. Closed models, such as GPT-4, are known to be the highest quality models, but they can get very expensive especially when running them on a very number of queries. On the other hand, OSS models can be much cheaper, but their responses may not be of the same quality, especially for complex or domain-specific queries.\n",
Contributor

Suggested change
"Whenever we use an LLM we would like to get the highest response quality but are often restricted to a limited cost budget. Closed models, such as GPT-4, are known to be the highest quality models, but they can get very expensive especially when running them on a very number of queries. On the other hand, OSS models can be much cheaper, but their responses may not be of the same quality, especially for complex or domain-specific queries.\n",
"Whenever we use an LLM we would like to get the highest response quality but are often restricted to a limited cost budget. Closed models, such as GPT-4, are known to be the highest quality models, but they can get very expensive especially when running them on a very large number of queries. On the other hand, OSS models can be much cheaper, but their responses may not be of the same quality, especially for complex or domain-specific queries.\n",

"\n",
"Whenever we use an LLM we would like to get the highest response quality but are often restricted to a limited cost budget. Closed models, such as GPT-4, are known to be the highest quality models, but they can get very expensive especially when running them on a very number of queries. On the other hand, OSS models can be much cheaper, but their responses may not be of the same quality, especially for complex or domain-specific queries.\n",
"\n",
"The goal of this tutorial is to show you how you can train a \"smart router\", i.e. a model that can dynamically decide, based on the query text, whether to call a closed model or an OSS model. Here's a schematic view of a smart router:\n",
Contributor

Do we want to use the "smart router" name? I was thinking we should use "dynamic router", since this is also what ChatGPT called it and it will sound more familiar. Plus, SEO will get boosted because OpenAI used that term. (It's just a personal gut feeling.)

"Whenever we use an LLM we would like to get the highest response quality but are often restricted to a limited cost budget. Closed models, such as GPT-4, are known to be the highest quality models, but they can get very expensive especially when running them on a very number of queries. On the other hand, OSS models can be much cheaper, but their responses may not be of the same quality, especially for complex or domain-specific queries.\n",
"\n",
"The goal of this tutorial is to show you how you can train a \"smart router\", i.e. a model that can dynamically decide, based on the query text, whether to call a closed model or an OSS model. Here's a schematic view of a smart router:\n",
"![Smart Router](assets/router_schema.png)\n",
Contributor

In the diagram, for the green box, let's say "OSS, e.g. Mixtral". The point is that users can repeat this between any 2 or N models (even between GPT-3.5 and GPT-4 themselves).

"We are going to train a classifier to decide, based only on the query text, whether to route the query to an OSS model vs. a closed one. In this tutorial, we will make the following design choices: \n",
"1. We will quantify a response quality on a scale of `[1, 5]` (5-star).\n",
"2. For simplicity, we will assume that the closed always achieves 5-start quality. \n",
"3. We will use GPT-4 as a representative of closed models and Mixtral 8x7B for OSS models.\n",
Contributor

Suggested change
"3. We will use GPT-4 as a representative of closed models and Mixtral 8x7B for OSS models.\n",
"3. We will use GPT-4 as a representative for closed models and Mixtral 8x7B for OSS models.\n",

"2. For simplicity, we will assume that the closed always achieves 5-start quality. \n",
"3. We will use GPT-4 as a representative of closed models and Mixtral 8x7B for OSS models.\n",
"\n",
"More concurrently, let us assume that closed models have perfect a quality (5/5 score). our goal is to reduce cost significantly (say by 50%) while maintaining a high overal quality (4.8/5 score).\n"
Contributor

Suggested change
"More concurrently, let us assume that closed models have perfect a quality (5/5 score). our goal is to reduce cost significantly (say by 50%) while maintaining a high overal quality (4.8/5 score).\n"
"More concretely, let us assume that closed models have perfect a quality (5/5 score). Our goal is to reduce cost significantly (say by 50%) while maintaining a high overall quality (score of 4 to 5).\n"

queries = {}
for pidx, row in dataset_df.to_dict(orient="index").items():
    prompt = row["prompt"]
    if type(prompt) == str:
Contributor

Suggested change
if type(prompt) == str:
if isinstance(prompt, str):

    return train_df, validation_df


def visualize_label_distribution(dataset_df, key):
Contributor

Let's move all visualization methods to a different module.

    return average_score, routing_percentage, score_auc


def plot_quality_cost_curve(
Contributor

Move visualization into another module (maybe viz.py)



@ray.remote(num_cpus=0)
def get_llm_response(
Contributor

Any mechanisms to guard against rate limits?
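
One possible guard is a retry loop with exponential backoff inside the remote task. A rough sketch, where `llm_call` stands in for whatever client call the template actually makes and the retry numbers are arbitrary defaults:

```python
import random
import time

import ray


@ray.remote(num_cpus=0)
def get_llm_response_with_retries(pidx, query, llm_call, max_retries=5):
    """Call `llm_call(query)`, backing off exponentially between attempts.

    Ideally the except clause would catch only the client's specific
    rate-limit error rather than every exception.
    """
    for attempt in range(max_retries):
        try:
            return (pidx, llm_call(query))
        except Exception:
            if attempt == max_retries - 1:
                return (pidx, "")  # mirrors the existing empty-response fallback
            # Jittered exponential backoff so concurrent tasks don't retry in lockstep.
            time.sleep(2 ** attempt + random.random())
```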

    return (pidx, "")


def generate_batch_responses(
Contributor

definitely make the docstring for this beefy.
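
For example, the docstring could spell out the batching, concurrency, and failure behavior; the argument names below are guessed from context and may not match the real signature:

```python
def generate_batch_responses(queries, model_name, max_concurrency=16):
    """Generate responses for a batch of queries using Ray tasks.

    Args:
        queries: Mapping of query id -> prompt text.
        model_name: Target model, e.g. "gpt-4" or an OSS endpoint.
        max_concurrency: Upper bound on in-flight requests; also acts as a
            crude guard against provider rate limits.

    Returns:
        Mapping of query id -> response text. Queries that fail after all
        retries map to the empty string.
    """
```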

Contributor

This looks good so far, but I'm really interested in how you will explain/show the training workload in the notebook. I'm sure you'll explain parts of the config, how the new classifier head works, and applying the template (and again during inference). I'm also excited for the serving part. I think you should have a very small cost analysis too (a rough sketch follows the list below)!

Small errors:

  • "running them on a very number of queries"
  • "the closed always" --> "the closed model"
  • "use GPT-4 as a representative of closed models" --> "use GPT to represent closed models"
  • "More concurrently" --> "More concretely"

marwan116 and others added 25 commits May 16, 2024 15:24

Stable diffusion pretraining improvements
[LLM Serving Template] Updated command to non-jupyter cell
@@ -0,0 +1,19 @@
head_node_type:
  name: head
  instance_type: p4de.24xlarge
Contributor

A100 head nodes are not available in Hosted OA. If this template will be exposed in Hosted OA, can we use the serverless config (just directly request a GPU resource of A100-80G and allow the autoscaler to upscale it)?

If we are not planning on exposing this through OA, then it doesn't matter as much. But it's still better practice to run workloads on workers and use cheap CPU nodes for development.

@anmscale anmscale (Contributor, Author) commented Jun 27, 2024

Would you consider A10 a cheap GPU? I have enabled training on g5.48xlarge and launched jobs successfully with it, so I can update this config.

resources:
  cpu: 8

auto_select_worker_config: true
Contributor

Can you delete everything from this line down? None of it should be needed for a single node.

name: head
instance_type: g5.48xlarge
resources:
  cpu: 8
Contributor

you can delete the logical resource entry here as well

!pip install -e .[eval]
```

fatal: destination path '/home/ray/default/RouteLLM' already exists and is not an empty directory.
Contributor

probably don't need to commit the output cells

Contributor (Author)

This one slipped past me, good catch! Do you suggest I remove all of them? I kept only a summary showing what the user will see, but maybe it's not important.

Contributor

yeah I suggest removing all of them unless there's some really important output to display

@@ -0,0 +1,7 @@
head_node_type:
Contributor

you don't need these anymore

Contributor (Author)

@shomilj asked me to keep them but remove worker node configs.

@akshay-anyscale akshay-anyscale (Contributor) commented Jul 8, 2024

yeah but since you merged into the existing template, you don't need new compute config files at all

Contributor (Author)

ok let me remove those files then

Contributor (Author)

@akshay-anyscale I am not sure what to do about landing it in the product repo. There I need to specify configs, see e.g. https://github.com/anyscale/product/blob/master/backend/workspace-templates.yaml#L84, and I don't think the configs here would work: https://github.com/anyscale/product/blob/master/backend/workspace-templates.yaml#L246C12-L246C43

Contributor

you shouldn't have to make a product repo change for this since the files are in the existing template. Is the only gap that for GCE it doesn't have the serverless config? @kouroshHakha why is that the case?

Contributor

Why not make use of the basic serverless configs everywhere?

head_node_type:
  name: head
  instance_type: n1-standard-8
worker_node_types: []
auto_select_worker_config: true

Contributor

I don't know why LLM fine-tuning is not on serverless for GCE. Maybe that slipped during the transition for some reason? I hadn't noticed it until now.

Contributor

I think @anmscale mentioned a GPU head node was a hard req for this workspace; if that has changed, yes, please, let's use serverless :)

@anmscale anmscale closed this Jul 11, 2024
@anmscale anmscale deleted the anm/llm-router branch July 11, 2024 17:59
@anmscale anmscale (Contributor, Author)

Need to rename the branch to avoid the "/" for the template to run.

@anmscale anmscale mentioned this pull request Jul 11, 2024