This repository does the following:
- Fine-tunes an LLM
- Populates a vector database with an embedding model, so you can run similarity queries against your context
- Fine-tunes with the Ray framework
- Uses CPUs and GPUs for fine-tuning and serving
- Serves your fine-tuned LLM as a REST API
Please set the API keys accordingly and save the content below in llm_agent/.env:
OPENAI_API_KEY=
ANYSCALE_API_KEY=
OPENAI_API_BASE="https://api.endpoints.anyscale.com/v1"
ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
DB_CONNECTION_STRING="postgresql://testUser:testPassword@localhost:15432/testDB"
EMBEDDING_INDEX_DIR=/tmp/embedding_index_sql
VECTOR_TABLE_NAME=document
VECTOR_TABLE_DUMP_OUTPUT_PATH=/tmp/vector.document.dump.sql
RAYDOCS_ROOT=/tmp/raydocs
NUM_CPUS=14
NUM_GPUS=1
NUM_CHUNKS=5
CHUNK_SIZE=500
CHUNK_OVERLAP=50
EMBEDDING_MODEL_NAME="thenlper/gte-base"
LLM_MODEL_NAME=meta-llama/Llama-2-70b-chat-hf
# How much of the data should be used for fine-tuning:
# a float greater than 0.001 and up to 1 (1 means use all the data for fine-tuning)
USE_THIS_PORTION_OF_DATA=0.05
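How these values are consumed depends on the repository's own code; as a rough illustration, here is a minimal sketch of reading the file and applying USE_THIS_PORTION_OF_DATA (python-dotenv and the take_portion helper are assumptions, not the repository's actual loader):

```python
# Sketch: load llm_agent/.env and subsample the fine-tuning data.
import os

from dotenv import load_dotenv

load_dotenv("llm_agent/.env")

portion = float(os.environ["USE_THIS_PORTION_OF_DATA"])  # e.g. 0.05


def take_portion(rows):
    # Hypothetical helper: keep only the first `portion` fraction of the rows.
    return rows[: max(1, int(len(rows) * portion))]
```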
make scrape # Scrape the web pages
make vectordb # Configure Postgres Vector DB
make postgres-client # Install Postgres Client
Then, in a separate terminal:
make port-forward-postgres # Port Forward DB
make vector-support # Enable Vector Support
make vector-table # Create Vector Table
make embedding-table # Get Vector Table
# result:
List of relations
Schema | Name | Type | Owner
--------+-----------------+----------+----------
public | document | table | testUser
public | document_id_seq | sequence | testUser
(2 rows)
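As a quick sanity check that the table from the listing above is reachable, something like the following sketch can be used (psycopg2 and the count query are assumptions; only DB_CONNECTION_STRING and the document table name come from the configuration above):

```python
# Sketch: verify the vector table created above is reachable.
import os

import psycopg2
from dotenv import load_dotenv

load_dotenv("llm_agent/.env")
conn = psycopg2.connect(os.environ["DB_CONNECTION_STRING"])
with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM document;")
    print("rows in document:", cur.fetchone()[0])
conn.close()
```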
make pods-preview # Get Pods
make install-pip-deps # Install Pip Dependencies
Once setup is complete, the following commands enable fine-tuning on a Ray cluster:
make ray-cluster # Start Ray Cluster
make profile-ray-cluster # Profile Cluster
make finetune # Finetune LLM
At the end, you will see something like the following:
The default batch size for map_batches is rollout_fragment_length * num_envs.
which indicates that LLM fine-tuning is done, the vector DB is populated, and a query has been sent to the LLM with the context retrieved from your vector DB.
Note: My machine has 16 CPUs and 1 GPU, so I set NUM_CPUS and NUM_GPUS accordingly. These numbers may differ for your machine; the principle is that you cannot request more than the resources (CPUs and GPUs) that actually exist.
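The sketch below illustrates how NUM_CPUS / NUM_GPUS from .env can bound the Ray resources available to the trainer; it is a hedged example, not the repo's actual finetune entry point:

```python
# Sketch: initialize Ray with the resource limits from .env and inspect them.
import os

import ray

ray.init(
    num_cpus=int(os.environ.get("NUM_CPUS", "14")),
    num_gpus=int(os.environ.get("NUM_GPUS", "1")),
)
print(ray.cluster_resources())  # the fine-tuning job cannot exceed these
```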
Please note that we are using thenlper/gte-base as the embedding model. This is a relatively small model, so you may want to change it. LLM_MODEL_NAME is set to meta-llama/Llama-2-70b-chat-hf, which works well for this setup, but again you may want to change it.
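If you swap the embedding model, make sure the vector column's dimension matches the new model's output size. A hedged sketch (loading via sentence-transformers is an assumption about the pipeline):

```python
# Sketch: load the embedding model named in .env and check its output dimension.
import os

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(os.environ.get("EMBEDDING_MODEL_NAME", "thenlper/gte-base"))
vector = model.encode("What is the default batch size for map_batches?")
print(vector.shape)  # gte-base produces 768-dimensional embeddings
```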
make dev-deploy
make test-query
This should yield something like:
b'"{\\"question\\": \\"What is the default batch size for map_batches?\\", \\"sources\\": [\\"https://docs.ray.io/en/master/rllib/rllib-training.html#specifying-rollout-workers\\", \\"https://docs.ray.io/en/master/rllib/rllib-training.html#specifying-rollout-workers\\", \\"https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.policy.Policy.compute_log_likelihoods.html#ray-rllib-policy-policy-policy-compute-log-likelihoods\\", \\"https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.policy.Policy.compute_log_likelihoods.html#ray-rllib-policy-policy-policy-compute-log-likelihoods\\", \\"https://docs.ray.io/en/master/rllib/rllib-algorithms.html#importance-weighted-actor-learner-architecture-impala\\"], \\"answer\\": \\" The default batch size for map_batches is rollout_fragment_length * num_envs.\\", \\"llm\\": \\"meta-llama/Llama-2-70b-chat-hf\\"}"'
- Spot Instance/Fleet Provisioning for Cost Effective Training
- CUDA devcontainer configurations
- Dockerfiles
- Terraform Configuration for 3-Tier Cloud Deployment
- Linting, Testing
- GitHub Push/Pull Actions + CI/CD Building
- Integrating Other DB Backends
- Quantization