
RagRayAgent

Overview

This repository does the following:

  1. Fine-tunes an LLM.
  2. Populates a vector database with an embedding model, so you can run similarity queries over your context (see the sketch below).
  3. Runs fine tuning on the Ray framework.
  4. Uses CPU and GPU for fine tuning and serving.
  5. Serves your fine-tuned LLM as a REST API.
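At query time these pieces combine into a retrieve-then-generate loop. Below is a minimal sketch assuming the sentence-transformers package; vector_search and generate are hypothetical stand-ins for the repo's Postgres retrieval and Ray-served LLM, not its actual API.

from sentence_transformers import SentenceTransformer

# Embedding model named by EMBEDDING_MODEL_NAME in the configuration below.
embed_model = SentenceTransformer("thenlper/gte-base")

def vector_search(query_vec, k=5):
    # Hypothetical stand-in: the repo instead fetches the k most similar
    # chunks from the Postgres vector table.
    return ["<context chunk>"] * k

def generate(prompt: str) -> str:
    # Hypothetical stand-in for the fine-tuned, Ray-served LLM.
    return "<answer>"

def answer(question: str) -> str:
    query_vec = embed_model.encode(question)            # embed the question
    context = "\n".join(vector_search(query_vec, k=5))  # retrieve context
    return generate(f"Context:\n{context}\n\nQuestion: {question}")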

Configurations

Please set the API keys accordingly and save the content below as llm_agent/.env.

OPENAI_API_KEY=
ANYSCALE_API_KEY=
OPENAI_API_BASE="https://api.endpoints.anyscale.com/v1"
ANYSCALE_API_BASE="https://api.endpoints.anyscale.com/v1"
DB_CONNECTION_STRING="postgresql://testUser:testPassword@localhost:15432/testDB"
EMBEDDING_INDEX_DIR=/tmp/embedding_index_sql
VECTOR_TABLE_NAME=document
VECTOR_TABLE_DUMP_OUTPUT_PATH=/tmp/vector.document.dump.sql
RAYDOCS_ROOT=/tmp/raydocs
NUM_CPUS=14
NUM_GPUS=1
NUM_CHUNKS=5
CHUNK_SIZE=500
CHUNK_OVERLAP=50
EMBEDDING_MODEL_NAME="thenlper/gte-base"
LLM_MODEL_NAME=meta-llama/Llama-2-70b-chat-hf

# Fraction of the data to use for fine tuning:
# a float greater than 0.001 and at most 1 (1 means use all the data)
USE_THIS_PORTION_OF_DATA=0.05
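
A minimal sketch of how these values might be consumed at startup, assuming the python-dotenv package (the repo's actual loading code may differ):

import os
from dotenv import load_dotenv

load_dotenv("llm_agent/.env")  # reads the key=value pairs above

num_cpus = int(os.environ["NUM_CPUS"])
num_gpus = int(os.environ["NUM_GPUS"])
chunk_size = int(os.environ["CHUNK_SIZE"])
chunk_overlap = int(os.environ["CHUNK_OVERLAP"])

# Fraction of the corpus used for fine tuning; must be >0.001 and <=1.
portion = float(os.environ["USE_THIS_PORTION_OF_DATA"])
assert 0.001 < portion <= 1.0, "USE_THIS_PORTION_OF_DATA out of range"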

Makefile Commands for Project Setup

make scrape # Scrape the web pages

make vectordb # Configure Postgres Vector DB

make postgres-client # Install Postgres Client

Then, in a separate terminal:
make port-forward-postgres # Port Forward DB

make vector-support # Enable Vector Support

make vector-table # Create Vector Table

make embedding-table # List the vector tables

# result:
               List of relations
 Schema |      Name       |   Type   |  Owner
--------+-----------------+----------+----------
 public | document        | table    | testUser
 public | document_id_seq | sequence | testUser
(2 rows)
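
The document table is the pgvector-backed store the embeddings land in. Below is a minimal sketch of a similarity query against it, assuming the psycopg2 package; the text and embedding column names are assumptions, not taken from the repo.

import psycopg2

# Mirrors DB_CONNECTION_STRING from llm_agent/.env.
conn = psycopg2.connect("postgresql://testUser:testPassword@localhost:15432/testDB")

def nearest_chunks(query_vec: list[float], k: int = 5) -> list[str]:
    # pgvector's <=> operator orders rows by cosine distance to the query.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT text FROM document ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(query_vec), k),
        )
        return [row[0] for row in cur.fetchall()]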

make pods-preview # Get Pods

make install-pip-deps # Install Pip Dependencies

Finetuning

Once setup is complete, the following commands enable finetuning on a Ray cluster:

make ray-cluster # Start Ray Cluster

make profile-ray-cluster # Profile Cluster

make finetune # Finetune LLM

At the end you will see something like the following:

The default batch size for map_batches is rollout_fragment_length * num_envs.

which indicates that LLM fine tuning is done, the vector DB is populated, and a query has been answered by the LLM using the context retrieved from your vector DB.

Note: My machine has 16 CPUs and 1 GPU, so I set NUM_CPUS and NUM_GPUS accordingly. Adjust these numbers for your machine; you cannot set them higher than the CPU and GPU resources actually available.
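
To find safe values for your machine, you can ask Ray what it detects; a quick check using the public ray API:

import ray

ray.init()  # connects to, or starts, a local cluster
print(ray.cluster_resources())
# e.g. {'CPU': 16.0, 'GPU': 1.0, ...}; keep NUM_CPUS/NUM_GPUS at or below these.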

Please note that we use thenlper/gte-base as the embedding model; it is relatively small, so you may want to change it. LLM_MODEL_NAME is set to meta-llama/Llama-2-70b-chat-hf, which works well for this setup, but again you may want to change it.

Serving

make dev-deploy

make test-query

Should yield:

b'"{\\"question\\": \\"What is the default batch size for map_batches?\\", \\"sources\\": [\\"https://docs.ray.io/en/master/rllib/rllib-training.html#specifying-rollout-workers\\", \\"https://docs.ray.io/en/master/rllib/rllib-training.html#specifying-rollout-workers\\", \\"https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.policy.Policy.compute_log_likelihoods.html#ray-rllib-policy-policy-policy-compute-log-likelihoods\\", \\"https://docs.ray.io/en/master/rllib/package_ref/doc/ray.rllib.policy.policy.Policy.compute_log_likelihoods.html#ray-rllib-policy-policy-policy-compute-log-likelihoods\\", \\"https://docs.ray.io/en/master/rllib/rllib-algorithms.html#importance-weighted-actor-learner-architecture-impala\\"], \\"answer\\": \\" The default batch size for map_batches is rollout_fragment_length * num_envs.\\", \\"llm\\": \\"meta-llama/Llama-2-70b-chat-hf\\"}"'
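
The same query can be issued from Python. A minimal sketch assuming the requests package; the host, port, route, and payload shape here are assumptions, so check the Makefile's test-query target for the actual endpoint.

import requests

# Endpoint and payload are assumptions; see the Makefile's test-query target.
resp = requests.post(
    "http://localhost:8000/query",
    json={"query": "What is the default batch size for map_batches?"},
)
print(resp.content)  # bytes payload with question, sources, answer, and llm, as above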

TODO

  • Spot Instance/Fleet Provisioning for Cost Effective Training
  • CUDA devcontainer configurations
  • Dockerfiles
  • Terraform Configuration for 3-Tier Cloud Deployment
  • Linting, testing
  • GitHub Push/Pull Actions + CI/CD Building
  • Integrating Other DB Backends
  • Quantization

References

1. A Comprehensive Guide for Building RAG-based LLM Applications (Part 1). Anyscale Blog.
