The RAG Experiment Accelerator is a versatile tool that helps you conduct experiments and evaluations using Azure AI Search and the RAG pattern.
The top-level `flow.dag.yaml` runs the RAG experiment end-to-end based on the configuration provided in `config.json`.
The `setup` node runs first and loads the required environment variables from a custom connection.
The `index` node will:

- Create indexes based on the parameters set in `config.json`. Each index name will be in the following format (illustrated in the sketch after this list): `{name_prefix}-{chunk_size}-{overlap}-{dimension}-{ef_construction}-{ef_search}`
- Chunk documents based on the chunking parameters in `config.json`
- Generate a summary and title for each chunk
- Create embeddings for each chunk's content, generated title, and generated summary
- Upload the embeddings to the Azure AI Search service

If the indexes have been previously created, this node is optional and can be skipped by setting the input `should_index` to `False`.
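As a minimal sketch of the naming convention, the index name is just the hyperparameter values joined with dashes. The variable names and values below are illustrative examples, not the accelerator's internals; the real values come from `config.json`.

```python
# Hypothetical parameter values; in practice these come from config.json.
name_prefix = "rag-exp"
chunk_size, overlap = 1000, 200
dimension = 768
ef_construction, ef_search = 400, 400

# Build the index name in the documented format.
index_name = f"{name_prefix}-{chunk_size}-{overlap}-{dimension}-{ef_construction}-{ef_search}"
print(index_name)  # rag-exp-1000-200-768-400-400
```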
The `qa_generation` node will chunk each document and generate ground-truth questions and answers for each chunk.

Optionally, this node can be skipped by setting the input `should_generate_qa` to `false`, in which case a set of user-provided ground-truth questions and answers can be used. User-provided questions and answers should be in the `jsonl` file format and, by default, in the location `./artifacts/eval_data.jsonl`. This location can be configured by updating the `eval_data_jsonl_file_path` value in `config.json`. Each line of the `jsonl` file should contain the keys (see the example record after this list):

- `user_prompt` - contains the generated question
- `output_prompt` - contains the generated answer
- `context` - contains the document sections from which the question-answer pair was generated
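As a minimal sketch, a single line of `eval_data.jsonl` can be written as shown below. The question, answer, and context are made-up examples; only the key names follow the format described above.

```python
import json
from pathlib import Path

# A made-up ground-truth record using the keys described above.
record = {
    "user_prompt": "What does the index node upload to Azure AI Search?",
    "output_prompt": "It uploads embeddings for each chunk's content, generated title, and generated summary.",
    "context": "Create embeddings for each chunk's content, generated title, and generated summary ...",
}

# JSONL: one JSON object per line, appended to the default eval data path.
Path("./artifacts").mkdir(parents=True, exist_ok=True)
with open("./artifacts/eval_data.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")
```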
The `querying` node takes the `user_prompt`s that were generated by the `qa_generation` node and searches Azure AI Search using the `search_types` specified in `config.json`.

For each `user_prompt` and `search_type` (a simplified sketch of this loop follows the list):

- If the `user_prompt` is complex, it is broken down into multiple prompts and all of the resulting prompts are used in Azure AI Search
- The search results are optionally re-ranked based on the `rerank` setting in `config.json`
- Search result metrics are calculated
- The content from the search results is added as context to the `user_prompt` and the LLM is called
- The responses from the LLM are uploaded as a data asset and used by the `evaluation` node
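The control flow can be summarized with the simplified sketch below. This is only an illustration of the loop described above: `search_fn`, `llm_fn`, and `rerank_fn` are placeholder callables rather than the accelerator's real API, and prompt decomposition and search-result metric calculation are omitted for brevity.

```python
from typing import Callable


def run_queries(
    user_prompts: list[str],
    search_types: list[str],
    search_fn: Callable[[str, str], list[str]],  # placeholder: Azure AI Search query
    llm_fn: Callable[[str], str],                # placeholder: chat completion call
    rerank: bool = False,
    rerank_fn: Callable[[list[str]], list[str]] = lambda results: results,
) -> list[dict]:
    """Illustrative only: query each search type, optionally re-rank, then call the LLM."""
    responses = []
    for user_prompt in user_prompts:
        for search_type in search_types:
            results = search_fn(user_prompt, search_type)
            if rerank:
                results = rerank_fn(results)
            # Add the retrieved content as context before calling the LLM.
            context = "\n".join(results)
            answer = llm_fn(f"Context:\n{context}\n\nQuestion: {user_prompt}")
            responses.append(
                {"user_prompt": user_prompt, "search_type": search_type, "actual": answer}
            )
    return responses
```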
The `evaluation` node takes the results generated by the `querying` node and logs the metrics specified in `config.json` to MLflow. The metrics and configuration parameters can be inspected and compared to past experiments in your ML workspace by selecting the `Jobs` tab under `Assets` and clicking on the latest experiment run.
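As an alternative to the `Jobs` UI, the logged metrics can also be pulled programmatically with the MLflow client. This is a hedged sketch: it assumes MLflow is pointed at your Azure ML workspace (for example via the `azureml-mlflow` plugin), and the tracking URI and experiment name below are placeholders.

```python
import mlflow

# Placeholders: point MLflow at your Azure ML workspace and name your experiment.
mlflow.set_tracking_uri("<azureml-workspace-tracking-uri>")

runs = mlflow.search_runs(
    experiment_names=["<your-experiment-name>"],
    order_by=["start_time DESC"],
    max_results=1,
)

# Keep only the metric columns of the most recent run and print them.
print(runs.filter(like="metrics.", axis=1).T)
```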
- Azure AI Search Service (Note: Semantic Search is available in Azure AI Search Service at the Basic tier or higher.)
- Azure OpenAI Service
- Azure Machine Learning Resources
To run the RAG Experiment Accelerator end-to-end in VSCode, follow these steps:
- Ensure you have installed the promptflow extension and pip-installed the `promptflow` and `promptflow-tools` packages
- Run: `pip install ./custom_environment/rag_experiment_accelerator-0.9-py3-none-any.whl`
- Create a custom connection. See env_setup.
- Add your own documents to the `./data` folder (a set of sample documents is provided for testing purposes)
- Modify the `config.json` file with the hyperparameters for your experiment. Full documentation on the configuration elements can be found here
- Run the flow from the extension UI or from the CLI by running `pf flow test --flow ./flow.dag.yaml`
- Inspect the results in your ML workspace by selecting the `Jobs` tab under `Assets`. Click on the latest experiment run to view the metrics and results.
To run the RAG Experiment Accelerator end-to-end in Prompt Flow, follow these steps:
- Create a custom environment using the provided Dockerfile (this will take several minutes)
az login
az account set --subscription <subscription ID>
az extension add --name ml
az configure --defaults workspace=$MLWorkSpaceName group=$ResourceGroupName
cd ./custom_environment
az ml environment create --file ./environment.yaml -w $MLWorkSpaceName
- Create a custom runtime using the newly created environment. See Create runtime in UI.
- Create a custom connection. See env_setup.
- Add your own documents to the `./data` folder (a set of sample documents is provided for testing purposes)
- Modify the `config.json` file with the hyperparameters for your experiment. Full documentation on the configuration elements can be found here
- Upload the flow to the ML workspace
- Ensure you have also uploaded `config.json` and optionally `prompt_config.json` to the ML workspace
- Select the custom runtime in Prompt Flow
- Click run in the UI
- Inspect the results in your ML workspace by selecting the `Jobs` tab under `Assets`. Click on the latest experiment run to view the metrics and results.
{
    "name_prefix": "Name of experiment, search index name used for tracking and comparing jobs",
    "chunking": {
        "chunk_size": "Size of each chunk e.g. [500, 1000, 2000]",
        "overlap_size": "Overlap size for each chunk e.g. [100, 200, 300]"
    },
    "embedding_dimension": "embedding size for each chunk e.g. [384, 1024]. Valid values are 384, 768, 1024",
    "ef_construction": "determines the ef_construction value of the Azure AI Search vector configuration",
    "ef_search": "determines the ef_search value of the Azure AI Search vector configuration",
    "language": {
        "analyzer_name": "name of the analyzer to use for the field. This option can be used only with searchable fields and it can't be set together with either searchAnalyzer or indexAnalyzer.",
        "index_analyzer_name": "name of the analyzer used at indexing time for the field. This option can be used only with searchable fields. It must be set together with searchAnalyzer and it cannot be set together with the analyzer option.",
        "search_analyzer_name": "name of the analyzer used at search time for the field. This option can be used only with searchable fields. It must be set together with indexAnalyzer and it cannot be set together with the analyzer option. This property cannot be set to the name of a language analyzer; use the analyzer property instead if you need a language analyzer."
    },
    "rerank": "determines if search results should be re-ranked. Valid values are TRUE or FALSE",
    "rerank_type": "determines the type of re-ranking. Valid values are llm or crossencoder",
    "llm_re_rank_threshold": "determines the threshold when using llm re-ranking. Chunks with a rank above this number are selected, in the range 1 - 10.",
    "cross_encoder_at_k": "determines the threshold when using cross-encoder re-ranking. Chunks with the given rank value are selected.",
    "crossencoder_model": "determines the model used for the cross-encoder re-ranking step. Valid value is cross-encoder/stsb-roberta-base",
    "search_types": "determines the search types used for experimentation. Valid values are search_for_match_semantic, search_for_match_Hybrid_multi, search_for_match_Hybrid_cross, search_for_match_text, search_for_match_pure_vector, search_for_match_pure_vector_multi, search_for_match_pure_vector_cross, search_for_manual_hybrid. e.g. ['search_for_manual_hybrid', 'search_for_match_Hybrid_multi', 'search_for_match_semantic']",
    "retrieve_num_of_documents": "determines the number of chunks to retrieve from the search index",
    "metric_types": "determines the metrics used for evaluation purposes. Valid values are lcsstr, lcsseq, cosine, jaro_winkler, hamming, jaccard, levenshtein, fuzzy, bert_all_MiniLM_L6_v2, bert_base_nli_mean_tokens, bert_large_nli_mean_tokens, bert_large_nli_stsb_mean_tokens, bert_distilbert_base_nli_stsb_mean_tokens, bert_paraphrase_multilingual_MiniLM_L12_v2, llm_context_precision, llm_answer_relevance. e.g. ['fuzzy', 'bert_all_MiniLM_L6_v2', 'cosine', 'bert_distilbert_base_nli_stsb_mean_tokens']",
    "azure_oai_chat_deployment_name": "determines the Azure OpenAI chat deployment name",
    "azure_oai_eval_deployment_name": "determines the Azure OpenAI evaluation deployment name",
    "embedding_model_name": "embedding model name",
    "openai_temperature": "determines the OpenAI temperature. Valid values range from 0 to 1.",
    "search_relevancy_threshold": "the similarity threshold to determine if a doc is relevant. Valid range is 0.0 to 1.0",
    "eval_data_jsonl_file_path": "the file path of the ground truth questions and answers. This must be a jsonl file and each line should contain the keys: user_prompt (question), output_prompt (answer), context (the document context that contains the answer)"
}
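For reference, a minimal, hypothetical `config.json` might look like the sketch below. It is generated with Python so the value types are explicit; every value, and which fields take lists of candidate hyperparameters versus single values, is an illustrative assumption based on the descriptions above and should be adapted to your own experiment. The `language` analyzer block is omitted here for brevity.

```python
import json

# A hypothetical, minimal config.json. All values are examples only.
config = {
    "name_prefix": "rag-exp",
    "chunking": {"chunk_size": [500, 1000], "overlap_size": [100, 200]},
    "embedding_dimension": [384, 1024],
    "ef_construction": [400],  # assumed list form, like the other sweep parameters
    "ef_search": [400],
    "rerank": True,
    "rerank_type": "crossencoder",
    "llm_re_rank_threshold": 3,
    "cross_encoder_at_k": 3,
    "crossencoder_model": "cross-encoder/stsb-roberta-base",
    "search_types": ["search_for_match_semantic", "search_for_match_pure_vector"],
    "retrieve_num_of_documents": 3,
    "metric_types": ["fuzzy", "cosine", "bert_all_MiniLM_L6_v2"],
    "azure_oai_chat_deployment_name": "<your-chat-deployment>",  # placeholder
    "azure_oai_eval_deployment_name": "<your-eval-deployment>",  # placeholder
    "embedding_model_name": "all-MiniLM-L6-v2",                  # placeholder
    "openai_temperature": 0,
    "search_relevancy_threshold": 0.8,
    "eval_data_jsonl_file_path": "./artifacts/eval_data.jsonl",
}

# Write the example configuration to config.json.
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)
```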
The top-level `flow.dag.yaml` runs the RAG experiment end-to-end, and each step can also be run independently.

- `data` - the directory for the documents. A set of sample documents is provided for testing
- `custom_environment` - contains the `Dockerfile`, `environment.yaml` and the `rag-experiment-accelerator` `.whl` file. Building an image is necessary when running in Prompt Flow from the ML workspace.
- `images` - contains the images used in this `README.md`
Flows:
- `setup` - (sets the necessary environment variables)
- `index` - (contains the `index` flow)
- `qa_generation` - (contains the `qa_generation` flow)
- `querying` - (contains the `querying` flow)
- `evaluation` - (contains the `evaluation` flow)
Each step can also be run independently, and the flow for each step is contained in its corresponding folder. When running the flows independently, an initial `setup` will run to ensure the proper environment variables are set.
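As a hedged example of running a single step on its own, the promptflow Python SDK's `PFClient` can test one flow folder directly; the folder path below follows the layout above, and any required inputs are left to the flow's defaults.

```python
# Assumes the promptflow package is installed (see the VSCode setup steps above).
from promptflow import PFClient

pf = PFClient()

# Test only the querying flow; its setup step loads the required
# environment variables before the rest of the flow runs.
result = pf.test(flow="./querying")
print(result)
```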