RAG Experiment Accelerator with Prompt Flow

Flow description

The RAG Experiment Accelerator is a versatile tool that helps you conduct experiments and evaluations using Azure AI Search and the RAG pattern.

The top-level flow.dag.yaml runs the RAG experiment end-to-end based on the configuration provided in config.json.


Set Up

The setup node runs first and loads the required environment variables from a custom connection.


Index

The index node will:

  • Create indexes based on the parameters set in config.json. Each index name will be in the following format: {name_prefix}-{chunk_size}-{overlap}-{dimension}-{ef_construction}-{ef_search}
  • Chunk documents based on the chunking parameters in config.json
  • Generate a summary and title for each chunk
  • Create embeddings for each chunk's content, generated title, and generated summary
  • Upload the embeddings to Azure AI Search

If the indexes have been previously created, this node is optional and can be skipped by setting the input should_index to False.
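As a rough illustration of the naming format above, the sketch below builds one index name per parameter combination. It assumes that each of chunk_size, overlap_size, embedding_dimension, ef_construction and ef_search is given as a list and that every combination yields its own index. The function name index_names and the sample values are hypothetical and not taken from the accelerator's code.

```python
from itertools import product


def index_names(config: dict) -> list[str]:
    """Build one index name per combination of indexing parameters.

    Assumption: every parameter is a list and all combinations are indexed.
    """
    chunking = config["chunking"]
    combos = product(
        chunking["chunk_size"],
        chunking["overlap_size"],
        config["embedding_dimension"],
        config["ef_construction"],
        config["ef_search"],
    )
    return [
        f"{config['name_prefix']}-{size}-{overlap}-{dim}-{efc}-{efs}"
        for size, overlap, dim, efc, efs in combos
    ]


# Hypothetical sample values, for illustration only.
example = {
    "name_prefix": "demo",
    "chunking": {"chunk_size": [500, 1000], "overlap_size": [100]},
    "embedding_dimension": [384],
    "ef_construction": [400],
    "ef_search": [400],
}
names = index_names(example)
print(names)
# ['demo-500-100-384-400-400', 'demo-1000-100-384-400-400']
```

Listing the names this way before a run can help estimate how many indexes a given config.json would create.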

QA Generation

The qa_generation node will chunk each document and generate ground truth questions and answers for each chunk.

Optionally, this node can be skipped by setting the input should_generate_qa to false, in which case a set of user-provided ground truth questions and answers is used instead. User-provided questions and answers must be in the jsonl file format and are read, by default, from ./artifacts/eval_data.jsonl. This location can be changed by updating the eval_data_jsonl_file_path value in config.json. Each line of the jsonl file should contain the keys:

  • user_prompt - the question
  • output_prompt - the answer
  • context - the document sections from which the question-answer pair was derived
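A user-provided ground truth file can be checked before a run. The following is a minimal sketch, not part of the accelerator, that reports lines that are not valid JSON objects or that lack one of the three keys. The helper name find_invalid_lines and the sample records are hypothetical.

```python
import json

REQUIRED_KEYS = {"user_prompt", "output_prompt", "context"}


def find_invalid_lines(jsonl_text: str) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for lines that fail the check."""
    problems = []
    for number, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
            continue
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems.append((number, "missing keys: " + ", ".join(sorted(missing))))
    return problems


# Hypothetical sample content: the second line lacks two required keys.
sample = "\n".join([
    json.dumps({
        "user_prompt": "What is RAG?",
        "output_prompt": "Retrieval-augmented generation.",
        "context": "RAG stands for retrieval-augmented generation.",
    }),
    json.dumps({"user_prompt": "Missing fields"}),
])
problems = find_invalid_lines(sample)
print(problems)
# [(2, 'missing keys: context, output_prompt')]
```

To check a real file, pass its contents instead, for example pathlib.Path("./artifacts/eval_data.jsonl").read_text().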


Querying

The querying node takes the user_prompt values generated by the qa_generation node and queries Azure AI Search using the search_types specified in config.json.

For each user_prompt and search_type:

  • If the user_prompt is complex, it is broken down into multiple prompts and all of the resulting prompts are used to query Azure AI Search
  • The search results are optionally reranked based on the rerank setting in config.json
  • Search result metrics are calculated
  • The content from the search results is added as context to the user_prompt and the LLM is called
  • The responses from the LLM are uploaded as a data asset and used by the evaluation node
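The loop above can be sketched as follows. This is an illustrative outline only: run_queries and the search, ask_llm, rerank and split_prompt callables are hypothetical stand-ins for the accelerator's internals, the prompt template is an assumption, and the metric calculation and data asset upload are omitted.

```python
def run_queries(prompts, search_types, search, ask_llm, rerank=None, split_prompt=None):
    """For each prompt and search type: search, optionally rerank, then call the LLM."""
    results = []
    for prompt in prompts:
        # A complex prompt may be split into several sub-prompts (hypothetical splitter).
        sub_prompts = split_prompt(prompt) if split_prompt else [prompt]
        for search_type in search_types:
            chunks = []
            for sub_prompt in sub_prompts:
                for chunk in search(search_type, sub_prompt):
                    if chunk not in chunks:  # drop duplicate chunks
                        chunks.append(chunk)
            if rerank:
                chunks = rerank(prompt, chunks)
            context = "\n".join(chunks)
            answer = ask_llm(f"Context:\n{context}\n\nQuestion: {prompt}")
            results.append({
                "user_prompt": prompt,
                "search_type": search_type,
                "context": chunks,
                "answer": answer,
            })
    return results


# Stub implementations so the sketch runs without Azure services.
def fake_search(search_type, query):
    return [f"{search_type} hit for '{query}'"]


def fake_llm(prompt):
    return f"answer based on {prompt.count('hit for')} chunk(s)"


def split_on_and(prompt):
    return [part.strip() for part in prompt.split(" and ")]


rows = run_queries(
    ["What is chunking and what is overlap?"],
    ["search_for_match_text"],
    fake_search,
    fake_llm,
    split_prompt=split_on_and,
)
print(rows[0]["answer"])
# answer based on 2 chunk(s)
```

Passing the search and LLM calls in as functions keeps the outline testable with stubs; the real node calls Azure AI Search and the configured chat deployment instead.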


Evaluation

The evaluation node takes the results generated by the querying node and logs the metrics specified in config.json to MLflow. The metrics and configuration parameters can be inspected and compared to past experiments in your ML workspace by selecting the Jobs tab under Assets and clicking on the latest experiment run.
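To give a sense of what a string-similarity metric such as levenshtein measures, here is a small self-contained sketch that compares an LLM answer with a ground truth answer. It uses the standard edit-distance algorithm. Scaling the score to the range 0 to 1 is an assumption made for illustration, and the accelerator's own implementation may normalize differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(answer: str, ground_truth: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means entirely different."""
    longest = max(len(answer), len(ground_truth))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(answer, ground_truth) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("index", "index"))      # 1.0
```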


Getting Started


To run the RAG Experiment Accelerator end-to-end in VSCode, follow these steps:

  1. Ensure you have installed the promptflow extension and pip installed the promptflow and promptflow-tools packages.
  2. Run: pip install ./custom_environment/rag_experiment_accelerator-0.9-py3-none-any.whl
  3. Create a custom connection. See env_setup.
  4. Add your own documents to the ./data folder. (A set of sample documents is provided for testing purposes.)
  5. Modify the config.json file with the hyperparameters for your experiment. Full documentation on the configuration elements can be found under Description of configuration elements.
  6. Run the flow from the extension UI or from the CLI by running pf flow test --flow ./flow.dag.yaml
  7. Inspect the results in your ML workspace by selecting the Jobs tab under Assets. Click on the latest experiment run to view the metrics and results.

ML Workspace

To run the RAG Experiment Accelerator end-to-end in Prompt Flow, follow these steps:

  1. Create a custom environment using the provided Dockerfile (this will take several minutes):

        az login
        az account set --subscription <subscription ID>
        az extension add --name ml
        az configure --defaults workspace=$MLWorkSpaceName group=$ResourceGroupName
        cd ./custom_environment
        az ml environment create --file ./environment.yaml -w $MLWorkSpaceName

  2. Create a custom runtime using the newly created environment. See Create runtime in UI.
  3. Create a custom connection. See env_setup.
  4. Add your own documents to the ./data folder. (A set of sample documents is provided for testing purposes.)
  5. Modify the config.json file with the hyperparameters for your experiment. Full documentation on the configuration elements can be found under Description of configuration elements.
  6. Upload the flow to the ML workspace. See how to upload a local flow.
  7. Ensure you have also uploaded config.json and, optionally, prompt_config.json to the ML workspace.
  8. Select the custom runtime in Prompt Flow.
  9. Click run in the UI.
  10. Inspect the results in your ML workspace by selecting the Jobs tab under Assets. Click on the latest experiment run to view the metrics and results.

Description of configuration elements

{
    "name_prefix": "Name of experiment, search index name used for tracking and comparing jobs",
    "chunking": {
        "chunk_size": "Size of each chunk e.g. [500, 1000, 2000]",
        "overlap_size": "Overlap size for each chunk e.g. [100, 200, 300]"
    },
    "embedding_dimension": "embedding size for each chunk e.g. [384, 1024]. Valid values are 384, 768, 1024",
    "ef_construction": "ef_construction value determines the value of Azure AI Search vector configuration.",
    "ef_search": "ef_search value determines the value of Azure AI Search vector configuration.",
    "language": {
        "analyzer_name": "name of the analyzer to use for the field. This option can be used only with searchable fields and it can't be set together with either searchAnalyzer or indexAnalyzer.",
        "index_analyzer_name": "name of the analyzer used at indexing time for the field. This option can be used only with searchable fields. It must be set together with searchAnalyzer and it cannot be set together with the analyzer option.",
        "search_analyzer_name": "name of the analyzer used at search time for the field. This option can be used only with searchable fields. It must be set together with indexAnalyzer and it cannot be set together with the analyzer option. This property cannot be set to the name of a language analyzer; use the analyzer property instead if you need a language analyzer."
    },
    "rerank": "determines if search results should be re-ranked. Valid values are TRUE or FALSE",
    "rerank_type": "determines the type of re-ranking. Valid values are llm or crossencoder",
    "llm_re_rank_threshold": "determines the threshold when using llm re-ranking. Chunks with rank above this number are selected in range from 1 - 10.",
    "cross_encoder_at_k": "determines the threshold when using cross-encoding re-ranking. Chunks with given rank value are selected.",
    "crossencoder_model": "determines the model used for cross-encoding re-ranking step. Valid value is cross-encoder/stsb-roberta-base",
    "search_types": "determines the search types used for experimentation. Valid values are search_for_match_semantic, search_for_match_Hybrid_multi, search_for_match_Hybrid_cross, search_for_match_text, search_for_match_pure_vector, search_for_match_pure_vector_multi, search_for_match_pure_vector_cross, search_for_manual_hybrid. e.g. ['search_for_manual_hybrid', 'search_for_match_Hybrid_multi', 'search_for_match_semantic']",
    "retrieve_num_of_documents": "determines the number of chunks to retrieve from the search index",
    "metric_types": "determines the metrics used for evaluation purpose. Valid values are lcsstr, lcsseq, cosine, jaro_winkler, hamming, jaccard, levenshtein, fuzzy, bert_all_MiniLM_L6_v2, bert_base_nli_mean_tokens, bert_large_nli_mean_tokens, bert_large_nli_stsb_mean_tokens, bert_distilbert_base_nli_stsb_mean_tokens, bert_paraphrase_multilingual_MiniLM_L12_v2, llm_context_precision, llm_answer_relevance. e.g. ['fuzzy', 'bert_all_MiniLM_L6_v2', 'cosine', 'bert_distilbert_base_nli_stsb_mean_tokens']",
    "azure_oai_chat_deployment_name": "determines the Azure OpenAI chat deployment name",
    "azure_oai_eval_deployment_name": "determines the Azure OpenAI evaluation deployment name",
    "embedding_model_name": "embedding model name",
    "openai_temperature": "determines the OpenAI temperature. Valid value ranges from 0 to 1.",
    "search_relevancy_threshold": "the similarity threshold to determine if a doc is relevant. Valid ranges are from 0.0 to 1.0",
    "eval_data_jsonl_file_path": "the file path of the ground truth questions and answers. This must be a jsonl file and each line should contain the keys: user_prompt (question), output_prompt (answer), context (the document context that contains the answer)"
}

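Some of the valid values listed above can be checked before launching an experiment. The sketch below is a hypothetical helper, not part of the accelerator: it validates a few fields of an already loaded config dictionary against the documented values. It assumes embedding_dimension and search_types are lists, and that rerank may be written as a boolean or as the strings TRUE/FALSE.

```python
VALID_DIMENSIONS = {384, 768, 1024}
VALID_RERANK_TYPES = {"llm", "crossencoder"}
VALID_SEARCH_TYPES = {
    "search_for_match_semantic",
    "search_for_match_Hybrid_multi",
    "search_for_match_Hybrid_cross",
    "search_for_match_text",
    "search_for_match_pure_vector",
    "search_for_match_pure_vector_multi",
    "search_for_match_pure_vector_cross",
    "search_for_manual_hybrid",
}


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []

    bad_dims = [d for d in config.get("embedding_dimension", []) if d not in VALID_DIMENSIONS]
    if bad_dims:
        errors.append(f"embedding_dimension: unsupported values {bad_dims}")

    # Assumption: rerank may be a bool or the strings TRUE/FALSE.
    rerank_enabled = str(config.get("rerank", "FALSE")).upper() == "TRUE"
    if rerank_enabled and config.get("rerank_type") not in VALID_RERANK_TYPES:
        errors.append(f"rerank_type: {config.get('rerank_type')!r} is not llm or crossencoder")

    unknown = [s for s in config.get("search_types", []) if s not in VALID_SEARCH_TYPES]
    if unknown:
        errors.append(f"search_types: unknown values {unknown}")

    for key in ("openai_temperature", "search_relevancy_threshold"):
        value = config.get(key)
        if value is not None and not 0 <= float(value) <= 1:
            errors.append(f"{key}: {value} is outside the range 0 to 1")

    return errors


# Hypothetical config fragment with four deliberate mistakes.
errors = validate_config({
    "embedding_dimension": [384, 512],
    "rerank": "TRUE",
    "rerank_type": "bm25",
    "search_types": ["search_for_match_semantic", "search_for_everything"],
    "openai_temperature": 0.2,
    "search_relevancy_threshold": 1.5,
})
for message in errors:
    print(message)
```

In practice the dictionary would come from json.load on config.json; metric_types could be checked against its documented values in the same way.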
Folder structure

The top-level flow.dag.yaml runs the RAG experiments end-to-end and each step can be run independently.

  • data - the directory for the documents. A set of sample documents is provided for testing
  • custom_environment - contains the Dockerfile, environment.yaml and the rag-experiment-accelerator .whl file. Building an image is necessary when running in Prompt Flow from the ML workspace.
  • images - contains the images used in this document


  • setup - sets the necessary environment variables
  • index - contains the index flow
  • qa_generation - contains the qa_generation flow
  • querying - contains the querying flow
  • evaluation - contains the evaluation flow

Each step can also be run independently and the flow is contained in its corresponding folder. When running the flows independently, an initial setup will run to ensure the proper environment variables are set.