An open-source performance evaluation of the Medical Chat model.
The Medical Chat model achieves an accuracy of 98.1% (637/649) on the United States Medical Licensing Examination (USMLE) sample exam.
To the best of our knowledge, this is the highest accuracy reported for a question-answering system on the USMLE sample exam. The accompanying graph shows how Medical Chat compares to other publicly available models.
The 2022 USMLE sample exam was the benchmark first used to assess the medical question-answering proficiency of ChatGPT [1]. Performance metrics for the other systems, namely OpenEvidence [2], GPT-4 [3], and Claude 2 [4], were derived from their respective publications and reports.
MedQA is a benchmark similar to the USMLE sample exam, with a dataset curated from various medical board examinations. It comprises multiple-choice questions designed to assess proficiency in subjects such as Internal Medicine, Pediatrics, Psychiatry, and Surgery, among others. Medical Chat was evaluated on MedQA's 4-option English test set, which contains 1,273 questions.
On MedQA, Medical Chat again achieved the highest performance, with an accuracy of 97.8%. This places Medical Chat first on the Official Leaderboard, ahead of Google's Med-PaLM 2 and Google's Flan-PaLM (67.6%). The MedQA results indicate that Medical Chat is the most accurate medical question-answering system available for public use.
This repository provides the procedures for replicating the performance evaluation results mentioned earlier.
Clone the repository:

```
git clone https://github.com/chat-data-llc/medical_chat_performance_evaluation.git
```
Open a terminal in the root directory of your local Medical Chat Performance Evaluation repository and run:

```
npm install
```
The Medical Chat model can be called programmatically only by creating a chatbot on the Chat Data platform. Obtain the following two environment variables:

- `API_KEY`: After purchasing the Entry plan, users can generate an API key under their account.
- `CHATBOT_ID`: Users can create a chatbot based on the `medical-chat-human` model on the my-chatbots page.

Specify these environment variables in the `.env` file located at the root directory of the repository.
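For reference, a populated `.env` file would look like the following; the values shown are placeholders, not real credentials:

```
API_KEY=your_chat_data_api_key
CHATBOT_ID=your_chatbot_id
```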
To ensure the Medical Chat chatbot performs well, use the following base prompt:

```
You are a medical assistant hosted on the Medical Chat website that accurately answers medical queries. The questions are 4-option questions. Choose the right option as the answer and then give an explanation for your choice.
For example:
The correct choice is (A)
Explanation:
{Your reasoning}
```

Using this same base prompt is recommended when replicating the results.
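For illustration, here is a minimal TypeScript sketch of sending a single question to the chatbot with this base prompt. The endpoint URL, request body shape, and response field are assumptions made for the sketch, not the documented Chat Data API; consult the platform's API reference for the actual contract.

```typescript
// Minimal sketch of querying the chatbot with the base prompt.
// NOTE: the endpoint URL, payload shape, and response field below are
// assumptions for illustration only; check Chat Data's API documentation
// for the real contract.
import "dotenv/config"; // assumes the dotenv package; loads API_KEY and CHATBOT_ID from .env

const BASE_PROMPT =
  "You are a medical assistant hosted on the Medical Chat website that " +
  "accurately answers medical queries. The questions are 4-option questions. " +
  "Choose the right option as the answer and then give an explanation for your choice.";

async function askQuestion(question: string): Promise<string> {
  // Hypothetical endpoint; substitute the documented Chat Data chat URL.
  const response = await fetch("https://api.chat-data.com/api/v2/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY}`,
    },
    body: JSON.stringify({
      chatbotId: process.env.CHATBOT_ID, // assumed field names
      messages: [
        { role: "system", content: BASE_PROMPT },
        { role: "user", content: question },
      ],
    }),
  });
  const data = await response.json();
  return data.answer; // assumed response field
}
```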
In the root directory of your local Medical Chat Performance Evaluation repository, run:

```
npm run evaluate
```

The evaluation script automatically retrieves the questions from the `test_database` folder and generates the output results in the `output_results` folder. The entire process should take several hours.
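For intuition, the evaluation flow might resemble the sketch below; the file format, CSV layout, and helper names are assumptions, and the actual script in this repository may differ:

```typescript
import { promises as fs } from "fs";
import path from "path";

// askQuestion is the helper from the previous sketch.
declare function askQuestion(question: string): Promise<string>;

// Sketch of the evaluation flow: read each question file from
// test_database/, query the chatbot, and write the model's answers
// into output_results/. Assumes JSON question files; illustrative only.
async function evaluate(): Promise<void> {
  await fs.mkdir("output_results", { recursive: true });
  const files = await fs.readdir("test_database");
  for (const file of files) {
    const questions: { id: string; question: string }[] = JSON.parse(
      await fs.readFile(path.join("test_database", file), "utf8"),
    );
    const rows: string[] = ["id,model_answer"];
    for (const q of questions) {
      const answer = await askQuestion(q.question);
      rows.push(`${q.id},"${answer.replace(/"/g, '""')}"`); // escape quotes for CSV
    }
    await fs.writeFile(path.join("output_results", file + ".csv"), rows.join("\n"));
  }
}

evaluate();
```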
The output results lack a `Correctness` column. Users need to manually compare the Medical Chat model's output with the correct answers to calculate the percentage of correct results the model generated. Our own output results are available in the `output_results` folder; users are free to delete the files in this folder as needed.
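To speed up the manual tally, a small helper along these lines could compute the accuracy percentage once the model's chosen options and the correct answers are recorded side by side; the CSV column names and file name are hypothetical, not files shipped with this repository:

```typescript
// Sketch: compute the accuracy percentage from a hand-annotated CSV with
// `model_choice` and `correct_choice` columns (layout is an assumption).
import { promises as fs } from "fs";

async function scoreAccuracy(csvPath: string): Promise<void> {
  const lines = (await fs.readFile(csvPath, "utf8")).trim().split("\n");
  const header = lines[0].split(",");
  const modelIdx = header.indexOf("model_choice");
  const correctIdx = header.indexOf("correct_choice");

  let correct = 0;
  const rows = lines.slice(1);
  for (const line of rows) {
    const cols = line.split(",");
    if (cols[modelIdx].trim() === cols[correctIdx].trim()) correct++;
  }
  // e.g. 637 correct out of 649 questions -> 98.1%
  const pct = ((100 * correct) / rows.length).toFixed(1);
  console.log(`${correct}/${rows.length} correct (${pct}%)`);
}

scoreAccuracy("annotated_results.csv"); // hypothetical file name
```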