
Commit

update how to reproduce
ChiYeungLaw committed Aug 30, 2023
1 parent 53999f1 commit 6e86dfa
Showing 2 changed files with 66 additions and 1 deletion.
65 changes: 65 additions & 0 deletions WizardCoder/README.md
@@ -255,6 +255,71 @@ python process_humaneval.py --path ${output_path} --out_path ${output_path}.jsonl
evaluate_functional_correctness ${output_path}.jsonl
```

### How to Reproduce the 73.2 Pass@1 on HumanEval with Greedy Decoding?

- Step 1: Set up the environment
```bash
conda create -n eval python=3.10

conda activate eval

conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
```
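
  Optionally, you can sanity-check that the environment sees the GPUs before moving on (this check is an addition, not part of the original instructions):
  ```bash
  python -c "import torch; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count())"
  ```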

- Step 2: Install the packages
```
transformers==4.31.0
numpy
fire
sentencepiece
deepspeed==0.10.0
accelerate
vllm==0.1.4
pandas
ray
pyarrow
```
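
  The original does not give an exact install command; one straightforward option is to install everything with pip inside the activated environment:
  ```bash
  pip install transformers==4.31.0 numpy fire sentencepiece \
    deepspeed==0.10.0 accelerate vllm==0.1.4 pandas ray pyarrow
  ```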

- Step 3: Install Human-Eval from OpenAI
```bash
git clone https://github.com/openai/human-eval.git
pip install -e human-eval
```
Then uncomment the execution call in `human-eval/human_eval/execution.py` so that the evaluator can actually run the generated code (it is commented out by default so that model-generated code is not executed by accident).
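
The line to uncomment looks roughly like this (exact context and indentation may differ across human-eval versions):
```python
# As shipped, the call that runs model-generated code is commented out:
#     exec(check_program, exec_globals)
# After uncommenting, the evaluator can execute the generated solutions:
exec(check_program, exec_globals)
```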

- Step 4: Generate Answers

Use the script `WizardLM/blob/main/WizardCoder/src/humaneval_gen_vllm.py` to generate the answers:
```bash
model="WizardLM/WizardCoder-Python-34B-V1.0"
temp=0.0
max_len=2048
pred_num=1
num_seqs_per_iter=1

output_path=preds/T${temp}_N${pred_num}

mkdir -p ${output_path}
echo 'Output path: '$output_path
echo 'Model to eval: '$model

CUDA_VISIBLE_DEVICES=0,1,2,3 python humaneval_gen_vllm.py --model ${model} \
--start_index 0 --end_index 164 --temperature ${temp} \
  --num_seqs_per_iter ${num_seqs_per_iter} --N ${pred_num} --max_len ${max_len} --output_path ${output_path} --num_gpus 4
```
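
Under the hood, `humaneval_gen_vllm.py` drives vLLM roughly as sketched below (a minimal illustration, not the actual script; the prompt is a made-up placeholder). Setting `temperature=0.0` is what makes the decoding greedy:
```python
from vllm import LLM, SamplingParams

# Shard the 34B model across 4 GPUs with tensor parallelism.
llm = LLM(model="WizardLM/WizardCoder-Python-34B-V1.0", tensor_parallel_size=4)

# temperature=0.0 => greedy decoding: one deterministic completion per prompt.
sampling_params = SamplingParams(temperature=0.0, top_p=1, max_tokens=2048)

# Placeholder prompt; the real script builds prompts from the 164 HumanEval tasks.
prompts = ["Write a Python function that returns the n-th Fibonacci number."]
outputs = llm.generate(prompts, sampling_params)
print(outputs[0].outputs[0].text)
```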

- Step 5: Get the score
Use the script `WizardLM/blob/main/WizardCoder/src/process_humaneval.py` to collect the generations, then score them with `evaluate_functional_correctness`:
```bash
output_path=preds/T0.0_N1

echo 'Output path: '$output_path
python process_humaneval.py --path ${output_path} --out_path ${output_path}.jsonl --add_prompt

evaluate_functional_correctness ${output_path}.jsonl
```
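
`evaluate_functional_correctness` reports the unbiased pass@k estimator from the HumanEval paper. A sketch of that estimator (following the paper, not code from this repo):
```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: n total samples, c of which pass the unit tests."""
    if n - c < k:
        return 1.0
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```
With greedy decoding there is a single sample per problem (n = 1, k = 1), so pass@1 reduces to the fraction of the 164 problems whose completion passes its tests.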


### How to Reproduce the 59.8 Pass@1 on HumanEval with Greedy Decoding?

❗❗❗**This performance is 100% reproducible!**
2 changes: 1 addition & 1 deletion WizardCoder/src/humaneval_gen_vllm.py
@@ -62,7 +62,7 @@ def main():
num_samples = len(prompts)
print("Number of samples: {}".format(num_samples))

-    llm = LLM(base_model=args.model, tensor_parallel_size=args.num_gpus)
+    llm = LLM(model=args.model, tensor_parallel_size=args.num_gpus)
sampling_params = SamplingParams(temperature=args.temperature, top_p=1, max_tokens=args.max_len)

print(f"Loaded {args.model}.")
