GPT-4-ENEM

*** Most of the code in this repository was copied from the original Language Model Evaluation Harness. ***

Introduction

This repository contains code and data used in the paper "Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams". The study explores the capabilities of Language Models (LMs) in solving high-stakes multiple-choice tests, using the Exame Nacional do Ensino Médio (ENEM) as a case study. The ENEM is a multidisciplinary entrance examination widely adopted by Brazilian universities, which poses challenging tasks for LMs since its questions may span multiple fields of knowledge, requiring understanding of information from diverse domains.

The paper analyzes responses generated by GPT-3.5 and GPT-4 models for questions presented in the 2009-2017 exams, as well as for questions of the 2022 exam, which were made public after the training of the models completed. Furthermore, different prompt strategies were tested, including the use of Chain-of-Thought (CoT) prompts to generate explanations to answers.

On the 2022 edition, the best-performing model, GPT-4 with CoT, achieved an accuracy of 87%, largely surpassing GPT-3.5 by 11 points.

Area	code-davinci-002			gpt-3.5-turbo			gpt-4
	zero-shot	three-shot	three-shot with CoT	zero-shot	three-shot	three-shot with CoT	zero-shot	three-shot	three-shot with CoT
Languages and Codes	78.79	87.88	72.73	75.76	81.82	69.70	84.85	87.88	87.88
Human Sciences	89.19	94.59	91.89	91.89	89.19	94.59	94.59	94.59	94.59
Natural Sciences	69.23	61.54	53.85	73.08	84.62	65.38	84.62	76.92	88.46
Mathematics	18.18	27.27	50.00	18.18	36.36	54.55	40.91	50.00	72.73
Total	68.64	72.88	70.34	69.49	76.27	73.73	79.66	80.51	87.29

Results on ENEM 2022. Questions that require image comprehension were removed.

We make available all explanations, targets and predictions generated for all experiments with the ENEM 2022 dataset in the reports folder.

Data

We made available the ENEM-2022 dataset, created by parsing questions and alternatives from the latest edition of the ENEM test. The dataset was structured and annotated with tags indicating the domain:

TU - Text Understanding
IU - Image Understanding
MR - Mathematical Reasoning
CE - Chemical Elements
ML - Multilanguage

Reproducing the results

To reproduce the experiments described in the paper, please follow the steps below:

1. Clone the repository:

git clone https://github.com/piresramon/gpt-4-enem.git

2. Install the required packages:

pip install -e .

3. Set the OPENAI API key:

Visit openai to retrieve API keys and insert into your the env variable.

OPENAI_API_SECRET_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

4. Run the experiments:

# running 3-shot with CoT for chatgpt
python main.py \
    --model chatgpt \
    --model_args engine=gpt-3.5-turbo-0301 \
    --task enem_cot_2022 \
    --num_fewshot 3 \
    --description_dict_path description.json

# running 3-shot with CoT for gpt-4
python main.py \
    --model chatgpt \
    --model_args engine=gpt-4-0314 \
    --task enem_cot_2022 \
    --num_fewshot 3 \
    --description_dict_path description.json

We have four different tasks:

enem: Enem Challenge (2009-2017) without Chain-of-thought prompting.
enem_cot: Enem Challenge (2009-2017) with Chain-of-thought prompting.
enem_2022: Enem 2022 without Chain-of-thought prompting.
enem_cot_2022: Enem 2022 with Chain-of-thought prompting.

It is possible to use a different number of few-shot examples (maximum 3).

Citation

If you use this code or data in your research, please cite the following paper:

@misc{nunes2023evaluating,
      title={Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams}, 
      author={Desnes Nunes and Ricardo Primi and Ramon Pires and Roberto Lotufo and Rodrigo Nogueira},
      year={2023},
      eprint={2303.17003},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
data/enem		data/enem
docs		docs
lm_eval		lm_eval
reports		reports
scripts		scripts
templates		templates
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
description.json		description.json
ignore.txt		ignore.txt
main.py		main.py
requirements.txt		requirements.txt
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPT-4-ENEM

Introduction

Data

Reproducing the results

1. Clone the repository:

2. Install the required packages:

3. Set the OPENAI API key:

4. Run the experiments:

Citation

About

Releases

Packages

Languages

License

finardi/gpt-4-enem

Folders and files

Latest commit

History

Repository files navigation

GPT-4-ENEM

Introduction

Data

Reproducing the results

1. Clone the repository:

2. Install the required packages:

3. Set the OPENAI API key:

4. Run the experiments:

Citation

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages