The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models

Repository for our EMNLP 2023 paper "The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models"

Getting Started

  • Adjust the prefix field in unanswerability_env.yml to your Anaconda environment path.
  • Run these commands:
conda env create -f unanswerability_env.yml
conda activate unanswerability_env
python -m spacy download en_core_web_sm

Download Dataset

  1. To download the dataset, run:
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1q-6FIEGufKVBE3s6OdFoLWL2iHQPJh8h' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1q-6FIEGufKVBE3s6OdFoLWL2iHQPJh8h" -O raw_data.zip && rm -rf /tmp/cookies.txt

Alternatively, download the file directly from Google Drive.

  2. Unzip:
unzip raw_data.zip

Prompt Manipulations and Beam Relaxation

Zero-shot Prompting

To perform the zero-shot prompt-manipulation experiment, run:

python zero_shot_prompting.py --models <MODELS> --datasets <DATASETS> --return-only-generated-text --outdir /path/to/outdir
  • <MODELS> - one or more of 'Flan-UL2', 'Flan-T5-xxl', 'OPT-IML'.
  • <DATASETS> - one or more of 'squad', 'NQ', 'musique'.
  • To change the prompt variant, add --prompt-variant <VARIANT_LIST>:
    • <VARIANT_LIST> - one or more of 'variant1', 'variant2', 'variant3'.
      • Default - 'variant1'.
  • For development set experiments, add --devset.
  • Output: Saves two .pt files in the specified outdir, one for answerable and one for unanswerable prompts.
    • The actual generated outputs are also saved in the regular_decoding subdir.
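
For example, an illustrative zero-shot run over two models and one dataset (the output path is a placeholder, and multiple values are assumed to be passed space-separated):

python zero_shot_prompting.py --models Flan-T5-xxl OPT-IML --datasets squad --prompt-variant variant1 variant2 --return-only-generated-text --outdir /path/to/outdir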

Few-shot Prompting

To perform the few-shot prompt-manipulation experiment, run:

python few_shot_prompting.py --models <MODELS> --datasets <DATASETS> --return-only-generated-text --outdir /path/to/outdir
  • <MODELS> and <DATASETS> are the same as in Zero-shot Prompting.
  • The prompt variant can be changed as in Zero-shot Prompting.
  • To change the in-context-learning example variants, add --icl-examples-variant <ICL_VARIANT_LIST>:
    • <ICL_VARIANT_LIST> - one or more of '1', '2', '3'.
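
For example, an illustrative few-shot run with two in-context-learning example variants (paths are placeholders):

python few_shot_prompting.py --models Flan-UL2 --datasets musique --icl-examples-variant 1 2 --return-only-generated-text --outdir /path/to/outdir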

Beam Relaxation

For beam relaxation experiments, add --k-beams <BEAM_SIZE> to the Zero-shot Prompting command.

  • Output: In addition to the regular_decoding subdir, a beam-relaxation subdir will be generated, containing the beam-relaxed responses.
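
For example, an illustrative run with a beam size of 5 (the beam size and paths are placeholders, not necessarily the paper's setting):

python zero_shot_prompting.py --models Flan-T5-xxl --datasets NQ --k-beams 5 --return-only-generated-text --outdir /path/to/outdir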

Evaluation

To evaluate the generated texts, run:

python -m evaluation.evaluate --indirs <INDIRS> --outdir /path/to/outdir 
  • <INDIRS>: one or more output directories from the prompting experiments.
  • Output: saved under outdir:
    • QA-task-results.csv - results on the QA task for each prompt type (e.g., Regular-Prompt or Hint-Prompt).
    • unanswerability_classification_results.xlsx - unanswerability classification results for each prompt type.
  • For results on the development set, add --devset.
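
For example, an illustrative evaluation of two experiment output directories (paths are placeholders; multiple directories are assumed to be passed space-separated):

python -m evaluation.evaluate --indirs /path/to/zero_shot_outdir /path/to/few_shot_outdir --outdir /path/to/eval_outdir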

Probing Experiments

Preliminaries - Get Embeddings

  1. Generate Test Set Embeddings: Run the Zero-shot Prompting experiments without the --return-only-generated-text parameter.
    • This also saves the test set's generation embeddings (the last hidden layer of the first generated token).
  2. Generate Train Set Embeddings: In addition to step 1, add --trainset.
  • To run steps 1 and 2 on the first hidden layer of the first generated token, add --return-first-layer.
  • The prompt variant can be changed as in Zero-shot Prompting.
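
For example, an illustrative command that saves the train set embeddings (note the omitted --return-only-generated-text; paths are placeholders):

python zero_shot_prompting.py --models Flan-T5-xxl --datasets squad --trainset --outdir /path/to/outdir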

Train Answerability Linear Classifiers

Run:

python train_linear_classifiers.py --indir <INDIR> --outdir /path/to/outdir --dataset <DATASET> --prompt-type <PROMPT_TYPE> --epochs 100 --batch-size 16 --num-instances 1000
  • <INDIR> - path to the directory with the saved embeddings (pt files) of the train set.
  • <DATASET> - any one of 'squad', 'NQ', 'musique'.
  • <PROMPT_TYPE> - 'Regular-Prompt' or 'Hint-Prompt'.
  • To train a classifier on the first hidden layer of the first generated token, add --embedding-type first_hidden_embedding.
  • Output - the trained classifier is saved under "outdir/<EMBEDDING_TYPE>/<PROMPT_TYPE>/only_first_tkn/<MODEL_NAME>_1000N".
    • <EMBEDDING_TYPE> - 'first_hidden_embedding' or 'last_hidden_embedding'.
    • <MODEL_NAME> - name of the model whose embeddings were used to train the classifier.
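
For example, an illustrative training run on SQuAD train set embeddings (paths are placeholders):

python train_linear_classifiers.py --indir /path/to/train_embeddings --outdir /path/to/outdir --dataset squad --prompt-type Regular-Prompt --epochs 100 --batch-size 16 --num-instances 1000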

Evaluate Answerability Linear Classifiers

Run:

python evaluation/eval_linear_classifiers.py --indir <DATA_INDIR> --classifier-dir <CLASSIFIER_INDIR> --dataset <DATASET> --prompt-type <PROMPT_TYPE> --embedding-type <EMBEDDING_TYPE>
  • <DATA_INDIR> - path to the directory with the saved embeddings (pt files) of the test set.
  • <CLASSIFIER_INDIR> - path to the trained linear classifier.
  • <DATASET> - any one of 'squad', 'NQ', 'musique' (should represent the dataset of the test set).
  • <PROMPT_TYPE> - 'Regular-Prompt' or 'Hint-Prompt'.
  • <EMBEDDING_TYPE> - 'first_hidden_embedding' or 'last_hidden_embedding'.
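
For example, an illustrative evaluation of a classifier trained on last-hidden-layer embeddings (paths are placeholders):

python evaluation/eval_linear_classifiers.py --indir /path/to/test_embeddings --classifier-dir /path/to/classifier --dataset squad --prompt-type Regular-Prompt --embedding-type last_hidden_embedding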

Visualize Embedding Space

Run:

python figures_generation/PCA_plots_generation.py -i /path/to/folder/with/pt_files -o /path/to/outdir --prompt-type <PROMPT_TYPE> 
  • <PROMPT_TYPE> - 'Regular-Prompt' or 'Hint-Prompt'.
  • Output - the generated 3-D PCA plots of the embedding space are saved under "/path/to/outdir/last_hidden_embedding/only_first_tkn/<PROMPT_TYPE>".
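
For example, an illustrative call that plots the Hint-Prompt embeddings (paths are placeholders):

python figures_generation/PCA_plots_generation.py -i /path/to/folder/with/pt_files -o /path/to/outdir --prompt-type Hint-Prompt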

Answerability Subspace Erasure

Preliminaries

  1. Set Up Environment - Create a separate Conda environment for this experiment:
    • Adjust the prefix field in subspace_erasure.yml to your Anaconda environment path.
    • Run these commands:
conda env create -f subspace_erasure.yml
conda activate subspace_erasure
  2. Make sure you have the embeddings of the train set from Preliminaries - Get Embeddings.

Train Concept Eraser

Run:

python train_concept_eraser.py --indir <INDIR> --outdir /path/to/outdir --dataset <DATASET> --prompt-type <PROMPT_TYPE> --epochs 500 --batch-size 16 --num-instances 1000
  • <INDIR> - path to the directory with the embeddings (pt files) of the train set.
  • <DATASET> - any one of 'squad', 'NQ', 'musique'.
  • <PROMPT_TYPE> - 'Regular-Prompt' or 'Hint-Prompt'.
  • Output - the trained eraser is saved under "/path/to/outdir/<PROMPT_TYPE>".
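
For example, an illustrative eraser training run (paths are placeholders):

python train_concept_eraser.py --indir /path/to/train_embeddings --outdir /path/to/outdir --dataset NQ --prompt-type Regular-Prompt --epochs 500 --batch-size 16 --num-instances 1000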

Prompting with Concept Erasure

Run:

python zero_shot_erasure_prompting.py --models <MODELS> --datasets <DATASETS> --outdir /path/to/outdir --eraser-dir /path/to/trained_eraser --only-first-decoding
  • <MODELS> and <DATASETS> are similar to those in Zero-shot Prompting.
  • Output: Saves two .pt files in the specified outdir, one for answerable and one for unanswerable prompts.
    • The actual generated outputs are also saved in the regular_decoding subdir.
  • To evaluate the responses, follow the instructions under Evaluation.
  • To visualize the embeddings, follow the instructions under Visualize Embedding Space.
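
For example, an illustrative erasure-prompting run with a previously trained eraser (paths are placeholders):

python zero_shot_erasure_prompting.py --models Flan-T5-xxl --datasets squad --outdir /path/to/outdir --eraser-dir /path/to/trained_eraser --only-first-decoding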

Citation

If you use this in your work, please cite:

@inproceedings{slobodkin-etal-2023-curious,
    title = "The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models",
    author = "Slobodkin, Aviv  and
      Goldman, Omer  and
      Caciularu, Avi  and
      Dagan, Ido  and
      Ravfogel, Shauli",
    editor = "Bouamor, Houda  and
      Pino, Juan  and
      Bali, Kalika",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
    address = "Singapore",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.emnlp-main.220",
    doi = "10.18653/v1/2023.emnlp-main.220",
    pages = "3607--3625",
    abstract = "Large language models (LLMs) have been shown to possess impressive capabilities, while also raising crucial concerns about the faithfulness of their responses. A primary issue arising in this context is the management of (un)answerable queries by LLMs, which often results in hallucinatory behavior due to overconfidence. In this paper, we explore the behavior of LLMs when presented with (un)answerable queries. We ask: do models \textit{represent} the fact that the question is (un)answerable when generating a hallucinatory answer? Our results show strong indications that such models encode the answerability of an input query, with the representation of the first decoded token often being a strong indicator. These findings shed new light on the spatial organization within the latent representations of LLMs, unveiling previously unexplored facets of these models. Moreover, they pave the way for the development of improved decoding techniques with better adherence to factual generation, particularly in scenarios where query (un)answerability is a concern.",
}
