
Negative perplexity values #1595

Open
shikhar-srivastava opened this issue Mar 17, 2024 · 4 comments
Labels: asking questions (for asking for clarification / support on library usage)

shikhar-srivastava commented Mar 17, 2024

Hey folks,

So, the perplexity values on a per-sample/doc basis are ALL negative.
Can someone explain why this is?

This is with the --log_samples option.

Command:

lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-160m,revision=step100000,dtype="float" \
    --tasks lambada_openai \
    --device cuda:0 \
    --batch_size 32 \
    --log_samples \
    --output_path test_nlu_eval/

Sample output:

{
    "doc_id": 0,
    "doc": {
      "text": "In my palm is a clear stone, and inside it is a small ivory statuette. A guardian angel.\n\n\"Figured if you're going to be out at night getting hit by cars, you might as well have some backup.\"\n\nI look at him, feeling stunned. Like this is some sort of sign. But as I stare at Harlin, his mouth curved in a confident grin, I don't care about signs"
    },
    "target": " signs",
    "arguments": [
      [
        "In my palm is a clear stone, and inside it is a small ivory statuette. A guardian angel.\n\n\"Figured if you're going to be out at night getting hit by cars, you might as well have some backup.\"\n\nI look at him, feeling stunned. Like this is some sort of sign. But as I stare at Harlin, his mouth curved in a confident grin, I don't care about",
        " signs"
      ]
    ],
    "resps": [
      [
        [
          -8.487064361572266,
          false
        ]
      ]
    ],
    "filtered_resps": [
      [
        -8.487064361572266,
        false
      ]
    ],
    "perplexity": -8.487064361572266,
    "acc": 0
  },
shikhar-srivastava changed the title from "Negative Perplexity values" to "Negative perplexity values" on Mar 17, 2024
haileyschoelkopf added the "asking questions" label on Mar 17, 2024
haileyschoelkopf self-assigned this on Mar 17, 2024
haileyschoelkopf (Contributor) commented:

Hi! Sorry, I've been meaning to document more clearly the sample log formats and the semantic meaning of per-sample metrics such as perplexity.

Here the per-sample "perplexity" values are loglikelihoods of the target string for each document, and the list of these loglikelihoods goes through

@register_aggregation("perplexity")
def perplexity(items):
    return math.exp(-mean(items))

to turn it into the dataset-level perplexity. Sorry for the confusion!
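
As a quick worked illustration of that aggregation: the first loglikelihood below is the one logged in the sample above, and the other two values are made up for this example.

import math

# Per-sample values logged by --log_samples are target-string loglikelihoods.
loglikelihoods = [-8.487064361572266, -5.2, -6.9]

# Dataset-level perplexity = exp of the negated mean loglikelihood,
# so it is always positive even though every logged value is negative.
ppl = math.exp(-sum(loglikelihoods) / len(loglikelihoods))
print(ppl)  # ~956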

shikhar-srivastava (Author) commented Mar 17, 2024

Thanks for clarifying!
Another great thing to add would be a [predicted word] along with the [target word] to the per-sample outputs.

Any quick ways to get that too?

haileyschoelkopf (Contributor) commented:

Ah, unfortunately not at the moment, though you can tell from the 0/1 logged for acc on lambada whether the word was correctly predicted.
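
For example, one could post-process the logged samples to pull out the incorrectly predicted ones. A minimal sketch; the sample file name and its exact format depend on your lm-eval version and --output_path, so treat both as assumptions:

import json

# Hypothetical path: --log_samples writes per-task sample files under --output_path.
with open("test_nlu_eval/samples_lambada_openai.jsonl") as f:
    samples = [json.loads(line) for line in f]

# acc == 0 means the target word was not predicted correctly.
wrong = [s for s in samples if s["acc"] == 0]
print(f"{len(wrong)}/{len(samples)} samples were predicted incorrectly")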

shikhar-srivastava (Author) commented:

Ah, I see. Unfortunately, I need access to the predicted word in the per-sample output.
How would you suggest going about that?
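
One workaround outside lm-eval is to re-run the model on the logged context and take the greedy next-token prediction yourself. A minimal sketch with Hugging Face transformers, assuming the same pythia-160m checkpoint from the command above; note that a multi-token target word would need repeated decoding steps rather than a single argmax:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Same checkpoint as in the command above.
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m", revision="step100000")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m", revision="step100000")

# Tail of the logged context; in practice use the full string from "arguments".
context = ("But as I stare at Harlin, his mouth curved in a "
           "confident grin, I don't care about")

inputs = tokenizer(context, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy prediction for the single token following the context.
next_id = logits[0, -1].argmax()
print(tokenizer.decode(next_id))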
