
question about normalized MSE #8

Closed
lukaemon opened this issue Jun 25, 2024 · 2 comments

@lukaemon

In the paper, section 2.1:

> We report a normalized version of all MSE numbers, where we divide by a baseline reconstruction error of always predicting the mean activations.

In the README example:

```python
normalized_mse = (reconstructed_activations - input_tensor).pow(2).sum(dim=1) / (input_tensor).pow(2).sum(dim=1)
```

Which is the same as in loss.py:

```python
import torch


def normalized_mean_squared_error(
    reconstruction: torch.Tensor,
    original_input: torch.Tensor,
) -> torch.Tensor:
    """
    :param reconstruction: output of Autoencoder.decode (shape: [batch, n_inputs])
    :param original_input: input of Autoencoder.encode (shape: [batch, n_inputs])
    :return: normalized mean squared error (shape: [1])
    """
    return (
        ((reconstruction - original_input) ** 2).mean(dim=1) / (original_input**2).mean(dim=1)
    ).mean()
```
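
For reference, a quick usage sketch (shapes and values are made up for illustration):

```python
import torch

batch, n_inputs = 4, 8
original_input = torch.randn(batch, n_inputs)
reconstruction = original_input + 0.1 * torch.randn(batch, n_inputs)  # imperfect reconstruction

# per-sample squared error over per-sample squared norm, averaged across the batch
print(normalized_mean_squared_error(reconstruction, original_input))  # roughly 0.01 for this noise level
```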

The way I understand "normalized MSE, divided by a baseline reconstruction error of always predicting the mean activations" is:

```python
mean_activations = input_tensor.mean(dim=1, keepdim=True)  # keepdim so the subtraction broadcasts
baseline_mse = (input_tensor - mean_activations).pow(2).mean()
actual_mse = (reconstructed_activations - input_tensor).pow(2).mean()
normalized_mse = actual_mse / baseline_mse
```

What did I miss? Did I misunderstand the paper or the code? Thanks for your time.

@TomDLT (Contributor) commented Jul 9, 2024

Thanks for reporting this discrepancy.

The version used in the paper is:

```python
# computed only once before training, on a fixed set of activations
mean_activations = original_input.mean(dim=0)  # averaging over the batch dimension
baseline_mse = (original_input - mean_activations).pow(2).mean()

# computed on each batch during training and testing
actual_mse = (reconstruction - original_input).pow(2).mean()
normalized_mse = actual_mse / baseline_mse
```
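
To make the two denominators concrete, here is a self-contained toy comparison (random data, made-up shapes; it pools the means over the whole batch for simplicity, whereas loss.py averages per-sample ratios):

```python
import torch

torch.manual_seed(0)
batch, n_inputs = 1024, 64
original_input = torch.randn(batch, n_inputs) + 3.0  # activations with a nonzero mean
reconstruction = original_input + 0.1 * torch.randn(batch, n_inputs)

actual_mse = (reconstruction - original_input).pow(2).mean()

# README / loss.py denominator: mean squared activation (no mean subtraction)
readme_denominator = original_input.pow(2).mean()  # ~ variance + mean^2 = 1 + 9 = 10

# paper baseline: MSE of always predicting the per-feature mean activation
mean_activations = original_input.mean(dim=0)  # shape: [n_inputs]
baseline_mse = (original_input - mean_activations).pow(2).mean()  # ~ variance = 1

print(actual_mse / readme_denominator)  # roughly 0.001
print(actual_mse / baseline_mse)        # roughly 0.01
```

The two normalizations only agree when the activations have (near-)zero mean.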

@lukaemon (Author) commented Jul 9, 2024

Got it. It matches the code in train.py. Thanks for the clarification.

@lukaemon lukaemon closed this as completed Jul 9, 2024
lukaemon added a commit to lukaemon/mission-sae that referenced this issue Jul 9, 2024