Skip to content

inseq-team/inseq

Repository files navigation

Intepretability for Sequence-to-sequence Models 🔍


Build status Python Version Code style: black License

Example usage

This example uses the Integrated Gradients attribution method to attribute the English-French translation of a sentence taken from the WinoMT corpus:

import inseq

model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "integrated_gradients")
out = model.attribute(
  "The developer argued with the designer because her idea cannot be implemented.",
  return_convergence_delta=True,
  n_steps=100
)
out.show()

WinoMT Attribution Map

Dealing with minimal pairs

Inseq support minimal pair analysis via the PairAggregator component. Here is an example of using PairAggregator to produce a heatmap containing the score difference between two FeatureAttributionSequenceOutput objects:

import inseq
from inseq.data.aggregator import AggregatorPipeline, ContiguousSpanAggregator, SequenceAttributionAggregator, PairAggregator

# Load the EN-FR translation model and attach the IG method
model = inseq.load_model("Helsinki-NLP/opus-mt-en-fr", "integrated_gradients")

# Perform the attribution with forced decoding. Return convergence deltas, probabilities and target attributions.
out = model.attribute(
    [
        "The manager told the hairdresser that the haircut he made her was terrible.",
        "The manager told the hairdresser that the haircut he made her was terrible.",
    ],
    [
        "Le gérant a dit au coiffeur que la coupe de cheveux qu'il lui a faite était terrible.",
        "La gérante a dit au coiffeur que la coupe de cheveux qu'il lui a faite était terrible.",
    ],
    n_steps=300,
    return_convergence_delta=True,
    attribute_target=True,
    output_step_probabilities=True,
    internal_batch_size=100,
    include_eos_baseline=False,
)

# Aggregation pipeline composed by two steps:
# 1. Aggregate contiguous tokens across all attribution dimensions
# 2. Aggregate the last dimension of the neuron-level attribution to make it token-level
squeezesum = AggregatorPipeline([ContiguousSpanAggregator, SequenceAttributionAggregator])

# Simply aggregate over the last dimension for the masculine variant
masculine = out.sequence_attributions[0].aggregate(aggregator=SequenceAttributionAggregator)

# For the feminine variant, we also use the contiguous span aggregator to merge "▁gérant" "e"
# in a single token to match masc shape
feminine = out.sequence_attributions[1].aggregate(aggregator=squeezesum, target_spans=(1, 3))

# Take the diff of the scores of the two attribution and show it
masculine.show(aggregator=PairAggregator, paired_attr=feminine)

Minimal Pair Diff Attribution Map