
[alpha] Improvements to ModelWrapper and better QA/Classification implementation #8

Merged: 16 commits from feature/metrics-models into main on Mar 10, 2023

Conversation

NISH1001 (Collaborator) commented on Mar 2, 2023

Major Changes

  • evalem.models.QuestionAnsweringHFPipelineWrapper and evalem.models.TextClassificationHFPipelineWrapper are now the main wrappers for the QA and Text Classification tasks, respectively.
    • These also have better parameter initialization, allowing any suitable model, tokenizer, and device type to be supplied.
    • An hf_params dict is also accepted as a parameter and is used to initialize the HF pipeline.
  • evalem.evaluators.TextClassificationEvaluator has been added with basic metrics for text classification (F1 score, precision, recall, confusion matrix).
  • evalem.models._base.ModelWrapper now takes 2 distinct processing parameters (one for pre-processing and one for post-processing), each of which should be a Callable (a lambda, a callable external module, etc.); see the sketch after this list.
    • inputs_preprocessor operates on the input dataset and transforms it into the inputs the model expects.
      • If not provided, the default ModelWrapper._preprocess_inputs method is used, which can also be overridden by any downstream sub-class.
    • predictions_postprocessor is applied to the model's predictions to post-process them.
      • If not provided, the default ModelWrapper._postprocess_predictions method is used, which can also be overridden by any downstream sub-class.
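
A minimal sketch of the two processing kwargs, assuming QuestionAnsweringHFPipelineWrapper forwards them to the base ModelWrapper; the dataset row keys and prediction keys shown here are illustrative, and the Usage section below has complete, working examples:

from evalem.models import QuestionAnsweringHFPipelineWrapper

wrapped_model = QuestionAnsweringHFPipelineWrapper(
    # map raw dataset rows into the {"question": ..., "context": ...} shape
    # the HF QA pipeline expects (identity if omitted); row keys are assumed here
    inputs_preprocessor=lambda rows: [
        {"question": r["question"], "context": r["context"]} for r in rows
    ],
    # keep only the answer string from each HF prediction dict (identity if omitted)
    predictions_postprocessor=lambda preds: [p["answer"] for p in preds],
)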

Minor Changes

  • evalem.models.DefaultQAModelWrapper has been deprecated. Users will get a DeprecationWarning when trying to initialize the object.
  • evalem.metrics.BertScore now uses bert-base-uncased as the default model instead of roberta-large (see the sketch after this list for overriding the default).
  • The evalem.misc.datasets.get_imdb function is added to load the IMDB dataset out of the box.
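
A hedged sketch of reverting to the heavier model; the model_type keyword name is an assumption based on the underlying bert-score library and may differ in evalem's BertScore wrapper:

from evalem.metrics import BertScore

# override the new bert-base-uncased default (keyword name assumed)
bertscore = BertScore(model_type="roberta-large")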

Usage

QA Task

Defaults

from typing import Iterable, Mapping

from evalem.evaluators import QAEvaluator, Evaluator  # Evaluator base-class import path assumed
from evalem.models import QuestionAnsweringHFPipelineWrapper
from evalem.models._base import ModelWrapper
from evalem.misc.datasets import get_squad_v2

from pprint import pprint

def run_pipeline(
    model: ModelWrapper,
    evaluators: Iterable[Evaluator],
    inputs,
    references
) -> Iterable[Mapping[str, dict]]:
    # run the wrapped model once, then apply every evaluator to its predictions
    predictions = model(inputs)
    evaluators = [evaluators] if not isinstance(evaluators, Iterable) else evaluators
    return list(map(lambda e: e(predictions=predictions, references=references), evaluators))

# initialize evaluator
evaluators = QAEvaluator()

# create model wrapper
wrapped_model = QuestionAnsweringHFPipelineWrapper()

# load data
data = get_squad_v2(nsamples=10)
inputs = data["inputs"]
references = data["references"]

results = run_pipeline(wrapped_model, evaluators, inputs, references)
pprint(results)

Using a custom model and the post-processing functionality

from typing import Iterable, Mapping

from transformers import pipeline

from evalem.evaluators import QAEvaluator, Evaluator  # Evaluator base-class import path assumed
from evalem.models import HFPipelineWrapper  # import path assumed
from evalem.models._base import ModelWrapper
from evalem.misc.datasets import get_squad_v2

from pprint import pprint

def run_pipeline(
    model: ModelWrapper,
    evaluators: Iterable[Evaluator],
    inputs,
    references
) -> Iterable[Mapping[str, dict]]:
    # run the wrapped model once, then apply every evaluator to its predictions
    predictions = model(inputs)
    evaluators = [evaluators] if not isinstance(evaluators, Iterable) else evaluators
    return list(map(lambda e: e(predictions=predictions, references=references), evaluators))

# initialize evaluator
evaluators = QAEvaluator()

# create a model wrapper directly using HF's pipeline object
# along with an external post-processor
wrapped_model = HFPipelineWrapper(
    pipeline("question-answering", model="deepset/roberta-base-squad2"),
    predictions_postprocessor=lambda xs: list(map(lambda x: x["answer"], xs))
)

# load data
data = get_squad_v2(nsamples=10)
inputs = data["inputs"]
references = data["references"]

results = run_pipeline(wrapped_model, evaluators, inputs, references)
pprint(results)

Text Classification

Defaults

from typing import Iterable, Mapping

from evalem.evaluators import TextClassificationEvaluator, Evaluator  # Evaluator base-class import path assumed
from evalem.models import TextClassificationHFPipelineWrapper
from evalem.models._base import ModelWrapper
from evalem.misc.datasets import get_imdb

from pprint import pprint

def run_pipeline(
    model: ModelWrapper,
    evaluators: Iterable[Evaluator],
    inputs,
    references
) -> Iterable[Mapping[str, dict]]:
    # run the wrapped model once, then apply every evaluator to its predictions
    predictions = model(inputs)
    evaluators = [evaluators] if not isinstance(evaluators, Iterable) else evaluators
    return list(map(lambda e: e(predictions=predictions, references=references), evaluators))

# initialize evaluator
evaluators = TextClassificationEvaluator()

# create a model wrapper
wrapped_model = TextClassificationHFPipelineWrapper(
    hf_params=dict(truncation=True)
)

# load data
data = get_imdb(nsamples=10)
inputs = data["inputs"]
references = data["references"]

results = run_pipeline(wrapped_model, evaluators, inputs, references)
pprint(results)

Customized

from typing import Iterable, Mapping

from transformers import pipeline

from evalem.evaluators import TextClassificationEvaluator, Evaluator  # Evaluator base-class import path assumed
from evalem.models import HFPipelineWrapper  # import path assumed
from evalem.models._base import ModelWrapper
from evalem.misc.datasets import get_imdb

from pprint import pprint

def run_pipeline(
    model: ModelWrapper,
    evaluators: Iterable[Evaluator],
    inputs,
    references
) -> Iterable[Mapping[str, dict]]:
    # run the wrapped model once, then apply every evaluator to its predictions
    predictions = model(inputs)
    evaluators = [evaluators] if not isinstance(evaluators, Iterable) else evaluators
    return list(map(lambda e: e(predictions=predictions, references=references), evaluators))

# initialize evaluator
evaluators = TextClassificationEvaluator()

# create a model wrapper directly using HF's pipeline object
# along with an external post-processor
wrapped_model = HFPipelineWrapper(
    pipeline("text-classification", truncation=True),
    predictions_postprocessor=lambda xs: list(map(lambda x: x["label"], xs))
)

# load data
data = get_imdb(nsamples=10)
inputs = data["inputs"]
references = data["references"]

results = run_pipeline(wrapped_model, evaluators, inputs, references)
pprint(results)

See `evalem.misc.datasets`; it now provides a `datasets.get_squad_v2(...)` function.

`metrics.semantics.SemanticMetric` is now available; there are two implementations for now (see the sketch after this list for direct usage):
- `metrics.semantics.BertScore`
- `metrics.semantics.BartScore`
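
For illustration, a hedged sketch of calling one of these metrics directly; the calling convention (predictions/references kwargs) is assumed to mirror the evaluators above, and the top-level evalem.metrics import path is assumed:

from evalem.metrics import BertScore

score = BertScore()(
    predictions=["Paris is the capital of France."],
    references=["The capital of France is Paris."],
)
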
Every model wrapper now accepts two kwargs:
- `inputs_preprocessor` (maps inputs to a specific format; defaults to identity)
- `predictions_postprocessor` (maps model outputs to a specific format; defaults to identity)

`models.HFPipelineWrapperForQuestionAnswering` is also created, and `models.DefaultQAModelWrapper` is deprecated.
See `models.defaults.TextClassificationHFPipelineWrapper`.
The construction of the HF pipeline object in the existing wrapper is also improved.

`evaluators.basics.TextClassificationEvaluator` is also added.
A flag is used to return precision/recall/F1 scores per prediction instance.

TextClassificationHFPipelineWrapper: previously, the tokenizer was set to a default, which was incorrect; we want the tokenizer to be the one the provided model was trained with. So `tokenizer` now defaults to None.
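
A hedged sketch of explicit wrapper initialization; the model, tokenizer, and device keyword names are assumptions based on the description above, while hf_params is confirmed by this PR:

from evalem.models import QuestionAnsweringHFPipelineWrapper

wrapped_model = QuestionAnsweringHFPipelineWrapper(
    model="deepset/roberta-base-squad2",      # any suitable HF model id (kwarg name assumed)
    tokenizer="deepset/roberta-base-squad2",  # defaults to None, i.e. the model's own tokenizer
    device="cpu",                             # device type (kwarg name assumed)
    hf_params=dict(handle_impossible_answer=True),  # example kwarg forwarded to the HF pipeline
)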
"""
nsamples = nsamples or 0
data = load_dataset("imdb")[data_type]
data = data.shuffle(seed=42) if shuffle else data

Reviewer: move seed to a config or a constant.

NISH1001 (Collaborator, Author): Ah ya. Good call. The framework-level config could be a nice way to manage these seeds.

NISH1001 (Collaborator, Author): Can I resolve this in the next PR? It doesn't hamper the behavior of the framework at this point.
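
For illustration, one hypothetical shape of that follow-up (module and constant names invented here; not part of this PR):

# evalem/_config.py (hypothetical module)
DEFAULT_SEED: int = 42

# the dataset loader could then read:
#   data = data.shuffle(seed=DEFAULT_SEED) if shuffle else data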

@NISH1001 merged commit 6e3c4a6 into main on Mar 10, 2023
@NISH1001 deleted the feature/metrics-models branch on March 10, 2023 at 16:46