
Releases: NASA-IMPACT/evalem

nlp and cv namespace segregation

20 Jun 19:36
Pre-release

Disclaimer: This release introduces breaking changes, but only at the namespace level. All previous evalem.models, evalem.metrics, etc. now reside at evalem.nlp.models, evalem.nlp.metrics, etc.

With this release, evalem is segregated into both an nlp and a cv namespace:

  • evalem.nlp
  • evalem.cv

Both of these have:

  • models
  • metrics
  • evaluation pipeline

All of these derive from the base classes in evalem._base.
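
A minimal sketch of the new import paths, assuming the class names from the earlier releases carry over unchanged under the nlp namespace:

# Hedged sketch: assumes earlier class names are unchanged under the new nlp namespace
from evalem.nlp.models import TextClassificationHFPipelineWrapper
from evalem.nlp.metrics import AccuracyMetric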

v0.0.3-alpha.1

21 Apr 18:56
Pre-release

This release fixes a few setup-related misconfigurations. See #16


v0.0.3-alpha

27 Mar 18:44
Pre-release

This release adds a simple pipeline abstraction over the existing ModelWrapper, Metric, and Evaluator components.

Changelog

Major

  • Added evalem.pipelines.SimpleEvaluationPipeline, which wraps existing model wrappers, metrics, and evaluators into a single coherent abstraction. See PR
  • Added more semantic metrics such as Bleu, ROUGE, and METEOR. See PR

Minor

  • Refactored the test suites. For example, the model and pipeline test suites are now parameterized through the conftest.py paradigm (see the sketch below).
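
Below is a small, illustrative conftest.py fixture in the spirit of the refactor described above; the fixture name and parameter values are hypothetical, not the actual test code.

# conftest.py -- hypothetical sketch of parameterizing model/pipeline tests via fixtures
import pytest

@pytest.fixture(params=["distilbert-base-uncased", "bert-base-uncased"])
def model_name(request):
    # Each test that requests this fixture runs once per model name.
    return request.param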

Usage

from evalem.pipelines import SimpleEvaluationPipeline
from evalem.models import TextClassificationHFPipelineWrapper
from evalem.evaluators import TextClassificationEvaluator

# can switch to any implemented wrapper
model = TextClassificationHFPipelineWrapper()

# can switch to any other evaluator implementation
evaluator = TextClassificationEvaluator()

# initialize the pipeline
eval_pipe = SimpleEvaluationPipeline(model=model, evaluators=evaluator)

# inputs: texts to run through the model; references: ground-truth labels
results = eval_pipe(inputs, references)

# or equivalently
results = eval_pipe.run(inputs, references)

[alpha] Initial release

20 Mar 20:22
Pre-release

This release adds the initial metrics and model components:

1) Metrics

We can import various metrics from evalem.metrics:

  • Both BasicMetrics and SemanticMetrics can be used
  • Basic metrics are:
    - F1Metric
    - RecallMetric
    - PrecisionMetric
    - ConfusionMatrix
    - AccuracyMetric
    - ExactMatchMetric
  • Semantic metrics include BertScore and BartScore

These metrics can be used independently to evaluate predictions from upstream models against references/ground truths, as sketched below.

See PRs this, this and this
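
A rough sketch of standalone metric usage; the exact call convention (invoking a metric directly on predictions and references) is an assumption for illustration, not documented above.

from evalem.metrics import F1Metric

metric = F1Metric()
# Assumed call convention: compute the metric over model predictions vs. references
result = metric(predictions=["positive", "negative"], references=["positive", "positive"])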

2) ModelWrapper

evalem.models includes various model wrapper implementations. See PRs this and this

  • evalem.models.QuestionAnsweringHFPipelineWrapper and evalem.models.TextClassificationHFPipelineWrapper are now the main wrappers for the QA and text classification tasks, respectively.

    • These also have better parameter initialization, allowing any suitable model and tokenizer to be used, along with the device type.
    • An hf_params dict is also provided as a parameter and is used to initialize the HF pipeline.
  • The model wrappers take two distinct processing parameters (one for pre-processing and one for post-processing). Each should be a Callable (a lambda, a callable external module, etc.) and can be customized to adjust the pre-/post-processing behavior. See the sketch after this list.
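
An illustrative sketch of wrapper initialization as described in the list above; keyword names other than hf_params (and the contents of hf_params itself) are assumptions for illustration only.

from evalem.models import TextClassificationHFPipelineWrapper

# hf_params is forwarded to the HF pipeline; the keys shown here are assumed, not documented
wrapper = TextClassificationHFPipelineWrapper(
    hf_params=dict(model="distilbert-base-uncased", device=-1),
    # pre/post processors are plain Callables (e.g. lambdas) passed via the wrapper's
    # processing parameters; the exact parameter names are not shown in these notes
)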

3) Evaluator

Evaluators provide an abstraction/container around metrics so that a group of metrics can be evaluated together.
See PRs this, this and this

There are two evaluator implementations:

  • evalem.evaluators.QAEvaluator for evaluating QA metrics
  • evalem.evaluators.TextClassificationEvaluator for text classification

We can also use evalem.evaluators._base.Evaluator directly to create our own custom evaluator object, as in the sketch below.
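
A hedged sketch of building a custom evaluator from the base class; the constructor argument (metrics=...) and the way metrics are grouped are assumptions, not documented above.

from evalem.evaluators._base import Evaluator
from evalem.metrics import AccuracyMetric, F1Metric

# Assumed pattern: an Evaluator groups several metrics and runs them together
custom_evaluator = Evaluator(metrics=[AccuracyMetric(), F1Metric()])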