
Releases: shon-otmazgin/fastcoref

custom spacy model for trainer

26 May 14:47

bug fix

09 May 17:00

Address issue

remove unneeded layers

18 Apr 16:10
  1. Avoid downloading the spaCy model when the input is pretokenized text (@aryehgigi)
  2. Remove the wandb installation if you are not using train mode (i.e., adding fastcoref[train] on PyPI) (@aryehgigi)

patch to disable progress bar at inference

24 Nov 17:10

patch to disable progress bar at inference (thanks to @radandreicristian)

v2.1.0

10 Nov 18:08

NEW FEATURE (thanks to @aryehgigi):

The predict function signature changed to support tokenized text as input:

texts: Union[str, List[str], List[List[str]]],  # similar to huggingface tokenizer inputs
is_split_into_words: bool = False

If you pass tokenized text to the predict function, set is_split_into_words=True.

If you want to pass a single tokenized sequence, you must set is_split_into_words=True (to remove the ambiguity with a batch of sequences).
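The ambiguity is that a flat list of strings could be either a batch of raw texts or one tokenized sequence. The following is a minimal sketch of how such inputs could be told apart. `to_batch` is a hypothetical helper written for illustration; it is not fastcoref's implementation and only mirrors the input shapes listed in the signature above.

```python
from typing import List, Union

Texts = Union[str, List[str], List[List[str]]]


def to_batch(texts: Texts, is_split_into_words: bool = False) -> list:
    """Hypothetical helper: normalize the accepted input shapes to a batch."""
    if isinstance(texts, str):
        if is_split_into_words:
            raise ValueError("a single str cannot be pre-tokenized input")
        return [texts]  # one raw text -> batch of one
    if not is_split_into_words:
        return list(texts)  # List[str] is read as a batch of raw texts
    if texts and isinstance(texts[0], str):
        return [list(texts)]  # List[str] is read as ONE tokenized sequence
    return [list(tokens) for tokens in texts]  # batch of tokenized sequences


# The same flat list is interpreted differently depending on the flag.
tokens = ["Alice", "left", "."]
print(to_batch(tokens))                            # three raw texts
print(to_batch(tokens, is_split_into_words=True))  # one tokenized sequence
```

Without the flag, the three tokens would be treated as three separate one-word texts, which is why the flag is required for a single tokenized sequence.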

v2.0.3

09 Nov 18:30

Fix: disable all spaCy components except the tokenizer; see #13 (thanks to @radandreicristian)

v2.0.2

27 Oct 15:09

Use the existing spaCy instance when using the spaCy component

v2.0.1 - spacy_component, trainer

25 Oct 11:26

Adding the following features:

  1. spaCy component (thanks to @mlostar)
from fastcoref import spacy_component
import spacy


texts = ['Alice goes down the rabbit hole. Where she would discover a new reality beyond her expectations.']

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("fastcoref")

# nlp() expects a single string; use nlp.pipe() to process a list of texts
docs = list(nlp.pipe(texts))
docs[0]._.coref_clusters
> [[(0, 5), (39, 42), (79, 82)]]
  2. Trainer
from fastcoref import TrainingArgs, CorefTrainer

args = TrainingArgs(
    output_dir='test-trainer',
    overwrite_output_dir=True,
    model_name_or_path='distilroberta-base',
    device='cuda:2',
    epochs=129,
    logging_steps=100,
    eval_steps=100
)   # you can control other arguments such as learning head and others.

trainer = CorefTrainer(
    args=args,
    train_file='train_file_with_clusters.jsonlines', 
    dev_file='path-to-dev-file',    # optional
    test_file='path-to-test-file'   # optional
)
trainer.train()
trainer.evaluate(test=True)

trainer.push_to_hub('your-fast-coref-model-path')
  3. predict now supports an output file:
from fastcoref import LingMessCoref

model = LingMessCoref()
preds = model.predict(texts=texts, output_file='train_file_with_clusters.jsonlines')
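Judging from the sample output in the spaCy component example above, each cluster is a list of (start, end) character offsets into the input text, with an exclusive end. A minimal sketch of turning those offsets back into mention strings follows. `spans_to_strings` is a hypothetical helper, not part of fastcoref, and the clusters are copied from the sample output rather than produced by a model.

```python
from typing import List, Tuple

Span = Tuple[int, int]


def spans_to_strings(text: str, clusters: List[List[Span]]) -> List[List[str]]:
    """Hypothetical helper: slice each (start, end) character span out of the text."""
    return [[text[start:end] for start, end in cluster] for cluster in clusters]


# Text and clusters taken from the spaCy component example above.
text = ('Alice goes down the rabbit hole. Where she would discover '
        'a new reality beyond her expectations.')
clusters = [[(0, 5), (39, 42), (79, 82)]]

mentions = spans_to_strings(text, clusters)
print(mentions)  # [['Alice', 'she', 'her']]
```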