
0.1.6 / 2023-11-15

  • rationalize dependencies
  • update README to include the KenLMModel example (a usage sketch follows this list)
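
For context, here is a minimal usage sketch in the spirit of the README example mentioned above. This is a hedged sketch, assuming the module-level KenLMModel export added in 0.1.4 and a surprise() method analogous to the HuggingFace models; the model_path keyword and the model file path are placeholders, not a confirmed signature.

```python
# Hedged sketch of the KenLMModel usage referenced above.
# The model_path keyword and the .arpa path are placeholders/assumptions.
from surprisal import KenLMModel

sentences = [
    "The cat is on the mat",
    "The cat is on the hat",
]

# Load a pretrained KenLM n-gram language model (.arpa or binarized .bin).
k = KenLMModel(model_path="path/to/ngram_model.arpa")

# surprise() returns one surprisal result per input sentence.
for result in k.surprise(sentences):
    print(result)
```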

0.1.4 / 2023-11-14

  • version bump; release
  • Merge branch 'benlipkin/main' into main
  • add a simple surprisal computation test that passes as long as the full surprise() method executes; no assertions are made
  • implement attention mask for the use_bos_token case
  • Merge branch 'main' of github.com:benlipkin/surprisal into benlipkin/main
  • Merge branch 'main' of github.com:aalok-sathe/surprisal into main
  • Merge pull request #14 from aalok-sathe/feature-support-kenlm
  • OK, we have an MWE! still TODO: figure out the surprisal value: do we want that? maybe add an option to show it but default to disabling it? do we also want BOS?
  • bugfix in ids handling in CustomEncoding
  • bugfixes; bump numpy version for typing
  • make KenLMModel visible at the module level
  • add KenLM and NGramSurprisal implementations
  • move repr() to SurprisalArray rather than HuggingFaceSurprisal; complete the CustomEncoding implementation
  • actually no point subclassing from tokenizers.Encoding
  • flesh out the interface towards supporting CustomEncoding for custom-tokenized text, e.g. whitespace-tokenized input for KenLM
  • Update python-publish.yml: fix poetry install command
  • Update pylint.yml: fix poetry install command
  • add deps to publish action workspace using poetry
  • Update pylint.yml
  • Create python-publish.yml
  • Create pylint.yml
  • start writing the KenLM implementation
  • prefix with hf to indicate it works with HF-based tokenizer outputs
  • explicitly specify attention mask
  • fix concat on device
  • always pad right
  • add arg to trust remote code
  • require torch2+ to take advantage of optimizations
  • add support for fp16 and bf16 precision (see the sketch after this list)
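
To tie several of these items together (the use_bos_token attention-mask handling, the trust-remote-code argument, and fp16/bf16 precision support), here is a hedged sketch using the package's AutoHuggingFaceModel entry point. The precision and trust_remote_code keyword names are assumptions inferred from the commit messages above, not the confirmed API; consult the README for the exact parameters, and adapt if your installed version exposes a different loader.

```python
# Hedged sketch of HuggingFace-side usage touched by 0.1.4.
# precision= and trust_remote_code= are assumed keyword names, inferred
# from the commit messages above rather than taken from the actual API.
from surprisal import AutoHuggingFaceModel

sentences = ["The cat is on the mat"]

m = AutoHuggingFaceModel.from_pretrained(
    "gpt2",
    precision="fp16",        # assumed name for the fp16/bf16 option
    trust_remote_code=True,  # assumed pass-through to transformers
)
m.to("cuda")  # optional: move the model to GPU (the package now requires torch 2+)

# surprise() is the method exercised by the new smoke test; it computes
# surprisal for each sentence in the batch.
for result in m.surprise(sentences):
    print(result)
```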