Stemming and lemmatization fully covered for Sorani

@sinaahmadi sinaahmadi released this 06 Jan 16:16
· 10 commits to master since this release

This version includes the following changes:

  • It is now possible to stem ("بڕاوە" → "بڕ") and lemmatize ("بردمنەوە" → "بردن") words of all parts of speech. Up to version 0.1.4, stemming was only possible for verbs.
  • For stemming unknown words, a rule-based approach is provided.
  • When using the morphological analyzer (in the stem module), prefixes and suffixes are now returned separately. Previously, they were merged.
  • The tagged lexicon is updated and further enriched with more lexical entries, particularly proper nouns.
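
As a rough illustration of what rule-based stemming of unknown words with separately reported prefixes and suffixes can look like, here is a minimal Python sketch. It is not the toolkit's implementation: the function name `analyze_unknown`, the tiny affix lists, and the `MIN_STEM_LEN` threshold are all assumptions made for this example, and the real rule set is far larger.

```python
# Toy affix lists chosen for illustration only; they are assumptions,
# not the rule set shipped with the toolkit.
PREFIXES = ["دە", "نا"]
SUFFIXES = ["ەکان", "ەکە", "ێک", "ان"]
MIN_STEM_LEN = 2  # hypothetical guard against stripping a word down to nothing


def analyze_unknown(word):
    """Strip at most one prefix and one suffix, trying longer affixes first.

    Returns the prefix, stem, and suffix as separate fields.
    """
    prefix, suffix, stem = "", "", word

    # Try the longest prefixes first so a short one cannot shadow a long one.
    for p in sorted(PREFIXES, key=len, reverse=True):
        if stem.startswith(p) and len(stem) - len(p) >= MIN_STEM_LEN:
            prefix, stem = p, stem[len(p):]
            break

    # Same idea for suffixes, applied to what remains after prefix removal.
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if stem.endswith(s) and len(stem) - len(s) >= MIN_STEM_LEN:
            suffix, stem = s, stem[:-len(s)]
            break

    return {"prefix": prefix, "stem": stem, "suffix": suffix}


if __name__ == "__main__":
    # A noun followed by the first suffix in the toy list.
    print(analyze_unknown("کتێب" + SUFFIXES[0]))
```

Keeping the prefix and suffix in separate fields, rather than one merged affix string, lets a caller rebuild the surface form as prefix + stem + suffix or inspect either side independently.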