petar-popovic-bg/Jerteh

Jerteh

Jerteh is a Python package of utility classes and static methods that wrap third-party software commonly used in text processing: Unitex-GramLab, TreeTagger, Apache Tika, and Google Tesseract.

Installation

Linux
  1. Install with pip: pip install -e git+https://github.com/petar-popovic-bg/Jerteh.git#egg=Jerteh

    To update: pip install -e git+https://github.com/petar-popovic-bg/Jerteh.git#egg=Jerteh --upgrade

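  2. Edit the treetaggerwrapper.py file inside your virtual environment so that the wrapper supports Serbian in both Latin and Cyrillic script. Add the 'sr-lat' and 'sr-cyr' entries to the language list, together with their dummy sentences, as in this excerpt: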

    """
    ('slovak', 'sk'),
    ('swahili', 'sw'),
    ('serbian-lat', 'sr-lat'),
    ('serbian-cyr', 'sr-cyr')]:
    ls = g_langsupport[lang] = copy.deepcopy(g_langsupport['__base__'])
    ...
    g_langsupport['sk']['dummysentence'] = 'To je koniec . .'
    g_langsupport['sw']['dummysentence'] = 'Hii ni mwisho . .'
    g_langsupport['sr-lat']['dummysentence'] = 'Ovo je kraj . .'
    g_langsupport['sr-cyr']['dummysentence'] = 'Ово је крај . .'
    """
  3. Edit configure.py so that it points to your local installations of TreeTagger and Unitex.
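
To confirm that the edit from step 2 took effect, one option is to check that both Serbian codes are present in the wrapper's language-support mapping. The sketch below is only an illustration: the helper name missing_language_codes and the constant SERBIAN_CODES are hypothetical and not part of Jerteh or treetaggerwrapper. It assumes the mapping is keyed by language code, as the excerpt above suggests.

```python
from typing import Iterable, List, Mapping

# Language codes added by the edit in step 2.
SERBIAN_CODES = ("sr-lat", "sr-cyr")


def missing_language_codes(
    langsupport: Mapping[str, object],
    required: Iterable[str] = SERBIAN_CODES,
) -> List[str]:
    """Return the required codes that are absent from a language-support mapping."""
    return [code for code in required if code not in langsupport]


if __name__ == "__main__":
    # Stand-in mapping for demonstration only (hypothetical data).
    # With the patched wrapper installed, pass treetaggerwrapper.g_langsupport instead.
    demo = {"sk": {}, "sw": {}, "sr-lat": {}}
    print(missing_language_codes(demo))
```

An empty list means both codes are registered. Any code that is returned still needs to be added to treetaggerwrapper.py.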

Instructions

The TreeTagger and Unitex classes require TreeTagger and Unitex to be installed on your machine.
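
Before using these classes, it can help to check that the configured installation directories actually exist. The following is a minimal sketch under stated assumptions: check_installations is a hypothetical helper, not part of Jerteh, and the example paths are placeholders for whatever you set in configure.py.

```python
from pathlib import Path
from typing import Dict, Mapping


def check_installations(paths: Mapping[str, str]) -> Dict[str, bool]:
    """Map each tool name to True if its configured directory exists."""
    return {
        name: Path(location).expanduser().is_dir()
        for name, location in paths.items()
    }


if __name__ == "__main__":
    # Hypothetical locations; replace with the paths set in configure.py.
    configured = {
        "TreeTagger": "~/treetagger",
        "Unitex": "~/Unitex-GramLab",
    }
    for tool, found in check_installations(configured).items():
        print(f"{tool}: {'found' if found else 'missing'}")
```

This only checks that a directory is present. It does not verify that the tool inside it is complete or runnable.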
