@bitextor

Bitextor Team

Translation memories generator

Pinned

  1. bitextor Public

    Bitextor generates translation memories from multilingual websites

    Python, 284 stars, 43 forks

  2. bicleaner Public

    Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.

    Python, 149 stars, 22 forks

  3. bifixer Public

    Tool to fix bitexts and tag near-duplicates for removal

    Python, 28 stars, 3 forks

  4. biroamer Public

    Utility that will help you to ROAM (Random Omit Anonymize and Mix) your parallel corpus.

    Python, 9 stars, 2 forks

  5. pdf-extract Public

    PDF parser and converter to HTML

    Java, 82 stars, 14 forks

  6. warc2text Public

    Extracts plain text, language identification and more metadata from WARC records

    C++, 20 stars, 5 forks

Repositories

Showing 10 of 28 repositories
  • bifixer Public

    Tool to fix bitexts and tag near-duplicates for removal

    Python, 28 stars, GPL-3.0, 3 forks, 0 open issues, 0 open pull requests, updated Aug 13, 2024
  • warc2text Public

    Extracts plain text, language identification and more metadata from WARC records

    C++, 20 stars, MIT, 5 forks, 12 open issues, 3 open pull requests, updated Aug 8, 2024
  • bicleaner-hardrules Public

    Pre-filtering step for bicleaner

    Python, 4 stars, GPL-3.0, 2 forks, 0 open issues, 0 open pull requests, updated Jul 26, 2024
  • bicleaner-ai Public

    Bicleaner fork that uses neural networks

    Python, 36 stars, GPL-3.0, 4 forks, 2 open issues, 0 open pull requests, updated Jul 26, 2024
  • bitextor Public

    Bitextor generates translation memories from multilingual websites

    Python, 284 stars, GPL-3.0, 43 forks, 4 open issues, 4 open pull requests, updated Jun 18, 2024
  • bicleaner Public

    Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus.

    Python, 149 stars, GPL-3.0, 22 forks, 0 open issues, 1 open pull request, updated Jun 18, 2024
  • biroamer Public

    Utility that will help you to ROAM (Random Omit Anonymize and Mix) your parallel corpus.

    Python, 9 stars, GPL-3.0, 2 forks, 0 open issues, 0 open pull requests, updated Feb 26, 2024
  • monocleaner
    Python, 6 stars, GPL-3.0, 1 fork, 1 open issue, 0 open pull requests, updated Sep 6, 2023
  • monotextor
    Python, 6 stars, GPL-3.0, 1 fork, 0 open issues, 0 open pull requests, updated May 31, 2023
  • bitextor-testing-output Public

    Repository for storing testing outputs from Bitextor

    0 stars, GPL-3.0, 0 forks, 0 open issues, 0 open pull requests, updated May 29, 2023
