High-performance Chinese tokenizer supporting both GBK and UTF-8 character sets, based on the MMSEG algorithm and implemented in ANSI C. Its fully modular design makes it easy to embed in other programs such as MySQL, PostgreSQL, and PHP.
c
tokenizer
full-text-search
chinese-word-segmentation
chinese-tokenizer
php-tokenizer
korean-tokenizer
japanese-tokenizer
cjk-tokenizer
Updated Oct 29, 2023 - C