parryc/chinese-nlp

Chinese Sentence Segmentation

Playing around with ways of emulating different Chinese sentence segmentation techniques in the browser.

Why?

Everyone and their mom has done some sort of Chinese NLP thing, so how is this different?

It's not, really. But not everyone has a server in X or wants to use language Y (I'm looking at you, Java). How can I create the most efficient single JS file that can compete with the large, expansive "OMFG I have 98,000 bigrams in my database" projects? What is a reasonable list of words that I need? These kinds of questions.

Done

  • Forward Maximum Matching
  • Backward Maximum Matching
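
For readers new to the two techniques listed above, here is a minimal sketch of both, not the code in this repo. The names `WORDS`, `MAX_LEN`, `forwardMaxMatch`, and `backwardMaxMatch` are hypothetical, and the tiny dictionary is a stand-in for the real word list. Both functions greedily take the longest dictionary word they can, falling back to a single character when nothing matches; they differ only in scanning direction.

```javascript
// Hypothetical toy dictionary for illustration only.
const WORDS = new Set(["研究", "研究生", "生命", "起源", "的"]);
const MAX_LEN = 3; // length in characters of the longest entry in WORDS

// Forward maximum matching: scan left to right, taking the longest
// dictionary word that starts at the current position.
function forwardMaxMatch(text, words = WORDS, maxLen = MAX_LEN) {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  const tokens = [];
  let i = 0;
  while (i < chars.length) {
    let len = Math.min(maxLen, chars.length - i);
    // Shrink the window until it is a known word or a single character.
    while (len > 1 && !words.has(chars.slice(i, i + len).join(""))) len--;
    tokens.push(chars.slice(i, i + len).join(""));
    i += len;
  }
  return tokens;
}

// Backward maximum matching: same idea, scanning right to left and
// taking the longest dictionary word that ends at the current position.
function backwardMaxMatch(text, words = WORDS, maxLen = MAX_LEN) {
  const chars = Array.from(text);
  const tokens = [];
  let end = chars.length;
  while (end > 0) {
    let len = Math.min(maxLen, end);
    while (len > 1 && !words.has(chars.slice(end - len, end).join(""))) len--;
    tokens.unshift(chars.slice(end - len, end).join(""));
    end -= len;
  }
  return tokens;
}

console.log(forwardMaxMatch("研究生命的起源"));  // ["研究生", "命", "的", "起源"]
console.log(backwardMaxMatch("研究生命的起源")); // ["研究", "生命", "的", "起源"]
```

The example sentence shows why having both directions is useful: the forward pass greedily grabs 研究生 and strands 命, while the backward pass recovers 研究 / 生命. Comparing the two outputs is a cheap way to spot ambiguous spans.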

Next up?

  • Conditional Random Fields
  • Finding a better bigram list...

Current word list

Bigrams up to HSK Level 6

Reading

Introduction to Chinese NLP by my home-peeps (I wish), Kam-Fai Wong, Wenjie Li, Ruifeng Xu, and Zheng-sheng Zhang

License

MIT
