NewWordDectection

"New word discovery"

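Currently, the only method implemented is lexicon construction using left/right entropy and mutual information, based on articles by Su Jianlin (苏剑林). Other algorithms will be added when time permits.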
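To make the two statistics concrete, here is a minimal, standard-library sketch of how candidate strings can be scored by pointwise mutual information (cohesion) and left/right neighbor entropy (boundary freedom). It is an illustration under stated assumptions, not this package's implementation: the function name `score_candidates`, the `max_len` parameter, taking the minimum PMI over all two-way splits, and taking the smaller of the left and right entropies are all choices made for this example, and the package's actual definitions may differ.

```python
import math
from collections import Counter, defaultdict


def score_candidates(text, max_len=3):
    """Score each substring of 2..max_len characters (hypothetical helper)."""
    n = len(text)
    # Frequency of every substring of length 1..max_len.
    counts = Counter(
        text[i:i + k] for k in range(1, max_len + 1) for i in range(n - k + 1)
    )
    # Characters observed immediately to the left / right of each candidate.
    left_nb, right_nb = defaultdict(Counter), defaultdict(Counter)
    for k in range(2, max_len + 1):
        for i in range(n - k + 1):
            w = text[i:i + k]
            if i > 0:
                left_nb[w][text[i - 1]] += 1
            if i + k < n:
                right_nb[w][text[i + k]] += 1

    def prob(s):
        # Relative frequency among all substrings of the same length.
        return counts[s] / (n - len(s) + 1)

    def entropy(neighbors):
        # Shannon entropy (natural log) of the neighbor distribution;
        # an empty distribution yields 0.
        total = sum(neighbors.values())
        return float(sum(-(c / total) * math.log(c / total)
                         for c in neighbors.values()))

    scores = {}
    for w, freq in counts.items():
        if len(w) < 2:
            continue
        # Assumption: cohesion is the weakest PMI over all two-way splits.
        pmi = min(
            math.log(prob(w) / (prob(w[:j]) * prob(w[j:])))
            for j in range(1, len(w))
        )
        scores[w] = {
            "freq": freq,
            "pmi": pmi,
            # Assumption: boundary freedom is the smaller of the two entropies.
            "entropy": min(entropy(left_nb[w]), entropy(right_nb[w])),
        }
    return scores


# Toy usage: list the five most cohesive candidates in a small string.
toy = "我爱你你爱我蜜雪冰城甜蜜蜜" * 3
ranked = sorted(score_candidates(toy).items(), key=lambda kv: -kv[1]["pmi"])
for word, stats in ranked[:5]:
    print(word, stats)
```

A string is a plausible word when both numbers are high: high PMI means its characters co-occur far more often than chance, and high neighbor entropy means it appears in many different contexts. Thresholds for both are corpus-dependent and are left out of the sketch.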

Installation

pip install new-words-detection

Usage

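# Import the module
from new_words_detection.wordSegment import get_doc_words

# Prepare the corpus: the input is a single long string, which may contain special characters
corpus = '哈哈哈哈哈哈哈哈哈哈哈或或或或或或或或或或或或或或或或或或或或或或或或或或或或或'

# The output lists the candidate words with their PMI, left/right entropy, and related statistics
df_words = get_doc_words(corpus)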

Final result

word	word_length	word_freq	word_pmi	word_entropy
我爱你你爱我蜜雪冰城甜蜜蜜	13	1	3.3	2.333333333

skykiseki/NewWordDectection
