A Challenging Test Set for Chinese Word Segmentation

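# Sentences with overlapping segmentation ambiguities,
# e.g. 和服 spans the boundary between 和 and 服务 in "商品和服务".
testCases = [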
    "24口交换机",
    "商品和服务",
    "结婚的和尚未结婚的确实在干扰分词啊",
    "买水果然后来世博园最后去世博会",
    "中国的首都是北京",
    "欢迎新老师生前来就餐",
    "工信处女干事每月经过下属科室都要亲口交代24口交换机等技术性器件的安装工作",
    "随着页游兴起到现在的页游繁盛,依赖于存档进行逻辑判断的设计减少了,但这块也不能完全忽略掉。",
    "结婚的和尚未结婚的确实在干扰。",
    "龚学平等领导说,邓颖超生前杜绝超生",
    "当下雨天地面积水分外严重"]
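
To illustrate why sentences like these are hard, here is a minimal sketch of greedy dictionary-based segmentation run in both directions over one of the test sentences. It is not the method used by any of the credited projects. The function names `forward_max_match` and `backward_max_match` and the three-word `toy_vocab` are hypothetical and chosen only for this example.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right longest-match segmentation."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text, vocab, max_len=4):
    """Greedy right-to-left longest-match segmentation."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the sentence
    return tokens


# Hypothetical toy vocabulary, not taken from any real segmenter.
toy_vocab = {"商品", "和服", "服务"}

forward = forward_max_match("商品和服务", toy_vocab)
backward = backward_max_match("商品和服务", toy_vocab)
print(forward)   # ['商品', '和服', '务']
print(backward)  # ['商品', '和', '服务']
```

With this toy vocabulary the two directions disagree: the forward pass wrongly picks 和服, while the backward pass yields the intended 商品 / 和 / 服务. Disagreements of this kind are what the test sentences are meant to provoke.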

Acknowledgements & Credits

Most of the data comes from two projects: pyhanlp: Python interfaces for HanLP, and Iterated Dilated Convolutions for Chinese Word Segmentation.