Constrained correction #499

Closed
joseph16388 opened this issue Jun 17, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@joseph16388

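Is it possible to constrain correction so that the target keeps the same basic pronunciation as the source? For example, 干净的胡面 should only be corrected to 干净的湖面 (胡 hú → 湖 hú, "clean lake surface"), not to 干净的画面 (画 huà, "clean picture").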

joseph16388 added the bug label Jun 17, 2024
@shibing624
Owner

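1. Specify this in the training set: keep only samples that correct phonetically similar (sound-alike) errors.
2. Train a new version of the model on your own revised training set.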
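
A minimal sketch of step 1, assuming each training sample is a dict with the fields `original_text` and `correct_text` (these field names are an assumption, so check them against your own data). It keeps a sample only when every changed character has the same toneless pinyin syllable before and after correction. `TOY_PINYIN` is a hypothetical lookup table covering just the example from this issue; a real pipeline would need a full pinyin lexicon, for instance from a library such as pypinyin, and would have to decide how to treat polyphonic characters.

```python
# Toy syllable table for illustration only (hypothetical, not part of any library).
TOY_PINYIN = {
    "干": "gan",
    "净": "jing",
    "的": "de",
    "胡": "hu",
    "湖": "hu",
    "画": "hua",
    "面": "mian",
}


def same_sound(a, b, pinyin=TOY_PINYIN):
    """Return True if both characters map to the same toneless syllable."""
    pa, pb = pinyin.get(a), pinyin.get(b)
    # Characters missing from the table are treated as not matching.
    return pa is not None and pa == pb


def is_phonetic_only(record, pinyin=TOY_PINYIN):
    """Return True if every edited character keeps its pronunciation.

    The field names `original_text` and `correct_text` are assumed.
    """
    src, tgt = record["original_text"], record["correct_text"]
    if len(src) != len(tgt):
        # Only same-length substitutions are handled in this sketch.
        return False
    return all(same_sound(s, t, pinyin) for s, t in zip(src, tgt) if s != t)


def filter_phonetic(records, pinyin=TOY_PINYIN):
    """Drop samples whose correction changes the pronunciation."""
    return [r for r in records if is_phonetic_only(r, pinyin)]


samples = [
    {"original_text": "干净的胡面", "correct_text": "干净的湖面"},  # hu -> hu
    {"original_text": "干净的胡面", "correct_text": "干净的画面"},  # hu -> hua
]
kept = filter_phonetic(samples)
```

With the two samples above, only the 胡 → 湖 pair survives the filter. Samples with no edits at all also pass, which keeps already-correct sentences in the training set; the filtered list would then be used for the retraining in step 2.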

@joseph16388
Author

joseph16388 commented Jun 17, 2024

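> 1. Specify this in the training set: keep only samples that correct phonetically similar (sound-alike) errors. 2. Train a new version of the model on your own revised training set.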

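OK, thanks! A follow-up question: by "specify", do you mean selecting only a dataset of phonetically similar errors?
For example, this one:
The SIGHAN+Wang271K Chinese spelling correction dataset (270K samples), obtained by converting the original SIGHAN 2013, 2014, and 2015 datasets and the Wang271K dataset to a common format. It is in JSON format and includes the positions of the erroneous characters; the SIGHAN portion is test.json. The macbert4csc model can be trained directly on this dataset to reproduce the paper's precision and recall results.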

2 participants