APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. The sentences in the dataset were written by anonymous participants on the online crowdsourcing platform DeepNatural AI.
You can download the APEACH benchmark set from `APEACH/test.csv` in this repository.
- APEACH: A hate speech evaluation dataset generated in 2021, using the generation method described in the APEACH paper.
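As a minimal sketch of how the benchmark file could be read once downloaded, the helper below loads a CSV into `(text, label)` pairs using only the Python standard library. The column names `text` and `class` are assumptions made for illustration and are not confirmed by this README; check the header row of `APEACH/test.csv` and pass the real names instead. The function names `load_benchmark` and `label_counts` are likewise hypothetical.

```python
import csv
from collections import Counter


def load_benchmark(path, text_col="text", label_col="class"):
    """Read a CSV benchmark file into a list of (text, label) pairs.

    text_col and label_col are ASSUMED column names; replace them with
    the names found in the actual header row of the file.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Labels are kept as strings exactly as they appear in the file.
        return [(row[text_col], row[label_col]) for row in reader]


def label_counts(examples):
    """Count how many examples carry each label."""
    return Counter(label for _, label in examples)
```

With the repository checked out, `label_counts(load_benchmark("APEACH/test.csv"))` would give a quick view of the class balance, provided the assumed column names match.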
Model | BEEP! Dev Dataset | APEACH (Ours) |
---|---|---|
SoongsilBERT-Base | 0.8261 | 0.8424 |
SoongsilBERT-Small | 0.8149 | 0.8228 |
KcBERT-base | 0.8088 | 0.8086 |
KcBERT-large | 0.8295 | 0.8116 |
DistilKoBERT | 0.7570 | 0.7715 |
KoELECTRA-V3 | 0.7920 | 0.8101 |
KoBERT | 0.8030 | 0.7885 |
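The table above does not name the metric behind its scores. As a hedged illustration of how a classifier's predictions could be scored against a benchmark like this one, the sketch below computes accuracy and macro-averaged F1 in plain Python. Both metrics are assumptions chosen because they are common for binary hate speech detection; they are not necessarily what produced the numbers in the table, and the function names are hypothetical.

```python
def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of the per-label F1 scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is never zero here
        # because every label occurs in gold or pred at least once.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)
```

For example, with `gold = [1, 1, 0, 0]` and `pred = [1, 0, 0, 0]`, `accuracy` returns 0.75 and `macro_f1` returns 11/15 (about 0.733), since the per-label F1 scores are 0.8 and 2/3.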
We also share the best model trained on our dataset in this experiment as a checkpoint, a demo website, and an API.
The main contributors to this work (*: equal contribution):
- Kichang Yang* (Kakao Corp., Kakao Enterprise Corp., Soongsil University)
- Wonjun Jang* (Kakao Corp., Soongsil University)
- Won Ik Cho* (Seoul National University)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.