representation-based duplicated question identification

lixinsu/DSSM

basic text matching model

representation-based method

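A basic text matching model for duplicate question detection. The model uses embeddings at both the word level and the character level, encodes each of the two texts with an LSTM, concatenates the resulting representations into a single hidden vector, and performs binary classification on it.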

features

  • pretrained word and character embeddings
  • combines word-level and char-level matching signals
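
The architecture described above can be sketched roughly as follows. This is a minimal illustration, assuming PyTorch; the class name SiameseLSTMMatcher, the layer sizes, and the choice to share one encoder between the two questions are all assumptions for the example and are not taken from this repository's code. Pretrained embeddings are replaced by randomly initialized ones to keep the sketch self-contained.

```python
import torch
import torch.nn as nn


class SiameseLSTMMatcher(nn.Module):
    """Hypothetical sketch of a representation-based matcher (not the repo's class)."""

    def __init__(self, word_vocab, char_vocab, emb_dim=32, hidden=64):
        super().__init__()
        # Randomly initialized here; pretrained vectors could be copied in instead.
        self.word_emb = nn.Embedding(word_vocab, emb_dim, padding_idx=0)
        self.char_emb = nn.Embedding(char_vocab, emb_dim, padding_idx=0)
        # One LSTM per granularity, shared by both questions (an assumption).
        self.word_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.char_lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        # Four encoded vectors are concatenated, then mapped to two classes.
        self.classifier = nn.Linear(4 * hidden, 2)

    @staticmethod
    def _encode(emb, lstm, ids):
        # h_n has shape (num_layers, batch, hidden); take the last layer.
        # Padding is not masked here, so padded steps also feed the LSTM.
        _, (h_n, _) = lstm(emb(ids))
        return h_n[-1]

    def forward(self, q1_words, q1_chars, q2_words, q2_chars):
        parts = [
            self._encode(self.word_emb, self.word_lstm, q1_words),
            self._encode(self.word_emb, self.word_lstm, q2_words),
            self._encode(self.char_emb, self.char_lstm, q1_chars),
            self._encode(self.char_emb, self.char_lstm, q2_chars),
        ]
        # Logits for "not duplicate" / "duplicate".
        return self.classifier(torch.cat(parts, dim=-1))
```

Each input is a batch of integer token ids of shape (batch, sequence_length), and the output has shape (batch, 2). Encoding the two questions independently and only combining them at the end is what makes the method representation-based.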

training

  • clone the repository and run sh scripts/setup.sh
  • cd data/atec and download the atec data into it
  • split the original CSV file into train.csv, dev.csv and test.csv
  • run sh scripts/train.sh
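
The splitting step above is left to the user. One possible way to do it with the Python standard library is sketched below. The 80/10/10 ratio, the fixed seed, the absence of a header row, and the helper names split_rows and split_csv are assumptions for illustration; adjust the delimiter if the downloaded file is not comma-separated.

```python
import csv
import os
import random


def split_rows(rows, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle rows deterministically and cut them into train/dev/test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_dev = int(len(rows) * dev_frac)
    n_test = int(len(rows) * test_frac)
    dev = rows[:n_dev]
    test = rows[n_dev:n_dev + n_test]
    train = rows[n_dev + n_test:]
    return train, dev, test


def split_csv(src, out_dir=".", delimiter=",", **kwargs):
    """Read src (assumed to have no header row) and write the three split files."""
    with open(src, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f, delimiter=delimiter))
    parts = split_rows(rows, **kwargs)
    for name, part in zip(("train", "dev", "test"), parts):
        path = os.path.join(out_dir, name + ".csv")
        with open(path, "w", newline="", encoding="utf-8") as f:
            csv.writer(f, delimiter=delimiter).writerows(part)
```

Calling split_csv with the path of the downloaded file and out_dir set to data/atec would write train.csv, dev.csv and test.csv next to it.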
