Negative sampling #113

Open
AugustLHHHHHH opened this issue Dec 7, 2023 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@AugustLHHHHHH

🐛 Bug description

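I have a question about this statement: "M3E is trained on sentence-pair datasets using contrastive learning with in-batch negative sampling. To make in-batch negative sampling effective, we use an A100 80G to maximize the batch size, and trained for 1 epoch on sentence-pair datasets totaling more than 22 million pairs." How exactly is the contrastive learning with negative sampling carried out, and which part of the code does it correspond to? Thank you.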

Python Version

None

@AugustLHHHHHH AugustLHHHHHH added the bug Something isn't working label Dec 7, 2023
@wangyuxinwhy
Owner

wangyuxinwhy commented Dec 8, 2023

https://github.com/wangyuxinwhy/uniem/blob/main/uniem/criteria.py

You can look at how the loss is computed there; the loss computation is what implements in-batch negative sampling.

@AugustLHHHHHH
Author

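Hello, thank you for your reply. I am not yet very familiar with in-batch sampling for contrastive learning, so I would like to ask a follow-up question:

In criteria.py, does the negative sampling for PairInBatchNegSoftmaxContrastLoss correspond to the steps on lines 69-76?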
image

@wangyuxinwhy
Owner

Yes, that's right.
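
For readers new to the idea, here is a minimal framework-free sketch of a softmax contrastive loss with in-batch negatives. It is not copied from criteria.py. The function name `in_batch_negative_loss`, the use of cosine similarity, the default temperature of 0.05, and the toy embeddings are all assumptions made for illustration. The point it shows: no negatives are sampled explicitly. For text `i`, its own pair `i` is the positive and every other pair's positive in the same batch serves as a negative, which is why a larger batch gives more negatives per example.

```python
import math


def in_batch_negative_loss(text_embs, pos_embs, temperature=0.05):
    """Softmax contrastive loss with in-batch negatives (illustrative sketch).

    text_embs[i] and pos_embs[i] form a positive pair; every pos_embs[j]
    with j != i is treated as a negative for text_embs[i].
    """

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)

    batch_size = len(text_embs)
    total = 0.0
    for i in range(batch_size):
        # Row i of the similarity matrix: text i against every candidate in the batch.
        logits = [
            cosine(text_embs[i], pos_embs[j]) / temperature
            for j in range(batch_size)
        ]
        # Numerically stable log-sum-exp over the row.
        row_max = max(logits)
        log_sum_exp = row_max + math.log(sum(math.exp(z - row_max) for z in logits))
        # Cross-entropy with target index i: the diagonal entry is the positive.
        total += log_sum_exp - logits[i]
    return total / batch_size


# Toy batch of 3 pairs with 2-dimensional embeddings (made-up numbers).
texts = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
# Same candidates, but rotated so no text lines up with its true positive.
shuffled = [aligned[1], aligned[2], aligned[0]]

loss_aligned = in_batch_negative_loss(texts, aligned)
loss_shuffled = in_batch_negative_loss(texts, shuffled)
```

In a tensor framework the same computation is usually written as one batch-by-batch similarity matrix divided by the temperature, followed by a cross-entropy loss whose labels are `0, 1, ..., batch_size - 1`. With a batch of one there are no negatives, so the loss in this sketch is zero.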
