Insights: huggingface/trl
Overview
6 Pull requests merged by 5 people
- adds AOT (#1701, merged Jun 12, 2024)
- ktotrainer: Refuse datasets which contain only one class of labels (#1724, merged Jun 11, 2024)
- feat(ci): add trufflehog secrets detection (#1721, merged Jun 10, 2024)
- Fix default padding_value in dpo_config.py (#1692, merged Jun 7, 2024)
- fix yaml parser for derived config classes (#1713, merged Jun 7, 2024)
- [RPO] fix nll loss (#1705, merged Jun 7, 2024)
5 Pull requests opened by 4 people
- Fix masking of response tokens (#1718, opened Jun 9, 2024)
- Adding SimPO to TRL (#1725, opened Jun 11, 2024)
- prepare deepspeed accomodate fp16 and bf16 (#1728, opened Jun 13, 2024)
- rloo trainer with trainer callbacks (#1729, opened Jun 13, 2024)
- allow ref model use ds stage3 only (#1730, opened Jun 13, 2024)
14 Issues closed by 7 people
- kto error when assign dataset to device (#1620, closed Jun 13, 2024)
- Using PEFT causes model to not predict EOS (#1578, closed Jun 12, 2024)
- KTO finetuning - float division by zero (#1651, closed Jun 11, 2024)
- How to use trl\trainer\kto_trainer.py (#1635, closed Jun 11, 2024)
- ImportError: cannot import name 'SFTConfig' from 'trl' (#1639, closed Jun 11, 2024)
- ValueError when training on a multi GPU setup and DPO (#1645, closed Jun 11, 2024)
- Feature Request: Simple Preference Optimization Integration (#1684, closed Jun 10, 2024)
- Why ratios don't need to be detached? (#1720, closed Jun 10, 2024)
- TRL orpo gives everything Nan (#1473, closed Jun 10, 2024)
- Possible risks in xxxPOTrainer (#1679, closed Jun 10, 2024)
- Training stops early (#1601, closed Jun 7, 2024)
- YamlConfigParser fails on RewardConfig, DPOConfig etc. (#1712, closed Jun 7, 2024)
- Discrepancy between reward model training tokenization and PPO tokenization in PPOTrainerv2 (#1702, closed Jun 7, 2024)
10 Issues opened by 10 people
- TRLParser needs changes, overwrites command line arguments with config (#1733, opened Jun 13, 2024)
- [BUG] RuntimeError: still have inflight params in KTO (#1732, opened Jun 13, 2024)
- Questions about the reference model in PPOTrainer and DPOTrainer (#1727, opened Jun 12, 2024)
- FSDP with PPO trainer won't work because FSDP doesn't support model.generate (#1726, opened Jun 12, 2024)
- FSDP Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float32 (#1723, opened Jun 11, 2024)
- KTO train_loss = 0.0 (#1722, opened Jun 10, 2024)
- Problem handling of response masks (#1717, opened Jun 9, 2024)
- trl CLI doesn't work (#1716, opened Jun 7, 2024)
- Feature Request: Self-Improving Robust Preference Optimization (SRPO) (#1714, opened Jun 7, 2024)
20 Unresolved conversations
Sometimes conversations happen on old items that aren't yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
- Added Reward Backpropogation Support (#1585, commented on Jun 12, 2024 • 10 new comments)
- Got an abnormally high loss when training Gemma-7B. (#1709, commented on Jun 8, 2024 • 3 new comments)
- DPO models generate multiple / corrupted responses (#1025, commented on Jun 12, 2024 • 3 new comments)
- Error when Using 8-bit Quantization (#1616, commented on Jun 11, 2024 • 3 new comments)
- [ORPO] Enable batched tokenization & multiprocessing to process large datasets (#1624, commented on Jun 9, 2024 • 1 new comment)
- Integrate f-divergence to DPO (Follow up) (#1610, commented on Jun 11, 2024 • 1 new comment)
- Minimal examples (#1603, commented on Jun 9, 2024 • 1 new comment)
- A pull request for POVIDTrainer (#1573, commented on Jun 11, 2024 • 1 new comment)
- How to save and resume a checkpoint from PPOTrainer (#1643, commented on Jun 13, 2024 • 1 new comment)
- Do we need to consider the chat template when doing DPO/KTO training? (#1640, commented on Jun 12, 2024 • 1 new comment)
- how to save v_head (#1650, commented on Jun 12, 2024 • 1 new comment)
- CLI utils class cases seem to be incorrect (#1600, commented on Jun 11, 2024 • 1 new comment)
- SFTrainer with FSDP on a model that doens't fit in GPU memory (#1681, commented on Jun 11, 2024 • 1 new comment)
- FineTuning issue with Gemma-2B-IT model using the SFTTrainer (#1665, commented on Jun 11, 2024 • 1 new comment)
- Use `SFTTrainer` for completion-only model without `DataCollatorForCompletionOnlyLM` (#1507, commented on Jun 11, 2024 • 1 new comment)
- stf Example not working (#1693, commented on Jun 10, 2024 • 1 new comment)
- RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cpu! (values = values * mask) (#1691, commented on Jun 10, 2024 • 1 new comment)
- Seq2seq model with ppo_trainer samples strange output! (#1633, commented on Jun 8, 2024 • 1 new comment)
- [DRAFT] Vllm integration (#1628, commented on Jun 7, 2024 • 0 new comments)
- Why compute IPO loss using `average_log_prob=Ture`? (#1677, commented on Jun 7, 2024 • 0 new comments)