Insights: huggingface/transformers
Overview
3 Releases published by 1 person
-
v4.43.0: Llama 3.1, Chameleon, ZoeDepth, Hiera
published
Jul 23, 2024 -
v4.43.1: Patch release
published
Jul 23, 2024 -
v4.43.2: Patch release
published
Jul 24, 2024
75 Pull requests merged by 44 people
-
Flash-Attn: fix generation when no attention mask or no padding
#32241 merged
Jul 26, 2024 -
[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2`
#32039 merged
Jul 26, 2024 -
Add check for `target_sizes is None` in `post_process_image_guided_detection` for owlv2
#31934 merged
Jul 26, 2024 -
Adds: extra_repr for RMSNorm layers in most models
#32204 merged
Jul 26, 2024
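For context, `extra_repr` is the standard `torch.nn.Module` hook for adding detail to a module's printed representation. A minimal sketch of the idea behind #32204, using a generic RMSNorm (the exact fields shown per model may differ):

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    """Generic RMSNorm, used only to illustrate extra_repr."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        return self.weight * hidden_states * torch.rsqrt(variance + self.variance_epsilon)

    def extra_repr(self) -> str:
        # Rendered inside print(model), e.g. "RMSNorm((4096,), eps=1e-06)"
        return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"

print(RMSNorm(4096))  # RMSNorm((4096,), eps=1e-06)
```
-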
Refactor: Removed unnecessary `object` base class
#32230 merged
Jul 26, 2024 -
don't log base model architecture in wandb if log model is false
#32143 merged
Jul 26, 2024 -
Resize embeds with DeepSpeed
#32214 merged
Jul 26, 2024 -
Llava: generate without images
#32183 merged
Jul 26, 2024 -
Generation: stop at `eos` for assisted decoding
#31301 merged
Jul 26, 2024 -
Fix code snippet for Grounding DINO
#32229 merged
Jul 25, 2024 -
translate philosophy.md to chinese
#32177 merged
Jul 25, 2024 -
Follow up for #31973
#32025 merged
Jul 25, 2024 -
[warnings] fix E721 warnings
#32223 merged
Jul 25, 2024
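For reference, E721 is the pycodestyle/ruff rule that flags comparing types with `==`. The usual fix pattern, sketched:

```python
a, b = 1, 2.0

# Flagged by E721: equality comparison of types.
if type(a) == type(b):
    pass

# Preferred for an exact type check:
if type(a) is type(b):
    pass

# Or, when subclasses should also match:
if isinstance(a, type(b)):
    pass
```
-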
[BigBird Pegasus] set _supports_param_buffer_assignment to False
#32222 merged
Jul 25, 2024 -
Update question_answering.py
#32208 merged
Jul 25, 2024 -
remove unnecessary guard code related to pytorch versions 1.4.2 ~ 1.7.0
#32210 merged
Jul 25, 2024 -
[whisper] fix short-form output type
#32178 merged
Jul 25, 2024 -
fix: Replaced deprecated unittest method with the correct one
#32198 merged
Jul 24, 2024
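The PR title doesn't name the method here; a representative case is the long-deprecated `assertEquals` alias, removed in Python 3.12:

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_sum(self):
        # Deprecated alias, removed in Python 3.12:
        #   self.assertEquals(1 + 1, 2)
        # Current spelling:
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    unittest.main()
```
-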
🚨 No more default chat templates
#31733 merged
Jul 24, 2024
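In practice this means tokenizers that never had a `chat_template` set no longer fall back to a class-level default. A hedged sketch of the new behavior (the exact error type may vary):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # ships without a chat_template
messages = [{"role": "user", "content": "Hi"}]

try:
    tok.apply_chat_template(messages, tokenize=False)
except ValueError:
    # Post-#31733, a template must be set explicitly instead of relying
    # on a silent class default. Toy Jinja template for illustration:
    tok.chat_template = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
    print(tok.apply_chat_template(messages, tokenize=False))
```
-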
Support dequantizing GGUF FP16 format
#31783 merged
Jul 24, 2024 -
Fix float8_e4m3fn in modeling_utils
#32193 merged
Jul 24, 2024
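This pairs with issue #32185 below ("module 'torch' has no attribute 'float8_e4m3fn'"): the fp8 dtypes only exist in newer PyTorch builds, so lookups need guarding. A sketch in the spirit of the fix, not the literal patch:

```python
import torch

# torch.float8_e4m3fn exists only in recent PyTorch (>= 2.1);
# a guarded lookup keeps older versions working:
fp8_e4m3 = getattr(torch, "float8_e4m3fn", None)
if fp8_e4m3 is not None:
    x = torch.zeros(4, dtype=fp8_e4m3)
```
-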
Fix resize embedding with Deepspeed
#32192 merged
Jul 24, 2024 -
let's not warn when someone is running a forward
#32176 merged
Jul 24, 2024 -
RoPE: relaxed rope validation
#32182 merged
Jul 24, 2024 -
Remove conversational pipeline tests
#32099 merged
Jul 24, 2024 -
Update qwen2.md
#32108 merged
Jul 24, 2024 -
fix: default value reflects the runtime environment variables rather than the ones present at import time.
#32153 merged
Jul 24, 2024 -
adds: extra_repr() to MambaRMSNorm to include hidden size / size of weights in the layer
#32171 merged
Jul 24, 2024 -
[docs] change temperature to a positive value
#32077 merged
Jul 23, 2024 -
fix: Fixed an if condition that always evaluates to true
#32160 merged
Jul 23, 2024 -
fix
#32162 merged
Jul 23, 2024 -
Updated `ruff` to the latest version
#31926 merged
Jul 23, 2024 -
Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs
#31629 merged
Jul 23, 2024 -
Added additional kwarg for successful running of optuna hyperparameter search
#31924 merged
Jul 23, 2024 -
feat(cache): StaticCache uses index_copy_ to avoid useless copy
#31857 merged
Jul 23, 2024
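The idea: a static cache is preallocated to its maximum length, so new key/value states can be scattered in place with a single `index_copy_` rather than advanced-indexing assignment. An illustrative sketch (shapes and names are made up, not the actual StaticCache code):

```python
import torch

# Preallocated cache: (batch, heads, max_seq_len, head_dim)
cache = torch.zeros(1, 8, 1024, 64)
new_states = torch.randn(1, 8, 3, 64)    # 3 freshly computed positions
positions = torch.tensor([5, 6, 7])      # their slots in the cache

# In-place scatter along the sequence dimension (dim=2).
cache.index_copy_(2, positions, new_states)
# Equivalent to cache[:, :, positions] = new_states, but avoids the
# extra copies advanced indexing can introduce in compiled graphs.
```
-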
Fix typing to be compatible with later py versions
#32155 merged
Jul 23, 2024 -
Revert "Incorrect Whisper long-form decoding timestamps "
#32148 merged
Jul 23, 2024 -
Rename Phi-3 rope scaling type
#31436 merged
Jul 23, 2024 -
Added mamba.py backend
#30139 merged
Jul 23, 2024 -
Fix video batching to videollava
#32139 merged
Jul 23, 2024 -
Fix flash attention speed issue
#32028 merged
Jul 23, 2024 -
gguf conversion add_prefix_space=None for llama3
#31937 merged
Jul 23, 2024 -
Llama: RoPE refactor
#32135 merged
Jul 23, 2024 -
Modify resize_token_embeddings to ensure output type is same as input
#31979 merged
Jul 23, 2024 -
Disable quick init for TapasPreTrainedModel
#32149 merged
Jul 23, 2024 -
Add YaRN and Dynamic-YaRN RoPE Scaling Methods
#30910 merged
Jul 23, 2024 -
Add method to retrieve used chat template
#32032 merged
Jul 23, 2024
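Assuming the method landed as `get_chat_template` (the name is inferred from the PR title, so treat this as a sketch), usage would look like:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
# Returns the Jinja template string apply_chat_template would render:
template = tok.get_chat_template()
print(template[:80])
```
-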
Fix mask creations of `GPTNeoX` and `GPT2`
#31944 merged
Jul 23, 2024 -
[modelling] remove unnecessary transpose for fa2 attention
#31749 merged
Jul 23, 2024 -
Remove `trust_remote_code` when loading Libri Dummy
#31748 merged
Jul 23, 2024 -
LLaVaNeXT: pad on right if training
#32134 merged
Jul 23, 2024 -
Add llama3-llava-next-8b to llava_next conversion script
#31395 merged
Jul 23, 2024 -
Add new quant method
#32047 merged
Jul 22, 2024 -
set warning level to info when special tokens have been added
#32138 merged
Jul 22, 2024 -
Don't default to other weights file when use_safetensors=True
#31874 merged
Jul 22, 2024 -
Return assistant generated tokens mask in apply_chat_template
#30650 merged
Jul 22, 2024
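A hedged sketch of the new opt-in flag, assuming it landed as `return_assistant_tokens_mask` with the mask returned under `assistant_masks`; it needs a chat template that wraps assistant turns in `{% generation %} ... {% endgeneration %}` markers:

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; a real run needs a tokenizer whose chat
# template marks assistant spans with {% generation %} markers.
tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
out = tok.apply_chat_template(
    messages,
    return_assistant_tokens_mask=True,  # opt-in flag from this PR
    return_dict=True,
)
# 1 for tokens inside assistant turns, 0 elsewhere; templates without
# the {% generation %} markers cannot produce a meaningful mask.
mask = out["assistant_masks"]
```
-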
[RoBERTa] Minor clarifications to model doc
#31949 merged
Jul 22, 2024 -
fix: Fixed raising `TypeError` instead of `ValueError` for invalid type
#32111 merged
Jul 22, 2024 -
Update `ko/_toctree.yml` and remove `custom_tools.md` to reflect latest changes
#31969 merged
Jul 22, 2024 -
Fix failing test with race condition
#32140 merged
Jul 22, 2024 -
[generate] fix eos/pad id check on mps devices
#31695 merged
Jul 22, 2024 -
Mention model_info.id instead of model_info.modelId
#32106 merged
Jul 22, 2024 -
fix: Replaced deprecated `mktemp()` function
#32123 merged
Jul 22, 2024
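For background, `tempfile.mktemp()` is deprecated because it only returns a candidate name, leaving a window for another process to claim the path first. The safe replacements create the file atomically; a sketch:

```python
import os
import tempfile

# Racy (deprecated): the path exists only as a name until you open it.
#   path = tempfile.mktemp(suffix=".json")

# Safe: the file is created atomically and handed back already open.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    f.write("{}")
os.unlink(path)
```
-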
Generate: store special token tensors under a unique variable name
#31980 merged
Jul 22, 2024 -
Fix shard order
#32023 merged
Jul 22, 2024 -
Agents planning
#31702 merged
Jul 22, 2024 -
Fix tests after `huggingface_hub` 0.24
#32054 merged
Jul 19, 2024 -
Chameleon: not supported with fast load
#32091 merged
Jul 19, 2024 -
Disable quick init for deepspeed
#32066 merged
Jul 19, 2024 -
Support generating with fallback for short form audio in Whisper
#30984 merged
Jul 19, 2024 -
Add image-text-to-text task guide
#31777 merged
Jul 19, 2024 -
Fixes to chameleon docs
#32078 merged
Jul 19, 2024 -
Fix progress callback deepcopy
#32070 merged
Jul 19, 2024 -
VideoLLaVa: fix chat format in docs
#32083 merged
Jul 19, 2024
41 Pull requests opened by 35 people
-
added warning to Trainer when label_names is not specified for PeftModel
#32085 opened
Jul 19, 2024 -
fix: SeamlessM4TFeatureExtractor stride remainder
#32088 opened
Jul 19, 2024 -
Add sdpa support for Albert
#32092 opened
Jul 19, 2024 -
Fix `.push_to_hub(..., create_pr=True, revision="my-branch")` when creating PR on not-owned repo
#32094 opened
Jul 19, 2024 -
Custom beam search scorer argument in generate function
#32097 opened
Jul 19, 2024 -
Create django.yml
#32100 opened
Jul 19, 2024 -
[wip][meta-llama][torch.compile] Fix issues with torch.compile
#32102 opened
Jul 19, 2024 -
support 3D attention mask in bert
#32105 opened
Jul 20, 2024 -
fix: resolve bug with `use_mps_device` setting not taking effect
#32114 opened
Jul 20, 2024 -
Update stateful_callbacks state before saving checkpoint
#32115 opened
Jul 20, 2024 -
Update benchmark.py-- Enhance Benchmarking with Multi-Commit Support …
#32116 opened
Jul 21, 2024 -
[WIP] Add Depth Anything V2 Metric models
#32126 opened
Jul 21, 2024 -
DINOv2 register support
#32127 opened
Jul 22, 2024 -
[whisper] alternative fix for long-form timestamps
#32131 opened
Jul 22, 2024 -
Add Qwen2-Audio
#32137 opened
Jul 22, 2024 -
add scaling_factor to GemmaRotaryEmbedding for fix error in GemmaLine…
#32141 opened
Jul 22, 2024 -
Add stream messages from agent run for gradio chatbot
#32142 opened
Jul 22, 2024 -
Cache: create docs
#32150 opened
Jul 23, 2024 -
[build-ci-image] add tiktoken
#32152 opened
Jul 23, 2024 -
Added error when sequence length is bigger than max_position_embeddings
#32156 opened
Jul 23, 2024 -
Test
#32158 opened
Jul 23, 2024 -
support copies
#32159 opened
Jul 23, 2024 -
Add a static cache that offloads to the CPU or other device
#32161 opened
Jul 23, 2024 -
Fixed Hybrid Cache Shape Initialization.
#32163 opened
Jul 23, 2024 -
Make static cache compatible with torch.export
#32168 opened
Jul 23, 2024 -
Uniformize kwargs for Layoutlm (2, 3, X) processors
#32180 opened
Jul 24, 2024 -
Uniformize kwargs for chameleon processor
#32181 opened
Jul 24, 2024 -
Gemma2 and flash-attention
#32188 opened
Jul 24, 2024 -
[WIP] - Enable speculative decoding with batch size >1
#32189 opened
Jul 24, 2024 -
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process
#32191 opened
Jul 24, 2024 -
warning about weight_g/weight_v missing on WeightNorm on PyTorch
#32194 opened
Jul 24, 2024 -
Whisper tokenizer word level timestamps
#32197 opened
Jul 24, 2024 -
Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/visual_bert
#32220 opened
Jul 25, 2024 -
Fix attention propagation for vision towers of llava-like models
#32221 opened
Jul 25, 2024 -
>3-5x faster torch.compile forward compilation for autoregressive decoder models
#32227 opened
Jul 25, 2024 -
[WIP] Add support for XTR
#32231 opened
Jul 25, 2024 -
fix: warmup_steps check for training_args
#32236 opened
Jul 26, 2024 -
VLMs: dispatch sdpa to each sub model
#32238 opened
Jul 26, 2024 -
🌐 [i18n-KO] Translated `image_feature_extraction.md` to Korean
#32239 opened
Jul 26, 2024 -
#32184 save total_vocab_size
#32240 opened
Jul 26, 2024
61 Issues closed by 23 people
-
Idefics2 generation erroring with flash_attention_2
#32237 closed
Jul 26, 2024 -
`target_sizes` in Owlv2 `post_process_image_guided_detection`
#31915 closed
Jul 26, 2024 -
Transformers 4.36 use_cache issue
#28056 closed
Jul 26, 2024 -
Model saving when using Trainer with Accelerate
#29792 closed
Jul 26, 2024 -
Running out of memory while fine-tuning and running inference with VideoMAE, causing the script to be killed.
#30939 closed
Jul 26, 2024 -
Can't create transformer pipeline because pytorch failed to be detected
#31454 closed
Jul 26, 2024 -
ValueError: too many values to unpack (expected 2) when use glm4v or cogvlm2
#32226 closed
Jul 25, 2024 -
[Whisper] Inconsistent return types for Whisper generation
#32202 closed
Jul 25, 2024 -
🐛 `attn_implementation="sdpa"` slower than `BetterTransformer.transform`?
#31245 closed
Jul 25, 2024 -
pipeline gives a different result than the other approach in predicting word probability
#31995 closed
Jul 25, 2024 -
NaNs when running bitsandbytes quantized Chameleon
#32174 closed
Jul 25, 2024 -
Metrics - Pipeline
#32190 closed
Jul 24, 2024 -
The features calculated by transformers dinov2 are different from the official ones
#32175 closed
Jul 24, 2024 -
AttributeError: module 'torch' has no attribute 'float8_e4m3fn'
#32185 closed
Jul 24, 2024 -
Llama 3 - RuntimeError: shape '[-1, 0]' is invalid for input of size 41041920
#32170 closed
Jul 24, 2024 -
Backwards compatibility broken for RoPE: "rope_type"
#32166 closed
Jul 24, 2024 -
KeyError: 'rope_type'
#32167 closed
Jul 24, 2024 -
cannot import name 'Conversation' from 'transformers'
#32096 closed
Jul 24, 2024 -
callback to implement how the predictions should be stored
#32186 closed
Jul 24, 2024 -
Using `AutoTokenizer.from_pretrained`'s `.encode()` function fails to add BOS token for new Llama-3.1 model
#32172 closed
Jul 24, 2024 -
phi-3's LlamaTokenizer ignores newline character.
#32136 closed
Jul 24, 2024 -
Why can MPS never be used successfully?
#32035 closed
Jul 24, 2024 -
Unable to load wavlm-large from pretrained in offline mode
#32147 closed
Jul 23, 2024 -
Extra dataset features not passing to the custom collator
#32093 closed
Jul 23, 2024 -
Allow additional keyword args to be passed to optuna hyperparameter search
#31923 closed
Jul 23, 2024 -
The behavior of the tokenizer loaded from GGUF file is incorrect.
#31630 closed
Jul 23, 2024 -
Table question answering pipeline failing to save
#32128 closed
Jul 23, 2024 -
Very different output depending on whether an attention mask is passed when using caching
#31943 closed
Jul 23, 2024 -
AttributeError: 'BertModel' object has no attribute 'attn_implementation'
#30965 closed
Jul 23, 2024 -
MultiScaleDeformableAttentionFunction different results on different devices
#31399 closed
Jul 23, 2024 -
LlavaNextVideo always assumes left padding when batch size is 1
#32112 closed
Jul 23, 2024 -
Add llama3-llava-next-8b to convert_llava_next_weights_to_hf.py
#31394 closed
Jul 23, 2024 -
"use_safetensors" not enforced with "local_files_only", loads bin file
#31649 closed
Jul 22, 2024 -
Source link to `LlamaForSequenceClassification` seems broken, if so, update it.
#31531 closed
Jul 22, 2024 -
Licence
#32104 closed
Jul 22, 2024 -
Gemma-7b model: set my own lm_head but it cannot be saved, and it changed the pretrained embedding_layer's weights too.
#31467 closed
Jul 22, 2024 -
Metadata
#32121 closed
Jul 22, 2024 -
When training RWKV, it reports a "backward error"
#31413 closed
Jul 22, 2024 -
Transformer compatibility Python 3.9, ComfyUI
#31806 closed
Jul 22, 2024 -
Model load when dtypes match is broken
#32089 closed
Jul 22, 2024 -
unexpected keyword argument 'torch_empty_cache_steps' in TrainingArguments
#32071 closed
Jul 22, 2024 -
HuggingFace `from_pretrained()` called multiple times when launching with `torchrun`
#29843 closed
Jul 21, 2024 -
recent version of Transformers seems to mess with forward/__call__. Breaks patching loss function
#30753 closed
Jul 21, 2024 -
Trained tokenizer has broken encoding for cyrillic
#30937 closed
Jul 21, 2024 -
sdpa for bert causes nan output when mixed precision is enabled.
#31038 closed
Jul 21, 2024 -
Using a single 'RecurrentGemmaRglru' layer - "Trying to backward through the graph a second time" Error
#31324 closed
Jul 21, 2024 -
https://github.com/VikParuchuri/surya can not convert model to onnx
#31384 closed
Jul 21, 2024 -
past_key_values for SeamlessM4Tv2ForSpeechToText is not working as expected
#29139 closed
Jul 20, 2024 -
`num_input_tokens_seen` included the `pad` tokens if sample padding strategy used
#29889 closed
Jul 20, 2024 -
stop_strings does not work at all
#31363 closed
Jul 20, 2024 -
loading AutoTokenizer.from_pretrained with gated model
#31367 closed
Jul 20, 2024 -
TypeError: Block.forward() got an unexpected keyword argument 'past_key_value'
#31371 closed
Jul 20, 2024 -
linear_sum_assignment error in the object_detection.py guide
#31461 closed
Jul 20, 2024 -
Support generating with fallback for short form audio in Whisper
#29508 closed
Jul 19, 2024 -
Can you please provide:
#32087 closed
Jul 19, 2024 -
The ProgressCallback triggers a `cannot pickle '_thread.lock' object` failure
#32064 closed
Jul 19, 2024
48 Issues opened by 45 people
-
A question about code on Mistral-7B attention
#32235 opened
Jul 26, 2024 -
LLaVA cannot use beam search after 4.43.0
#32234 opened
Jul 26, 2024 -
SinkCache with Qwen1.5 broken in 4.43.0+
#32233 opened
Jul 25, 2024 -
can't load the llama-3.1-8b-instruct model
#32232 opened
Jul 25, 2024 -
[Whisper] Attention mask not detected in `Whisper.generate()`
#32228 opened
Jul 25, 2024 -
Add New Optimizer
#32225 opened
Jul 25, 2024 -
`BarkModel` can't be saved anymore
#32224 opened
Jul 25, 2024 -
flashattention3
#32219 opened
Jul 25, 2024 -
Parallel inference on generative models throws an exception
#32217 opened
Jul 25, 2024 -
auto_find_batch_size for OOM during evaluation
#32215 opened
Jul 25, 2024 -
Chat Assistant Prefill
#32213 opened
Jul 25, 2024 -
Error when running chatglm3: 'GenerationConfig' object has no attribute '_eos_token_tensor'
#32207 opened
Jul 25, 2024 -
Does GroundingDINO support batched inference?
#32206 opened
Jul 25, 2024 -
Broken accuracy on LLaMa 3.1 70B -- worse than even 8B
#32205 opened
Jul 24, 2024 -
Cannot build documentation on Mac OS
#32203 opened
Jul 24, 2024 -
Load Phi 3 small on Nvidia Tesla V100 - Flash Attention
#32201 opened
Jul 24, 2024 -
Support `from_pretrained` of `FlaxPretrainedModel` from sharded `.safetensors` weights
#32200 opened
Jul 24, 2024 -
Model loading is uneven on GPUs with AutoModelForCausalLM
#32199 opened
Jul 24, 2024 -
Error occurs in resize_embedding
#32196 opened
Jul 24, 2024 -
"inverted" form required for 4D masking not defined / 4D attention masks breaks with transformers >=4.40
#32195 opened
Jul 24, 2024 -
Error occurred while running _compute_llama3_parameters in modeling_rope_utils.py with torch.device('meta').
#32187 opened
Jul 24, 2024 -
DataCollatorForLanguageModeling is (unnecessarily) slow
#32184 opened
Jul 24, 2024 -
Static KV cache with CPU offloading
#32179 opened
Jul 24, 2024 -
`dataloader_prefetch_factor` is left unused for datasets of type `IterableDataset`
#32169 opened
Jul 23, 2024 -
Enable speculative decoding with batch size >1
#32165 opened
Jul 23, 2024 -
Add Matching Anything by Segmenting Anything (MASA) MOT tracking model
#32164 opened
Jul 23, 2024 -
Adding warnings or errors when provided sequence length is bigger than config.max_position_embeddings
#32154 opened
Jul 23, 2024 -
[i18n-<languageCode>] Translating docs to <languageName>
#32146 opened
Jul 22, 2024 -
callback to implement how the predictions should be stored.
#32145 opened
Jul 22, 2024 -
Wav2Vec2ProcessorWithLM doesn't handle unknown token well for BPE
#32132 opened
Jul 22, 2024 -
Does apply_chat_template support function call usage?
#32130 opened
Jul 22, 2024 -
No module named 'transformers.modeling_flash_attention_utils'
#32129 opened
Jul 22, 2024 -
TF Lite model created from TFWhisperForConditionalGeneration.from_pretrained crashes
#32125 opened
Jul 21, 2024 -
Output from model.generate & model.forward not the same when output attention/hidden_state is True
#32117 opened
Jul 21, 2024 -
_prepare_4d_causal_attention_mask mask inversion should work with boolean masks
#32113 opened
Jul 20, 2024 -
Gemma template won't end with eos_token
#32110 opened
Jul 20, 2024 -
Using Trainer + a pretrained tokenizer + 4D attention mask is extremely slow
#32101 opened
Jul 19, 2024 -
Unrecognized configuration class ChameleonConfig
#32098 opened
Jul 19, 2024 -
max_length calculation for padding the generation outputs in the Seq2SeqTrainer prediction_step function
#32095 opened
Jul 19, 2024 -
[Error] with Trainer: TypeError: Unsupported types (<class 'NoneType'>) passed to `_gpu_broadcast_one`.
#32090 opened
Jul 19, 2024 -
The implementations of `LlamaAttention` and `LlamaSdpaAttention` are not equivalent.
#32086 opened
Jul 19, 2024 -
Training multiple adapters
#32084 opened
Jul 19, 2024
189 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[WIP] - Add Microsoft CLAP model
#31929 commented on
Jul 25, 2024 • 52 new comments -
Add DAB-DETR Object detection/segmentation model
#30803 commented on
Jul 24, 2024 • 48 new comments -
Improve support for image generation with Chameleon & Anole
#32013 commented on
Jul 24, 2024 • 43 new comments -
Add GLM-4 and Later GLM Model (Draft)
#31977 commented on
Jul 26, 2024 • 41 new comments -
Support Kosmos-2.5
#31711 commented on
Jul 25, 2024 • 36 new comments -
[WIP] Add OmDet-Turbo
#31843 commented on
Jul 23, 2024 • 35 new comments -
Uniform kwargs for processors + Docs update - GroundingDINO
#31964 commented on
Jul 25, 2024 • 31 new comments -
Import structure & first three model refactors
#31329 commented on
Jul 25, 2024 • 25 new comments -
[GroundingDino] Fix grounding dino loss 🚨
#31828 commented on
Jul 23, 2024 • 24 new comments -
🚨 Bloom support for cache class
#31445 commented on
Jul 26, 2024 • 22 new comments -
Support reading tiktoken tokenizer.model file
#31656 commented on
Jul 23, 2024 • 18 new comments -
RoPE: model-agnostic RoPE refactor
#31999 commented on
Jul 22, 2024 • 13 new comments -
Add codestral mamba2
#32080 commented on
Jul 26, 2024 • 12 new comments -
Add Nemotron HF Support
#31699 commented on
Jul 26, 2024 • 11 new comments -
Offloaded KV Cache
#31325 commented on
Jul 25, 2024 • 8 new comments -
Granite language models
#31502 commented on
Jul 24, 2024 • 8 new comments -
Implement SuperGlue model
#29886 commented on
Jul 20, 2024 • 7 new comments -
Add Descript-Audio-Codec model
#31494 commented on
Jul 26, 2024 • 7 new comments -
Add DINOv2 with registers
#31832 commented on
Jul 23, 2024 • 5 new comments -
_is_peft_model update to recognise peft submodules, allowing training quantised models with peft submodules
#30884 commented on
Jul 22, 2024 • 3 new comments -
Interpolate clip
#31900 commented on
Jul 22, 2024 • 3 new comments -
[pipeline] fix padding for 1-d tensors
#31776 commented on
Jul 25, 2024 • 3 new comments -
[whisper] compile compatibility with long-form decoding
#31772 commented on
Jul 25, 2024 • 3 new comments -
clean_up_tokenization_spaces=False if unset
#31938 commented on
Jul 25, 2024 • 2 new comments -
Add ViTPose
#30530 commented on
Jul 25, 2024 • 2 new comments -
Cache: new Cache format in decoder-only models
#31421 commented on
Jul 26, 2024 • 2 new comments -
Fix conflicting key in init kwargs in PreTrainedTokenizerBase
#31233 commented on
Jul 23, 2024 • 2 new comments -
Implement MambaForSequenceClassification
#31155 commented on
Jul 23, 2024 • 2 new comments -
Add Flax Dinov2
#31960 commented on
Jul 26, 2024 • 1 new comment -
SPLIT PR: eos bos tokens
#31316 commented on
Jul 22, 2024 • 1 new comment -
[RoBERTa-based] Add support for sdpa
#30510 commented on
Jul 22, 2024 • 1 new comment -
Add Zamba
#30950 commented on
Jul 24, 2024 • 0 new comments -
Add IRIS
#30883 commented on
Jul 25, 2024 • 0 new comments -
fix: multilingual model converted to tflite gets wrong token
#32079 commented on
Jul 19, 2024 • 0 new comments -
fixes clip interpolate
#30783 commented on
Jul 20, 2024 • 0 new comments -
pipeline 'text-classification' in >=4.40.0 throwing TypeError: Got unsupported ScalarType BFloat16
#30542 commented on
Jul 21, 2024 • 0 new comments -
filter flash_attn optional imports loading remote code
#30954 commented on
Jul 22, 2024 • 0 new comments -
[WIP] enable cpu bnb path
#31098 commented on
Jul 22, 2024 • 0 new comments -
Fix perceiver latent initialization modeling_idefics2.py
#31151 commented on
Jul 26, 2024 • 0 new comments -
[WIP] Add Tokenizer for MyT5 Model
#31286 commented on
Jul 22, 2024 • 0 new comments -
Reducing memory usage: removing useless logits computation in generate()
#31292 commented on
Jul 24, 2024 • 0 new comments -
fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to `generate` function
#31296 commented on
Jul 26, 2024 • 0 new comments -
Added HHCache class implementing H2O Cache
#31623 commented on
Jul 26, 2024 • 0 new comments -
The last unit test of the QDQBert model "test_inference_no_head_absolute_embedding" did not pass when using official safetensors
#31486 commented on
Jul 26, 2024 • 0 new comments -
"from_pretrained" read wrong config file. not "tokenizer_config.json", but "config.json"
#31282 commented on
Jul 26, 2024 • 0 new comments -
Error During Training with PatchTSMixerForTimeSeriesClassification for Time Series Classification
#30614 commented on
Jul 26, 2024 • 0 new comments -
Inconsistent module names (state_dict keys).
#30124 commented on
Jul 26, 2024 • 0 new comments -
Inconsistent special_token addition in EncoderDecoderModel forward pass
#31729 commented on
Jul 26, 2024 • 0 new comments -
Adding mixtral attention_bias in style of llama modeling
#28440 commented on
Jul 26, 2024 • 0 new comments -
CUDA RuntimeError: Unspecified Launch Failure during Training
#30913 commented on
Jul 26, 2024 • 0 new comments -
Weights of LlamaForQuestionAnswering were not initialized from the model checkpoint
#30381 commented on
Jul 26, 2024 • 0 new comments -
[WIP] Improve multimodal processors - rely less on kwargs
#28711 commented on
Jul 24, 2024 • 0 new comments -
🚨 Add Blip2ForImageTextRetrieval
#29261 commented on
Jul 24, 2024 • 0 new comments -
Fix from pretrained ignoring errors
#29959 commented on
Jul 26, 2024 • 0 new comments -
fix prompt tuning + deepspeed zero3 + checkpoint_saving hang issue
#29980 commented on
Jul 23, 2024 • 0 new comments -
schedulefree optimizers
#30079 commented on
Jul 23, 2024 • 0 new comments -
Add SDPA support for T5 Style Models
#30375 commented on
Jul 20, 2024 • 0 new comments -
Add trainer integration test for llava to ensure accelerate autocasting works correctly
#30489 commented on
Jul 19, 2024 • 0 new comments -
update based on tokenizers release
#30574 commented on
Jul 23, 2024 • 0 new comments -
Adding imagebind
#30690 commented on
Jul 23, 2024 • 0 new comments -
Remove device map for saving tokenizer config on TPU (fix for issue #31971)
#32043 commented on
Jul 19, 2024 • 0 new comments -
[`WIP`] Add Mamba2
#32027 commented on
Jul 20, 2024 • 0 new comments -
HFQuantizer implementation for compressed-tensors library
#31704 commented on
Jul 25, 2024 • 0 new comments -
[Demo][ExecuTorch] Lower and run native Gemma e2e in ExecuTorch
#31706 commented on
Jul 24, 2024 • 0 new comments -
Update kwargs validation for `preprocess` with decorator
#32024 commented on
Jul 24, 2024 • 0 new comments -
[WIP] Agents use grammar
#31735 commented on
Jul 25, 2024 • 0 new comments -
[docs] Redesign
#31757 commented on
Jul 25, 2024 • 0 new comments -
Adding mplugdocowl
#31792 commented on
Jul 25, 2024 • 0 new comments -
chore: move `conftest.py` to `tests/`
#32011 commented on
Jul 25, 2024 • 0 new comments -
Enable whisper encoder to accept any chunk length
#31991 commented on
Jul 25, 2024 • 0 new comments -
Add support for GGUF Phi-3
#31844 commented on
Jul 22, 2024 • 0 new comments -
Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer
#31870 commented on
Jul 24, 2024 • 0 new comments -
Add SOLO: A Single Transformer for Scalable Vision-Language Modeling
#31918 commented on
Jul 20, 2024 • 0 new comments -
Make special image tokens attribute of tokenizer
#31967 commented on
Jul 21, 2024 • 0 new comments -
Added optimizer adam mini
#31933 commented on
Jul 25, 2024 • 0 new comments -
Fix an error of 'ValueError: mean must have 1 elements if it is an iterable, got 3' for method 'infer_channel_dimension_format' in 'image_utils.py'
#31950 commented on
Jul 20, 2024 • 0 new comments -
[feat] Apply rope_scaling for general use in phi3, llama
#31966 commented on
Jul 21, 2024 • 0 new comments -
make `p_mask` a numpy array before passing to `select_starts_ends`
#32076 commented on
Jul 26, 2024 • 0 new comments -
avoid padding for `num_frames` in `AutomaticSpeechRecognitionPipeline`
#32074 commented on
Jul 25, 2024 • 0 new comments -
Rest of model init refactors
#31330 commented on
Jul 25, 2024 • 0 new comments -
Uniformize model processors
#31368 commented on
Jul 24, 2024 • 0 new comments -
Add Cross-Attention to Bloom Model for VisionEncoderDecoder Compatibility
#31432 commented on
Jul 23, 2024 • 0 new comments -
[WIP] Standardize inputs and outputs for existing image-text-to-text models
#32059 commented on
Jul 25, 2024 • 0 new comments -
add changes in mistral model to avoid problems in pytorch hooks
#31463 commented on
Jul 19, 2024 • 0 new comments -
FIX / Hub: Also catch for `exceptions.ConnectionError`
#31469 commented on
Jul 22, 2024 • 0 new comments -
Update beam_constraints with KMP
#31482 commented on
Jul 19, 2024 • 0 new comments -
MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input.
#31500 commented on
Jul 25, 2024 • 0 new comments -
docs: ko: tasks/awq.md
#32057 commented on
Jul 25, 2024 • 0 new comments -
add bnb support for Ascend NPU
#31512 commented on
Jul 20, 2024 • 0 new comments -
Fix minimal version check for object_detection.md
#31520 commented on
Jul 21, 2024 • 0 new comments -
Sequence Length Invariant Text Models
#31521 commented on
Jul 21, 2024 • 0 new comments -
fix wav2vec2 with torch.compile
#31538 commented on
Jul 22, 2024 • 0 new comments -
handle when from_pretrained_id is a list
#31541 commented on
Jul 22, 2024 • 0 new comments -
Optimize 1st token for beam_search
#31564 commented on
Jul 24, 2024 • 0 new comments -
Allow infer_framework_load_model to use the originally specified config.
#31580 commented on
Jul 25, 2024 • 0 new comments -
activation_checkpointing error when using --fsdp
#28499 commented on
Jul 21, 2024 • 0 new comments -
Using batching with pipeline and transformers
#31641 commented on
Jul 22, 2024 • 0 new comments -
Uniform kwargs for processors
#31911 commented on
Jul 22, 2024 • 0 new comments -
Potential Bug in llava_next when calling pack_image_features function.
#31529 commented on
Jul 22, 2024 • 0 new comments -
Incorrect docstring of `get_anyres_image_grid_shape`
#31588 commented on
Jul 22, 2024 • 0 new comments -
Problem with the masked language modeling tutorial
#31545 commented on
Jul 22, 2024 • 0 new comments -
Nested from_pretrained() gives warnings loading weights - "copying from a non-meta parameter"
#31544 commented on
Jul 22, 2024 • 0 new comments -
GenerationConfig throws Object is not JSON serializable when setting constraints
#31070 commented on
Jul 22, 2024 • 0 new comments -
Mismatched tensor size error when generating text with beam_search on mps
#30662 commented on
Jul 22, 2024 • 0 new comments -
transformers offline model loading is not working from version 4.40.0 for models without safetensors
#30469 commented on
Jul 22, 2024 • 0 new comments -
torchrun breaks with load_model_at_end and with metric_for_best_model=eval_f1 on question_answering example
#30819 commented on
Jul 22, 2024 • 0 new comments -
Add support for Apple's DCLM-Baseline-7B model
#32000 commented on
Jul 22, 2024 • 0 new comments -
`MixtralFlashAttention2` subscripts `position_ids` before checking if it is `None`
#31326 commented on
Jul 22, 2024 • 0 new comments -
Dropout sync across GPUs causes major performance drops
#31412 commented on
Jul 22, 2024 • 0 new comments -
[BUG] Offline loading of non-safe tensors fails
#30920 commented on
Jul 22, 2024 • 0 new comments -
ddp_timeout in TrainingArguments with deepspeed doesn't take effect
#32036 commented on
Jul 22, 2024 • 0 new comments -
FineWeb SLM Training doesn't start
#31501 commented on
Jul 22, 2024 • 0 new comments -
Have `_is_peft_model` check if there's any peft submodule/Allow quantised training
#30878 commented on
Jul 22, 2024 • 0 new comments -
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (HF/Accelerate)
#31504 commented on
Jul 22, 2024 • 0 new comments -
Add Depth Anything v2 metric depth
#31972 commented on
Jul 22, 2024 • 0 new comments -
Add post_process_depth_estimation to image processors
#30917 commented on
Jul 22, 2024 • 0 new comments -
flash_attn ImportError breaking model loading (Florence-2-base-ft)
#31793 commented on
Jul 22, 2024 • 0 new comments -
Fixing Tensor Shape/Dimension Mismatch Errors in TimeSeries Transformer for Stock Price Prediction
#31556 commented on
Jul 24, 2024 • 0 new comments -
Batch is empty when fine-tuning flan-t5 using LoRA
#31357 commented on
Jul 19, 2024 • 0 new comments -
[Severe Bug] Performance Degradation Starting from v4.42.*
#31890 commented on
Jul 19, 2024 • 0 new comments -
Saved weights differ from the original model
#30543 commented on
Jul 19, 2024 • 0 new comments -
cannot use activation_checkpoint in torch native fsdp
#32073 commented on
Jul 19, 2024 • 0 new comments -
Open to contribution: adding `torch.nn.functional.scaled_dot_product_attention` support for more architectures
#28005 commented on
Jul 19, 2024 • 0 new comments
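For contributors picking this up, the core primitive is PyTorch's built-in fused attention. A minimal sketch of the call an sdpa attention class wraps (tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# (batch, num_heads, seq_len, head_dim): the layout transformers'
# sdpa attention classes use when calling the fused kernel.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# is_causal=True applies the autoregressive mask inside the kernel;
# alternatively pass an explicit attn_mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```
-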
ViTLayer.forward() needs to be in "eager" mode when `output_attentions=True`
#30978 commented on
Jul 19, 2024 • 0 new comments -
Add support for MiniCPM-V-2 and MiniCPM-Llama3-V-2_5
#31836 commented on
Jul 19, 2024 • 0 new comments -
More robust tests required for gradient checkpointing
#32063 commented on
Jul 19, 2024 • 0 new comments -
BertForSequenceClassification.from_pretrained broken when using FSDP
#32068 commented on
Jul 19, 2024 • 0 new comments -
Performance mismatch with best_epoch
#32075 commented on
Jul 19, 2024 • 0 new comments -
GPT-2 Model Logits and Loss are different on MPS
#32005 commented on
Jul 19, 2024 • 0 new comments -
How can I load and run inference on the same model across two GPUs when a single GPU runs out of memory?
#31508 commented on
Jul 20, 2024 • 0 new comments -
load qwen2-72b-instruct sft awq q4_0 gguf ValueError: Trying to set a tensor of shape torch.Size
#31507 commented on
Jul 20, 2024 • 0 new comments -
ImportError: cannot import name 'logging' from 'huggingface_hub'
#31492 commented on
Jul 20, 2024 • 0 new comments -
run_clm.py AttributeError: 'NoneType' object has no attribute 'get'
#31487 commented on
Jul 20, 2024 • 0 new comments -
RecurrentGemma Doesn't Support left padding?
#31201 commented on
Jul 20, 2024 • 0 new comments -
Unable to load starcoder2 finetuned version getting quantization errors
#29990 commented on
Jul 20, 2024 • 0 new comments -
Using accelerate launch FSDP causes weights saved from the 2nd time onwards to be incomplete
#31034 commented on
Jul 20, 2024 • 0 new comments -
Plans to Integrate LongRoPE into LLaMA?
#31992 commented on
Jul 20, 2024 • 0 new comments -
OOM when loading 300B models with `AutoModelForCausalLM.from_pretrained` and `BitsAndBytesConfig` quantization.
#31577 commented on
Jul 21, 2024 • 0 new comments -
from_pretrained loads checkpoints too slowly
#31515 commented on
Jul 21, 2024 • 0 new comments -
GenerationMixin sample() runs forever
#31484 commented on
Jul 21, 2024 • 0 new comments -
error when converting llama1 ckpts to hf format
#30723 commented on
Jul 21, 2024 • 0 new comments -
Make fx traced model with the use of `past_key_values` picklable again?
#30575 commented on
Jul 21, 2024 • 0 new comments -
bart-large-xsum model: There were missing keys in the checkpoint model loaded: ['model.encoder.embed_tokens.weight', 'model.decoder.embed_tokens.weight', 'lm_head.weight'].
#29128 commented on
Jul 24, 2024 • 0 new comments -
(False?) warning about weight_g/weight_v missing on WeightNorm on PyTorch
#26796 commented on
Jul 24, 2024 • 0 new comments -
NotImplementedError: Cannot copy out of meta tensor; no data when embedding to meta
#31560 commented on
Jul 24, 2024 • 0 new comments -
Idefics2 fine-tuning: Error when unscale_gradients called on FP16 gradients during training with transformers and accelerate
#30559 commented on
Jul 24, 2024 • 0 new comments -
Optimised 4bit inference kernels
#28568 commented on
Jul 24, 2024 • 0 new comments -
Bug in whisper word-level timestamps (`tokenizer._decode_asr`)
#31778 commented on
Jul 24, 2024 • 0 new comments -
Converting gguf fp16 & bf16 to hf is not supported.
#31762 commented on
Jul 24, 2024 • 0 new comments -
Improving memory efficiency further 🚀
#30860 commented on
Jul 24, 2024 • 0 new comments -
`Gemma2Model` not returning cache
#31981 commented on
Jul 24, 2024 • 0 new comments -
KV cache with CPU offloading
#30704 commented on
Jul 24, 2024 • 0 new comments -
Implement Cross Attention in LLAMA Model
#27285 commented on
Jul 25, 2024 • 0 new comments -
RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device=mps:0')
#31571 commented on
Jul 25, 2024 • 0 new comments -
Trainer: To keep unused columns for `compute_metrics`
#31570 commented on
Jul 25, 2024 • 0 new comments -
Tokenizers: Character encoding inconsistencies between __call__ and .convert_tokens_to_ids
#31438 commented on
Jul 25, 2024 • 0 new comments -
Whisper Translation on low resource languages
#30592 commented on
Jul 25, 2024 • 0 new comments -
`pip install accelerate` (and similar) error messages should specify min version
#31583 commented on
Jul 25, 2024 • 0 new comments -
Multi-GPU inference affects LLM's (Llama2-7b-chat-hf) generation.
#31582 commented on
Jul 25, 2024 • 0 new comments -
push_to_hub doesn't push checkpoint folder while training
#30141 commented on
Jul 25, 2024 • 0 new comments -
Bug version 4.42.4: KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
#32060 commented on
Jul 26, 2024 • 0 new comments -
Embedding class is replaced when calling `resize_token_embeddings`
#31835 commented on
Jul 26, 2024 • 0 new comments -
When max_steps < save_steps with deepspeed zero3 stage
#31624 commented on
Jul 26, 2024 • 0 new comments -
Unable to export Phi-3-vision model to PyTorch exported program
#31622 commented on
Jul 26, 2024 • 0 new comments -
HuggingFace GroundingDINO inference execution time is slower than the original groundingDINO (~100ms)
#31533 commented on
Jul 26, 2024 • 0 new comments -
Checkpoint validation as an option
#32067 commented on
Jul 22, 2024 • 0 new comments -
Whisper - get probability of detected language
#29293 commented on
Jul 22, 2024 • 0 new comments -
Support H100 training with FP8 in Trainer and Deepspeed
#25333 commented on
Jul 22, 2024 • 0 new comments -
Cannot export Deberta to TorchScript
#20815 commented on
Jul 22, 2024 • 0 new comments -
Index out of range when generate using optimum
#31551 commented on
Jul 23, 2024 • 0 new comments -
Error on fine tuning paligemma for object detection
#31528 commented on
Jul 23, 2024 • 0 new comments -
Mixtral's implementation of auxiliary loss seems incorrect
#31464 commented on
Jul 23, 2024 • 0 new comments -
rework `test_multi_gpu_data_parallel_forward`
#31087 commented on
Jul 23, 2024 • 0 new comments -
DPT implementation contains unused parameters
#30633 commented on
Jul 23, 2024 • 0 new comments -
`test_encode_decode_fast_slow_all_tokens` is failing
#30045 commented on
Jul 23, 2024 • 0 new comments -
SDPA gives nans/infs during sampling on ROCM w/ float16
#30056 commented on
Jul 23, 2024 • 0 new comments -
Fail to load model without .safetensors file
#31552 commented on
Jul 23, 2024 • 0 new comments -
Skipping cudagraphs for unknown reason
#31645 commented on
Jul 23, 2024 • 0 new comments -
Training Evaluation Display on VSCode
#22694 commented on
Jul 23, 2024 • 0 new comments -
kwargs pop "attn_implement" twice in modeling_utils.py and configuration_utils.py when using AutoConfig/AutoModel
#32082 commented on
Jul 23, 2024 • 0 new comments -
NonMatchingSplitsSizesError on Flax BART with wiki summary dataset
#29596 commented on
Jul 23, 2024 • 0 new comments -
[flax_llama] Why is the return value of the `create_sinusoidal_positions` truncated by `num_pos`?
#29590 commented on
Jul 23, 2024 • 0 new comments -
FP8 inference and FP8 KV cache
#23660 commented on
Jul 23, 2024 • 0 new comments -
SeamlessM4TFeatureExtractor fails with pad_to_multiple_of not being a multiple of stride
#31916 commented on
Jul 23, 2024 • 0 new comments -
Add MistralForQuestionAnswering
#28908 commented on
Jul 23, 2024 • 0 new comments -
Flash Attention with Gemma 2
#31953 commented on
Jul 23, 2024 • 0 new comments -
static cache implementation is not compatible with attn_implementation==flash_attention_2
#32040 commented on
Jul 23, 2024 • 0 new comments -
Quantization support for heads and embeddings
#31474 commented on
Jul 23, 2024 • 0 new comments -
Race condition when loading models from local folders with custom code
#27421 commented on
Jul 23, 2024 • 0 new comments