Insights: huggingface/transformers
Overview
3 Releases published by 1 person
-
v4.43.0: Llama 3.1, Chameleon, ZoeDepth, Hiera
published
Jul 23, 2024 -
v4.43.1: Patch release
published
Jul 23, 2024 -
v4.43.2: Patch release
published
Jul 24, 2024
75 Pull requests merged by 44 people
-
Flash-Attn: fix generation when no attention mask or no padding
#32241 merged
Jul 26, 2024 -
[tests] fix `static` cache implementation is not compatible with `attn_implementation==flash_attention_2`
#32039 merged
Jul 26, 2024 -
Add check for `target_sizes is None` in `post_process_image_guided_detection` for owlv2
#31934 merged
Jul 26, 2024 -
Adds: extra_repr for RMSNorm layers in most models
#32204 merged
Jul 26, 2024
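For context, `extra_repr` is the standard `torch.nn.Module` hook for adding detail to a module's printed representation. A minimal sketch of the idea behind #32204, using a generic RMSNorm (the exact fields shown per model may differ):

```python
import torch
from torch import nn

class RMSNorm(nn.Module):
    """Generic RMSNorm, used only to illustrate extra_repr."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        variance = hidden_states.pow(2).mean(-1, keepdim=True)
        return self.weight * hidden_states * torch.rsqrt(variance + self.variance_epsilon)

    def extra_repr(self) -> str:
        # Rendered inside print(model), e.g. "RMSNorm((4096,), eps=1e-06)"
        return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"

print(RMSNorm(4096))  # RMSNorm((4096,), eps=1e-06)
```
-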
Refactor: Removed unnecessary `object` base class
#32230 merged
Jul 26, 2024 -
don't log base model architecture in wandb if log model is false
#32143 merged
Jul 26, 2024 -
Resize embeds with DeepSpeed
#32214 merged
Jul 26, 2024 -
Llava: generate without images
#32183 merged
Jul 26, 2024 -
Generation: stop at `eos` for assisted decoding
#31301 merged
Jul 26, 2024 -
Fix code snippet for Grounding DINO
#32229 merged
Jul 25, 2024 -
translate philosophy.md to chinese
#32177 merged
Jul 25, 2024 -
Follow up for #31973
#32025 merged
Jul 25, 2024 -
[warnings] fix E721 warnings
#32223 merged
Jul 25, 2024
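For reference, E721 is the pycodestyle/ruff rule that flags comparing types with `==`. The usual fix pattern, sketched:

```python
a, b = 1, 2.0

# Flagged by E721: equality comparison of types.
if type(a) == type(b):
    pass

# Preferred for an exact type check:
if type(a) is type(b):
    pass

# Or, when subclasses should also match:
if isinstance(a, type(b)):
    pass
```
-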
[BigBird Pegasus] set _supports_param_buffer_assignment to False
#32222 merged
Jul 25, 2024 -
Update question_answering.py
#32208 merged
Jul 25, 2024 -
remove unnecessary guard code related to pytorch versions 1.4.2 ~ 1.7.0
#32210 merged
Jul 25, 2024 -
[whisper] fix short-form output type
#32178 merged
Jul 25, 2024 -
fix: Replaced deprecated unittest method with the correct one
#32198 merged
Jul 24, 2024
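The PR title doesn't name the method here; a representative case is the long-deprecated `assertEquals` alias, removed in Python 3.12:

```python
import unittest

class ExampleTest(unittest.TestCase):
    def test_sum(self):
        # Deprecated alias, removed in Python 3.12:
        #   self.assertEquals(1 + 1, 2)
        # Current spelling:
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    unittest.main()
```
-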
🚨 No more default chat templates
#31733 merged
Jul 24, 2024
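In practice this means tokenizers that never had a `chat_template` set no longer fall back to a class-level default. A hedged sketch of the new behavior (the exact error type may vary):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # ships without a chat_template
messages = [{"role": "user", "content": "Hi"}]

try:
    tok.apply_chat_template(messages, tokenize=False)
except ValueError:
    # Post-#31733, a template must be set explicitly instead of relying
    # on a silent class default. Toy Jinja template for illustration:
    tok.chat_template = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
    print(tok.apply_chat_template(messages, tokenize=False))
```
-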
Support dequantizing GGUF FP16 format
#31783 merged
Jul 24, 2024 -
Fix float8_e4m3fn in modeling_utils
#32193 merged
Jul 24, 2024
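This pairs with issue #32185 below ("module 'torch' has no attribute 'float8_e4m3fn'"): the fp8 dtypes only exist in newer PyTorch builds, so lookups need guarding. A sketch in the spirit of the fix, not the literal patch:

```python
import torch

# torch.float8_e4m3fn exists only in recent PyTorch (>= 2.1);
# a guarded lookup keeps older versions working:
fp8_e4m3 = getattr(torch, "float8_e4m3fn", None)
if fp8_e4m3 is not None:
    x = torch.zeros(4, dtype=fp8_e4m3)
```
-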
Fix resize embedding with Deepspeed
#32192 merged
Jul 24, 2024 -
let's not warn when someone is running a forward
#32176 merged
Jul 24, 2024 -
RoPE: relaxed rope validation
#32182 merged
Jul 24, 2024 -
Remove conversational pipeline tests
#32099 merged
Jul 24, 2024 -
Update qwen2.md
#32108 merged
Jul 24, 2024 -
fix: default value reflects the runtime environment variables rather than the ones present at import time.
#32153 merged
Jul 24, 2024 -
adds: extra_repr() to MambaRMSNorm to include hidden size / size of weights in the layer
#32171 merged
Jul 24, 2024 -
[docs] change temperature to a positive value
#32077 merged
Jul 23, 2024 -
fix: Fixed an if condition that always evaluates to true
#32160 merged
Jul 23, 2024 -
fix
#32162 merged
Jul 23, 2024 -
Updated `ruff` to the latest version
#31926 merged
Jul 23, 2024 -
Enhancing SFT Training Efficiency Using Packing and FlashAttention2 with Position IDs
#31629 merged
Jul 23, 2024 -
Added additional kwarg for successful running of optuna hyperparameter search
#31924 merged
Jul 23, 2024 -
feat(cache): StaticCache uses index_copy_ to avoid useless copy
#31857 merged
Jul 23, 2024
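The idea: a static cache is preallocated to its maximum length, so new key/value states can be scattered in place with a single `index_copy_` rather than advanced-indexing assignment. An illustrative sketch (shapes and names are made up, not the actual StaticCache code):

```python
import torch

# Preallocated cache: (batch, heads, max_seq_len, head_dim)
cache = torch.zeros(1, 8, 1024, 64)
new_states = torch.randn(1, 8, 3, 64)    # 3 freshly computed positions
positions = torch.tensor([5, 6, 7])      # their slots in the cache

# In-place scatter along the sequence dimension (dim=2).
cache.index_copy_(2, positions, new_states)
# Equivalent to cache[:, :, positions] = new_states, but avoids the
# extra copies advanced indexing can introduce in compiled graphs.
```
-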
Fix typing to be compatible with later py versions
#32155 merged
Jul 23, 2024 -
Revert "Incorrect Whisper long-form decoding timestamps "
#32148 merged
Jul 23, 2024 -
Rename Phi-3 rope scaling type
#31436 merged
Jul 23, 2024 -
Added mamba.py backend
#30139 merged
Jul 23, 2024 -
Fix video batching to videollava
#32139 merged
Jul 23, 2024 -
Fix flash attention speed issue
#32028 merged
Jul 23, 2024 -
gguf conversion add_prefix_space=None for llama3
#31937 merged
Jul 23, 2024 -
Llama: RoPE refactor
#32135 merged
Jul 23, 2024 -
Modify resize_token_embeddings to ensure output type is same as input
#31979 merged
Jul 23, 2024 -
Disable quick init for TapasPreTrainedModel
#32149 merged
Jul 23, 2024 -
Add YaRN and Dynamic-YaRN RoPE Scaling Methods
#30910 merged
Jul 23, 2024 -
Add method to retrieve used chat template
#32032 merged
Jul 23, 2024
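Assuming the method landed as `get_chat_template` (the name is inferred from the PR title, so treat this as a sketch), usage would look like:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
# Returns the Jinja template string apply_chat_template would render:
template = tok.get_chat_template()
print(template[:80])
```
-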
Fix mask creations of `GPTNeoX` and `GPT2`
#31944 merged
Jul 23, 2024 -
[modelling] remove unnecessary transpose for fa2 attention
#31749 merged
Jul 23, 2024 -
Remove `trust_remote_code` when loading Libri Dummy
#31748 merged
Jul 23, 2024 -
LLaVaNeXT: pad on right if training
#32134 merged
Jul 23, 2024 -
Add llama3-llava-next-8b to llava_next conversion script
#31395 merged
Jul 23, 2024 -
Add new quant method
#32047 merged
Jul 22, 2024 -
set warning level to info when special tokens have been added
#32138 merged
Jul 22, 2024 -
Don't default to other weights file when use_safetensors=True
#31874 merged
Jul 22, 2024 -
Return assistant generated tokens mask in apply_chat_template
#30650 merged
Jul 22, 2024
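A hedged sketch of the new opt-in flag, assuming it landed as `return_assistant_tokens_mask` with the mask returned under `assistant_masks`; it needs a chat template that wraps assistant turns in `{% generation %} ... {% endgeneration %}` markers:

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; a real run needs a tokenizer whose chat
# template marks assistant spans with {% generation %} markers.
tok = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]
out = tok.apply_chat_template(
    messages,
    return_assistant_tokens_mask=True,  # opt-in flag from this PR
    return_dict=True,
)
# 1 for tokens inside assistant turns, 0 elsewhere; templates without
# the {% generation %} markers cannot produce a meaningful mask.
mask = out["assistant_masks"]
```
-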
[RoBERTa] Minor clarifications to model doc
#31949 merged
Jul 22, 2024 -
fix: Fixed raising `TypeError` instead of `ValueError` for invalid type
#32111 merged
Jul 22, 2024 -
Update `ko/_toctree.yml` and remove `custom_tools.md` to reflect latest changes
#31969 merged
Jul 22, 2024 -
Fix failing test with race condition
#32140 merged
Jul 22, 2024 -
[generate] fix eos/pad id check on mps devices
#31695 merged
Jul 22, 2024 -
Mention model_info.id instead of model_info.modelId
#32106 merged
Jul 22, 2024 -
fix: Replaced deprecated `mktemp()` function
#32123 merged
Jul 22, 2024
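For background, `tempfile.mktemp()` is deprecated because it only returns a candidate name, leaving a window for another process to claim the path first. The safe replacements create the file atomically; a sketch:

```python
import os
import tempfile

# Racy (deprecated): the path exists only as a name until you open it.
#   path = tempfile.mktemp(suffix=".json")

# Safe: the file is created atomically and handed back already open.
fd, path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    f.write("{}")
os.unlink(path)
```
-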
Generate: store special token tensors under a unique variable name
#31980 merged
Jul 22, 2024 -
Fix shard order
#32023 merged
Jul 22, 2024 -
Agents planning
#31702 merged
Jul 22, 2024 -
Fix tests after `huggingface_hub` 0.24
#32054 merged
Jul 19, 2024 -
Chameleon: not supported with fast load
#32091 merged
Jul 19, 2024 -
Disable quick init for deepspeed
#32066 merged
Jul 19, 2024 -
Support generating with fallback for short form audio in Whisper
#30984 merged
Jul 19, 2024 -
Add image-text-to-text task guide
#31777 merged
Jul 19, 2024 -
Fixes to chameleon docs
#32078 merged
Jul 19, 2024 -
Fix progress callback deepcopy
#32070 merged
Jul 19, 2024 -
VideoLLaVa: fix chat format in docs
#32083 merged
Jul 19, 2024
41 Pull requests opened by 35 people
-
added warning to Trainer when label_names is not specified for PeftModel
#32085 opened
Jul 19, 2024 -
fix: SeamlessM4TFeatureExtractor stride remainder
#32088 opened
Jul 19, 2024 -
Add sdpa support for Albert
#32092 opened
Jul 19, 2024 -
Fix `.push_to_hub(..., create_pr=True, revision="my-branch")` when creating PR on not-owned repo
#32094 opened
Jul 19, 2024 -
Custom beam search scorer argument in generate function
#32097 opened
Jul 19, 2024 -
Create django.yml
#32100 opened
Jul 19, 2024 -
[wip][meta-llama][torch.compile] Fix issues with torch.compile
#32102 opened
Jul 19, 2024 -
support 3D attention mask in bert
#32105 opened
Jul 20, 2024 -
fix: resolve bug with `use_mps_device` setting not taking effect
#32114 opened
Jul 20, 2024 -
Update stateful_callbacks state before saving checkpoint
#32115 opened
Jul 20, 2024 -
Update benchmark.py-- Enhance Benchmarking with Multi-Commit Support …
#32116 opened
Jul 21, 2024 -
[WIP] Add Depth Anything V2 Metric models
#32126 opened
Jul 21, 2024 -
DINOv2 register support
#32127 opened
Jul 22, 2024 -
[whisper] alternative fix for long-form timestamps
#32131 opened
Jul 22, 2024 -
Add Qwen2-Audio
#32137 opened
Jul 22, 2024 -
add scaling_factor to GemmaRotaryEmbedding for fix error in GemmaLine…
#32141 opened
Jul 22, 2024 -
Add stream messages from agent run for gradio chatbot
#32142 opened
Jul 22, 2024 -
Cache: create docs
#32150 opened
Jul 23, 2024 -
[build-ci-image] add tiktoken
#32152 opened
Jul 23, 2024 -
Added error when sequence length is bigger than max_position_embeddings
#32156 opened
Jul 23, 2024 -
Test
#32158 opened
Jul 23, 2024 -
support copies
#32159 opened
Jul 23, 2024 -
Add a static cache that offloads to the CPU or other device
#32161 opened
Jul 23, 2024 -
Fixed Hybrid Cache Shape Initialization.
#32163 opened
Jul 23, 2024 -
Make static cache compatible with torch.export
#32168 opened
Jul 23, 2024 -
Uniformize kwargs for Layoutlm (2, 3, X) processors
#32180 opened
Jul 24, 2024 -
Uniformize kwargs for chameleon processor
#32181 opened
Jul 24, 2024 -
Gemma2 and flash-attention
#32188 opened
Jul 24, 2024 -
[WIP] - Enable speculative decoding with batch size >1
#32189 opened
Jul 24, 2024 -
Fix slow GemmaTokenizer and improve SPM slow -> fast conversion process
#32191 opened
Jul 24, 2024 -
warning about weight_g/weight_v missing on WeightNorm on PyTorch
#32194 opened
Jul 24, 2024 -
Whisper tokenizer word level timestamps
#32197 opened
Jul 24, 2024 -
Bump torch from 1.13.1 to 2.2.0 in /examples/research_projects/visual_bert
#32220 opened
Jul 25, 2024 -
Fix attention propagation for vision towers of llava-like models
#32221 opened
Jul 25, 2024 -
>3-5x faster torch.compile forward compilation for autoregressive decoder models
#32227 opened
Jul 25, 2024 -
[WIP] Add support for XTR
#32231 opened
Jul 25, 2024 -
fix: warmup_steps check for training_args
#32236 opened
Jul 26, 2024 -
VLMs: dispatch sdpa to each sub model
#32238 opened
Jul 26, 2024 -
🌐 [i18n-KO] Translated `image_feature_extraction.md` to Korean
#32239 opened
Jul 26, 2024 -
#32184 save total_vocab_size
#32240 opened
Jul 26, 2024
61 Issues closed by 23 people
-
Idefics2 generation erroring with flash_attention_2
#32237 closed
Jul 26, 2024 -
`target_sizes` in Owlv2 `post_process_image_guided_detection`
#31915 closed
Jul 26, 2024 -
Transformers 4.36 use_cache issue
#28056 closed
Jul 26, 2024 -
Model saving when using Trainer with Accelerate
#29792 closed
Jul 26, 2024 -
Running out of memory while fine-tuning and running inference with VideoMAE, causing the script to be killed.
#30939 closed
Jul 26, 2024 -
Can't create transformer pipeline because pytorch failed to be detected
#31454 closed
Jul 26, 2024 -
ValueError: too many values to unpack (expected 2) when use glm4v or cogvlm2
#32226 closed
Jul 25, 2024 -
[Whisper] Inconsistent return types for Whisper generation
#32202 closed
Jul 25, 2024 -
🐛 `attn_implementation="sdpa"` slower than `BetterTransformer.transform`?
#31245 closed
Jul 25, 2024 -
pipeline gives a different result than the other approach in predicting word probability
#31995 closed
Jul 25, 2024 -
NaNs when running bitsandbytes quantized Chameleon
#32174 closed
Jul 25, 2024 -
Metrics - Pipeline
#32190 closed
Jul 24, 2024 -
The features calculated by transformers dinov2 are different from the official ones
#32175 closed
Jul 24, 2024 -
AttributeError: module 'torch' has no attribute 'float8_e4m3fn'
#32185 closed
Jul 24, 2024 -
Llama 3 - RuntimeError: shape '[-1, 0]' is invalid for input of size 41041920
#32170 closed
Jul 24, 2024 -
Backwards compatibility broken for RoPE: "rope_type"
#32166 closed
Jul 24, 2024 -
KeyError: 'rope_type'
#32167 closed
Jul 24, 2024 -
cannot import name 'Conversation' from 'transformers'
#32096 closed
Jul 24, 2024 -
callback to implement how the predictions should be stored
#32186 closed
Jul 24, 2024 -
Using `AutoTokenizer.from_pretrained`'s `.encode()` function fails to add BOS token for new Llama-3.1 model
#32172 closed
Jul 24, 2024 -
phi-3's LlamaTokenizer ignores newline character.
#32136 closed
Jul 24, 2024 -
Why can MPS never be used successfully?
#32035 closed
Jul 24, 2024 -
Unable to load wavlm-large from pretrained in offline mode
#32147 closed
Jul 23, 2024 -
Extra dataset features not passing to the custom collator
#32093 closed
Jul 23, 2024 -
Allow additional keyword args to be passed to optuna hyperparameter search
#31923 closed
Jul 23, 2024 -
The behavior of the tokenizer loaded from GGUF file is incorrect.
#31630 closed
Jul 23, 2024 -
Table question answering pipeline failing to save
#32128 closed
Jul 23, 2024 -
Very different output depending on whether an attention mask is passed when using caching
#31943 closed
Jul 23, 2024 -
AttributeError: 'BertModel' object has no attribute 'attn_implementation'
#30965 closed
Jul 23, 2024 -
MultiScaleDeformableAttentionFunction different results on different devices
#31399 closed
Jul 23, 2024 -
LlavaNextVideo always assumes left padding when batch size is 1
#32112 closed
Jul 23, 2024 -
Add llama3-llava-next-8b to convert_llava_next_weights_to_hf.py
#31394 closed
Jul 23, 2024 -
"use_safetensors" not enforced with "local_files_only", loads bin file
#31649 closed
Jul 22, 2024 -
Source link to `LlamaForSequenceClassification` seems broken, if so, update it.
#31531 closed
Jul 22, 2024 -
Licence
#32104 closed
Jul 22, 2024 -
Gemma-7b model: set my own lm_head but it cannot be saved, and it changed the pretrained embedding_layer's weights too.
#31467 closed
Jul 22, 2024 -
Metadata
#32121 closed
Jul 22, 2024 -
When training RWKV, it reports a "backward error"
#31413 closed
Jul 22, 2024 -
Transformer compatibility Python 3.9, ComfyUI
#31806 closed
Jul 22, 2024 -
Model load when dtypes match is broken
#32089 closed
Jul 22, 2024 -
unexpected keyword argument 'torch_empty_cache_steps' in TrainingArguments
#32071 closed
Jul 22, 2024 -
HuggingFace `from_pretrained()` called multiple times when launching with `torchrun`
#29843 closed
Jul 21, 2024 -
recent version of Transformers seems to mess with forward/__call__. Breaks patching loss function
#30753 closed
Jul 21, 2024 -
Trained tokenizer has broken encoding for cyrillic
#30937 closed
Jul 21, 2024 -
sdpa for bert causes nan output when mixed precision is enabled.
#31038 closed
Jul 21, 2024 -
Using a single 'RecurrentGemmaRglru' layer - "Trying to backward through the graph a second time" Error
#31324 closed
Jul 21, 2024 -
https://github.com/VikParuchuri/surya can not convert model to onnx
#31384 closed
Jul 21, 2024 -
past_key_values for SeamlessM4Tv2ForSpeechToText is not working as expected
#29139 closed
Jul 20, 2024 -
`num_input_tokens_seen` included the `pad` tokens if sample padding strategy used
#29889 closed
Jul 20, 2024 -
stop_strings does not work at all
#31363 closed
Jul 20, 2024 -
loading AutoTokenizer.from_pretrained with gated model
#31367 closed
Jul 20, 2024 -
TypeError: Block.forward() got an unexpected keyword argument 'past_key_value'
#31371 closed
Jul 20, 2024 -
linear_sum_assignment error in the object_detection.py guide
#31461 closed
Jul 20, 2024 -
Support generating with fallback for short form audio in Whisper
#29508 closed
Jul 19, 2024 -
Can you please provide:
#32087 closed
Jul 19, 2024 -
The ProgressCallback triggers a `cannot pickle '_thread.lock' object` failure
#32064 closed
Jul 19, 2024
48 Issues opened by 45 people
-
A question about code on Mistral-7B attention
#32235 opened
Jul 26, 2024 -
LLaVA cannot use beam search after 4.43.0
#32234 opened
Jul 26, 2024 -
SinkCache with Qwen1.5 broken in 4.43.0+
#32233 opened
Jul 25, 2024 -
can't load the llama-3.1-8b-instruct model
#32232 opened
Jul 25, 2024 -
[Whisper] Attention mask not detected in `Whisper.generate()`
#32228 opened
Jul 25, 2024 -
Add New Optimizer
#32225 opened
Jul 25, 2024 -
`BarkModel` can't be saved anymore
#32224 opened
Jul 25, 2024 -
flashattention3
#32219 opened
Jul 25, 2024 -
Parallel inference on generative models throws an exception
#32217 opened
Jul 25, 2024 -
auto_find_batch_size for OOM during evaluation
#32215 opened
Jul 25, 2024 -
Chat Assistant Prefill
#32213 opened
Jul 25, 2024 -
Error when running chatglm3: 'GenerationConfig' object has no attribute '_eos_token_tensor'
#32207 opened
Jul 25, 2024 -
Does GroundingDINO support batched inference?
#32206 opened
Jul 25, 2024 -
Broken accuracy on LLaMa 3.1 70B -- worse than even 8B
#32205 opened
Jul 24, 2024 -
Cannot build documentation on Mac OS
#32203 opened
Jul 24, 2024 -
Load Phi 3 small on Nvidia Tesla V100 - Flash Attention
#32201 opened
Jul 24, 2024 -
Support `from_pretrained` of `FlaxPretrainedModel` from sharded `.safetensors` weights
#32200 opened
Jul 24, 2024 -
Model loading is uneven on GPUs with AutoModelForCausalLM
#32199 opened
Jul 24, 2024 -
Error occurs in resize_embedding
#32196 opened
Jul 24, 2024 -
"inverted" form required for 4D masking not defined / 4D attention masks breaks with transformers >=4.40
#32195 opened
Jul 24, 2024 -
Error occurred while running _compute_llama3_parameters in modeling_rope_utils.py with torch.device('meta').
#32187 opened
Jul 24, 2024 -
DataCollatorForLanguageModeling is (unnecessarily) slow
#32184 opened
Jul 24, 2024 -
Static KV cache with CPU offloading
#32179 opened
Jul 24, 2024 -
`dataloader_prefetch_factor` is left unused for datasets of type `IterableDataset`
#32169 opened
Jul 23, 2024 -
Enable speculative decoding with batch size >1
#32165 opened
Jul 23, 2024 -
Add Matching Anything by Segmenting Anything (MASA) MOT tracking model
#32164 opened
Jul 23, 2024 -
Adding warnings or errors when provided sequence length is bigger than config.max_position_embeddings
#32154 opened
Jul 23, 2024 -
[i18n-<languageCode>] Translating docs to <languageName>
#32146 opened
Jul 22, 2024 -
callback to implement how the predictions should be stored.
#32145 opened
Jul 22, 2024 -
Wav2Vec2ProcessorWithLM doesn't handle unknown token well for BPE
#32132 opened
Jul 22, 2024 -
Does apply_chat_template support function call usage?
#32130 opened
Jul 22, 2024 -
No module named 'transformers.modeling_flash_attention_utils'
#32129 opened
Jul 22, 2024 -
TF Lite model created from TFWhisperForConditionalGeneration.from_pretrained crashes
#32125 opened
Jul 21, 2024 -
Output from model.generate & model.forward not the same when output attention/hidden_state is True
#32117 opened
Jul 21, 2024 -
_prepare_4d_causal_attention_mask mask inversion should work with boolean masks
#32113 opened
Jul 20, 2024 -
Gemma template won't end with eos_token
#32110 opened
Jul 20, 2024 -
Using Trainer + a pretrained tokenizer + 4D attention mask is extremely slow
#32101 opened
Jul 19, 2024 -
Unrecognized configuration class ChameleonConfig
#32098 opened
Jul 19, 2024 -
max_length calculation for padding the generation outputs in the Seq2SeqTrainer prediction_step function
#32095 opened
Jul 19, 2024 -
[Error] with Trainer: TypeError: Unsupported types (<class 'NoneType'>) passed to `_gpu_broadcast_one`.
#32090 opened
Jul 19, 2024 -
The implementations of `LlamaAttention` and `LlamaSdpaAttention` are not equivalent.
#32086 opened
Jul 19, 2024 -
Training multiple adapters
#32084 opened
Jul 19, 2024
189 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[WIP] - Add Microsoft CLAP model
#31929 commented on
Jul 25, 2024 • 52 new comments -
Add DAB-DETR Object detection/segmentation model
#30803 commented on
Jul 24, 2024 • 48 new comments -
Improve support for image generation with Chameleon & Anole
#32013 commented on
Jul 24, 2024 • 43 new comments -
Add GLM-4 and Later GLM Model (Draft)
#31977 commented on
Jul 26, 2024 • 41 new comments -
Support Kosmos-2.5
#31711 commented on
Jul 25, 2024 • 36 new comments -
[WIP] Add OmDet-Turbo
#31843 commented on
Jul 23, 2024 • 35 new comments -
Uniform kwargs for processors + Docs update - GroundingDINO
#31964 commented on
Jul 25, 2024 • 31 new comments -
Import structure & first three model refactors
#31329 commented on
Jul 25, 2024 • 25 new comments -
[GroundingDino] Fix grounding dino loss 🚨
#31828 commented on
Jul 23, 2024 • 24 new comments -
🚨 Bloom support for cache class
#31445 commented on
Jul 26, 2024 • 22 new comments -
Support reading tiktoken tokenizer.model file
#31656 commented on
Jul 23, 2024 • 18 new comments -
RoPE: model-agnostic RoPE refactor
#31999 commented on
Jul 22, 2024 • 13 new comments -
Add codestral mamba2
#32080 commented on
Jul 26, 2024 • 12 new comments -
Add Nemotron HF Support
#31699 commented on
Jul 26, 2024 • 11 new comments -
Offloaded KV Cache
#31325 commented on
Jul 25, 2024 • 8 new comments -
Granite language models
#31502 commented on
Jul 24, 2024 • 8 new comments -
Implement SuperGlue model
#29886 commented on
Jul 20, 2024 • 7 new comments -
Add Descript-Audio-Codec model
#31494 commented on
Jul 26, 2024 • 7 new comments -
Add DINOv2 with registers
#31832 commented on
Jul 23, 2024 • 5 new comments -
_is_peft_model update to recognise peft submodules, allowing training quantised models with peft submodules
#30884 commented on
Jul 22, 2024 • 3 new comments -
Interpolate clip
#31900 commented on
Jul 22, 2024 • 3 new comments -
[pipeline] fix padding for 1-d tensors
#31776 commented on
Jul 25, 2024 • 3 new comments -
[whisper] compile compatibility with long-form decoding
#31772 commented on
Jul 25, 2024 • 3 new comments -
clean_up_tokenization_spaces=False if unset
#31938 commented on
Jul 25, 2024 • 2 new comments -
Add ViTPose
#30530 commented on
Jul 25, 2024 • 2 new comments -
Cache: new Cache format in decoder-only models
#31421 commented on
Jul 26, 2024 • 2 new comments -
Fix conflicting key in init kwargs in PreTrainedTokenizerBase
#31233 commented on
Jul 23, 2024 • 2 new comments -
Implement MambaForSequenceClassification
#31155 commented on
Jul 23, 2024 • 2 new comments -
Add Flax Dinov2
#31960 commented on
Jul 26, 2024 • 1 new comment -
SPLIT PR: eos bos tokens
#31316 commented on
Jul 22, 2024 • 1 new comment -
[RoBERTa-based] Add support for sdpa
#30510 commented on
Jul 22, 2024 • 1 new comment -
Add Zamba
#30950 commented on
Jul 24, 2024 • 0 new comments -
Add IRIS
#30883 commented on
Jul 25, 2024 • 0 new comments -
fix: multilingual model converted to tflite gets wrong token
#32079 commented on
Jul 19, 2024 • 0 new comments -
fixes clip interpolate
#30783 commented on
Jul 20, 2024 • 0 new comments -
pipeline 'text-classification' in >=4.40.0 throwing TypeError: Got unsupported ScalarType BFloat16
#30542 commented on
Jul 21, 2024 • 0 new comments -
filter flash_attn optional imports loading remote code
#30954 commented on
Jul 22, 2024 • 0 new comments -
[WIP] enable cpu bnb path
#31098 commented on
Jul 22, 2024 • 0 new comments -
Fix perceiver latent initialization modeling_idefics2.py
#31151 commented on
Jul 26, 2024 • 0 new comments -
[WIP] Add Tokenizer for MyT5 Model
#31286 commented on
Jul 22, 2024 • 0 new comments -
Reducing memory usage: removing useless logits computation in generate()
#31292 commented on
Jul 24, 2024 • 0 new comments -
fix: [whisper] don't overwrite GenerationConfig's `return_timestamps` when `return_timestamps` is not passed to `generate` function
#31296 commented on
Jul 26, 2024 • 0 new comments -
Added HHCache class implementing H2O Cache
#31623 commented on
Jul 26, 2024 • 0 new comments -
The last unit test of the QDQBert model "test_inference_no_head_absolute_embedding" did not pass when using official safetensors
#31486 commented on
Jul 26, 2024 • 0 new comments -
"from_pretrained" read wrong config file. not "tokenizer_config.json", but "config.json"
#31282 commented on
Jul 26, 2024 • 0 new comments -
Error During Training with PatchTSMixerForTimeSeriesClassification for Time Series Classification
#30614 commented on
Jul 26, 2024 • 0 new comments -
Inconsistent module names (state_dict keys).
#30124 commented on
Jul 26, 2024 • 0 new comments -
Inconsistent special_token addition in EncoderDecoderModel forward pass
#31729 commented on
Jul 26, 2024 • 0 new comments -
Adding mixtral attention_bias in style of llama modeling
#28440 commented on
Jul 26, 2024 • 0 new comments -
CUDA RuntimeError: Unspecified Launch Failure during Training
#30913 commented on
Jul 26, 2024 • 0 new comments -
Weights of LlamaForQuestionAnswering were not initialized from the model checkpoint
#30381 commented on
Jul 26, 2024 • 0 new comments -
[WIP] Improve multimodal processors - rely less on kwargs
#28711 commented on
Jul 24, 2024 • 0 new comments -
🚨 Add Blip2ForImageTextRetrieval
#29261 commented on
Jul 24, 2024 • 0 new comments -
Fix from pretrained ignoring errors
#29959 commented on
Jul 26, 2024 • 0 new comments -
fix prompt tuning + deepspeed zero3 + checkpoint_saving hang issue
#29980 commented on
Jul 23, 2024 • 0 new comments -
schedulefree optimizers
#30079 commented on
Jul 23, 2024 • 0 new comments -
Add SDPA support for T5 Style Models
#30375 commented on
Jul 20, 2024 • 0 new comments -
Add trainer integration test for llava to ensure accelerate autocasting works correctly
#30489 commented on
Jul 19, 2024 • 0 new comments -
update based on tokenizers release
#30574 commented on
Jul 23, 2024 • 0 new comments -
Adding imagebind
#30690 commented on
Jul 23, 2024 • 0 new comments -
Remove device map for saving tokenizer config on TPU (fix for issue #31971)
#32043 commented on
Jul 19, 2024 • 0 new comments -
[`WIP`] Add Mamba2
#32027 commented on
Jul 20, 2024 • 0 new comments -
HFQuantizer implementation for compressed-tensors library
#31704 commented on
Jul 25, 2024 • 0 new comments -
[Demo][ExecuTorch] Lower and run native Gemma e2e in ExecuTorch
#31706 commented on
Jul 24, 2024 • 0 new comments -
Update kwargs validation for `preprocess` with decorator
#32024 commented on
Jul 24, 2024 • 0 new comments -
[WIP] Agents use grammar
#31735 commented on
Jul 25, 2024 • 0 new comments -
[docs] Redesign
#31757 commented on
Jul 25, 2024 • 0 new comments -
Adding mplugdocowl
#31792 commented on
Jul 25, 2024 • 0 new comments -
chore: move `conftest.py` to `tests/`
#32011 commented on
Jul 25, 2024 • 0 new comments -
Enable whisper encoder to accept any chunk length
#31991 commented on
Jul 25, 2024 • 0 new comments -
Add support for GGUF Phi-3
#31844 commented on
Jul 22, 2024 • 0 new comments -
Add cosine_with_min_lr_schedule_with_warmup_lr_rate scheduler in Trainer
#31870 commented on
Jul 24, 2024 • 0 new comments -
Add SOLO: A Single Transformer for Scalable Vision-Language Modeling
#31918 commented on
Jul 20, 2024 • 0 new comments -
Make special image tokens attribute of tokenizer
#31967 commented on
Jul 21, 2024 • 0 new comments -
Added optimizer adam mini
#31933 commented on
Jul 25, 2024 • 0 new comments -
Fix an error of 'ValueError: mean must have 1 elements if it is an iterable, got 3' for method 'infer_channel_dimension_format' in 'image_utils.py'
#31950 commented on
Jul 20, 2024 • 0 new comments -
[feat] Apply rope_scaling for general use in phi3, llama
#31966 commented on
Jul 21, 2024 • 0 new comments -
make `p_mask` a numpy array before passing to `select_starts_ends`
#32076 commented on
Jul 26, 2024 • 0 new comments -
avoid padding for `num_frames` in `AutomaticSpeechRecognitionPipeline`
#32074 commented on
Jul 25, 2024 • 0 new comments -
Rest of model init refactors
#31330 commented on
Jul 25, 2024 • 0 new comments -
Uniformize model processors
#31368 commented on
Jul 24, 2024 • 0 new comments -
Add Cross-Attention to Bloom Model for VisionEncoderDecoder Compatibility
#31432 commented on
Jul 23, 2024 • 0 new comments -
[WIP] Standardize inputs and outputs for existing image-text-to-text models
#32059 commented on
Jul 25, 2024 • 0 new comments -
add changes in mistral model to avoid problems in pytorch hooks
#31463 commented on
Jul 19, 2024 • 0 new comments -
FIX / Hub: Also catch for `exceptions.ConnectionError`
#31469 commented on
Jul 22, 2024 • 0 new comments -
Update beam_constraints with KMP
#31482 commented on
Jul 19, 2024 • 0 new comments -
MixtralFlashAttention2: put "plus 1" inside parentheses when calculating rotary_seq_len, allowing None position_ids input.
#31500 commented on
Jul 25, 2024 • 0 new comments -
docs: ko: tasks/awq.md
#32057 commented on
Jul 25, 2024 • 0 new comments -
add bnb support for Ascend NPU
#31512 commented on
Jul 20, 2024 • 0 new comments -
Fix minimal version check for object_detection.md
#31520 commented on
Jul 21, 2024 • 0 new comments -
Sequence Length Invariant Text Models
#31521 commented on
Jul 21, 2024 • 0 new comments -
fix wav2vec2 with torch.compile
#31538 commented on
Jul 22, 2024 • 0 new comments -
handle when from_pretrained_id is a list
#31541 commented on
Jul 22, 2024 • 0 new comments -
Optimize 1st token for beam_search
#31564 commented on
Jul 24, 2024 • 0 new comments -
Allow infer_framework_load_model to use the originally specified config.
#31580 commented on
Jul 25, 2024 • 0 new comments -
activation_checkpointing error when using --fsdp
#28499 commented on
Jul 21, 2024 • 0 new comments -
Using batching with pipeline and transformers
#31641 commented on
Jul 22, 2024 • 0 new comments -
Uniform kwargs for processors
#31911 commented on
Jul 22, 2024 • 0 new comments -
Potential Bug in llava_next when calling pack_image_features function.
#31529 commented on
Jul 22, 2024 • 0 new comments -
Incorrect docstring of `get_anyres_image_grid_shape`
#31588 commented on
Jul 22, 2024 • 0 new comments -
Problem with the masked language modeling tutorial
#31545 commented on
Jul 22, 2024 • 0 new comments -
Nested from_pretrained() gives warnings loading weights - "copying from a non-meta parameter"
#31544 commented on
Jul 22, 2024 • 0 new comments -
GenerationConfig throws Object is not JSON serializable when setting constraints
#31070 commented on
Jul 22, 2024 • 0 new comments -
Mismatched tensor size error when generating text with beam_search on mps
#30662 commented on
Jul 22, 2024 • 0 new comments -
transformers offline model loading is not working from version 4.40.0 for models without safetensors
#30469 commented on
Jul 22, 2024 • 0 new comments -
torchrun breaks with load_model_at_end and with metric_for_best_model=eval_f1 on question_answering example
#30819 commented on
Jul 22, 2024 • 0 new comments -
Add support for Apple's DCLM-Baseline-7B model
#32000 commented on
Jul 22, 2024 • 0 new comments -
`MixtralFlashAttention2` subscripts `position_ids` before checking if it is `None`
#31326 commented on
Jul 22, 2024 • 0 new comments -
Dropout sync across GPUs causes major performance drops
#31412 commented on
Jul 22, 2024 • 0 new comments -
[BUG] Offline loading of non-safe tensors fails
#30920 commented on
Jul 22, 2024 • 0 new comments -
ddp_timeout in TrainingArguments with deepspeed doesn't take effect
#32036 commented on
Jul 22, 2024 • 0 new comments -
FineWeb SLM Training doesn't start
#31501 commented on
Jul 22, 2024 • 0 new comments -
Have `_is_peft_model` check if there's any peft submodule/Allow quantised training
#30878 commented on
Jul 22, 2024 • 0 new comments -
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (HF/Accelerate)
#31504 commented on
Jul 22, 2024 • 0 new comments -
Add Depth Anything v2 metric depth
#31972 commented on
Jul 22, 2024 • 0 new comments -
Add post_process_depth_estimation to image processors
#30917 commented on
Jul 22, 2024 • 0 new comments -
flash_attn ImportError breaking model loading (Florence-2-base-ft)
#31793 commented on
Jul 22, 2024 • 0 new comments -
Fixing Tensor Shape/Dimension Mismatch Errors in TimeSeries Transformer for Stock Price Prediction
#31556 commented on
Jul 24, 2024 • 0 new comments -
Batch is empty when fine-tuning flan-t5 using LoRA
#31357 commented on
Jul 19, 2024 • 0 new comments -
[Severe Bug] Performance Degradation Starting from v4.42.*
#31890 commented on
Jul 19, 2024 • 0 new comments -
Saved weights differ from the original model
#30543 commented on
Jul 19, 2024 • 0 new comments -
cannot use activation_checkpoint in torch native fsdp
#32073 commented on
Jul 19, 2024 • 0 new comments -
Open to contribution: adding `torch.nn.functional.scaled_dot_product_attention` support for more architectures
#28005 commented on
Jul 19, 2024 • 0 new comments
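For contributors picking this up, the core primitive is PyTorch's built-in fused attention. A minimal sketch of the call an sdpa attention class wraps (tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# (batch, num_heads, seq_len, head_dim): the layout transformers'
# sdpa attention classes use when calling the fused kernel.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# is_causal=True applies the autoregressive mask inside the kernel;
# alternatively pass an explicit attn_mask.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```
-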
ViTLayer.forward() needs to be in "eager" mode when `output_attentions=True`
#30978 commented on
Jul 19, 2024 • 0 new comments -
Add support for MiniCPM-V-2 and MiniCPM-Llama3-V-2_5
#31836 commented on
Jul 19, 2024 • 0 new comments -
More robust tests required for gradient checkpointing
#32063 commented on
Jul 19, 2024 • 0 new comments -
BertForSequenceClassification.from_pretrained broken when using FSDP
#32068 commented on
Jul 19, 2024 • 0 new comments -
Performance mismatch with best_epoch
#32075 commented on
Jul 19, 2024 • 0 new comments -
GPT-2 Model Logits and Loss are different on MPS
#32005 commented on
Jul 19, 2024 • 0 new comments -
How can I load and run inference on the same model across two GPUs when a single GPU runs out of memory?
#31508 commented on
Jul 20, 2024 • 0 new comments -
load qwen2-72b-instruct sft awq q4_0 gguf ValueError: Trying to set a tensor of shape torch.Size
#31507 commented on
Jul 20, 2024 • 0 new comments -
ImportError: cannot import name 'logging' from 'huggingface_hub'
#31492 commented on
Jul 20, 2024 • 0 new comments -
run_clm.py AttributeError: 'NoneType' object has no attribute 'get'
#31487 commented on
Jul 20, 2024 • 0 new comments -
RecurrentGemma Doesn't Support left padding?
#31201 commented on
Jul 20, 2024 • 0 new comments -
Unable to load starcoder2 finetuned version getting quantization errors
#29990 commented on
Jul 20, 2024 • 0 new comments -
Using accelerate launch FSDP causes weights saved from the 2nd time onwards to be incomplete
#31034 commented on
Jul 20, 2024 • 0 new comments -
Plans to Integrate LongRoPE into LLaMA?
#31992 commented on
Jul 20, 2024 • 0 new comments -
OOM when loading 300B models with `AutoModelForCausalLM.from_pretrained` and `BitsAndBytesConfig` quantization.
#31577 commented on
Jul 21, 2024 • 0 new comments -
from_pretrained loads checkpoints too slowly
#31515 commented on
Jul 21, 2024 • 0 new comments -
GenerationMixin sample() runs forever
#31484 commented on
Jul 21, 2024 • 0 new comments -
error when converting llama1 ckpts to hf format
#30723 commented on
Jul 21, 2024 • 0 new comments -
Make fx traced model with the use of `past_key_values` picklable again?
#30575 commented on
Jul 21, 2024 • 0 new comments -
bart-large-xsum model: There were missing keys in the checkpoint model loaded: ['model.encoder.embed_tokens.weight', 'model.decoder.embed_tokens.weight', 'lm_head.weight'].
#29128 commented on
Jul 24, 2024 • 0 new comments -
(False?) warning about weight_g/weight_v missing on WeightNorm on PyTorch
#26796 commented on
Jul 24, 2024 • 0 new comments -
NotImplementedError: Cannot copy out of meta tensor; no data when embedding to meta
#31560 commented on
Jul 24, 2024 • 0 new comments -
Idefics2 fine-tuning: Error when unscale_gradients called on FP16 gradients during training with transformers and accelerate
#30559 commented on
Jul 24, 2024 • 0 new comments -
Optimised 4bit inference kernels
#28568 commented on
Jul 24, 2024 • 0 new comments -
Bug in whisper word-level timestamps (`tokenizer._decode_asr`)
#31778 commented on
Jul 24, 2024 • 0 new comments -
Converting gguf fp16 & bf16 to hf is not supported.
#31762 commented on
Jul 24, 2024 • 0 new comments -
Improving memory efficiency further 🚀
#30860 commented on
Jul 24, 2024 • 0 new comments -
`Gemma2Model` not returning cache
#31981 commented on
Jul 24, 2024 • 0 new comments -
KV cache with CPU offloading
#30704 commented on
Jul 24, 2024 • 0 new comments -
Implement Cross Attention in LLAMA Model
#27285 commented on
Jul 25, 2024 • 0 new comments -
RuntimeError: slow_conv2d_forward_mps: input(device='cpu') and weight(device=mps:0')
#31571 commented on
Jul 25, 2024 • 0 new comments -
Trainer: To keep unused columns for `compute_metrics`
#31570 commented on
Jul 25, 2024 • 0 new comments -
Tokenizers: Character encoding inconsistencies between __call__ and .convert_tokens_to_ids
#31438 commented on
Jul 25, 2024 • 0 new comments -
Whisper Translation on low resource languages
#30592 commented on
Jul 25, 2024 • 0 new comments -
`pip install accelerate` (and similar) error messages should specify min version
#31583 commented on
Jul 25, 2024 • 0 new comments -
Multi-GPU inference affects LLM's (Llama2-7b-chat-hf) generation.
#31582 commented on
Jul 25, 2024 • 0 new comments -
push_to_hub doesn't push checkpoint folder while training
#30141 commented on
Jul 25, 2024 • 0 new comments -
Bug version 4.42.4: KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
#32060 commented on
Jul 26, 2024 • 0 new comments -
Embedding class is replaced when calling `resize_token_embeddings`
#31835 commented on
Jul 26, 2024 • 0 new comments -
When max_steps < save_steps with deepspeed zero3 stage
#31624 commented on
Jul 26, 2024 • 0 new comments -
Unable to export Phi-3-vision model to PyTorch exported program
#31622 commented on
Jul 26, 2024 • 0 new comments -
HuggingFace GroundingDINO inference execution time is slower than the original groundingDINO (~100ms)
#31533 commented on
Jul 26, 2024 • 0 new comments -
Checkpoint validation as an option
#32067 commented on
Jul 22, 2024 • 0 new comments -
Whisper - get probability of detected language
#29293 commented on
Jul 22, 2024 • 0 new comments -
Support H100 training with FP8 in Trainer and Deepspeed
#25333 commented on
Jul 22, 2024 • 0 new comments -
Cannot export Deberta to TorchScript
#20815 commented on
Jul 22, 2024 • 0 new comments -
Index out of range when generate using optimum
#31551 commented on
Jul 23, 2024 • 0 new comments -
Error on fine tuning paligemma for object detection
#31528 commented on
Jul 23, 2024 • 0 new comments -
Mixtral's implementation of auxiliary loss seems incorrect
#31464 commented on
Jul 23, 2024 • 0 new comments -
rework `test_multi_gpu_data_parallel_forward`
#31087 commented on
Jul 23, 2024 • 0 new comments -
DPT implementation contains unused parameters
#30633 commented on
Jul 23, 2024 • 0 new comments -
`test_encode_decode_fast_slow_all_tokens` is failing
#30045 commented on
Jul 23, 2024 • 0 new comments -
SDPA gives nans/infs during sampling on ROCM w/ float16
#30056 commented on
Jul 23, 2024 • 0 new comments -
Fail to load model without .safetensors file
#31552 commented on
Jul 23, 2024 • 0 new comments -
Skipping cudagraphs for unknown reason
#31645 commented on
Jul 23, 2024 • 0 new comments -
Training Evaluation Display on VSCode
#22694 commented on
Jul 23, 2024 • 0 new comments -
kwargs pop "attn_implement" twice in modeling_utils.py and configuration_utils.py when using AutoConfig/AutoModel
#32082 commented on
Jul 23, 2024 • 0 new comments -
NonMatchingSplitsSizesError on Flax BART with wiki summary dataset
#29596 commented on
Jul 23, 2024 • 0 new comments -
[flax_llama] Why is the return value of the `create_sinusoidal_positions` truncated by `num_pos`?
#29590 commented on
Jul 23, 2024 • 0 new comments -
FP8 inference and FP8 KV cache
#23660 commented on
Jul 23, 2024 • 0 new comments -
SeamlessM4TFeatureExtractor fails with pad_to_multiple_of not being a multiple of stride
#31916 commented on
Jul 23, 2024 • 0 new comments -
Add MistralForQuestionAnswering
#28908 commented on
Jul 23, 2024 • 0 new comments -
Flash Attention with Gemma 2
#31953 commented on
Jul 23, 2024 • 0 new comments -
static cache implementation is not compatible with attn_implementation==flash_attention_2
#32040 commented on
Jul 23, 2024 • 0 new comments -
Quantization support for heads and embeddings
#31474 commented on
Jul 23, 2024 • 0 new comments -
Race condition when loading models from local folders with custom code
#27421 commented on
Jul 23, 2024 • 0 new comments