pytorch-labs / gpt-fast Public

Notifications You must be signed in to change notification settings
Fork 484
Star 5.4k

Code
Issues 63
Pull requests 35
Actions
Projects
Security
Insights

Additional navigation options

Code
Issues
Pull requests
Actions
Projects
Security
Insights

Issues: pytorch-labs/gpt-fast

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

63 Open 30 Closed

Author

Filter by author

Label

Filter by label

Use alt + click/return to exclude labels

or ⇧ + click/return for logical OR

Projects

Filter by project

Milestones

Filter by milestone

Assignee

Filter by who’s assigned

Assigned to nobody

Sort

Sort by

Newest Oldest Most commented Least commented Recently updated Least recently updated Best match

Most reactions

Issues list

tokenizer.model

#186 opened Jun 27, 2024 by hasakikiki

It doesn't accelerate very well at L4

#185 opened Jun 25, 2024 by songh11

getting different acceptance prob when using torch.compile after making a small change.

#184 opened Jun 22, 2024 by kalradivyanshu

Question about the ENABLE_INTRA_NODE_COMM for speculative decoding

#183 opened Jun 22, 2024 by jianc99

GGUF support?

#182 opened Jun 14, 2024 by yukiarimo

Hard-coded Llama-3 model name pattern matching breaks scripts/convert_hf_checkpoint.py

#177 opened May 31, 2024 by ephremw

Missing Keys in state_dict

#172 opened May 6, 2024 by bjohn22

Tensor Parallel Inside notebook

#167 opened Apr 29, 2024 by nivibilla

mmap issue in bf16 of gpt-fast

#165 opened Apr 28, 2024 by yanbing-j

Naming: n_local_heads -> n_kv_heads

#162 opened Apr 23, 2024 by ad8e

INT4 quantization not working on MI210

#154 opened Apr 8, 2024 by yafehlis

int8 Woq raise Codegen Error with --compile_prefill

#144 opened Mar 22, 2024 by yanbing-j

Question about large sequence length attention kernels

#140 opened Mar 19, 2024 by loubbrad

CUDA error if enabling compile_prefill for quantization model (int8)

#137 opened Mar 14, 2024 by yanboliang

int4/int4-gptq support in Mixtral 8x7B

#129 opened Mar 11, 2024 by yanbing-j

Reducing Latency in Application with Torch Compilation: Initialization and Inference Optimization

#127 opened Mar 8, 2024 by daniyal214

index out of range: No transformer config could be loaded

#126 opened Mar 8, 2024 by SinanAkkoyun

Int4 perplexity

#125 opened Mar 7, 2024 by SinanAkkoyun

Question about the gennerated code of WeightOnlyInt8Linear

#114 opened Feb 29, 2024 by feiyuvl

batching/dynamic batching

#112 opened Feb 27, 2024 by nivibilla

Try Tensor Parallel on a server equipped with two V100 linked by NVLINK, but got a performance degradation

#111 opened Feb 27, 2024 by duanzhaol

What happens to bias during int8 quantization?

#108 opened Feb 24, 2024 by gchhablani

Questions on Speculative Decoding in gpt-fast generate.py

#107 opened Feb 23, 2024 by hxer7963

Bandwidth achieved for INT8 is much smaller than FP16

#99 opened Feb 6, 2024 by yafehlis

I try to speed up with llava,but this it slower then eager mode,why?

#92 opened Jan 31, 2024 by bleedingfight

Previous 1 2 3 Next

Previous Next

ProTip! Mix and match filters to narrow down what you’re looking for.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly