Does anyone know if this work is implemented on llama? #18

Open

wyxscir opened this issue Jan 25, 2024 · 3 comments

Comments


wyxscir commented Jan 25, 2024

Does anyone know if this work has been implemented on LLaMA? Or is there any similar dynamic pruning work for LLaMA?


XieWeikai commented Feb 19, 2024

This method doesn't work very well on LLaMA. LLaMA uses the SiLU activation function, whose inherent activation sparsity is not very high.
One work mentioned that it is possible to replace SiLU with ReLU and retrain LLaMA to improve sparsity.
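For concreteness, here is a minimal sketch of that "replace SiLU with ReLU" idea; it is not part of this repository. It uses the Hugging Face transformers LLaMA implementation with a tiny, randomly initialized config (the sizes are arbitrary placeholders), and it assumes LlamaMLP exposes its activation as an `act_fn` attribute. As noted above, a real model would still need retraining after the swap to recover quality.

```python
# Sketch: swap LLaMA's SiLU activation for ReLU and check activation sparsity.
# Assumptions: Hugging Face `transformers` LlamaMLP stores its activation in
# `act_fn`; the tiny config below is a placeholder, not a real checkpoint.
import torch
import torch.nn as nn
from transformers import LlamaConfig, LlamaForCausalLM
from transformers.models.llama.modeling_llama import LlamaMLP

config = LlamaConfig(
    vocab_size=1000, hidden_size=256, intermediate_size=688,
    num_hidden_layers=2, num_attention_heads=4,
)
model = LlamaForCausalLM(config)  # hidden_act defaults to "silu"

# Swap SiLU -> ReLU in every MLP block (retraining would be required afterwards).
for module in model.modules():
    if isinstance(module, LlamaMLP):
        module.act_fn = nn.ReLU()

# Hook the first MLP's activation to record the fraction of exact zeros.
sparsities = []
first_mlp = next(m for m in model.modules() if isinstance(m, LlamaMLP))
first_mlp.act_fn.register_forward_hook(
    lambda mod, inp, out: sparsities.append((out == 0).float().mean().item())
)

with torch.no_grad():
    dummy_ids = torch.randint(0, config.vocab_size, (1, 16))
    model(dummy_ids)

print(f"zero fraction after ReLU: {sparsities[0]:.2%}")  # roughly 50% at random init
```

With SiLU the zero fraction would be essentially 0%, which is why activation-sparsity-based dynamic pruning has less to exploit on stock LLaMA.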


wyxscir commented Feb 20, 2024

> This method doesn't work very well on LLaMA. LLaMA uses the SiLU activation function, whose inherent activation sparsity is not very high. One work mentioned that it is possible to replace SiLU with ReLU and retrain LLaMA to improve sparsity.

Thank you.


wyxscir commented Feb 20, 2024

> This method doesn't work very well on LLaMA. LLaMA uses the SiLU activation function, whose inherent activation sparsity is not very high. One work mentioned that it is possible to replace SiLU with ReLU and retrain LLaMA to improve sparsity.

The work you mentioned may be “ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models”.
