{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":694951609,"defaultBranch":"main","name":"text-generation-inference","ownerLogin":"zhangsibo1129","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2023-09-22T03:14:58.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/134488188?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1696838695.0","currentOid":""},"activityList":{"items":[{"before":"b16ba86d414bd6d25552e926481b30d94cd151c7","after":"d0463ce151d06dee1d3314faf4e018f57579271e","ref":"refs/heads/add_npu_support","pushedAt":"2023-10-09T11:57:18.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"add NPU support","shortMessageHtmlLink":"add NPU support"}},{"before":"5be4046886a49aa608a8a0290fa4a5bdfbc0c8ef","after":"b16ba86d414bd6d25552e926481b30d94cd151c7","ref":"refs/heads/add_npu_support","pushedAt":"2023-10-09T11:44:41.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"add NPU support","shortMessageHtmlLink":"add NPU support"}},{"before":"25bd6b2384b03289863383a43c5240a4058bba60","after":"5be4046886a49aa608a8a0290fa4a5bdfbc0c8ef","ref":"refs/heads/add_npu_support","pushedAt":"2023-10-09T11:24:33.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"add NPU support","shortMessageHtmlLink":"add NPU support"}},{"before":null,"after":"25bd6b2384b03289863383a43c5240a4058bba60","ref":"refs/heads/add_npu_support","pushedAt":"2023-10-09T08:04:55.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"add NPU support","shortMessageHtmlLink":"add NPU support"}},{"before":"c5de7cd88679bc0331185c9cee75e4f68412243d","after":"00b8f36fba62e457ff143cce35564ac6704db860","ref":"refs/heads/main","pushedAt":"2023-10-09T07:15:19.000Z","pushType":"push","commitsCount":32,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"Prepare for v1.1.1 (#1100)\n\n# What does this PR do?\r\n\r\n\r\n\r\n\r\n\r\nFixes # (issue)\r\n\r\n\r\n## Before submitting\r\n- [ ] This PR fixes a typo or improves the docs (you can dismiss the\r\nother checks if that's the case).\r\n- [ ] Did you read the [contributor\r\nguideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),\r\n Pull Request section?\r\n- [ ] Was this discussed/approved via a Github issue or the\r\n[forum](https://discuss.huggingface.co/)? 
Please add a link\r\n to it if that's the case.\r\n- [ ] Did you make sure to update the documentation with your changes?\r\nHere are the\r\n[documentation\r\nguidelines](https://github.com/huggingface/transformers/tree/main/docs),\r\nand\r\n[here are tips on formatting\r\ndocstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).\r\n- [ ] Did you write any new necessary tests?\r\n\r\n\r\n## Who can review?\r\n\r\nAnyone in the community is free to review the PR once the tests have\r\npassed. Feel free to tag\r\nmembers/contributors who may be interested in your PR.\r\n\r\n","shortMessageHtmlLink":"Prepare for v1.1.1 (huggingface#1100)"}},{"before":"649d9754b1d1710ba2cf2f3350dad0397fac211b","after":"9c0f679d1d05b052a8b88b4a4fee142dc130a350","ref":"refs/heads/fix-convert-bug","pushedAt":"2023-09-26T13:04:45.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"Narsil","name":"Nicolas Patry","path":"/Narsil","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/204321?s=80&v=4"},"commit":{"message":"Simpler fix.","shortMessageHtmlLink":"Simpler fix."}},{"before":null,"after":"57433201b2339187c277f2cf514822c102eeba5d","ref":"refs/heads/fix-shared-weights-bug","pushedAt":"2023-09-26T09:58:34.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"Fix shared weights load bug and T5 loading","shortMessageHtmlLink":"Fix shared weights load bug and T5 loading"}},{"before":null,"after":"99da7ce121bfaae61cc147688607585341e8e5a3","ref":"refs/heads/fix-opt-load-bug","pushedAt":"2023-09-26T05:41:45.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"Complete OPT load parameters","shortMessageHtmlLink":"Complete OPT load parameters"}},{"before":"7d0aaede63f3765072e570c1ec08cbfe8c7c425d","after":null,"ref":"refs/heads/fix-opt-load-bug","pushedAt":"2023-09-26T05:36:06.000Z","pushType":"branch_deletion","commitsCount":0,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"}},{"before":null,"after":"7d0aaede63f3765072e570c1ec08cbfe8c7c425d","ref":"refs/heads/fix-opt-load-bug","pushedAt":"2023-09-26T05:24:30.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"Complete OPTDecoder load parameters","shortMessageHtmlLink":"Complete OPTDecoder load parameters"}},{"before":null,"after":"5f76dae04a7a5dfe24bf2e78892fe6cf29cab432","ref":"refs/heads/support-local-config","pushedAt":"2023-09-26T02:50:52.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"support local model config file","shortMessageHtmlLink":"support local model config 
file"}},{"before":"123749a3c999e32db798667041a4a9589d217c8e","after":"c5de7cd88679bc0331185c9cee75e4f68412243d","ref":"refs/heads/main","pushedAt":"2023-09-26T02:40:18.000Z","pushType":"push","commitsCount":2,"pusher":{"login":"zhangsibo1129","name":null,"path":"/zhangsibo1129","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/134488188?s=80&v=4"},"commit":{"message":"Add AWQ quantization inference support (#1019) (#1054)\n\n# Add AWQ quantization inference support\r\n\r\nFixes\r\nhttps://github.com/huggingface/text-generation-inference/issues/781\r\n\r\nThis PR (partially) adds support for AWQ quantization for inference.\r\nMore information on AWQ [here](https://arxiv.org/abs/2306.00978). In\r\ngeneral, AWQ is faster and more accurate than GPTQ, which is currently\r\nsupported by TGI.\r\n\r\nThis PR installs 4-bit GEMM custom CUDA kernels released by AWQ authors\r\n(in `requirements.txt`, just one line change).\r\n\r\nQuick way to test this PR would be bring up TGI as follows:\r\n\r\n```\r\ntext-generation-server download-weights abhinavkulkarni/codellama-CodeLlama-7b-Python-hf-w4-g128-awq\r\n\r\ntext-generation-launcher \\\r\n--huggingface-hub-cache ~/.cache/huggingface/hub/ \\\r\n--model-id abhinavkulkarni/codellama-CodeLlama-7b-Python-hf-w4-g128-awq \\\r\n--trust-remote-code --port 8080 \\\r\n--max-input-length 2048 --max-total-tokens 4096 --max-batch-prefill-tokens 4096 \\\r\n--quantize awq\r\n```\r\n\r\nPlease note:\r\n* This PR was tested with FlashAttention v2 and vLLM.\r\n* This PR adds support for AWQ inference, not quantizing the models.\r\nThat needs to be done outside of TGI, instructions\r\n\r\n[here](https://github.com/mit-han-lab/llm-awq/tree/f084f40bd996f3cf3a0633c1ad7d9d476c318aaa).\r\n* This PR only adds support for `FlashLlama` models for now.\r\n* Multi-GPU setup has not been tested. \r\n* No integration tests have been added so far, will add later if\r\nmaintainers are interested in this change.\r\n* This PR can be tested on any of the models released\r\n\r\n[here](https://huggingface.co/abhinavkulkarni?sort_models=downloads#models).\r\n\r\nPlease refer to the linked issue for benchmarks for\r\n\r\n[abhinavkulkarni/meta-llama-Llama-2-7b-chat-hf-w4-g128-awq](https://huggingface.co/abhinavkulkarni/meta-llama-Llama-2-7b-chat-hf-w4-g128-awq)\r\nvs\r\n\r\n[TheBloke/Llama-2-7b-Chat-GPTQ](https://huggingface.co/TheBloke/Llama-2-7b-Chat-GPTQ).\r\n\r\nPlease note, AWQ has released faster (and in case of Llama, fused)\r\nkernels for 4-bit GEMM, currently at the top of the `main` branch at\r\nhttps://github.com/mit-han-lab/llm-awq, but this PR uses an older commit\r\nthat has been tested to work. We can switch to latest commit later on.\r\n\r\n## Who can review?\r\n\r\n@OlivierDehaene OR @Narsil\r\n\r\n---------\r\n\r\n\r\n\r\n# What does this PR do?\r\n\r\n\r\n\r\n\r\n\r\nFixes # (issue)\r\n\r\n\r\n## Before submitting\r\n- [ ] This PR fixes a typo or improves the docs (you can dismiss the\r\nother checks if that's the case).\r\n- [ ] Did you read the [contributor\r\nguideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),\r\n Pull Request section?\r\n- [ ] Was this discussed/approved via a Github issue or the\r\n[forum](https://discuss.huggingface.co/)? 
  Who can review: @OlivierDehaene or @Narsil.

  Co-authored-by: Abhinav M Kulkarni
  Co-authored-by: Abhinav Kulkarni

- **2023-09-25 02:44** — zhangsibo1129 created branch `fix-convert-bug` at `649d975`: "fix discard_names in safetensors conversion"