Release v0.1.12
Highlights
- Fast JSON Decoding (blog)
- Output logprobs for decode tokens (see the sketch after this list)
- Multiple bug fixes
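
The decode-token logprobs highlight (#130) is exposed through the SRT server's native `/generate` endpoint. A minimal sketch, assuming a server already running on the default port 30000 and a `return_logprob` request flag; the exact field names are assumptions and may differ in this release:

```python
import requests

# Ask the locally running SRT server to return logprobs for the decoded tokens.
# The port and the `return_logprob` flag name are assumptions based on PR #130.
resp = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 8, "temperature": 0},
        "return_logprob": True,  # assumed flag; per-token logprobs come back in the response metadata
    },
)
print(resp.json())
```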
What's Changed
- Fix no-cache mode by @Ying1123 in #136
- Support faster JSON decoding for llava by @hnyls2002 in #137
- Fix undefined variable by @yaya-sy in #142
- jump-forward rename by @hnyls2002 in #144
- Add warmup to SRT server by @comaniac in #146
- add openai error handler with retry and logger by @ChuyueSun in #148
- Temporary fix OpenAI API for Pydantic v1/v2 by @comaniac in #153
- Add gptq quantization model support by @Arcmoon-Hu in #141
- Support decode token logprobs by @comaniac in #130
- Format code & move functions by @merrymercy in #155
- [Submodule] Change FlashInfer to import by @comaniac in #156
- Add `--disable-disk-cache` by @hnyls2002 in #160
- Add Auth Token to RuntimeEndpoint by @nivibilla in #162
- Fix BaseCache metric by @comaniac in #170
- import outlines by @hnyls2002 in #168
- Fix token usage with jump forward by @comaniac in #174
- Support extra field regex in OpenAI API by @comaniac in #172 (see the example after this list)
- Fix the chat template for llava-v1.6-34b & format code by @merrymercy in #177
- Update version to 0.1.12 by @merrymercy in #178
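
With the regex extra field from #172, constrained generation can also be requested through the OpenAI-compatible API. A minimal sketch, assuming an openai v1.x client pointed at a local server on port 30000 and that the field is passed via `extra_body`; the field name follows the PR title, but the exact plumbing and model name are assumptions:

```python
import openai

# Point the standard OpenAI client at the local OpenAI-compatible server.
client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="default",  # placeholder model name
    prompt="Output a JSON object with a single name field: ",
    max_tokens=64,
    # Assumed: the server reads a "regex" extra field to constrain decoding.
    extra_body={"regex": r'\{"name": "[A-Za-z ]+"\}'},
)
print(resp.choices[0].text)
```

Combined with jump-forward decoding (#137, #144), the deterministic spans of the regex (braces, key names, quotes) can be emitted without a forward pass per token.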
New Contributors
- @yaya-sy made their first contribution in #142
- @ChuyueSun made their first contribution in #148
- @nivibilla made their first contribution in #162
Full Changelog: v0.1.11...v0.1.12