Release v0.1.12
Highlights
- Fast JSON Decoding (blog)
- Output logprobs for decode tokens (see the sketch after this list)
- Multiple bug fixes
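
The decode-token logprobs highlight (#130) is exposed through the SRT server's native `/generate` endpoint. A minimal sketch, assuming a server already running on the default port 30000 and a `return_logprob` request flag; the exact field names are assumptions and may differ in this release:

```python
import requests

# Ask the locally running SRT server to return logprobs for the decoded tokens.
# The port and the `return_logprob` flag name are assumptions based on PR #130.
resp = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "The capital of France is",
        "sampling_params": {"max_new_tokens": 8, "temperature": 0},
        "return_logprob": True,  # assumed flag; per-token logprobs come back in the response metadata
    },
)
print(resp.json())
```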
What's Changed
- Fix no-cache mode by @Ying1123 in #136
- Support faster JSON decoding for llava by @hnyls2002 in #137
- Fix undefined variable by @yaya-sy in #142
- jump-forward rename by @hnyls2002 in #144
- Add warmup to SRT server by @comaniac in #146
- add openai error handler with retry and logger by @ChuyueSun in #148
- Temporary fix OpenAI API for Pydantic v1/v2 by @comaniac in #153
- Add gptq quantization model support by @Arcmoon-Hu in #141
- Support decode token logprobs by @comaniac in #130
- Format code & move functions by @merrymercy in #155
- [Submodule] Change FlashInfer to import by @comaniac in #156
- Add `--disable-disk-cache` by @hnyls2002 in #160
- Add Auth Token to RuntimeEndpoint by @nivibilla in #162
- Fix BaseCache metric by @comaniac in #170
- import outlines by @hnyls2002 in #168
- Fix token usage with jump forward by @comaniac in #174
- Support extra field regex in OpenAI API by @comaniac in #172 (see the example after this list)
- Fix the chat template for llava-v1.6-34b & format code by @merrymercy in #177
- Update version to 0.1.12 by @merrymercy in #178
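
With the regex extra field from #172, constrained generation can also be requested through the OpenAI-compatible API. A minimal sketch, assuming an openai v1.x client pointed at a local server on port 30000 and that the field is passed via `extra_body`; the field name follows the PR title, but the exact plumbing and model name are assumptions:

```python
import openai

# Point the standard OpenAI client at the local OpenAI-compatible server.
client = openai.OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="default",  # placeholder model name
    prompt="Output a JSON object with a single name field: ",
    max_tokens=64,
    # Assumed: the server reads a "regex" extra field to constrain decoding.
    extra_body={"regex": r'\{"name": "[A-Za-z ]+"\}'},
)
print(resp.choices[0].text)
```

Combined with jump-forward decoding (#137, #144), the deterministic spans of the regex (braces, key names, quotes) can be emitted without a forward pass per token.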
New Contributors
- @yaya-sy made their first contribution in #142
- @ChuyueSun made their first contribution in #148
- @nivibilla made their first contribution in #162
Full Changelog: v0.1.11...v0.1.12