
Releases: abgulati/LARS

v2.0-beta8: Field-Tested Requirements & Parsing Refinements

27 Sep 20:19
  • Removed unnecessary requirements & version specifications

  • Verified requirements installation from txt file on Windows and Linux

  • Merged HF-Waitress response-parsing refinements from LARS-Enterprise

Full Changelog: v2.0-beta7...v2.0-beta8

v2.0-beta7: Updated Requirements & Various Bug Fixes Merged from LARS-Enterprise Repository

27 Sep 18:27
  • requirements.txt files for all three platforms updated following various user complaints

  • bug-fixes implemented in LARS-Enterprise have been merged to public repo

Full Changelog: v2.0-beta6...v2.0-beta7

v2.0-beta6: Major HF-Waitress LLM Server Update

10 Sep 22:44
  • HF-Waitress: /completions_stream now implements a custom TextStreamer that redirects only its output to the stream buffer while leaving STDOUT unmodified, allowing other non-blocked routes and methods to execute and print to STDOUT in parallel without interfering with the stream

  • CSS separated into a dedicated file

  • Minor QoL changes
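
The streaming change above follows a well-known pattern: instead of the default TextStreamer behaviour of printing decoded tokens to STDOUT, finalized text is pushed into a per-request buffer that the /completions_stream route drains. Below is a minimal, self-contained sketch of that pattern; the class and method names mirror the transformers streamer interface but this is illustrative, not LARS's actual code (the real implementation would subclass transformers.TextStreamer, and transformers.TextIteratorStreamer implements essentially this queue-based approach):

```python
from queue import Queue

class BufferedStreamer:
    """Illustrative sketch: redirect generated text to a queue, not STDOUT."""

    def __init__(self):
        self.buffer = Queue()

    def on_finalized_text(self, text: str, stream_end: bool = False):
        # Push decoded text into the stream buffer; STDOUT stays untouched,
        # so logging from other routes can proceed in parallel.
        self.buffer.put(text)
        if stream_end:
            self.buffer.put(None)  # sentinel: generation finished

    def drain(self):
        # Generator the streaming route can iterate to yield chunks to the client.
        while True:
            chunk = self.buffer.get()
            if chunk is None:
                break
            yield chunk
```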

Full Changelog: v2.0-beta5...v2.0-beta6

v2.0-beta5: UI Enhancements

09 Sep 22:19
  • New font-family, glassmorphism and title bar

Full Changelog: v2.0-beta4...v2.0-beta5

v2.0-beta4: HQQ Fix and Minor Refinements

06 Sep 23:08
  • BUG FIX: HQQ quantization would error out if torch.dtype (dataType) was set to auto; it is now force-set to torch.bfloat16

  • BUG FIX: The 'Add new LLM' button now re-displays when the HF-Waitress LLM list is closed and re-opened

  • Minor response-formatting adjustment
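
The HQQ fix above amounts to a small dtype guard. A sketch of the logic, with dtypes represented as strings so it stands alone (the real code would use torch.bfloat16; the function and parameter names here are hypothetical):

```python
def resolve_torch_dtype(quant_method: str, configured_dtype: str) -> str:
    """Pick the effective torch dtype for model loading (illustrative sketch)."""
    if quant_method == "hqq" and configured_dtype == "auto":
        # HQQ errors out on 'auto', so pin an explicit dtype instead
        return "bfloat16"
    return configured_dtype
```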

Full Changelog: v2.0-beta3...v2.0-beta4

v2.0-beta3

06 Sep 00:17
  • Fixed HF-Waitress streaming-response formatting!

  • Improved app load times from tuned server health-check intervals

  • Minor performance improvement to HF-Waitress streaming-output

  • Minor refinements to HF-Waitress server status outputs
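
The load-time improvement above comes from tuning how often the app polls the LLM server's health endpoint at startup. A minimal sketch of interval-based polling with an injectable probe, so the loop unblocks as soon as the server reports healthy instead of waiting out a long fixed delay (names and defaults here are hypothetical, not LARS's actual values):

```python
import time

def wait_until_healthy(probe, interval=0.25, timeout=30.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll `probe` (e.g. an HTTP GET against /health returning True on 200)
    every `interval` seconds until it succeeds or `timeout` elapses."""
    deadline = clock() + timeout
    while clock() < deadline:
        if probe():
            return True      # server is up; app can proceed immediately
        sleep(interval)      # short, tuned interval keeps app load snappy
    return False
```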

Full Changelog: v2.0-beta2...v2.0-beta3

v2.0-beta2: Enhanced HF-Waitress LLM Management Features, Error-Reporting Refinements and Bug Fixes

05 Sep 00:25
  1. Enhanced HF-Waitress LLM Management: add new model_ids, search-filter and sort the list of LLMs, and delete LLM IDs from the HF-Waitress LLM dropdown list
  2. HF-Waitress server health-check reporting improvements
  3. Various bug fixes: Reference to index_dir removed, document_records SQL-DB correctly created on very first run, removed troublesome test-prints during document-chunking operation

Full Changelog: v2.0-beta1...v2.0-beta2

v2.0-beta1: New LLM Server -- HF-Waitress!

30 Aug 01:11

HF-Waitress is a powerful and flexible server application for deploying and interacting with HuggingFace Transformer models. It simplifies the process of running open-source Large Language Models (LLMs) locally on-device, addressing common pain points in model deployment and usage.

This server enables loading HF-Transformer & AWQ-quantized models directly off the hub, while providing on-the-fly quantization via BitsAndBytes, HQQ and Quanto for the former. It negates the need to manually download any model yourself, working simply off the model's name instead. It requires no setup, and provides concurrency and streaming responses, all from within a single, easily-portable, platform-agnostic Python script.

For a full list of features see: https://github.com/abgulati/hf-waitress

As a result, LARS is far easier to deploy and get working on the very first run, without requiring the user to manually download and place their LLMs.

Check out the updated Dependencies, Installation and Usage Instructions in the README

Note: containers are not yet updated; they will most likely be updated in the coming week.

Full Changelog: v1.9.1...v2.0-beta1

v1.9.1 - Re-ranker Robustness & Minor UI Tweak

21 Aug 20:48
  • BUG FIX: Re-ranking is now bypassed when do_rag=False - an empty document list no longer produces an error!
  • Minor UI change: Adjusted max-width of the Settings modal to 75% for better use of available screen space
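
The re-ranking fix above reduces to a simple guard: when RAG is disabled there are no retrieved documents, so re-ranking must be skipped entirely rather than invoked on an empty list. A sketch of that guard (function and parameter names are hypothetical, not LARS's actual code):

```python
def maybe_rerank(docs, do_rag, rerank_fn):
    """Re-rank retrieved documents only when RAG is on and docs exist."""
    if not do_rag or not docs:
        return docs          # bypass: nothing to re-rank, avoids the empty-list error
    return rerank_fn(docs)
```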

Full Changelog: v1.9...v1.9.1

v1.9 - Vector Re-Ranking & No More Whoosh

21 Aug 01:52
  1. Custom document chunker appends page-number data as metadata to chunks stored in the vectorDB
  2. LLM can now supply specific document names and page numbers within the response itself!
  3. Re-ranking and filtering applied via SentenceTransformer('all-MiniLM-L6-v2') to the vectorDB similarity search results for better contextual accuracy
  4. Whoosh indexing no longer necessary - far simpler book-keeping and no overhead for page-number searches at inference time
  5. Page number accuracy significantly increased as a result of all the above
  6. Default system-prompt template now instructs the LLM to include document names and page numbers whenever additional context is provided; actual output depends on the capabilities of the specific LLM used
  7. BUG FIX: PDF tabs in the document-viewer in the response window did not open properly for consecutive questions and on chat-history load. FIXED.
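
The re-ranking step in point 3 scores each retrieved chunk against the query by embedding similarity and keeps the best matches. A self-contained sketch of that logic with a caller-supplied embed function standing in for SentenceTransformer('all-MiniLM-L6-v2') (the top_k and min_score knobs are hypothetical, not LARS's actual parameters):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rerank(query, chunks, embed, top_k=5, min_score=0.0):
    """Re-order vectorDB hits by cosine similarity to the query; filter weak hits."""
    q_vec = embed(query)
    scored = [(cosine(q_vec, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)   # best match first
    return [c for score, c in scored[:top_k] if score >= min_score]
```

In the real pipeline `embed` would be `SentenceTransformer('all-MiniLM-L6-v2').encode`, applied to the similarity-search results before they are passed to the LLM as context.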

Full Changelog: v1.8...v1.9