Releases: RWKV/rwkv.cpp
master-7199f5b
Phase out very verbose element_count functions (#95)
* Phase out very verbose element_count functions. This could have been done better.
* Add "get" to the other getters
* Specify "float elements" in the rwkv_get_logits_len docs
* Use a traditional for-loop for rwkv_init_state writes
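The rwkv_init_state change above fills the state buffer with a plain for-loop. A minimal Python sketch of what such an initializer does — the layout assumed here (five state vectors per layer, with the att_pp part primed to a huge negative value so it behaves like -infinity in the numerically stable WKV computation) follows RWKV v4-style inference state and is an assumption for illustration, not code from the repository:

```python
# Illustrative sketch (not the library's C code) of initializing an RWKV v4
# inference state. Assumed layout: 5 vectors per layer; part index 4 (att_pp)
# starts at -1e30 as a stand-in for -infinity, everything else at zero.

def init_state(n_layer: int, n_embd: int) -> list[float]:
    state = [0.0] * (n_layer * 5 * n_embd)
    for layer in range(n_layer):
        pp_offset = (layer * 5 + 4) * n_embd  # att_pp part of this layer
        for i in range(n_embd):
            state[pp_offset + i] = -1e30
    return state
```

A plain indexed loop like this is easy to translate one-to-one into the C implementation, which is presumably why the commit moved away from fancier iteration.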
master-43c78f2
Fix build on non-MSVC compilers for Windows platforms (#94)
master-b88ae59
Fix bug in world tokenizer (#93)
master-82c4ac7
Add support for the world tokenizer (#86)
* Add support for the world tokenizer
* Move tokenizer logic to rwkv_tokenizer.py
* Add a test for the tokenizer
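The World tokenizer is a greedy longest-match tokenizer over bytes (the actual implementation lives in rwkv_tokenizer.py and uses a trie for efficient matching). A minimal Python sketch of the matching scheme, with a made-up toy vocabulary — this illustrates the technique only, not the real vocabulary or code:

```python
# Greedy longest-match tokenization sketch. At each position, try the longest
# possible piece first and fall back to shorter ones. The vocabulary below is
# a toy example, not the real World vocabulary.

def tokenize(text: bytes, vocab: dict[bytes, int]) -> list[int]:
    longest = max(len(tok) for tok in vocab)
    ids, pos = [], 0
    while pos < len(text):
        for length in range(min(longest, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in vocab:
                ids.append(vocab[piece])
                pos += length
                break
        else:
            raise ValueError(f"no token for byte {text[pos]!r}")
    return ids

vocab = {b"a": 1, b"b": 2, b"ab": 3, b"abc": 4}
print(tokenize(b"abab", vocab))  # [3, 3] -- "ab" beats "a" at each step
```

A trie replaces the inner "try every length" loop with a single walk down the tree, which is what makes the real implementation fast on a large vocabulary.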
master-09ec314
Fix visual bug in quantization (#92)
It didn't calculate the compression ratio properly because of a copy/paste error.
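The bug was in reporting only: the compression ratio printed after quantization. For reference, the quantity is just the original size divided by the quantized size — a tiny illustrative helper, not the repository's code:

```python
# Compression ratio as reported after quantization: how many times smaller
# the quantized file is than the original. Sizes here are made-up examples.

def compression_ratio(orig_bytes: int, quantized_bytes: int) -> float:
    return orig_bytes / quantized_bytes

print(round(compression_ratio(16_000_000, 4_500_000), 2))  # 3.56
```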
master-fb6708b
Fix PyTorch storage warnings, fixes #80 (#88)
The storage type of loaded tensors doesn't matter here, so accept any of them and silence the warnings.
master-5b41cd7
Add capability for extra binaries to be built with rwkv.cpp (#87)
* Add capability for examples. This also adds a quantizer that works without Python; in the future, converting from PyTorch may be possible without Python as well.
* Use the implied code style in the example
* Rename examples to tools, then settle on the final name: extras
* Rename cpuinfo.c to cpu_info.c
* Include the ggml header again
* Return EXIT_FAILURE on help
master-3f8bb2c
Allow creating multiple contexts per model (#83)
* Allow creating multiple contexts per model. This allows parallel inference, in preparation for supporting sequence mode with a similar method.
* Fix cuBLAS
* Inherit print_errors from the parent context when cloning
* Add a context cloning test
* Free the ggml context when the last rwkv_context is freed
* Add an explanation of ffn_key_size
* Update the rwkv_instance and rwkv_context comments
* Add thread-safety notes

Co-authored-by: Alex <[email protected]>
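The design this change introduces is one immutable model instance (the weights) shared by any number of contexts, each holding its own mutable state, so several sequences can be evaluated in parallel over the same weights. A Python sketch of that split — every name here is illustrative, not the library's actual C API:

```python
# Sketch of the shared-model / per-context-state design. The frozen Model
# stands in for the read-only weights (rwkv_instance); Context stands in for
# the per-sequence rwkv_context. Names and fields are made up for illustration.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Model:  # read-only after load, so it is safe to share across contexts
    n_layer: int
    n_embd: int


@dataclass
class Context:  # per-sequence mutable data
    model: Model
    state: list = field(default_factory=list)


def clone_context(ctx: Context) -> Context:
    # A clone shares the same model but gets an independent copy of the state.
    return Context(model=ctx.model, state=list(ctx.state))


base = Context(Model(n_layer=24, n_embd=1024), state=[0.0])
clone = clone_context(base)
assert clone.model is base.model       # weights shared
assert clone.state is not base.state   # state independent
```

Reference-counting the shared part (so the weights are freed only when the last context goes away) matches the "free ggml context when the last rwkv_context is freed" bullet above.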
master-363dfb1
File parsing and memory usage optimization (#74)
* Rework the entire file parsing system in preparation for future changes
* Estimate memory usage precisely, removing whatever issue with small models used to exist
* Fix file stream operations on macOS; compiles on Windows 11, Ubuntu 20.04, and macOS 10.14
* Fix rwkv.cpp for non-WIN32 MSVC invocations such as bindgen-rs
* Implement Q8_1 quantization, then remove the type completely: GGML doesn't support the ops required to run inference with it, and Q8_0 is very similar if one wants 8-bit quantization, so it isn't worth any nasty hacks or workarounds right now
* Switch from std::vector to a unique array for model layers, since they never need to be resized
* Factor ffn.key.weight height into the memory estimate: some models set it in various unusual ways, so record its actual size and use that
* Make a few more operations inplace, then revert it (f94d6eb): ggml doesn't currently expose most of what it supports, and the memory savings from forcing it weren't clearly worth it
* Attempt a tight upper bound for the scratch space: the largest work_size seen in any model, since it is always larger than any of the other parameters except vocab (which does not participate in the graph work size)
* Make fewer calls to fread (micro-optimization)
* Fix memory size estimation for smaller models; ggml works with some larger formats internally
* Print the location in all assert macros
* Add a type_to_string entry for unknown types
* Simplify quantization a bit
* Fix cuBLAS compatibility: adding n_gpu_layers to rwkv_init_from_file won't work, so add an extra function instead
* Fix quantize; don't create the output file if opening the input fails
* Rename "gpu offload layers", avoiding cuBLAS branding in case something like CLBlast support is added later
* Remove the old read_int32 and write_int32 functions; it's all uints now
* Remove static from things
* Only call gpu_offload_layers if gpu_layer_count > 0
* Add the rwkv_ prefix to all structures; fix braces, the function naming convention, and comments (capitalize them, re-add the quantize and histogram comments); convert all error messages to uppercase
* Make type conversions extern for FFI bindings from other languages
* Name the state parts: the code in rwkv_eval that initializes the state when state_in is NULL was getting very confusing, so everything is now in a named struct
master-241350f
Add cuBLAS support (#65)
* Add a ggml import at the head of rwkv.h
* Add cuBLAS support and offload tensors to the GPU
* Fix a Linux build issue
* Comment out tensors which cause errors on the GPU
* Update the comments and README, adding more performance test results
* Update ggml to a recent version
* Fix reading files larger than 2 GB

Co-authored-by: Alex <[email protected]>