RWKV Runner

This project aims to remove the barriers to using large language models by automating everything for you. All you need is a lightweight executable of just a few megabytes. The project also provides an interface compatible with the OpenAI API, so every ChatGPT client can act as an RWKV client.

English | 简体中文 | 日本語

Install

FAQs | Preview | Download | Server-Deploy-Examples

Tip: You can deploy backend-python on a server and use this program as a client only. Enter your server address in the `API URL` field on the Settings page.

The default configs enable custom CUDA kernel acceleration, which is much faster and uses much less VRAM. If you run into compatibility issues (for example, garbled output), go to the Configs page and turn off `Use Custom CUDA kernel to Accelerate`, or try upgrading your GPU driver.

If Windows Defender claims this is a virus, you can try downloading v1.3.7_win.zip and letting it update automatically to the latest version, or add it to the trusted list (`Windows Security` -> `Virus & threat protection` -> `Manage settings` -> `Exclusions` -> `Add or remove exclusions` -> `Add an exclusion` -> `Folder` -> `RWKV-Runner`).

For different tasks, adjusting API parameters can achieve better results. For example, for translation tasks, you can try setting Temperature to 1 and Top_P to 0.3.
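As a minimal sketch of the translation settings above, the snippet below builds a chat request body that carries `temperature` and `top_p`. The field names follow the OpenAI chat completions format, which this project states it is compatible with. The helper names `build_translation_request` and `send`, the prompt wording, and the response layout mentioned in the comments are illustrative assumptions, not part of RWKV Runner.

```python
import json
import urllib.request

# URL copied from the stress-testing example in this README;
# adjust the scheme, host, and port to match your server.
API_URL = "https://127.0.0.1:8000/chat/completions"


def build_translation_request(text, target_language="English",
                              temperature=1.0, top_p=0.3):
    """Build an OpenAI-style chat body with the suggested translation settings."""
    prompt = f"Translate the following text into {target_language}:\n{text}"
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
    }


def send(body, url=API_URL):
    """POST the body as JSON and return the decoded reply.

    Requires a running server. The reply is assumed to follow the OpenAI
    layout, i.e. reply["choices"][0]["message"]["content"].
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a model running, `send(build_translation_request("我是个女孩"))` should return the server's JSON reply. Other tasks may work better with different values, so treat 1 and 0.3 as a starting point.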

Features

RWKV model management and one-click startup
Fully compatible with the OpenAI API, making every ChatGPT client an RWKV client. After starting the model, open https://127.0.0.1:8000/docs to view more details.
Automatic dependency installation, requiring only a lightweight executable program
Includes configs for 2 GB to 32 GB of VRAM, so it works well on almost all computers
User-friendly chat and completion interaction interface included
Parameter configuration that is easy to understand and operate
Built-in model conversion tool
Built-in download management and remote model inspection
Built-in one-click LoRA Finetune
Can also be used as an OpenAI ChatGPT and GPT-Playground client
Multilingual localization
Theme switching
Automatic updates

API Concurrency Stress Testing

ab -p body.json -T application/json -c 20 -n 100 -l https://127.0.0.1:8000/chat/completions

body.json:

{
  "messages": [
    {
      "role": "user",
      "content": "Hello"
    }
  ]
}
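If ApacheBench is not available, a similar load can be generated from Python. The sketch below is a rough stand-in for the `ab` command above (100 requests, 20 at a time), not an equivalent benchmark. The names `post_once` and `stress_test` and the fields of the returned summary are hypothetical and chosen only for this example.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# URL and body copied from the ab example above; adjust to match your server.
API_URL = "https://127.0.0.1:8000/chat/completions"
BODY = {"messages": [{"role": "user", "content": "Hello"}]}


def post_once(url=API_URL, body=BODY, timeout=60):
    """Send one request and return (succeeded, elapsed_seconds)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            response.read()
            ok = 200 <= response.status < 300
    except OSError:  # URLError, HTTPError, and timeouts are OSError subclasses
        ok = False
    return ok, time.perf_counter() - start


def stress_test(call=post_once, total=100, concurrency=20):
    """Run `call` `total` times with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: call(), range(total)))
    durations = [elapsed for _, elapsed in results]
    return {
        "total": total,
        "succeeded": sum(1 for ok, _ in results if ok),
        "mean_seconds": sum(durations) / len(durations) if durations else 0.0,
        "max_seconds": max(durations) if durations else 0.0,
    }
```

With a model running, `print(stress_test())` reports how many requests succeeded and how long they took. Passing a different `call` makes it possible to try the harness without a server.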

Embeddings API Example

Note: v1.4.0 improved the quality of the embeddings API. The generated results are not compatible with those from previous versions. If you use the embeddings API to build knowledge bases or similar, please regenerate them.

If you are using langchain, use `OpenAIEmbeddings(openai_api_base="https://127.0.0.1:8000", openai_api_key="sk-")`.

import numpy as np
import requests


def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


values = [
    "I am a girl",
    "我是个女孩",
    "私は女の子です",
    "广东人爱吃福建人",
    "我是个人类",
    "I am a human",
    "that dog is so cute",
    "私はねこむすめです、にゃん♪",
    "宇宙级特大事件！号外号外！"
]

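# Request one embedding per input string from the local server
embeddings = []
for v in values:
    r = requests.post("https://127.0.0.1:8000/embeddings", json={"input": v})
    r.raise_for_status()  # fail early on HTTP errors
    embedding = r.json()["data"][0]["embedding"]
    embeddings.append(embedding)

# Compare every embedding against the first input
compared_embedding = embeddings[0]

embeddings_cos_sim = [cosine_similarity(compared_embedding, e) for e in embeddings]

# Print the inputs from most to least similar
for i in np.argsort(embeddings_cos_sim)[::-1]:
    print(f"{embeddings_cos_sim[i]:.10f} - {values[i]}")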
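For the knowledge-base use case mentioned in the note above, the same cosine ranking can be wrapped in a small lookup helper. This is a dependency-free sketch under stated assumptions: `top_k` is a hypothetical name, and `entries` is assumed to be a list of (text, embedding) pairs that you built beforehand with the embeddings endpoint.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_embedding, entries, k=3):
    """Return the k (score, text) pairs most similar to the query.

    `entries` is a list of (text, embedding) pairs, assumed to have been
    generated with the same server version as the query embedding.
    """
    scored = [(cosine_similarity(query_embedding, embedding), text)
              for text, embedding in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

For example, with toy two-dimensional vectors, `top_k([1.0, 0.0], [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])], k=2)` ranks "a" first and "c" second. Real embeddings returned by the server are much longer, but the ranking works the same way.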

Related Repositories:

RWKV-4-World: https://huggingface.co/BlinkDL/rwkv-4-world/tree/main
RWKV-4-Raven: https://huggingface.co/BlinkDL/rwkv-4-raven/tree/main
ChatRWKV: https://github.com/BlinkDL/ChatRWKV
RWKV-LM: https://github.com/BlinkDL/RWKV-LM
RWKV-LM-LoRA: https://github.com/Blealtan/RWKV-LM-LoRA
MIDI-LLM-tokenizer: https://github.com/briansemrau/MIDI-LLM-tokenizer
