A GPT discord bot that uses your own server for training data. This repo is to help you build that bot.
seemed funny.
You will need:
- Python
- Node.JS
- (at least) 5 dollars
- consent
You should probably get consent before doing all this!
-
To get started, you'll need chat data. Use DiscordChatExporter: https://github.com/Tyrrrz/DiscordChatExporter
-
In DiscordChatExporter, choose the channel you want to use for the training data, make sure to download it as
json
. This will probably take a while.
You will probably want to set a partition limit! I used 10mb
.
-
Once you have your json files, make a folder called
dirty-data
in thepreparation
folder, and put all the json files in there. -
Run the
jsoncleaner.py
python script. -
When asked, enter your system prompt.
Your system prompt sets the "context" for the AI Model, as well as placing restrictions or "boundaries" on its responses. You may want to write this down for future steps, but you can also just grab it from the output.jsonl
file.
This step will cost you at least $5
There is a script in the training folder, but I would just use the web ui for this: https://platform.openai.com/finetune/
You can only use gpt-3.5-turbo-xxxx
models for fine tuning with the data you've generated.
Training will also take a while, especially if you've given it a lot of data. For me training a GPT3.5 model with ~2048 lines of data will run you about $2.
Your AI Model will want to say slurs after a while, no matter how much you train it. There's a filter in place which should block most if not all of them, and a well crafted system prompt will prevent them as well. We will need a better solution for "ignoring" them from OpenAI's data in the future, though.
- Rename
frontend/config.sample.json
toconfig.json
, and get ready to enter a lot of settings:
- discordToken: your discord api key you got from the developer's portal
- allowedChannelId: the channel you want the bot to look at for pings (can be a thread)
- trainEmoji: an emoji reaction you want the bot to watch for to save the response (and original message) for future training
- reactionCount: how many reactions until the bot should save the message
- openaiapi section:
- apiKey: your OpenAI api key
- modelId: your fine tuned model id
you should leave these settings as is, but here is what these do
- maxTokens: how many tokens your bot can use per response
- default: 256
- temperature: how "random" you want the bot to be, can be 0-2. lower is less "random"
- default: 0.9
- presencePenalty: how "on topic" the bot should be, can be 0-2. lower is more "on topic"
- default: 0.3
- frequencyPenalty: how "repetitive" the bot should be, can be 0-2. lower is more "repetitive"
- default: 0.3
- severityCategory: what "level" of slurs and bad words should we filter, can be 0-3.
- default: 2.6
- systemPrompt: your system prompt that the ai will use
A note on system prompts
While you're in
config.json
, you need to add a system prompt. This sets the guidelines and "boundaries" that the AI mostly follows. You can use the same system prompt that was used injsoncleaner.py
, but now would be the best time to mess around and see what gives you the best results.
- allowedUserTag: a discord user id for a user who can use the bot in any channel
- removePings: can be 0 or 1, 1 to remove pings, 0 to allow them
- removeLinks: can be 0 or 1, 1 to remove links, 0 to allow them
-
Open a terminal/command prompt in the
frontend
folder, then runnpm install
to grab all the dependencies -
Once that's done, run
npm start
to start the bot!
Bucket will log responses, and who triggered the bot in the /frontend/logs/
folder.
Feel Free! If you want to change something just open a PR.