- ictnlp/TruthX (★ 86, Python, updated Mar 26, 2024) — Code for the ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space". Topics: safety, llama, representation, language-model, mistral, explainable-ai, hallucination, baichuan, hallucinations, gpt-4, truthfulness, llm, llms, chatgpt, chatglm, llm-inference, llama2, llama3
- OpenMOSS/Say-I-Dont-Know (★ 56, Python, updated Feb 5, 2024) — [ICML 2024] Can AI Assistants Know What They Don't Know? Topics: alignment, truthfulness, large-language-models
- thu-ml/MMTrustEval (★ 48, Python, updated Jul 22, 2024) — A toolbox for benchmarking the trustworthiness of multimodal large language models (MultiTrust). Topics: benchmark, privacy, toolbox, safety, multi-modal, fairness, robustness, claude, gpt-4, trustworthy-ai, truthfulness, mllm
- alexisrozhkov/llm-calib (★ 0, Python, updated Jun 9, 2024) — Improving LLM truthfulness via reporting confidence. Topics: alignment, truthfulness, llm, rlhf