🎯
Focusing
  • NEU & CASIA
  • Beijing
  • 22:36 (UTC +08:00)

Organizations

@CogNLP


official implementation of paper "Process Reward Model with Q-value Rankings"

Python 10 Updated Oct 26, 2024

Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.

Jupyter Notebook 158 4 Updated Oct 26, 2024

A survey on harmful fine-tuning attacks for large language models

55 1 Updated Oct 29, 2024

Open Implementations of LLM Analyses

Jupyter Notebook 94 7 Updated Oct 8, 2024

A reading list on LLM based Synthetic Data Generation 🔥

687 41 Updated Oct 21, 2024

MULFE: Multi-Level Benchmark for Free Text Model Editing

Python 4 Updated Aug 11, 2024

Sparse Autoencoder for Mechanistic Interpretability

Python 185 39 Updated Jul 20, 2024
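As a rough illustration of the technique this repository is named after: a sparse autoencoder (SAE) for mechanistic interpretability decomposes model activations into an overcomplete set of sparse, hopefully interpretable features. A minimal numpy sketch, with all sizes, weights, and coefficients hypothetical and the training loop omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; real SAEs use e.g. a 768-dim residual stream
# with an 8x-16x feature expansion.
d_model, d_hidden = 8, 32

# Randomly initialized encoder/decoder weights (training omitted).
W_enc = rng.normal(0, 0.1, (d_model, d_hidden))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(0, 0.1, (d_hidden, d_model))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """One SAE pass: non-negative sparse features, then linear reconstruction."""
    h = np.maximum(0.0, x @ W_enc + b_enc)   # ReLU feature activations
    x_hat = h @ W_dec + b_dec                # reconstruction of the input
    return h, x_hat

def sae_loss(x, h, x_hat, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages sparsity."""
    recon = np.mean((x - x_hat) ** 2)
    sparsity = l1_coeff * np.abs(h).sum(axis=-1).mean()
    return recon + sparsity

x = rng.normal(size=(4, d_model))  # stand-in for residual-stream activations
h, x_hat = sae_forward(x)
loss = sae_loss(x, h, x_hat)
```

The L1 term is what pushes most feature activations to exactly zero, so each input is explained by a small subset of the learned dictionary.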

A curated list of Large Language Model (LLM) Interpretability resources.

1,127 90 Updated Jul 31, 2024

Official Repository for "Tamper-Resistant Safeguards for Open-Weight LLMs"

Python 38 5 Updated Oct 14, 2024

✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM

Python 933 54 Updated Oct 24, 2024

A curated list of LLM Interpretability related material - Tutorial, Library, Survey, Paper, Blog, etc.

161 6 Updated Oct 17, 2024

An awesome repository & a comprehensive survey on interpretability of LLM attention heads.

TeX 250 6 Updated Nov 1, 2024

A collection of AWESOME things about mixture-of-experts

959 72 Updated Jul 31, 2024

Code and data for "Living in the Moment: Can Large Language Models Grasp Co-Temporal Reasoning?" (ACL 2024)

Python 28 1 Updated Jul 3, 2024

Code and data for "MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning" (ICLR 2024)

Jupyter Notebook 327 47 Updated Aug 25, 2024

[ICML 2024] TrustLLM: Trustworthiness in Large Language Models

Python 460 43 Updated Sep 29, 2024

A modular graph-based Retrieval-Augmented Generation (RAG) system

Python 18,634 1,816 Updated Nov 1, 2024
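To sketch the idea behind graph-based RAG (this is an illustrative toy, not this repository's actual pipeline): instead of retrieving documents purely by text similarity, the corpus is indexed as an entity graph, and retrieval expands a query entity to its graph neighbors. All entities and documents below are made up, and a real system would extract entities and relations with an LLM rather than string matching:

```python
from collections import defaultdict

# Toy corpus; entity extraction is faked with substring matching.
docs = {
    "d1": "Alice founded Acme. Acme builds rockets.",
    "d2": "Bob joined Acme as an engineer.",
    "d3": "Carol studies coral reefs.",
}
entities = ["Alice", "Acme", "Bob", "Carol"]

# Build entity -> docs and entity -> co-occurring entities indexes.
entity_docs = defaultdict(set)
graph = defaultdict(set)
for doc_id, text in docs.items():
    present = [e for e in entities if e in text]
    for e in present:
        entity_docs[e].add(doc_id)
        graph[e].update(x for x in present if x != e)

def retrieve(query_entity):
    """Return docs mentioning the entity or its one-hop graph neighbors."""
    hits = set(entity_docs[query_entity])
    for neighbor in graph[query_entity]:
        hits |= entity_docs[neighbor]
    return sorted(hits)
```

Here `retrieve("Alice")` also surfaces `d2`, which never mentions Alice, via the shared `Acme` node; that multi-hop reach is the main advantage graph-based retrieval claims over flat vector search.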

This repository collects all relevant resources about interpretability in LLMs

279 16 Updated Nov 1, 2024

RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models. NeurIPS 2024

Python 56 4 Updated Sep 30, 2024

[NeurIPS 2024] Large Language Model Unlearning via Embedding-Corrupted Prompts

Python 10 Updated Sep 26, 2024
C++ 2 Updated Jun 25, 2024

Official implementation of AdvPrompter https://arxiv.org/abs/2404.16873

Python 120 12 Updated May 6, 2024

The official implementation of the ICML 2024 paper "MemoryLLM: Towards Self-Updatable Large Language Models"

Python 80 4 Updated Oct 22, 2024

[ICLR24 (Spotlight)] "SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation" by Chongyu Fan*, Jiancheng Liu*, Yihua Zhang, Eric Wong, D…

Python 97 13 Updated Aug 9, 2024

HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

Jupyter Notebook 315 53 Updated Aug 16, 2024

The official implementation of our ICLR2024 paper "AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models".

Python 239 40 Updated Oct 23, 2024

A curated list of resources dedicated to retrieval-augmented generation (RAG).

Python 67 6 Updated Oct 28, 2024

[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use

Python 112 13 Updated Mar 22, 2024

Attribute (or cite) statements generated by LLMs back to in-context information.

Jupyter Notebook 138 14 Updated Oct 8, 2024

Steering Llama 2 with Contrastive Activation Addition

Jupyter Notebook 94 30 Updated May 23, 2024
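The technique named in the entry above, contrastive activation addition (CAA), steers a model by adding a fixed direction to its hidden activations at inference time. A minimal numpy sketch of the core arithmetic, with random stand-ins for the activations (a real setup would collect them from a specific Llama 2 layer on contrastive prompt pairs):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 16  # hypothetical hidden size

# Stand-ins for layer activations collected on contrastive prompt pairs
# (e.g. completions exhibiting vs. not exhibiting a target behavior).
pos_acts = rng.normal(0.5, 1.0, (10, d_model))
neg_acts = rng.normal(-0.5, 1.0, (10, d_model))

# The steering vector is the mean activation difference across pairs.
steering_vec = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(activations, vec, alpha=1.0):
    """Add the scaled steering vector to every position's activation."""
    return activations + alpha * vec

acts = rng.normal(size=(5, d_model))  # activations at one layer during generation
steered = steer(acts, steering_vec, alpha=2.0)
```

The single scalar `alpha` (sign and magnitude) controls how strongly the behavior is promoted or suppressed, which is what makes this kind of steering cheap to apply compared to fine-tuning.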