[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.10431.md) | | 04-30 | **Iterative Reasoning Preference Optimization**
_{**Institution:** FAIR at Meta, New York University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.19733v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.19733.md) | | 04-22 | **Information Re-Organization Improves Reasoning in Large Language Models**
_{**Institution:** Zhejiang University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.13985v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.13985.md) | | 04-19 | **Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?**
_{**Institution:** Nanyang Technological University, Princeton University, Salesforce Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.12728v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.12728.md) | | 04-18 | **Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.12253v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.12253.md) | | 04-18 | **EVIT: Event-Oriented Instruction Tuning for Event Reasoning**
_{**Institution:** Key Laboratory of High Confidence Software Technologies (PKU), MOE, China, School of Computer Science, Peking University, Advanced Institute of Big Data}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11978v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.11978.md) | | 04-17 | **Many-Shot In-Context Learning**
_{**Institution:** Google DeepMind}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11018v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.11018.md) | | 04-16 | **CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity**
_{**Institution:** Intel Labs}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.10513v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.10513.md) | | 04-16 | **Self-playing Adversarial Language Game Enhances LLM Reasoning**
_{**Institution:** Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.10642v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.10642.md) | | 04-11 | **Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning**
_{**Institution:** Nanyang Technological University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07546v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.07546.md) | | 04-09 | **THOUGHTSCULPT: Reasoning with Intermediate Revision and Search**
_{**Institution:** UC Berkeley}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05966v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.05966.md) | | 04-08 | **Evaluating Interventional Reasoning Capabilities of Large Language Models**
_{**Institution:** Université de Montréal, Google DeepMind, ServiceNow Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05545v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.05545.md) | | 04-07 | **Prompting Large Language Models for Zero-shot Essay Scoring via Multi-trait Specialization**
_{**Institution:** Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.04941v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.04941.md) | | 03-22 | **Can large language models explore in-context?**
_{**Institution:** Microsoft Research, Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.15371v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.15371.md) | | 03-20 | **Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts**
_{**Institution:** University of Memphis, San Francisco Veterans Affairs Health Care System, University of California San Francisco}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.13786v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.13786.md) | | 03-13 | **Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments**
_{**Institution:** Nanjing University, Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.08593v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.08593.md) | | 03-11 | **ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis**
_{**Institution:** Zhejiang University, Southeast University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.06932v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.06932.md) | | 02-26 | **Do Large Language Models Latently Perform Multi-Hop Reasoning?**
_{**Institution:** Google DeepMind, UCL, Google Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.16837v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.16837.md) | | 02-15 | **Chain-of-Thought Reasoning Without Prompting**
_{**Institution:** Google DeepMind}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.10200v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.102.md) | | 02-15 | **How to Train Data-Efficient LLMs**
_{**Institution:** Google DeepMind, University of California San Diego, Texas A&M University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.09668v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.09668.md) | | 02-15 | **A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts**
_{**Institution:** Google DeepMind, Google Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.09727v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.09727.md) | | 02-09 | **InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning**
_{**Institution:** Shanghai AI Laboratory, Tsinghua University, Fudan University School of Computer Science}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.06332v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.06332.md) | | 02-02 | **MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models**
_{**Institution:** UNC Chapel Hill.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01620v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.0162.md) | | 01-25 | **ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases**
_{**Institution:** HKUST}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.14003v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.14003.md) | | 01-23 | **KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning**
_{**Institution:** Samsung R&D Institute India - Bangalore}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12863v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.12863.md) | | 01-22 | **Improving Small Language Models' Mathematical Reasoning via Mix Thoughts Distillation**
_{**Institution:** Institute of Information Engineering, Chinese Academy of Sciences}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.11864v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.11864.md) | | 01-20 | **BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models**
_{**Institution:** University of Illinois Urbana-Champaign, University of Washington, Western Washington University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12242v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.12242.md) | | 01-18 | **Self-Rewarding Language Models**
_{**Institution:** Meta, NYU}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10020v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.1002.md) | | 01-18 | **Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation**
_{**Institution:** The University of Tokyo, RIKEN}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10005v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.10005.md) | | 01-16 | **MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline**
_{**Institution:** Alibaba Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08190v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.0819.md) | | 01-11 | **The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models**
_{**Institution:** Johns Hopkins University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05618v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.05618.md) | | 01-11 | **Evidence to Generate (E2G): A Single-agent Two-step Prompting for Context Grounded and Retrieval Augmented Reasoning**
_{**Institution:** Qatar Computing Research Institute}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05787v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.05787.md) | | 01-11 | **Chain of History: Learning and Forecasting with LLMs for Temporal Knowledge Graph Completion**
_{**Institution:** Tsinghua Shenzhen International Graduate School Tsinghua University, School of Computer Science Peking University, Baidu Inc.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06072v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.06072.md) | | 01-09 | **Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs**
_{**Institution:** Zhejiang University, Ant Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04319v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.04319.md) | | 01-09 | **Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding**
_{**Institution:** University of California San Diego, Google Cloud AI Research, Google Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04398v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.04398.md) | | 01-09 | **The Critique of Critique**
_{**Institution:** The Hong Kong Polytechnic University, Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04518v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.04518.md) | | 01-08 | **TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series**
_{**Institution:** IBM Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03955v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.03955.md) | | 01-07 | **Grimoire is All You Need for Enhancing Large Language Models**
_{**Institution:** Beihang University, Renmin University of China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03385v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.03385.md) | | 01-07 | **Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon**
_{**Institution:** Beijing Academy of Artificial Intelligence, Renmin University of China, Nankai University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03462v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.03462.md) | | 01-06 | **Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text Classification**
_{**Institution:** Aerospace Information Research Institute Chinese Academy of Sciences, Key Laboratory of Target Cognition and Application Technology, University of Chinese Academy of Sciences}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03158v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.03158.md) | | 01-04 | **On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)**
_{**Institution:** University of South Carolina, New Mexico State University, IBM Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02500v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.025.md) | | 01-04 | **On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)**
_{**Institution:** University of South Carolina, New Mexico State University, IBM Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02500v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.025.md) | | 01-04 | **Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives**
_{**Institution:** Zhejiang University, OPPO Research Institute}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02009v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.02009.md) | | 01-04 | **ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers**
_{**Institution:** Bytedance Inc.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02072v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.02072.md) | | 01-01 | **From Prompt Engineering to Prompt Science With Human in the Loop**
_{**Institution:** University of Washington}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04122v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.04122.md) | | 01-01 | **A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models**
_{**Institution:** The Chinese University of Hong Kong, Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.00757v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.00757.md) | | 12-28 | **Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs**
_{**Institution:** Chinese University of Hong Kong, Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17080v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1708.md) | | 12-28 | **Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos**
_{**Institution:** Tsinghua University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17117v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.17117.md) | | 12-28 | **Improving In-context Learning via Bidirectional Alignment**
_{**Institution:** Nanyang Technological University, Princeton University, Salesforce Research USA}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17055v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.17055.md) | | 12-28 | **Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs**
_{**Institution:** Chinese University of Hong Kong, Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17080v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1708.md) | | 12-27 | **Rethinking Tabular Data Understanding with Large Language Models**
_{**Institution:** UC San Diego, USC, UC Davis}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.16702v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.16702.md) | | 12-27 | **How Robust are LLMs to In-Context Majority Label Bias?**
_{**Institution:** Amazon}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.16549v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.16549.md) | | 12-26 | **Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models**
_{**Institution:** University of Waterloo}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.16098v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.16098.md) | | 12-26 | **KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph**
_{**Institution:** Northeastern University, Neusoft AI Magic Technology Research, Neusoft Institute of Intelligent Medical Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.15880v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1588.md) | | 12-26 | **Supervised Knowledge Makes Large Language Models Better In-context Learners**
_{**Institution:** School of Engineering Westlake University, Westlake Institute for Advanced Study, Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.15918v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.15918.md) | | 12-22 | **NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes**
_{**Institution:** University of Michigan, Rutgers University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.14890v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1489.md) | | 12-21 | **The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction**
_{**Institution:** MIT, Microsoft Research NYC}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.13558v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.13558.md) | | 12-21 | **On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning**
_{**Institution:** Language Technology Lab University of Cambridge}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.13772v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.13772.md) | | 12-19 | **Active Preference Inference using Language Models and Probabilistic Reasoning**
_{**Institution:** Cornell University, Cornell Tech}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.12009v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.12009.md) | | 12-18 | **Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows**
_{**Institution:** University of Washington, Stanford University, Allen Institute for AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.11681v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.11681.md) | | 12-17 | **Mixed Distillation Helps Smaller Language Model Better Reasoning**
_{**Institution:** Zhejiang University, Dalian Medical University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.10730v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1073.md) | | 12-15 | **ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)**
_{**Institution:** Luleå University of Technology Sweden}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.09801v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.09801.md) | | 12-14 | **TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning**
_{**Institution:** National University of Singapore, University of Illinois Urbana-Champaign, Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.09039v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.09039.md) | | 12-14 | **Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning**
_{**Institution:** Hong Kong University of Science and Technology, Microsoft Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.08901v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.08901.md) | | 12-13 | **Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models**
_{**Institution:** University of Southern California, Amazon.com Inc.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.08303v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.08303.md) | | 12-12 | **Comparable Demonstrations are Important in In-Context Learning: A Novel Perspective on Demonstration Selection**
_{**Institution:** Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.07476v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.07476.md) | | 12-11 | **On Meta-Prompting**
_{**Institution:** Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.06562v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.06562.md) | | 12-11 | **"What's important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces**
_{**Institution:** Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.06147v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.06147.md) | | 12-11 | **MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples**
_{**Institution:** Xiamen University, Tencent YouTu Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.06363v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.06363.md) | | 12-07 | **A Study on the Calibration of In-context Learning**
_{**Institution:** Harvard University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.04021v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.04021.md) | | 12-07 | **Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration**
_{**Institution:** Renmin University of China, Beijing Institute of Technology, HKUST (GZ)}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.03987v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.03987.md) | | 12-05 | **Prompt Optimization via Adversarial In-Context Learning**
_{**Institution:** National University of Singapore, Hong Kong University of Science and Technology, Institute for Infocomm Research (I2R) A*STAR}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.02614v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.02614.md) | | 12-05 | **Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation**
_{**Institution:** Sea AI Lab, Sun Yat-sen University, Harvard University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.02439v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.02439.md) | | 12-04 | **On the Effectiveness of Large Language Models in Domain-Specific Code Generation**
_{**Institution:** Shanghai Jiao Tong University, Chongqing University, East China Normal University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.01639v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.01639.md) | | 12-04 | **The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning**
_{**Institution:** Allen Institute for Artificial Intelligence, University of Washington}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.01552v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.01552.md) | | 12-04 | **Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models**
_{**Institution:** Xiamen University, MBZUAI, Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.01714v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.01714.md) | | 12-04 | **Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication**
_{**Institution:** Fudan University, National University of Singapore, Shanghai AI Laboratory}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.01823v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.01823.md) | | 12-02 | **Exploring and Improving the Spatial Reasoning Abilities of Large Language Models**
_{**Institution:** Stanford University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.01054v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.01054.md) | | 12-01 | **On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs**
_{**Institution:** Singapore Management University, National Sun Yat-sen University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.00353v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.00353.md) | | 11-30 | **Applying Large Language Models and Chain-of-Thought for Automatic Scoring**
_{**Institution:** University of Georgia}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.03748v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.03748.md) | | 11-30 | **IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions**
_{**Institution:** Huawei Poisson Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.18397v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.18397.md) | | 11-29 | **Zero-shot Conversational Summarization Evaluations with small Large Language Models**
_{**Institution:** Intel labs}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.18041v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.18041.md) | | 11-29 | **Understanding and Improving In-Context Learning on Vision-language Models**
_{**Institution:** LMU Munich, University of Oxford}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.18021v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.18021.md) | | 11-23 | **Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions**
_{**Institution:** Tsinghua University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.13982v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.13982.md) | | 11-22 | **Enhancing Summarization Performance through Transformer-Based Prompt Engineering in Automated Medical Reporting**
_{**Institution:** Utrecht University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.13274v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.13274.md) | | 11-22 | **Visual In-Context Prompting**
_{**Institution:** HKUST, Microsoft Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.13601v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.13601.md) | | 11-20 | **Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents**
_{**Institution:** Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.11797v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.11797.md) | | 11-19 | **TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems**
_{**Institution:** SenseTime Researc}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.11315v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.11315.md) | | 11-18 | **Orca 2: Teaching Small Language Models How to Reason**
_{**Institution:** Microsoft Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.11045v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.11045.md) | | 11-17 | **Exploring the Relationship between In-Context Learning and Instruction Tuning**
_{**Institution:** HKUST}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.10367v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.10367.md) | | 11-16 | **Crafting In-context Examples according to LMs' Parametric Knowledge**
_{**Institution:** The University of Texas at Austin}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.09579v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.09579.md) | | 11-16 | **Automatic Engineering of Long Prompts**
_{**Institution:** Google}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.10117v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.10117.md) | | 11-15 | **Contrastive Chain-of-Thought Prompting**
_{**Institution:** DAMO Academy, Alibaba Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.09277v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.09277.md) | | 11-15 | **Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models**
_{**Institution:** Tecent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.09210v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.0921.md) | | 11-13 | **In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax**
_{**Institution:** NYU, Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.07811v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.07811.md) | | 11-11 | **In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering**
_{**Institution:** Stanford University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.06668v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.06668.md) | | 10-31 | **Learning to Reason and Memorize with Self-Notes**
_{**Institution:** Meta AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.00833.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.00833.md) | | 09-19 | **AutoMix: Automatically Mixing Language Models**
_{**Institution:** Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.12963.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.12963.md) | | 09-12 | **Re-Reading Improves Reasoning in Language Models**
_{**Institution:** Institute of Information Engineering, CAS}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2309.06275.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.06275.md) | | 07-11 | **Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps**
_{**Institution:** UNIVERSITY OF MARYLAND}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2307.05052v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2307.05052.md) | | 05-26 | **Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models**
_{**Institution:** Singapore Management University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.04091.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.04091.md) | | 05-26 | **Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models**
_{**Institution:** Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.16582.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.16582.md) | | 05-26 | **MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting**
_{**Institution:** Kyoto University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.16896.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.16896.md) | | 05-23 | **Improving Factuality and Reasoning in Language Models through Multiagent Debate**
_{**Institution:** MIT}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.14325.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.14325.md) | | 05-23 | **ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models**
_{**Institution:** Gaoling School of Artificial Intelligence, Renmin University of China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.14323.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.14323.md) | | 05-22 | **LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities**
_{**Institution:** Zhejiang University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.13168.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.13168.md) | | 05-19 | **How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings**
_{**Institution:** The Ohio State University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2305.11853v3)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.11853.md) | | 05-19 | **RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought.**
_{**Institution:** Nanjing University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.11499.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.11499.md) | | 05-17 | **Tree of Thoughts: Deliberate Problem Solving with Large Language Models**
_{**Institution:** Princeton University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.10601.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.10601.md) | | 05-10 | **ReAct: Synergizing Reasoning and Acting in Language Models**
_{**Institution:** Princeton University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2210.03629.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2210.03629.md) | | 05-05 | **Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework**
_{**Institution:** Nanyang Technological University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.03268.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.03268.md) | ### Agent | Date | Paper | Links & Summary | | --- | --- | --- | | 05-23 | **AGILE: A Novel Framework of LLM Agents**
_{**Institution:** ByteDance Research, University of Science and Technology of China, Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.14751v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.14751.md) | | 05-23 | **Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration**
_{**Institution:** Tsinghua University, Northwestern Polytechnical University, Shanghai AI Laboratory}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.14314v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.14314.md) | | 05-20 | **Octo: An Open-Source Generalist Robot Policy**
_{**Institution:** UC Berkeley, Stanford}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.12213v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.12213.md) | | 05-07 | **Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation**
_{**Institution:** Center for Responsible AI, IIT Madras, Princeton University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.04325v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.04325.md) | | 05-06 | **MARE: Multi-Agents Collaboration Framework for Requirements Engineering**
_{**Institution:** Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.03256v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.03256.md) | | 04-18 | **mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture**
_{**Institution:** Beihang University, Beijing Information Science and Technology University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.12135v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.12135.md) | | 04-17 | **AgentKit: Flow Engineering with Graphs, not Coding**
_{**Institution:** Carnegie Mellon University, NVIDIA, Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11483v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.11483.md) | | 04-02 | **CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models**
_{**Institution:** East China Jiaotong University, Guangdong University of Technology, University of Toronto}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.01663v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.01663.md) | | 03-25 | **AIOS: LLM Agent Operating System**
_{**Institution:** Rutgers University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.16971v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.16971.md) | | 03-15 | **VideoAgent: Long-form Video Understanding with Large Language Model as Agent**
_{**Institution:** Stanford University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.10517v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.10517.md) | | 03-08 | **Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering**
_{**Institution:** Gaoling School of Artificial Intelligence Renmin University of China, Nankai University, Beijing Academy of Artificial Intelligence}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.05217v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.05217.md) | | 02-27 | **Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization**
_{**Institution:** Zhejiang University, Institute of Software Chinese Academy of Sciences, Nanjing University of Posts and Telecommunications}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.17574v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.17574.md) | | 02-26 | **LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.16499v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.16499.md) | | 02-22 | **OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.14658v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.14658.md) | | 02-02 | **Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions**
_{**Institution:** Megagon Labs, Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01108v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.01108.md) | | 02-02 | **AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback**
_{**Institution:** Tsinghua University, Ant Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01469v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.01469.md) | | 01-30 | **Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate**
_{**Institution:** Shanghai Jiao Tong University, Carnegie Mellon University, Shanghai Artificial Intelligence Laboratory}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.16788v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.16788.md) | | 01-29 | **Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis**
_{**Institution:** Harbin Institute of Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.16107v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.16107.md) | | 01-23 | **AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents**
_{**Institution:** Google DeepMind}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12963v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.12963.md) | | 01-22 | **PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety**
_{**Institution:** Shanghai Artificial Intelligence Laboratory, Dalian University of Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.11880v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.1188.md) | | 01-19 | **Tool-LMM: A Large Multi-Modal Model for Tool Agent Learning**
_{**Institution:** ShanghaiTech University, Meituan, UniDT}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10727v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.10727.md) | | 01-14 | **Small LLMs Are Weak Tool Learners: A Multi-LLM Agent**
_{**Institution:** Sun Yat-sen University, Alibaba Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.07324v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.07324.md) | | 01-11 | **EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction**
_{**Institution:** Fudan University, Microsoft Research Asia, Zhejiang University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06201v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.06201.md) | | 01-10 | **AUTOACT: Automatic Agent Learning from Scratch via Self-Planning**
_{**Institution:** Zhejiang University, Alibaba Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05268v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.05268.md) | | 01-10 | **Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk**
_{**Institution:** AWS AI Labs}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05033v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.05033.md) | | 01-09 | **Agent Alignment in Evolving Social Norms**
_{**Institution:** Fudan University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04620v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.0462.md) | | 01-08 | **SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems**
_{**Institution:** Fudan University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03945v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.03945.md) | | 01-07 | **Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects**
_{**Institution:** The Chinese University of Hong Kong, DeepWisdom, Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03428v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.03428.md) | | 01-06 | **CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models**
_{**Institution:** Harbin Institute of Technology, Kuaishou Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08438v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.08438.md) | | 01-05 | **From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models**
_{**Institution:** Beike Inc.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02777v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.02777.md) | | 12-28 | **GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension**
_{**Institution:** Tsinghua University, Renmin University of China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17294v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.17294.md) | | 12-28 | **Experiential Co-Learning of Software-Developing Agents**
_{**Institution:** Tsinghua University,Dalian University of Technology,Beijing University of Posts and Telecommunications}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17025v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.17025.md) | | 12-22 | **Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning**
_{**Institution:** Huawei Noah's Ark Lab, University College London, University of Oxford}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.14878v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.14878.md) | | 12-21 | **De novo Drug Design using Reinforcement Learning with Multiple GPT Agents**
_{**Institution:** Tsinghua University, Microsoft Research AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06155v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.06155.md) | | 12-21 | **AppAgent: Multimodal Agents as Smartphone Users**
_{**Institution:** Tencent}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.13771v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.13771.md) | | 12-20 | **AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation**
_{**Institution:** The University of Hong Kong, Shanghai Jiao Tong University, King’s College London}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.13010v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1301.md) | | 12-20 | **AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation**
_{**Institution:** The University of Hong Kong, Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.13010v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1301.md) | | 12-18 | **Agent-based Learning of Materials Datasets from Scientific Literature**
_{**Institution:** University of Toronto}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.11690v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1169.md) | | 12-18 | **Social Learning: Towards Collaborative Learning with Large Language Models**
_{**Institution:** Google, EPFL}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.11441v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.11441.md) | | 12-15 | **ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent**
_{**Institution:** Google}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.10003v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.10003.md) | | 12-14 | **Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent**
_{**Institution:** Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.08926v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.08926.md) | | 12-08 | **PaperQA: Retrieval-Augmented Generative Agent for Scientific Research**
_{**Institution:** RAND Corporation, Carnegie Mellon University, LangChain}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.07559v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.07559.md) | | 12-07 | **An LLM Compiler for Parallel Function Calling**
_{**Institution:** UC Berkeley, ICSI, LBNL}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.04511v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.04511.md) | | 12-06 | **Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia**
_{**Institution:** Google DeepMind, Google Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.03664v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.03664.md) | | 12-05 | **Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction**
_{**Institution:** Zhejiang Lab, Ant Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.03022v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.03022.md) | | 11-30 | **Autonomous Agents in Software Development: A Vision Paper**
_{**Institution:** Tampere University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.18440v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.1844.md) | | 11-29 | **TaskWeaver: A Code-First Agent Framework**
_{**Institution:** Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.17541v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.17541.md) | | 11-29 | **Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering**
_{**Institution:** Sun Yat-Sen University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.17331v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.17331.md) | | 11-28 | **AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.16468v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.16468.md) | | 11-27 | **RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks**
_{**Institution:** Chinese Academy of Sciences, Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.15649v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.15649.md) | | 11-23 | **Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach**
_{**Institution:** Chinese Academy of Sciences}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.13884v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.13884.md) | | 11-18 | **An Embodied Generalist Agent in 3D World**
_{**Institution:** Beijing Institute for General Artificial Intelligence}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.12871v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.12871.md) | | 11-16 | **Predictive Minds: LLMs As Atypical Active Inference Agents**
_{**Institution:** Charles University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.10215v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.10215.md) | | 11-14 | **KTRL+F: Knowledge-Augmented In-Document Search**
_{**Institution:** KAIST AI, Samsung Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.08329v3)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.08329.md) | | 11-06 | **MetaGPT: Meta Programming for Multi-Agent Collaborative Framework**
_{**Institution:** DeepWisdom, King Abdullah University of Science and Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2308.00352.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2308.00352.md) | | 10-16 | **OpenAgents: An Open Platform for Language Agents in the Wild**
_{**Institution:** The University of Hong Kong, XLang Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.10634.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.10634.md) | | 10-16 | **Theory of Mind for Multi-Agent Collaboration via Large Language Models**
_{**Institution:** University of Pittsburgh}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.10701.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.10701.md) | | 09-29 | **AutoAgents: A Framework for Automatic Agent Generation**
_{**Institution:** Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2309.17288.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.17288.md) | | 09-29 | **ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving**
_{**Institution:** Tsinghua University, Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2309.17452.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.17452.md) | | 09-14 | **Agents: An Open-source Framework for Autonomous Language Agents**
_{**Institution:** AIWaves Inc.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2309.07870.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.0787.md) | | 08-21 | **AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors**
_{**Institution:** Tsinghua University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2308.10848.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2308.10848.md) | | 08-21 | **GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems**
_{**Institution:** University of Waterloo}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2308.10435.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2308.10435.md) | | 08-16 | **AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation**
_{**Institution:** Microsoft Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2308.08155.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2308.08155.md) | | 07-25 | **WebArena: A Realistic Web Environment for Building Autonomous Agents**
_{**Institution:** Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2307.13854.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2307.13854.md) | | 07-24 | **A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis**
_{**Institution:** Google DeepMind}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2307.12856.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2307.12856.md) | | 07-16 | **Communicative Agents for Software Development**
_{**Institution:** Tsinghua University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2307.07924.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2307.07924.md) | | 07-14 | **Language models show human-like content effects on reasoning tasks**
_{**Institution:** Google DeepMind}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2207.07051.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2207.07051.md) | | 07-10 | **RoCo: Dialectic Multi-Robot Collaboration with Large Language Models**
_{**Institution:** Columbia University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2307.04738.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2307.04738.md) | | 06-13 | **Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control**
_{**Institution:** Nanyang Technological University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2306.07863.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2306.07863.md) | | 05-23 | **Improving Factuality and Reasoning in Language Models through Multiagent Debate**
_{**Institution:** MIT}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.14325.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.14325.md) | | 05-21 | **Augmenting Autotelic Agents with Large Language Models**
_{**Institution:** MIT}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.12487.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.12487.md) | | 03-31 | **CAMEL: Communicative Agents for Mind Exploration of Large Language Model Society**
_{**Institution:** King Abdullah University of Science and Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2303.17760.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2303.1776.md) | ### Knowledge and Retrieval | Date | Paper | Links & Summary | | --- | --- | --- | | 05-20 | **Multiple-Choice Questions are Efficient and Robust LLM Evaluators**
_{**Institution:** Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.11966v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.11966.md) | | 05-20 | **xFinder: Robust and Pinpoint Answer Extraction for Large Language Models**
_{**Institution:** Institute for Advanced Algorithms Research, Shanghai,Renmin University of China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.11874v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.11874.md) | | 05-16 | **SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation**
_{**Institution:** Amazon, The University of Texas at Austin}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.10040v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.1004.md) | | 05-16 | **SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation**
_{**Institution:** Amazon, The University of Texas at Austin}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.10040v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.1004.md) | | 05-10 | **UniDM: A Unified Framework for Data Manipulation with Large Language Models**
_{**Institution:** Alibaba Group, University of Science and Technology of China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.06510v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.0651.md) | | 05-10 | **Automatic Generation of Model and Data Cards: A Step Towards Responsible AI**
_{**Institution:** CMU, MPI, ETH Zürich}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.06258v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.06258.md) | | 05-09 | **Can large language models understand uncommon meanings of common words?**
_{**Institution:** Tsinghua University, Chinese Academy of Science}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.05741v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.05741.md) | | 05-08 | **"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations**
_{**Institution:** University of Washington, MBZUAI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.05378v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.05378.md) | | 05-06 | **Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning**
_{**Institution:** East China Normal University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.03279v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.03279.md) | | 05-02 | **Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models**
_{**Institution:** KAIST AI, LG AI Research, Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.01535v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.01535.md) | | 04-30 | **Multi-hop Question Answering over Knowledge Graphs using Large Language Models**
_{**Institution:** Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.19234v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.19234.md) | | 04-29 | **Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models**
_{**Institution:** Cohere}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.18796v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.18796.md) | | 04-26 | **A Comprehensive Evaluation on Event Reasoning of Large Language Models**
_{**Institution:** Peking University, Advanced Institute of Big Data, Beihang University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.17513v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.17513.md) | | 04-24 | **From Local to Global: A Graph RAG Approach to Query-Focused Summarization**
_{**Institution:** Microsoft Research, Microsoft Strategic Missions and Technologies, Microsoft Office of the CTO}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.16130v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.1613.md) | | 04-23 | **CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies**
_{**Institution:** Stanford University, IBM Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.15238v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.15238.md) | | 04-22 | **Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph**
_{**Institution:** University of California San Diego, Carnegie Mellon University, University of Pennsylvania}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.14372v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.14372.md) | | 04-22 | **SnapKV: LLM Knows What You are Looking for Before Generation**
_{**Institution:** University of Illinois Urbana-Champaign, Cohere, Princeton University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.14469v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.14469.md) | | 04-22 | **LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation**
_{**Institution:** Meituan}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.14043v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.14043.md) | | 04-22 | **Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering**
_{**Institution:** Tencent Inc., Harbin Institute of Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.14464v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.14464.md) | | 04-18 | **RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation**
_{**Institution:** Peking University, ByteDance Inc.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.12457v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.12457.md) | | 04-16 | **How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior**
_{**Institution:** Stanford University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.10198v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.10198.md) | | 04-15 | **Compression Represents Intelligence Linearly**
_{**Institution:** The Hong Kong University of Science and Technology, Tencent}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.09937v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.09937.md) | | 04-11 | **Rho-1: Not All Tokens Are What You Need**
_{**Institution:** Xiamen University, Tsinghua University, Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07965v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.07965.md) | | 04-11 | **OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments**
_{**Institution:** The University of Hong Kong, CMU, Salesforce Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.07972v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.07972.md) | | 04-10 | **Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation**
_{**Institution:** Apple, Cupertino, CA, USA}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.06910v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.0691.md) | | 04-09 | **RULER: What's the Real Context Size of Your Long-Context Language Models?**
_{**Institution:** NVIDIA}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.06654v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.06654.md) | | 04-09 | **Event-enhanced Retrieval in Real-time Search**
_{**Institution:** Tencent Search, Platform and Content Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05989v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.05989.md) | | 04-08 | **LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding**
_{**Institution:** Meta}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05825v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.05825.md) | | 04-02 | **Long-context LLMs Struggle with Long In-context Learning**
_{**Institution:** University of Waterloo, Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.02060v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.0206.md) | | 04-02 | **Long-context LLMs Struggle with Long In-context Learning**
_{**Institution:** University of Waterloo, Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.02060v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.0206.md) | | 04-01 | **Mapping the Increasing Use of LLMs in Scientific Papers**
_{**Institution:** Stanford University, UC Santa Barbara}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.01268v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.01268.md) | | 04-01 | **LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation**
_{**Institution:** Microsoft Research Asia}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.00998v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.00998.md) | | 03-27 | **BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models**
_{**Institution:** DCST Tsinghua University, Beijing Institute of Technology, Huawei Cloud BU}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.18365v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.18365.md) | | 03-26 | **The Unreasonable Ineffectiveness of the Deeper Layers**
_{**Institution:** Meta FAIR, UMD}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.17887v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.17887.md) | | 03-26 | **COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning**
_{**Institution:** Shenzhen Institute of Advanced Technology, CAS; M-A-P; Institute of Automation, CAS}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.18058v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.18058.md) | | 03-18 | **Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression**
_{**Institution:** University of Texas at Austin, Drexel University, MIT}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.15447v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.15447.md) | | 03-15 | **RAFT: Adapting Language Model to Domain Specific RAG**
_{**Institution:** UC Berkeley}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.10131v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.10131.md) | | 03-15 | **Uni-SMART: Universal Science Multimodal Analysis and Research Transformer**
_{**Institution:** DP Technology, AI for Science Institute Beijing}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.10301v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.10301.md) | | 03-11 | **RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback**
_{**Institution:** Zhejiang University, Southeast University, Massachusetts Institute of Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.06840v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.0684.md) | | 03-07 | **Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference**
_{**Institution:** UC Berkeley, Stanford, UCSD}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.04132v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.04132.md) | | 03-05 | **MathScale: Scaling Instruction Tuning for Mathematical Reasoning**
_{**Institution:** The Chinese University of Hong Kong Shenzhen, China; Microsoft Research Asia, Beijing, China; Shenzhen Research Institute of Big Data, Shenzhen, China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.02884v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.02884.md) | | 02-27 | **REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering**
_{**Institution:** Gaoling School of Artificial Intelligence Renmin University of China, School of Information Renmin University of China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.17497v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.17497.md) | | 02-25 | **ChatMusician: Understanding and Generating Music Intrinsically with LLM**
_{**Institution:** Hong Kong University of Science and Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.16153v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.16153.md) | | 02-22 | **CriticBench: Benchmarking LLMs for Critique-Correct Reasoning**
_{**Institution:** Tsinghua University, University of Hong Kong}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.14809v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.14809.md) | | 02-20 | **TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization**
_{**Institution:** AWS AI Labs, The University of Texas at Austin, KAIST}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.13249v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.13249.md) | | 02-14 | **Premise Order Matters in Reasoning with Large Language Models**
_{**Institution:** Google DeepMind}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.08939v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.08939.md) | | 02-01 | **Can Large Language Models Understand Context?**
_{**Institution:** Georgetown University, Apple}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.00858v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.00858.md) | | 02-01 | **HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent**
_{**Institution:** Amazon, University of Milano-Bicocca}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.01018v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.01018.md) | | 01-31 | **LongAlign: A Recipe for Long Context Alignment of Large Language Models**
_{**Institution:** Tsinghua University, Zhipu.AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.18058v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.18058.md) | | 01-30 | **Incoherent Probability Judgments in Large Language Models**
_{**Institution:** Princeton University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.16646v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.16646.md) | | 01-27 | **MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries**
_{**Institution:** Hong Kong University of Science and Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.15391v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.15391.md) | | 01-24 | **Can AI Assistants Know What They Don't Know?**
_{**Institution:** Fudan University, Shanghai Artificial Intelligence Laboratory}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13275v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.13275.md) | | 01-24 | **Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction**
_{**Institution:** Nanjing University of Science and Technology, Northeastern University, Singapore Institute of Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13598v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.13598.md) | | 01-24 | **Clue-Guided Path Exploration: An Efficient Knowledge Base Question-Answering Framework with Low Computational Resource Consumption**
_{**Institution:** Tsinghua University, Zhongguancun Laboratory, XinJiang University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13444v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.13444.md) | | 01-24 | **AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents**
_{**Institution:** The University of Hong Kong, Zhejiang University, Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13178v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.13178.md) | | 01-22 | **CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation**
_{**Institution:** Stanford University, Stability AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12208v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.12208.md) | | 01-21 | **Interactive AI with Retrieval-Augmented Generation for Next Generation Networking**
_{**Institution:** Nanyang Technological University, Guangdong University of Technology, Institute for Infocomm Research, Agency for Science Technology and Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.11391v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.11391.md) | | 01-17 | **LLMs for Relational Reasoning: How Far are We?**
_{**Institution:** Continental-NTU Corporate Lab, Nanyang Technological University, Singapore}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.09042v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.09042.md) | | 01-16 | **RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture**
_{**Institution:** Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08406v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.08406.md) | | 01-16 | **Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models**
_{**Institution:** Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.08350v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.0835.md) | | 01-15 | **A Study on Large Language Models' Limitations in Multiple-Choice Question Answering**
_{**Institution:** David R. Cheriton School of Computer Science}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.07955v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.07955.md) | | 01-12 | **Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation**
_{**Institution:** Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Weixu Zhang, Xinrun Du, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu, Ge Zhang}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06477v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.06477.md) | | 01-12 | **How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs**
_{**Institution:** Virginia Tech, Renmin University of China, UC Davis}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06373v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.06373.md) | | 01-11 | **TOFU: A Task of Fictitious Unlearning for LLMs**
_{**Institution:** Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06121v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.06121.md) | | 01-11 | **LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase**
_{**Institution:** LAIR Lab Lehigh University, Huazhong University of Science and Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05952v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.05952.md) | | 01-10 | **Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing**
_{**Institution:** Google Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04881v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.04881.md) | | 01-10 | **CASA: Causality-driven Argument Sufficiency Assessment**
_{**Institution:** Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05249v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.05249.md) | | 01-10 | **InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.05507v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.05507.md) | | 01-09 | **Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search**
_{**Institution:** Nanyang Technological University Singapore}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.04514v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.04514.md) | | 01-04 | **SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval**
_{**Institution:** Columbia University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.02369v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.02369.md) | | 01-02 | **LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.01325v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.01325.md) | | 01-01 | **The Earth is Flat? Unveiling Factual Errors in Large Language Models**
_{**Institution:** The Chinese University of Hong Kong, Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.00761v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.00761.md) | | 12-31 | **Improving Text Embeddings with Large Language Models**
_{**Institution:** Microsoft Corporation}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.00368v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.00368.md) | | 12-31 | **BatchEval: Towards Human-like Text Evaluation**
_{**Institution:** Beijing Institute of Technology, Xiaohongshu Inc}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.00437v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.00437.md) | | 12-29 | **Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception**
_{**Institution:** **Institution:** Shanghai Key Laboratory of Data Science School of Computer Science Fudan University, School of Data Science Fudan University, DataGrand Co. LTD}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17532v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.17532.md) | | 12-28 | **Structured Packing in LLM Training Improves Long Context Utilization**
_{**Institution:** University of Warsaw, Google DeepMind, Polish Academy of Sciences}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17296v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.17296.md) | | 12-26 | **Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models**
_{**Institution:** Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; School of Computer Science Peking University, Beijing China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.15883v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.15883.md) | | 12-25 | **ESGReveal: An LLM-based approach for extracting structured data from ESG reports**
_{**Institution:** Alibaba Cloud, Tsinghua University, Sun Yat-Sen University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.17264v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.17264.md) | | 12-22 | **VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation**
_{**Institution:** University of Waterloo, IN.AI Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.14867v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.14867.md) | | 12-19 | **A Revisit of Fake News Dataset with Augmented Fact-checking by ChatGPT**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.11870v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1187.md) | | 12-19 | **Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in ultra low-data regimes**
_{**Institution:** University of Cambridge}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.12112v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.12112.md) | | 12-18 | **G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model**
_{**Institution:** Huawei Noah's Ark Lab, The University of Hong Kong, The Hong Kong University of Science and Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.11370v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1137.md) | | 12-18 | **NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation**
_{**Institution:** University of Waterloo, Huawei Noah’s Ark Lab, FEEC-Unicamp Brazil}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.11361v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.11361.md) | | 12-18 | **"Paraphrasing The Original Text" Makes High Accuracy Long-Context QA**
_{**Institution:** Tsinghua University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.11193v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.11193.md) | | 12-17 | **Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach**
_{**Institution:** Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.10750v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1075.md) | | 12-16 | **RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation**
_{**Institution:** CAS Key Lab of Network Data Science and Technology ICT CAS, University of Chinese Academy of Sciences Beijing China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.10466v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.10466.md) | | 12-16 | **ProTIP: Progressive Tool Retrieval Improves Planning**
_{**Institution:** Apple}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.10332v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.10332.md) | | 12-16 | **CoAScore: Chain-of-Aspects Prompting for NLG Evaluation**
_{**Institution:** GSAI Renmin University of China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.10355v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.10355.md) | | 12-16 | **RecPrompt: A Prompt Tuning Framework for News Recommendation Using Large Language Models**
_{**Institution:** Science Foundation Ireland (SFI), JSPS KAKENHI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.10463v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.10463.md) | | 12-15 | **No-Skim: Towards Efficiency Robustness Evaluation on Skimming-based Language Models**
_{**Institution:** Fudan University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.09494v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.09494.md) | | 12-15 | **Generative Context-aware Fine-tuning of Self-supervised Speech Models**
_{**Institution:** ASAPP, Carnegie Mellon University, Toyota Technological Institute at Chicago}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.09895v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.09895.md) | | 12-15 | **Faithful Persona-based Conversational Dataset Generation with Large Language Models**
_{**Institution:** University of Southern California, Google, Information Sciences Institute}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.10007v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.10007.md) | | 12-15 | **Challenges with unsupervised LLM knowledge discovery**
_{**Institution:** Google DeepMind, Google Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.10029v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.10029.md) | | 12-15 | **KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know**
_{**Institution:** Apple}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.11539v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.11539.md) | | 12-14 | **Math-Shepherd: A Label-Free Step-by-Step Verifier for LLMs in Mathematical Reasoning**
_{**Institution:** Peking University, DeepSeek-AI, The University of Hong Kong}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.08935v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.08935.md) | | 12-14 | **Entity-Augmented Code Generation**
_{**Institution:** JetBrains}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.08976v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.08976.md) | | 12-14 | **Towards Verifiable Text Generation with Evolving Memory and Self-Reflection**
_{**Institution:** Peking University, Chinese Academy of Sciences, Baidu Inc}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.09075v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.09075.md) | | 12-14 | **TinyGSM: achieving >80% on GSM8k with small language models**
_{**Institution:** Carnegie Mellon University, Microsoft Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.09241v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.09241.md) | | 12-14 | **Self-Evaluation Improves Selective Generation in Large Language Models**
_{**Institution:** Google DeepMind, Google Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.09300v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.093.md) | | 12-12 | **LLMEval: A Preliminary Study on How to Evaluate Large Language Models**
_{**Institution:** Fudan University, Shanghai Jiaotong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.07398v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.07398.md) | | 12-12 | **diff History for Long-Context Language Agents**
_{**Institution:** New York University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.07540v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.0754.md) | | 12-11 | **Honeybee: Locality-enhanced Projector for Multimodal LLM**
_{**Institution:** Kakao Brain}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.06742v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.06742.md) | | 12-11 | **Dense X Retrieval: What Retrieval Granularity Should We Use?**
_{**Institution:** University of Washington, Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.06648v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.06648.md) | | 12-10 | **Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs**
_{**Institution:** Microsoft Israel}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.05934v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.05934.md) | | 12-08 | **Using Program Knowledge Graph to Uncover Software Vulnerabilities**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.04818v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.04818.md) | | 12-07 | **CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models**
_{**Institution:** MPI for Intelligent Systems, University of Washington}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.04350v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.0435.md) | | 12-05 | **A Hardware Evaluation Framework for Large Language Model Inference**
_{**Institution:** Princeton University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.03134v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.03134.md) | | 12-04 | **Competition-Level Problems are Effective LLM Evaluators**
_{**Institution:** Microsoft Research Asia, Xiamen University, Microsoft Azure AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.02143v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.02143.md) | | 12-04 | **ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions**
_{**Institution:** Nanyang Technological University, National University of Singapore}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.01661v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.01661.md) | | 12-03 | **D-Bot: Database Diagnosis System using Large Language Models**
_{**Institution:** Tsinghua University, Pigsty, ModelBest}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.01454v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.01454.md) | | 12-03 | **TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents**
_{**Institution:** University of Southern California, Google Cloud AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.01279v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.01279.md) | | 12-03 | **Running cognitive evaluations on large language models: The do's and the don'ts**
_{**Institution:** Massachusetts Institute of Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.01276v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.01276.md) | | 12-01 | **Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games**
_{**Institution:** Quebec AI Institute}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.00746v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.00746.md) | | 12-01 | **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
_{**Institution:** University of Wisconsin - Madison}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.00960v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.0096.md) | | 12-01 | **The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models**
_{**Institution:** University of Wisconsin - Madison}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.00960v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.0096.md) | | 11-30 | **TaskBench: Benchmarking Large Language Models for Task Automation**
_{**Institution:** Zhejiang University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.18760v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.1876.md) | | 11-30 | **What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations**
_{**Institution:** Comcast Applied AI, University of Waterloo}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.18812v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.18812.md) | | 11-29 | **Are Large Language Models Good Fact Checkers: A Preliminary Study**
_{**Institution:** Chinese Academy of Sciences}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.17355v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.17355.md) | | 11-29 | **TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models**
_{**Institution:** Harbin Institute of Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.17667v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.17667.md) | | 11-26 | **UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation**
_{**Institution:** Renmin University of Chin}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2311.15296.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.15296.md) | | 11-21 | **Do Smaller Language Models Answer Contextualised Questions Through Memorisation Or Generalisation?**
_{**Institution:** University of Auckland}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.12337v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.12337.md) | | 11-21 | **Oasis: Data Curation and Assessment System for Pretraining of Large Language Models**
_{**Institution:** Chinese Academy of Sciences}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.12537v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.12537.md) | | 11-21 | **How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks**
_{**Institution:** University of Pennsylvania, MIT}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.12997v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.12997.md) | | 11-20 | **GPQA: A Graduate-Level Google-Proof Q&A Benchmark**
_{**Institution:** New York University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.12022v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.12022.md) | | 11-20 | **Continual Learning: Applications and the Road Forward**
_{**Institution:** KU Leuven}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.11908v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.11908.md) | | 11-16 | **MacGyver: Are Large Language Models Creative Problem Solvers?**
_{**Institution:** University of California, Princeton University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.09682v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.09682.md) | | 11-15 | **ToolTalk: Evaluating Tool-Usage in a Conversational Setting**
_{**Institution:** Microsoft Corporation}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.10775v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.10775.md) | | 11-14 | **Instruction-Following Evaluation for Large Language Models**
_{**Institution:** Google, Yale University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.07911v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.07911.md) | | 11-10 | **Making LLMs Worth Every Penny: Resource-Limited Text Classification in Banking**
_{**Institution:** Helvia.ai}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.06102v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.06102.md) | | 10-17 | **Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection**
_{**Institution:** University of Washington}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2310.11511v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.11511.md) | | 10-11 | **OpsEval: A Comprehensive Task-Oriented AIOps Benchmark for Large Language Models**
_{**Institution:** Tsinghua University, Chinese Academy of Sciences}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2310.07637v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.07637.md) | | 10-10 | **A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection**
_{**Institution:** Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.06498.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.06498.md) | | 10-10 | **The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets**
_{**Institution:** Northeastern University, MIT}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.06824.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.06824.md) | | 09-26 | **RAGAS: Automated Evaluation of Retrieval Augmented Generation**
_{**Institution:** Cardiff University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2309.15217.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.15217.md) | | 09-04 | **Benchmarking Large Language Models in Retrieval-Augmented Generation**
_{**Institution:** Chinese Information Processing Laboratory}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2309.01431v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.01431.md) | | 06-15 | **KoLA: Carefully Benchmarking World Knowledge of Large Language Models**
_{**Institution:** Tsinghua University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2306.09296.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2306.09296.md) | | 06-07 | **Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering**
_{**Institution:** KAIST, MBZUAI, Amazon}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2306.04136.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2306.04136.md) | | 05-29 | **G-EVAL: NLG Evaluation using GPT-4 with Better Human Alignment**
_{**Institution:** Microsoft Cognitive Services Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2303.16634.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2303.16634.md) | | 05-24 | **In-Context Demonstration Selection with Cross Entropy Difference**
_{**Institution:** Microsoft Cognitive Service Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2305.14726v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.14726.md) | | 05-16 | **StructGPT: A General Framework for Large Language Model to Reason over Structured Data**
_{**Institution:** Gaoling School of Artificial Intelligence, Renmin University of China.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.09645.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.09645.md) | | 02-08 | **A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity**
_{**Institution:** Centre for Artificial Intelligence Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2302.04023v4)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2302.04023.md) | ### Alignment and Hallucination | Date | Paper | Links & Summary | | --- | --- | --- | | 05-23 | **Agent Planning with World Knowledge Model**
_{**Institution:** Zhejiang University, Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, National University of Singapore, Alibaba Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.14205v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.14205.md) | | 05-23 | **RaFe: Ranking Feedback Improves Query Rewriting for RAG**
_{**Institution:** Zhejiang University, Alibaba Group, Nanjing University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.14431v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.14431.md) | | 05-23 | **RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models**
_{**Institution:** Amazon AWS AI, Shanghai AI Lab, Shanghai Jiaotong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.14486v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.14486.md) | | 05-14 | **Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs**
_{**Institution:** Carnegie Mellon University, Allen Institute for AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.08760v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.0876.md) | | 05-08 | **ADELIE: Aligning Large Language Models on Information Extraction**
_{**Institution:** Tsinghua University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.05008v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.05008.md) | | 05-01 | **Can a Hallucinating Model help in Reducing Human "Hallucination"?**
_{**Institution:** Stanford University, UC Berkeley}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.00843v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.00843.md) | | 05-01 | **The Real, the Better: Aligning Large Language Models with Online Human Behaviors**
_{**Institution:** Baidu Inc.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.00578v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.00578.md) | | 04-30 | **Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom**
_{**Institution:** Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.19509v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.19509.md) | | 04-26 | **When to Trust LLMs: Aligning Confidence with Response Quality**
_{**Institution:** Alibaba Group}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.17287v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.17287.md) | | 04-18 | **Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers**
_{**Institution:** Westlake University, Alibaba Group, Zhejiang University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11960v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.1196.md) | | 04-18 | **Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences**
_{**Institution:** UC Berkeley}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.12272v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.12272.md) | | 04-17 | **Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models**
_{**Institution:** Renmin University of China, Chinese Academy of Sciences, Huawei Technologies}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.11457v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.11457.md) | | 04-15 | **Learn Your Reference Model for Real Good Alignment**
_{**Institution:** Tinkoff}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.09656v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.09656.md) | | 04-10 | **Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking**
_{**Institution:** Renmin University of China, Tsinghua University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.06742v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.06742.md) | | 04-08 | **Know When To Stop: A Study of Semantic Drift in Text Generation**
_{**Institution:** FAIR, Meta, Anthropic}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.05411v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.05411.md) | | 04-02 | **Advancing LLM Reasoning Generalists with Preference Trees**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2404.02078v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2404.02078.md) | | 03-27 | **Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.18349v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.18349.md) | | 03-19 | **Towards Robots That Know When They Need Help: Affordance-Based Uncertainty for Large Language Model Planners**
_{**Institution:** University of Maryland}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.13198v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.13198.md) | | 03-13 | **Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework**
_{**Institution:** ByteDance Research, University of Maryland College Park, Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2403.08743v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2403.08743.md) | | 02-01 | **Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration**
_{**Institution:** University of Washington, University of California Berkeley, The Hong Kong University of Science and Technology}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2402.00367v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2402.00367.md) | | 01-25 | **Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning**
_{**Institution:** Columbia University, Microsoft Research, University of California Berkeley}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.13986v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.13986.md) | | 01-25 | **True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning**
_{**Institution:** Nanyang Technological University, Zhejiang University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.14151v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.14151.md) | | 01-23 | **Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment**
_{**Institution:** Alibaba Inc.}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.12474v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.12474.md) | | 01-19 | **Mitigating Hallucinations of Large Language Models via Knowledge Consistent Alignment**
_{**Institution:** Sun Yat-sen University, Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.10768v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.10768.md) | | 01-11 | **Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models**
_{**Institution:** Google Research, Tel Aviv University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.06102v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.06102.md) | | 01-06 | **The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models**
_{**Institution:** Renmin University of China, Université de Montréal}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2401.03205v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2401.03205.md) | | 12-26 | **Aligning Large Language Models with Human Preferences through Representation Engineering**
_{**Institution:** Fudan University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.15997v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.15997.md) | | 12-25 | **Alleviating Hallucinations of Large Language Models through Induced Hallucinations**
_{**Institution:** Soochow University, Tencent AI Lab}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.15710v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.1571.md) | | 12-22 | **Reasons to Reject? Aligning Language Models with Judgments**
_{**Institution:** Tencent AI Lab, The Chinese University of Hong Kong}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.14591v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.14591.md) | | 12-22 | **Large Language Model (LLM) Bias Index -- LLMBI**
_{**Institution:** University of Oxford, University Canada West, Amazon Web Services (AWS)}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.14769v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.14769.md) | | 12-15 | **WEAK-TO-STRONG GENERALIZATION: ELICITING STRONG CAPABILITIES WITH WEAK SUPERVISION**
_{**Institution:** OpenAI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://cdn.openai.com/papers/weak-to-strong-generalization.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/.md)

[![Blog](https://img.shields.io/badge/Blog-Posts-yellow?logo=rss)](https://mp.weixin.qq.com/s/f6YW-CxnLhnfMWTLg4M4Cw)

| | 12-11 | **Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models**
_{**Institution:** Salesforce AI Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.06149v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.06149.md) | | 12-09 | **Context Tuning for Retrieval Augmented Generation**
_{**Institution:** Apple}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.05708v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.05708.md) | | 12-02 | **Axiomatic Preference Modeling for Longform Question Answering**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.02206v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.02206.md) | | 12-01 | **Nash Learning from Human Feedback**
_{**Institution:** Google DeepMind}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.00886v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.00886.md) | | 12-01 | **Instruction-tuning Aligns LLMs to the Human Brain**
_{**Institution:** EPFL}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2312.00575v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2312.00575.md) | | 11-28 | **Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization**
_{**Institution:** Shanghai AI Laboratory}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.16839v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.16839.md) | | 11-28 | **RELIC: Investigating Large Language Model Responses using Self-Consistency**
_{**Institution:** ETH Zurich}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.16842v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.16842.md) | | 11-24 | **Calibrated Language Models Must Hallucinate**
_{**Institution:** Microsoft Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.14648v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.14648.md) | | 11-24 | **Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language**
_{**Institution:** Amazon}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.14543v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.14543.md) | | 11-23 | **ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs**
_{**Institution:** Google Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.13600v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.136.md) | | 11-18 | **RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability**
_{**Institution:** University of Science and Technology of China}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.10947v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.10947.md) | | 11-14 | **Learning to Filter Context for Retrieval-Augmented Generation**
_{**Institution:** Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2311.08377v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2311.08377.md) | | 10-24 | **Correction with Backtracking Reduces Hallucination in Summarization**
_{**Institution:** Google DeepMind, Cornell University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.16176.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.16176.md) | | 10-20 | **The History and Risks of Reinforcement Learning and Human Feedback**
_{**Institution:** Berkeley}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2310.13595v2)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.13595.md) | | 10-19 | **Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong**
_{**Institution:** Stanford University, University of Maryland}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.12558.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.12558.md) | | 10-19 | **Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks**
_{**Institution:** University of Pennsylvania, Microsoft Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.12516.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.12516.md) | | 10-05 | **Evaluating Hallucinations in Chinese Large Language Models**
_{**Institution:** Fudan University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.03368.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.03368.md) | | 10-02 | **LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples**
_{**Institution:** Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.01469.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.01469.md) | | 10-02 | **Tool-Augmented Reward Modeling**
_{**Institution:** Zhejiang University, Baidu}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.01045.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.01045.md) | | 09-30 | **AutoHall: Automated Hallucination Dataset Generation for Large Language Models**
_{**Institution:** Shanghai Jiao Tong University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2310.00259.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2310.00259.md) | | 09-28 | **Hallucination Reduction in Long Input Text Summarization**
_{**Institution:** Jadavpur University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2309.16781.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.16781.md) | | 09-25 | **Aligning Large Multimodal Models with Factually Augmented RLHF**
_{**Institution:** UC Berkeley, CMU}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2309.14525.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.14525.md) | | 09-20 | **Chain-of-Verification Reduces Hallucination in Large Language Models**
_{**Institution:** Meta AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2309.11495.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.11495.md) | | 09-18 | **Summarization is (Almost) Dead**
_{**Institution:** Peking University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2309.09558.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2309.09558.md) | | 08-22 | **Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models**
_{**Institution:** University of Pittsburgh, Pittsburgh, TikTok}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2308.11764v2.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2308.11764.md) | | 07-31 | **Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering**
_{**Institution:** Jadavpur University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2307.16877.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2307.16877.md) | | 06-09 | **Judging LLM-as-a-judge with MT-Bench and Chatbot Arena**
_{**Institution:** UC Berkeley}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2306.05685.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2306.05685.md) | | 05-26 | **Training Socially Aligned Language Models on Simulated Social Interactions**
_{**Institution:** Google DeepMind}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.16960.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.1696.md) | | 05-24 | **Trusting Your Evidence: Hallucinate Less with Context-aware Decoding**
_{**Institution:** University of Washington}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.14739.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.14739.md) | | 05-22 | **LM vs LM: Detecting Factual Errors via Cross Examination**
_{**Institution:** Google DeepMind}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.13281.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.13281.md) | | 05-18 | **LIMA: Less Is More for Alignment**
_{**Institution:** Meta AI}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.11206.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.11206.md) | | 03-23 | **FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation**
_{**Institution:** University of Washington}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.14251.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.14251.md) | | 03-08 | **HistAlign: Improving Context Dependency in Language Generation by Aligning with History**
_{**Institution:** UNC Chapel Hill}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](https://arxiv.org/pdf/2305.04782.pdf)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2305.04782.md) | ### Application | Date | Paper | Links & Summary | | --- | --- | --- | | 05-23 | **PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services**
_{**Institution:** Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.14636v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.14636.md) | | 05-21 | **SmartFlow: Robotic Process Automation using LLMs**
_{**Institution:** TCS Research}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.12842v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.12842.md) | | 05-16 | **MarkLLM: An Open-Source Toolkit for LLM Watermarking**
_{**Institution:** Tsinghua University, Shanghai Jiao Tong University, The University of Sydney}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.10051v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.10051.md) | | 05-16 | **Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models**
_{**Institution:** Nanyang Technological University, University of Science and Technology of China, University of Aberdeen}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.10025v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.10025.md) | | 05-09 | **LLMPot: Automated LLM-based Industrial Protocol and Physical Process Emulation for ICS Honeypots**
_{**Institution:** New York University Abu Dhabi}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.05999v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.05999.md) | | 05-09 | **Exploring the Potential of Human-LLM Synergy in Advancing Qualitative Analysis: A Case Study on Mental-Illness Stigma**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.05758v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.05758.md) | | 05-09 | **An Automatic Prompt Generation System for Tabular Data Tasks**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.05618v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.05618.md) | | 05-07 | **Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application**
_{**Institution:** Kuaishou Technology, Southeast University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.03988v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.03988.md) | | 05-07 | **Toward In-Context Teaching: Adapting Examples to Students' Misconceptions**
_{**Institution:** MIT CSAIL}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.04495v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.04495.md) | | 05-03 | **What matters when building vision-language models?**
_{**Institution:** Hugging Face, Sorbonne Université}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.02246v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.02246.md) | | 05-02 | **How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses**
_{**Institution:** Carnegie Mellon University}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.00970v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.0097.md) | | 05-01 | **"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust**
_{**Institution:** Princeton University, Microsoft}|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.00623v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.00623.md) | | 05-01 | **Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3**
|

[![arXiv](https://img.shields.io/badge/arXiv-Paper-%23D2691E?logo=arxiv)](http://arxiv.org/pdf/2405.00664v1)

[![Summary](https://img.shields.io/badge/Sum.-Read-blue?logo=dependabot)](summary/2023-02/2405.00664.md) | | | **How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites**
_{**Institution:** Shanghai AI Laboratory, SenseTime Research, Tsinghua University}|

llm-paper-daily 日常论文精选