llm-paper-daily: Daily Paper Picks

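Welcome to llm-paper-daily! This is a platform for daily updates and categorization of the latest research papers. We hope to bring enthusiasts the latest in LLM research and make it easier to follow the field's newest developments.

Table of Contents

Categories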

Reasoning

 Date   Paper Links & Summary
05-16 Thinking Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language Models
Institution: BITS Pilani, MDSR Labs, Adobe, IIT Guwahati, National University of Singapore
arXiv
Summary
04-30 Iterative Reasoning Preference Optimization
Institution: FAIR at Meta, New York University
arXiv
Summary
04-22 Information Re-Organization Improves Reasoning in Large Language Models
Institution: Zhejiang University
arXiv
Summary
04-19 Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?
Institution: Nanyang Technological University, Princeton University, Salesforce Research
arXiv
Summary
04-18 Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
arXiv
Summary
04-18 EVIT: Event-Oriented Instruction Tuning for Event Reasoning
Institution: Key Laboratory of High Confidence Software Technologies (PKU), MOE, China, School of Computer Science, Peking University, Advanced Institute of Big Data
arXiv
Summary
04-17 Many-Shot In-Context Learning
Institution: Google DeepMind
arXiv
Summary
04-16 CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity
Institution: Intel Labs
arXiv
Summary
04-16 Self-playing Adversarial Language Game Enhances LLM Reasoning
Institution: Tencent AI Lab
arXiv
Summary
04-11 Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning
Institution: Nanyang Technological University
arXiv
Summary
04-09 THOUGHTSCULPT: Reasoning with Intermediate Revision and Search
Institution: UC Berkeley
arXiv
Summary
04-08 Evaluating Interventional Reasoning Capabilities of Large Language Models
Institution: Université de Montréal, Google DeepMind, ServiceNow Research
arXiv
Summary
04-07 Prompting Large Language Models for Zero-shot Essay Scoring via Multi-trait Specialization
Institution: Peking University
arXiv
Summary
03-22 Can large language models explore in-context?
Institution: Microsoft Research, Carnegie Mellon University
arXiv
Summary
03-20 Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts
Institution: University of Memphis, San Francisco Veterans Affairs Health Care System, University of California San Francisco
arXiv
Summary
03-13 Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments
Institution: Nanjing University, Microsoft
arXiv
Summary
03-11 ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis
Institution: Zhejiang University, Southeast University
arXiv
Summary
02-26 Do Large Language Models Latently Perform Multi-Hop Reasoning?
Institution: Google DeepMind, UCL, Google Research
arXiv
Summary
02-15 Chain-of-Thought Reasoning Without Prompting
Institution: Google DeepMind
arXiv
Summary
02-15 How to Train Data-Efficient LLMs
Institution: Google DeepMind, University of California San Diego, Texas A&M University
arXiv
Summary
02-15 A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
Institution: Google DeepMind, Google Research
arXiv
Summary
02-09 InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning
Institution: Shanghai AI Laboratory, Tsinghua University, Fudan University School of Computer Science
arXiv
Summary
02-02 MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
Institution: UNC Chapel Hill
arXiv
Summary
01-25 ConstraintChecker: A Plugin for Large Language Models to Reason on Commonsense Knowledge Bases
Institution: HKUST
arXiv
Summary
01-23 KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
Institution: Samsung R&D Institute India - Bangalore
arXiv
Summary
01-22 Improving Small Language Models' Mathematical Reasoning via Mix Thoughts Distillation
Institution: Institute of Information Engineering, Chinese Academy of Sciences
arXiv
Summary
01-20 BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
Institution: University of Illinois Urbana-Champaign, University of Washington, Western Washington University
arXiv
Summary
01-18 Self-Rewarding Language Models
Institution: Meta, NYU
arXiv
Summary
01-18 Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
Institution: The University of Tokyo, RIKEN
arXiv
Summary
01-16 MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline
Institution: Alibaba Group
arXiv
Summary
01-11 The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models
Institution: Johns Hopkins University
arXiv
Summary
01-11 Evidence to Generate (E2G): A Single-agent Two-step Prompting for Context Grounded and Retrieval Augmented Reasoning
Institution: Qatar Computing Research Institute
arXiv
Summary
01-11 Chain of History: Learning and Forecasting with LLMs for Temporal Knowledge Graph Completion
Institution: Tsinghua Shenzhen International Graduate School Tsinghua University, School of Computer Science Peking University, Baidu Inc.
arXiv
Summary
01-09 Know Your Needs Better: Towards Structured Understanding of Marketer Demands with Analogical Reasoning Augmented LLMs
Institution: Zhejiang University, Ant Group
arXiv
Summary
01-09 Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding
Institution: University of California San Diego, Google Cloud AI Research, Google Research
arXiv
Summary
01-09 The Critique of Critique
Institution: The Hong Kong Polytechnic University, Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory
arXiv
Summary
01-08 TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series
Institution: IBM Research
arXiv
Summary
01-07 Grimoire is All You Need for Enhancing Large Language Models
Institution: Beihang University, Renmin University of China
arXiv
Summary
01-07 Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Institution: Beijing Academy of Artificial Intelligence, Renmin University of China, Nankai University
arXiv
Summary
01-06 Quartet Logic: A Four-Step Reasoning (QLFR) framework for advancing Short Text Classification
Institution: Aerospace Information Research Institute Chinese Academy of Sciences, Key Laboratory of Target Cognition and Application Technology, University of Chinese Academy of Sciences
arXiv
Summary
01-04 On the Prospects of Incorporating Large Language Models (LLMs) in Automated Planning and Scheduling (APS)
Institution: University of South Carolina, New Mexico State University, IBM Research
arXiv
Summary
01-04 Self-Contrast: Better Reflection Through Inconsistent Solving Perspectives
Institution: Zhejiang University, OPPO Research Institute
arXiv
Summary
01-04 ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers
Institution: Bytedance Inc.
arXiv
Summary
01-01 From Prompt Engineering to Prompt Science With Human in the Loop
Institution: University of Washington
arXiv
Summary
01-01 A & B == B & A: Triggering Logical Reasoning Failures in Large Language Models
Institution: The Chinese University of Hong Kong, Tencent AI Lab
arXiv
Summary
12-28 Challenge LLMs to Reason About Reasoning: A Benchmark to Unveil Cognitive Depth in LLMs
Institution: Chinese University of Hong Kong, Tencent AI Lab
arXiv
Summary
12-28 Grounding-Prompter: Prompting LLM with Multimodal Information for Temporal Sentence Grounding in Long Videos
Institution: Tsinghua University
arXiv
Summary
12-28 Improving In-context Learning via Bidirectional Alignment
Institution: Nanyang Technological University, Princeton University, Salesforce Research USA
arXiv
Summary
12-27 Rethinking Tabular Data Understanding with Large Language Models
Institution: UC San Diego, USC, UC Davis
arXiv
Summary
12-27 How Robust are LLMs to In-Context Majority Label Bias?
Institution: Amazon
arXiv
Summary
12-26 Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models
Institution: University of Waterloo
arXiv
Summary
12-26 KnowledgeNavigator: Leveraging Large Language Models for Enhanced Reasoning over Knowledge Graph
Institution: Northeastern University, Neusoft AI Magic Technology Research, Neusoft Institute of Intelligent Medical Research
arXiv
Summary
12-26 Supervised Knowledge Makes Large Language Models Better In-context Learners
Institution: School of Engineering Westlake University, Westlake Institute for Advanced Study, Peking University
arXiv
Summary
12-22 NPHardEval: Dynamic Benchmark on Reasoning Ability of Large Language Models via Complexity Classes
Institution: University of Michigan, Rutgers University
arXiv
Summary
12-21 The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Institution: MIT, Microsoft Research NYC
arXiv
Summary
12-21 On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning
Institution: Language Technology Lab University of Cambridge
arXiv
Summary
12-19 Active Preference Inference using Language Models and Probabilistic Reasoning
Institution: Cornell University, Cornell Tech
arXiv
Summary
12-18 Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
Institution: University of Washington, Stanford University, Allen Institute for AI
arXiv
Summary
12-17 Mixed Distillation Helps Smaller Language Model Better Reasoning
Institution: Zhejiang University, Dalian Medical University
arXiv
Summary
12-15 ProCoT: Stimulating Critical Thinking and Writing of Students through Engagement with Large Language Models (LLMs)
Institution: Luleå University of Technology Sweden
arXiv
Summary
12-14 TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning
Institution: National University of Singapore, University of Illinois Urbana-Champaign, Microsoft
arXiv
Summary
12-14 Boosting LLM Reasoning: Push the Limits of Few-shot Learning with Reinforced In-Context Pruning
Institution: Hong Kong University of Science and Technology, Microsoft Research
arXiv
Summary
12-13 Efficient Toxic Content Detection by Bootstrapping and Distilling Large Language Models
Institution: University of Southern California, Amazon.com Inc.
arXiv
Summary
12-12 Comparable Demonstrations are Important in In-Context Learning: A Novel Perspective on Demonstration Selection
Institution: Shanghai Jiao Tong University
arXiv
Summary
12-11 On Meta-Prompting
Institution: Microsoft
arXiv
Summary
12-11 "What's important here?": Opportunities and Challenges of Using LLMs in Retrieving Information from Web Interfaces
Institution: Carnegie Mellon University
arXiv
Summary
12-11 MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples
Institution: Xiamen University, Tencent YouTu Lab
arXiv
Summary
12-07 A Study on the Calibration of In-context Learning
Institution: Harvard University
arXiv
Summary
12-07 Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration
Institution: Renmin University of China, Beijing Institute of Technology, HKUST (GZ)
arXiv
Summary
12-05 Prompt Optimization via Adversarial In-Context Learning
Institution: National University of Singapore, Hong Kong University of Science and Technology, Institute for Infocomm Research (I2R) A*STAR
arXiv
Summary
12-05 Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Institution: Sea AI Lab, Sun Yat-sen University, Harvard University
arXiv
Summary
12-04 On the Effectiveness of Large Language Models in Domain-Specific Code Generation
Institution: Shanghai Jiao Tong University, Chongqing University, East China Normal University
arXiv
Summary
12-04 The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Institution: Allen Institute for Artificial Intelligence, University of Washington
arXiv
Summary
12-04 Retrieval-augmented Multi-modal Chain-of-Thoughts Reasoning for Large Language Models
Institution: Xiamen University, MBZUAI, Tencent AI Lab
arXiv
Summary
12-04 Exchange-of-Thought: Enhancing Large Language Model Capabilities through Cross-Model Communication
Institution: Fudan University, National University of Singapore, Shanghai AI Laboratory
arXiv
Summary
12-02 Exploring and Improving the Spatial Reasoning Abilities of Large Language Models
Institution: Stanford University
arXiv
Summary
12-01 On Exploring the Reasoning Capability of Large Language Models with Knowledge Graphs
Institution: Singapore Management University, National Sun Yat-sen University
arXiv
Summary
11-30 Applying Large Language Models and Chain-of-Thought for Automatic Scoring
Institution: University of Georgia
arXiv
Summary
11-30 IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions
Institution: Huawei Poisson Lab
arXiv
Summary
11-29 Zero-shot Conversational Summarization Evaluations with small Large Language Models
Institution: Intel Labs
arXiv
Summary
11-29 Understanding and Improving In-Context Learning on Vision-language Models
Institution: LMU Munich, University of Oxford
arXiv
Summary
11-23 Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions
Institution: Tsinghua University
arXiv
Summary
11-22 Enhancing Summarization Performance through Transformer-Based Prompt Engineering in Automated Medical Reporting
Institution: Utrecht University
arXiv
Summary
11-22 Visual In-Context Prompting
Institution: HKUST, Microsoft Research
arXiv
Summary
11-20 Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents
Institution: Shanghai Jiao Tong University
arXiv
Summary
11-19 TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Institution: SenseTime Research
arXiv
Summary
11-18 Orca 2: Teaching Small Language Models How to Reason
Institution: Microsoft Research
arXiv
Summary
11-17 Exploring the Relationship between In-Context Learning and Instruction Tuning
Institution: HKUST
arXiv
Summary
11-16 Crafting In-context Examples according to LMs' Parametric Knowledge
Institution: The University of Texas at Austin
arXiv
Summary
11-16 Automatic Engineering of Long Prompts
Institution: Google
arXiv
Summary
11-15 Contrastive Chain-of-Thought Prompting
Institution: DAMO Academy, Alibaba Group
arXiv
Summary
11-15 Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models
Institution: Tencent AI Lab
arXiv
Summary
11-13 In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax
Institution: NYU, Microsoft
arXiv
Summary
11-11 In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Institution: Stanford University
arXiv
Summary
10-31 Learning to Reason and Memorize with Self-Notes
Institution: Meta AI
arXiv
Summary
09-19 AutoMix: Automatically Mixing Language Models
Institution: Carnegie Mellon University
arXiv
Summary
09-12 Re-Reading Improves Reasoning in Language Models
Institution: Institute of Information Engineering, CAS
arXiv
Summary
07-11 Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps
Institution: University of Maryland
arXiv
Summary
05-26 Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Institution: Singapore Management University
arXiv
Summary
05-26 Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
Institution: Shanghai Jiao Tong University
arXiv
Summary
05-26 MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting
Institution: Kyoto University
arXiv
Summary
05-23 Improving Factuality and Reasoning in Language Models through Multiagent Debate
Institution: MIT
arXiv
Summary
05-23 ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models
Institution: Gaoling School of Artificial Intelligence, Renmin University of China
arXiv
Summary
05-22 LLMs for Knowledge Graph Construction and Reasoning: Recent Capabilities and Future Opportunities
Institution: Zhejiang University
arXiv
Summary
05-19 How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings
Institution: The Ohio State University
arXiv
Summary
05-19 RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought
Institution: Nanjing University
arXiv
Summary
05-17 Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Institution: Princeton University
arXiv
Summary
05-10 ReAct: Synergizing Reasoning and Acting in Language Models
Institution: Princeton University
arXiv
Summary
05-05 Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework
Institution: Nanyang Technological University
arXiv
Summary

Agent

 Date   Paper Links & Summary
05-23 AGILE: A Novel Framework of LLM Agents
Institution: ByteDance Research, University of Science and Technology of China, Shanghai Jiao Tong University
arXiv
Summary
05-23 Towards Efficient LLM Grounding for Embodied Multi-Agent Collaboration
Institution: Tsinghua University, Northwestern Polytechnical University, Shanghai AI Laboratory
arXiv
Summary
05-20 Octo: An Open-Source Generalist Robot Policy
Institution: UC Berkeley, Stanford
arXiv
Summary
05-07 Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation
Institution: Center for Responsible AI, IIT Madras, Princeton University
arXiv
Summary
05-06 MARE: Multi-Agents Collaboration Framework for Requirements Engineering
Institution: Peking University
arXiv
Summary
04-18 mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture
Institution: Beihang University, Beijing Information Science and Technology University
arXiv
Summary
04-17 AgentKit: Flow Engineering with Graphs, not Coding
Institution: Carnegie Mellon University, NVIDIA, Microsoft
arXiv
Summary
04-02 CMAT: A Multi-Agent Collaboration Tuning Framework for Enhancing Small Language Models
Institution: East China Jiaotong University, Guangdong University of Technology, University of Toronto
arXiv
Summary
03-25 AIOS: LLM Agent Operating System
Institution: Rutgers University
arXiv
Summary
03-15 VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Institution: Stanford University
arXiv
Summary
03-08 Harnessing Multi-Role Capabilities of Large Language Models for Open-Domain Question Answering
Institution: Gaoling School of Artificial Intelligence Renmin University of China, Nankai University, Beijing Academy of Artificial Intelligence
arXiv
Summary
02-27 Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization
Institution: Zhejiang University, Institute of Software Chinese Academy of Sciences, Nanjing University of Posts and Telecommunications
arXiv
Summary
02-26 LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments
arXiv
Summary
02-22 OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement
arXiv
Summary
02-02 Reasoning Capacity in Multi-Agent Systems: Limitations, Challenges and Human-Centered Solutions
Institution: Megagon Labs, Carnegie Mellon University
arXiv
Summary
02-02 AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback
Institution: Tsinghua University, Ant Group
arXiv
Summary
01-30 Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
Institution: Shanghai Jiao Tong University, Carnegie Mellon University, Shanghai Artificial Intelligence Laboratory
arXiv
Summary
01-29 Beyond Direct Diagnosis: LLM-based Multi-Specialist Agent Consultation for Automatic Diagnosis
Institution: Harbin Institute of Technology
arXiv
Summary
01-23 AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents
Institution: Google DeepMind
arXiv
Summary
01-22 PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
Institution: Shanghai Artificial Intelligence Laboratory, Dalian University of Technology
arXiv
Summary
01-19 Tool-LMM: A Large Multi-Modal Model for Tool Agent Learning
Institution: ShanghaiTech University, Meituan, UniDT
arXiv
Summary
01-14 Small LLMs Are Weak Tool Learners: A Multi-LLM Agent
Institution: Sun Yat-sen University, Alibaba Group
arXiv
Summary
01-11 EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction
Institution: Fudan University, Microsoft Research Asia, Zhejiang University
arXiv
Summary
01-10 AUTOACT: Automatic Agent Learning from Scratch via Self-Planning
Institution: Zhejiang University, Alibaba Group
arXiv
Summary
01-10 Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk
Institution: AWS AI Labs
arXiv
Summary
01-09 Agent Alignment in Evolving Social Norms
Institution: Fudan University
arXiv
Summary
01-08 SpeechAgents: Human-Communication Simulation with Multi-Modal Multi-Agent Systems
Institution: Fudan University
arXiv
Summary
01-07 Exploring Large Language Model based Intelligent Agents: Definitions, Methods, and Prospects
Institution: The Chinese University of Hong Kong, DeepWisdom, Peking University
arXiv
Summary
01-06 CogGPT: Unleashing the Power of Cognitive Dynamics on Large Language Models
Institution: Harbin Institute of Technology, Kuaishou Technology
arXiv
Summary
01-05 From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models
Institution: Beike Inc.
arXiv
Summary
12-28 GitAgent: Facilitating Autonomous Agent with GitHub by Tool Extension
Institution: Tsinghua University, Renmin University of China
arXiv
Summary
12-28 Experiential Co-Learning of Software-Developing Agents
Institution: Tsinghua University, Dalian University of Technology, Beijing University of Posts and Telecommunications
arXiv
Summary
12-22 Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning
Institution: Huawei Noah's Ark Lab, University College London, University of Oxford
arXiv
Summary
12-21 De novo Drug Design using Reinforcement Learning with Multiple GPT Agents
Institution: Tsinghua University, Microsoft Research AI
arXiv
Summary
12-21 AppAgent: Multimodal Agents as Smartphone Users
Institution: Tencent
arXiv
Summary
12-20 AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation
Institution: The University of Hong Kong, Shanghai Jiao Tong University, King’s College London
arXiv
Summary
12-18 Agent-based Learning of Materials Datasets from Scientific Literature
Institution: University of Toronto
arXiv
Summary
12-18 Social Learning: Towards Collaborative Learning with Large Language Models
Institution: Google, EPFL
arXiv
Summary
12-15 ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Institution: Google
arXiv
Summary
12-14 Modeling Complex Mathematical Reasoning via Large Language Model based MathAgent
Institution: Shanghai Jiao Tong University
arXiv
Summary
12-08 PaperQA: Retrieval-Augmented Generative Agent for Scientific Research
Institution: RAND Corporation, Carnegie Mellon University, LangChain
arXiv
Summary
12-07 An LLM Compiler for Parallel Function Calling
Institution: UC Berkeley, ICSI, LBNL
arXiv
Summary
12-06 Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia
Institution: Google DeepMind, Google Research
arXiv
Summary
12-05 Beyond Isolation: Multi-Agent Synergy for Improving Knowledge Graph Construction
Institution: Zhejiang Lab, Ant Group
arXiv
Summary
11-30 Autonomous Agents in Software Development: A Vision Paper
Institution: Tampere University
arXiv
Summary
11-29 TaskWeaver: A Code-First Agent Framework
Institution: Microsoft
arXiv
Summary
11-29 Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering
Institution: Sun Yat-Sen University
arXiv
Summary
11-28 AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
arXiv
Summary
11-27 RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Institution: Chinese Academy of Sciences, Peking University
arXiv
Summary
11-23 Controlling Large Language Model-based Agents for Large-Scale Decision-Making: An Actor-Critic Approach
Institution: Chinese Academy of Sciences
arXiv
Summary
11-18 An Embodied Generalist Agent in 3D World
Institution: Beijing Institute for General Artificial Intelligence
arXiv
Summary
11-16 Predictive Minds: LLMs As Atypical Active Inference Agents
Institution: Charles University
arXiv
Summary
11-14 KTRL+F: Knowledge-Augmented In-Document Search
Institution: KAIST AI, Samsung Research
arXiv
Summary
11-06 MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
Institution: DeepWisdom, King Abdullah University of Science and Technology
arXiv
Summary
10-16 OpenAgents: An Open Platform for Language Agents in the Wild
Institution: The University of Hong Kong, XLang Lab
arXiv
Summary
10-16 Theory of Mind for Multi-Agent Collaboration via Large Language Models
Institution: University of Pittsburgh
arXiv
Summary
09-29 AutoAgents: A Framework for Automatic Agent Generation
Institution: Peking University
arXiv
Summary
09-29 ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving
Institution: Tsinghua University, Microsoft
arXiv
Summary
09-14 Agents: An Open-source Framework for Autonomous Language Agents
Institution: AIWaves Inc.
arXiv
Summary
08-21 AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
Institution: Tsinghua University
arXiv
Summary
08-21 GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems
Institution: University of Waterloo
arXiv
Summary
08-16 AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
Institution: Microsoft Research
arXiv
Summary
07-25 WebArena: A Realistic Web Environment for Building Autonomous Agents
Institution: Carnegie Mellon University
arXiv
Summary
07-24 A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Institution: Google DeepMind
arXiv
Summary
07-16 Communicative Agents for Software Development
Institution: Tsinghua University
arXiv
Summary
07-14 Language models show human-like content effects on reasoning tasks
Institution: Google DeepMind
arXiv
Summary
07-10 RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
Institution: Columbia University
arXiv
Summary
06-13 Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
Institution: Nanyang Technological University
arXiv
Summary
05-23 Improving Factuality and Reasoning in Language Models through Multiagent Debate
Institution: MIT
arXiv
Summary
05-21 Augmenting Autotelic Agents with Large Language Models
Institution: MIT
arXiv
Summary
03-31 CAMEL: Communicative Agents for Mind Exploration of Large Language Model Society
Institution: King Abdullah University of Science and Technology
arXiv
Summary

Knowledge and Retrieval

 Date   Paper Links & Summary
05-20 Multiple-Choice Questions are Efficient and Robust LLM Evaluators
Institution: Shanghai Jiao Tong University
arXiv
Summary
05-20 xFinder: Robust and Pinpoint Answer Extraction for Large Language Models
Institution: Institute for Advanced Algorithms Research, Shanghai, Renmin University of China
arXiv
Summary
05-16 SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation
Institution: Amazon, The University of Texas at Austin
arXiv
Summary
05-10 UniDM: A Unified Framework for Data Manipulation with Large Language Models
Institution: Alibaba Group, University of Science and Technology of China
arXiv
Summary
05-10 Automatic Generation of Model and Data Cards: A Step Towards Responsible AI
Institution: CMU, MPI, ETH Zürich
arXiv
Summary
05-09 Can large language models understand uncommon meanings of common words?
Institution: Tsinghua University, Chinese Academy of Sciences
arXiv
Summary
05-08 "They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Institution: University of Washington, MBZUAI
arXiv
Summary
05-06 Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning
Institution: East China Normal University
arXiv
Summary
05-02 Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models
Institution: KAIST AI, LG AI Research, Carnegie Mellon University
arXiv
Summary
04-30 Multi-hop Question Answering over Knowledge Graphs using Large Language Models
Institution: Microsoft
arXiv
Summary
04-29 Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Institution: Cohere
arXiv
Summary
04-26 A Comprehensive Evaluation on Event Reasoning of Large Language Models
Institution: Peking University, Advanced Institute of Big Data, Beihang University
arXiv
Summary
04-24 From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Institution: Microsoft Research, Microsoft Strategic Missions and Technologies, Microsoft Office of the CTO
arXiv
Summary
04-23 CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies
Institution: Stanford University, IBM Research
arXiv
Summary
04-22 Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph
Institution: University of California San Diego, Carnegie Mellon University, University of Pennsylvania
arXiv
Summary
04-22 SnapKV: LLM Knows What You are Looking for Before Generation
Institution: University of Illinois Urbana-Champaign, Cohere, Princeton University
arXiv
Summary
04-22 LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation
Institution: Meituan
arXiv
Summary
04-22 Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering
Institution: Tencent Inc., Harbin Institute of Technology
arXiv
Summary
04-18 RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
Institution: Peking University, ByteDance Inc.
arXiv
Summary
04-16 How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior
Institution: Stanford University
arXiv
Summary
04-15 Compression Represents Intelligence Linearly
Institution: The Hong Kong University of Science and Technology, Tencent
arXiv
Summary
04-11 Rho-1: Not All Tokens Are What You Need
Institution: Xiamen University, Tsinghua University, Microsoft
arXiv
Summary
04-11 OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
Institution: The University of Hong Kong, CMU, Salesforce Research
arXiv
Summary
04-10 Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation
Institution: Apple, Cupertino, CA, USA
arXiv
Summary
04-09 RULER: What's the Real Context Size of Your Long-Context Language Models?
Institution: NVIDIA
arXiv
Summary
04-09 Event-enhanced Retrieval in Real-time Search
Institution: Tencent Search, Platform and Content Group
arXiv
Summary
04-08 LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding
Institution: Meta
arXiv
Summary
04-02 Long-context LLMs Struggle with Long In-context Learning
Institution: University of Waterloo, Carnegie Mellon University
arXiv
Summary
04-01 Mapping the Increasing Use of LLMs in Scientific Papers
Institution: Stanford University, UC Santa Barbara
arXiv
Summary
04-01 LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation
Institution: Microsoft Research Asia
arXiv
Summary
03-27 BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models
Institution: DCST Tsinghua University, Beijing Institute of Technology, Huawei Cloud BU
arXiv
Summary
03-26 The Unreasonable Ineffectiveness of the Deeper Layers
Institution: Meta FAIR, UMD
arXiv
Summary
03-26 COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning
Institution: Shenzhen Institute of Advanced Technology, CAS; M-A-P; Institute of Automation, CAS
arXiv
Summary
03-18 Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Institution: University of Texas at Austin, Drexel University, MIT
arXiv
Summary
03-15 RAFT: Adapting Language Model to Domain Specific RAG
Institution: UC Berkeley
arXiv
Summary
03-15 Uni-SMART: Universal Science Multimodal Analysis and Research Transformer
Institution: DP Technology, AI for Science Institute Beijing
arXiv
Summary
03-11 RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback
Institution: Zhejiang University, Southeast University, Massachusetts Institute of Technology
arXiv
Summary
03-07 Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Institution: UC Berkeley, Stanford, UCSD
arXiv
Summary
03-05 MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Institution: The Chinese University of Hong Kong Shenzhen, China; Microsoft Research Asia, Beijing, China; Shenzhen Research Institute of Big Data, Shenzhen, China
arXiv
Summary
02-27 REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering
Institution: Gaoling School of Artificial Intelligence Renmin University of China, School of Information Renmin University of China
arXiv
Summary
02-25 ChatMusician: Understanding and Generating Music Intrinsically with LLM
Institution: Hong Kong University of Science and Technology
arXiv
Summary
02-22 CriticBench: Benchmarking LLMs for Critique-Correct Reasoning
Institution: Tsinghua University, University of Hong Kong
arXiv
Summary
02-20 TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Institution: AWS AI Labs, The University of Texas at Austin, KAIST
arXiv
Summary
02-14 Premise Order Matters in Reasoning with Large Language Models
Institution: Google DeepMind
arXiv
Summary
02-01 Can Large Language Models Understand Context?
Institution: Georgetown University, Apple
arXiv
Summary
02-01 HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent
Institution: Amazon, University of Milano-Bicocca
arXiv
Summary
01-31 LongAlign: A Recipe for Long Context Alignment of Large Language Models
Institution: Tsinghua University, Zhipu.AI
arXiv
Summary
01-30 Incoherent Probability Judgments in Large Language Models
Institution: Princeton University
arXiv
Summary
01-27 MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries
Institution: Hong Kong University of Science and Technology
arXiv
Summary
01-24 Can AI Assistants Know What They Don't Know?
Institution: Fudan University, Shanghai Artificial Intelligence Laboratory
arXiv
Summary
01-24 Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction
Institution: Nanjing University of Science and Technology, Northeastern University, Singapore Institute of Technology
arXiv
Summary
01-24 Clue-Guided Path Exploration: An Efficient Knowledge Base Question-Answering Framework with Low Computational Resource Consumption
Institution: Tsinghua University, Zhongguancun Laboratory, XinJiang University
arXiv
Summary
01-24 AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents
Institution: The University of Hong Kong, Zhejiang University, Shanghai Jiao Tong University
arXiv
Summary
01-22 CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation
Institution: Stanford University, Stability AI
arXiv
Summary
01-21 Interactive AI with Retrieval-Augmented Generation for Next Generation Networking
Institution: Nanyang Technological University, Guangdong University of Technology, Institute for Infocomm Research, Agency for Science Technology and Research
arXiv
Summary
01-17 LLMs for Relational Reasoning: How Far are We?
Institution: Continental-NTU Corporate Lab, Nanyang Technological University, Singapore
arXiv
Summary
01-16 RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture
Institution: Microsoft
arXiv
Summary
01-16 Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models
Institution: Tencent AI Lab
arXiv
Summary
01-15 A Study on Large Language Models' Limitations in Multiple-Choice Question Answering
Institution: David R. Cheriton School of Computer Science
arXiv
Summary
01-12 Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation
Institution: Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Weixu Zhang, Xinrun Du, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu, Ge Zhang
arXiv
Summary
01-12 How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs
Institution: Virginia Tech, Renmin University of China, UC Davis
arXiv
Summary
01-11 TOFU: A Task of Fictitious Unlearning for LLMs
Institution: Carnegie Mellon University
arXiv
Summary
01-11 LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase
Institution: LAIR Lab Lehigh University, Huazhong University of Science and Technology
arXiv
Summary
01-10 Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing
Institution: Google Research
arXiv
Summary
01-10 CASA: Causality-driven Argument Sufficiency Assessment
Institution: Peking University
arXiv
Summary
01-10 InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
arXiv
Summary
01-09 Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search
Institution: Nanyang Technological University Singapore
arXiv
Summary
01-04 SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval
Institution: Columbia University
arXiv
Summary
01-02 LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
arXiv
Summary
01-01 The Earth is Flat? Unveiling Factual Errors in Large Language Models
Institution: The Chinese University of Hong Kong, Tencent AI Lab
arXiv
Summary
12-31 Improving Text Embeddings with Large Language Models
Institution: Microsoft Corporation
arXiv
Summary
12-31 BatchEval: Towards Human-like Text Evaluation
Institution: Beijing Institute of Technology, Xiaohongshu Inc
arXiv
Summary
12-29 Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception
Institution: Shanghai Key Laboratory of Data Science School of Computer Science Fudan University, School of Data Science Fudan University, DataGrand Co. LTD
arXiv
Summary
12-28 Structured Packing in LLM Training Improves Long Context Utilization
Institution: University of Warsaw, Google DeepMind, Polish Academy of Sciences
arXiv
Summary
12-26 Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models
Institution: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; School of Computer Science Peking University, Beijing China
arXiv
Summary
12-25 ESGReveal: An LLM-based approach for extracting structured data from ESG reports
Institution: Alibaba Cloud, Tsinghua University, Sun Yat-Sen University
arXiv
Summary
12-22 VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation
Institution: University of Waterloo, IN.AI Research
arXiv
Summary
12-19 A Revisit of Fake News Dataset with Augmented Fact-checking by ChatGPT
arXiv
Summary
12-19 Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in ultra low-data regimes
Institution: University of Cambridge
arXiv
Summary
12-18 G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model
Institution: Huawei Noah's Ark Lab, The University of Hong Kong, The Hong Kong University of Science and Technology
arXiv
Summary
12-18 NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation
Institution: University of Waterloo, Huawei Noah’s Ark Lab, FEEC-Unicamp Brazil
arXiv
Summary
12-18 "Paraphrasing The Original Text" Makes High Accuracy Long-Context QA
Institution: Tsinghua University
arXiv
Summary
12-17 Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach
Institution: Shanghai Jiao Tong University
arXiv
Summary
12-16 RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation
Institution: CAS Key Lab of Network Data Science and Technology ICT CAS, University of Chinese Academy of Sciences Beijing China
arXiv
Summary
12-16 ProTIP: Progressive Tool Retrieval Improves Planning
Institution: Apple
arXiv
Summary
12-16 CoAScore: Chain-of-Aspects Prompting for NLG Evaluation
Institution: GSAI Renmin University of China
arXiv
Summary
12-16 RecPrompt: A Prompt Tuning Framework for News Recommendation Using Large Language Models
Institution: Science Foundation Ireland (SFI), JSPS KAKENHI
arXiv
Summary
12-15 No-Skim: Towards Efficiency Robustness Evaluation on Skimming-based Language Models
Institution: Fudan University
arXiv
Summary
12-15 Generative Context-aware Fine-tuning of Self-supervised Speech Models
Institution: ASAPP, Carnegie Mellon University, Toyota Technological Institute at Chicago
arXiv
Summary
12-15 Faithful Persona-based Conversational Dataset Generation with Large Language Models
Institution: University of Southern California, Google, Information Sciences Institute
arXiv
Summary
12-15 Challenges with unsupervised LLM knowledge discovery
Institution: Google DeepMind, Google Research
arXiv
Summary
12-15 KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know
Institution: Apple
arXiv
Summary
12-14 Math-Shepherd: A Label-Free Step-by-Step Verifier for LLMs in Mathematical Reasoning
Institution: Peking University, DeepSeek-AI, The University of Hong Kong
arXiv
Summary
12-14 Entity-Augmented Code Generation
Institution: JetBrains
arXiv
Summary
12-14 Towards Verifiable Text Generation with Evolving Memory and Self-Reflection
Institution: Peking University, Chinese Academy of Sciences, Baidu Inc
arXiv
Summary
12-14 TinyGSM: achieving >80% on GSM8k with small language models
Institution: Carnegie Mellon University, Microsoft Research
arXiv
Summary
12-14 Self-Evaluation Improves Selective Generation in Large Language Models
Institution: Google DeepMind, Google Research
arXiv
Summary
12-12 LLMEval: A Preliminary Study on How to Evaluate Large Language Models
Institution: Fudan University, Shanghai Jiao Tong University
arXiv
Summary
12-12 diff History for Long-Context Language Agents
Institution: New York University
arXiv
Summary
12-11 Honeybee: Locality-enhanced Projector for Multimodal LLM
Institution: Kakao Brain
arXiv
Summary
12-11 Dense X Retrieval: What Retrieval Granularity Should We Use?
Institution: University of Washington, Tencent AI Lab
arXiv
Summary
12-10 Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs
Institution: Microsoft Israel
arXiv
Summary
12-08 Using Program Knowledge Graph to Uncover Software Vulnerabilities
arXiv
Summary
12-07 CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models
Institution: MPI for Intelligent Systems, University of Washington
arXiv
Summary
12-05 A Hardware Evaluation Framework for Large Language Model Inference
Institution: Princeton University
arXiv
Summary
12-04 Competition-Level Problems are Effective LLM Evaluators
Institution: Microsoft Research Asia, Xiamen University, Microsoft Azure AI
arXiv
Summary
12-04 ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions
Institution: Nanyang Technological University, National University of Singapore
arXiv
Summary
12-03 D-Bot: Database Diagnosis System using Large Language Models
Institution: Tsinghua University, Pigsty, ModelBest
arXiv
Summary
12-03 TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents
Institution: University of Southern California, Google Cloud AI
arXiv
Summary
12-03 Running cognitive evaluations on large language models: The do's and the don'ts
Institution: Massachusetts Institute of Technology
arXiv
Summary
12-01 Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games
Institution: Quebec AI Institute
arXiv
Summary
12-01 The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models
Institution: University of Wisconsin - Madison
arXiv
Summary
11-30 TaskBench: Benchmarking Large Language Models for Task Automation
Institution: Zhejiang University
arXiv
Summary
11-30 What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations
Institution: Comcast Applied AI, University of Waterloo
arXiv
Summary
11-29 Are Large Language Models Good Fact Checkers: A Preliminary Study
Institution: Chinese Academy of Sciences
arXiv
Summary
11-29 TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models
Institution: Harbin Institute of Technology
arXiv
Summary
11-26 UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation
Institution: Renmin University of China
arXiv
Summary
11-21 Do Smaller Language Models Answer Contextualised Questions Through Memorisation Or Generalisation?
Institution: University of Auckland
arXiv
Summary
11-21 Oasis: Data Curation and Assessment System for Pretraining of Large Language Models
Institution: Chinese Academy of Sciences
arXiv
Summary
11-21 How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks
Institution: University of Pennsylvania, MIT
arXiv
Summary
11-20 GPQA: A Graduate-Level Google-Proof Q&A Benchmark
Institution: New York University
arXiv
Summary
11-20 Continual Learning: Applications and the Road Forward
Institution: KU Leuven
arXiv
Summary
11-16 MacGyver: Are Large Language Models Creative Problem Solvers?
Institution: University of California, Princeton University
arXiv
Summary
11-15 ToolTalk: Evaluating Tool-Usage in a Conversational Setting
Institution: Microsoft Corporation
arXiv
Summary
11-14 Instruction-Following Evaluation for Large Language Models
Institution: Google, Yale University
arXiv
Summary
11-10 Making LLMs Worth Every Penny: Resource-Limited Text Classification in Banking
Institution: Helvia.ai
arXiv
Summary
10-17 Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Institution: University of Washington
arXiv
Summary
10-11 OpsEval: A Comprehensive Task-Oriented AIOps Benchmark for Large Language Models
Institution: Tsinghua University, Chinese Academy of Sciences
arXiv
Summary
10-10 A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection
Institution: Peking University
arXiv
Summary
10-10 The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Institution: Northeastern University, MIT
arXiv
Summary
09-26 RAGAS: Automated Evaluation of Retrieval Augmented Generation
Institution: Cardiff University
arXiv
Summary
09-04 Benchmarking Large Language Models in Retrieval-Augmented Generation
Institution: Chinese Information Processing Laboratory
arXiv
Summary
06-15 KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Institution: Tsinghua University
arXiv
Summary
06-07 Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering
Institution: KAIST, MBZUAI, Amazon
arXiv
Summary
05-29 G-EVAL: NLG Evaluation using GPT-4 with Better Human Alignment
Institution: Microsoft Cognitive Services Research
arXiv
Summary
05-24 In-Context Demonstration Selection with Cross Entropy Difference
Institution: Microsoft Cognitive Services Research
arXiv
Summary
05-16 StructGPT: A General Framework for Large Language Model to Reason over Structured Data
Institution: Gaoling School of Artificial Intelligence, Renmin University of China
arXiv
Summary
02-08 A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity
Institution: Centre for Artificial Intelligence Research
arXiv
Summary

Alignment and Hallucination

 Date   Paper Links & Summary
05-23 Agent Planning with World Knowledge Model
Institution: Zhejiang University, Zhejiang University - Ant Group Joint Laboratory of Knowledge Graph, National University of Singapore, Alibaba Group
arXiv
Summary
05-23 RaFe: Ranking Feedback Improves Query Rewriting for RAG
Institution: Zhejiang University, Alibaba Group, Nanjing University
arXiv
Summary
05-23 RefChecker: Reference-based Fine-grained Hallucination Checker and Benchmark for Large Language Models
Institution: Amazon AWS AI, Shanghai AI Lab, Shanghai Jiao Tong University
arXiv
Summary
05-14 Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs
Institution: Carnegie Mellon University, Allen Institute for AI
arXiv
Summary
05-08 ADELIE: Aligning Large Language Models on Information Extraction
Institution: Tsinghua University
arXiv
Summary
05-01 Can a Hallucinating Model help in Reducing Human "Hallucination"?
Institution: Stanford University, UC Berkeley
arXiv
Summary
05-01 The Real, the Better: Aligning Large Language Models with Online Human Behaviors
Institution: Baidu Inc.
arXiv
Summary
04-30 Do Large Language Models Understand Conversational Implicature -- A case study with a Chinese sitcom
Institution: Shanghai Jiao Tong University
arXiv
Summary
04-26 When to Trust LLMs: Aligning Confidence with Response Quality
Institution: Alibaba Group
arXiv
Summary
04-18 Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers
Institution: Westlake University, Alibaba Group, Zhejiang University
arXiv
Summary
04-18 Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences
Institution: UC Berkeley
arXiv
Summary
04-17 Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models
Institution: Renmin University of China, Chinese Academy of Sciences, Huawei Technologies
arXiv
Summary
04-15 Learn Your Reference Model for Real Good Alignment
Institution: Tinkoff
arXiv
Summary
04-10 Transferable and Efficient Non-Factual Content Detection via Probe Training with Offline Consistency Checking
Institution: Renmin University of China, Tsinghua University
arXiv
Summary
04-08 Know When To Stop: A Study of Semantic Drift in Text Generation
Institution: FAIR, Meta, Anthropic
arXiv
Summary
04-02 Advancing LLM Reasoning Generalists with Preference Trees
arXiv
Summary
03-27 Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback
arXiv
Summary
03-19 Towards Robots That Know When They Need Help: Affordance-Based Uncertainty for Large Language Model Planners
Institution: University of Maryland
arXiv
Summary
03-13 Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework
Institution: ByteDance Research, University of Maryland College Park, Carnegie Mellon University
arXiv
Summary
02-01 Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
Institution: University of Washington, University of California Berkeley, The Hong Kong University of Science and Technology
arXiv
Summary
01-25 Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning
Institution: Columbia University, Microsoft Research, University of California Berkeley
arXiv
Summary
01-25 True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning
Institution: Nanyang Technological University, Zhejiang University
arXiv
Summary
01-23 Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment
Institution: Alibaba Inc.
arXiv
Summary
01-19 Mitigating Hallucinations of Large Language Models via Knowledge Consistent Alignment
Institution: Sun Yat-sen University, Tencent AI Lab
arXiv
Summary
01-11 Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models
Institution: Google Research, Tel Aviv University
arXiv
Summary
01-06 The Dawn After the Dark: An Empirical Study on Factuality Hallucination in Large Language Models
Institution: Renmin University of China, Université de Montréal
arXiv
Summary
12-26 Aligning Large Language Models with Human Preferences through Representation Engineering
Institution: Fudan University
arXiv
Summary
12-25 Alleviating Hallucinations of Large Language Models through Induced Hallucinations
Institution: Soochow University, Tencent AI Lab
arXiv
Summary
12-22 Reasons to Reject? Aligning Language Models with Judgments
Institution: Tencent AI Lab, The Chinese University of Hong Kong
arXiv
Summary
12-22 Large Language Model (LLM) Bias Index -- LLMBI
Institution: University of Oxford, University Canada West, Amazon Web Services (AWS)
arXiv
Summary
12-15 Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Institution: OpenAI
arXiv
Summary
Blog
12-11 Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models
Institution: Salesforce AI Research
arXiv
Summary
12-09 Context Tuning for Retrieval Augmented Generation
Institution: Apple
arXiv
Summary
12-02 Axiomatic Preference Modeling for Longform Question Answering
arXiv
Summary
12-01 Nash Learning from Human Feedback
Institution: Google DeepMind
arXiv
Summary
12-01 Instruction-tuning Aligns LLMs to the Human Brain
Institution: EPFL
arXiv
Summary
11-28 Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization
Institution: Shanghai AI Laboratory
arXiv
Summary
11-28 RELIC: Investigating Large Language Model Responses using Self-Consistency
Institution: ETH Zurich
arXiv
Summary
11-24 Calibrated Language Models Must Hallucinate
Institution: Microsoft Research
arXiv
Summary
11-24 Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Institution: Amazon
arXiv
Summary
11-23 ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Institution: Google Research
arXiv
Summary
11-18 RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability
Institution: University of Science and Technology of China
arXiv
Summary
11-14 Learning to Filter Context for Retrieval-Augmented Generation
Institution: Carnegie Mellon University
arXiv
Summary
10-24 Correction with Backtracking Reduces Hallucination in Summarization
Institution: Google DeepMind, Cornell University
arXiv
Summary
10-20 The History and Risks of Reinforcement Learning and Human Feedback
Institution: Berkeley
arXiv
Summary
10-19 Large Language Models Help Humans Verify Truthfulness -- Except When They Are Convincingly Wrong
Institution: Stanford University, University of Maryland
arXiv
Summary
10-19 Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks
Institution: University of Pennsylvania, Microsoft Research
arXiv
Summary
10-05 Evaluating Hallucinations in Chinese Large Language Models
Institution: Fudan University
arXiv
Summary
10-02 LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples
Institution: Peking University
arXiv
Summary
10-02 Tool-Augmented Reward Modeling
Institution: Zhejiang University, Baidu
arXiv
Summary
09-30 AutoHall: Automated Hallucination Dataset Generation for Large Language Models
Institution: Shanghai Jiao Tong University
arXiv
Summary
09-28 Hallucination Reduction in Long Input Text Summarization
Institution: Jadavpur University
arXiv
Summary
09-25 Aligning Large Multimodal Models with Factually Augmented RLHF
Institution: UC Berkeley, CMU
arXiv
Summary
09-20 Chain-of-Verification Reduces Hallucination in Large Language Models
Institution: Meta AI
arXiv
Summary
09-18 Summarization is (Almost) Dead
Institution: Peking University
arXiv
Summary
08-22 Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models
Institution: University of Pittsburgh, TikTok
arXiv
Summary
07-31 Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering
Institution: Mila - Quebec AI Institute, McGill University
arXiv
Summary
06-09 Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Institution: UC Berkeley
arXiv
Summary
05-26 Training Socially Aligned Language Models on Simulated Social Interactions
Institution: Google DeepMind
arXiv
Summary
05-24 Trusting Your Evidence: Hallucinate Less with Context-aware Decoding
Institution: University of Washington
arXiv
Summary
05-22 LM vs LM: Detecting Factual Errors via Cross Examination
Institution: Google DeepMind
arXiv
Summary
05-18 LIMA: Less Is More for Alignment
Institution: Meta AI
arXiv
Summary
03-23 FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation
Institution: University of Washington
arXiv
Summary
03-08 HistAlign: Improving Context Dependency in Language Generation by Aligning with History
Institution: UNC Chapel Hill
arXiv
Summary

Application

 Date   Paper Links & Summary
05-23 PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services
Institution: Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences
arXiv
Summary
05-21 SmartFlow: Robotic Process Automation using LLMs
Institution: TCS Research
arXiv
Summary
05-16 MarkLLM: An Open-Source Toolkit for LLM Watermarking
Institution: Tsinghua University, Shanghai Jiao Tong University, The University of Sydney
arXiv
Summary
05-16 Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Institution: Nanyang Technological University, University of Science and Technology of China, University of Aberdeen
arXiv
Summary
05-09 LLMPot: Automated LLM-based Industrial Protocol and Physical Process Emulation for ICS Honeypots
Institution: New York University Abu Dhabi
arXiv
Summary
05-09 Exploring the Potential of Human-LLM Synergy in Advancing Qualitative Analysis: A Case Study on Mental-Illness Stigma
arXiv
Summary
05-09 An Automatic Prompt Generation System for Tabular Data Tasks
arXiv
Summary
05-07 Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application
Institution: Kuaishou Technology, Southeast University
arXiv
Summary
05-07 Toward In-Context Teaching: Adapting Examples to Students' Misconceptions
Institution: MIT CSAIL
arXiv
Summary
05-03 What matters when building vision-language models?
Institution: Hugging Face, Sorbonne Université
arXiv
Summary
05-02 How Can I Get It Right? Using GPT to Rephrase Incorrect Trainee Responses
Institution: Carnegie Mellon University
arXiv
Summary
05-01 "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust
Institution: Princeton University, Microsoft
arXiv
Summary
05-01 Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3
arXiv
Summary
04-25 How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Institution: Shanghai AI Laboratory, SenseTime Research, Tsinghua University
arXiv
Summary
04-19 LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency
Institution: Nanyang Technological University, DAMO Academy Alibaba Group, Singapore University of Technology and Design
arXiv
Summary
04-17 A Deep Dive into Large Language Models for Automated Bug Localization and Repair
Institution: University of Virginia, Purdue University, Amazon Web Services
arXiv
Summary
04-14 Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development
arXiv
Summary
04-11 ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past
Institution: Baylor University
arXiv
Summary
04-11 ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Institution: University of Central Florida, ByteDance Inc
arXiv
Summary
04-10 "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output
Institution: Google Research
arXiv
Summary
04-03 PromptRPA: Generating Robotic Process Automation on Smartphones from Textual Prompts
Institution: Shanghai Jiao Tong University, CMU
arXiv
Summary
04-02 Octopus v2: On-device language model for super agent
Institution: Stanford University
arXiv
Summary
04-02 LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models
Institution: Microsoft
arXiv
Summary
03-13 Scaling Instructable Agents Across Many Simulated Worlds
arXiv
Summary
03-11 Stealing Part of a Production Language Model
Institution: Google DeepMind, ETH Zurich, University of Washington
arXiv
Summary
03-08 Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Institution: Google
arXiv
Summary
03-07 Yi: Open Foundation Models by 01.AI
Institution: 01.AI
arXiv
Summary
03-05 Design2Code: How Far Are We From Automating Front-End Engineering?
Institution: Stanford University, Georgia Tech, Microsoft
arXiv
Summary
02-29 Beyond Language Models: Byte Models are Digital World Simulators
Institution: Microsoft Research Asia
arXiv
Summary
02-29 StarCoder 2 and The Stack v2: The Next Generation
Institution: ServiceNow, Hugging Face
arXiv
Summary
02-27 The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Institution: Microsoft, University of Chinese Academy of Sciences
arXiv
Summary
02-27 EMO: Emote Portrait Alive -- Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Institution: Alibaba Group
arXiv
Summary
02-27 Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
Institution: Lehigh University, Microsoft Research
arXiv
Summary
02-26 Improving LLM-based Machine Translation with Systematic Self-Correction
Institution: Zhejiang University, Tencent, Angelalign Technology Inc.
arXiv
Summary
02-23 Genie: Generative Interactive Environments
Institution: Google DeepMind, University of British Columbia
arXiv
Summary
02-19 AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Institution: Fudan University, Multimodal Art Projection Research Community, Shanghai AI Laboratory
arXiv
Summary
02-16 FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models
Institution: The University of British Columbia, Invertible AI
arXiv
Summary
02-16 SPAR: Personalized Content-Based Recommendation via Long Engagement Attention
Institution: The University of British Columbia, Meta
arXiv
Summary
02-02 LimSim++: A Closed-Loop Platform for Deploying Multimodal LLMs in Autonomous Driving
Institution: Shanghai Artificial Intelligence Laboratory, College of Control Science and Engineering Zhejiang University
arXiv
Summary
01-30 Recovering Mental Representations from Large Language Models with Markov Chain Monte Carlo
Institution: Princeton University, University of Warwick
arXiv
Summary
01-29 LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning
Institution: Nanyang Technological University
arXiv
Summary
01-19 Pruning for Protection: Increasing Jailbreak Resistance in Aligned LLMs Without Fine-Tuning
Institution: MIT
arXiv
Summary
01-17 Vlogger: Make Your Dream A Vlog
Institution: Shanghai Jiao Tong University, Shanghai AI Laboratory, Shenzhen Institute of Advanced Technology Chinese Academy of Sciences
arXiv
Summary
01-16 SpecGen: Automated Generation of Formal Program Specifications via Large Language Models
Institution: Nanjing University, Nanyang Technological University, Singapore Management University
arXiv
Summary
01-12 TestSpark: IntelliJ IDEA's Ultimate Test Generation Companion
Institution: JetBrains Research, Delft University of Technology
arXiv
Summary
01-12 From Automation to Augmentation: Large Language Models Elevating Essay Scoring Landscape
Institution: Tsinghua University, University of Maryland, Beijing Xicheng Educational Research Institute
arXiv
Summary
01-12 Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation
Institution: Nanyang Technological University, Fudan University
arXiv
Summary
01-10 Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis
Institution: Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, Meituan Group
arXiv
Summary
01-10 Leveraging Print Debugging to Improve Code Generation in Large Language Models
Institution: Zhejiang University, ByteDance
arXiv
Summary
01-08 MARG: Multi-Agent Review Generation for Scientific Papers
Institution: Northwestern University, The Hebrew University of Jerusalem, Allen Institute for AI
arXiv
Summary
01-05 Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache
Institution: Alibaba Group, Shanghai Jiao Tong University
arXiv
Summary
01-04 Using LLM to select the right SQL Query from candidates
Institution: Peking University
arXiv
Summary
01-04 LLM Augmented LLMs: Expanding Capabilities through Composition
Institution: Google Research, Google DeepMind
arXiv
Summary
01-03 MedSumm: A Multimodal Approach to Summarizing Code-Mixed Hindi-English Clinical Queries
Institution: Indian Institute of Technology Patna, Stanford University, Amazon GenAI
arXiv
Summary
01-03 Social Media Ready Caption Generation for Brands
Institution: Adobe Research India
arXiv
Summary
12-29 DB-GPT: Empowering Database Interactions with Private Large Language Models
Institution: Alibaba Group
arXiv
Summary
12-29 The Right Prompts for the Job: Repair Code-Review Defects with Large Language Model
Institution: Ant Group, Nanjing University
arXiv
Summary
12-29 Building Efficient Universal Classifiers with Natural Language Inference
Institution: Vrije Universiteit Amsterdam, University of London Royal Holloway, Hugging Face
arXiv
Summary
12-28 DrugAssist: A Large Language Model for Molecule Optimization
Institution: Tencent AI Lab, Department of Computer Science Hunan University
arXiv
Summary
12-27 Conversational Question Answering with Reformulations over Knowledge Graph
Institution: University of Illinois at Urbana-Champaign, Amazon
arXiv
Summary
12-27 Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges
Institution: Shanghai Jiao Tong University (SJTU)
arXiv
Summary
12-26 RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation
Institution: City University of Hong Kong, The Chinese University of Hong Kong, Hangzhou Dianzi University
arXiv
Summary
12-22 YAYI 2: Multilingual Open-Source Large Language Models
Institution: Beijing Wenge Technology Co. Ltd., Institute of Automation Chinese Academy of Sciences
arXiv
Summary
12-20 Lampr: Boosting the Effectiveness of Language-Generic Program Reduction via Large Language Models
Institution: University of Waterloo, The Hong Kong University of Science and Technology, Concordia University
arXiv
Summary
12-20 Generative Multimodal Models are In-Context Learners
Institution: Beijing Academy of Artificial Intelligence, Tsinghua University, Peking University
arXiv
Summary
12-19 Text-Conditioned Resampler For Long Form Video Understanding
Institution: University of Oxford, Google, Google DeepMind
arXiv
Summary
12-18 Towards Better Serialization of Tabular Data for Few-shot Classification with Large Language Models
Institution: Carnegie Mellon University
arXiv
Summary
12-18 MAC-SQL: Multi-Agent Collaboration for Text-to-SQL
Institution: Beihang University, Tencent Cloud AI
arXiv
Summary
12-15 GSVA: Generalized Segmentation via Multimodal Large Language Models
Institution: Tsinghua University
arXiv
Summary
12-14 CogAgent: A Visual Language Model for GUI Agents
Institution: Tsinghua University, Zhipu AI
arXiv
Summary
12-14 StemGen: A music generation model that listens
Institution: SAMI, ByteDance Inc.
arXiv
Summary
12-14 Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Institution: CUHK-SenseTime Joint Laboratory, Shanghai AI Laboratory, Tsinghua University
arXiv
Summary
12-13 SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Institution: The Swiss AI Lab IDSIA USI & SUPSI, AI Initiative KAUST, Center for Brain Science Harvard University
arXiv
Summary
12-13 E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification
Institution: UC Riverside, Microsoft Research
arXiv
Summary
12-13 Knowledge-Aware Artifact Image Synthesis with LLM-Enhanced Prompting and Multi-Source Supervision
Institution: Peking University
arXiv
Summary
12-12 LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Institution: Apple
arXiv
Summary
12-11 Oracle-based Protocol Testing with Eywa
Institution: Microsoft Research
arXiv
Summary
12-09 Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis
Institution: Shanghai Jiao Tong University
arXiv
Summary
12-07 Generating Illustrated Instructions
Institution: GenAI Meta, Columbia University
arXiv
Summary
12-06 Holmes: Towards Distributed Training Across Clusters with Heterogeneous NIC Environment
Institution: Zhejiang Lab
arXiv
Summary
12-06 OneLLM: One Framework to Align All Modalities with Language
Institution: MMLab The Chinese University of Hong Kong, Shanghai Artificial Intelligence Laboratory
arXiv
Summary
12-05 A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education
Institution: Carnegie Mellon University
arXiv
Summary
12-04 LLMs Accelerate Annotation for Medical Information Extraction
Institution: Google Research
arXiv
Summary
12-02 Large Language Models Are Zero-Shot Text Classifiers
Institution: Florida Atlantic University
arXiv
Summary
12-01 Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses
Institution: Google
arXiv
Summary
12-01 Improve Supervised Representation Learning with Masked Image Modeling
Institution: Google Research, OpenAI
arXiv
Summary
11-30 PoseGPT: Chatting about 3D Human Pose
Institution: Max Planck Institute for Intelligent Systems, Meshcapade
arXiv
Summary
11-30 Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text
Institution: The University of Tokyo
arXiv
Summary
11-30 MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Institution: University of Science and Technology of China, Microsoft Research Asia
arXiv
Summary
11-29 Large Language Models for Networking: Applications, Enabling Techniques, and Challenges
Institution: Beijing University of Posts and Telecommunications
arXiv
Summary
11-29 How to Build an AI Tutor that Can Adapt to Any Course and Provide Accurate Answers Using Large Language Model and Retrieval-Augmented Generation
Institution: The Education University of Hong Kong
arXiv
Summary
11-28 ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Institution: Nanyang Technological University
arXiv
Summary
11-28 Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Institution: Alibaba Group
arXiv
Summary
Blog
11-28 Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine
Institution: Microsoft
arXiv
Summary
11-28 LLaFS: When Large-Language Models Meet Few-Shot Segmentation
Institution: Singapore University of Technology and Design, Zhejiang University
arXiv
Summary
11-23 LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes
Institution: ASRI
arXiv
Summary
11-23 FusionFrames: Efficient Architectural Aspects for Text-to-Video Generation Pipeline
Institution: Sber AI
arXiv
Summary
11-22 XAGen: 3D Expressive Human Avatars Generation
Institution: National University of Singapore, ByteDance
arXiv
Summary
11-21 AcademicGPT: Empowering Academic Research
Institution: International Digital Economy Academy
arXiv
Summary
11-21 A Survey on Multimodal Large Language Models for Autonomous Driving
Institution: Purdue University
arXiv
Summary
11-13 Can LLMs Patch Security Issues?
Institution: School of Computer Science, Georgia Institute of Technology
arXiv
Summary
11-05 ChaTA: Towards an Intelligent Question-Answer Teaching Assistant using Open-Source LLMs
Institution: Cornell University, Microsoft Research
arXiv
Summary
11-01 LLMRec: Large Language Models with Graph Augmentation for Recommendation
Institution: University of Hong Kong, Baidu
arXiv
Summary
10-10 GPT-4 as an Agronomist Assistant? Answering Agriculture Exams Using Large Language Models
Institution: Microsoft Research
arXiv
Summary
08-18 Learning Representations on Logs for AIOps
Institution: IBM Research
arXiv
Summary

Pre-training and Instruction Fine-tuning

 Date   Paper Links & Summary
05-21 G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation
Institution: ByteDance Research
arXiv
Summary
05-20 OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Institution: OpenLLMAI Team, ByteDance Inc., Netease Fuxi AI Lab
arXiv
Summary
05-19 Your Transformer is Secretly Linear
Institution: AIRI, Skoltech, Sber AI
arXiv
Summary
05-17 Prompt Exploration with Prompt Regression
Institution: Carnegie Mellon University, Massachusetts Institute of Technology, University of Michigan
arXiv
Summary
05-15 ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models
Institution: Microsoft Research Asia, Harvard University, Peking University
arXiv
Summary
05-15 LoRA Learns Less and Forgets Less
Institution: Columbia University, Databricks
arXiv
Summary
05-13 RLHF Workflow: From Reward Modeling to Online RLHF
Institution: Salesforce AI Research, University of Illinois Urbana-Champaign
arXiv
Summary
05-07 QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
Institution: MIT, NVIDIA
arXiv
Summary
04-30 Better & Faster Large Language Models via Multi-token Prediction
Institution: FAIR at Meta
arXiv
Summary
04-29 LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Institution: Predibase
arXiv
Summary
04-25 Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
Institution: Meta, University of Toronto, Carnegie Mellon University
arXiv
Summary
04-13 Intuition-aware Mixture-of-Rank-1-Experts for Parameter Efficient Finetuning
Institution: Nanjing University, University of California
arXiv
Summary
04-12 Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Institution: AI at Meta, University of Southern California, Carnegie Mellon University
arXiv
Summary
04-10 Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Institution: Google
arXiv
Summary
04-08 LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Institution: Alibaba Group, Zhejiang University
arXiv
Summary
04-07 Radial Networks: Dynamic Layer Routing for High-Performance Large Language Models
Institution: Cornell University
arXiv
Summary
04-04 Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Institution: Microsoft Research
arXiv
Summary
04-04 ReFT: Representation Finetuning for Language Models
Institution: Stanford University, Pr(Ai)2R Group
arXiv
Summary
04-01 Efficiently Distilling LLMs for Edge Applications
Institution: IBM Research
arXiv
Summary
04-01 Prompt-prompted Mixture of Experts for Efficient LLM Generation
Institution: Carnegie Mellon University
arXiv
Summary
03-28 Jamba: A Hybrid Transformer-Mamba Language Model
Institution: AI21 Labs
arXiv
Summary
03-26 LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Institution: The Hong Kong University of Science and Technology, University of Illinois Urbana-Champaign
arXiv
Summary
03-12 Chronos: Learning the Language of Time Series
Institution: Amazon Web Services, UC San Diego, University of Freiburg
arXiv
Summary
03-08 Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation
arXiv
Summary
02-29 SEED: Customize Large Language Models with Sample-Efficient Adaptation for Code Generation
Institution: Peking University
arXiv
Summary
02-27 When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Institution: Google DeepMind
arXiv
Summary
02-20 Instruction-tuned Language Models are Better Knowledge Learners
Institution: FAIR at Meta, Carnegie Mellon University, University of Washington
arXiv
Summary
02-01 Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing
Institution: Nanyang Technological University, Institute for Infocomm Research A*STAR, Salesforce Research
arXiv
Summary
01-29 SelectLLM: Can LLMs Select Important Instructions to Annotate?
Institution: University of Minnesota, Carnegie Mellon University
arXiv
Summary
01-26 EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Institution: Peking University, Microsoft Research, University of Waterloo
arXiv
Summary
01-19 Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Institution: Princeton University, Together AI, University of Illinois Urbana-Champaign
arXiv
Summary
01-18 A Fast, Performant, Secure Distributed Training Framework For Large Language Model
Institution: Ant Group China
arXiv
Summary
01-18 ChatQA: Building GPT-4 Level Conversational QA Models
Institution: NVIDIA
arXiv
Summary
01-17 ReFT: Reasoning with Reinforced Fine-Tuning
Institution: ByteDance Research
arXiv
Summary
01-16 Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Institution: Johns Hopkins University, Microsoft
arXiv
Summary
01-15 MAPLE: Multilingual Evaluation of Parameter Efficient Finetuning of Large Language Models
Institution: Microsoft Research India
arXiv
Summary
01-12 APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding
Institution: Tsinghua University, Zhipu AI
arXiv
Summary
01-12 An Experimental Design Framework for Label-Efficient Supervised Finetuning of Large Language Models
Institution: University of Washington Seattle, University of Wisconsin-Madison, Stanford University
arXiv
Summary
01-11 Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint
Institution: Gaoling School of Artificial Intelligence Renmin University of China, School of Information Renmin University of China, Kuaishou Technology
arXiv
Summary
12-26 A Prompt Learning Framework for Source Code Summarization
Institution: Nanyang Technological University, Tencent Inc., Nanjing University
arXiv
Summary
12-22 Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
arXiv
Summary
12-22 Plan, Posture and Go: Towards Open-World Text-to-Motion Generation
Institution: Tsinghua University, Microsoft Research Asia
arXiv
Summary
12-20 Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Institution: Ant Group
arXiv
Summary
12-20 Mini-GPTs: Efficient Large Language Models through Contextual Pruning
Institution: Massachusetts Institute of Technology
arXiv
Summary
12-20 Time is Encoded in the Weights of Finetuned Language Models
arXiv
Summary
12-15 The Art of Balancing: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
Institution: NLP Group Fudan University, Hikvision Inc
arXiv
Summary
12-14 Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention
Institution: Tencent AI Lab Seattle
arXiv
Summary
12-12 VILA: On Pre-training for Visual Language Models
Institution: NVIDIA, MIT
arXiv
Summary
12-11 Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Institution: Zhejiang University, Alibaba Group
arXiv
Summary
12-09 Sim-GPT: Text Similarity via GPT Annotated Data
Institution: Shannon.AI, Zhejiang University, ByteDance
arXiv
Summary
12-09 Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Institution: Northeastern University, Oracle
arXiv
Summary
12-06 Controllable Human-Object Interaction Synthesis
Institution: Stanford University, FAIR Meta
arXiv
Summary
12-05 RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
Institution: University of Waterloo
arXiv
Summary
11-28 Prompting in Autoregressive Large Language Models
Institution: George Mason University
arXiv
Summary
11-28 Training Chain-of-Thought via Latent-Variable Inference
Institution: Google
arXiv
Summary
11-28 RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement
Institution: Alibaba Group
arXiv
Summary
11-23 Diffusion Model Alignment Using Direct Preference Optimization
Institution: Salesforce AI, Stanford University
arXiv
Summary
11-22 LIMIT: Less Is More for Instruction Tuning Across Evaluation Paradigms
Institution: Princeton University
arXiv
Summary
11-21 Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey
Institution: Nanjing University
arXiv
Summary
11-21 Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Institution: University of Cambridge
arXiv
Summary
11-18 Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning
Institution: Technical University of Darmstadt, University of Cambridge
arXiv
Summary
11-17 Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2
Institution: Allen Institute for AI
arXiv
Summary
11-15 Exponentially Faster Language Modelling
Institution: ETH Zurich
arXiv
Summary
11-15 Memory Augmented Language Models through Mixture of Word Experts
Institution: Google Research
arXiv
Summary
11-14 Fine-tuning Language Models for Factuality
Institution: Stanford University
arXiv
Summary
07-12 Instruction Mining: When Data Mining Meets Large Language Model Finetuning
Institution: Carnegie Mellon University
arXiv
Summary

Survey

 Date   Paper Links & Summary
04-25 Continual Learning of Large Language Models: A Comprehensive Survey
Institution: Rutgers University, Wuhan University, Huazhong University of Science and Technology
arXiv
Summary
04-24 Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Institution: Shanghai Jiao Tong University, UC San Diego, Duke University
arXiv
Summary
04-23 A Survey of Large Language Models on Generative Graph Analytics: Query, Learning, and Applications
Institution: Hong Kong Baptist University
arXiv
Summary
04-22 A Survey on Efficient Inference for Large Language Models
Institution: Tsinghua University
arXiv
Summary
04-22 A Survey on Self-Evolution of Large Language Models
Institution: Peking University, Alibaba Group, Nanyang Technological University
arXiv
Summary
04-09 Privacy Preserving Prompt Engineering: A Survey
Institution: University of Arkansas
arXiv
Summary
04-01 AIOps Solutions for Incident Management: Technical Guidelines and A Comprehensive Literature Review
Institution: University of Lyon, INSA Lyon, Infologic
arXiv
Summary
01-24 MM-LLMs: Recent Advances in MultiModal Large Language Models
Institution: Tencent AI Lab, Kyoto University, Mohamed Bin Zayed University of Artificial Intelligence
arXiv
Summary
01-15 The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey
Institution: Technology Innovation Institute UAE, Islamic University of Technology Bangladesh, Stanford University, Amazon GenAI, AI Institute University of South Carolina
arXiv
Summary
01-11 Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Institution: Zhongguancun Laboratory, Tsinghua University, Institute of Information Engineering Chinese Academy of Sciences
arXiv
Summary
01-09 Large Language Models for Robotics: Opportunities, Challenges, and Perspectives
Institution: Northwestern Polytechnical University, University of Georgia, Shaanxi Normal University
arXiv
Summary
01-02 A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
Institution: Islamic University of Technology Bangladesh, University of South Carolina, Stanford University
arXiv
Summary
12-22 A Survey of Reinforcement Learning from Human Feedback
Institution: LMU Munich, Duke Kunshan University
arXiv
Summary
12-18 Retrieval-Augmented Generation for Large Language Models: A Survey
Institution: Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, Fudan University
arXiv
Summary
12-18 From Google Gemini to OpenAI Q-Star: A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Institution: Cyberstronomy Pty Ltd, Academies Australasia Polytechnic, Massey University
arXiv
Summary
12-16 A Survey on Robotic Manipulation of Deformable Objects: Recent Advances, Open Challenges and New Frontiers
Institution: Tongji University
arXiv
Summary
12-09 NLLG Quarterly arXiv Report 09/23: What are the most influential current AI Papers?
Institution: University of Mannheim, University of Bielefeld
arXiv
Summary
12-06 Efficient Large Language Models: A Survey
Institution: The Ohio State University, Google Research, Amazon AWS AI
arXiv
Summary
12-04 A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Institution: Drexel University
arXiv
Summary
12-04 Data Management For Large Language Models: A Survey
Institution: Peking University, Huawei Noah’s Ark Lab
arXiv
Summary
11-28 Graph Prompt Learning: A Comprehensive Survey and Beyond
Institution: The Chinese University of Hong Kong, Hong Kong University of Science and Technology, Fudan University
arXiv
Summary
11-21 Prompting Frameworks for Large Language Models: A Survey
Institution: Zhejiang University
arXiv
Summary
10-16 A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Institution: Harbin Institute of Technology, Huawei
arXiv
Summary
09-03 Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models
Institution: Tencent AI Lab
arXiv
Summary
06-01 Bridging the Gap: A Survey on Integrating (Human) Feedback for Natural Language Generation
Institution: Carnegie Mellon University
arXiv
Summary
03-31 A Survey of Large Language Models
Institution: Renmin University of China
arXiv
Summary
03-15 GPT-4 Technical Report
Institution: OpenAI
arXiv
Summary
02-15 Augmented Language Models: a Survey
Institution: Meta AI
arXiv
Summary