05-20 |
Multiple-Choice Questions are Efficient and Robust LLM Evaluators Institution: Shanghai Jiao Tong University
|
|
05-20 |
xFinder: Robust and Pinpoint Answer Extraction for Large Language Models Institution: Institute for Advanced Algorithms Research, Shanghai, Renmin University of China
|
|
05-16 |
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation Institution: Amazon, The University of Texas at Austin
|
|
05-10 |
UniDM: A Unified Framework for Data Manipulation with Large Language Models Institution: Alibaba Group, University of Science and Technology of China
|
|
05-10 |
Automatic Generation of Model and Data Cards: A Step Towards Responsible AI Institution: CMU, MPI, ETH Zürich
|
|
05-09 |
Can large language models understand uncommon meanings of common words? Institution: Tsinghua University, Chinese Academy of Science
|
|
05-08 |
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations Institution: University of Washington, MBZUAI
|
|
05-06 |
Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning Institution: East China Normal University
|
|
05-02 |
Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models Institution: KAIST AI, LG AI Research, Carnegie Mellon University
|
|
04-30 |
Multi-hop Question Answering over Knowledge Graphs using Large Language Models Institution: Microsoft
|
|
04-29 |
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Institution: Cohere
|
|
04-26 |
A Comprehensive Evaluation on Event Reasoning of Large Language Models Institution: Peking University, Advanced Institute of Big Data, Beihang University
|
|
04-24 |
From Local to Global: A Graph RAG Approach to Query-Focused Summarization Institution: Microsoft Research, Microsoft Strategic Missions and Technologies, Microsoft Office of the CTO
|
|
04-23 |
CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies Institution: Stanford University, IBM Research
|
|
04-22 |
Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph Institution: University of California San Diego, Carnegie Mellon University, University of Pennsylvania
|
|
04-22 |
SnapKV: LLM Knows What You are Looking for Before Generation Institution: University of Illinois Urbana-Champaign, Cohere, Princeton University
|
|
04-22 |
LLMs Know What They Need: Leveraging a Missing Information Guided Framework to Empower Retrieval-Augmented Generation Institution: Meituan
|
|
04-22 |
Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering Institution: Tencent Inc., Harbin Institute of Technology
|
|
04-18 |
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation Institution: Peking University, ByteDance Inc.
|
|
04-16 |
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior Institution: Stanford University
|
|
04-15 |
Compression Represents Intelligence Linearly Institution: The Hong Kong University of Science and Technology, Tencent
|
|
04-11 |
Rho-1: Not All Tokens Are What You Need Institution: Xiamen University, Tsinghua University, Microsoft
|
|
04-11 |
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Institution: The University of Hong Kong, CMU, Salesforce Research
|
|
04-10 |
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation Institution: Apple, Cupertino, CA, USA
|
|
04-09 |
RULER: What's the Real Context Size of Your Long-Context Language Models? Institution: NVIDIA
|
|
04-09 |
Event-enhanced Retrieval in Real-time Search Institution: Tencent Search, Platform and Content Group
|
|
04-08 |
LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding Institution: Meta
|
|
04-02 |
Long-context LLMs Struggle with Long In-context Learning Institution: University of Waterloo, Carnegie Mellon University
|
|
04-01 |
Mapping the Increasing Use of LLMs in Scientific Papers Institution: Stanford University, UC Santa Barbara
|
|
04-01 |
LLM-RadJudge: Achieving Radiologist-Level Evaluation for X-Ray Report Generation Institution: Microsoft Research Asia
|
|
03-27 |
BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models Institution: DCST Tsinghua University, Beijing Institute of Technology, Huawei Cloud BU
|
|
03-26 |
The Unreasonable Ineffectiveness of the Deeper Layers Institution: Meta FAIR, UMD
|
|
03-26 |
COIG-CQIA: Quality is All You Need for Chinese Instruction Fine-tuning Institution: Shenzhen Institute of Advanced Technology, CAS; M-A-P; Institute of Automation, CAS
|
|
03-18 |
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression Institution: University of Texas at Austin, Drexel University, MIT
|
|
03-15 |
RAFT: Adapting Language Model to Domain Specific RAG Institution: UC Berkeley
|
|
03-15 |
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer Institution: DP Technology, AI for Science Institute Beijing
|
|
03-11 |
RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback Institution: Zhejiang University, Southeast University, Massachusetts Institute of Technology
|
|
03-07 |
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Institution: UC Berkeley, Stanford, UCSD
|
|
03-05 |
MathScale: Scaling Instruction Tuning for Mathematical Reasoning Institution: The Chinese University of Hong Kong Shenzhen, China; Microsoft Research Asia, Beijing, China; Shenzhen Research Institute of Big Data, Shenzhen, China
|
|
02-27 |
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering Institution: Gaoling School of Artificial Intelligence Renmin University of China, School of Information Renmin University of China
|
|
02-25 |
ChatMusician: Understanding and Generating Music Intrinsically with LLM Institution: Hong Kong University of Science and Technology
|
|
02-22 |
CriticBench: Benchmarking LLMs for Critique-Correct Reasoning Institution: Tsinghua University, University of Hong Kong
|
|
02-20 |
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization Institution: AWS AI Labs, The University of Texas at Austin, KAIST
|
|
02-14 |
Premise Order Matters in Reasoning with Large Language Models Institution: Google DeepMind
|
|
02-01 |
Can Large Language Models Understand Context? Institution: Georgetown University, Apple
|
|
02-01 |
HR-MultiWOZ: A Task Oriented Dialogue (TOD) Dataset for HR LLM Agent Institution: Amazon, University of Milano-Bicocca
|
|
01-31 |
LongAlign: A Recipe for Long Context Alignment of Large Language Models Institution: Tsinghua University, Zhipu.AI
|
|
01-30 |
Incoherent Probability Judgments in Large Language Models Institution: Princeton University
|
|
01-27 |
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries Institution: Hong Kong University of Science and Technology
|
|
01-24 |
Can AI Assistants Know What They Don't Know? Institution: Fudan University, Shanghai Artificial Intelligence Laboratory
|
|
01-24 |
Consistency Guided Knowledge Retrieval and Denoising in LLMs for Zero-shot Document-level Relation Triplet Extraction Institution: Nanjing University of Science and Technology, Northeastern University, Singapore Institute of Technology
|
|
01-24 |
Clue-Guided Path Exploration: An Efficient Knowledge Base Question-Answering Framework with Low Computational Resource Consumption Institution: Tsinghua University, Zhongguancun Laboratory, XinJiang University
|
|
01-24 |
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents Institution: The University of Hong Kong, Zhejiang University, Shanghai Jiao Tong University
|
|
01-22 |
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation Institution: Stanford University, Stability AI
|
|
01-21 |
Interactive AI with Retrieval-Augmented Generation for Next Generation Networking Institution: Nanyang Technological University, Guangdong University of Technology, Institute for Infocomm Research, Agency for Science Technology and Research
|
|
01-17 |
LLMs for Relational Reasoning: How Far are We? Institution: Continental-NTU Corporate Lab, Nanyang Technological University, Singapore
|
|
01-16 |
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture Institution: Microsoft
|
|
01-16 |
Salute the Classic: Revisiting Challenges of Machine Translation in the Age of Large Language Models Institution: Tencent AI Lab
|
|
01-15 |
A Study on Large Language Models' Limitations in Multiple-Choice Question Answering Institution: David R. Cheriton School of Computer Science, University of Waterloo
|
|
01-12 |
Kun: Answer Polishment for Chinese Self-Alignment with Instruction Back-Translation Authors: Tianyu Zheng, Shuyue Guo, Xingwei Qu, Jiawei Guo, Weixu Zhang, Xinrun Du, Chenghua Lin, Wenhao Huang, Wenhu Chen, Jie Fu, Ge Zhang
|
|
01-12 |
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs Institution: Virginia Tech, Renmin University of China, UC Davis
|
|
01-11 |
TOFU: A Task of Fictitious Unlearning for LLMs Institution: Carnegie Mellon University
|
|
01-11 |
LLM-as-a-Coauthor: The Challenges of Detecting LLM-Human Mixcase Institution: LAIR Lab Lehigh University, Huazhong University of Science and Technology
|
|
01-10 |
Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing Institution: Google Research
|
|
01-10 |
CASA: Causality-driven Argument Sufficiency Assessment Institution: Peking University
|
|
01-10 |
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
|
|
01-09 |
Rewriting the Code: A Simple Method for Large Language Model Augmented Code Search Institution: Nanyang Technological University Singapore
|
|
01-04 |
SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval Institution: Columbia University
|
|
01-02 |
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
|
|
01-01 |
The Earth is Flat? Unveiling Factual Errors in Large Language Models Institution: The Chinese University of Hong Kong, Tencent AI Lab
|
|
12-31 |
Improving Text Embeddings with Large Language Models Institution: Microsoft Corporation
|
|
12-31 |
BatchEval: Towards Human-like Text Evaluation Institution: Beijing Institute of Technology, Xiaohongshu Inc
|
|
12-29 |
Enhancing Quantitative Reasoning Skills of Large Language Models through Dimension Perception Institution: Shanghai Key Laboratory of Data Science School of Computer Science Fudan University, School of Data Science Fudan University, DataGrand Co. LTD
|
|
12-28 |
Structured Packing in LLM Training Improves Long Context Utilization Institution: University of Warsaw, Google DeepMind, Polish Academy of Sciences
|
|
12-26 |
Think and Retrieval: A Hypothesis Knowledge Graph Enhanced Medical Large Language Models Institution: Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education; School of Computer Science Peking University, Beijing China
|
|
12-25 |
ESGReveal: An LLM-based approach for extracting structured data from ESG reports Institution: Alibaba Cloud, Tsinghua University, Sun Yat-Sen University
|
|
12-22 |
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation Institution: University of Waterloo, IN.AI Research
|
|
12-19 |
A Revisit of Fake News Dataset with Augmented Fact-checking by ChatGPT
|
|
12-19 |
Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in ultra low-data regimes Institution: University of Cambridge
|
|
12-18 |
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model Institution: Huawei Noah's Ark Lab, The University of Hong Kong, The Hong Kong University of Science and Technology
|
|
12-18 |
NoMIRACL: Knowing When You Don't Know for Robust Multilingual Retrieval-Augmented Generation Institution: University of Waterloo, Huawei Noah’s Ark Lab, FEEC-Unicamp Brazil
|
|
12-18 |
"Paraphrasing The Original Text" Makes High Accuracy Long-Context QA Institution: Tsinghua University
|
|
12-17 |
Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach Institution: Shanghai Jiao Tong University
|
|
12-16 |
RIGHT: Retrieval-augmented Generation for Mainstream Hashtag Recommendation Institution: CAS Key Lab of Network Data Science and Technology ICT CAS, University of Chinese Academy of Sciences Beijing China
|
|
12-16 |
ProTIP: Progressive Tool Retrieval Improves Planning Institution: Apple
|
|
12-16 |
CoAScore: Chain-of-Aspects Prompting for NLG Evaluation Institution: GSAI Renmin University of China
|
|
12-16 |
RecPrompt: A Prompt Tuning Framework for News Recommendation Using Large Language Models Institution: Science Foundation Ireland (SFI), JSPS KAKENHI
|
|
12-15 |
No-Skim: Towards Efficiency Robustness Evaluation on Skimming-based Language Models Institution: Fudan University
|
|
12-15 |
Generative Context-aware Fine-tuning of Self-supervised Speech Models Institution: ASAPP, Carnegie Mellon University, Toyota Technological Institute at Chicago
|
|
12-15 |
Faithful Persona-based Conversational Dataset Generation with Large Language Models Institution: University of Southern California, Google, Information Sciences Institute
|
|
12-15 |
Challenges with unsupervised LLM knowledge discovery Institution: Google DeepMind, Google Research
|
|
12-15 |
KGLens: A Parameterized Knowledge Graph Solution to Assess What an LLM Does and Doesn't Know Institution: Apple
|
|
12-14 |
Math-Shepherd: A Label-Free Step-by-Step Verifier for LLMs in Mathematical Reasoning Institution: Peking University, DeepSeek-AI, The University of Hong Kong
|
|
12-14 |
Entity-Augmented Code Generation Institution: JetBrains
|
|
12-14 |
Towards Verifiable Text Generation with Evolving Memory and Self-Reflection Institution: Peking University, Chinese Academy of Sciences, Baidu Inc
|
|
12-14 |
TinyGSM: achieving >80% on GSM8k with small language models Institution: Carnegie Mellon University, Microsoft Research
|
|
12-14 |
Self-Evaluation Improves Selective Generation in Large Language Models Institution: Google DeepMind, Google Research
|
|
12-12 |
LLMEval: A Preliminary Study on How to Evaluate Large Language Models Institution: Fudan University, Shanghai Jiaotong University
|
|
12-12 |
diff History for Long-Context Language Agents Institution: New York University
|
|
12-11 |
Honeybee: Locality-enhanced Projector for Multimodal LLM Institution: Kakao Brain
|
|
12-11 |
Dense X Retrieval: What Retrieval Granularity Should We Use? Institution: University of Washington, Tencent AI Lab
|
|
12-10 |
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs Institution: Microsoft Israel
|
|
12-08 |
Using Program Knowledge Graph to Uncover Software Vulnerabilities
|
|
12-07 |
CLadder: A Benchmark to Assess Causal Reasoning Capabilities of Language Models Institution: MPI for Intelligent Systems, University of Washington
|
|
12-05 |
A Hardware Evaluation Framework for Large Language Model Inference Institution: Princeton University
|
|
12-04 |
Competition-Level Problems are Effective LLM Evaluators Institution: Microsoft Research Asia, Xiamen University, Microsoft Azure AI
|
|
12-04 |
ChatGPT as a Math Questioner? Evaluating ChatGPT on Generating Pre-university Math Questions Institution: Nanyang Technological University, National University of Singapore
|
|
12-03 |
D-Bot: Database Diagnosis System using Large Language Models Institution: Tsinghua University, Pigsty, ModelBest
|
|
12-03 |
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents Institution: University of Southern California, Google Cloud AI
|
|
12-03 |
Running cognitive evaluations on large language models: The do's and the don'ts Institution: Massachusetts Institute of Technology
|
|
12-01 |
Deciphering Digital Detectives: Understanding LLM Behaviors and Capabilities in Multi-Agent Mystery Games Institution: Quebec AI Institute
|
|
12-01 |
The Cost of Compression: Investigating the Impact of Compression on Parametric Knowledge in Language Models Institution: University of Wisconsin - Madison
|
|
11-30 |
TaskBench: Benchmarking Large Language Models for Task Automation Institution: Zhejiang University
|
|
11-30 |
What Do Llamas Really Think? Revealing Preference Biases in Language Model Representations Institution: Comcast Applied AI, University of Waterloo
|
|
11-29 |
Are Large Language Models Good Fact Checkers: A Preliminary Study Institution: Chinese Academy of Sciences
|
|
11-29 |
TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models Institution: Harbin Institute of Technology
|
|
11-26 |
UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation Institution: Renmin University of China
|
|
11-21 |
Do Smaller Language Models Answer Contextualised Questions Through Memorisation Or Generalisation? Institution: University of Auckland
|
|
11-21 |
Oasis: Data Curation and Assessment System for Pretraining of Large Language Models Institution: Chinese Academy of Sciences
|
|
11-21 |
How Capable Can a Transformer Become? A Study on Synthetic, Interpretable Tasks Institution: University of Pennsylvania, MIT
|
|
11-20 |
GPQA: A Graduate-Level Google-Proof Q&A Benchmark Institution: New York University
|
|
11-20 |
Continual Learning: Applications and the Road Forward Institution: KU Leuven
|
|
11-16 |
MacGyver: Are Large Language Models Creative Problem Solvers? Institution: University of California, Princeton University
|
|
11-15 |
ToolTalk: Evaluating Tool-Usage in a Conversational Setting Institution: Microsoft Corporation
|
|
11-14 |
Instruction-Following Evaluation for Large Language Models Institution: Google, Yale University
|
|
11-10 |
Making LLMs Worth Every Penny: Resource-Limited Text Classification in Banking Institution: Helvia.ai
|
|
10-17 |
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection Institution: University of Washington
|
|
10-11 |
OpsEval: A Comprehensive Task-Oriented AIOps Benchmark for Large Language Models Institution: Tsinghua University, Chinese Academy of Sciences
|
|
10-10 |
A New Benchmark and Reverse Validation Method for Passage-level Hallucination Detection Institution: Peking University
|
|
10-10 |
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets Institution: Northeastern University, MIT
|
|
09-26 |
RAGAS: Automated Evaluation of Retrieval Augmented Generation Institution: Cardiff University
|
|
09-04 |
Benchmarking Large Language Models in Retrieval-Augmented Generation Institution: Chinese Information Processing Laboratory
|
|
06-15 |
KoLA: Carefully Benchmarking World Knowledge of Large Language Models Institution: Tsinghua University
|
|
06-07 |
Knowledge-Augmented Language Model Prompting for Zero-Shot Knowledge Graph Question Answering Institution: KAIST, MBZUAI, Amazon
|
|
05-29 |
G-EVAL: NLG Evaluation using GPT-4 with Better Human Alignment Institution: Microsoft Cognitive Services Research
|
|
05-24 |
In-Context Demonstration Selection with Cross Entropy Difference Institution: Microsoft Cognitive Service Research
|
|
05-16 |
StructGPT: A General Framework for Large Language Model to Reason over Structured Data Institution: Gaoling School of Artificial Intelligence, Renmin University of China.
|
|
02-08 |
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity Institution: Centre for Artificial Intelligence Research
|
|