Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-07-16 | Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers | Zhengbo Zhang et.al. | 2407.08394 | null |
2024-07-11 | PINN-Ray: A Physics-Informed Neural Network to Model Soft Robotic Fin Ray Fingers | Xing Wang et.al. | 2407.08222 | null |
2024-07-07 | Addressing single object tracking in satellite imagery through prompt-engineered solutions | Athena Psalta et.al. | 2407.05518 | null |
2024-07-07 | Learning Motion Blur Robust Vision Transformers with Dynamic Early Exit for Real-Time UAV Tracking | You Wu et.al. | 2407.05383 | null |
2024-07-09 | P2P: Part-to-Part Motion Cues Guide a Strong Tracking Framework for LiDAR Point Clouds | Jiahao Nie et.al. | 2407.05238 | link |
2024-07-07 | Tracking Reflected Objects: A Benchmark | Xiaoyu Guo et.al. | 2407.05235 | null |
2024-07-04 | TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers | Fatemeh Nourilenjan Nokabadi et.al. | 2407.03946 | null |
2024-07-02 | FlowTrack: Point-level Flow Network for 3D Single Object Tracking | Shuo Li et.al. | 2407.01959 | null |
2024-06-28 | eMoE-Tracker: Environmental MoE-based Transformer for Robust Event-guided Object Tracking | Yucheng Chen et.al. | 2406.20024 | null |
2024-06-14 | Constrained Motion Planning for a Robotic Endoscope Holder based on Hierarchical Quadratic Programming | Jacinto Colan et.al. | 2406.09982 | null |
2024-06-14 | Robust compressive tracking via online weighted multiple instance learning | Sandeep Singh Sengar et.al. | 2406.09914 | null |
2024-07-01 | Adaptively Bypassing Vision Transformer Blocks for Efficient Visual Tracking | Xiangyang Yang et.al. | 2406.08037 | null |
2024-06-07 | Multi-Granularity Language-Guided Multi-Object Tracking | Yuhao Li et.al. | 2406.04844 | link |
2024-06-02 | Robust Visual Tracking via Iterative Gradient Descent and Threshold Selection | Zhuang Qi et.al. | 2406.00589 | null |
2024-05-28 | Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion | Hongze Sun et.al. | 2405.17903 | link |
2024-05-27 | LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking | Shaohua Dong et.al. | 2405.17660 | null |
2024-05-31 | Awesome Multi-modal Object Tracking | Chunhui Zhang et.al. | 2405.14200 | link |
2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139 | null |
2024-05-16 | A Novel Bounding Box Regression Method for Single Object Tracking | Omar Abdelaziz et.al. | 2405.10444 | null |
2024-05-16 | Beyond Traditional Single Object Tracking: A Survey | Omar Abdelaziz et.al. | 2405.10439 | null |
2024-05-08 | TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking | Pengcheng Shao et.al. | 2405.05004 | link |
2024-04-22 | 360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos | Yinzhe Xu et.al. | 2404.13953 | null |
2024-05-25 | An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training | Jin Gao et.al. | 2404.12210 | link |
2024-04-16 | Attention-Aware Visualization: Tracking and Responding to User Perception Over Time | Arvind Srinivasan et.al. | 2404.10732 | null |
2024-04-15 | Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL | Fangwei Zhong et.al. | 2404.09857 | null |
2024-04-15 | Learning Tracking Representations from Single Point Annotations | Qiangqiang Wu et.al. | 2404.09504 | null |
2024-04-11 | PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds | Weisheng Xu et.al. | 2404.07495 | link |
2024-05-02 | Longitudinal Analysis and Quantitative Assessment of Child Development through Mobile Interaction | Juan Carlos Ruiz-Garcia et.al. | 2404.06919 | null |
2024-04-09 | LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks | Jianlang Chen et.al. | 2404.06247 | link |
2024-04-08 | Semi-Supervised Novelty Detection for Precise Ultra-Wideband Error Signal Prediction | Umberto Albertin et.al. | 2404.05351 | null |
2024-03-29 | Context-Aware Integration of Language and Visual References for Natural Language Tracking | Yanyan Shao et.al. | 2403.19975 | null |
2024-03-27 | TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes | Liangyu Xu et.al. | 2403.18238 | null |
2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
2024-03-26 | Exploring Dynamic Transformer for Efficient Object Tracking | Jiawen Zhu et.al. | 2403.17651 | null |
2024-03-29 | Elysium: Exploring Object-level Perception in Videos via MLLM | Han Wang et.al. | 2403.16558 | link |
2024-03-25 | Multi-attention Associate Prediction Network for Visual Tracking | Xinglong Sun et.al. | 2403.16395 | null |
2024-03-28 | SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking | Xiaojun Hou et.al. | 2403.16002 | link |
2024-03-23 | Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking | Shaoyu Sun et.al. | 2403.15831 | null |
2024-03-19 | TON-VIO: Online Time Offset Modeling Networks for Robust Temporal Alignment in High Dynamic Motion VIO | Chaoran Xiong et.al. | 2403.12504 | null |
2024-03-18 | Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model | Jan Krejčí et.al. | 2403.11978 | null |
2024-03-16 | A Spectrum-based Image Denoising Method with Edge Feature Enhancement | Peter Luvton et.al. | 2403.11036 | null |
2024-03-15 | Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers | Jinxia Xie et.al. | 2403.10574 | null |
2024-03-14 | OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning | Lingyi Hong et.al. | 2403.09634 | null |
2024-02-27 | ACTrack: Adding Spatio-Temporal Condition for Visual Object Tracking | Yushan Han et.al. | 2403.07914 | null |
2024-04-03 | Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline | Xiao Wang et.al. | 2403.05839 | link |
2024-03-08 | Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance | Liting Lin et.al. | 2403.05231 | null |
2024-03-08 | Motion-Guided Dual-Camera Tracker for Low-Cost Skill Evaluation of Gastric Endoscopy | Yuelin Zhang et.al. | 2403.05146 | link |
2024-03-06 | VastTrack: Vast Category Visual Object Tracking | Liang Peng et.al. | 2403.03493 | link |
2024-02-28 | Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks | Zhewei Wu et.al. | 2402.17976 | null |
2024-02-26 | SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking | Yu Lin et.al. | 2402.16249 | link |
2024-02-26 | Reading Relevant Feature from Global Representation Memory for Visual Object Tracking | Xinyu Zhou et.al. | 2402.14392 | null |
2024-02-13 | Optimized Information Flow for Transformer Tracking | Janani Kugarajeevan et.al. | 2402.08195 | link |
2024-02-07 | BioDrone: A Bionic Drone-based Single Object Tracking Benchmark for Robust Vision | Xin Zhao et.al. | 2402.04519 | null |
2024-02-04 | Spatio-temporal Prompting Network for Robust Video Feature Extraction | Guanxiong Sun et.al. | 2402.02574 | link |
2024-01-24 | Small Object Tracking in LiDAR Point Cloud: Learning the Target-awareness Prototype and Fine-grained Search Region | Shengjing Tian et.al. | 2401.13285 | null |
2024-01-23 | Correlation-Embedded Transformer Tracking: A Single-Branch Framework | Fei Xie et.al. | 2401.12743 | link |
2024-01-20 | Unifying Visual and Vision-Language Tracking via Contrastive Learning | Yinchao Ma et.al. | 2401.11228 | link |
2024-01-20 | Towards Category Unification of 3D Single Object Tracking on Point Clouds | Jiahao Nie et.al. | 2401.11204 | null |
2024-01-18 | Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking | Amir M. Mansourian et.al. | 2401.09942 | null |
2024-01-12 | Dense Optical Flow Estimation Using Sparse Regularizers from Reduced Measurements | Muhammad Wasim Nawaz et.al. | 2401.06396 | null |
2024-01-18 | Hold 'em and Fold 'em: Towards Human-scale, Feedback-Controlled Soft Origami Robots | Immanuel Ampomah Mensah et.al. | 2401.04650 | null |
2024-01-06 | Explicit Visual Prompts for Visual Object Tracking | Liangtao Shi et.al. | 2401.03142 | link |
2024-01-03 | ODTrack: Online Dense Temporal Token Learning for Visual Tracking | Yaozong Zheng et.al. | 2401.01686 | link |
2023-12-27 | X Modality Assisting RGBT Object Tracking | Zhaisheng Ding et.al. | 2312.17273 | null |
2023-12-22 | Cross-Modal Object Tracking via Modality-Aware Fusion Network and A Large-Scale Dataset | Lei Liu et.al. | 2312.14446 | link |
2023-12-18 | Multi-Correlation Siamese Transformer Network with Dense Connection for 3D Single Object Tracking | Shihao Feng et.al. | 2312.11051 | link |
2023-12-17 | Robust 3D Tracking with Quality-Aware Shape Completion | Jingwen Zhang et.al. | 2312.10608 | null |
2023-12-15 | Tracking Skiers from the Top to the Bottom | Matteo Dunnhofer et.al. | 2312.09723 | null |
2023-12-11 | M3SOT: Multi-frame, Multi-field, Multi-space 3D Single Object Tracking | Jiaming Liu et.al. | 2312.06117 | link |
2023-12-07 | Instance Tracking in 3D Scenes from Egocentric Videos | Yunhan Zhao et.al. | 2312.04117 | link |
2024-02-19 | Beyond Visual Cues: Synchronously Exploring Target-Centric Semantics for Vision-Language Tracking | Jiawei Ge et.al. | 2311.17085 | null |
2023-11-21 | Visual tracking brain computer interface | Changxing Huang et.al. | 2311.12592 | null |
2024-01-10 | ViKi-HyCo: A Hybrid-Control approach for complex car-like maneuvers | Edison P. Velasco Sánchez et.al. | 2311.07268 | null |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-07-25 | Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | Tianduo Wang et.al. | 2407.18248 | link |
2024-07-25 | LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | Zhengbo Wang et.al. | 2407.18242 | link |
2024-07-25 | Recursive Introspection: Teaching Language Model Agents How to Self-Improve | Yuxiao Qu et.al. | 2407.18219 | null |
2024-07-25 | Exploring Scaling Trends in LLM Robustness | Nikolhaus Howe et.al. | 2407.18213 | null |
2024-07-25 | AsEP: Benchmarking Deep Learning Methods for Antibody-specific Epitope Prediction | Chunan Liu et.al. | 2407.18184 | link |
2024-07-25 | Gene Regulatory Network Inference from Pre-trained Single-Cell Transcriptomics Transformer with Joint Graph Learning | Sindhura Kommu et.al. | 2407.18181 | null |
2024-07-25 | Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models | Sanae Lotfi et.al. | 2407.18158 | null |
2024-07-25 | | Vlad Sobal et.al. | 2407.18134 | null |
2024-07-25 | Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | Fakhraddin Alwajih et.al. | 2407.18129 | null |
2024-07-25 | Efficient Inference of Vision Instruction-Following Models with Elastic Cache | Zuyan Liu et.al. | 2407.18121 | link |
2024-07-25 | Multi-Resolution Histopathology Patch Graphs for Ovarian Cancer Subtyping | Jack Breen et.al. | 2407.18105 | null |
2024-07-25 | Fine-Tuning Large Language Models for Stock Return Prediction Using Newsflow | Tian Guo et.al. | 2407.18103 | null |
2024-07-25 | PEFT-U: Parameter-Efficient Fine-Tuning for User Personalization | Christopher Clarke et.al. | 2407.18078 | null |
2024-07-25 | C2P: Featuring Large Language Models with Causal Reasoning | Abdolmahdi Bagheri et.al. | 2407.18069 | null |
2024-07-25 | ComPeer: A Generative Conversational Agent for Proactive Peer Support | Tianjian Liu et.al. | 2407.18064 | null |
2024-07-25 | Audio Entailment: Assessing Deductive Reasoning for Audio Understanding | Soham Deshmukh et.al. | 2407.18062 | null |
2024-07-25 | Difficulty Estimation and Simplification of French Text Using LLMs | Henri Jamet et.al. | 2407.18061 | null |
2024-07-25 | The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation | Eric Yang et.al. | 2407.18044 | null |
2024-07-25 | RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models | Haoyu Chen et.al. | 2407.18035 | null |
2024-07-25 | GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy | Jan Batzner et.al. | 2407.18008 | null |
2024-07-24 | I Could've Asked That: Reformulating Unanswerable Questions | Wenting Zhao et.al. | 2407.17469 | link |
2024-07-24 | WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries | Wenting Zhao et.al. | 2407.17468 | null |
2024-07-24 | CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models | Jiawei Gu et.al. | 2407.17467 | null |
2024-07-24 | | Yunhao Fang et.al. | 2407.17453 | null |
2024-07-24 | Fluent Student-Teacher Redteaming | T. Ben Thompson et.al. | 2407.17447 | link |
2024-07-24 | Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? | Michael-Andrei Panaitescu-Liess et.al. | 2407.17417 | null |
2024-07-24 | (PASS) Visual Prompt Locates Good Structure Sparsity through a Recurrent HyperNetwork | Tianjin Huang et.al. | 2407.17412 | null |
2024-07-24 | Dependency Transformer Grammars: Integrating Dependency Structures into Transformer Language Models | Yida Zhao et.al. | 2407.17406 | link |
2024-07-24 | Grammar-based Game Description Generation using Large Language Models | Tsunehiko Tanaka et.al. | 2407.17404 | null |
2024-07-24 | 3D Question Answering for City Scene Understanding | Penglei Sun et.al. | 2407.17398 | null |
2024-07-24 | PERSONA: A Reproducible Testbed for Pluralistic Alignment | Louis Castricato et.al. | 2407.17387 | null |
2024-07-24 | A Comprehensive Approach to Misspelling Correction with BERT and Levenshtein Distance | Amirreza Naziri et.al. | 2407.17383 | null |
2024-07-24 | MMRA: A Benchmark for Multi-granularity Multi-image Relational Association | Siwei Wu et.al. | 2407.17379 | null |
2024-07-24 | ViPer: Visual Personalization of Generative Models via Individual Preference Learning | Sogand Salehi et.al. | 2407.17365 | null |
2024-07-24 | Gradient-based inference of abstract task representations for generalization in neural networks | Ali Hummos et.al. | 2407.17356 | null |
2024-07-24 | Scalify: scale propagation for efficient low-precision LLM training | Paul Balança et.al. | 2407.17353 | link |
2024-07-24 | Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching | Yuyang Ding et.al. | 2407.17349 | null |
2024-07-24 | DexGANGrasp: Dexterous Generative Adversarial Grasping Synthesis for Task-Oriented Manipulation | Qian Feng et.al. | 2407.17348 | null |
2024-07-24 | Label Alignment and Reassignment with Generalist Large Language Model for Enhanced Cross-Domain Named Entity Recognition | Ke Bao et.al. | 2407.17344 | null |
2024-07-24 | How Good (Or Bad) Are LLMs at Detecting Misleading Visualizations? | Leo Yu-Ho Lo et.al. | 2407.17291 | null |
2024-07-23 | PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects | Junyi Li et.al. | 2407.16696 | null |
2024-07-23 | Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack | Xiaoyue Xu et.al. | 2407.16695 | null |
2024-07-23 | Can Large Language Models Automatically Jailbreak GPT-4V? | Yuanwei Wu et.al. | 2407.16686 | null |
2024-07-23 | SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation | Pengfei Chen et.al. | 2407.16682 | null |
2024-07-23 | RedAgent: Red Teaming Large Language Models with Context-aware Autonomous Language Agent | Huiyu Xu et.al. | 2407.16667 | null |
2024-07-23 | Course-Correction: Safety Alignment Using Synthetic Preferences | Rongwu Xu et.al. | 2407.16637 | null |
2024-07-23 | Lawma: The Power of Specialization for Legal Tasks | Ricardo Dominguez-Olmedo et.al. | 2407.16615 | null |
2024-07-23 | Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? | Jonathan Hayase et.al. | 2407.16607 | null |
2024-07-23 | Shared Imagination: LLMs Hallucinate Alike | Yilun Zhou et.al. | 2407.16604 | null |
2024-07-23 | A Comparative Study on Patient Language across Therapeutic Domains for Effective Patient Voice Classification in Online Health Discussions | Giorgos Lysandrou et.al. | 2407.16593 | null |
2024-07-23 | Exploring Automatic Cryptographic API Misuse Detection in the Era of LLMs | Yifan Xia et.al. | 2407.16576 | null |
2024-07-23 | TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback | Eunseop Yoon et.al. | 2407.16574 | null |
2024-07-23 | Retrieve, Generate, Evaluate: A Case Study for Medical Paraphrases Generation with Small Language Models | Ioana Buhnila et.al. | 2407.16565 | null |
2024-07-23 | Patched RTC: evaluating LLMs for diverse software development tasks | Asankhaya Sharma et.al. | 2407.16557 | null |
2024-07-24 | MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues | Liyun Zhang et.al. | 2407.16552 | null |
2024-07-23 | Quantifying the Role of Textual Predictability in Automatic Speech Recognition | Sean Robertson et.al. | 2407.16537 | null |
2024-07-23 | Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models | Aristeidis Panos et.al. | 2407.16526 | null |
2024-07-23 | AMONGAGENTS: Evaluating Large Language Models in the Interactive Text-Based Social Deduction Game | Yizhou Chi et.al. | 2407.16521 | null |
2024-07-23 | Language-Based Security for Low-Level MPC | Christian Skalka et.al. | 2407.16504 | null |
2024-07-23 | Machine Translation Hallucination Detection for Low and High Resource Languages using Large Language Models | Kenza Benkirane et.al. | 2407.16470 | null |
2024-07-22 | AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description | Junyu Xie et.al. | 2407.15850 | link |
2024-07-22 | LLMmap: Fingerprinting For Large Language Models | Dario Pasquini et.al. | 2407.15847 | null |
2024-07-22 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | Mingze Xu et.al. | 2407.15841 | null |
2024-07-22 | MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity | Yangzhou Liu et.al. | 2407.15838 | null |
2024-07-22 | dMel: Speech Tokenization made Simple | He Bai et.al. | 2407.15835 | null |
2024-07-22 | J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling | Wataru Nakata et.al. | 2407.15828 | null |
2024-07-22 | Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight | Ziyuan Huang et.al. | 2407.15819 | null |
2024-07-22 | Perceptions of Linguistic Uncertainty by Language Models and Humans | Catarina G Belem et.al. | 2407.15814 | link |
2024-07-22 | AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection | Yunkang Cao et.al. | 2407.15795 | link |
2024-07-22 | CLIP with Generative Latent Replay: a Strong Baseline for Incremental Learning | Emanuele Frascaroli et.al. | 2407.15793 | link |
2024-07-22 | Extracting Structured Insights from Financial News: An Augmented LLM Driven Approach | Rian Dolphin et.al. | 2407.15788 | null |
2024-07-22 | Concept-Based Interpretable Reinforcement Learning with Limited to No Human Labels | Zhuorui Ye et.al. | 2407.15786 | null |
2024-07-22 | Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning | Kaiwen Wang et.al. | 2407.15762 | null |
2024-07-22 | MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation | Marco Simoni et.al. | 2407.15748 | null |
2024-07-22 | OMoS-QA: A Dataset for Cross-Lingual Extractive Question Answering in a German Migration Context | Steffen Kleinle et.al. | 2407.15736 | null |
2024-07-22 | TaskGen: A Task-Based, Memory-Infused Agentic Framework using StrictJSON | John Chong Min Tan et.al. | 2407.15734 | null |
2024-07-22 | Zero-Shot Embeddings Inform Learning and Forgetting with Vision-Language Encoders | Laura Niss et.al. | 2407.15731 | null |
2024-07-22 | SAM2CLIP2SAM: Vision Language Model for Segmentation of 3D CT Scans for Covid-19 Detection | Dimitrios Kollias et.al. | 2407.15728 | null |
2024-07-22 | DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design | Zhi Hao Luo et.al. | 2407.15723 | link |
2024-07-22 | Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability | Zhuoyan Xu et.al. | 2407.15720 | link |
2024-07-19 | Internal Consistency and Self-Feedback in Large Language Models: A Survey | Xun Liang et.al. | 2407.14507 | link |
2024-07-19 | On Pre-training of Multimodal Language Models Customized for Chart Understanding | Wan-Cyuan Fan et.al. | 2407.14506 | null |
2024-07-19 | PD-TPE: Parallel Decoder with Text-guided Position Encoding for 3D Visual Grounding | Chenshu Hou et.al. | 2407.14491 | null |
2024-07-19 | Evaluating the Reliability of Self-Explanations in Large Language Models | Korbinian Randl et.al. | 2407.14487 | link |
2024-07-19 | Data-Centric Human Preference Optimization with Rationales | Hoang Anh Just et.al. | 2407.14477 | null |
2024-07-19 | Contrastive Learning with Counterfactual Explanations for Radiology Report Generation | Mingjie Li et.al. | 2407.14474 | null |
2024-07-19 | Check-Eval: A Checklist-based Approach for Evaluating Text Quality | Jayr Pereira et.al. | 2407.14467 | null |
2024-07-19 | Undermining Mental Proof: How AI Can Make Cooperation Harder by Making Thinking Easier | Zachary Wojtowicz et.al. | 2407.14452 | null |
2024-07-19 | Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding | Renshan Zhang et.al. | 2407.14439 | link |
2024-07-19 | Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders | Senthooran Rajamanoharan et.al. | 2407.14435 | null |
2024-07-19 | Mixture of Experts with Mixture of Precisions for Tuning Quality of Service | HamidReza Imani et.al. | 2407.14417 | null |
2024-07-19 | System-1.x: Learning to Balance Fast and Slow Planning with Language Models | Swarnadeep Saha et.al. | 2407.14414 | link |
2024-07-19 | DEAL: Disentangle and Localize Concept-level Explanations for VLMs | Tang Li et.al. | 2407.14412 | null |
2024-07-19 | The Vision of Autonomic Computing: Can LLMs Make It a Reality? | Zhiyang Zhang et.al. | 2407.14402 | null |
2024-07-19 | Frontiers of Deep Learning: From Novel Application to Real-World Deployment | Rui Xie et.al. | 2407.14386 | null |
2024-07-19 | Open Artificial Knowledge | Vadim Borisov et.al. | 2407.14371 | null |
2024-07-19 | Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models | Xuenan Xu et.al. | 2407.14355 | null |
2024-07-19 | Improving Retrieval in Sponsored Search by Leveraging Query Context Signals | Akash Kumar Mohankumar et.al. | 2407.14346 | null |
2024-07-19 | LLMs left, right, and center: Assessing GPT's capabilities to label political bias from web domains | Raphael Hernandes et.al. | 2407.14344 | null |
2024-07-19 | Multimodal Misinformation Detection using Large Vision-Language Models | Sahar Tahmasebi et.al. | 2407.14321 | null |
2024-07-18 | Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data | Charles Jin et.al. | 2407.13765 | null |
2024-07-18 | SegPoint: Segment Any Point Cloud via Large Language Model | Shuting He et.al. | 2407.13761 | null |
2024-07-18 | Black-Box Opinion Manipulation Attacks to Retrieval-Augmented Generation of Large Language Models | Zhuo Chen et.al. | 2407.13757 | null |
2024-07-18 | CellularLint: A Systematic Approach to Identify Inconsistent Behavior in Cellular Network Specifications | Mirza Masfiqur Rahman et.al. | 2407.13742 | null |
2024-07-18 | Baba Is AI: Break the Rules to Beat the Benchmark | Nathan Cloos et.al. | 2407.13729 | null |
2024-07-18 | CoDefeater: Using LLMs To Find Defeaters in Assurance Cases | Usman Gohar et.al. | 2407.13717 | link |
2024-07-18 | Understanding Reference Policies in Direct Preference Optimization | Yixin Liu et.al. | 2407.13709 | null |
2024-07-18 | A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice | Shaina Raza et.al. | 2407.13699 | null |
2024-07-18 | Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation | Yotam Perlitz et.al. | 2407.13696 | link |
2024-07-18 | Prover-Verifier Games improve legibility of LLM outputs | Jan Hendrik Kirchner et.al. | 2407.13692 | null |
2024-07-18 | Shaded Route Planning Using Active Segmentation and Identification of Satellite Images | Longchao Da et.al. | 2407.13689 | null |
2024-07-18 | FuLG: 150B Romanian Corpus for Language Model Pretraining | Vlad-Andrei Bădoiu et.al. | 2407.13657 | null |
2024-07-18 | COMCAT: Leveraging Human Judgment to Improve Automatic Documentation and Summarization | Skyler Grandel et.al. | 2407.13648 | null |
2024-07-18 | Weak-to-Strong Reasoning | Yuqing Yang et.al. | 2407.13647 | link |
2024-07-18 | Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies | Chaofan Tao et.al. | 2407.13623 | null |
2024-07-18 | KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration | Youfu Yan et.al. | 2407.13598 | null |
2024-07-18 | PLANTS: A Novel Problem and Dataset for Summarization of Planning-Like (PL) Tasks | Vishal Pallagani et.al. | 2407.13597 | null |
2024-07-18 | EarthMarker: A Visual Prompt Learning Framework for Region-level and Point-level Remote Sensing Imagery Comprehension | Wei Zhang et.al. | 2407.13596 | null |
2024-07-18 | Robust Calibration of Large Vision-Language Adapters | Balamurali Murugesan et.al. | 2407.13588 | link |
2024-07-18 | Towards Zero-Shot Multimodal Machine Translation | Matthieu Futeral et.al. | 2407.13579 | link |
2024-07-17 | LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | Kaichen Zhang et.al. | 2407.12772 | link |
2024-07-17 | EchoSight: Advancing Visual-Language Models with Wiki Knowledge | Yibin Yan et.al. | 2407.12735 | null |
2024-07-17 | NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model | Zhongqun Zhang et.al. | 2407.12727 | null |
2024-07-17 | Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models? | Ben Yao et.al. | 2407.12725 | null |
2024-07-17 | The Future of Learning: Large Language Models through the Lens of Students | He Zhang et.al. | 2407.12723 | null |
2024-07-17 | MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models | Leyang Shen et.al. | 2407.12709 | link |
2024-07-17 | Subgraph-Aware Training of Text-based Methods for Knowledge Graph Completion | Youmin Ko et.al. | 2407.12703 | null |
2024-07-17 | Patch-Level Training for Large Language Models | Chenze Shao et.al. | 2407.12665 | link |
2024-07-17 | Zero-shot Text-guided Infinite Image Synthesis with LLM guidance | Soyeong Kwon et.al. | 2407.12642 | null |
2024-07-17 | Domain-specific or Uncertainty-aware models: Does it really make a difference for biomedical text classification? | Aman Sinha et.al. | 2407.12626 | null |
2024-07-17 | Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences | Claudio Pinhanez et.al. | 2407.12620 | null |
2024-07-17 | AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism | William Brannon et.al. | 2407.12613 | link |
2024-07-17 | VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | Ofir Abramovich et.al. | 2407.12594 | null |
2024-07-18 | Benchmarking Robust Self-Supervised Learning Across Diverse Downstream Tasks | Antoni Kowalczuk et.al. | 2407.12588 | link |
2024-07-17 | E5-V: Universal Embeddings with Multimodal Large Language Models | Ting Jiang et.al. | 2407.12580 | link |
2024-07-17 | Audio Conditioning for Music Generation via Discrete Bottleneck Features | Simon Rouard et.al. | 2407.12563 | null |
2024-07-17 | Conspiracy theories and where to find them on TikTok | Francesco Corso et.al. | 2407.12545 | null |
2024-07-17 | Abstraction Alignment: Comparing Model and Human Conceptual Relationships | Angie Boggust et.al. | 2407.12543 | link |
2024-07-17 | Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models | Xihe Qiu et.al. | 2407.12532 | null |
2024-07-17 | Crafting the Path: Robust Query Rewriting for Information Retrieval | Ingeol Baek et.al. | 2407.12529 | null |
2024-07-16 | UrbanWorld: An Urban World Model for 3D City Generation | Yu Shang et.al. | 2407.11965 | null |
2024-07-16 | NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? | Mo Li et.al. | 2407.11963 | link |
2024-07-16 | Code Documentation and Analysis to Secure Software Development | Paul Attie et.al. | 2407.11934 | null |
2024-07-16 | What's Wrong? Refining Meeting Summaries with LLM Feedback | Frederic Kirstein et.al. | 2407.11919 | null |
2024-07-16 | GraphFM: A Scalable Framework for Multi-Graph Pretraining | Divyansha Lachi et.al. | 2407.11907 | null |
2024-07-16 | Ascend-CC: Confidential Computing on Heterogeneous NPU for Emerging Generative AI Workloads | Aritra Dhar et.al. | 2407.11888 | null |
2024-07-16 | Zero-shot Cross-Lingual Transfer for Synthetic Data Generation in Grammatical Error Detection | Gaetan Lopez Latouche et.al. | 2407.11854 | null |
2024-07-16 | Schema Matching with Large Language Models: an Experimental Study | Marcel Parciak et.al. | 2407.11852 | link |
2024-07-16 | LoFTI: Localization and Factuality Transfer to Indian Locales | Sona Elza Simon et.al. | 2407.11833 | link |
2024-07-16 | GPT Assisted Annotation of Rhetorical and Linguistic Features for Interpretable Propaganda Technique Detection in News Text | Kyle Hamilton et.al. | 2407.11827 | null |
2024-07-16 | PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation | Branden Butler et.al. | 2407.11798 | null |
2024-07-16 | Large Language Models as Misleading Assistants in Conversation | Betty Li Hou et.al. | 2407.11789 | null |
2024-07-16 | SwitchCIT: Switching for Continual Instruction Tuning of Large Language Models | Xinbo Wu et.al. | 2407.11780 | null |
2024-07-16 | Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text | Seyedeh Fatemeh Ebrahimi et.al. | 2407.11774 | null |
2024-07-16 | Educational Personalized Learning Path Planning with Large Language Models | Chee Ng et.al. | 2407.11773 | null |
2024-07-16 | XEdgeAI: A Human-centered Industrial Inspection Framework with Data-centric Explainable Edge AI Approach | Truong Thanh Hung Nguyen et.al. | 2407.11771 | null |
2024-07-16 | Robust Utility-Preserving Text Anonymization Based on Large Language Models | Tianyu Yang et.al. | 2407.11770 | link |
2024-07-16 | Vectoring Languages | Joseph Chen et.al. | 2407.11766 | null |
2024-07-16 | Exploring Quantization for Efficient Pre-Training of Transformer Language Models | Kamran Chitsaz et.al. | 2407.11722 | link |
2024-07-16 | Harnessing Large Language Models for Multimodal Product Bundling | Xiaohao Liu et.al. | 2407.11712 | null |
2024-07-15 | VGBench: Evaluating Large Language Models on Vector Graphics Understanding and Generation | Bocheng Zou et.al. | 2407.10972 | link |
2024-07-15 | Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | Hongyu Wang et.al. | 2407.10969 | null |
2024-07-15 | Fast Matrix Multiplications for Lookup Table-Quantized LLMs | Han Guo et.al. | 2407.10960 | null |
2024-07-15 | Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | Ruisheng Cao et.al. | 2407.10956 | link |
2024-07-15 | MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models | Chengguang Gan et.al. | 2407.10953 | null |
2024-07-15 | Can Textual Semantics Mitigate Sounding Object Segmentation Preference? | Yaoting Wang et.al. | 2407.10947 | link |
2024-07-15 | Learning from Naturally Occurring Feedback | Shachar Don-Yehiya et.al. | 2407.10944 | link |
2024-07-15 | GRUtopia: Dream General Robots in a City at Scale | Hanqing Wang et.al. | 2407.10943 | link |
2024-07-15 | Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | Dilara Soylu et.al. | 2407.10930 | null |
2024-07-15 | Benchmarking Vision Language Models for Cultural Understanding | Shravan Nayak et.al. | 2407.10920 | null |
2024-07-15 | FinDKG: Dynamic Knowledge Graphs with Large Language Models for Detecting Global Trends in Financial Markets | Xiaohui Victor Li et.al. | 2407.10909 | link |
2024-07-15 | Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique | Mark Russinovich et.al. | 2407.10887 | null |
2024-07-15 | SLIP: Securing LLMs IP Using Weights Decomposition | Yehonathan Refael et.al. | 2407.10886 | null |
2024-07-15 | Understanding the Importance of Evolutionary Search in Automated Heuristic Design with Large Language Models | Rui Zhang et.al. | 2407.10873 | null |
2024-07-15 | GPT Sonograpy: Hand Gesture Decoding from Forearm Ultrasound Images via VLM | Keshav Bimbraw et.al. | 2407.10870 | null |
2024-07-15 | Physics-Inspired Generative Models in Medical Imaging: A Review | Dennis Hein et.al. | 2407.10856 | null |
2024-07-15 | Weighted Grouped Query Attention in Transformers | Sai Sena Chinnakonduru et.al. | 2407.10855 | null |
2024-07-15 | An Actionable Framework for Assessing Bias and Fairness in Large Language Model Use Cases | Dylan Bouchard et.al. | 2407.10853 | null |
2024-07-15 | MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs | Quang H. Nguyen et.al. | 2407.10834 | null |
2024-07-15 | BiasScanner: Automatic Detection and Classification of News Bias to Strengthen Democracy | Tim Menzner et.al. | 2407.10829 | null |
2024-07-12 | FairyLandAI: Personalized Fairy Tales utilizing ChatGPT and DALLE-3 | Georgios Makridis et.al. | 2407.09467 | null |
2024-07-12 | Human-like Episodic Memory for Infinite Context LLMs | Zafeirios Fountas et.al. | 2407.09450 | null |
2024-07-12 | ASTPrompter: Weakly Supervised Automated Language Model Red-Teaming to Identify Likely Toxic Prompts | Amelia F. Hardy et.al. | 2407.09447 | link |
2024-07-12 | MUSCLE: A Model Update Strategy for Compatible LLM Evolution | Jessica Echterhoff et.al. | 2407.09435 | null |
2024-07-12 | A Perspective on Foundation Models for the Electric Power Grid | Hendrik F. Hamann et.al. | 2407.09434 | null |
2024-07-12 | Open (Clinical) LLMs are Sensitive to Instruction Phrasings | Alberto Mario Ceballos Arroyo et.al. | 2407.09429 | link |
2024-07-12 | TelecomGPT: A Framework to Build Telecom-Specfic Large Language Models | Hang Zou et.al. | 2407.09424 | null |
2024-07-12 | Mitigating Entity-Level Hallucination in Large Language Models | Weihang Su et.al. | 2407.09417 | link |
2024-07-12 | SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | Shraman Pramanick et.al. | 2407.09413 | link |
2024-07-12 | Deep Bag-of-Words Model: An Efficient and Interpretable Relevance Architecture for Chinese E-Commerce | Zhe Lin et.al. | 2407.09395 | null |
2024-07-12 | PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | Saber Zerhoudi et.al. | 2407.09394 | link |
2024-07-12 | GAVEL: Generating Games Via Evolution and Language Models | Graham Todd et.al. | 2407.09388 | null |
2024-07-12 | Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text | Lucio La Cava et.al. | 2407.09364 | null |
2024-07-12 | Good Intentions, Risky Inventions: A Method for Assessing the Risks and Benefits of AI in Mobile and Wearable Uses | Marios Constantinides et.al. | 2407.09322 | link |
2024-07-12 | Scalability of Bayesian Network Structure Elicitation with Large Language Models: a Novel Methodology and Comparative Analysis | Nikolay Babakov et.al. | 2407.09311 | null |
2024-07-12 | Transformer Layers as Painters | Qi Sun et.al. | 2407.09298 | null |
2024-07-12 | Security Matrix for Multimodal Agents on Mobile Devices: A Systematic and Proof of Concept Study | Yulong Yang et.al. | 2407.09295 | null |
2024-07-12 | CEIPA: Counterfactual Explainable Incremental Prompt Attack Analysis on Large Language Models | Dong Shu et.al. | 2407.09292 | null |
2024-07-12 | Structuring Authenticity Assessments on Historical Documents using LLMs | Andrea Schimmenti et.al. | 2407.09290 | null |
2024-07-12 | WSESeg: Introducing a Dataset for the Segmentation of Winter Sports Equipment with a Baseline for Interactive Segmentation | Robin Schön et.al. | 2407.09288 | null |
2024-07-11 | MAVIS: Mathematical Visual Instruction Tuning | Renrui Zhang et.al. | 2407.08739 | link |
2024-07-11 | Real-Time Anomaly Detection and Reactive Planning with Large Language Models | Rohan Sinha et.al. | 2407.08735 | null |
2024-07-11 | Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | Zihao Zhou et.al. | 2407.08733 | null |
2024-07-11 | A Taxonomy for Data Contamination in Large Language Models | Medha Palavalli et.al. | 2407.08716 | null |
2024-07-11 | GTA: A Benchmark for General Tool Agents | Jize Wang et.al. | 2407.08713 | link |
2024-07-11 | eyeballvul: a future-proof benchmark for vulnerability detection in the wild | Timothee Chauvin et.al. | 2407.08708 | link |
2024-07-11 | Extracting Training Data from Document-Based VQA Models | Francesco Pinto et.al. | 2407.08707 | null |
2024-07-11 | HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models | Runhui Huang et.al. | 2407.08706 | null |
2024-07-11 | Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models | Zhening Xing et.al. | 2407.08701 | null |
2024-07-11 | Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | Anton Alexandrov et.al. | 2407.08699 | null |
2024-07-11 | Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight | Zhiqiang Xie et.al. | 2407.08694 | null |
2024-07-11 | Robotic Control via Embodied Chain-of-Thought Reasoning | Zawalski Michał et.al. | 2407.08693 | null |
2024-07-11 | SEED-Story: Multimodal Long Story Generation with Large Language Model | Shuai Yang et.al. | 2407.08683 | link |
2024-07-11 | NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning | Yi Zhang et.al. | 2407.08672 | null |
2024-07-11 | Uncertainty Estimation of Large Language Models in Medical Question Answering | Jiaxin Wu et.al. | 2407.08662 | null |
2024-07-11 | Towards Building Specialized Generalist AI with System 1 and System 2 Fusion | Kaiyan Zhang et.al. | 2407.08642 | null |
2024-07-11 | | Junkang Wu et.al. | 2407.08639 | link |
2024-07-11 | RoboMorph: Evolving Robot Morphology using Large Language Models | Kevin Qiu et.al. | 2407.08626 | null |
2024-07-11 | Tamil Language Computing: the Present and the Future | Kengatharaiyer Sarveswaran et.al. | 2407.08618 | null |
2024-07-11 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jay Shah et.al. | 2407.08608 | null |
2024-07-10 | Training on the Test Task Confounds Evaluation and Emergence | Ricardo Dominguez-Olmedo et.al. | 2407.07890 | link |
2024-07-10 | Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization | Junkang Wu et.al. | 2407.07880 | link |
2024-07-11 | Toto: Time Series Optimized Transformer for Observability | Ben Cohen et.al. | 2407.07874 | null |
2024-07-10 | FACTS About Building Retrieval Augmented Generation-based Chatbots | Rama Akkiraju et.al. | 2407.07858 | null |
2024-07-10 | OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training | Sami Jaghouar et.al. | 2407.07852 | link |
2024-07-10 | Natural Language Mechanisms via Self-Resolution with Foundation Models | Nicolas Della Penna et.al. | 2407.07845 | null |
2024-07-10 | Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective | Shengjia Chen et.al. | 2407.07841 | link |
2024-07-10 | Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison | Qian Yang et.al. | 2407.07840 | null |
2024-07-10 | Transformer Alignment in Large Language Models | Murdock Aubry et.al. | 2407.07810 | null |
2024-07-11 | AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning | Jongsuk Kim et.al. | 2407.07801 | link |
2024-07-10 | Attribute or Abstain: Large Language Models as Long Document Assistants | Jan Buchmann et.al. | 2407.07799 | link |
2024-07-11 | Evaluating Large Language Models with Grid-Based Game Competitions: An Extensible LLM Benchmark and Leaderboard | Oguzhan Topsakal et.al. | 2407.07796 | link |
2024-07-10 | Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities | Tianjie Ju et.al. | 2407.07791 | link |
2024-07-10 | WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment | Jiefu Ou et.al. | 2407.07778 | null |
2024-07-10 | Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs | Hao-Tien Lewis Chiang et.al. | 2407.07775 | null |
2024-07-10 | Can ChatGPT Pass a Theory of Computing Course? | Matei A. Golesteanu et.al. | 2407.07757 | null |
2024-07-10 | Fine-Tuning Large Language Models with User-Level Differential Privacy | Zachary Charles et.al. | 2407.07737 | null |
2024-07-10 | PaliGemma: A versatile 3B VLM for transfer | Lucas Beyer et.al. | 2407.07726 | link |
2024-07-10 | Why should we ever automate moral decision making? | Vincent Conitzer et.al. | 2407.07671 | null |
2024-07-10 | A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability | Ting Fang Tan et.al. | 2407.07666 | null |
2024-07-09 | AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning | Jiaxi Cui et.al. | 2407.07094 | link |
2024-07-09 | FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | Liqun Ma et.al. | 2407.07093 | link |
2024-07-09 | CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation | Tong Chen et.al. | 2407.07087 | link |
2024-07-09 | Hypothetical Minds: Scaffolding Theory of Mind for Multi-Agent Tasks with Large Language Models | Logan Cross et.al. | 2407.07086 | link |
2024-07-09 | Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities | Shaltiel Shmidman et.al. | 2407.07080 | null |
2024-07-09 | Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps | Yung-Sung Chuang et.al. | 2407.07071 | link |
2024-07-09 | Prompting Techniques for Secure Code Generation: A Systematic Investigation | Catherine Tony et.al. | 2407.07064 | null |
2024-07-09 | Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | Weize Chen et.al. | 2407.07061 | link |
2024-07-09 | Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | Wenqi Zhang et.al. | 2407.07053 | link |
2024-07-09 | ProtoSAM -- One Shot Medical Image Segmentation With Foundational Models | Lev Ayzenberg et.al. | 2407.07042 | link |
2024-07-09 | Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models | Yue Zhang et.al. | 2407.07035 | null |
2024-07-09 | Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization | Jeongseok Hyun et.al. | 2407.07024 | link |
2024-07-09 | Using Large Language Models for Generating Smart Contracts for Health Insurance from Textual Policies | Inwon Kang et.al. | 2407.07019 | null |
2024-07-09 | End-To-End Causal Effect Estimation from Unstructured Natural Language Data | Nikita Dhawan et.al. | 2407.07018 | null |
2024-07-09 | Is Large Language Model All You Need to Predict the Synthesizability and Precursors of Crystal Structures? | Zhilong Song et.al. | 2407.07016 | null |
2024-07-09 | Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning | J. Crosbie et.al. | 2407.07011 | null |
2024-07-09 | Metron: Holistic Performance Evaluation Framework for LLM Inference Systems | Amey Agrawal et.al. | 2407.07000 | link |
2024-07-09 | Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective | Yu-An Liu et.al. | 2407.06992 | link |
2024-07-09 | Segment-Based Interactive Machine Translation for Pre-trained Models | Angel Navarro et.al. | 2407.06990 | null |
2024-07-09 | Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models | Yi-Cheng Lin et.al. | 2407.06957 | link |
2024-07-08 | Multi-Object Hallucination in Vision-Language Models | Xuweiyi Chen et.al. | 2407.06192 | null |
2024-07-08 | 4D Contrastive Superflows are Dense 3D Representation Learners | Xiang Xu et.al. | 2407.06190 | link |
2024-07-08 | Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Orr Zohar et.al. | 2407.06189 | link |
2024-07-08 | CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation | Xinying Guo et.al. | 2407.06188 | null |
2024-07-08 | JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation | Yu Zeng et.al. | 2407.06187 | null |
2024-07-08 | Vision-Language Models under Cultural and Inclusive Considerations | Antonia Karamolegkou et.al. | 2407.06177 | null |
2024-07-08 | On Speeding Up Language Model Evaluation | Jin Peng Zhou et.al. | 2407.06172 | null |
2024-07-08 | What's Wrong with Your Code Generated by Large Language Models? An Extensive Study | Shihan Dou et.al. | 2407.06153 | null |
2024-07-09 | Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks | Lukas Netz et.al. | 2407.06146 | null |
2024-07-08 | ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation | Ethan Chern et.al. | 2407.06135 | link |
2024-07-08 | Evaluating the Semantic Profiling Abilities of LLMs for Natural Language Utterances in Data Visualization | Hannah K. Bako et.al. | 2407.06129 | link |
2024-07-08 | Depression Detection and Analysis using Large Language Models on Textual and Audio-Visual Modalities | Avinash Anand et.al. | 2407.06125 | null |
2024-07-08 | Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning | Yadong Zhang et.al. | 2407.06112 | null |
2024-07-08 | Artificial Intuition: Efficient Classification of Scientific Abstracts | Harsh Sakhrani et.al. | 2407.06093 | null |
2024-07-08 | Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models | Jinliang Lu et.al. | 2407.06089 | null |
2024-07-08 | From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty | Maor Ivgi et.al. | 2407.06071 | link |
2024-07-08 | Variational Best-of-N Alignment | Afra Amini et.al. | 2407.06057 | null |
2024-07-08 | MST5 -- Multilingual Question Answering over Knowledge Graphs | Nikit Srivastava et.al. | 2407.06041 | link |
2024-07-08 | PAS: Data-Efficient Plug-and-Play Prompt Augmentation System | Miao Zheng et.al. | 2407.06027 | null |
2024-07-08 | iLLM-TSC: Integration reinforcement learning and large language model for traffic signal control policy improvement | Aoyu Pang et.al. | 2407.06025 | link |
2024-07-05 | Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | Rudolf Laine et.al. | 2407.04694 | link |
2024-07-05 | ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | Yuzhe Gu et.al. | 2407.04693 | link |
2024-07-05 | Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | Yuanze Lin et.al. | 2407.04681 | null |
2024-07-05 | Lost in Translation: The Algorithmic Gap Between LMs and the Brain | Tommaso Tosato et.al. | 2407.04680 | null |
2024-07-05 | Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition | Ye Bai et.al. | 2407.04675 | null |
2024-07-05 | Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement | Yongji Wu et.al. | 2407.04656 | null |
2024-07-05 | Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models | Bolaji Yusuf et.al. | 2407.04641 | null |
2024-07-05 | Entity Decomposition with Filtering: A Zero-Shot Clinical Named Entity Recognition Framework | Reza Averly et.al. | 2407.04629 | null |
2024-07-05 | On scalable oversight with weak LLMs judging strong LLMs | Zachary Kenton et.al. | 2407.04622 | null |
2024-07-05 | CountGD: Multi-Modal Open-World Counting | Niki Amini-Naieni et.al. | 2407.04619 | null |
2024-07-05 | ARM: Efficient Guided Decoding with Autoregressive Reward Models | Sergey Troshin et.al. | 2407.04615 | null |
2024-07-05 | AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation | Yuhan Zhu et.al. | 2407.04603 | null |
2024-07-05 | Written Term Detection Improves Spoken Term Detection | Bolaji Yusuf et.al. | 2407.04601 | link |
2024-07-05 | Testing learning hypotheses using neural networks by manipulating learning data | Cara Su-Yi Leong et.al. | 2407.04593 | null |
2024-07-05 | Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions | Shumaila Javaid et.al. | 2407.04581 | null |
2024-07-05 | VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models | Hang Gao et.al. | 2407.04573 | null |
2024-07-05 | Not (yet) the whole story: Evaluating Visual Storytelling Requires More than Measuring Coherence, Grounding, and Repetition | Aditya K Surikuchi et.al. | 2407.04559 | null |
2024-07-05 | Spontaneous Reward Hacking in Iterative Self-Refinement | Jane Pan et.al. | 2407.04549 | null |
2024-07-05 | PoPreRo: A New Dataset for Popularity Prediction of Romanian Reddit Posts | Ana-Cristina Rogoz et.al. | 2407.04541 | link |
2024-07-05 | GPT vs RETRO: Exploring the Intersection of Retrieval and Parameter-Efficient Fine-Tuning | Aleksander Ficek et.al. | 2407.04528 | null |
2024-07-03 | Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages | Max Zuo et.al. | 2407.03321 | link |
2024-07-03 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | Pan Zhang et.al. | 2407.03320 | link |
2024-07-03 | BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations | Zhantao Yang et.al. | 2407.03314 | null |
2024-07-03 | Universal Length Generalization with Turing Programs | Kaiying Hou et.al. | 2407.03310 | null |
2024-07-03 | Large Language Models for JSON Schema Discovery | Michael J. Mior et.al. | 2407.03286 | null |
2024-07-03 | LLM Internal States Reveal Hallucination Risk Faced With a Query | Ziwei Ji et.al. | 2407.03282 | null |
2024-07-03 | STF: Sentence Transformer Fine-Tuning For Topic Categorization With Limited Data | Kheir Eddine Daouadi et.al. | 2407.03253 | null |
2024-07-03 | Improving Retrieval-augmented Text-to-SQL with AST-based Ranking and Schema Pruning | Zhili Shen et.al. | 2407.03227 | null |
2024-07-03 | How Does Quantization Affect Multilingual LLMs? | Kelly Marchisio et.al. | 2407.03211 | null |
2024-07-03 | TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts | Ruida Wang et.al. | 2407.03203 | link |
2024-07-03 | Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models | Haritz Puerto et.al. | 2407.03181 | link |
2024-07-03 | Investigating Decoder-only Large Language Models for Speech-to-text Translation | Chao-Wei Huang et.al. | 2407.03169 | null |
2024-07-03 | SOS! Soft Prompt Attack Against Open-Source Large Language Models | Ziqing Yang et.al. | 2407.03160 | null |
2024-07-03 | Let the Code LLM Edit Itself When You Edit the Code | Zhenyu He et.al. | 2407.03157 | null |
2024-07-03 | Reinforcement Learning for Sequence Design Leveraging Protein Language Models | Jithendaraa Subramanian et.al. | 2407.03154 | null |
2024-07-03 | Enhancing Translation Accuracy of Large Language Models through Continual Pre-Training on Parallel Data | Minato Kondo et.al. | 2407.03145 | null |
2024-07-03 | Social Bias Evaluation for Large Language Models Requires Prompt Variations | Rem Hida et.al. | 2407.03129 | link |
2024-07-03 | KeyVideoLLM: Towards Large-scale Video Keyframe Selection | Hao Liang et.al. | 2407.03104 | null |
2024-07-03 | Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory | Suyeon Lee et.al. | 2407.03103 | link |
2024-07-03 | ScreenTK: Seamless Detection of Time-Killing Moments Using Continuous Mobile Screen Text Monitoring | Le Fang et.al. | 2407.03063 | null |
2024-07-02 | MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Huiqiang Jiang et.al. | 2407.02490 | link |
2024-07-02 | Neurocache: Efficient Vector Retrieval for Long-range Language Modeling | Ali Safaya et.al. | 2407.02486 | link |
2024-07-02 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | Yue Yu et.al. | 2407.02485 | null |
2024-07-02 | MMedAgent: Learning to Use Medical Tools with Multi-modal Agent | Binxu Li et.al. | 2407.02483 | null |
2024-07-02 | Understanding Alignment in Multimodal LLMs: A Comprehensive Study | Elmira Amirloo et.al. | 2407.02477 | null |
2024-07-02 | Open Scene Graphs for Open World Object-Goal Navigation | Joel Loo et.al. | 2407.02473 | null |
2024-07-02 | ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions | Chan Young Park et.al. | 2407.02472 | link |
2024-07-02 | Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I. | Harrie Oosterhuis et.al. | 2407.02464 | null |
2024-07-02 | Ensemble of pre-trained language models and data augmentation for hate speech detection from Arabic tweets | Kheir Eddine Daouadi et.al. | 2407.02448 | null |
2024-07-03 | Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs | Jinmin Li et.al. | 2407.02411 | null |
2024-07-02 | CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models | Song Wang et.al. | 2407.02408 | null |
2024-07-02 | Assessing the Code Clone Detection Capability of Large Language Models | Zixian Zhang et.al. | 2407.02402 | null |
2024-07-02 | Learning to Refine with Fine-Grained Natural Language Feedback | Manya Wadhwa et.al. | 2407.02397 | link |
2024-07-02 | Is Your AI-Generated Code Really Secure? Evaluating Large Language Models on Secure Code Generation with CodeSecEval | Jiexin Wang et.al. | 2407.02395 | null |
2024-07-02 | TokenPacker: Efficient Visual Projector for Multimodal LLM | Wentong Li et.al. | 2407.02392 | link |
2024-07-02 | Talking to Machines: do you read me? | Lina M. Rojas-Barahona et.al. | 2407.02354 | null |
2024-07-02 | Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification | Pritish Sahu et.al. | 2407.02352 | null |
2024-07-02 | Generative Large Language Models in Automated Fact-Checking: A Survey | Ivan Vykopal et.al. | 2407.02351 | null |
2024-07-02 | Conceptual Codebook Learning for Vision-Language Models | Yi Zhang et.al. | 2407.02350 | null |
2024-07-02 | MORPHEUS: Modeling Role from Personalized Dialogue History by Exploring and Utilizing Latent Space | Yihong Tang et.al. | 2407.02345 | null |
2024-06-28 | Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | Sukmin Yun et.al. | 2406.20098 | link |
2024-06-28 | LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | Xiang Li et.al. | 2406.20095 | link |
2024-06-28 | Scaling Synthetic Data Creation with 1,000,000,000 Personas | Xin Chan et.al. | 2406.20094 | link |
2024-06-28 | LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression | Jieneng Chen et.al. | 2406.20092 | link |
2024-06-28 | ProgressGym: Alignment with a Millennium of Moral Progress | Tianyi Qiu et.al. | 2406.20087 | null |
2024-06-28 | Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | Yicheng Chen et.al. | 2406.20085 | null |
2024-06-28 | Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification | Anisha Gunjal et.al. | 2406.20079 | link |
2024-06-28 | EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | Yuxuan Zhang et.al. | 2406.20076 | link |
2024-06-28 | To Word Senses and Beyond: Inducing Concepts with Contextualized Language Models | Bastien Liétard et.al. | 2406.20054 | null |
2024-06-28 | Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation | Danny Halawi et.al. | 2406.20053 | null |
2024-07-01 | BMW Agents -- A Framework For Task Automation Through Multi-Agent Collaboration | Noel Crawford et.al. | 2406.20041 | null |
2024-06-28 | BioMNER: A Dataset for Biomedical Method Entity Recognition | Chen Tang et.al. | 2406.20038 | null |
2024-06-28 | LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models | Renzhi Wang et.al. | 2406.20030 | null |
2024-06-28 | ToolBeHonest: A Multi-level Hallucination Diagnostic Benchmark for Tool-Augmented Large Language Models | Yuxiang Zhang et.al. | 2406.20015 | link |
2024-06-28 | The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models | Xinyi Chen et.al. | 2406.19999 | link |
2024-06-28 | Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model | Habib Hajimolahoseini et.al. | 2406.19995 | null |
2024-06-28 | ScaleBiO: Scalable Bilevel Optimization for LLM Data Reweighting | Rui Pan et.al. | 2406.19976 | null |
2024-06-28 | STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical | Guohao Sun et.al. | 2406.19973 | null |
2024-06-28 | Into the Unknown: Generating Geospatial Descriptions for New Environments | Tzuf Paz-Argaman et.al. | 2406.19967 | null |
2024-06-28 | Simulating Financial Market via Large Language Model based Agents | Shen Gao et.al. | 2406.19966 | null |
2024-06-27 | ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos | Jr-Jen Chen et.al. | 2406.19392 | link |
2024-06-27 | The Remarkable Robustness of LLMs: Stages of Inference? | Vedang Lad et.al. | 2406.19384 | link |
2024-06-27 | The Model Arena for Cross-lingual Sentiment Analysis: A Comparative Study in the Era of Large Language Models | Xiliang Zhu et.al. | 2406.19358 | null |
2024-06-27 | DiVERT: Distractor Generation with Variational Errors Represented as Text for Math Multiple-choice Questions | Nigel Fernandez et.al. | 2406.19356 | null |
2024-06-27 | Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? | Peter Hase et.al. | 2406.19354 | null |
2024-06-27 | IndoToxic2024: A Demographically-Enriched Dataset of Hate Speech and Toxicity Types for Indonesian Language | Lucky Susanto et.al. | 2406.19349 | null |
2024-06-27 | Jump Starting Bandits with LLM-Generated Prior Knowledge | Parand A. Alamdari et.al. | 2406.19317 | null |
2024-06-27 | MCNC: Manifold Constrained Network Compression | Chayne Thrash et.al. | 2406.19301 | null |
2024-06-27 | From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data | Zheyang Xiong et.al. | 2406.19292 | null |
2024-06-27 | PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models | Cathy Mengying Fang et.al. | 2406.19283 | null |
2024-06-27 | HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | Junying Chen et.al. | 2406.19280 | link |
2024-06-27 | VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation | Yixiao Song et.al. | 2406.19276 | link |
2024-06-27 | AutoPureData: Automated Filtering of Web Data for LLM Fine-tuning | Praneeth Vadlapati et.al. | 2406.19271 | link |
2024-06-27 | Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding | Yue Fan et.al. | 2406.19263 | link |
2024-06-27 | Enhancing Video-Language Representations with Structural Spatio-Temporal Alignment | Hao Fei et.al. | 2406.19255 | null |
2024-06-27 | AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation | Jia Fu et.al. | 2406.19251 | null |
2024-06-27 | Revealing Fine-Grained Values and Opinions in Large Language Models | Dustin Wright et.al. | 2406.19238 | link |
2024-06-28 | FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Shubhankar Singh et.al. | 2406.19237 | null |
2024-06-27 | Seeing Is Believing: Black-Box Membership Inference Attacks Against Retrieval Augmented Generation | Yuying Li et.al. | 2406.19234 | null |
2024-06-28 | RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs | Ekaterina Taktasheva et.al. | 2406.19232 | link |
2024-06-26 | Towards Compositionality in Concept Learning | Adam Stein et.al. | 2406.18534 | link |
2024-06-26 | Symbolic Learning Enables Self-Evolving Agents | Wangchunshu Zhou et.al. | 2406.18532 | link |
2024-06-26 | PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation | Christoph Leiter et.al. | 2406.18528 | link |
2024-06-26 | CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | Zirui Wang et.al. | 2406.18521 | link |
2024-06-26 | "Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline | Grace Li et.al. | 2406.18512 | null |
2024-06-26 | WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | Liwei Jiang et.al. | 2406.18510 | null |
2024-06-26 | Mental Modeling of Reinforcement Learning Agents by Language Models | Wenhao Lu et.al. | 2406.18505 | null |
2024-06-26 | Is In-Context Learning a Type of Gradient-Based Learning? Evidence from the Inverse Frequency Effect in Structural Priming | Zhenghao Zhou et.al. | 2406.18501 | null |
2024-06-26 | Role-Play Zero-Shot Prompting with Large Language Models for Open-Domain Human-Machine Conversation | Ahmed Njifenjou et.al. | 2406.18460 | null |
2024-06-26 | Cascading Large Language Models for Salient Event Graph Generation | Xingwei Tan et.al. | 2406.18449 | link |
2024-06-26 | New intelligent empowerment for digital transformation | Peng Yifeng et.al. | 2406.18440 | null |
2024-06-26 | IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons | Dan Shi et.al. | 2406.18406 | null |
2024-06-26 | Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers | Yibo Jiang et.al. | 2406.18400 | null |
2024-06-26 | Adversarial Search Engine Optimization for Large Language Models | Fredrik Nestaas et.al. | 2406.18382 | null |
2024-06-26 | MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization | Haolang Lu et.al. | 2406.18379 | null |
2024-06-26 | Themis: Towards Flexible and Interpretable NLG Evaluation | Xinyu Hu et.al. | 2406.18365 | link |
2024-06-26 | AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations | Adam Dahlgren Lindström et.al. | 2406.18346 | null |
2024-06-26 | PDFA Distillation via String Probability Queries | Robert Baumgartner et.al. | 2406.18328 | link |
2024-06-26 | PaCoST: Paired Confidence Significance Testing for Benchmark Contamination Detection in Large Language Models | Huixuan Zhang et.al. | 2406.18326 | null |
2024-06-26 | MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data | Meng Fang et.al. | 2406.18321 | null |
2024-06-25 | MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | Xiangyu Zhao et.al. | 2406.17770 | link |
2024-06-25 | EXTRACT: Efficient Policy Learning by Extracting Transferrable Robot Skills from Offline Data | Jesse Zhang et.al. | 2406.17768 | null |
2024-06-25 | BMIKE-53: Investigating Cross-Lingual Knowledge Editing with In-Context Learning | Ercong Nie et.al. | 2406.17764 | null |
2024-06-25 | CaLMQA: Exploring culturally specific long-form question answering across 23 languages | Shane Arora et.al. | 2406.17761 | link |
2024-06-25 | Accelerating Clinical Evidence Synthesis with Large Language Models | Zifeng Wang et.al. | 2406.17755 | null |
2024-06-25 | Measuring and Benchmarking Large Language Models' Capabilities to Generate Persuasive Language | Amalie Brogaard Pauli et.al. | 2406.17753 | null |
2024-06-25 | Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon | USVSN Sai Prashanth et.al. | 2406.17746 | link |
2024-06-25 | Point-SAM: Promptable 3D Segmentation Model for Point Clouds | Yuchen Zhou et.al. | 2406.17741 | link |
2024-06-25 | Find Parent then Label Children: A Two-stage Taxonomy Completion Method with Pre-trained Language Model | Fei Xia et.al. | 2406.17739 | null |
2024-06-25 | LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users | Elinor Poole-Dayan et.al. | 2406.17737 | null |
2024-06-25 | FedBiOT: LLM Local Fine-tuning in Federated Learning without Full Model | Feijie Wu et.al. | 2406.17706 | link |
2024-06-25 | From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | Thom Lake et.al. | 2406.17692 | link |
2024-06-25 | VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation | Kun Qian et.al. | 2406.17681 | link |
2024-06-25 | Quantifying AI Psychology: A Psychometrics Benchmark for Large Language Models | Yuan Li et.al. | 2406.17675 | null |
2024-06-25 | LaTable: Towards Large Tabular Models | Boris van Breugel et.al. | 2406.17673 | null |
2024-06-25 | LLM-ARC: Enhancing LLMs with an Automated Reasoning Critic | Aditya Kalyanpur et.al. | 2406.17663 | null |
2024-06-25 | Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients | Aashiq Muhamed et.al. | 2406.17660 | link |
2024-06-25 | DKPROMPT: Domain Knowledge Prompting Vision-Language Models for Open-World Planning | Xiaohan Zhang et.al. | 2406.17659 | null |
2024-06-25 | Leveraging Large Language Models for Software Model Completion: Results from Industrial and Public Datasets | Christof Tinnes et.al. | 2406.17651 | null |
2024-06-25 | Variationist: Exploring Multifaceted Variation and Bias in Written Language Data | Alan Ramponi et.al. | 2406.17647 | link |
2024-06-24 | Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | Shengbang Tong et.al. | 2406.16860 | link |
2024-06-24 | EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | Yuhui Li et.al. | 2406.16858 | link |
2024-06-24 | Long Context Transfer from Language to Vision | Peiyuan Zhang et.al. | 2406.16852 | link |
2024-06-24 | Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts | Aditya Sharma et.al. | 2406.16851 | null |
2024-06-24 | RaTEScore: A Metric for Radiology Report Generation | Weike Zhao et.al. | 2406.16845 | null |
2024-06-24 | From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models | Sean Welleck et.al. | 2406.16838 | null |
2024-06-24 | USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long | Mounika Marreddy et.al. | 2406.16833 | null |
2024-06-24 | Understanding and Mitigating Tokenization Bias in Language Models | Buu Phan et.al. | 2406.16829 | null |
2024-06-24 | Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track | Ronak Pradeep et.al. | 2406.16828 | link |
2024-06-24 | GPT-4V Explorations: Mining Autonomous Driving | Zixuan Li et.al. | 2406.16817 | null |
2024-06-24 | RES-Q: Evaluating Code-Editing Large Language Model Systems at the Repository Scale | Beck LaBash et.al. | 2406.16801 | link |
2024-06-24 | Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | Ashwinee Panda et.al. | 2406.16797 | link |
2024-06-24 | Adam-mini: Use Fewer Learning Rates To Gain More | Yushun Zhang et.al. | 2406.16793 | link |
2024-06-24 | M2Lingual: Enhancing Multilingual, Multi-Turn Instruction Alignment in Large Language Models | Rishabh Maheshwary et.al. | 2406.16783 | null |
2024-06-24 | It Is Not About What You Say, It Is About How You Say It: A Surprisingly Simple Approach for Improving Reading Comprehension | Sagi Shaier et.al. | 2406.16779 | null |
2024-06-24 | Finding Transformer Circuits with Edge Pruning | Adithya Bhaskar et.al. | 2406.16778 | link |
2024-06-24 | Blending LLMs into Cascaded Speech Translation: KIT's Offline Speech Translation System for IWSLT 2024 | Sai Koneru et.al. | 2406.16777 | null |
2024-06-24 | WARP: On the Benefits of Weight Averaged Rewarded Policies | Alexandre Ramé et.al. | 2406.16768 | null |
2024-06-24 | The GPT-WritingPrompts Dataset: A Comparative Analysis of Character Portrayal in Short Stories | Xi Yu Huang et.al. | 2406.16767 | link |
2024-06-24 | Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters | Euiin Yi et.al. | 2406.16758 | link |
2024-06-21 | GenoTEX: A Benchmark for Evaluating LLM-Based Exploration of Gene Expression Data in Alignment with Bioinformaticians | Haoyang Liu et.al. | 2406.15341 | link |
2024-06-21 | Gradient-Mask Tuning Elevates the Upper Limits of LLM Performance | Haoling Li et.al. | 2406.15330 | null |
2024-06-21 | Bug In the Code Stack: Can LLMs Find Bugs in Large Python Code Stacks | Hokyung Lee et.al. | 2406.15325 | link |
2024-06-21 | Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model | Doyoung Kim et.al. | 2406.15275 | null |
2024-06-21 | Towards Fine-Grained Citation Evaluation in Generated Text: A Comparative Analysis of Faithfulness Metrics | Weijia Zhang et.al. | 2406.15264 | null |
2024-06-21 | Unsupervised Morphological Tree Tokenizer | Qingyang Zhu et.al. | 2406.15245 | null |
2024-06-21 | Large Batch Analysis for Adagrad Under Anisotropic Smoothness | Yuxing Liu et.al. | 2406.15244 | null |
2024-06-21 | Detecting Synthetic Lyrics with Few-Shot Inference | Yanis Labrak et.al. | 2406.15231 | null |
2024-06-21 | A LLM-Based Ranking Method for the Evaluation of Automatic Counter-Narrative Generation | Irune Zubiaga et.al. | 2406.15227 | null |
2024-06-21 | Unsupervised Extraction of Dialogue Policies from Conversations | Makesh Narsimhan Sreedhar et.al. | 2406.15214 | null |
2024-06-21 | Prompting Whisper for QA-driven Zero-shot End-to-end Spoken Language Understanding | Mohan Li et.al. | 2406.15209 | null |
2024-06-21 | Exploring the Efficacy of Robotic Assistants with ChatGPT and Claude in Enhancing ADHD Therapy: Innovating Treatment Paradigms | Santiago Berrezueta-Guzman et.al. | 2406.15198 | null |
2024-06-21 | UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis | Yulong Hui et.al. | 2406.15187 | link |
2024-06-21 | Hybrid Alignment Training for Large Language Models | Chenglong Wang et.al. | 2406.15178 | link |
2024-06-21 | EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot | Hao Fei et.al. | 2406.15177 | link |
2024-06-21 | Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss | Wei He et.al. | 2406.15175 | null |
2024-06-21 | Évaluation des capacités de réponse de larges modèles de langage (LLM) pour des questions d'historiens | Mathieu Chartier et.al. | 2406.15173 | null |
2024-06-21 | Assessing Good, Bad and Ugly Arguments Generated by ChatGPT: a New Dataset, its Methodology and Associated Tasks | Victor Hugo Nascimento Rocha et.al. | 2406.15130 | link |
2024-06-21 | Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network | Badr AlKhamissi et.al. | 2406.15109 | link |
2024-06-21 | PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data | Ishaan Watts et.al. | 2406.15053 | null |
2024-06-20 | Model Merging and Safety Alignment: One Bad Model Spoils the Bunch | Hasan Abed Al Kader Hammoud et.al. | 2406.14563 | null |
2024-06-20 | Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Sachit Menon et.al. | 2406.14562 | null |
2024-06-20 | How to Compute the Probability of a Word | Tiago Pimentel et.al. | 2406.14561 | null |
2024-06-21 | Asynchronous Large Language Model Enhanced Planner for Autonomous Driving | Yuan Chen et.al. | 2406.14556 | null |
2024-06-20 | GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models | Shilong Li et.al. | 2406.14550 | null |
2024-06-20 | Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models | Sunny Duan et.al. | 2406.14549 | null |
2024-06-20 | Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein et.al. | 2406.14546 | link |
2024-06-20 | Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems | Đorđe Klisura et.al. | 2406.14545 | null |
2024-06-20 | Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs | Yuxuan Qiao et.al. | 2406.14544 | link |
2024-06-20 | Are LLMs Naturally Good at Synthetic Tabular Data Generation? | Shengzhe Xu et.al. | 2406.14541 | link |
2024-06-20 | PostMark: A Robust Blackbox Watermark for Large Language Models | Yapei Chang et.al. | 2406.14517 | link |
2024-06-20 | MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | Xinyu Fang et.al. | 2406.14515 | link |
2024-06-20 | Evidence of a log scaling law for political persuasion with large language models | Kobi Hackenburg et.al. | 2406.14508 | link |
2024-06-20 | Overview of the CAIL 2023 Argument Mining Track | Jingcong Liang et.al. | 2406.14503 | null |
2024-06-20 | Improving Expert Radiology Report Summarization by Prompting Large Language Models with a Layperson Summary | Xingmeng Zhao et.al. | 2406.14500 | null |
2024-06-20 | LLaSA: Large Multimodal Agent for Human Activity Analysis Through Wearable Sensors | Sheikh Asif Imran et.al. | 2406.14498 | link |
2024-06-20 | CodeRAG-Bench: Can Retrieval Augment Code Generation? | Zora Zhiruo Wang et.al. | 2406.14497 | link |
2024-06-20 | African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | Gregor Geigle et.al. | 2406.14496 | link |
2024-06-20 | Does Object Grounding Really Reduce Hallucination of Large Vision-Language Models? | Gregor Geigle et.al. | 2406.14492 | null |
2024-06-20 | Instruction Pre-Training: Language Models are Supervised Multitask Learners | Daixuan Cheng et.al. | 2406.14491 | link |
2024-06-18 | DrVideo: Document Retrieval Based Long Video Understanding | Ziyu Ma et.al. | 2406.12846 | null |
2024-06-18 | Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Haoxiang Wang et.al. | 2406.12845 | link |
2024-06-18 | Synergizing Foundation Models and Federated Learning: A Survey | Shenghui Li et.al. | 2406.12844 | null |
2024-06-18 | GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation | Ci-Siang Lin et.al. | 2406.12834 | null |
2024-06-18 | LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation | Seyedarmin Azizi et.al. | 2406.12832 | link |
2024-06-18 | What Are the Odds? Language Models Are Capable of Probabilistic Reasoning | Akshay Paruchuri et.al. | 2406.12830 | null |
2024-06-18 | From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries | Hitesh Wadhwa et.al. | 2406.12824 | null |
2024-06-18 | Is It Good Data for Multilingual Instruction Tuning or Just Bad Multilingual Evaluation for Large Language Models? | Pinzhen Chen et.al. | 2406.12822 | null |
2024-06-18 | Adversarial Attacks on Multimodal Agents | Chen Henry Wu et.al. | 2406.12814 | link |
2024-06-18 | Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones? | Zhe Yang et.al. | 2406.12809 | null |
2024-06-18 | Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents | Zehao Wang et.al. | 2406.12806 | null |
2024-06-18 | Supporting Human Raters with the Detection of Harmful Content using Large Language Models | Kurt Thomas et.al. | 2406.12800 | null |
2024-06-18 | ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | Team GLM et.al. | 2406.12793 | link |
2024-06-18 | In-Context Learning of Energy Functions | Rylan Schaeffer et.al. | 2406.12785 | null |
2024-06-18 | UBENCH: Benchmarking Uncertainty in Large Language Models with Multiple Choice Questions | Xunzhi Wang et.al. | 2406.12784 | link |
2024-06-18 | Hopping Too Late: Exploring the Limitations of Large Language Models on Multi-Hop Queries | Eden Biran et.al. | 2406.12775 | link |
2024-06-18 | Towards Exact Gradient-based Training on Analog In-memory Computing | Zhaoxian Wu et.al. | 2406.12774 | null |
2024-06-18 | GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping | Angel Daruna et.al. | 2406.12756 | null |
2024-06-18 | OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | Zhen Huang et.al. | 2406.12753 | link |
2024-06-18 | Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | Bingchen Zhao et.al. | 2406.12742 | link |
2024-06-17 | LLaNA: Large Language and NeRF Assistant | Andrea Amaduzzi et.al. | 2406.11840 | null |
2024-06-17 | mDPO: Conditional Preference Optimization for Multimodal Large Language Models | Fei Wang et.al. | 2406.11839 | null |
2024-06-17 | MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | Ziyu Liu et.al. | 2406.11833 | link |
2024-06-17 | Unveiling Encoder-Free Vision-Language Models | Haiwen Diao et.al. | 2406.11832 | link |
2024-06-17 | Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models | Bingqi Ma et.al. | 2406.11831 | null |
2024-06-17 | Language Modeling with Editable External Knowledge | Belinda Z. Li et.al. | 2406.11830 | link |
2024-06-17 | WPO: Enhancing RLHF with Weighted Preference Optimization | Wenxuan Zhou et.al. | 2406.11827 | link |
2024-06-17 | On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning | Geewook Kim et.al. | 2406.11823 | link |
2024-06-17 | MegaScenes: Scene-Level View Synthesis at Scale | Joseph Tung et.al. | 2406.11819 | null |
2024-06-17 | Embodied Instruction Following in Unknown Environments | Zhenyu Wu et.al. | 2406.11818 | null |
2024-06-17 | Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level | Jie Liu et.al. | 2406.11817 | null |
2024-06-17 | VideoLLM-online: Online Video Large Language Model for Streaming Video | Joya Chen et.al. | 2406.11816 | null |
2024-06-17 | How Do Large Language Models Acquire Factual Knowledge During Pretraining? | Hoyeon Chang et.al. | 2406.11813 | null |
2024-06-17 | RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | Joao Monteiro et.al. | 2406.11811 | null |
2024-06-17 | Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations | Rima Hazra et.al. | 2406.11801 | link |
2024-06-17 | DataComp-LM: In search of the next generation of training sets for language models | Jeffrey Li et.al. | 2406.11794 | null |
2024-06-17 | CELL your Model: Contrastive Explanation Methods for Large Language Models | Ronny Luss et.al. | 2406.11785 | null |
2024-06-17 | Split, Unlearn, Merge: Leveraging Data Attributes for More Effective Unlearning in LLMs | Swanand Ravindra Kadhe et.al. | 2406.11780 | null |
2024-06-17 | Improving Multi-Agent Debate with Sparse Communication Topology | Yunxuan Li et.al. | 2406.11776 | null |
2024-06-17 | Task Me Anything | Jieyu Zhang et.al. | 2406.11775 | link |
2024-06-14 | Quantifying Variance in Evaluation Benchmarks | Lovish Madaan et.al. | 2406.10229 | null |
2024-06-14 | EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models | Julian Straub et.al. | 2406.10224 | null |
2024-06-14 | Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding | Ridouane Ghermi et.al. | 2406.10221 | null |
2024-06-14 | Semantic Membership Inference Attack against Large Language Models | Hamid Mozaffari et.al. | 2406.10218 | null |
2024-06-14 | Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs | Rui Yang et.al. | 2406.10216 | null |
2024-06-14 | DevBench: A multimodal developmental benchmark for language learning | Alvin Wei Ming Tan et.al. | 2406.10215 | null |
2024-06-14 | Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | Abhimanyu Hans et.al. | 2406.10209 | link |
2024-06-14 | A Fundamental Trade-off in Aligned Language Models and its Relation to Sampling Adaptors | Naaman Tan et.al. | 2406.10203 | link |
2024-06-14 | TRIP-PAL: Travel Planning with Guarantees by Combining Large Language Models and Automated Planners | Tomas de la Rosa et.al. | 2406.10196 | null |
2024-06-14 | Detecting and Evaluating Medical Hallucinations in Large Vision Language Models | Jiawei Chen et.al. | 2406.10185 | null |
2024-06-14 | Practical offloading for fine-tuning LLM on commodity GPU via learned subspace projectors | Siyuan Chen et.al. | 2406.10181 | null |
2024-06-14 | Let the Poem Hit the Rhythm: Using a Byte-Based Transformer for Beat-Aligned Poetry Generation | Mohamad Elzohbi et.al. | 2406.10174 | link |
2024-06-14 | IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce | Wenxuan Ding et.al. | 2406.10173 | link |
2024-06-14 | Datasets for Multilingual Answer Sentence Selection | Matteo Gabburo et.al. | 2406.10172 | null |
2024-06-14 | CarLLaVA: Vision language models for camera-only closed-loop driving | Katrin Renz et.al. | 2406.10165 | null |
2024-06-14 | Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | Carson Denison et.al. | 2406.10162 | link |
2024-06-14 | RoboGolf: Mastering Real-World Minigolf with a Reflective Multi-Modality Vision-Language Model | Hantao Zhou et.al. | 2406.10157 | null |
2024-06-14 | BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | Yuri Kuratov et.al. | 2406.10149 | link |
2024-06-14 | Evaluation of Large Language Models: STEM education and Gender Stereotypes | Smilla Due et.al. | 2406.10133 | null |
2024-06-14 | The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models | Yan Liu et.al. | 2406.10130 | link |
2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418 | link |
2024-06-13 | Explore the Limits of Omni-modal Pretraining at Scale | Yiyuan Zhang et.al. | 2406.09412 | link |
2024-06-13 | 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | Roman Bachmann et.al. | 2406.09406 | null |
2024-06-13 | Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Yushi Hu et.al. | 2406.09403 | null |
2024-06-13 | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Junke Wang et.al. | 2406.09399 | link |
2024-06-13 | Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms | Miaosen Zhang et.al. | 2406.09397 | null |
2024-06-13 | Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA | Jongwoo Park et.al. | 2406.09396 | link |
2024-06-13 | Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition | Youngtaek Oh et.al. | 2406.09388 | link |
2024-06-13 | Towards Vision-Language Geo-Foundation Model: A Survey | Yue Zhou et.al. | 2406.09385 | link |
2024-06-13 | Reflecting on the State of Rehearsal-free Continual Learning with Pretrained Models | Lukas Thede et.al. | 2406.09384 | null |
2024-06-13 | Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs | Zijia Zhao et.al. | 2406.09367 | link |
2024-06-13 | ElicitationGPT: Text Elicitation Mechanisms via Language Models | Yifan Wu et.al. | 2406.09363 | null |
2024-06-13 | Enhancing Domain Adaptation through Prompt Gradient Alignment | Hoang Phan et.al. | 2406.09353 | null |
2024-06-13 | Separations in the Representational Capabilities of Transformers and Recurrent Architectures | Satwik Bhattamishra et.al. | 2406.09347 | null |
2024-06-13 | DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding | Suwon Shon et.al. | 2406.09345 | null |
2024-06-13 | ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models | David Anugraha et.al. | 2406.09334 | link |
2024-06-13 | REVS: Unlearning Sensitive Information in Language Models via Rank Editing in the Vocabulary Space | Tomer Ashuach et.al. | 2406.09325 | null |
2024-06-13 | Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Zhao Xu et.al. | 2406.09324 | link |
2024-06-13 | JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models | Delong Ran et.al. | 2406.09321 | link |
2024-06-13 | Common and Rare Fundus Diseases Identification Using Vision-Language Foundation Model with Knowledge of Over 400 Diseases | Meng Wang et.al. | 2406.09317 | link |
2024-06-12 | What If We Recaption Billions of Web Images with LLaMA-3? | Xianhang Li et.al. | 2406.08478 | null |
2024-06-12 | Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens | Ting-Ji Huang et.al. | 2406.08477 | null |
2024-06-12 | Real2Code: Reconstruct Articulated Objects via Code Generation | Zhao Mandi et.al. | 2406.08474 | null |
2024-06-12 | PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences | Daiwei Chen et.al. | 2406.08469 | null |
2024-06-12 | Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | Zhangchen Xu et.al. | 2406.08464 | link |
2024-06-12 | AToM-Bot: Embodied Fulfillment of Unspoken Human Needs with Affective Theory of Mind | Wei Ding et.al. | 2406.08455 | null |
2024-06-12 | OLMES: A Standard for Language Model Evaluations | Yuling Gu et.al. | 2406.08446 | null |
2024-06-12 | SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models | Chun Yin et.al. | 2406.08445 | null |
2024-06-12 | TasTe: Teaching Large Language Models to Translate through Self-Reflection | Yutong Wang et.al. | 2406.08434 | link |
2024-06-12 | Next-Generation Database Interfaces: A Survey of LLM-based Text-to-SQL | Zijin Hong et.al. | 2406.08426 | null |
2024-06-12 | OmniCorpus: An Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | Qingyun Li et.al. | 2406.08418 | link |
2024-06-12 | Discovering Preference Optimization Algorithms with and for Large Language Models | Chris Lu et.al. | 2406.08414 | link |
2024-06-12 | Memory Is All You Need: An Overview of Compute-in-Memory Architectures for Accelerating Large Language Model Inference | Christopher Wolters et.al. | 2406.08413 | null |
2024-06-13 | MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos | Xuehai He et.al. | 2406.08407 | link |
2024-06-12 | Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Chun-Yi Kuan et.al. | 2406.08402 | link |
2024-06-12 | cPAPERS: A Dataset of Situated and Multimodal Interactive Conversations in Scientific Papers | Anirudh Sundar et.al. | 2406.08398 | null |
2024-06-12 | VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jiannan Wu et.al. | 2406.08394 | link |
2024-06-12 | Large Language Models Must Be Taught to Know What They Don't Know | Sanyam Kapoor et.al. | 2406.08391 | link |
2024-06-12 | Banal Deception Human-AI Ecosystems: A Study of People's Perceptions of LLM-generated Deceptive Behaviour | Xiao Zhan et.al. | 2406.08386 | null |
2024-06-13 | APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation | Weizhao He et.al. | 2406.08372 | null |
2024-06-11 | A3VLM: Actionable Articulation-Aware Vision Language Model | Siyuan Huang et.al. | 2406.07549 | link |
2024-06-11 | Image and Video Tokenization with Binary Spherical Quantization | Yue Zhao et.al. | 2406.07548 | link |
2024-06-11 | Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena | Aidar Myrzakhan et.al. | 2406.07545 | link |
2024-06-11 | QuickLLaMA: Query-aware Inference Acceleration for Large Language Models | Jingyao Li et.al. | 2406.07528 | link |
2024-06-11 | Simple and Effective Masked Diffusion Language Models | Subham Sekhar Sahoo et.al. | 2406.07524 | link |
2024-06-11 | Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling | Liliang Ren et.al. | 2406.07522 | link |
2024-06-11 | Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement | Yunzhen Feng et.al. | 2406.07515 | null |
2024-06-11 | THaLLE: Text Hyperlocally Augmented Large Language Extension -- Technical Report | KBTG Labs et.al. | 2406.07505 | null |
2024-06-11 | Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Renjie Pi et.al. | 2406.07502 | link |
2024-06-11 | TextGrad: Automatic "Differentiation" via Text | Mert Yuksekgonul et.al. | 2406.07496 | link |
2024-06-11 | CADS: A Systematic Literature Review on the Challenges of Abstractive Dialogue Summarization | Frederic Kirstein et.al. | 2406.07494 | null |
2024-06-11 | Paraphrasing in Affirmative Terms Improves Negation Understanding | MohammadHossein Rezaei et.al. | 2406.07492 | null |
2024-06-11 | PITCH: Productivity and Mental Well-being Coaching through Daily Conversational Interaction | Adnan Abbas et.al. | 2406.07485 | null |
2024-06-11 | Advancing Annotation of Stance in Social Media Posts: A Comparative Analysis of Large Language Models and Crowd Sourcing | Mao Li et.al. | 2406.07483 | null |
2024-06-11 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Zesen Cheng et.al. | 2406.07476 | link |
2024-06-11 | Anomaly Detection on Unstable Logs with GPT Models | Fatemeh Hadadi et.al. | 2406.07467 | null |
2024-06-11 | Estimating the Hallucination Rate of Generative AI | Andrew Jesson et.al. | 2406.07457 | null |
2024-06-11 | Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis | Qining Zhang et.al. | 2406.07455 | null |
2024-06-11 | On the Robustness of Document-Level Relation Extraction Models to Entity Name Variations | Shiao Meng et.al. | 2406.07444 | link |
2024-06-11 | McEval: Massively Multilingual Code Evaluation | Linzheng Chai et.al. | 2406.07436 | null |
2024-06-10 | Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Peize Sun et.al. | 2406.06525 | link |
2024-06-10 | UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor | Shivani Upadhyay et.al. | 2406.06519 | link |
2024-06-10 | Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Louis Blankemeier et.al. | 2406.06512 | null |
2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499 | null |
2024-06-10 | Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation | Oishi Banerjee et.al. | 2406.06496 | null |
2024-06-10 | Can Language Models Serve as Text-Based World Simulators? | Ruoyao Wang et.al. | 2406.06485 | null |
2024-06-10 | Parallelizing Linear Transformers with the Delta Rule over Sequence Length | Songlin Yang et.al. | 2406.06484 | link |
2024-06-10 | Towards a Personal Health Large Language Model | Justin Cosentino et.al. | 2406.06474 | null |
2024-06-10 | AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction | Zhen Xing et.al. | 2406.06465 | null |
2024-06-10 | Transforming Wearable Data into Health Insights using Large Language Model Agents | Mike A. Merrill et.al. | 2406.06464 | null |
2024-06-10 | VCR: Visual Caption Restoration | Tianyu Zhang et.al. | 2406.06462 | link |
2024-06-11 | Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies | Junlin Wang et.al. | 2406.06461 | null |
2024-06-10 | Evaluating the Retrieval Component in LLM-Based Question Answering Systems | Ashkan Alinejad et.al. | 2406.06458 | null |
2024-06-10 | A Large Language Model Pipeline for Breast Cancer Oncology | Tristen Pool et.al. | 2406.06455 | null |
2024-06-10 | Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course | Aadarsh Padiyath et.al. | 2406.06451 | null |
2024-06-10 | LLM Dataset Inference: Did you train on my dataset? | Pratyush Maini et.al. | 2406.06443 | link |
2024-06-10 | Interpretability of Language Models via Task Spaces | Lucas Weber et.al. | 2406.06441 | null |
2024-06-10 | Language Models are Alignable Decision-Makers: Dataset and Application to the Medical Triage Domain | Brian Hu et.al. | 2406.06435 | link |
2024-06-10 | Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking | Gabriel Rioux et.al. | 2406.06425 | null |
2024-06-10 | An Empirical Design Justice Approach to Identifying Ethical Considerations in the Intersection of Large Language Models and Social Robotics | Alva Markelius et.al. | 2406.06400 | null |
2024-06-07 | 3D-GRAND: Towards Better Grounding and Less Hallucination for 3D-LLMs | Jianing Yang et.al. | 2406.05132 | link |
2024-06-07 | An Empirical Study on Parameter-Efficient Fine-Tuning for MultiModal Large Language Models | Xiongtao Zhou et.al. | 2406.05130 | null |
2024-06-07 | Towards Semantic Equivalence of Tokenization in Multimodal LLM | Shengqiong Wu et.al. | 2406.05127 | null |
2024-06-07 | Large Generative Graph Models | Yu Wang et.al. | 2406.05109 | null |
2024-06-07 | LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration | Tavor Lipman et.al. | 2406.05107 | null |
2024-06-07 | Corpus Poisoning via Approximate Greedy Gradient Descent | Jinyan Su et.al. | 2406.05087 | link |
2024-06-07 | Multi-Head RAG: Solving Multi-Aspect Problems with LLMs | Maciej Besta et.al. | 2406.05085 | link |
2024-06-07 | SUMIE: A Synthetic Benchmark for Incremental Entity Summarization | Eunjeong Hwang et.al. | 2406.05079 | null |
2024-06-07 | Are Large Language Models More Empathetic than Humans? | Anuradha Welivita et.al. | 2406.05063 | null |
2024-06-07 | Robustness Assessment of Mathematical Reasoning in the Presence of Missing and Contradictory Conditions | Shi-Yu Tian et.al. | 2406.05055 | null |
2024-06-07 | Hints-In-Browser: Benchmarking Language Models for Programming Feedback Generation | Nachiket Kotalwar et.al. | 2406.05053 | null |
2024-06-07 | Bootstrapping Referring Multi-Object Tracking | Yani Zhang et.al. | 2406.05039 | link |
2024-06-07 | Scenarios and Approaches for Situated Natural Language Explanations | Pengshuo Qiu et.al. | 2406.05035 | null |
2024-06-07 | CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search | Fengran Mo et.al. | 2406.05013 | link |
2024-06-07 | Compositional Generalization with Grounded Language Models | Sondre Wold et.al. | 2406.04989 | link |
2024-06-07 | Language models emulate certain cognitive profiles: An investigation of how predictability measures interact with individual differences | Patrick Haller et.al. | 2406.04988 | null |
2024-06-07 | MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Jitai Hao et.al. | 2406.04984 | link |
2024-06-07 | CityCraft: A Real Crafter for 3D City Generation | Jie Deng et.al. | 2406.04983 | null |
2024-06-07 | Quantifying Geospatial in the Common Crawl Corpus | Ilya Ilyankou et.al. | 2406.04952 | null |
2024-06-07 | BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense | Baktash Ansari et.al. | 2406.04947 | link |
2024-06-06 | Verbalized Machine Learning: Revisiting Machine Learning with Language Models | Tim Z. Xiao et.al. | 2406.04344 | null |
2024-06-06 | Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Stanislaw Szymanowicz et.al. | 2406.04343 | null |
2024-06-06 | Learning 1D Causal Visual Representation with De-focus Attention Networks | Chenxin Tao et.al. | 2406.04342 | link |
2024-06-06 | RoboMamba: Multimodal State Space Model for Efficient Robot Reasoning and Manipulation | Jiaming Liu et.al. | 2406.04339 | null |
2024-06-06 | Coherent Zero-Shot Visual Instruction Generation | Quynh Phung et.al. | 2406.04337 | null |
2024-06-06 | DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | Lingchen Meng et.al. | 2406.04334 | null |
2024-06-06 | PaCE: Parsimonious Concept Engineering for Large Language Models | Jinqi Luo et.al. | 2406.04331 | link |
2024-06-06 | Parameter-Inverted Image Pyramid Networks | Xizhou Zhu et.al. | 2406.04330 | link |
2024-06-06 | Simplified and Generalized Masked Diffusion for Discrete Data | Jiaxin Shi et.al. | 2406.04329 | null |
2024-06-06 | Causal Estimation of Memorisation Profiles | Pietro Lesci et.al. | 2406.04327 | link |
2024-06-06 | ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Lin Chen et.al. | 2406.04325 | null |
2024-06-06 | Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step | Zhanhao Liang et.al. | 2406.04314 | null |
2024-06-06 | Improving Alignment and Robustness with Short Circuiting | Andy Zou et.al. | 2406.04313 | link |
2024-06-06 | Semantically Diverse Language Generation for Uncertainty Estimation in Language Models | Lukas Aichberger et.al. | 2406.04306 | link |
2024-06-06 | Quixer: A Quantum Transformer Model | Nikhil Khatri et.al. | 2406.04305 | null |
2024-06-06 | Text-to-Drive: Diverse Driving Behavior Synthesis via Large Language Models | Phat Nguyen et.al. | 2406.04300 | null |
2024-06-06 | VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval | Junjie Zhou et.al. | 2406.04292 | link |
2024-06-06 | Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation | Adam Fisch et.al. | 2406.04291 | null |
2024-06-07 | What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages | Nadav Borenstein et.al. | 2406.04289 | null |
2024-06-06 | Characterizing Similarities and Divergences in Conversational Tones in Humans and LLMs by Sampling with People | Dun-Ming Huang et.al. | 2406.04278 | link |
2024-06-05 | Wings: Learning Multimodal LLMs without Text-only Forgetting | Yi-Kai Zhang et.al. | 2406.03496 | null |
2024-06-06 | Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training | Ao Sun et.al. | 2406.03488 | link |
2024-06-05 | Analyzing LLM Behavior in Dialogue Summarization: Unveiling Circumstantial Hallucination Trends | Sanjana Ramprasad et.al. | 2406.03487 | null |
2024-06-05 | BIPED: Pedagogically Informed Tutoring System for ESL Education | Soonwoo Kwon et.al. | 2406.03486 | null |
2024-06-05 | Does your data spark joy? Performance gains from domain upsampling at the end of training | Cody Blakeney et.al. | 2406.03476 | null |
2024-06-05 | AD-H: Autonomous Driving with Hierarchical Agents | Zaibin Zhang et.al. | 2406.03474 | null |
2024-06-05 | What is the Best Way for ChatGPT to Translate Poetry? | Shanshan Wang et.al. | 2406.03450 | null |
2024-06-05 | Pre-trained Large Language Models Use Fourier Features to Compute Addition | Tianyi Zhou et.al. | 2406.03445 | null |
2024-06-05 | Are language models rational? The case of coherence norms and belief revision | Thomas Hofweber et.al. | 2406.03442 | null |
2024-06-05 | Cycles of Thought: Measuring LLM Confidence through Stable Explanations | Evan Becker et.al. | 2406.03441 | null |
2024-06-05 | Computation-Efficient Era: A Comprehensive Survey of State Space Models in Medical Image Analysis | Moein Heidari et.al. | 2406.03430 | link |
2024-06-05 | Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach | Saehyung Lee et.al. | 2406.03411 | link |
2024-06-05 | Automating Turkish Educational Quiz Generation Using Large Language Models | Kamyar Zeinalipour et.al. | 2406.03397 | link |
2024-06-05 | Log Parsing with Self-Generated In-Context Learning and Self-Correction | Yifan Wu et.al. | 2406.03376 | null |
2024-06-05 | IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | David Ifeoluwa Adelani et.al. | 2406.03368 | null |
2024-06-05 | CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning | Xinrui Lin et.al. | 2406.03367 | null |
2024-06-05 | LLM-based Rewriting of Inappropriate Argumentation using Reinforcement Learning from Machine Feedback | Timon Ziegenbein et.al. | 2406.03363 | null |
2024-06-05 | Save It for the "Hot" Day: An LLM-Empowered Visual Analytics System for Heat Risk Management | Haobo Li et.al. | 2406.03317 | null |
2024-06-05 | The Good, the Bad, and the Hulk-like GPT: Analyzing Emotional Decisions of Large Language Models in Cooperation and Bargaining Games | Mikhail Mozikov et.al. | 2406.03299 | null |
2024-06-05 | SpikeLM: Towards General Spike-Driven Language Modeling via Elastic Bi-Spiking Mechanisms | Xingrun Xing et.al. | 2406.03287 | link |
2024-06-04 | Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks | Tianyu He et.al. | 2406.02550 | link |
2024-06-04 | Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation | Mohamed El Amine Boudjoghra et.al. | 2406.02548 | link |
2024-06-04 | Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning | Alex Jinpeng Wang et.al. | 2406.02547 | link |
2024-06-04 | To Believe or Not to Believe Your LLM | Yasin Abbasi Yadkori et.al. | 2406.02543 | null |
2024-06-04 | Loki: Low-Rank Keys for Efficient Sparse Attention | Prajwal Singhania et.al. | 2406.02542 | null |
2024-06-04 | Parrot: Multilingual Visual Instruction Tuning | Hai-Long Sun et.al. | 2406.02539 | null |
2024-06-04 | TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Chengzu Li et.al. | 2406.02537 | link |
2024-06-04 | Mitigate Position Bias in Large Language Models via Scaling a Single Dimension | Yijiong Yu et.al. | 2406.02536 | link |
2024-06-04 | SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | Ruslan Svirschevski et.al. | 2406.02532 | link |
2024-06-04 | Scalable MatMul-free Language Modeling | Rui-Jie Zhu et.al. | 2406.02528 | link |
2024-06-04 | CheckEmbed: Effective Verification of LLM Solutions to Open-Ended Tasks | Maciej Besta et.al. | 2406.02524 | link |
2024-06-04 | RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots | Soroush Nasiriany et.al. | 2406.02523 | null |
2024-06-04 | Demystifying the Compression of Mixture-of-Experts Through a Unified Framework | Shwai He et.al. | 2406.02500 | link |
2024-06-04 | Hiding Text in Large Language Models: Introducing Unconditional Token Forcing Confusion | Jakub Hoscilowicz et.al. | 2406.02481 | link |
2024-06-04 | Analyzing Temporal Complex Events with Large Language Models? A Benchmark towards Temporal, Long Context Understanding | Zhihan Zhang et.al. | 2406.02472 | null |
2024-06-04 | Meta-Designing Quantum Experiments with Language Models | Sören Arlt et.al. | 2406.02470 | null |
2024-06-04 | Seed-TTS: A Family of High-Quality Versatile Speech Generation Models | Philip Anastassiou et.al. | 2406.02430 | link |
2024-06-04 | Self-Supervised Singing Voice Pre-Training towards Speech-to-Singing Conversion | Ruiqi Li et.al. | 2406.02429 | null |
2024-06-04 | GrootVL: Tree Topology is All You Need in State Space Model | Yicheng Xiao et.al. | 2406.02395 | link |
2024-06-04 | Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | Maxime Griot et.al. | 2406.02394 | link |
2024-05-31 | Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | Chaoyou Fu et.al. | 2405.21075 | null |
2024-05-31 | Code Pretraining Improves Entity Tracking Abilities of Language Models | Najoung Kim et.al. | 2405.21068 | null |
2024-05-31 | Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality | Tri Dao et.al. | 2405.21060 | link |
2024-05-31 | RydbergGPT | David Fitzek et.al. | 2405.21052 | link |
2024-05-31 | Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling | Jiatao Gu et.al. | 2405.21048 | null |
2024-05-31 | Grammar-Aligned Decoding | Kanghee Park et.al. | 2405.21047 | null |
2024-05-31 | Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF | Tengyang Xie et.al. | 2405.21046 | null |
2024-05-31 | Direct Alignment of Language Models via Quality-Aware Self-Refinement | Runsheng Yu et.al. | 2405.21040 | null |
2024-05-31 | Standards for Belief Representations in LLMs | Daniel A. Herrmann et.al. | 2405.21030 | null |
2024-05-31 | LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | Elias Stengel-Eskin et.al. | 2405.21028 | link |
2024-05-31 | You Only Scan Once: Efficient Multi-dimension Sequential Modeling with LightNet | Zhen Qin et.al. | 2405.21022 | null |
2024-05-31 | Improved Techniques for Optimization-Based Jailbreaking on Large Language Models | Xiaojun Jia et.al. | 2405.21018 | link |
2024-06-03 | StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond | Pengyuan Lyu et.al. | 2405.21013 | null |
2024-05-31 | Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models | Yi Yang et.al. | 2405.20991 | link |
2024-05-31 | DeCo: Decoupling Token Compression from Semantic Abstraction in Multimodal Large Language Models | Linli Yao et.al. | 2405.20985 | link |
2024-05-31 | Enhancing Noise Robustness of Retrieval-Augmented Language Models with Adaptive Adversarial Training | Feiteng Fang et.al. | 2405.20978 | link |
2024-05-31 | SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | Tianyang Xu et.al. | 2405.20974 | link |
2024-05-31 | LCQ: Low-Rank Codebook based Quantization for Large Language Models | Wen-Pu Cai et.al. | 2405.20973 | null |
2024-06-03 | Large Language Models are Zero-Shot Next Location Predictors | Ciro Beneduce et.al. | 2405.20962 | link |
2024-06-03 | A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians | Piotr Wojciech Mirowski et.al. | 2405.20956 | null |
2024-05-30 | MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Ling-Hao Chen et.al. | 2405.20340 | null |
2024-05-30 | Visual Perception by Large Language Model's Weights | Feipeng Ma et.al. | 2405.20339 | null |
2024-05-30 | Xwin-LM: Strong and Scalable Alignment Practice for LLMs | Bolin Ni et.al. | 2405.20335 | link |
2024-05-31 | ParSEL: Parameterized Shape Editing with Language | Aditya Ganeshan et.al. | 2405.20319 | null |
2024-05-30 | CausalQuest: Collecting Natural Causal Questions for AI Agents | Roberto Ceraolo et.al. | 2405.20318 | link |
2024-05-30 | ANAH: Analytical Annotation of Hallucinations in Large Language Models | Ziwei Ji et.al. | 2405.20315 | link |
2024-05-30 | Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation | Guillaume Huguet et.al. | 2405.20313 | null |
2024-05-30 | Large Language Models Can Self-Improve At Web Agent Tasks | Ajay Patel et.al. | 2405.20309 | link |
2024-05-30 | Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models | Himangi Mittal et.al. | 2405.20305 | null |
2024-05-30 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh et.al. | 2405.20304 | link |
2024-05-30 | Who Writes the Review, Human or AI? | Panagiotis C. Theocharopoulos et.al. | 2405.20285 | null |
2024-05-30 | ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections | Massimo Bini et.al. | 2405.20271 | link |
2024-05-30 | Evaluating Large Language Model Biases in Persona-Steered Generation | Andy Liu et.al. | 2405.20253 | link |
2024-05-30 | Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization | Yuchi Liu et.al. | 2405.20252 | link |
2024-05-30 | Retrieval Augmented Structured Generation: Business Document Information Extraction As Tool Use | Franz Louis Cesista et.al. | 2405.20245 | null |
2024-05-30 | Context Injection Attacks on Large Language Models | Cheng'an Wei et.al. | 2405.20234 | null |
2024-05-30 | Data-efficient fine-tuning of foundational models for first-principles quality sublimation enthalpies | Harveen Kaur et.al. | 2405.20217 | null |
2024-05-30 | TS-Align: A Teacher-Student Collaborative Framework for Scalable Iterative Finetuning of Large Language Models | Chen Zhang et.al. | 2405.20215 | null |
2024-05-30 | One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments | Ke Yi et.al. | 2405.20202 | null |
2024-05-31 | Using Large Language Models for Humanitarian Frontline Negotiation: Opportunities and Considerations | Zilin Ma et.al. | 2405.20195 | null |
2024-05-29 | X-VILA: Cross-Modality Alignment for Large Language Model | Hanrong Ye et.al. | 2405.19335 | null |
2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | link |
2024-05-29 | Multi-Modal Generative Embedding Model | Feipeng Ma et.al. | 2405.19333 | null |
2024-05-29 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang et.al. | 2405.19332 | link |
2024-05-29 | Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation | Atrisha Sarkar et.al. | 2405.19328 | null |
2024-05-29 | MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | Ge Zhang et.al. | 2405.19327 | link |
2024-05-29 | Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326 | null |
2024-05-29 | Nearest Neighbor Speculative Decoding for LLM Generation and Attribution | Minghan Li et.al. | 2405.19325 | null |
2024-05-29 | Are Large Language Models Chameleons? | Mingmeng Geng et.al. | 2405.19323 | null |
2024-05-29 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Shicong Cen et.al. | 2405.19320 | null |
2024-05-29 | Robust Preference Optimization through Reward Model Distillation | Adam Fisch et.al. | 2405.19316 | null |
2024-05-29 | Matryoshka Query Transformer for Large Vision-Language Models | Wenbo Hu et.al. | 2405.19315 | link |
2024-05-29 | Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice | Jian-Qiao Zhu et.al. | 2405.19313 | null |
2024-05-29 | Expert-Guided Extinction of Toxic Tokens for Debiased Generation | Xueyao Sun et.al. | 2405.19299 | null |
2024-05-29 | MASSIVE Multilingual Abstract Meaning Representation: A Dataset and Baselines for Hallucination Detection | Michael Regan et.al. | 2405.19285 | null |
2024-05-29 | Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform | Viviane Potocnik et.al. | 2405.19284 | null |
2024-05-29 | Programmable Motion Generation for Open-Set Motion Control Tasks | Hanchao Liu et.al. | 2405.19283 | null |
2024-05-29 | PediatricsGPT: Large Language Models as Chinese Medical Assistants for Pediatric Applications | Dingkang Yang et.al. | 2405.19266 | null |
2024-05-29 | AlchemistCoder: Harmonizing and Eliciting Code Capability by Hindsight Tuning on Multi-source Data | Zifan Song et.al. | 2405.19265 | link |
2024-05-29 | Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models | Zhanhui Zhou et.al. | 2405.19262 | link |
2024-05-28 | Why are Visually-Grounded Language Models Bad at Image Classification? | Yuhui Zhang et.al. | 2405.18415 | link |
2024-05-28 | Don't Forget to Connect! Improving RAG with Graph-based Reranking | Jialin Dong et.al. | 2405.18414 | null |
2024-05-28 | WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization | Jiawei Ma et.al. | 2405.18405 | null |
2024-05-29 | Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass | Ethan Shen et.al. | 2405.18400 | link |
2024-05-28 | Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning | Yixiao Zhang et.al. | 2405.18386 | link |
2024-05-28 | OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | Pengxiang Li et.al. | 2405.18380 | link |
2024-05-28 | LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models | Anthony Sarah et.al. | 2405.18377 | null |
2024-05-28 | Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning | Dongjie Chen et.al. | 2405.18376 | link |
2024-05-28 | Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning | Phakphum Artkaew et.al. | 2405.18375 | link |
2024-05-28 | PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework | Eshaan Agarwal et.al. | 2405.18369 | null |
2024-05-28 | Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? | Yifan Bai et.al. | 2405.18361 | null |
2024-05-28 | Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs | Somnath Kumar et.al. | 2405.18359 | null |
2024-05-28 | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | Somnath Kumar et.al. | 2405.18358 | null |
2024-05-28 | Faithful Logical Reasoning via Symbolic Chain-of-Thought | Jundong Xu et.al. | 2405.18357 | link |
2024-05-28 | Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography | Jie Liu et.al. | 2405.18356 | link |
2024-05-28 | Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation | Anjanava Biswas et.al. | 2405.18346 | null |
2024-05-28 | The Battle of LLMs: A Comparative Study in Conversational QA Tasks | Aryan Rangapur et.al. | 2405.18344 | null |
2024-05-28 | Frustratingly Easy Test-Time Adaptation of Vision-Language Models | Matteo Farina et.al. | 2405.18330 | link |
2024-05-28 | Multi-modal Generation via Cross-Modal In-Context Learning | Amandeep Kumar et.al. | 2405.18304 | link |
2024-05-28 | Semantic are Beacons: A Semantic Perspective for Unveiling Parameter-Efficient Fine-Tuning in Knowledge Learning | Renzhi Wang et.al. | 2405.18292 | null |
2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430 | null |
2024-05-27 | NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models | Chankyu Lee et.al. | 2405.17428 | null |
2024-05-27 | Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | Kuan-Chih Huang et.al. | 2405.17427 | link |
2024-05-27 | LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence | Zhuoling Li et.al. | 2405.17424 | null |
2024-05-27 | Privacy-Aware Visual Language Models | Laurens Samson et.al. | 2405.17423 | null |
2024-05-27 | Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | Jiaming Liu et.al. | 2405.17418 | null |
2024-05-27 | THREAD: Thinking Deeper with Recursive Spawning | Philip Schroeder et.al. | 2405.17402 | link |
2024-05-27 | The Expressive Capacity of State Space Models: A Formal Language Perspective | Yash Sarrof et.al. | 2405.17394 | null |
2024-05-27 | MindMerger: Efficient Boosting LLM Reasoning in non-English Languages | Zixian Huang et.al. | 2405.17386 | link |
2024-05-27 | Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective | Zhen Qin et.al. | 2405.17383 | null |
2024-05-27 | ReMoDetect: Reward Models Recognize Aligned LLM's Generations | Hyunseok Lee et.al. | 2405.17382 | null |
2024-05-27 | Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention | Zhen Qin et.al. | 2405.17381 | link |
2024-05-27 | RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects | Ahmed Allam et.al. | 2405.17378 | link |
2024-05-28 | Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models | ShengYun Peng et.al. | 2405.17374 | null |
2024-05-27 | Prompt Optimization with Human Feedback | Xiaoqiang Lin et.al. | 2405.17346 | link |
2024-05-27 | Exploring and steering the moral compass of Large Language Models | Alejandro Tlaie et.al. | 2405.17345 | link |
2024-05-27 | Cost-efficient Knowledge-based Question Answering with Large Language Models | Junnan Dong et.al. | 2405.17337 | null |
2024-05-27 | XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser | Xianfu Cheng et.al. | 2405.17336 | null |
2024-05-27 | FedHPL: Efficient Heterogeneous Federated Learning with Prompt Tuning and Logit Distillation | Yuting Ma et.al. | 2405.17267 | null |
2024-05-27 | On the Noise Robustness of In-Context Learning for Text Generation | Hongfu Gao et.al. | 2405.17264 | null |
2024-05-24 | Scaling Laws for Discriminative Classification in Large Language Models | Dean Wyatte et.al. | 2405.15765 | null |
2024-05-24 | Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence | Abhinav Patil et.al. | 2405.15750 | null |
2024-05-24 | Sparse maximal update parameterization: A holistic approach to sparse training dynamics | Nolan Dey et.al. | 2405.15743 | null |
2024-05-24 | Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias | Andres Algaba et.al. | 2405.15739 | link |
2024-05-24 | LM4LV: A Frozen Large Language Model for Low-level Vision Tasks | Boyang Zheng et.al. | 2405.15734 | link |
2024-05-24 | Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks | Jerome Sieber et.al. | 2405.15731 | link |
2024-05-24 | Optimizing Large Language Models for OpenAPI Code Completion | Bohdan Petryshyn et.al. | 2405.15729 | link |
2024-05-24 | Disease-informed Adaptation of Vision-Language Models | Jiajin Zhang et.al. | 2405.15728 | link |
2024-05-24 | The Impact of Geometric Complexity on Neural Collapse in Transfer Learning | Michael Munn et.al. | 2405.15706 | null |
2024-05-24 | Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models | Yue Zhang et.al. | 2405.15684 | null |
2024-05-24 | VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap | Sreyan Ghosh et.al. | 2405.15683 | null |
2024-05-24 | What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | Abdelrahman Abdelhamed et.al. | 2405.15668 | null |
2024-05-24 | Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning | Wenhan Chang et.al. | 2405.15662 | null |
2024-05-24 | | Simen Gaure et.al. | 2405.15652 | null |
2024-05-24 | LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots | Ruoyu Wang et.al. | 2405.15646 | null |
2024-05-24 | GECKO: Generative Language Model for English, Code and Korean | Sungwoo Oh et.al. | 2405.15640 | null |
2024-05-24 | M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models | Hongyu Wang et.al. | 2405.15638 | link |
2024-05-24 | GPTZoo: A Large-scale Dataset of GPTs for the Research Community | Xinyi Hou et.al. | 2405.15630 | link |
2024-05-24 | A Comparative Analysis of Distributed Training Strategies for GPT-2 | Ishan Patwardhan et.al. | 2405.15628 | null |
2024-05-24 | Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment | Hao Sun et.al. | 2405.15624 | null |
2024-05-23 | PuzzleAvatar: Assembling 3D Avatars from Personal Albums | Yuliang Xiu et.al. | 2405.14869 | null |
2024-05-23 | A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns | Asaf Yehudai et.al. | 2405.14863 | null |
2024-05-23 | Bitune: Bidirectional Instruction-Tuning | Dawid J. Kopiczko et.al. | 2405.14862 | null |
2024-05-23 | Not All Language Model Features Are Linear | Joshua Engels et.al. | 2405.14860 | link |
2024-05-23 | PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | Vladimir Malinovskii et.al. | 2405.14852 | link |
2024-05-23 | A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis | Yue Yang et.al. | 2405.14839 | null |
2024-05-23 | From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | Yuntian Deng et.al. | 2405.14838 | link |
2024-05-23 | HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Bernal Jiménez Gutiérrez et.al. | 2405.14831 | link |
2024-05-23 | Designing A Sustainable Marine Debris Clean-up Framework without Human Labels | Raymond Wang et.al. | 2405.14815 | link |
2024-05-23 | As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making | Shomik Jain et.al. | 2405.14812 | null |
2024-05-23 | Implicit Personalization in Language Models: A Systematic Study | Zhijing Jin et.al. | 2405.14808 | link |
2024-05-23 | Can LLMs Solve longer Math Word Problems Better? | Xin Xu et.al. | 2405.14804 | null |
2024-05-23 | Lessons from the Trenches on Reproducible Evaluation of Language Models | Stella Biderman et.al. | 2405.14782 | null |
2024-05-23 | WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models | Peng Wang et.al. | 2405.14768 | link |
2024-05-23 | FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models | Hongyang Yang et.al. | 2405.14767 | link |
2024-05-23 | Evaluating Large Language Models for Public Health Classification and Extraction Tasks | Joshua Harris et.al. | 2405.14766 | null |
2024-05-23 | Large language models can be zero-shot anomaly detectors for time series? | Sarah Alnegheimish et.al. | 2405.14755 | null |
2024-05-23 | A Transformer-Based Approach for Smart Invocation of Automatic Code Completion | Aral de Moor et.al. | 2405.14753 | link |
2024-05-23 | MultiCast: Zero-Shot Multivariate Time Series Forecasting Using LLMs | Georgios Chatzigeorgakidis et.al. | 2405.14748 | null |
2024-05-23 | Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View | Xuan Liu et.al. | 2405.14744 | null |
2024-05-21 | Reducing Transformer Key-Value Cache Size with Cross-Layer Attention | William Brandon et.al. | 2405.12981 | null |
2024-05-21 | OmniGlue: Generalizable Feature Matching with Foundation Model Guidance | Hanwen Jiang et.al. | 2405.12979 | link |
2024-05-21 | BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once | Theodore Zhao et.al. | 2405.12971 | null |
2024-05-21 | Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale | Shriram Chennakesavalu et.al. | 2405.12961 | link |
2024-05-21 | Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | Zhangyue Yin et.al. | 2405.12939 | link |
2024-05-21 | Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | Bilgehan Sel et.al. | 2405.12933 | null |
2024-05-21 | Code-mixed Sentiment and Hate-speech Prediction | Anjali Yadav et.al. | 2405.12929 | null |
2024-05-21 | Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples | Tim Menzies et.al. | 2405.12920 | link |
2024-05-21 | G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation | Xingyuan Pan et.al. | 2405.12915 | link |
2024-05-21 | An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation | Zhiyu Tan et.al. | 2405.12914 | link |
2024-05-21 | Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment | Holli Sargeant et.al. | 2405.12910 | link |
2024-05-21 | Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents | San Kim et.al. | 2405.12900 | null |
2024-05-21 | Investigating Persuasion Techniques in Arabic: An Empirical Study Leveraging Large Language Models | Abdurahmman Alzahrani et.al. | 2405.12884 | null |
2024-05-21 | LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language | James Requeima et.al. | 2405.12856 | link |
2024-05-21 | OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models | Zhaojian Yu et.al. | 2405.12843 | link |
2024-05-21 | SmartFlow: Robotic Process Automation using LLMs | Arushi Jain et.al. | 2405.12842 | null |
2024-05-21 | Large Language Models Meet NLP: A Survey | Libo Qin et.al. | 2405.12819 | link |
2024-05-21 | Test Oracle Automation in the era of LLMs | Facundo Molina et.al. | 2405.12766 | null |
2024-05-21 | C3L: Content Correlated Vision-Language Instruction Tuning Data Generation via Contrastive Learning | Ji Ma et.al. | 2405.12752 | null |
2024-05-21 | Generative AI and Large Language Models for Cyber Security: All Insights You Need | Mohamed Amine Ferrag et.al. | 2405.12750 | null |
2024-05-20 | Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning | Guanglin Zhou et.al. | 2405.12217 | link |
2024-05-20 | MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Hongwei Liu et.al. | 2405.12209 | link |
2024-05-20 | Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey | Thiago S. Vaillant et.al. | 2405.12195 | link |
2024-05-20 | CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | Haoxiang Shi et.al. | 2405.12174 | null |
2024-05-20 | Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging | Xiaobo Liang et.al. | 2405.12163 | link |
2024-05-20 | Eliciting Problem Specifications via Large Language Models | Robert E. Wray et.al. | 2405.12147 | null |
2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139 | null |
2024-05-20 | MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | Ting Jiang et.al. | 2405.12130 | link |
2024-05-20 | Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation | Zhankui He et.al. | 2405.12119 | null |
2024-05-20 | Imp: Highly Capable Large Multimodal Models for Mobile Devices | Zhenwei Shao et.al. | 2405.12107 | link |
2024-05-20 | DOP: Diagnostic-Oriented Prompting for Large Language Models in Mathematical Correction | Hao Chen et.al. | 2405.12100 | null |
2024-05-20 | Distributional Semantics, Holism, and the Instability of Meaning | Jumbly Grindrod et.al. | 2405.12084 | null |
2024-05-20 | PARALLELGPUOS: A Concurrent OS-level GPU Checkpoint and Restore System using Validated Speculation | Zhuobin Huang et.al. | 2405.12079 | null |
2024-05-20 | CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models | Tong Zhang et.al. | 2405.12063 | link |
2024-05-20 | STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents | Yue Chen et.al. | 2405.12059 | null |
2024-05-20 | KG-RAG: Bridging the Gap Between Knowledge and Creativity | Diego Sanmartin et.al. | 2405.12035 | null |
2024-05-20 | Can AI Relate: Testing Large Language Model Response for Mental Health Support | Saadia Gabriel et.al. | 2405.12021 | null |
2024-05-20 | MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering | Jingqun Tang et.al. | 2405.11985 | link |
2024-05-20 | A review on the use of large language models as virtual tutors | Silvia García-Méndez et.al. | 2405.11983 | null |
2024-05-20 | Position-Guided Prompt Learning for Anomaly Detection in Chest X-Rays | Zhichao Sun et.al. | 2405.11976 | link |
2024-05-17 | Observational Scaling Laws and the Predictability of Language Model Performance | Yangjun Ruan et.al. | 2405.10938 | link |
2024-05-17 | A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers | Kaiyu Huang et.al. | 2405.10936 | link |
2024-05-17 | The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks | Lucius Bushnaq et.al. | 2405.10928 | link |
2024-05-17 | Blackbox Adaptation for Medical Image Segmentation | Jay N. Paranjape et.al. | 2405.10913 | link |
2024-05-17 | COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | Dimitrios P. Panagoulias et.al. | 2405.10893 | null |
2024-05-17 | Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review | Hongyi Yang et.al. | 2405.10883 | null |
2024-05-17 | ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning Chains | Zhaopei Huang et.al. | 2405.10860 | link |
2024-05-17 | The Future of Large Language Model Pre-training is Federated | Lorenzo Sani et.al. | 2405.10853 | null |
2024-05-17 | Open-Vocabulary Spatio-Temporal Action Detection | Tao Wu et.al. | 2405.10832 | null |
2024-05-17 | Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities | Hao Zhou et.al. | 2405.10825 | null |
2024-05-17 | ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios | Markus Bayer et.al. | 2405.10808 | null |
2024-05-17 | The Relational Machine Calculus | Chris Barrett et.al. | 2405.10801 | null |
2024-05-17 | Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings | Albert Sawczyn et.al. | 2405.10745 | null |
2024-05-17 | Efficient Multimodal Large Language Models: A Survey | Yizhang Jin et.al. | 2405.10739 | link |
2024-05-17 | INDUS: Effective and Efficient Language Models for Scientific Applications | Bishwaranjan Bhattacharjee et.al. | 2405.10725 | null |
2024-05-17 | SignLLM: Sign Languages Production Large Language Models | Sen Fang et.al. | 2405.10718 | null |
2024-05-17 | Persian Pronoun Resolution: Leveraging Neural Networks and Language Models | Hassan Haji Mohammadi et.al. | 2405.10714 | null |
2024-05-17 | SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks | Michael Shliselberg et.al. | 2405.10700 | null |
2024-05-17 | Revolutionizing Process Mining: A Novel Architecture for ChatGPT Integration and Enhanced User Experience through Optimized Prompt Engineering | Mehrdad Agha Mohammad Ali Kermani et.al. | 2405.10689 | null |
2024-05-17 | Realistic Evaluation of Toxicity in Large Language Models | Tinh Son Luong et.al. | 2405.10659 | null |
2024-05-16 | UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models | Sahel Sharifymoghaddam et.al. | 2405.10311 | null |
2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et.al. | 2405.10305 | link |
2024-05-16 | Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees | Yu Gui et.al. | 2405.10301 | null |
2024-05-16 | HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models | Rhea Sanjay Sukthanker et.al. | 2405.10299 | link |
2024-05-17 | Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | Yuexiang Zhai et.al. | 2405.10292 | null |
2024-05-16 | Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction | Jianhao Chen et.al. | 2405.10288 | link |
2024-05-16 | FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models | Adrian Bulat et.al. | 2405.10286 | null |
2024-05-16 | Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers | Tuo Zhang et.al. | 2405.10276 | null |
2024-05-16 | Keep It Private: Unsupervised Privatization of Online Text | Calvin Bao et.al. | 2405.10260 | link |
2024-05-16 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Xianzheng Ma et.al. | 2405.10255 | link |
2024-05-16 | PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology | George Shaikovski et.al. | 2405.10254 | null |
2024-05-16 | A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks | Xuanfan Ni et.al. | 2405.10251 | null |
2024-05-16 | IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers | Hao Yan et.al. | 2405.10250 | null |
2024-05-16 | A Foundation Model for Brain Lesion Segmentation with Mixture of Modality Experts | Xinru Zhang et.al. | 2405.10246 | link |
2024-05-16 | DocuMint: Docstring Generation for Python using Small Language Models | Bibek Poudel et.al. | 2405.10243 | link |
2024-05-16 | Low-Rank Adaptation of Time Series Foundational Models for Out-of-Domain Modality Forecasting | Divij Gupta et.al. | 2405.10216 | null |
2024-05-16 | CPsyExam: A Chinese Benchmark for Evaluating Psychology using Examinations | Jiahao Zhao et.al. | 2405.10212 | null |
2024-05-16 | LFED: A Literary Fiction Evaluation Dataset for Large Language Models | Linhao Yu et.al. | 2405.10166 | link |
2024-05-16 | PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning | Jiancheng Pan et.al. | 2405.10160 | link |
2024-05-16 | Speaker Verification in Agent-Generated Conversations | Yizhe Yang et.al. | 2405.10150 | null |
2024-05-15 | Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming | Bushi Xiao et.al. | 2405.09508 | null |
2024-05-15 | Constrained Learning for Causal Inference and Semiparametric Statistics | Tiffany Tianhui Cai et.al. | 2405.09493 | null |
2024-05-15 | Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts | Donya Rooein et.al. | 2405.09482 | null |
2024-05-15 | Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models | Majid Zarharan et.al. | 2405.09454 | link |
2024-05-15 | M | Yufeng Jiang et.al. | 2405.09446 | link |
2024-05-15 | Facilitating Opinion Diversity through Hybrid NLP Approaches | Michiel van der Meer et.al. | 2405.09439 | null |
2024-05-15 | A Survey On Text-to-3D Contents Generation In The Wild | Chenhan Jiang et.al. | 2405.09431 | null |
2024-05-15 | MicroPython Testbed for Federated Learning Algorithms | Miroslav Popovic et.al. | 2405.09423 | link |
2024-05-15 | Matching domain experts by training from scratch on domain knowledge | Xiaoliang Luo et.al. | 2405.09395 | null |
2024-05-15 | Compositional imprecise probability | Jack Liell-Cock et.al. | 2405.09391 | null |
2024-05-15 | PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | Devansh Jain et.al. | 2405.09373 | null |
2024-05-15 | SARATR-X: A Foundation Model for Synthetic Aperture Radar Images Target Recognition | Weijie L et.al. | 2405.09365 | null |
2024-05-15 | Large Language Model Bias Mitigation from the Perspective of Knowledge Editing | Ruizhe Chen et.al. | 2405.09341 | null |
2024-05-15 | Prompting-based Synthetic Data Generation for Few-Shot Question Answering | Maximilian Schmidt et.al. | 2405.09335 | null |
2024-05-15 | Transfer Learning in Pre-Trained Large Language Models for Malware Detection Based on System Calls | Pedro Miguel Sánchez Sánchez et.al. | 2405.09318 | null |
2024-05-15 | Comparing the Efficacy of GPT-4 and Chat-GPT in Mental Health Care: A Blind Assessment of Large Language Models for Psychological Support | Birger Moell et.al. | 2405.09300 | null |
2024-05-15 | Do language models capture implied discourse meanings? An investigation with exhaustivity implicatures of Korean morphology | Hagyeong Shin et.al. | 2405.09293 | null |
2024-05-15 | Sign of the Times: Evaluating the use of Large Language Models for Idiomaticity Detection | Dylan Phelps et.al. | 2405.09279 | null |
2024-05-15 | Dynamic Activation Pitfalls in LLaMA Models: An Empirical Study | Chi Ma et.al. | 2405.09274 | null |
2024-05-15 | New Textual Corpora for Serbian Language Modeling | Mihailo Škorić et.al. | 2405.09250 | null |
2024-05-14 | Efficient Vision-Language Pre-training by Cluster Masking | Zihao Wei et.al. | 2405.08815 | link |
2024-05-14 | Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs | Edison Jair Bejarano Sepulveda et.al. | 2405.08792 | link |
2024-05-14 | Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring | Tiantian Zhang et.al. | 2405.08786 | link |
2024-05-14 | Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs | Akhila Yerukola et.al. | 2405.08760 | link |
2024-05-14 | Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach | Syed Mhamudul Hasan et.al. | 2405.08755 | null |
2024-05-14 | Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Zhimin Li et.al. | 2405.08748 | link |
2024-05-14 | Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory | Xueyan Niu et.al. | 2405.08707 | null |
2024-05-14 | EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera | Beilei Cui et.al. | 2405.08672 | link |
2024-05-14 | Promoting AI Equity in Science: Generalized Domain Prompt Learning for Accessible VLM Research | Qinglong Cao et.al. | 2405.08668 | link |
2024-05-14 | Thinking Tokens for Language Modeling | David Herel et.al. | 2405.08644 | null |
2024-05-15 | ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation | Dimitris Gkoumas et.al. | 2405.08619 | null |
2024-05-14 | A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine | Hanguang Xiao et.al. | 2405.08603 | null |
2024-05-15 | EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark | Xiaohui Zhang et.al. | 2405.08596 | null |
2024-05-14 | Open-Vocabulary Object Detection via Neighboring Region Attention Alignment | Sunyuan Qiang et.al. | 2405.08593 | null |
2024-05-14 | Improving Transformers with Dynamically Composable Multi-Head Attention | Da Xiao et.al. | 2405.08553 | link |
2024-05-14 | Self-Distillation Improves DNA Sequence Inference | Tong Yu et.al. | 2405.08538 | link |
2024-05-14 | Falcon 7b for Software Mention Detection in Scholarly Documents | AmeerAli Khan et.al. | 2405.08514 | null |
2024-05-14 | Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure | Odysseas S. Chlapanis et.al. | 2405.08502 | link |
2024-05-14 | Is Less More? Quality, Quantity and Context in Idiom Processing with Natural Language Models | Agne Knietaite et.al. | 2405.08497 | link |
2024-05-14 | Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models | Andrea Piergentili et.al. | 2405.08477 | null |
2024-05-13 | Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots | Chengyue Wu et.al. | 2405.07990 | null |
2024-05-13 | A Generalist Learner for Multifaceted Medical Image Interpretation | Hong-Yu Zhou et.al. | 2405.07988 | null |
2024-05-13 | The Platonic Representation Hypothesis | Minyoung Huh et.al. | 2405.07987 | link |
2024-05-13 | Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation | Kevin Stangl et.al. | 2405.07969 | null |
2024-05-13 | PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation | Suad Alshammari et.al. | 2405.07963 | null |
2024-05-13 | AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | Samuel Schmidgall et.al. | 2405.07960 | null |
2024-05-13 | EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning | Yinzhu Quan et.al. | 2405.07938 | link |
2024-05-13 | PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition | Ziyang Zhang et.al. | 2405.07932 | link |
2024-05-13 | Stable Diffusion-based Data Augmentation for Federated Learning with Non-IID Data | Mahdi Morafah et.al. | 2405.07925 | null |
2024-05-13 | Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? | Hari Chandana Kuchibhotla et.al. | 2405.07921 | null |
2024-05-13 | A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking | Ferdinand Schlatt et.al. | 2405.07920 | link |
2024-05-13 | PLUTO: Pathology-Universal Transformer | Dinkar Juyal et.al. | 2405.07905 | null |
2024-05-13 | Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers | Alena Tsanda et.al. | 2405.07886 | link |
2024-05-13 | Zero-Shot Tokenizer Transfer | Benjamin Minixhofer et.al. | 2405.07883 | link |
2024-05-13 | RLHF Workflow: From Reward Modeling to Online RLHF | Hanze Dong et.al. | 2405.07863 | link |
2024-05-13 | Can LLMs Help Predict Elections? (Counter)Evidence from the World's Largest Democracy | Pratik Gujral et.al. | 2405.07828 | null |
2024-05-13 | A View of How Language Models Will Transform Law | Frank Fagan et.al. | 2405.07826 | null |
2024-05-13 | FreeVA: Offline MLLM as Training-Free Video Assistant | Wenhao Wu et.al. | 2405.07798 | link |
2024-05-13 | DEPTH: Discourse Education through Pre-Training Hierarchically | Zachary Bamberger et.al. | 2405.07788 | link |
2024-05-13 | Generating Human Motion in 3D Scenes from Text Descriptions | Zhi Cen et.al. | 2405.07784 | null |
2024-05-10 | Linearizing Large Language Models | Jean Mercat et.al. | 2405.06640 | link |
2024-05-10 | Value Augmented Sampling for Language Model Alignment and Personalization | Seungwook Han et.al. | 2405.06639 | link |
2024-05-10 | Multimodal LLMs Struggle with Basic Visual Network Analysis: a VNA Benchmark | Evan M. Williams et.al. | 2405.06634 | link |
2024-05-10 | Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models | Chakshu Moar et.al. | 2405.06626 | null |
2024-05-10 | Explaining Text Similarity in Transformer Models | Alexandros Vasileiou et.al. | 2405.06604 | link |
2024-05-10 | Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach | Elham Ravanbakhsh et.al. | 2405.06586 | null |
2024-05-10 | What Can Natural Language Processing Do for Peer Review? | Ilia Kuznetsov et.al. | 2405.06563 | link |
2024-05-10 | Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval | Mengjia Niu et.al. | 2405.06545 | null |
2024-05-10 | Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts | Wenyu Huang et.al. | 2405.06524 | null |
2024-05-10 | UniDM: A Unified Framework for Data Manipulation with Large Language Models | Yichen Qian et.al. | 2405.06510 | null |
2024-05-10 | Storypark: Leveraging Large Language Models to Enhance Children Story Learning Through Child-AI collaboration Storytelling | Lyumanshan Ye et.al. | 2405.06495 | null |
2024-05-10 | Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification | Yaoqin Ye et.al. | 2405.06468 | link |
2024-05-10 | Improving Instruction Following in Language Models through Proxy-Based Uncertainty Estimation | JoonHo Lee et.al. | 2405.06424 | link |
2024-05-10 | Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? | Hunter McNichols et.al. | 2405.06414 | link |
2024-05-10 | Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL | Ning Cheng et.al. | 2405.06410 | null |
2024-05-10 | Program Synthesis using Inductive Logic Programming for the Abstraction and Reasoning Corpus | Filipe Marinho Rocha et.al. | 2405.06399 | null |
2024-05-10 | Memory Mosaics | Jianyu Zhang et.al. | 2405.06394 | link |
2024-05-10 | LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play | Li-Chun Lu et.al. | 2405.06373 | null |
2024-05-10 | LMD3: Language Model Data Density Dependence | John Kirchenbauer et.al. | 2405.06331 | null |
2024-05-10 | Correlation Dimension of Natural Language in a Statistical Manifold | Xin Du et.al. | 2405.06321 | null |
2024-05-09 | Natural Language Processing RELIES on Linguistics | Juri Opitz et.al. | 2405.05966 | null |
2024-05-09 | OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning | Dan Qiao et.al. | 2405.05957 | link |
2024-05-09 | Probing Multimodal LLMs as World Models for Driving | Shiva Sreeram et.al. | 2405.05956 | link |
2024-05-09 | Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning | Junzhi Chen et.al. | 2405.05955 | link |
2024-05-09 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | Jiachen Li et.al. | 2405.05949 | link |
2024-05-09 | DOLOMITES: Domain-Specific Long-Form Methodical Tasks | Chaitanya Malaviya et.al. | 2405.05938 | null |
2024-05-09 | Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness | Siyuan Li et.al. | 2405.05930 | null |
2024-05-09 | Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | Zorik Gekhman et.al. | 2405.05904 | null |
2024-05-09 | Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes | Ziang Guo et.al. | 2405.05885 | null |
2024-05-09 | FlockGPT: Guiding UAV Flocking with Linguistic Orchestration | Artem Lykov et.al. | 2405.05872 | null |
2024-05-09 | Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control | Gunshi Gupta et.al. | 2405.05852 | link |
2024-05-09 | Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning | Artem Lykov et.al. | 2405.05824 | link |
2024-05-09 | Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference | Zhihang Lin et.al. | 2405.05803 | link |
2024-05-09 | Towards a More Inclusive AI: Progress and Perspectives in Large Language Model Training for the Sámi Language | Ronny Paul et.al. | 2405.05777 | null |
2024-05-09 | Experimental Pragmatics with Machines: Testing LLM Predictions for the Inferences of Plain and Embedded Disjunctions | Polina Tsvilodub et.al. | 2405.05776 | null |
2024-05-09 | Large Language Model-Aided Evolutionary Search for Constrained Multiobjective Optimization | Zeyi Wang et.al. | 2405.05767 | null |
2024-05-09 | Similarity Guided Multimodal Fusion Transformer for Semantic Location Prediction in Social Media | Zhizhen Zhang et.al. | 2405.05760 | null |
2024-05-09 | Exploring the Potential of Human-LLM Synergy in Advancing Qualitative Analysis: A Case Study on Mental-Illness Stigma | Han Meng et.al. | 2405.05758 | null |
2024-05-09 | Can large language models understand uncommon meanings of common words? | Jinyang Wu et.al. | 2405.05741 | null |
2024-05-09 | Evaluating Dialect Robustness of Language Models via Conversation Understanding | Dipankar Srirag et.al. | 2405.05688 | link |
2024-05-08 | THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models | Prannay Kaul et.al. | 2405.05256 | null |
2024-05-08 | You Only Cache Once: Decoder-Decoder Architectures for Language Models | Yutao Sun et.al. | 2405.05254 | link |
2024-05-08 | Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge | Charles Koutcheme et.al. | 2405.05253 | link |
2024-05-09 | LLMs with Personalities in Multi-issue Negotiation Games | Sean Noh et.al. | 2405.05248 | null |
2024-05-08 | EVA-X: A Foundation Model for General Chest X-ray Analysis with Self-supervised Learning | Jingfeng Yao et.al. | 2405.05237 | link |
2024-05-08 | SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants | Masoud Moghani et.al. | 2405.05226 | null |
2024-05-08 | Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers | Jiuxiang Gu et.al. | 2405.05219 | null |
2024-05-08 | FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models | Jinglin Xu et.al. | 2405.05216 | link |
2024-05-08 | MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning | Inderjeet Nair et.al. | 2405.05189 | null |
2024-05-08 | Encoder-Decoder Framework for Interactive Free Verses with Generation with Controllable High-Quality Rhyming | Tommaso Pasini et.al. | 2405.05176 | null |
2024-05-08 | Air Gap: Protecting Privacy-Conscious Conversational Agents | Eugene Bagdasaryan et.al. | 2405.05175 | null |
2024-05-08 | XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples | Peiqin Lin et.al. | 2405.05116 | link |
2024-05-08 | QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs | Weijia Zhang et.al. | 2405.05109 | null |
2024-05-08 | Concerns on Bias in Large Language Models when Creating Synthetic Personae | Helena A. Haxvig et.al. | 2405.05080 | null |
2024-05-08 | Impact of Tone-Aware Explanations in Recommender Systems | Ayano Okoso et.al. | 2405.05061 | null |
2024-05-08 | Conversational Topic Recommendation in Counseling and Psychotherapy with Decision Transformer and Large Language Models | Aylin Gunal et.al. | 2405.05060 | null |
2024-05-08 | Seeds of Stereotypes: A Large-Scale Textual Analysis of Race and Gender Associations with Diseases in Online Sources | Lasse Hyldig Hansen et.al. | 2405.05049 | null |
2024-05-08 | | Ning Wang et.al. | 2405.05010 | null |
2024-05-08 | ADELIE: Aligning Large Language Models on Information Extraction | Yunjia Qi et.al. | 2405.05008 | link |
2024-05-08 | NAVRepair: Node-type Aware C/C++ Code Vulnerability Repair | Ruoke Wang et.al. | 2405.04994 | null |
2024-05-07 | ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning | Jing Lin et.al. | 2405.04533 | null |
2024-05-07 | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Yujun Lin et.al. | 2405.04532 | link |
2024-05-07 | NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | Shudan Zhang et.al. | 2405.04520 | null |
2024-05-07 | xLSTM: Extended Long Short-Term Memory | Maximilian Beck et.al. | 2405.04517 | null |
2024-05-07 | A Transformer with Stack Attention | Jiaoda Li et.al. | 2405.04515 | link |
2024-05-08 | Unveiling Disparities in Web Task Handling Between Human and Web Agent | Kihoon Son et.al. | 2405.04497 | null |
2024-05-07 | Toward In-Context Teaching: Adapting Examples to Students' Misconceptions | Alexis Ross et.al. | 2405.04495 | null |
2024-05-07 | Representation Learning of Daily Movement Data Using Text Encoders | Alexander Capstick et.al. | 2405.04494 | link |
2024-05-08 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | DeepSeek-AI et.al. | 2405.04434 | link |
2024-05-07 | The Silicone Ceiling: Auditing GPT's Race and Gender Biases in Hiring | Lena Armstrong et.al. | 2405.04412 | null |
2024-05-07 | Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks | Georgios Pantazopoulos et.al. | 2405.04403 | link |
2024-05-07 | Large Language Models Cannot Explain Themselves | Advait Sarkar et.al. | 2405.04382 | null |
2024-05-07 | A Fourth Wave of Open Data? Exploring the Spectrum of Scenarios for Open Data and Generative AI | Hannah Chafetz et.al. | 2405.04333 | null |
2024-05-07 | Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation | Atharvan Dogra et.al. | 2405.04325 | null |
2024-05-07 | Granite Code Models: A Family of Open Foundation Models for Code Intelligence | Mayank Mishra et.al. | 2405.04324 | link |
2024-05-07 | Accelerating Speculative Decoding using Dynamic Speculation Length | Jonathan Mamou et.al. | 2405.04304 | null |
2024-05-07 | Enhancing the Efficiency and Accuracy of Underlying Asset Reviews in Structured Finance: The Application of Multi-agent Framework | Xiangpeng Wan et.al. | 2405.04294 | link |
2024-05-07 | Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore | Junchao Wu et.al. | 2405.04286 | null |
2024-05-07 | On the Foundations of Earth and Climate Foundation Models | Xiao Xiang Zhu et.al. | 2405.04285 | null |
2024-05-07 | Semantic API Alignment: Linking High-level User Goals to APIs | Robert Feldt et.al. | 2405.04236 | null |
2024-05-06 | Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | Muhammad Uzair Khattak et.al. | 2405.03690 | null |
2024-05-06 | Pose Priors from Language Models | Sanjay Subramanian et.al. | 2405.03689 | null |
2024-05-06 | Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames | Keith Burghardt et.al. | 2405.03688 | link |
2024-05-06 | Language-Image Models with 3D Understanding | Jang Hyun Cho et.al. | 2405.03685 | null |
2024-05-06 | AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design | Kamal Choudhary et.al. | 2405.03680 | null |
2024-05-06 | When LLMs Meet Cybersecurity: A Systematic Literature Review | Jie Zhang et.al. | 2405.03644 | link |
2024-05-06 | A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama | Vlad-Andrei Cursaru et.al. | 2405.03616 | null |
2024-05-06 | GREEN: Generative Radiology Report Evaluation and Error Notation | Sophie Ostmeier et.al. | 2405.03595 | null |
2024-05-06 | Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | Abhinav Agarwalla et.al. | 2405.03594 | null |
2024-05-06 | Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing | Han Liu et.al. | 2405.03565 | null |
2024-05-07 | ID-centric Pre-training for Recommendation | Yiqing Wu et.al. | 2405.03562 | null |
2024-05-06 | AlphaMath Almost Zero: process Supervision without process | Guoxin Chen et.al. | 2405.03553 | link |
2024-05-06 | MAmmoTH2: Scaling Instructions from the Web | Xiang Yue et.al. | 2405.03548 | null |
2024-05-06 | Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions | Xingyou Song et.al. | 2405.03547 | null |
2024-05-06 | Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning | Yubo Mai et.al. | 2405.03509 | null |
2024-05-06 | UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images | Yiting Qu et.al. | 2405.03486 | null |
2024-05-06 | LGTM: Local-to-Global Text-Driven Human Motion Diffusion Model | Haowen Sun et.al. | 2405.03485 | link |
2024-05-06 | Doing Personal LAPS: LLM-Augmented Dialogue Construction for Personalized Multi-Session Conversational Search | Hideaki Joko et.al. | 2405.03480 | link |
2024-05-07 | Large Language Models (LLMs) as Agents for Augmented Democracy | Jairo Gudiño-Rosero et.al. | 2405.03452 | null |
2024-05-06 | SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence | Hangyuan Ji et.al. | 2405.03446 | link |
2024-05-03 | Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models | Piotr Padlewski et.al. | 2405.02287 | link |
2024-05-03 | Structural Pruning of Pre-trained Language Models via Neural Architecture Search | Aaron Klein et.al. | 2405.02267 | null |
2024-05-03 | On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning? | Maxime Zanella et.al. | 2405.02266 | link |
2024-05-03 | Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows | Jasmine Y. Shih et.al. | 2405.02260 | null |
2024-05-03 | What matters when building vision-language models? | Hugo Laurençon et.al. | 2405.02246 | null |
2024-05-03 | REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs | Deepa Tilwani et.al. | 2405.02228 | null |
2024-05-03 | Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks | Lujing Zhang et.al. | 2405.02225 | null |
2024-05-03 | FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems | Yashar Deldjoo et.al. | 2405.02219 | null |
2024-05-03 | Automatic Programming: Large Language Models and Beyond | Michael R. Lyu et.al. | 2405.02213 | null |
2024-05-03 | Assessing and Verifying Task Utility in LLM-Powered Applications | Negar Arabzadeh et.al. | 2405.02178 | null |
2024-05-03 | Hoaxpedia: A Unified Wikipedia Hoax Articles Dataset | Hsuvas Borkakoty et.al. | 2405.02175 | null |
2024-05-03 | Mapping the Unseen: Unified Promptable Panoptic Mapping with Dynamic Labeling using Foundation Models | Mohamad Al Mdfaa et.al. | 2405.02162 | null |
2024-05-03 | Neural Context Flows for Learning Generalizable Dynamical Systems | Roussel Desmond Nzoyem et.al. | 2405.02154 | link |
2024-05-03 | The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates | Giuseppe Russo Latona et.al. | 2405.02150 | link |
2024-05-03 | MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain | Chao Jiang et.al. | 2405.02144 | null |
2024-05-03 | Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection | Guillem Ramírez et.al. | 2405.02134 | null |
2024-05-03 | Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets | Xuelong Geng et.al. | 2405.02132 | null |
2024-05-03 | Evaluating Large Language Models for Structured Science Summarization in the Open Research Knowledge Graph | Vladyslav Nechakhin et.al. | 2405.02105 | null |
2024-05-03 | Argumentative Large Language Models for Explainable and Contestable Decision-Making | Gabriel Freedman et.al. | 2405.02079 | null |
2024-05-03 | Comparative Analysis of Retrieval Systems in the Real World | Dmytro Mozolevskyi et.al. | 2405.02048 | null |
2024-05-02 | Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models | Seungone Kim et.al. | 2405.01535 | link |
2024-05-02 | Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks | Murtaza Dalal et.al. | 2405.01534 | null |
2024-05-02 | OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning | Shihao Wang et.al. | 2405.01533 | link |
2024-05-02 | FLAME: Factuality-Aware Alignment for Large Language Models | Sheng-Chieh Lin et.al. | 2405.01525 | null |
2024-05-02 | A separability-based approach to quantifying generalization: which layer is best? | Luciano Dyballa et.al. | 2405.01524 | null |
2024-05-02 | Transformer-Aided Semantic Communications | Matin Mortaheb et.al. | 2405.01521 | null |
2024-05-02 | D2PO: Discriminator-Guided DPO with Response Evaluation Models | Prasann Singhal et.al. | 2405.01511 | link |
2024-05-02 | Analyzing the Role of Semantic Representations in the Era of Large Language Models | Zhijing Jin et.al. | 2405.01502 | link |
2024-05-02 | Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models | Raymond Fok et.al. | 2405.01501 | null |
2024-05-02 | Controllable Text Generation in the Instruction-Tuning Era | Dhananjay Ashok et.al. | 2405.01490 | null |
2024-05-02 | MANTIS: Interleaved Multi-Image Instruction Tuning | Dongfu Jiang et.al. | 2405.01483 | link |
2024-05-02 | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen et.al. | 2405.01481 | link |
2024-05-02 | V-FLUTE: Visual Figurative Language Understanding with Textual Explanations | Arkadiy Saakyan et.al. | 2405.01474 | link |
2024-05-02 | Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning | Théo Moutakanni et.al. | 2405.01469 | null |
2024-05-02 | Understanding Retrieval-Augmented Task Adaptation for Vision-Language Models | Yifei Ming et.al. | 2405.01468 | null |
2024-05-02 | A Systematic Literature Review on Large Language Models for Automated Program Repair | Quanjun Zhang et.al. | 2405.01466 | link |
2024-05-02 | Natural Language to Verilog: Design of a Recurrent Spiking Neural Network using Large Language Models and ChatGPT | Paola Vitolo et.al. | 2405.01419 | null |
2024-05-02 | MiniGPT-3D: Efficiently Aligning 3D Point Clouds with Large Language Models using 2D Priors | Yuan Tang et.al. | 2405.01413 | link |
2024-05-02 | Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving | Xin Quan et.al. | 2405.01379 | null |
2024-05-02 | GAIA: A General AI Assistant for Intelligent Accelerator Operations | Frank Mayet et.al. | 2405.01359 | null |
2024-05-01 | Self-Play Preference Optimization for Language Model Alignment | Yue Wu et.al. | 2405.00675 | link |
2024-05-01 | Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 | Junsang Yoon et.al. | 2405.00664 | link |
2024-05-01 | HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models | Ningke Li et.al. | 2405.00648 | null |
2024-05-01 | When Quantization Affects Confidence of Large Language Models? | Irina Proskurina et.al. | 2405.00632 | link |
2024-05-01 | "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust | Sunnie S. Y. Kim et.al. | 2405.00623 | null |
2024-05-01 | Causal Evaluation of Language Models | Sirui Chen et.al. | 2405.00622 | link |
2024-05-01 | Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling | Yida Mu et.al. | 2405.00611 | link |
2024-05-01 | Investigating Automatic Scoring and Feedback using Large Language Models | Gloria Ashiya Katuka et.al. | 2405.00602 | null |
2024-05-01 | Are Models Biased on Text without Gender-related Language? | Catarina G Belém et.al. | 2405.00588 | link |
2024-05-01 | The Real, the Better: Aligning Large Language Models with Online Human Behaviors | Guanying Jiang et.al. | 2405.00578 | null |
2024-05-01 | EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model | Deng Li et.al. | 2405.00574 | null |
2024-05-01 | NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance | Huan-Yi Su et.al. | 2405.00566 | null |
2024-05-01 | Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment | Zhili Liu et.al. | 2405.00557 | null |
2024-05-01 | Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs | Nicolas Gorlo et.al. | 2405.00552 | link |
2024-05-01 | ChatBI: Towards Natural Language to Complex Business Intelligence SQL | Jinqing Lian et.al. | 2405.00527 | null |
2024-05-01 | CookingSense: A Culinary Knowledgebase with Multidisciplinary Assertions | Donghee Choi et.al. | 2405.00523 | null |
2024-05-01 | Navigating WebAI: Training Agents to Complete Web Tasks with Large Language Models and Reinforcement Learning | Lucas-Andreï Thil et.al. | 2405.00516 | null |
2024-05-01 | GOLD: Geometry Problem Solver with Natural Language Description | Jiaxin Zhang et.al. | 2405.00494 | link |
2024-05-01 | Is Temperature the Creativity Parameter of Large Language Models? | Max Peeperkorn et.al. | 2405.00492 | null |
2024-05-01 | The Pyramid of Captions | Delong Chen et.al. | 2405.00485 | null |
2024-04-30 | Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Yunhao Ge et.al. | 2404.19752 | null |
2024-04-30 | PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification | Leon Garza et.al. | 2404.19744 | null |
2024-04-30 | Better & Faster Large Language Models via Multi-token Prediction | Fabian Gloeckle et.al. | 2404.19737 | null |
2024-04-30 | A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications | Steph Buongiorno et.al. | 2404.19729 | null |
2024-04-30 | PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games | Steph Buongiorno et.al. | 2404.19721 | null |
2024-04-30 | Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns | Constantinos Patsakis et.al. | 2404.19715 | null |
2024-04-30 | Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models | Scott Sumpter et.al. | 2404.19713 | null |
2024-04-30 | When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | Tiziano Labruna et.al. | 2404.19705 | link |
2024-04-30 | Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | Chun Feng et.al. | 2404.19696 | null |
2024-04-30 | Towards Generalist Robot Learning from Internet Video: A Survey | Robert McCarthy et.al. | 2404.19664 | null |
2024-04-30 | MetaCoCo: A New Few-Shot Classification Benchmark with Spurious Correlation | Min Zhang et.al. | 2404.19644 | null |
2024-04-30 | On Training a Neural Network to Explain Binaries | Alexander Interrante-Grant et.al. | 2404.19631 | null |
2024-04-30 | Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model | Denys Godwin et.al. | 2404.19609 | null |
2024-04-30 | Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning | Xuanli He et.al. | 2404.19597 | null |
2024-04-30 | RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | Yucheng Hu et.al. | 2404.19543 | link |
2024-04-30 | MoST: Multi-modality Scene Tokenization for Motion Prediction | Norman Mu et.al. | 2404.19531 | null |
2024-04-30 | Do Large Language Models Understand Conversational Implicature -- A case study with a chinese sitcom | Shisen Yue et.al. | 2404.19509 | link |
2024-04-30 | More Compute Is What You Need | Zhen Guo et.al. | 2404.19484 | null |
2024-05-01 | Neuro-Vision to Language: Image Reconstruction and Language enabled Interaction via Brain Recordings | Guobin Shen et.al. | 2404.19438 | null |
2024-04-30 | Can Large Language Models put 2 and 2 together? Probing for Entailed Arithmetical Relationships | D. Panas et.al. | 2404.19432 | null |
2024-04-29 | Hallucination of Multimodal Large Language Models: A Survey | Zechen Bai et.al. | 2404.18930 | link |
2024-04-29 | Holmes: Benchmark the Linguistic Competence of Language Models | Andreas Waldis et.al. | 2404.18923 | null |
2024-04-29 | DPO Meets PPO: Reinforced Token Optimization for RLHF | Han Zhong et.al. | 2404.18922 | null |
2024-04-29 | TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation | Junhao Cheng et.al. | 2404.18919 | link |
2024-04-29 | Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting | Fangcheng Liu et.al. | 2404.18911 | link |
2024-04-29 | Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking | Hong Jin Kang et.al. | 2404.18881 | link |
2024-04-29 | More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness | Aaron J. Li et.al. | 2404.18870 | link |
2024-04-29 | Truth-value judgment in language models: belief directions are context sensitive | Stefan F. Schouten et.al. | 2404.18865 | null |
2024-04-29 | Performance-Aligned LLMs for Generating Fast Code | Daniel Nichols et.al. | 2404.18864 | null |
2024-04-29 | A Survey on Vision Mamba: Models, Applications and Challenges | Rui Xu et.al. | 2404.18861 | link |
2024-04-29 | VERT: Verified Equivalent Rust Transpilation with Few-Shot Learning | Aidan Z. H. Yang et.al. | 2404.18852 | null |
2024-04-29 | FeDeRA:Efficient Fine-tuning of Language Models in Federated Learning Leveraging Weight Decomposition | Yuxuan Yan et.al. | 2404.18848 | null |
2024-04-29 | It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments | Petter Mæhlum et.al. | 2404.18832 | null |
2024-04-29 | Benchmarking Benchmark Leakage in Large Language Models | Ruijie Xu et.al. | 2404.18824 | link |
2024-04-29 | AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering | Wenxiang Zhao et.al. | 2404.18816 | null |
2024-04-29 | Unknown Script: Impact of Script on Cross-Lingual Transfer | Wondimagegnhue Tsegaye Tufa et.al. | 2404.18810 | link |
2024-04-29 | Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models | Pat Verga et.al. | 2404.18796 | null |
2024-04-29 | PECC: Problem Extraction and Coding Challenges | Patrick Haller et.al. | 2404.18766 | link |
2024-04-29 | Transitive Vision-Language Prompt Learning for Domain Generalization | Liyuan Wang et.al. | 2404.18758 | null |
2024-04-29 | Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models | Hongyi Zhu et.al. | 2404.18746 | null |
2024-04-26 | Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | Stephen Zhao et.al. | 2404.17546 | link |
2024-04-26 | Exploring the Distinctiveness and Fidelity of the Descriptions Generated by Large Vision-Language Models | Yuhang Huang et.al. | 2404.17534 | null |
2024-04-26 | Large Language Model Agent as a Mechanical Designer | Yayati Jadhav et.al. | 2404.17525 | null |
2024-04-26 | On the Use of Large Language Models to Generate Capability Ontologies | Luis Miguel Vieira da Silva et.al. | 2404.17524 | link |
2024-04-26 | Enhancing Legal Compliance and Regulation Analysis with Large Language Models | Shabnam Hassani et.al. | 2404.17522 | null |
2024-04-26 | A Comprehensive Evaluation on Event Reasoning of Large Language Models | Zhengwei Tao et.al. | 2404.17513 | link |
2024-04-26 | CEval: A Benchmark for Evaluating Counterfactual Text Generation | Van Bach Nguyen et.al. | 2404.17475 | null |
2024-04-26 | Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System | Robin Schmucker et.al. | 2404.17460 | null |
2024-04-26 | "ChatGPT Is Here to Help, Not to Replace Anybody" -- An Evaluation of Students' Opinions On Integrating ChatGPT In CS Courses | Bruno Pereira Cipriano et.al. | 2404.17443 | null |
2024-04-26 | PromptCIR: Blind Compressed Image Restoration with Prompt Learning | Bingchen Li et.al. | 2404.17433 | link |
2024-04-26 | Evaluation of Geographical Distortions in Language Models: A Crucial Step Towards Equitable Representations | Rémy Decoupes et.al. | 2404.17401 | null |
2024-04-26 | UniRGB-IR: A Unified Framework for Visible-Infrared Downstream Tasks via Adapter Tuning | Maoxun Yuan et.al. | 2404.17360 | null |
2024-04-26 | InspectorRAGet: An Introspection Platform for RAG Evaluation | Kshitij Fadnis et.al. | 2404.17347 | link |
2024-04-26 | Introducing cosmosGPT: Monolingual Training for Turkish Language Models | H. Toprak Kesgin et.al. | 2404.17336 | null |
2024-04-26 | A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation | Xin Zhang et.al. | 2404.17335 | null |
2024-04-26 | An Extendable Cloud-Native Alloy Property Explorer | Zhuoyuan Li et.al. | 2404.17330 | link |
2024-04-26 | When to Trust LLMs: Aligning Confidence with Response Quality | Shuchang Tao et.al. | 2404.17287 | null |
2024-04-26 | Reinforcement Retrieval Leveraging Fine-grained Feedback for Fact Checking News Claims with Black-Box LLM | Xuan Zhang et.al. | 2404.17283 | link |
2024-04-26 | Prompting Towards Alleviating Code-Switched Data Scarcity in Under-Resourced Languages with GPT as a Pivot | Michelle Terblanche et.al. | 2404.17216 | null |
2024-04-26 | Low-Rank Knowledge Decomposition for Medical Foundation Models | Yuhang Zhou et.al. | 2404.17184 | null |
2024-04-25 | The Third Monocular Depth Estimation Challenge | Jaime Spencer et.al. | 2404.16831 | null |
2024-04-25 | Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials | Ye Fang et.al. | 2404.16829 | null |
2024-04-25 | V2A-Mark: Versatile Deep Visual-Audio Watermarking for Manipulation Localization and Copyright Protection | Xuanyu Zhang et.al. | 2404.16824 | null |
2024-04-25 | How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Zhe Chen et.al. | 2404.16821 | link |
2024-04-25 | IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages | Harman Singh et.al. | 2404.16816 | link |
2024-04-26 | Make Your LLM Fully Utilize the Context | Shengnan An et.al. | 2404.16811 | link |
2024-04-25 | Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning | Tianhui Zhang et.al. | 2404.16807 | null |
2024-04-25 | AAPL: Adding Attributes to Prompt Learning for Vision-Language Models | Gahyeon Kim et.al. | 2404.16804 | link |
2024-04-25 | Weak-to-Strong Extrapolation Expedites Alignment | Chujie Zheng et.al. | 2404.16792 | link |
2024-04-25 | SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Bohao Li et.al. | 2404.16790 | link |
2024-04-25 | Continual Learning of Large Language Models: A Comprehensive Survey | Haizhou Shi et.al. | 2404.16789 | link |
2024-04-25 | Modeling Selective Feature Attention for Representation-based Siamese Text Matching | Jianxiang Zang et.al. | 2404.16776 | link |
2024-04-25 | REBEL: Reinforcement Learning via Regressing Relative Rewards | Zhaolin Gao et.al. | 2404.16767 | link |
2024-04-25 | Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model | Runzhe Zhan et.al. | 2404.16766 | null |
2024-04-25 | RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis | Xiaoman Zhang et.al. | 2404.16754 | null |
2024-04-25 | Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class | Mazda Moayeri et.al. | 2404.16717 | null |
2024-04-25 | Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding | Mostafa Elhoushi et.al. | 2404.16710 | null |
2024-04-25 | Cooperate or Collapse: Emergence of Sustainability Behaviors in a Society of LLM Agents | Giorgio Piatti et.al. | 2404.16698 | link |
2024-04-25 | Influence of Solution Efficiency and Valence of Instruction on Additive and Subtractive Solution Strategies in Humans and GPT-4 | Lydia Uhler et.al. | 2404.16692 | null |
2024-04-25 | EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning | Hongxia Xie et.al. | 2404.16670 | link |
2024-04-24 | Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data | Aliaksei Vertsel et.al. | 2404.15604 | null |
2024-04-24 | ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Henry Peng Zou et.al. | 2404.15592 | link |
2024-04-24 | MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis | Jiaxin Zhuang et.al. | 2404.15580 | null |
2024-04-24 | Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations? | Hossein Salami et.al. | 2404.15578 | null |
2024-04-24 | Retrieval Head Mechanistically Explains Long-Context Factuality | Wenhao Wu et.al. | 2404.15574 | link |
2024-04-23 | PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models | Shashi Kant Gupta et.al. | 2404.15549 | null |
2024-04-23 | BattleAgent: Multi-modal Dynamic Emulation on Historical Battles to Complement Historical Analysis | Shuhang Lin et.al. | 2404.15532 | link |
2024-04-23 | Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models | Mihir Parmar et.al. | 2404.15522 | link |
2024-04-23 | Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval | Young Kyun Jang et.al. | 2404.15516 | null |
2024-04-23 | ToM-LM: Delegating Theory Of Mind Reasoning to External Symbolic Executors in Large Language Models | Weizhi Tang et.al. | 2404.15515 | null |
2024-04-23 | IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents | Jean-Philippe Corbeil et.al. | 2404.15488 | link |
2024-04-23 | Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance | Het Patel et.al. | 2404.15485 | null |
2024-04-23 | Can Large Language Models Learn the Physics of Metamaterials? An Empirical Study with ChatGPT | Darui Lu et.al. | 2404.15458 | null |
2024-04-23 | XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference | João Monteiro et.al. | 2404.15420 | null |
2024-04-23 | Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs | Davide Caffagni et.al. | 2404.15406 | null |
2024-04-23 | Aligning LLM Agents by Learning Latent Preference from User Edits | Ge Gao et.al. | 2404.15269 | link |
2024-04-23 | XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Yifeng Ding et.al. | 2404.15247 | link |
2024-04-23 | CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies | Weiyan Shi et.al. | 2404.15238 | link |
2024-04-23 | Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models | Aidan Z. H. Yang et.al. | 2404.15236 | null |
2024-04-23 | Re-Thinking Inverse Graphics With Large Language Models | Peter Kulits et.al. | 2404.15228 | null |
2024-04-23 | Does Instruction Tuning Make LLMs More Consistent? | Constanza Fierro et.al. | 2404.15206 | null |
2024-04-23 | Setting up the Data Printer with Improved English to Ukrainian Machine Translation | Yurii Paniv et.al. | 2404.15196 | link |
2024-04-23 | Regressive Side Effects of Training Language Models to Mimic Student Misconceptions | Shashank Sonkar et.al. | 2404.15156 | null |
2024-04-23 | Bias patterns in the application of LLMs for clinical decision support: A comprehensive study | Raphael Poulain et.al. | 2404.15149 | link |
2024-04-23 | Rethinking LLM Memorization through the Lens of Adversarial Compression | Avi Schwarzschild et.al. | 2404.15146 | null |
2024-04-23 | MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning | Sunan He et.al. | 2404.15127 | null |
2024-04-23 | Identifying Fairness Issues in Automatically Generated Testing Content | Kevin Stowe et.al. | 2404.15104 | null |
2024-04-23 | Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | Xun Wu et.al. | 2404.15100 | null |
2024-04-23 | Detection of circular permutations by Protein Language Models | Yue Hu et.al. | 2404.15087 | link |
2024-04-23 | Multi-Head Mixture-of-Experts | Xun Wu et.al. | 2404.15045 | null |
2024-04-23 | TAXI: Evaluating Categorical Knowledge Editing for Language Models | Derek Powell et.al. | 2404.15004 | link |
2024-04-23 | Transformers Can Represent | Anej Svete et.al. | 2404.14994 | null |
2024-04-23 | A Short Review for Ontology Learning from Text: Stride from Shallow Learning, Deep Learning to Large Language Models Trend | Rick Du et.al. | 2404.14991 | null |
2024-04-23 | | Kerstin Kläser et.al. | 2404.14986 | null |
2024-04-23 | Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case | Muhammad Asif Auyb et.al. | 2404.14977 | null |
2024-04-22 | AutoAD III: The Prequel -- Back to the Pixels | Tengda Han et.al. | 2404.14412 | null |
2024-04-22 | SpaceByte: Towards Deleting Tokenization from Large Language Modeling | Kevin Slagle et.al. | 2404.14408 | link |
2024-04-22 | RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios? | Adrian de Wynter et.al. | 2404.14397 | link |
2024-04-22 | SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation | Yuying Ge et.al. | 2404.14396 | link |
2024-04-22 | PARAMANU-GANITA: Language Model with Mathematical Capabilities | Mitodru Niyogi et.al. | 2404.14395 | null |
2024-04-22 | A Multimodal Automated Interpretability Agent | Tamar Rott Shaham et.al. | 2404.14394 | null |
2024-04-22 | A Survey on Self-Evolution of Large Language Models | Zhengwei Tao et.al. | 2404.14387 | link |
2024-04-22 | Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph | Xiaochen Kev Gao et.al. | 2404.14372 | link |
2024-04-23 | Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | Fahim Tajwar et.al. | 2404.14367 | link |
2024-04-22 | Better Synthetic Data by Retrieving and Transforming Existing Datasets | Saumya Gandhi et.al. | 2404.14361 | link |
2024-04-22 | Rethinking Legal Compliance Automation: Opportunities with Large Language Models | Shabnam Hassani et.al. | 2404.14356 | null |
2024-04-22 | Calc-CMU at SemEval-2024 Task 7: Pre-Calc -- Learning to Use the Calculator Improves Numeracy in Language Models | Vishruth Veerendranath et.al. | 2404.14355 | link |
2024-04-22 | Automated Long Answer Grading with RiceChem Dataset | Shashank Sonkar et.al. | 2404.14316 | link |
2024-04-22 | Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels | Jan-Philipp Fränken et.al. | 2404.14313 | link |
2024-04-22 | Explaining Arguments' Strength: Unveiling the Role of Attacks and Supports (Technical Report) | Xiang Yin et.al. | 2404.14304 | link |
2024-04-22 | Marking: Visual Grading with Highlighting Errors and Annotating Missing Bits | Shashank Sonkar et.al. | 2404.14301 | null |
2024-04-22 | Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach | Yao Wan et.al. | 2404.14296 | link |
2024-04-22 | A Survey on Efficient Inference for Large Language Models | Zixuan Zhou et.al. | 2404.14294 | null |
2024-04-22 | LLM-Personalize: Aligning LLM Planners with Human Preferences via Reinforced Self-Training for Housekeeping Robots | Dongge Han et.al. | 2404.14285 | null |
2024-04-22 | Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback | Wenyi Xiao et.al. | 2404.14233 | null |
2024-04-19 | MoVA: Adapting Mixture of Vision Experts to Multimodal Context | Zhuofan Zong et.al. | 2404.13046 | link |
2024-04-19 | Unified Scene Representation and Reconstruction for 3D Large Language Models | Tao Chu et.al. | 2404.13044 | null |
2024-04-19 | Data Alignment for Zero-Shot Concept Generation in Dermatology AI | Soham Gadgil et.al. | 2404.13043 | null |
2024-04-19 | Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs | Biyang Guo et.al. | 2404.13033 | link |
2024-04-19 | When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering | Stephen Choi et.al. | 2404.13028 | null |
2024-04-19 | Stronger Random Baselines for In-Context Learning | Gregory Yauney et.al. | 2404.13020 | link |
2024-04-19 | Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Chuofan Ma et.al. | 2404.13013 | null |
2024-04-19 | Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs | Clemencia Siro et.al. | 2404.12994 | link |
2024-04-19 | FineRec:Exploring Fine-grained Sequential Recommendation | Xiaokun Zhang et.al. | 2404.12975 | link |
2024-04-19 | Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models | Yian Li et.al. | 2404.12966 | null |
2024-04-19 | Towards Reliable Latent Knowledge Estimation in LLMs: In-Context Learning vs. Prompting Based Factual Knowledge Extraction | Qinyuan Wu et.al. | 2404.12957 | null |
2024-04-19 | Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models | Konstantinos Vilouras et.al. | 2404.12920 | null |
2024-04-19 | Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models | Zhenyang Ni et.al. | 2404.12916 | link |
2024-04-19 | Large Language Models for Networking: Workflow, Advances and Challenges | Chang Liu et.al. | 2404.12901 | null |
2024-04-19 | Enabling Natural Zero-Shot Prompting on Encoder Models via Statement-Tuning | Ahmed Elshabrawy et.al. | 2404.12897 | null |
2024-04-19 | Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation | Guanhua Chen et.al. | 2404.12879 | null |
2024-04-19 | LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency | Zhaodonghui Li et.al. | 2404.12872 | link |
2024-04-19 | How Does the Textual Information Affect the Retrieval of Multimodal In-Context Learning? | Yang Luo et.al. | 2404.12866 | null |
2024-04-19 | Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation | Yilong Chen et.al. | 2404.12861 | null |
2024-04-19 | TartuNLP @ SIGTYP 2024 Shared Task: Adapting XLM-RoBERTa for Ancient and Historical Languages | Aleksei Dorkin et.al. | 2404.12845 | null |
2024-04-18 | BLINK: Multimodal Large Language Models Can See but Not Perceive | Xingyu Fu et.al. | 2404.12390 | null |
2024-04-18 | Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | Aitor Ormazabal et.al. | 2404.12387 | null |
2024-04-18 | MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Xiaotang Gai et.al. | 2404.12372 | null |
2024-04-18 | When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | Asaf Yehudai et.al. | 2404.12365 | link |
2024-04-18 | From | Rafael Rafailov et.al. | 2404.12358 | null |
2024-04-18 | Towards a Foundation Model for Partial Differential Equation: Multi-Operator Learning and Extrapolation | Jingmin Sun et.al. | 2404.12355 | link |
2024-04-18 | V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning | Hang Hua et.al. | 2404.12353 | null |
2024-04-18 | Evaluating AI for Law: Bridging the Gap with Open-Source Solutions | Rohan Bhambhoria et.al. | 2404.12349 | null |
2024-04-18 | Large Language Models in Targeted Sentiment Analysis | Nicolay Rusnachenko et.al. | 2404.12342 | link |
2024-04-18 | Normative Requirements Operationalization with Large Language Models | Nick Feng et.al. | 2404.12335 | null |
2024-04-18 | Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment | Zhaofeng Wu et.al. | 2404.12318 | null |
2024-04-18 | Large Language Models for Synthetic Participatory Planning of Shared Automated Electric Mobility Systems | Jiangbo Yu et.al. | 2404.12317 | null |
2024-04-18 | Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Yusuke Sakai et.al. | 2404.12299 | null |
2024-04-18 | Augmenting emotion features in irony detection with Large language modeling | Yucheng Lin et.al. | 2404.12291 | null |
2024-04-18 | Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery | Yona Falinie A. Gaus et.al. | 2404.12285 | null |
2024-04-18 | Enhancing Embedding Performance through Large Language Model-based Text Enrichment and Rewriting | Nicholas Harris et.al. | 2404.12283 | null |
2024-04-18 | Advancing the Robustness of Large Language Models through Self-Denoised Smoothing | Jiabao Ji et.al. | 2404.12274 | link |
2024-04-18 | FedEval-LLM: Federated Evaluation of Large Language Models on Downstream Tasks with Collective Wisdom | Yuanqin He et.al. | 2404.12273 | null |
2024-04-18 | Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences | Shreya Shankar et.al. | 2404.12272 | null |
2024-04-18 | Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM | Michelle S. Lam et.al. | 2404.12259 | link |
2024-04-17 | Private federated discovery of out-of-vocabulary words for Gboard | Ziteng Sun et.al. | 2404.11607 | null |
2024-04-17 | VG4D: Vision-Language Model Goes 4D Video Recognition | Zhichao Deng et.al. | 2404.11605 | link |
2024-04-17 | A Deep Dive into Large Language Models for Automated Bug Localization and Repair | Soneya Binta Hossain et.al. | 2404.11595 | null |
2024-04-17 | Prompt Optimizer of Text-to-Image Diffusion Models for Abstract Concept Understanding | Zezhong Fan et.al. | 2404.11589 | null |
2024-04-17 | LLMTune: Accelerate Database Knob Tuning with Large Language Models | Xinmei Huang et.al. | 2404.11581 | link |
2024-04-17 | On the Scalability of GNNs for Molecular Graphs | Maciej Sypetkowski et.al. | 2404.11568 | null |
2024-04-17 | MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation | Kuan-Chieh et.al. | 2404.11565 | null |
2024-04-17 | Quantifying Multilingual Performance of Large Language Models Across Languages | Zihao Li et.al. | 2404.11553 | null |
2024-04-17 | Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis | Soyoung Yang et.al. | 2404.11539 | null |
2024-04-17 | FedPFT: Federated Proxy Fine-Tuning of Foundation Models | Zhaopeng Peng et.al. | 2404.11536 | link |
2024-04-17 | Select and Reorder: A Novel Approach for Neural Sign Language Production | Harry Walsh et.al. | 2404.11532 | null |
2024-04-17 | Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization | Costas Mavromatis et.al. | 2404.11531 | link |
2024-04-17 | Embedding Privacy in Computational Social Science and Artificial Intelligence Research | Keenan Jones et.al. | 2404.11515 | null |
2024-04-17 | Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models | Yushuo Chen et.al. | 2404.11502 | link |
2024-04-17 | Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Yue Zhou et.al. | 2404.11500 | link |
2024-04-18 | Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent | Wei Chen et.al. | 2404.11459 | null |
2024-04-17 | Unifying Bias and Unfairness in Information Retrieval: A Survey of Challenges and Opportunities with Large Language Models | Sunhao Dai et.al. | 2404.11457 | link |
2024-04-17 | AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts | Meng Jiang et.al. | 2404.11449 | null |
2024-04-17 | Open-Ended Wargames with Large Language Models | Daniel P. Hogan et.al. | 2404.11446 | link |
2024-04-17 | DUPE: Detection Undermining via Prompt Engineering for Deepfake Text | James Weichert et.al. | 2404.11408 | null |
2024-04-16 | Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback | Qiwei Di et.al. | 2404.10776 | null |
2024-04-16 | COMBO: Compositional World Models for Embodied Multi-Agent Cooperation | Hongxin Zhang et.al. | 2404.10775 | null |
2024-04-16 | Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Yu-Yang Li et.al. | 2404.10757 | link |
2024-04-16 | Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | Shusheng Xu et.al. | 2404.10719 | null |
2024-04-16 | Dual Modalities of Text: Visual and Textual Generative Pre-training | Yekun Chai et.al. | 2404.10710 | null |
2024-04-16 | Question Difficulty Ranking for Multiple-Choice Reading Comprehension | Vatsal Raina et.al. | 2404.10704 | null |
2024-04-16 | An empirical study on code review activity prediction in practice | Doriane Olewicki et.al. | 2404.10703 | null |
2024-04-16 | Automating REST API Postman Test Cases Using LLM | S Deepika Sri et.al. | 2404.10678 | null |
2024-04-16 | Self-playing Adversarial Language Game Enhances LLM Reasoning | Pengyu Cheng et.al. | 2404.10642 | link |
2024-04-16 | HLAT: High-quality Large Language Model Pre-trained on AWS Trainium | Haozheng Fan et.al. | 2404.10630 | null |
2024-04-16 | Private Attribute Inference from Images with Vision-Language Models | Batuhan Tömekçe et.al. | 2404.10618 | null |
2024-04-16 | Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases | Yanze Li et.al. | 2404.10595 | null |
2024-04-16 | Construction of Domain-specified Japanese Large Language Model for Finance through Continual Pre-training | Masanori Hirano et.al. | 2404.10555 | null |
2024-04-16 | Unveiling the Misuse Potential of Base Large Language Models via In-Context Learning | Xiao Wang et.al. | 2404.10552 | null |
2024-04-16 | Capturing the Macroscopic Behaviour of Molecular Dynamics with Membership Functions | Alexander Sikorski et.al. | 2404.10523 | link |
2024-04-16 | CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity | Moshe Berchansky et.al. | 2404.10513 | null |
2024-04-16 | White Men Lead, Black Women Help: Uncovering Gender, Racial, and Intersectional Bias in Language Agency | Yixin Wan et.al. | 2404.10508 | null |
2024-04-16 | Self-Supervised Visual Preference Alignment | Ke Zhu et.al. | 2404.10501 | link |
2024-04-16 | When Emotional Stimuli meet Prompt Designing: An Auto-Prompt Graphical Paradigm | Chenggian Ma et.al. | 2404.10500 | null |
2024-04-16 | Spiral of Silences: How is Large Language Model Killing Information Retrieval? -- A Case Study on Open Domain Question Answering | Xiaoyang Chen et.al. | 2404.10496 | link |
2024-04-15 | KG-CTG: Citation Generation through Knowledge Graph-guided Large Language Models | Avinash Anand et.al. | 2404.09763 | null |
2024-04-15 | Resilience of Large Language Models for Noisy Instructions | Bin Wang et.al. | 2404.09754 | null |
2024-04-15 | Personalized Collaborative Fine-Tuning for On-Device Large Language Models | Nicolas Wagner et.al. | 2404.09753 | link |
2024-04-15 | AMPCliff: quantitative definition and benchmarking of activity cliffs in antimicrobial peptides | Kewei Li et.al. | 2404.09738 | link |
2024-04-15 | Quantization of Large Language Models with an Overdetermined Basis | Daniil Merkulov et.al. | 2404.09737 | null |
2024-04-15 | Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models | Ziwei Luo et.al. | 2404.09732 | link |
2024-04-15 | Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model | Hyunsoo Cho et.al. | 2404.09717 | null |
2024-04-15 | Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction | David Sobrín-Hidalgo et.al. | 2404.09705 | null |
2024-04-15 | Generative AI for Game Theory-based Mobile Networking | Long He et.al. | 2404.09699 | null |
2024-04-15 | Are Large Language Models Reliable Argument Quality Annotators? | Nailia Mirzakhmedova et.al. | 2404.09696 | link |
2024-04-15 | LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models | Guangyan Li et.al. | 2404.09695 | null |
2024-04-15 | Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation | Juhwan Choi et.al. | 2404.09682 | null |
2024-04-15 | Learn Your Reference Model for Real Good Alignment | Alexey Gorbatovski et.al. | 2404.09656 | null |
2024-04-15 | Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection | Jiaqi Zhu et.al. | 2404.09654 | null |
2024-04-15 | Bridging Vision and Language Spaces with Assignment Prediction | Jungin Park et.al. | 2404.09632 | link |
2024-04-15 | AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception | Yipo Huang et.al. | 2404.09624 | link |
2024-04-15 | UNIAA: A Unified Multi-modal Image Aesthetic Assessment Baseline and Benchmark | Zhaokun Zhou et.al. | 2404.09619 | null |
2024-04-15 | A Self-feedback Knowledge Elicitation Approach for Chemical Reaction Predictions | Pengfei Liu et.al. | 2404.09606 | link |
2024-04-15 | Improving Recall of Large Language Models: A Model Collaboration Approach for Relational Triple Extraction | Zepeng Ding et.al. | 2404.09593 | null |
2024-04-15 | Modelling Language | Jumbly Grindrod et.al. | 2404.09579 | null |
2024-04-15 | Transformers, Contextualism, and Polysemy | Jumbly Grindrod et.al. | 2404.09577 | null |
2024-04-15 | Large language models and linguistic intentionality | Jumbly Grindrod et.al. | 2404.09576 | null |
2024-04-12 | Probing the 3D Awareness of Visual Foundation Models | Mohamed El Banani et.al. | 2404.08636 | link |
2024-04-12 | Pre-training Small Base LMs with Fewer Tokens | Sunny Sanyal et.al. | 2404.08634 | link |
2024-04-12 | FCert: Certifiably Robust Few-Shot Classification in the Era of Foundation Models | Yanting Wang et.al. | 2404.08631 | link |
2024-04-12 | Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation | Yanhao Zheng et.al. | 2404.08603 | link |
2024-04-12 | Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Övgü Özdemir et.al. | 2404.08589 | link |
2024-04-12 | Pathological Primitive Segmentation Based on Visual Foundation Model with Zero-Shot Mask Generation | Abu Bakor Hayat Arnob et.al. | 2404.08584 | link |
2024-04-12 | FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation | Riza Velioglu et.al. | 2404.08582 | link |
2024-04-12 | Lossy Image Compression with Foundation Diffusion Models | Lucas Relic et.al. | 2404.08580 | null |
2024-04-12 | Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation | Hanlin Tian et.al. | 2404.08570 | link |
2024-04-12 | RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs | Shreyas Chaudhari et.al. | 2404.08555 | null |
2024-04-12 | Memory Traces: Are Transformers Tulving Machines? | Jean-Marie Chauvet et.al. | 2404.08543 | null |
2024-04-12 | Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward | Xuan Xie et.al. | 2404.08517 | null |
2024-04-12 | ChatGPT and general-purpose AI count fruits in pictures surprisingly well | Konlavach Mengsuwan et.al. | 2404.08515 | null |
2024-04-12 | Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | Haoran Qiu et.al. | 2404.08509 | link |
2024-04-12 | LaSagnA: Language-based Segmentation Assistant for Complex Queries | Cong Wei et.al. | 2404.08506 | link |
2024-04-12 | Strategic Interactions between Large Language Models-based Agents in Beauty Contests | Siting Lu et.al. | 2404.08492 | null |
2024-04-12 | Mitigating Language-Level Performance Disparity in mPLMs via Teacher Language Selection and Cross-lingual Self-Distillation | Haozhe Zhao et.al. | 2404.08491 | link |
2024-04-12 | Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian | Stefano De Paoli et.al. | 2404.08488 | null |
2024-04-12 | Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task | Hassan Ali et.al. | 2404.08424 | null |
2024-04-12 | Adapting the Segment Anything Model During Usage in Novel Situations | Robin Schön et.al. | 2404.08421 | null |
2024-04-11 | OpenBias: Open-set Bias Detection in Text-to-Image Generative Models | Moreno D'Incà et.al. | 2404.07990 | link |
2024-04-11 | Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding | Yiwen Tang et.al. | 2404.07989 | link |
2024-04-11 | Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Representation Learning | Simon Schrodi et.al. | 2404.07983 | null |
2024-04-11 | Language Imbalance Can Boost Cross-lingual Generalisation | Anton Schäfer et.al. | 2404.07982 | link |
2024-04-11 | Manipulating Large Language Models to Increase Product Visibility | Aounon Kumar et.al. | 2404.07981 | link |
2024-04-11 | LLoCO: Learning Long Contexts Offline | Sijun Tan et.al. | 2404.07979 | link |
2024-04-11 | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Haotian Zhang et.al. | 2404.07973 | null |
2024-04-11 | Rho-1: Not All Tokens Are What You Need | Zhenghao Lin et.al. | 2404.07965 | link |
2024-04-11 | On Unified Prompt Tuning for Request Quality Assurance in Public Code Review | Xinyu Chen et.al. | 2404.07942 | null |
2024-04-11 | Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation | Jinkyung Park et.al. | 2404.07926 | null |
2024-04-11 | LaVy: Vietnamese Multimodal Large Language Model | Chi Tran et.al. | 2404.07922 | link |
2024-04-11 | AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs | Zeyi Liao et.al. | 2404.07921 | link |
2024-04-11 | DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation | Anna C. Doris et.al. | 2404.07917 | link |
2024-04-11 | HGRN2: Gated Linear RNNs with State Expansion | Zhen Qin et.al. | 2404.07904 | link |
2024-04-11 | High-Dimension Human Value Representation in Large Language Models | Samuel Cahyawijaya et.al. | 2404.07900 | link |
2024-04-11 | Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations | Dayeon Ki et.al. | 2404.07851 | link |
2024-04-11 | On Training Data Influence of GPT Models | Qingyi Liu et.al. | 2404.07840 | link |
2024-04-11 | RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | Aleksandar Botev et.al. | 2404.07839 | link |
2024-04-11 | Streamlined Photoacoustic Image Processing with Foundation Models: A Training-Free Solution | Handi Deng et.al. | 2404.07833 | null |
2024-04-11 | Heron-Bench: A Benchmark for Evaluating Vision Language Models in Japanese | Yuichi Inoue et.al. | 2404.07824 | link |
2024-04-10 | BRAVE: Broadening the visual encoding of vision-language models | Oğuzhan Fatih Kar et.al. | 2404.07204 | null |
2024-04-10 | UMBRAE: Unified Multimodal Decoding of Brain Signals | Weihao Xia et.al. | 2404.07202 | link |
2024-04-10 | Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic | Sachin Goyal et.al. | 2404.07177 | link |
2024-04-10 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Tsendsuren Munkhdalai et.al. | 2404.07143 | null |
2024-04-10 | Open reaction-diffusion systems: bridging probabilistic theory across scales | Mauricio J. del Razo et.al. | 2404.07119 | null |
2024-04-10 | Continuous Language Model Interpolation for Dynamic and Controllable Text Generation | Sara Kangaslahti et.al. | 2404.07117 | link |
2024-04-11 | From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications | Yongqiang Ma et.al. | 2404.07108 | null |
2024-04-10 | Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | Bowen Jin et.al. | 2404.07103 | link |
2024-04-10 | Dynamic Generation of Personalities with Large Language Models | Jianzhi Liu et.al. | 2404.07084 | link |
2024-04-10 | VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning | Alexandros Xenos et.al. | 2404.07078 | link |
2024-04-10 | Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? | Mingyu Jin et.al. | 2404.07066 | link |
2024-04-10 | Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study | Alessandro Stolfo et.al. | 2404.07060 | null |
2024-04-10 | Meta4XNLI: A Crosslingual Parallel Corpus for Metaphor Detection and Interpretation | Elisa Sanchez-Bayona et.al. | 2404.07053 | link |
2024-04-10 | ORacle: Large Vision-Language Models for Knowledge-Guided Holistic OR Domain Modeling | Ege Özsoy et.al. | 2404.07031 | null |
2024-04-10 | Improving Language Model Reasoning with Self-motivated Learning | Yunlong Feng et.al. | 2404.07017 | null |
2024-04-10 | A Mathematical Theory for Learning Semantic Languages by Abstract Learners | Kuo-Yu Liao et.al. | 2404.07009 | null |
2024-04-10 | WordDecipher: Enhancing Digital Workspace Communication with Explainable AI for Non-native English Speakers | Yuexi Chen et.al. | 2404.07005 | null |
2024-04-10 | LM Transparency Tool: Interactive Tool for Analyzing Transformer Language Models | Igor Tufanov et.al. | 2404.07004 | null |
2024-04-10 | Event Grounded Criminal Court View Generation with Cooperative (Large) Language Models | Linan Yue et.al. | 2404.07001 | link |
2024-04-10 | Advancing Real-time Pandemic Forecasting Using Large Language Models: A COVID-19 Case Study | Hongru Du et.al. | 2404.06962 | link |
2024-04-09 | InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD | Xiaoyi Dong et.al. | 2404.06512 | link |
2024-04-09 | Can Feedback Enhance Semantic Grounding in Large Vision-Language Models? | Yuan-Hong Liao et.al. | 2404.06510 | null |
2024-04-09 | On the Effect of (Near) Duplicate Subwords in Language Modelling | Anton Schäfer et.al. | 2404.06508 | link |
2024-04-09 | Pitfalls of Conversational LLMs on News Debiasing | Ipek Baris Schlicht et.al. | 2404.06488 | null |
2024-04-10 | Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks | Chonghua Wang et.al. | 2404.06480 | link |
2024-04-10 | Text-Based Reasoning About Vector Graphics | Zhenhailong Wang et.al. | 2404.06479 | null |
2024-04-09 | Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models | Zihan Fang et.al. | 2404.06448 | null |
2024-04-09 | Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems | Kunal Garg et.al. | 2404.06413 | null |
2024-04-09 | AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Luca Gioacchini et.al. | 2404.06411 | link |
2024-04-09 | Take a Look at it! Rethinking How to Evaluate Language Model Jailbreak | Hongyu Cai et.al. | 2404.06407 | link |
2024-04-09 | Apprentices to Research Assistants: Advancing Research with Large Language Models | M. Namvarpour et.al. | 2404.06404 | null |
2024-04-09 | MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | Shengding Hu et.al. | 2404.06395 | link |
2024-04-09 | MuPT: A Generative Symbolic Music Pretrained Transformer | Xingwei Qu et.al. | 2404.06393 | null |
2024-04-09 | Event Extraction in Basque: Typologically motivated Cross-Lingual Transfer-Learning Analysis | Mikel Zubillaga et.al. | 2404.06392 | null |
2024-04-09 | Latent Distance Guided Alignment Training for Large Language Models | Haotian Luo et.al. | 2404.06390 | null |
2024-04-09 | Model Generation from Requirements with LLMs: an Exploratory Study | Alessio Ferrari et.al. | 2404.06371 | null |
2024-04-09 | Enhancing Decision Analysis with a Large Language Model: pyDecision a Comprehensive Library of MCDA Methods in Python | Valdecy Pereira et.al. | 2404.06370 | link |
2024-04-09 | VISION2UI: A Real-World Dataset with Layout for Code Generation from UI Designs | Yi Gui et.al. | 2404.06369 | null |
2024-04-09 | ClinLinker: Medical Entity Linking of Clinical Concept Mentions in Spanish | Fernando Gallego et.al. | 2404.06367 | null |
2024-04-09 | Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation | Sidra Aleem et.al. | 2404.06362 | link |
2024-04-08 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Bo He et.al. | 2404.05726 | link |
2024-04-08 | Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | Keen You et.al. | 2404.05719 | null |
2024-04-08 | Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding | Ahmad Idrissi-Yaghir et.al. | 2404.05694 | null |
2024-04-08 | Evaluating Mathematical Reasoning Beyond Accuracy | Shijie Xia et.al. | 2404.05692 | link |
2024-04-08 | Retrieval-Augmented Open-Vocabulary Object Detection | Jooyeon Kim et.al. | 2404.05687 | link |
2024-04-08 | MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation | Kunpeng Song et.al. | 2404.05674 | link |
2024-04-08 | CoReS: Orchestrating the Dance of Reasoning and Segmentation | Xiaoyi Bao et.al. | 2404.05673 | null |
2024-04-08 | Fighting crime with Transformers: Empirical analysis of address parsing methods in payment data | Haitham Hammami et.al. | 2404.05632 | link |
2024-04-08 | LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking | Faren Yan et.al. | 2404.05624 | null |
2024-04-08 | MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning | Matteo Farina et.al. | 2404.05621 | link |
2024-04-08 | SpeechAlign: Aligning Speech Generation to Human Preferences | Dong Zhang et.al. | 2404.05600 | link |
2024-04-08 | MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | Iñigo Alonso et.al. | 2404.05590 | null |
2024-04-08 | Enhancing Software Related Information Extraction with Generative Language Models through Single-Choice Question Answering | Wolfgang Otto et.al. | 2404.05587 | null |
2024-04-08 | Towards More General Video-based Deepfake Detection through Facial Feature Guided Adaptation for Foundation Model | Yue-Hua Han et.al. | 2404.05583 | null |
2024-04-08 | 360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System | Shen Gao et.al. | 2404.05569 | null |
2024-04-08 | Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models | Bowen Pan et.al. | 2404.05567 | null |
2024-04-08 | Chinese Sequence Labeling with Semi-Supervised Boundary-Aware Language Model Pre-training | Longhui Zhang et.al. | 2404.05560 | link |
2024-04-08 | Evaluating Interventional Reasoning Capabilities of Large Language Models | Tejas Kasetty et.al. | 2404.05545 | null |
2024-04-08 | OPSD: an Offensive Persian Social media Dataset and its baseline evaluations | Mehran Safayani et.al. | 2404.05540 | null |
2024-04-08 | Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data | Tim Baumgärtner et.al. | 2404.05530 | null |
2024-04-05 | Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) | Michael Saxon et.al. | 2404.04251 | link |
2024-04-05 | Physical Property Understanding from Language-Embedded Feature Fields | Albert J. Zhai et.al. | 2404.04242 | null |
2024-04-05 | Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | Harsh Kohli et.al. | 2404.04237 | null |
2024-04-05 | player2vec: A Language Modeling Approach to Understand Player Behavior in Games | Tianze Wang et.al. | 2404.04234 | null |
2024-04-05 | Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation | Ji-Jia Wu et.al. | 2404.04231 | link |
2024-04-05 | Unlocking Parameter-Efficient Fine-Tuning for Low-Resource Language Translation | Tong Su et.al. | 2404.04212 | null |
2024-04-05 | Social Skill Training with Large Language Models | Diyi Yang et.al. | 2404.04204 | null |
2024-04-05 | Do Sentence Transformers Learn Quasi-Geospatial Concepts from General Text? | Ilya Ilyankou et.al. | 2404.04169 | null |
2024-04-05 | Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | Xinrun Du et.al. | 2404.04167 | null |
2024-04-05 | Dwell in the Beginning: How Language Models Embed Long Documents for Dense Retrieval | João Coelho et.al. | 2404.04163 | null |
2024-04-05 | BEAR: A Unified Framework for Evaluating Relational Knowledge in Causal and Masked Language Models | Jacek Wiland et.al. | 2404.04113 | link |
2024-04-05 | Large language models as oracles for instantiating ontologies with domain-specific knowledge | Giovanni Ciatto et.al. | 2404.04108 | link |
2024-04-05 | Robust Preference Optimization with Provable Noise Tolerance for LLMs | Xize Liang et.al. | 2404.04102 | null |
2024-04-05 | Label Propagation for Zero-shot Classification with Vision-Language Models | Vladan Stojnić et.al. | 2404.04072 | link |
2024-04-05 | Assessing the quality of information extraction | Filip Seitl et.al. | 2404.04068 | null |
2024-04-05 | CLUE: A Clinical Language Understanding Evaluation for LLMs | Amin Dada et.al. | 2404.04067 | link |
2024-04-05 | VoicePilot: Harnessing LLMs as Speech Interfaces for Physically Assistive Robots | Akhil Padmanabha et.al. | 2404.04066 | null |
2024-04-05 | A Comparison of Methods for Evaluating Generative IR | Negar Arabzadeh et.al. | 2404.04044 | link |
2024-04-05 | Teaching Llama a New Language Through Cross-Lingual Knowledge Transfer | Hele-Andra Kuulmets et.al. | 2404.04042 | link |
2024-04-05 | Willkommens-Merkel, Chaos-Johnson, and Tore-Klose: Modeling the Evaluative Meaning of German Personal Name Compounds | Annerose Eichel et.al. | 2404.04031 | link |
2024-04-04 | OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views | Francis Engelmann et.al. | 2404.03650 | null |
2024-04-04 | AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | Hanyu Lai et.al. | 2404.03648 | link |
2024-04-04 | Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra | Darioush Kevian et.al. | 2404.03647 | null |
2024-04-04 | Locating and Editing Factual Associations in Mamba | Arnab Sen Sharma et.al. | 2404.03646 | link |
2024-04-04 | Training LLMs over Neurally Compressed Text | Brian Lester et.al. | 2404.03626 | null |
2024-04-04 | Standardizing Knowledge Engineering Practices with a Reference Architecture | Bradley P. Allen et.al. | 2404.03624 | null |
2024-04-04 | Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph | Marco Bronzini et.al. | 2404.03623 | null |
2024-04-04 | Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | Wenshan Wu et.al. | 2404.03622 | null |
2024-04-04 | DeViDe: Faceted medical knowledge for improved medical vision-language pre-training | Haozhe Luo et.al. | 2404.03618 | null |
2024-04-04 | Sailor: Open Language Models for South-East Asia | Longxu Dou et.al. | 2404.03608 | link |
2024-04-04 | Mitigating the Impact of Outlier Channels for Language Model Quantization with Activation Regularization | Aniruddha Nrusimha et.al. | 2404.03605 | link |
2024-04-04 | Evaluating LLMs at Detecting Errors in LLM Responses | Ryo Kamoi et.al. | 2404.03602 | link |
2024-04-04 | Intent Detection and Entity Extraction from BioMedical Literature | Ankan Mullick et.al. | 2404.03598 | link |
2024-04-04 | ReFT: Representation Finetuning for Language Models | Zhengxuan Wu et.al. | 2404.03592 | link |
2024-04-04 | SemGrasp: Semantic Grasp Generation via Language Aligned Discretization | Kailin Li et.al. | 2404.03590 | null |
2024-04-04 | Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models | Yantao Liu et.al. | 2404.03577 | link |
2024-04-04 | Embodied AI with Two Arms: Zero-shot Learning, Safety and Modularity | Jake Varley et.al. | 2404.03570 | null |
2024-04-04 | Personalized LLM Response Generation with Parameterized Memory Injection | Kai Zhang et.al. | 2404.03565 | null |
2024-04-04 | Select and Summarize: Scene Saliency for Movie Script Summarization | Rohit Saxena et.al. | 2404.03561 | link |
2024-04-04 | How does Multi-Task Training Affect Transformer In-Context Capabilities? Investigations with Function Classes | Harmon Bhasin et.al. | 2404.03558 | link |
2024-04-03 | ALOHa: A New Measure for Hallucination in Captioning Models | Suzanne Petryk et.al. | 2404.02904 | null |
2024-04-03 | MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment | Duygu Ceylan et.al. | 2404.02899 | null |
2024-04-03 | ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Yifan Xu et.al. | 2404.02893 | link |
2024-04-03 | MODNO: Multi Operator Learning With Distributed Neural Operators | Zecheng Zhang et.al. | 2404.02892 | null |
2024-04-03 | Linear Attention Sequence Parallelism | Weigao Sun et.al. | 2404.02882 | link |
2024-04-03 | Integrating Explanations in Learning LTL Specifications from Demonstrations | Ashutosh Gupta et.al. | 2404.02872 | null |
2024-04-03 | Toward Inference-optimal Mixture-of-Expert Large Language Models | Longfei Yun et.al. | 2404.02852 | null |
2024-04-03 | I-Design: Personalized LLM Interior Designer | Ata Çelen et.al. | 2404.02838 | null |
2024-04-03 | Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models | Wanyun Cui et.al. | 2404.02837 | null |
2024-04-03 | Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison | Maxime Bouthors et.al. | 2404.02835 | null |
2024-04-03 | Empowering Biomedical Discovery with AI Agents | Shanghua Gao et.al. | 2404.02831 | null |
2024-04-03 | BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models | Qijun Luo et.al. | 2404.02827 | link |
2024-04-03 | Conifer: Improving Complex Constrained Instruction-Following Ability of Large Language Models | Haoran Sun et.al. | 2404.02823 | link |
2024-04-03 | A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches | Zhigen Zhao et.al. | 2404.02817 | null |
2024-04-03 | The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers | Hussein Mozannar et.al. | 2404.02806 | link |
2024-04-03 | Efficient Multi-Vector Dense Retrieval Using Bit Vectors | Franco Maria Nardini et.al. | 2404.02805 | link |
2024-04-03 | AI and personalized learning: bridging the gap with modern educational goals | Kristjan-Julius Laak et.al. | 2404.02798 | null |
2024-04-03 | CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech | Jaehyeon Kim et.al. | 2404.02781 | null |
2024-04-03 | FPT: Feature Prompt Tuning for Few-shot Readability Assessment | Ziyang Wang et.al. | 2404.02772 | link |
2024-04-03 | DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | Hao Wu et.al. | 2404.02755 | null |
2024-04-02 | Segment Any 3D Object with Language | Seungjun Lee et.al. | 2404.02157 | null |
2024-04-02 | Iterated Learning Improves Compositionality in Large Vision-Language Models | Chenhao Zheng et.al. | 2404.02145 | null |
2024-04-02 | Topic-based Watermarks for LLM-Generated Text | Alexander Nemecek et.al. | 2404.02138 | null |
2024-04-02 | ViTamin: Designing Scalable Vision Models in the Vision-Language Era | Jieneng Chen et.al. | 2404.02132 | link |
2024-04-02 | FLawN-T5: An Empirical Examination of Effective Instruction-Tuning Data Mixtures for Legal Reasoning | Joel Niklaus et.al. | 2404.02127 | link |
2024-04-02 | Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Wanyong Feng et.al. | 2404.02124 | link |
2024-04-02 | GINopic: Topic Modeling with Graph Isomorphism Network | Suman Adhya et.al. | 2404.02115 | link |
2024-04-02 | CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems | Sara Rosenthal et.al. | 2404.02103 | link |
2024-04-02 | Advancing LLM Reasoning Generalists with Preference Trees | Lifan Yuan et.al. | 2404.02078 | link |
2024-04-02 | Red-Teaming Segment Anything Model | Krzysztof Jankowski et.al. | 2404.02067 | link |
2024-04-02 | Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | Alberto Blanco-Justicia et.al. | 2404.02062 | null |
2024-04-02 | Long-context LLMs Struggle with Long In-context Learning | Tianle Li et.al. | 2404.02060 | link |
2024-04-02 | IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT | Junchen Fu et.al. | 2404.02059 | link |
2024-04-02 | Deconstructing In-Context Learning: Understanding Prompts via Corruption | Namrata Shivagunde et.al. | 2404.02054 | link |
2024-04-02 | A Survey on Large Language Model-Based Game Agents | Sihao Hu et.al. | 2404.02039 | link |
2024-04-02 | MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages | Daryna Dementieva et.al. | 2404.02037 | null |
2024-04-02 | Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts | Zhuo Chen et.al. | 2404.02022 | link |
2024-04-02 | Large Language Models for Orchestrating Bimanual Robots | Kun Chu et.al. | 2404.02018 | null |
2024-04-02 | MuxServe: Flexible Multiplexing for Efficient Multiple LLM Serving | Jiangfei Duan et.al. | 2404.02015 | link |
2024-04-02 | Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models | Stephan Linzbach et.al. | 2404.01992 | null |
2024-03-29 | Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | Atsuyuki Miyai et.al. | 2403.20331 | link |
2024-03-29 | Are We on the Right Way for Evaluating Large Vision-Language Models? | Lin Chen et.al. | 2403.20330 | link |
2024-03-29 | ReALM: Reference Resolution As Language Modeling | Joel Ruben Antony Moniz et.al. | 2403.20329 | null |
2024-03-29 | Gecko: Versatile Text Embeddings Distilled from Large Language Models | Jinhyuk Lee et.al. | 2403.20327 | null |
2024-03-29 | Convolutional Prompting meets Language Models for Continual Learning | Anurag Roy et.al. | 2403.20317 | null |
2024-03-29 | Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations | Jaisidh Singh et.al. | 2403.20312 | link |
2024-03-29 | Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference | Jovan Stojkovic et.al. | 2403.20306 | null |
2024-03-29 | Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain | Burcu Sayin et.al. | 2403.20288 | link |
2024-03-29 | LUQ: Long-text Uncertainty Quantification for LLMs | Caiqi Zhang et.al. | 2403.20279 | null |
2024-04-01 | Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want | Weifeng Lin et.al. | 2403.20271 | link |
2024-03-29 | Latxa: An Open Language Model and Evaluation Suite for Basque | Julen Etxaniz et.al. | 2403.20266 | link |
2024-03-29 | ELITR-Bench: A Meeting Assistant Benchmark for Long-Context Language Models | Thibaut Thonet et.al. | 2403.20262 | null |
2024-03-29 | MedCLIP-SAM: Bridging Text and Image Towards Universal Medical Image Segmentation | Taha Koleilat et.al. | 2403.20253 | link |
2024-03-29 | Using LLMs to Model the Beliefs and Preferences of Targeted Populations | Keiichi Namikoshi et.al. | 2403.20252 | null |
2024-03-29 | Long-Tailed Anomaly Detection with Learnable Class Names | Chih-Hui Ho et.al. | 2403.20236 | null |
2024-03-29 | H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model | Chao Pang et.al. | 2403.20213 | link |
2024-03-29 | Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science | Yazheng Yang et.al. | 2403.20208 | null |
2024-03-29 | The Future of Combating Rumors? Retrieval, Discrimination, and Generation | Junhao Xu et.al. | 2403.20204 | null |
2024-03-29 | ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models | Shuo Liu et.al. | 2403.20194 | null |
2024-03-29 | HARMamba: Efficient Wearable Sensor Human Activity Recognition Based on Bidirectional Selective SSM | Shuangjian Li et.al. | 2403.20183 | null |
2024-03-28 | RSMamba: Remote Sensing Image Classification with State Space Model | Keyan Chen et.al. | 2403.19654 | link |
2024-03-28 | InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction | Sirui Xu et.al. | 2403.19652 | null |
2024-03-28 | MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Kai Zhang et.al. | 2403.19651 | null |
2024-03-28 | Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models | Samuel Marks et.al. | 2403.19647 | link |
2024-03-28 | Change-Agent: Towards Interactive Comprehensive Change Interpretation and Analysis from Change Detection and Change Captioning | Chenyang Liu et.al. | 2403.19646 | link |
2024-03-28 | Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models | Yucheng Shi et.al. | 2403.19631 | null |
2024-03-28 | RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents | Zeren Chen et.al. | 2403.19622 | null |
2024-03-28 | SAID-NeRF: Segmentation-AIDed NeRF for Depth Completion of Transparent Objects | Avinash Ummadisingu et.al. | 2403.19607 | null |
2024-03-28 | Img2Loc: Revisiting Image Geolocalization using Multi-modality Foundation Models and Image-based Retrieval-Augmented Generation | Zhongliang Zhou et.al. | 2403.19584 | link |
2024-03-28 | Keypoint Action Tokens Enable In-Context Imitation Learning in Robotics | Norman Di Palo et.al. | 2403.19578 | null |
2024-03-28 | WaterJudge: Quality-Detection Trade-off when Watermarking Large Language Models | Piotr Molenda et.al. | 2403.19548 | null |
2024-03-28 | Interpreting Key Mechanisms of Factual Recall in Transformer-Based Language Models | Ang Lv et.al. | 2403.19521 | link |
2024-03-28 | Improving Clinical NLP Performance through Language Model-Generated Synthetic Clinical Data | Shan Chen et.al. | 2403.19511 | link |
2024-03-28 | LLMs as Academic Reading Companions: Extending HCI Through Synthetic Personae | Celia Chen et.al. | 2403.19506 | null |
2024-03-28 | Evolving Assembly Code in an Adversarial Environment | Irina Maliukov et.al. | 2403.19489 | link |
2024-03-28 | JDocQA: Japanese Document Question Answering Dataset for Generative Language Models | Eri Onami et.al. | 2403.19454 | link |
2024-03-28 | Mixed Preference Optimization: Reinforcement Learning with Data Selection and Better Reference Model | Qi Gou et.al. | 2403.19443 | null |
2024-03-28 | OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion | Xinyu Zhan et.al. | 2403.19417 | null |
2024-03-28 | BP4ER: Bootstrap Prompting for Explicit Reasoning in Medical Dialogue Generation | Yuhong He et.al. | 2403.19414 | null |
2024-03-28 | Checkpoint Merging via Bayesian Optimization in LLM Pretraining | Deyuan Liu et.al. | 2403.19390 | null |
2024-03-27 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | Yanwei Li et.al. | 2403.18814 | link |
2024-03-27 | ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation | Suraj Patni et.al. | 2403.18807 | link |
2024-03-27 | Is Modularity Transferable? A Case Study through the Lens of Knowledge Distillation | Mateusz Klimaszewski et.al. | 2403.18804 | link |
2024-03-27 | Projective Methods for Mitigating Gender Bias in Pre-trained Language Models | Hillary Dawkins et.al. | 2403.18803 | link |
2024-03-27 | Long-form factuality in large language models | Jerry Wei et.al. | 2403.18802 | link |
2024-03-27 | Towards a World-English Language Model for On-Device Virtual Assistants | Rricha Jalota et.al. | 2403.18783 | null |
2024-03-27 | 3P-LLM: Probabilistic Path Planning using Large Language Model for Autonomous Robot Navigation | Ehsan Latif et.al. | 2403.18778 | null |
2024-03-27 | ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object | Chenshuang Zhang et.al. | 2403.18775 | link |
2024-03-27 | CheckEval: Robust Evaluation Framework using Large Language Model via Checklist | Yukyung Lee et.al. | 2403.18771 | null |
2024-03-27 | MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model | Yike Wu et.al. | 2403.18760 | link |
2024-03-27 | CYCLE: Learning to Self-Refine the Code Generation | Yangruibo Ding et.al. | 2403.18746 | link |
2024-03-27 | Understanding the Learning Dynamics of Alignment with Human Feedback | Shawn Im et.al. | 2403.18742 | link |
2024-03-27 | PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations | Ehsan Latif et.al. | 2403.18721 | null |
2024-03-27 | Mitigating Hallucinations in Large Vision-Language Models with Instruction Contrastive Decoding | Xintong Wang et.al. | 2403.18715 | null |
2024-03-27 | The Invalsi Benchmark: measuring Language Models Mathematical and Language understanding in Italian | Andrea Esuli et.al. | 2403.18697 | null |
2024-03-27 | NL-ITI: Optimizing Probing and Intervention for Improvement of ITI Method | Jakub Hoscilowicz et.al. | 2403.18680 | link |
2024-03-27 | An Exploratory Study on Upper-Level Computing Students' Use of Large Language Models as Tools in a Semester-Long Project | Ben Arie Tanay et.al. | 2403.18679 | null |
2024-03-27 | SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive Tokens | Chengbo Liu et.al. | 2403.18647 | link |
2024-03-27 | To Recommend or Not: Recommendability Identification in Conversations with Pre-trained Language Models | Zhefan Wang et.al. | 2403.18628 | link |
2024-03-27 | Vulnerability Detection with Code Language Models: How Far Are We? | Yangruibo Ding et.al. | 2403.18624 | link |
2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
2024-03-26 | Track Everything Everywhere Fast and Robustly | Yunzhou Song et.al. | 2403.17931 | null |
2024-03-26 | MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution | Wei Tao et.al. | 2403.17927 | null |
2024-03-26 | LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Rui Pan et.al. | 2403.17919 | link |
2024-03-26 | Large scale paired antibody language models | Henry Kenlay et.al. | 2403.17889 | null |
2024-03-26 | Compressed Multi-task embeddings for Data-Efficient Downstream training and inference in Earth Observation | Carlos Gomes et.al. | 2403.17886 | link |
2024-03-26 | MIND Your Language: A Multilingual Dataset for Cross-lingual News Recommendation | Andreea Iana et.al. | 2403.17876 | link |
2024-03-26 | Addressing Social Misattributions of Large Language Models: An HCXAI-based Approach | Andrea Ferrario et.al. | 2403.17873 | null |
2024-03-26 | Exploring LLMs as a Source of Targeted Synthetic Textual Data to Minimize High Confidence Misclassifications | Philip Lippmann et.al. | 2403.17860 | null |
2024-03-26 | ChroniclingAmericaQA: A Large-scale Question Answering Dataset based on Historical American Newspaper Pages | Bhawna Piryani et.al. | 2403.17859 | link |
2024-03-26 | Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs | David R. Mortensen et.al. | 2403.17856 | null |
2024-03-26 | ArabicaQA: A Comprehensive Dataset for Arabic Question Answering | Abdelrahman Abdallah et.al. | 2403.17848 | link |
2024-03-26 | Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation | Abdelrhman Werby et.al. | 2403.17846 | null |
2024-03-26 | Mechanistic Design and Scaling of Hybrid Architectures | Michael Poli et.al. | 2403.17844 | null |
2024-03-26 | ReMamber: Referring Image Segmentation with Mamba Twister | Yuhuan Yang et.al. | 2403.17839 | link |
2024-03-26 | A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities | Ibrahim Ethem Hamamci et.al. | 2403.17834 | link |
2024-03-26 | Assessment of Multimodal Large Language Models in Alignment with Human Values | Zhelun Shi et.al. | 2403.17830 | null |
2024-03-26 | Accelerating Radio Spectrum Regulation Workflows with Large Language Models (LLMs) | Amir Ghasemi et.al. | 2403.17819 | null |
2024-03-26 | Graph Language Model (GLM): A new graph-based approach to detect social instabilities | Wallyson Lemes de Oliveira et.al. | 2403.17816 | null |
2024-03-26 | Are Compressed Language Models Less Subgroup Robust? | Leonidas Gee et.al. | 2403.17811 | link |
2024-03-25 | Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making | Shuai Ma et.al. | 2403.16812 | null |
2024-03-25 | An LLM-Based Digital Twin for Optimizing Human-in-the Loop Systems | Hanqing Yang et.al. | 2403.16809 | link |
2024-03-25 | Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback | Zhangqian Bi et.al. | 2403.16792 | link |
2024-03-25 | All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification | Deepak Narayan Gadde et.al. | 2403.16750 | null |
2024-03-25 | A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models | Nils Ingelhag et.al. | 2403.16730 | null |
2024-03-25 | ProCQA: A Large-scale Community-based Programming Question Answering Dataset for Code Search | Zehan Li et.al. | 2403.16702 | link |
2024-03-25 | Synapse: Learning Preferential Concepts from Visual Demonstrations | Sadanand Modak et.al. | 2403.16689 | null |
2024-03-25 | Investigation of the effectiveness of applying ChatGPT in Dialogic Teaching Using Electroencephalography | Jiayue Zhang et.al. | 2403.16687 | null |
2024-03-25 | RU22Fact: Optimizing Evidence for Multilingual Explainable Fact-Checking on Russia-Ukraine Conflict | Yirong Zeng et.al. | 2403.16662 | link |
2024-03-25 | Grammatical vs Spelling Error Correction: An Investigation into the Responsiveness of Transformer-based Language Models using BART and MarianMT | Rohit Raju et.al. | 2403.16655 | null |
2024-03-25 | CLHA: A Simple yet Effective Contrastive Learning Framework for Human Alignment | Feiteng Fang et.al. | 2403.16649 | link |
2024-03-25 | Virtual Co-Pilot: Multimodal Large Language Model-enabled Quick-access Procedures for Single Pilot Operations | Fan Li et.al. | 2403.16645 | null |
2024-03-25 | Semantically Enriched Cross-Lingual Sentence Embeddings for Crisis-related Social Media Texts | Rabindra Lamsal et.al. | 2403.16614 | null |
2024-03-25 | Conversational Grounding: Annotation and Analysis of Grounding Acts and Grounding Units | Biswesh Mohapatra et.al. | 2403.16609 | null |
2024-03-25 | TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques | Ashok Urlana et.al. | 2403.16592 | null |
2024-03-25 | Can Large Language Models (or Humans) Distill Text? | Nicolas Audinet de Pieuchon et.al. | 2403.16584 | link |
2024-03-25 | NSINA: A News Corpus for Sinhala | Hansi Hettiarachchi et.al. | 2403.16571 | link |
2024-03-25 | Elysium: Exploring Object-level Perception in Videos via MLLM | Han Wang et.al. | 2403.16558 | link |
2024-03-25 | DOrA: 3D Visual Grounding with Order-Aware Referring | Tung-Yu Wu et.al. | 2403.16539 | null |
2024-03-25 | Open-Set Recognition in the Age of Vision-Language Models | Dimity Miller et.al. | 2403.16528 | link |
2024-03-25 | Hallucination Detection in Foundation Models for Decision-Making: A Flexible Definition and Review of the State of the Art | Neeloy Chakraborty et.al. | 2403.16527 | null |
2024-03-25 | Harnessing the power of LLMs for normative reasoning in MASs | Bastin Tony Roy Savarimuthu et.al. | 2403.16524 | null |
2024-03-25 | Norm Violation Detection in Multi-Agent Systems using Large Language Models: A Pilot Study | Shawn He et.al. | 2403.16517 | null |
2024-03-25 | Linguistically Differentiating Acts and Recalls of Racial Microaggressions on Social Media | Uma Sushmitha Gunturi et.al. | 2403.16514 | null |
2024-03-22 | LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models | Yuzhang Shang et.al. | 2403.15388 | null |
2024-03-22 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | Beichen Zhang et.al. | 2403.15378 | link |
2024-03-22 | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Yi Wang et.al. | 2403.15377 | link |
2024-03-22 | Can large language models explore in-context? | Akshay Krishnamurthy et.al. | 2403.15371 | null |
2024-03-22 | CoLLEGe: Concept Embedding Generation for Large Language Models | Ryan Teehan et.al. | 2403.15362 | null |
2024-03-22 | Neural Plasticity-Inspired Foundation Model for Observing the Earth Crossing Modalities | Zhitong Xiong et.al. | 2403.15356 | link |
2024-03-22 | Controlled Training Data Generation with Diffusion Models | Teresa Yeo et.al. | 2403.15309 | null |
2024-03-22 | Sphere Neural-Networks for Rational Reasoning | Tiansi Dong et.al. | 2403.15297 | null |
2024-03-22 | Measuring Gender and Racial Biases in Large Language Models | Jiafu An et.al. | 2403.15281 | null |
2024-03-22 | Bioinformatics and Biomedical Informatics with ChatGPT: Year One Review | Jinge Wang et.al. | 2403.15274 | null |
2024-03-22 | Event Temporal Relation Extraction based on Retrieval-Augmented on LLMs | Xiaobin Zhang et.al. | 2403.15273 | null |
2024-03-22 | Imagination Augmented Generation: Learning to Imagine Richer Context for Question Answering over Large Language Models | Huanxuan Liao et.al. | 2403.15268 | link |
2024-03-22 | AI Exposure and Strategic Positioning on an Online Work Platform | Shun Yiu et.al. | 2403.15262 | null |
2024-03-22 | FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | Orion Weller et.al. | 2403.15246 | link |
2024-03-22 | Shadow Generation for Composite Image Using Diffusion model | Qingyang Liu et.al. | 2403.15234 | link |
2024-03-22 | An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets | Jonathan Katzy et.al. | 2403.15230 | link |
2024-03-22 | Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models | Qiong Wu et.al. | 2403.15226 | link |
2024-03-22 | Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations | Pranav Kulkarni et.al. | 2403.15218 | link |
2024-03-22 | InstaSynth: Opportunities and Challenges in Generating Synthetic Instagram Data with ChatGPT for Sponsored Content Detection | Thales Bertaglia et.al. | 2403.15214 | link |
2024-03-22 | MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection | Taeheon Kim et.al. | 2403.15209 | null |
2024-03-21 | MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | Renrui Zhang et.al. | 2403.14624 | null |
2024-03-21 | Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey | Zeyu Han et.al. | 2403.14608 | null |
2024-03-21 | MyVLM: Personalizing VLMs for User-Specific Queries | Yuval Alaluf et.al. | 2403.14599 | null |
2024-03-21 | ReAct Meets ActRe: Autonomous Annotations of Agent Trajectories for Contrastive Self-Training | Zonghan Yang et.al. | 2403.14589 | null |
2024-03-21 | Large Language Models for Multi-Choice Question Classification of Medical Subjects | Víctor Ponce-López et.al. | 2403.14582 | null |
2024-03-21 | RAmBLA: A Framework for Evaluating the Reliability of LLMs as Assistants in the Biomedical Domain | William James Bolton et.al. | 2403.14578 | link |
2024-03-21 | A Chain-of-Thought Prompting Approach with LLMs for Evaluating Students' Formative Assessment Responses in Science | Clayton Cohn et.al. | 2403.14565 | null |
2024-03-21 | The Era of Semantic Decoding | Maxime Peyrard et.al. | 2403.14562 | null |
2024-03-21 | Lexicon-Level Contrastive Visual-Grounding Improves Language Modeling | Chengxu Zhuang et.al. | 2403.14551 | null |
2024-03-21 | EDT: Improving Large Language Models' Generation by Entropy-based Dynamic Temperature Sampling | Shimao Zhang et.al. | 2403.14541 | link |
2024-03-21 | Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference | Han Zhao et.al. | 2403.14520 | link |
2024-03-21 | The Ethics of ChatGPT in Medicine and Healthcare: A Systematic Review on Large Language Models (LLMs) | Joschka Haltaufderheide et.al. | 2403.14473 | null |
2024-03-21 | Detoxifying Large Language Models via Knowledge Editing | Mengru Wang et.al. | 2403.14472 | link |
2024-03-21 | ChatGPT Alternative Solutions: Large Language Models Survey | Hanieh Alipour et.al. | 2403.14469 | null |
2024-03-21 | Recourse for reclamation: Chatting with generative language models | Jennifer Chien et.al. | 2403.14467 | null |
2024-03-21 | Towards Single-System Illusion in Software-Defined Vehicles -- Automated, AI-Powered Workflow | Krzysztof Lebioda et.al. | 2403.14460 | null |
2024-03-21 | Multi-Level Explanations for Generative Language Models | Lucas Monteiro Paes et.al. | 2403.14459 | null |
2024-03-21 | gTBLS: Generating Tables from Text by Conditional Question Answering | Anirudh Sundar et.al. | 2403.14457 | null |
2024-03-21 | Language Models Can Reduce Asymmetry in Information Markets | Nasim Rahaman et.al. | 2403.14443 | null |
2024-03-21 | A Multimodal Approach to Device-Directed Speech Detection with Large Language Models | Dominik Wagner et.al. | 2403.14438 | null |
2024-03-20 | RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition | Ziyu Liu et.al. | 2403.13805 | link |
2024-03-20 | Learning from Models and Data for Visual Grounding | Ruozhen He et.al. | 2403.13804 | null |
2024-03-20 | Reverse Training to Nurse the Reversal Curse | Olga Golovneva et.al. | 2403.13799 | null |
2024-03-20 | Bridge the Modality and Capacity Gaps in Vision-Language Model Selection | Chao Yi et.al. | 2403.13797 | null |
2024-03-20 | RewardBench: Evaluating Reward Models for Language Modeling | Nathan Lambert et.al. | 2403.13787 | link |
2024-03-20 | Chain-of-Interaction: Enhancing Large Language Models for Psychiatric Behavior Understanding by Dyadic Contexts | Guangzeng Han et.al. | 2403.13786 | link |
2024-03-20 | Information-Theoretic Distillation for Reference-less Summarization | Jaehun Jung et.al. | 2403.13780 | null |
2024-03-20 | Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation | Hugues Thomas et.al. | 2403.13777 | null |
2024-03-20 | Describe-and-Dissect: Interpreting Neurons in Vision Networks with Language Models | Nicholas Bai et.al. | 2403.13771 | link |
2024-03-20 | Enhancing Gait Video Analysis in Neurodegenerative Diseases by Knowledge Augmentation in Vision Language Model | Diwei Wang et.al. | 2403.13756 | null |
2024-03-20 | Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement | Catherine Arnett et.al. | 2403.13754 | null |
2024-03-20 | EthioLLM: Multilingual Large Language Models for Ethiopian Languages with Task Evaluation | Atnafu Lambebo Tonja et.al. | 2403.13737 | null |
2024-03-20 | Large Language Models meet Network Slicing Management and Orchestration | Abdulhalim Dandoush et.al. | 2403.13721 | null |
2024-03-20 | SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning | Hongjun Wang et.al. | 2403.13684 | null |
2024-03-20 | PARAMANU-AYN: An Efficient Novel Generative and Instruction-tuned Language Model for Indian Legal Case Documents | Mitodru Niyogi et.al. | 2403.13681 | null |
2024-03-20 | RoleInteract: Evaluating the Social Interaction of Role-Playing Agents | Hongzhan Chen et.al. | 2403.13679 | link |
2024-03-20 | Grounding Spatial Relations in Text-Only Language Models | Gorka Azkune et.al. | 2403.13666 | link |
2024-03-20 | Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese | Meet Doshi et.al. | 2403.13638 | null |
2024-03-20 | VL-Mamba: Exploring State Space Models for Multimodal Learning | Yanyuan Qiao et.al. | 2403.13600 | null |
2024-03-20 | No more optimization rules: LLM-enabled policy-based multi-modal query optimizer (version 1) | Yifan Wang et.al. | 2403.13597 | null |
2024-03-19 | LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | Zhuoshi Pan et.al. | 2403.12968 | link |
2024-03-19 | Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models | Zuyan Liu et.al. | 2403.12966 | link |
2024-03-19 | Negative Yields Positive: Unified Dual-Path Adapter for Vision-Language Models | Ce Zhang et.al. | 2403.12964 | link |
2024-03-19 | Dated Data: Tracing Knowledge Cutoffs in Large Language Models | Jeffrey Cheng et.al. | 2403.12958 | null |
2024-03-19 | Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models | Elaine Sui et.al. | 2403.12952 | link |
2024-03-19 | Automatic Information Extraction From Employment Tribunal Judgements Using Large Language Models | Joana Ribeiro de Faria et.al. | 2403.12936 | null |
2024-03-19 | Segment Anything for comprehensive analysis of grapevine cluster architecture and berry properties | Efrain Torres-Lomas et.al. | 2403.12935 | null |
2024-03-19 | Rapid AIdeation: Generating Ideas With the Self and in Collaboration With Large Language Models | Gionnieve Lim et.al. | 2403.12928 | null |
2024-03-19 | Supporting Energy Policy Research with Large Language Models | Grant Buster et.al. | 2403.12924 | null |
2024-03-19 | Contextual AD Narration with Interleaved Multimodal Sequence | Hanlin Wang et.al. | 2403.12922 | null |
2024-03-19 | Semantic Layering in Room Segmentation via LLMs | Taehyeon Kim et.al. | 2403.12920 | null |
2024-03-19 | Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts | Sai Ashish Somayajula et.al. | 2403.12918 | link |
2024-03-19 | Yell At Your Robot: Improving On-the-Fly from Language Corrections | Lucy Xiaoyang Shi et.al. | 2403.12910 | null |
2024-03-19 | Toward Sustainable GenAI using Generation Directives for Carbon-Friendly Large Language Model Inference | Baolin Li et.al. | 2403.12900 | null |
2024-03-19 | mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | Anwen Hu et.al. | 2403.12895 | link |
2024-03-20 | MEDBind: Unifying Language and Multimodal Medical Data Embeddings | Yuan Gao et.al. | 2403.12894 | null |
2024-03-19 | HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning | Fucai Ke et.al. | 2403.12884 | null |
2024-03-19 | Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | Zehui Chen et.al. | 2403.12881 | link |
2024-03-19 | Epistemology of Language Models: Do Language Models Have Holistic Knowledge? | Minsu Kim et.al. | 2403.12862 | null |
2024-03-19 | RASP: A Drone-based Reconfigurable Actuation and Sensing Platform Towards Ambient Intelligent Systems | Minghui Zhao et.al. | 2403.12853 | null |
2024-03-18 | Modality-Agnostic fMRI Decoding of Vision and Language | Mitja Nikolaus et.al. | 2403.11771 | null |
2024-03-18 | Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs | M. Jehanzeb Mirza et.al. | 2403.11755 | link |
2024-03-18 | Revisiting The Classics: A Study on Identifying and Rectifying Gender Stereotypes in Rhymes and Poems | Aditya Narayan Sankaran et.al. | 2403.11752 | link |
2024-03-18 | Embedded Named Entity Recognition using Probing Classifiers | Nicholas Popovič et.al. | 2403.11747 | null |
2024-03-18 | TTT-KD: Test-Time Training for 3D Semantic Segmentation through Knowledge Distillation from Foundation Models | Lisa Weijler et.al. | 2403.11691 | null |
2024-03-18 | HDLdebugger: Streamlining HDL debugging with Large Language Models | Xufeng Yao et.al. | 2403.11671 | null |
2024-03-18 | Prioritized Semantic Learning for Zero-shot Instance Navigation | Xander Sun et.al. | 2403.11650 | link |
2024-03-18 | Arc2Face: A Foundation Model of Human Faces | Foivos Paraperas Papantoniou et.al. | 2403.11641 | link |
2024-03-18 | Compositional Kronecker Context Optimization for Vision-Language Models | Kun Ding et.al. | 2403.11631 | null |
2024-03-18 | Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model | Haoyun Xu et.al. | 2403.11621 | null |
2024-03-18 | CRS-Diff: Controllable Generative Remote Sensing Foundation Model | Datao Tang et.al. | 2403.11614 | link |
2024-03-18 | Linguacodus: A Synergistic Framework for Transformative Code Generation in Machine Learning Pipelines | Ekaterina Trofimova et.al. | 2403.11585 | null |
2024-03-18 | Reinforcement Learning with Token-level Feedback for Controllable Text Generation | Wendi Li et.al. | 2403.11558 | link |
2024-03-18 | LLM^3:Large Language Model-based Task and Motion Planning with Motion Failure Reasoning | Shu Wang et.al. | 2403.11552 | link |
2024-03-18 | Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters | Jiazuo Yu et.al. | 2403.11549 | link |
2024-03-18 | DEE: Dual-stage Explainable Evaluation Method for Text Generation | Shenyu Zhang et.al. | 2403.11509 | null |
2024-03-18 | Do CLIPs Always Generalize Better than ImageNet Models? | Qizhou Wang et.al. | 2403.11497 | null |
2024-03-18 | VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | Yue Fan et.al. | 2403.11481 | null |
2024-03-18 | HateCOT: An Explanation-Enhanced Dataset for Generalizable Offensive Speech Detection via Large Language Models | Huy Nghiem et.al. | 2403.11456 | link |
2024-03-18 | Zero-shot Compound Expression Recognition with Visual Language Model at the 6th ABAW Challenge | Jiahe Wang et.al. | 2403.11450 | null |
2024-03-18 | LLM Guided Evolution - The Automation of Models Advancing Models | Clint Morris et.al. | 2403.11446 | link |
2024-03-18 | StyleChat: Learning Recitation-Augmented Memory in LLMs for Stylized Dialogue Generation | Jinpeng Li et.al. | 2403.11439 | null |
2024-03-18 | InsCL: A Data-efficient Continual Learning Paradigm for Fine-tuning Large Language Models with Instructions | Yifan Wang et.al. | 2403.11435 | null |
2024-03-18 | A Novel Paradigm Boosting Translation Capabilities of Large Language Models | Jiaxin Guo et.al. | 2403.11430 | null |
2024-03-15 | VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Xiaohan Wang et.al. | 2403.10517 | null |
2024-03-15 | Demystifying Faulty Code with LLM: Step-by-Step Reasoning for Explainable Fault Localization | Ratnadira Widyasari et.al. | 2403.10507 | null |
2024-03-15 | ATOM: Asynchronous Training of Massive Models for Deep Learning in a Decentralized Environment | Xiaofeng Wu et.al. | 2403.10504 | null |
2024-03-15 | Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study | Chenguang Wang et.al. | 2403.10499 | link |
2024-03-15 | Reconfigurable Robot Identification from Motion Data | Yuhang Hu et.al. | 2403.10496 | null |
2024-03-15 | Can a GPT4-Powered AI Agent Be a Good Enough Performance Attribution Analyst? | Bruno de Melo et.al. | 2403.10482 | null |
2024-03-15 | Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases | Jiarui Li et.al. | 2403.10446 | link |
2024-03-15 | Optimal Block-Level Draft Verification for Accelerating Speculative Decoding | Ziteng Sun et.al. | 2403.10444 | null |
2024-03-15 | Using an LLM to Turn Sign Spottings into Spoken Language Sentences | Ozge Mercanoglu Sincan et.al. | 2403.10434 | null |
2024-03-15 | SocialGenPod: Privacy-Friendly Generative AI Social Web Applications with Decentralised Personal Data Stores | Vidminas Vizgirda et.al. | 2403.10408 | link |
2024-03-15 | A Thorough Comparison of Cross-Encoders and LLMs for Reranking SPLADE | Hervé Déjean et.al. | 2403.10407 | null |
2024-03-15 | Monotonic Representation of Numeric Properties in Language Models | Benjamin Heinzerling et.al. | 2403.10381 | link |
2024-03-15 | EXAMS-V: A Multi-Discipline Multilingual Multimodal Exam Benchmark for Evaluating Vision Language Models | Rocktim Jyoti Das et.al. | 2403.10378 | link |
2024-03-15 | TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale | Pengcheng Jiang et.al. | 2403.10351 | null |
2024-03-15 | Investigating grammatical abstraction in language models using few-shot learning of novel noun gender | Priyanka Sukumaran et.al. | 2403.10338 | null |
2024-03-15 | CDGP: Automatic Cloze Distractor Generation based on Pre-trained Language Model | Shang-Hsuan Chiang et.al. | 2403.10326 | link |
2024-03-15 | NetBench: A Large-Scale and Comprehensive Network Traffic Benchmark Dataset for Foundation Models | Chen Qian et.al. | 2403.10319 | link |
2024-03-15 | Uni-SMART: Universal Science Multimodal Analysis and Research Transformer | Hengxing Cai et.al. | 2403.10301 | null |
2024-03-15 | Few-Shot Image Classification and Segmentation as Visual Question Answering Using Vision-Language Models | Tian Meng et.al. | 2403.10287 | null |
2024-03-15 | Team Trifecta at Factify5WQA: Setting the Standard in Fact Verification with Fine-Tuning | Shang-Hsuan Chiang et.al. | 2403.10281 | link |
2024-03-14 | GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping | Yuhang Zheng et.al. | 2403.09637 | link |
2024-03-14 | Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference | Piotr Nawrot et.al. | 2403.09636 | null |
2024-03-14 | Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models | Akhil Kedia et.al. | 2403.09635 | link |
2024-03-14 | OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning | Lingyi Hong et.al. | 2403.09634 | null |
2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Haoyu Zhen et.al. | 2403.09631 | null |
2024-03-14 | Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | Eric Zelikman et.al. | 2403.09629 | link |
2024-03-14 | Explore In-Context Segmentation via Latent Diffusion Models | Chaoyang Wang et.al. | 2403.09616 | null |
2024-03-14 | MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Brandon McKinzie et.al. | 2403.09611 | null |
2024-03-14 | Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey | Xiaoyu Liu et.al. | 2403.09606 | null |
2024-03-14 | Logical Discrete Graphical Models Must Supplement Large Language Models for Information Synthesis | Gregory Coppola et.al. | 2403.09599 | null |
2024-03-14 | Renovating Names in Open-Vocabulary Segmentation Benchmarks | Haiwen Huang et.al. | 2403.09593 | null |
2024-03-14 | ExploRLLM: Guiding Exploration in Reinforcement Learning with Large Language Models | Runyu Ma et.al. | 2403.09583 | null |
2024-03-14 | Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation | Yunhao Gou et.al. | 2403.09572 | null |
2024-03-14 | Enhancing Trust in Autonomous Agents: An Architecture for Accountability and Explainability through Blockchain and Large Language Models | Laura Fernández-Becerra et.al. | 2403.09567 | null |
2024-03-14 | Welcome Your New AI Teammate: On Safety Analysis by Leashing Large Language Models | Ali Nouri et.al. | 2403.09565 | null |
2024-03-14 | PreCurious: How Innocent Pre-Trained Language Models Turn into Privacy Traps | Ruixuan Liu et.al. | 2403.09562 | null |
2024-03-14 | Less is More: Data Value Estimation for Visual Instruction Tuning | Zikang Liu et.al. | 2403.09559 | null |
2024-03-15 | Logits of API-Protected LLMs Leak Proprietary Information | Matthew Finlayson et.al. | 2403.09539 | null |
2024-03-14 | VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding | Chris Kelly et.al. | 2403.09530 | null |
2024-03-15 | WavCraft: Audio Editing and Generation with Natural Language Prompts | Jinhua Liang et.al. | 2403.09527 | link |
2024-03-13 | Simple and Scalable Strategies to Continually Pre-train Large Language Models | Adam Ibrahim et.al. | 2403.08763 | link |
2024-03-13 | Steering LLMs Towards Unbiased Responses: A Causality-Guided Debiasing Framework | Jingling Li et.al. | 2403.08743 | null |
2024-03-13 | The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models | Carlo Nicolini et.al. | 2403.08739 | null |
2024-03-13 | ILCiteR: Evidence-grounded Interpretable Local Citation Recommendation | Sayar Ghosh Roy et.al. | 2403.08737 | link |
2024-03-13 | Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization | Renjie Pi et.al. | 2403.08730 | null |
2024-03-14 | SOTOPIA- | Ruiyi Wang et.al. | 2403.08715 | link |
2024-03-13 | Review of Generative AI Methods in Cybersecurity | Yagmur Yigit et.al. | 2403.08701 | null |
2024-03-13 | TeaMs-RL: Teaching LLMs to Teach Themselves Better Instructions via Reinforcement Learning | Shangding Gu et.al. | 2403.08694 | null |
2024-03-13 | Do Language Models Care About Text Quality? Evaluating Web-Crawled Corpora Across 11 Languages | Rik van Noord et.al. | 2403.08693 | null |
2024-03-13 | Zero-shot and Few-shot Generation Strategies for Artificial Clinical Records | Erlend Frayling et.al. | 2403.08664 | null |
2024-03-13 | Self-Supervised Learning for Covariance Estimation | Tzvi Diskin et.al. | 2403.08662 | null |
2024-03-13 | Human Alignment of Large Language Models through Online Preference Optimisation | Daniele Calandriello et.al. | 2403.08635 | null |
2024-03-13 | MedInsight: A Multi-Source Context Augmentation Framework for Generating Patient-Centric Medical Responses using Large Language Models | Subash Neupane et.al. | 2403.08607 | null |
2024-03-13 | Language-Grounded Dynamic Scene Graphs for Interactive Object Search with Mobile Manipulation | Daniel Honerkamp et.al. | 2403.08605 | link |
2024-03-13 | DevBench: A Comprehensive Benchmark for Software Development | Bowen Li et.al. | 2403.08604 | link |
2024-03-13 | Call Me When Necessary: LLMs can Efficiently and Faithfully Reason over Structured Environments | Sitao Cheng et.al. | 2403.08593 | null |
2024-03-13 | Non-discrimination Criteria for Generative Language Models | Sara Sterlie et.al. | 2403.08564 | null |
2024-03-13 | AIGCs Confuse AI Too: Investigating and Explaining Synthetic Image-induced Hallucinations in Large Vision-Language Models | Yifei Gao et.al. | 2403.08542 | null |
2024-03-13 | Language models scale reliably with over-training and on downstream tasks | Samir Yitzhak Gadre et.al. | 2403.08540 | link |
2024-03-13 | Masked Generative Story Transformer with Character Guidance and Caption Augmentation | Christos Papadimitriou et.al. | 2403.08502 | link |
2024-03-12 | Beyond Text: Frozen Large Language Models in Visual Signal Comprehension | Lei Zhu et.al. | 2403.07874 | link |
2024-03-12 | Rethinking Generative Large Language Model Evaluation for Semantic Comprehension | Fangyun Wei et.al. | 2403.07872 | null |
2024-03-12 | Exploring Safety Generalization Challenges of Large Language Models via Code | Qibing Ren et.al. | 2403.07865 | link |
2024-03-12 | Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation | Shihao Zhao et.al. | 2403.07860 | link |
2024-03-12 | MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric | Haokun Lin et.al. | 2403.07839 | null |
2024-03-12 | DeliGrasp: Inferring Object Mass, Friction, and Compliance with LLMs for Adaptive and Minimally Deforming Grasp Policies | William Xie et.al. | 2403.07832 | null |
2024-03-12 | The Missing Piece in Model Editing: A Deep Dive into the Hidden Damage Brought By Model Editing | Jianchen Wang et.al. | 2403.07825 | null |
2024-03-12 | Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Sainbayar Sukhbaatar et.al. | 2403.07816 | null |
2024-03-12 | Chronos: Learning the Language of Time Series | Abdul Fatir Ansari et.al. | 2403.07815 | link |
2024-03-12 | Beyond Memorization: The Challenge of Random Memory Access in Language Models | Tongyao Zhu et.al. | 2403.07805 | link |
2024-03-12 | Fine-tuning Large Language Models with Sequential Instructions | Hanxu Hu et.al. | 2403.07794 | link |
2024-03-12 | Transforming Competition into Collaboration: The Revolutionary Role of Multi-Agent Systems and Language Models in Modern Organizations | Carlos Jose Xavier Cruz et.al. | 2403.07769 | link |
2024-03-12 | Synth | Sahand Sharifzadeh et.al. | 2403.07750 | null |
2024-03-12 | FineMath: A Fine-Grained Mathematical Evaluation Benchmark for Chinese Large Language Models | Yan Liu et.al. | 2403.07747 | null |
2024-03-12 | Multi-modal Auto-regressive Modeling via Visual Words | Tianshuo Peng et.al. | 2403.07720 | link |
2024-03-12 | WorkArena: How Capable Are Web Agents at Solving Common Knowledge Work Tasks? | Alexandre Drouin et.al. | 2403.07718 | link |
2024-03-12 | StableToolBench: Towards Stable Large-Scale Benchmarking on Tool Learning of Large Language Models | Zhicheng Guo et.al. | 2403.07714 | link |
2024-03-12 | Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards | Wei Shen et.al. | 2403.07708 | null |
2024-03-12 | Large, Small or Both: A Novel Data Augmentation Framework Based on Language Models for Debiasing Opinion Summarization | Yanyue Zhang et.al. | 2403.07693 | null |
2024-03-12 | Reference-free Monolithic Preference Optimization with Odds Ratio | Jiwoo Hong et.al. | 2403.07691 | link |
2024-03-11 | Hybrid Human-LLM Corpus Construction and LLM Evaluation for Rare Linguistic Phenomena | Leonie Weissweiler et.al. | 2403.06965 | null |
2024-03-11 | Materials science in the era of large language models: a perspective | Ge Lei et.al. | 2403.06949 | null |
2024-03-11 | Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation | Xinyao Li et.al. | 2403.06946 | link |
2024-03-11 | Naming, Describing, and Quantifying Visual Objects in Humans and LLMs | Alberto Testoni et.al. | 2403.06935 | link |
2024-03-11 | ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis | Yanming Liu et.al. | 2403.06932 | link |
2024-03-11 | MEND: Meta dEmonstratioN Distillation for Efficient and Effective In-Context Learning | Yichuan Li et.al. | 2403.06914 | link |
2024-03-11 | Application of Quantum Tensor Networks for Protein Classification | Debarshi Kundu et.al. | 2403.06890 | null |
2024-03-11 | Exploring Large Language Models and Hierarchical Frameworks for Classification of Large Unstructured Legal Documents | Nishchal Prasad et.al. | 2403.06872 | link |
2024-03-11 | Semantic Residual Prompts for Continual Learning | Martin Menabue et.al. | 2403.06870 | link |
2024-03-11 | Learning with Noisy Foundation Models | Hao Chen et.al. | 2403.06869 | null |
2024-03-11 | A Geospatial Approach to Predicting Desert Locust Breeding Grounds in Africa | Ibrahim Salihu Yusuf et.al. | 2403.06860 | null |
2024-03-11 | Development of a Reliable and Accessible Caregiving Language Model (CaLM) | Bambang Parmanto et.al. | 2403.06857 | null |
2024-03-11 | DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation | Guosheng Zhao et.al. | 2403.06845 | null |
2024-03-11 | RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback | Yanming Liu et.al. | 2403.06840 | link |
2024-03-11 | ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts | Lyuye Zhang et.al. | 2403.06838 | null |
2024-03-11 | Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? | Egor Zverev et.al. | 2403.06833 | link |
2024-03-11 | The Power of Noise: Toward a Unified Multi-modal Knowledge Graph Representation Framework | Zhuo Chen et.al. | 2403.06832 | link |
2024-03-11 | ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language Model | Zhiwei Liu et.al. | 2403.06765 | link |
2024-03-11 | An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models | Liang Chen et.al. | 2403.06764 | link |
2024-03-11 | ALaRM: Align Language Models via Hierarchical Rewards Modeling | Yuhang Lai et.al. | 2403.06754 | link |
2024-03-08 | Bayesian Preference Elicitation with Language Models | Kunal Handa et.al. | 2403.05534 | null |
2024-03-08 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | Machel Reid et.al. | 2403.05530 | null |
2024-03-08 | GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM | Hao Kang et.al. | 2403.05527 | link |
2024-03-08 | DeepSeek-VL: Towards Real-World Vision-Language Understanding | Haoyu Lu et.al. | 2403.05525 | link |
2024-03-08 | Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapolation | Yijiang Li et.al. | 2403.05523 | null |
2024-03-08 | Authorship Attribution in Bangla Literature (AABL) via Transfer Learning using ULMFiT | Aisha Khatun et.al. | 2403.05519 | null |
2024-03-08 | Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought | James Chua et.al. | 2403.05518 | link |
2024-03-08 | To Err Is Human, but Llamas Can Learn It Too | Agnes Luhtaru et.al. | 2403.05493 | null |
2024-03-08 | Will GPT-4 Run DOOM? | Adrian de Wynter et.al. | 2403.05468 | null |
2024-03-08 | Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs | Arijit Nag et.al. | 2403.05434 | null |
2024-03-08 | Towards Real-World Stickers Use: A New Dataset for Multi-Tag Sticker Recognition | Bingbing Wang et.al. | 2403.05428 | null |
2024-03-08 | FedFMS: Exploring Federated Foundation Models for Medical Image Segmentation | Yuxi Liu et.al. | 2403.05408 | link |
2024-03-08 | Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery | Xavier Bou et.al. | 2403.05381 | link |
2024-03-08 | VLM-PL: Advanced Pseudo Labeling approach Class Incremental Object Detection with Vision-Language Model | Junsu Kim et.al. | 2403.05346 | null |
2024-03-08 | Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings | Wei Zhou et.al. | 2403.05338 | null |
2024-03-08 | ChatASU: Evoking LLM's Reflexion to Truly Understand Aspect Sentiment in Dialogues | Yiding Liu et.al. | 2403.05326 | null |
2024-03-08 | RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | Zihao Wang et.al. | 2403.05313 | null |
2024-03-08 | Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents | Jinyang Li et.al. | 2403.05307 | link |
2024-03-08 | ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications | Sotaro Takeshita et.al. | 2403.05303 | link |
2024-03-08 | Modeling Dynamic (De)Allocations of Local Memory for Translation Validation | Abhishek Rose et.al. | 2403.05302 | null |
2024-03-07 | iScore: Visual Analytics for Interpreting How Language Models Automatically Score Summaries | Adam Coscia et.al. | 2403.04760 | link |
2024-03-07 | KnowledgeVIS: Interpreting Language Models by Comparing Fill-in-the-Blank Prompts | Adam Coscia et.al. | 2403.04758 | link |
2024-03-07 | LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error | Boshi Wang et.al. | 2403.04746 | link |
2024-03-08 | How Far Are We from Intelligent Visual Deductive Reasoning? | Yizhe Zhang et.al. | 2403.04732 | link |
2024-03-07 | Common 7B Language Models Already Possess Strong Math Capabilities | Chen Li et.al. | 2403.04706 | link |
2024-03-07 | ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes | Hashmat Shadab Malik et.al. | 2403.04701 | link |
2024-03-07 | Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification | Ekaterina Fadeeva et.al. | 2403.04696 | link |
2024-03-07 | Telecom Language Models: Must They Be Large? | Nicola Piovesan et.al. | 2403.04666 | null |
2024-03-07 | Yi: Open Foundation Models by 01.AI | 01.AI et.al. | 2403.04652 | link |
2024-03-07 | Teaching Large Language Models to Reason with Reinforcement Learning | Alex Havrilla et.al. | 2403.04642 | null |
2024-03-07 | CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios | Qilang Ye et.al. | 2403.04640 | link |
2024-03-07 | A Detailed Audio-Text Data Simulation Pipeline using Single-Event Sounds | Xuenan Xu et.al. | 2403.04594 | link |
2024-03-07 | Embodied Understanding of Driving Scenarios | Yunsong Zhou et.al. | 2403.04593 | link |
2024-03-07 | Wiki-TabNER: Advancing Table Interpretation Through Named Entity Recognition | Aneta Koleva et.al. | 2403.04577 | link |
2024-03-07 | Reducing self-supervised learning complexity improves weakly-supervised classification performance in computational pathology | Tim Lenz et.al. | 2403.04558 | null |
2024-03-07 | Enhancing Data Quality in Federated Fine-Tuning of Foundation Models | Wanru Zhao et.al. | 2403.04529 | null |
2024-03-07 | Where does In-context Translation Happen in Large Language Models | Suzanna Sia et.al. | 2403.04510 | null |
2024-03-07 | GraphInstruct: Empowering Large Language Models with Graph Understanding and Reasoning Capability | Zihan Luo et.al. | 2403.04483 | link |
2024-03-08 | Do Large Language Model Understand Multi-Intent Spoken Language? | Shangjian Yin et.al. | 2403.04481 | link |
2024-03-08 | Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset | Minjin Kim et.al. | 2403.04460 | link |
2024-03-06 | Backtracing: Retrieving the Cause of the Query | Rose E. Wang et.al. | 2403.03956 | link |
2024-03-06 | Bridging Language and Items for Retrieval and Recommendation | Yupeng Hou et.al. | 2403.03952 | link |
2024-03-06 | The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models | Adithya Bhaskar et.al. | 2403.03942 | link |
2024-03-06 | Did Translation Models Get More Robust Without Anyone Even Noticing? | Ben Peters et.al. | 2403.03923 | null |
2024-03-06 | Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing | Asmita et.al. | 2403.03897 | link |
2024-03-06 | IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators | Indraneil Paul et.al. | 2403.03894 | link |
2024-03-06 | From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models | Luiza Pozzobon et.al. | 2403.03893 | link |
2024-03-06 | FaaF: Facts as a Function for the evaluation of RAG systems | Vasileios Katranidis et.al. | 2403.03888 | link |
2024-03-06 | SaulLM-7B: A pioneering Large Language Model for Law | Pierre Colombo et.al. | 2403.03883 | null |
2024-03-06 | Learning to Decode Collaboratively with Multiple Language Models | Shannon Zejiang Shen et.al. | 2403.03870 | link |
2024-03-06 | On the Origins of Linear Representations in Large Language Models | Yibo Jiang et.al. | 2403.03867 | null |
2024-03-06 | KIWI: A Dataset of Knowledge-Intensive Writing Instructions for Answering Research Questions | Fangyuan Xu et.al. | 2403.03866 | null |
2024-03-06 | Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning | Deepanway Ghosal et.al. | 2403.03864 | link |
2024-03-06 | X-Shot: A Unified System to Handle Frequent, Few-shot and Zero-shot Learning Simultaneously in Classification | Hanzi Xu et.al. | 2403.03863 | link |
2024-03-06 | Designing Informative Metrics for Few-Shot Example Selection | Rishabh Adiga et.al. | 2403.03861 | null |
2024-03-06 | Emojinize: Enriching Any Text with Emoji Translations | Lars Henning Klein et.al. | 2403.03857 | null |
2024-03-06 | ShortGPT: Layers in Large Language Models are More Redundant Than You Expect | Xin Men et.al. | 2403.03853 | null |
2024-03-06 | Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ | Carolin Holtermann et.al. | 2403.03814 | link |
2024-03-06 | Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery | Wei Zhang et.al. | 2403.03790 | null |
2024-03-06 | PPTC-R benchmark: Towards Evaluating the Robustness of Large Language Models for PowerPoint Task Completion | Zekai Zhang et.al. | 2403.03788 | link |
2024-03-05 | The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | Nathaniel Li et.al. | 2403.03218 | null |
2024-03-05 | CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments | Savitha Sam Abraham et.al. | 2403.03203 | null |
2024-03-05 | Towards Democratized Flood Risk Management: An Advanced AI Assistant Enabled by GPT-4 for Enhanced Interpretability and Public Engagement | Rafaela Martelo et.al. | 2403.03188 | link |
2024-03-05 | Reliable, Adaptable, and Attributable Language Models with Retrieval | Akari Asai et.al. | 2403.03187 | null |
2024-03-05 | MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting | Fangchen Liu et.al. | 2403.03174 | null |
2024-03-05 | SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | Peng Qi et.al. | 2403.03170 | null |
2024-03-05 | PARADISE: Evaluating Implicit Planning Skills of Language Models with Procedural Warnings and Tips Dataset | Arda Uzunoğlu et.al. | 2403.03167 | link |
2024-03-05 | Quantum Many-Body Physics Calculations with Large Language Models | Haining Pan et.al. | 2403.03154 | null |
2024-03-05 | Language Guided Exploration for RL Agents in Text Environments | Hitesh Golchha et.al. | 2403.03141 | null |
2024-03-05 | CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following | Kaiyan Zhang et.al. | 2403.03129 | null |
2024-03-05 | Angry Men, Sad Women: Large Language Models Reflect Gendered Stereotypes in Emotion Attribution | Flor Miriam Plaza-del-Arco et.al. | 2403.03121 | link |
2024-03-05 | "In Dialogues We Learn": Towards Personalized Dialogue Without Pre-defined Profiles through In-Dialogue Learning | Chuanqi Cheng et.al. | 2403.03102 | null |
2024-03-05 | KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | Yuqi Zhu et.al. | 2403.03101 | link |
2024-03-05 | Learning to Use Tools via Cooperative and Interactive Agents | Zhengliang Shi et.al. | 2403.03031 | link |
2024-03-05 | Socratic Reasoning Improves Positive Text Rewriting | Anmol Goel et.al. | 2403.03029 | null |
2024-03-05 | Word Importance Explains How Prompts Affect Language Model Outputs | Stefan Hackmann et.al. | 2403.03028 | null |
2024-03-05 | OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following | Haochen Shi et.al. | 2403.03017 | null |
2024-03-05 | Knowledge Graphs as Context Sources for LLM-Based Explanations of Learning Recommendations | Hasan Abu-Rasheed et.al. | 2403.03008 | null |
2024-03-05 | Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models | Gen Luo et.al. | 2403.03003 | link |
2024-03-05 | Localized Zeroth-Order Prompt Optimization | Wenyang Hu et.al. | 2403.02993 | null |
2024-03-02 | LM4OPT: Unveiling the Potential of Large Language Models in Formulating Mathematical Optimization Problems | Tasnim Ahmed et.al. | 2403.01342 | null |
2024-03-02 | Making Hybrid Languages: A Recipe | Leif Andersen et.al. | 2403.01335 | null |
2024-03-02 | Chaining thoughts and LLMs to learn DNA structural biophysics | Tyler D. Ross et.al. | 2403.01332 | link |
2024-03-02 | VBART: The Turkish LLM | Meliksah Turker et.al. | 2403.01308 | null |
2024-03-02 | ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation | Moran Yanuka et.al. | 2403.01306 | link |
2024-03-02 | Improving the Validity of Automatically Generated Feedback via Reinforcement Learning | Alexander Scarlatos et.al. | 2403.01304 | link |
2024-03-02 | NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention | Tianyi Zhang et.al. | 2403.01273 | link |
2024-03-02 | Employing LLMs for Incident Response Planning and Review | Sam Hays et.al. | 2403.01271 | null |
2024-03-02 | Dissecting Language Models: Machine Unlearning via Selective Pruning | Nicholas Pochinkov et.al. | 2403.01267 | link |
2024-03-02 | Accelerating Greedy Coordinate Gradient via Probe Sampling | Yiran Zhao et.al. | 2403.01251 | link |
2024-03-02 | SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code | Ziniu Hu et.al. | 2403.01248 | null |
2024-03-02 | Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal | Jianheng Huang et.al. | 2403.01244 | link |
2024-03-02 | IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact | Ruikang Liu et.al. | 2403.01241 | link |
2024-03-02 | Inexact Unlearning Needs More Careful Evaluations to Avoid a False Sense of Privacy | Jamie Hayes et.al. | 2403.01218 | null |
2024-03-02 | API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access | Jiayuan Su et.al. | 2403.01216 | null |
2024-03-02 | Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning | Shuo Yang et.al. | 2403.01209 | null |
2024-03-02 | The Case for Animal-Friendly AI | Sankalpa Ghose et.al. | 2403.01199 | null |
2024-03-02 | DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling | Shanghaoran Quan et.al. | 2403.01197 | link |
2024-03-02 | RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots | Philip Feldman et.al. | 2403.01193 | null |
2024-03-02 | Balancing Exploration and Exploitation in LLM using Soft RLLF for Enhanced Negation Understanding | Ha-Thanh Nguyen et.al. | 2403.01185 | null |
2024-02-29 | The Counterfeit Conundrum: Can Code Language Models Grasp the Nuances of Their Incorrect Generations? | Alex Gu et.al. | 2402.19475 | null |
2024-02-29 | The All-Seeing Project V2: Towards General Relation Comprehension of the Open World | Weiyun Wang et.al. | 2402.19474 | link |
2024-02-29 | Retrieval-Augmented Generation for AI-Generated Content: A Survey | Penghao Zhao et.al. | 2402.19473 | link |
2024-02-29 | Loose LIPS Sink Ships: Asking Questions in Battleship with Language-Informed Program Sampling | Gabriel Grand et.al. | 2402.19471 | null |
2024-03-01 | TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning | Kate Sanders et.al. | 2402.19467 | null |
2024-02-29 | Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models | Chen Qian et.al. | 2402.19465 | link |
2024-02-29 | Curiosity-driven Red-teaming for Large Language Models | Zhang-Wei Hong et.al. | 2402.19464 | link |
2024-02-29 | Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | Saurabh Srivastava et.al. | 2402.19450 | link |
2024-02-29 | Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models | Frederik Kunstner et.al. | 2402.19449 | null |
2024-02-29 | ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL | Yifei Zhou et.al. | 2402.19446 | link |
2024-02-29 | Pushing the Limits of Cross-Embodiment Learning for Manipulation and Navigation | Jonathan Yang et.al. | 2402.19432 | null |
2024-02-29 | Compositional API Recommendation for Library-Oriented Code Generation | Zexiong Ma et.al. | 2402.19431 | null |
2024-02-29 | Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models | Soham De et.al. | 2402.19427 | null |
2024-02-29 | Crafting Knowledge: Exploring the Creative Mechanisms of Chat-Based Search Engines | Lijia Ma et.al. | 2402.19421 | null |
2024-02-29 | PaECTER: Patent-level Representation Learning using Citation-informed Transformers | Mainak Ghosh et.al. | 2402.19411 | null |
2024-02-29 | On the Scaling Laws of Geographical Representation in Language Models | Nathan Godey et.al. | 2402.19406 | null |
2024-02-29 | Entity-Aware Multimodal Alignment Framework for News Image Captioning | Junzhe Zhang et.al. | 2402.19404 | null |
2024-02-29 | Wisdom of the Silicon Crowd: LLM Ensemble Prediction Capabilities Match Human Crowd Accuracy | Philipp Schoenegger et.al. | 2402.19379 | null |
2024-02-29 | OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models | Jenish Maharjan et.al. | 2402.19371 | null |
2024-02-29 | SoK: Exploring the Potential of Large Language Models for Improving Digital Forensic Investigation Efficiency | Akila Wickramasekara et.al. | 2402.19366 | null |
2024-02-28 | Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards | Haoxiang Wang et.al. | 2402.18571 | link |
2024-02-28 | Diffusion Language Models Are Versatile Protein Learners | Xinyou Wang et.al. | 2402.18567 | null |
2024-02-28 | A Categorization of Complexity Classes for Information Retrieval and Synthesis Using Natural Logic | Gregory Coppola et.al. | 2402.18566 | null |
2024-02-28 | Approaching Human-Level Forecasting with Language Models | Danny Halawi et.al. | 2402.18563 | null |
2024-02-28 | Implicit Bias of Next-Token Prediction | Christos Thrampoulidis et.al. | 2402.18551 | null |
2024-02-28 | Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling | Mahdi Karami et.al. | 2402.18508 | null |
2024-02-28 | Few-Shot Fairness: Unveiling LLM's Potential for Fairness-Aware Classification | Garima Chhikara et.al. | 2402.18502 | null |
2024-02-28 | Language Models Represent Beliefs of Self and Others | Wentao Zhu et.al. | 2402.18496 | null |
2024-02-28 | IBD: Alleviating Hallucinations in Large Vision-Language Models via Image-Biased Decoding | Lanyun Zhu et.al. | 2402.18476 | null |
2024-02-28 | Meta-Task Prompting Elicits Embedding from Large Language Models | Yibin Lei et.al. | 2402.18458 | null |
2024-02-28 | Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization | Deng Li et.al. | 2402.18447 | null |
2024-02-28 | Beyond Natural Language: LLMs Leveraging Alternative Formats for Enhanced Reasoning and Communication | Weize Chen et.al. | 2402.18439 | link |
2024-02-28 | A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision Language Models | Xiujie Song et.al. | 2402.18409 | link |
2024-02-28 | Balanced Similarity with Auxiliary Prompts: Towards Alleviating Text-to-Image Retrieval Bias for CLIP in Zero-shot Learning | Hanyao Wang et.al. | 2402.18400 | null |
2024-02-28 | Decomposed Prompting: Unveiling Multilingual Linguistic Structure Knowledge in English-Centric Large Language Models | Ercong Nie et.al. | 2402.18397 | null |
2024-02-28 | The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA | Yiming Li et.al. | 2402.18385 | link |
2024-02-28 | Large Language Models As Evolution Strategies | Robert Tjarko Lange et.al. | 2402.18381 | null |
2024-02-28 | Tokenization Is More Than Compression | Craig W. Schmidt et.al. | 2402.18376 | null |
2024-02-28 | VerifiNER: Verification-augmented NER via Knowledge-grounded Reasoning with Large Language Models | Seoyeon Kim et.al. | 2402.18374 | link |
2024-02-28 | Focus on Your Question! Interpreting and Mitigating Toxic CoT Problems in Commonsense Reasoning | Jiachun Li et.al. | 2402.18344 | link |
2024-02-27 | ShapeLLM: Universal 3D Object Understanding for Embodied Interaction | Zekun Qi et.al. | 2402.17766 | link |
2024-02-27 | The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | Shuming Ma et.al. | 2402.17764 | null |
2024-02-27 | Massive Activations in Large Language Models | Mingjie Sun et.al. | 2402.17762 | link |
2024-02-27 | Towards Optimal Learning of Language Models | Yuxian Gu et.al. | 2402.17759 | null |
2024-02-27 | Evaluating Very Long-Term Conversational Memory of LLM Agents | Adyasha Maharana et.al. | 2402.17753 | null |
2024-02-27 | Tower: An Open Multilingual Large Language Model for Translation-Related Tasks | Duarte M. Alves et.al. | 2402.17733 | link |
2024-02-27 | AmbigNLG: Addressing Task Ambiguity in Instruction for NLG | Ayana Niwa et.al. | 2402.17717 | null |
2024-02-27 | Case-Based or Rule-Based: How Do Transformers Do the Math? | Yi Hu et.al. | 2402.17709 | link |
2024-02-27 | RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations | Jing Huang et.al. | 2402.17700 | link |
2024-02-27 | NextLevelBERT: Investigating Masked Language Modeling with Higher-Level Representations for Long Documents | Tamara Czinczoll et.al. | 2402.17682 | link |
2024-02-27 | The Emergence of Large Language Models in Static Analysis: A First Look through Micro-Benchmarks | Ashwin Prasad Shivarpatna Venkatesh et.al. | 2402.17679 | null |
2024-02-27 | CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention | Mohammad Sadil Khan et.al. | 2402.17678 | null |
2024-02-27 | Securing Reliability: A Brief Overview on Enhancing In-Context Learning for Foundation Models | Yunpeng Huang et.al. | 2402.17671 | null |
2024-02-27 | Beyond prompt brittleness: Evaluating the reliability and consistency of political worldviews in LLMs | Tanise Ceron et.al. | 2402.17649 | null |
2024-02-27 | SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation | Shuangrui Ding et.al. | 2402.17645 | link |
2024-02-27 | Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data | Xiao Liu et.al. | 2402.17644 | link |
2024-02-27 | Variational Learning is Effective for Large Deep Networks | Yuesong Shen et.al. | 2402.17641 | link |
2024-02-27 | Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling | David S. W. Williams et.al. | 2402.17622 | null |
2024-02-27 | Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization | Wenqi Zhang et.al. | 2402.17574 | link |
2024-02-27 | Unleashing the Potential of Large Language Models as Prompt Optimizers: An Analogical Analysis with Gradient-based Model Optimizers | Xinyu Tang et.al. | 2402.17564 | link |
2024-02-26 | Integrating Large Language Models with Graphical Session-Based Recommendation | Naicheng Guo et.al. | 2402.16539 | null |
2024-02-26 | LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Junzhe Chen et.al. | 2402.16499 | link |
2024-02-26 | On Languaging a Simulation Engine | Han Liu et.al. | 2402.16482 | null |
2024-02-26 | Unveiling ChatGPT's Usage in Open Source Projects: A Mining-based Study | Rosalia Tufano et.al. | 2402.16480 | null |
2024-02-26 | mEdIT: Multilingual Text Editing via Instruction Tuning | Vipul Raheja et.al. | 2402.16472 | link |
2024-02-26 | Unveiling Vulnerability of Self-Attention | Khai Jiet Liong et.al. | 2402.16470 | link |
2024-02-26 | Defending LLMs against Jailbreaking Attacks via Backtranslation | Yihan Wang et.al. | 2402.16459 | link |
2024-02-26 | ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing | Liuzhenghao Lv et.al. | 2402.16445 | link |
2024-02-26 | ShieldLM: Empowering LLMs as Aligned, Customizable and Explainable Safety Detectors | Zhexin Zhang et.al. | 2402.16444 | link |
2024-02-26 | Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models | Tianyi Tang et.al. | 2402.16438 | null |
2024-02-26 | RoCoIns: Enhancing Robustness of Large Language Models through Code-Style Instructions | Yuansen Zhang et.al. | 2402.16431 | null |
2024-02-26 | Predicting Sustainable Development Goals Using Course Descriptions -- from LLMs to Conventional Foundation Models | Lev Kharlashkin et.al. | 2402.16420 | null |
2024-02-26 | From RAGs to riches: Using large language models to write documents for clinical trials | Nigel Markey et.al. | 2402.16406 | null |
2024-02-26 | MoZIP: A Multilingual Benchmark to Evaluate Large Language Models in Intellectual Property | Shiwen Ni et.al. | 2402.16389 | link |
2024-02-26 | Immunization against harmful fine-tuning attacks | Domenic Rosati et.al. | 2402.16382 | null |
2024-02-26 | Improving LLM-based Machine Translation with Systematic Self-Correction | Zhaopeng Feng et.al. | 2402.16379 | link |
2024-02-26 | Unraveling Babel: Exploring Multilingual Activation Patterns within Large Language Models | Weize Liu et.al. | 2402.16367 | null |
2024-02-26 | LLM Inference Unveiled: Survey and Roofline Model Insights | Zhihang Yuan et.al. | 2402.16363 | link |
2024-02-26 | Layer-wise Regularized Dropout for Neural Language Models | Shiwen Ni et.al. | 2402.16361 | null |
2024-02-26 | An Integrated Data Processing Framework for Pretraining Foundation Models | Yiding Sun et.al. | 2402.16358 | link |
2024-02-26 | Language-guided Skill Learning with Temporal Variational Inference | Haotian Fu et.al. | 2402.16354 | null |
2024-02-23 | AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning | Jianguo Zhang et.al. | 2402.15506 | link |
2024-02-23 | API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs | Kinjal Basu et.al. | 2402.15491 | link |
2024-02-23 | Prejudice and Caprice: A Statistical Framework for Measuring Social Discrimination in Large Language Models | Yiran Liu et.al. | 2402.15481 | null |
2024-02-23 | Leveraging Domain Knowledge for Efficient Reward Modelling in RLHF: A Case-Study in E-Commerce Opinion Summarization | Swaroop Nath et.al. | 2402.15473 | link |
2024-02-23 | Repetition Improves Language Model Embeddings | Jacob Mitchell Springer et.al. | 2402.15449 | link |
2024-02-23 | A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models | Stefan Hegselmann et.al. | 2402.15422 | link |
2024-02-23 | PREDILECT: Preferences Delineated with Zero-Shot Language-based Reasoning in Reinforcement Learning | Simon Holk et.al. | 2402.15420 | null |
2024-02-23 | Does Combining Parameter-efficient Modules Improve Few-shot Transfer Accuracy? | Nader Asadi et.al. | 2402.15414 | null |
2024-02-23 | Grasp, See and Place: Efficient Unknown Object Rearrangement with Policy Structure Prior | Kechun Xu et.al. | 2402.15402 | link |
2024-02-23 | Explorations of Self-Repair in Language Models | Cody Rushing et.al. | 2402.15390 | link |
2024-02-23 | Safe Task Planning for Language-Instructed Multi-Robot Systems using Conformal Prediction | Jun Wang et.al. | 2402.15368 | null |
2024-02-23 | Farsight: Fostering Responsible AI Awareness During AI Application Prototyping | Zijie J. Wang et.al. | 2402.15350 | link |
2024-02-23 | NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data | Sergei Bogdanov et.al. | 2402.15343 | link |
2024-02-23 | Ranking Entities along Conceptual Space Dimensions with LLMs: An Analysis of Fine-Tuning Strategies | Nitesh Kumar et.al. | 2402.15337 | null |
2024-02-23 | GPTVQ: The Blessing of Dimensionality for LLM Quantization | Mart van Baalen et.al. | 2402.15319 | null |
2024-02-23 | ArabianGPT: Native Arabic GPT-based Large Language | Anis Koubaa et.al. | 2402.15313 | null |
2024-02-23 | Counterfactual Generation with Identifiability Guarantees | Hanqi Yan et.al. | 2402.15309 | link |
2024-02-23 | Representing Online Handwriting for Recognition in Large Vision-Language Models | Anastasiia Fadeeva et.al. | 2402.15307 | null |
2024-02-23 | How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries | Somnath Banerjee et.al. | 2402.15302 | link |
2024-02-23 | Causal Graph Discovery with Retrieval-Augmented Generation based Large Language Models | Yuzhe Zhang et.al. | 2402.15301 | null |
2024-02-22 | PALO: A Polyglot Large Multimodal Model for 5B People | Muhammad Maaz et.al. | 2402.14818 | link |
2024-02-22 | Demographic Bias of Expert-Level Vision-Language Foundation Models in Medical Imaging | Yuzhe Yang et.al. | 2402.14815 | link |
2024-02-22 | WeakSAM: Segment Anything Meets Weakly-supervised Instance-level Recognition | Lianghui Zhu et.al. | 2402.14812 | link |
2024-02-22 | Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking | Nikhil Prakash et.al. | 2402.14811 | null |
2024-02-22 | CriticBench: Benchmarking LLMs for Critique-Correct Reasoning | Zicheng Lin et.al. | 2402.14809 | link |
2024-02-22 | RelayAttention for Efficient Large Language Model Serving with Long System Prompts | Lei Zhu et.al. | 2402.14808 | link |
2024-02-22 | A Decision-Language Model (DLM) for Dynamic Restless Multi-Armed Bandit Tasks in Public Health | Nikhil Behari et.al. | 2402.14807 | null |
2024-02-22 | Identifying Multiple Personalities in Large Language Models with External Evaluation | Xiaoyang Song et.al. | 2402.14805 | null |
2024-02-22 | Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Xudong Lu et.al. | 2402.14800 | link |
2024-02-22 | Enhancing Systematic Decompositional Natural Language Inference Using Informal Logic | Nathaniel Weir et.al. | 2402.14798 | null |
2024-02-22 | Zero-shot cross-lingual transfer in instruction tuning of large language model | Nadezhda Chirkova et.al. | 2402.14778 | null |
2024-02-22 | 2D Matryoshka Sentence Embeddings | Xianming Li et.al. | 2402.14776 | link |
2024-02-22 | DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models | Yuhang Cao et.al. | 2402.14767 | link |
2024-02-22 | MT-Bench-101: A Fine-Grained Benchmark for Evaluating Large Language Models in Multi-Turn Dialogues | Ge Bai et.al. | 2402.14762 | link |
2024-02-22 | Generalizing Reward Modeling for Out-of-Distribution Preference Learning | Chen Jia et.al. | 2402.14760 | link |
2024-02-22 | Large Language Models as Urban Residents: An LLM Agent Framework for Personal Mobility Generation | Jiawei Wang et.al. | 2402.14744 | link |
2024-02-22 | Dependency Annotation of Ottoman Turkish with Multilingual BERT | Şaziye Betül Özateş et.al. | 2402.14743 | null |
2024-02-22 | Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs | Arash Ahmadian et.al. | 2402.14740 | null |
2024-02-22 | Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models | Seungduk Kim et.al. | 2402.14714 | link |
2024-02-22 | IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus | Honghao Gui et.al. | 2402.14710 | link |
2024-02-21 | Coercing LLMs to do and reveal (almost) anything | Jonas Geiping et.al. | 2402.14020 | link |
2024-02-21 | Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment | Vyas Raina et.al. | 2402.14016 | link |
2024-02-21 | OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems | Chaoqun He et.al. | 2402.14008 | link |
2024-02-21 | Can Watermarks Survive Translation? On the Cross-lingual Consistency of Text Watermark for Large Language Models | Zhiwei He et.al. | 2402.14007 | link |
2024-02-21 | Hallucinations or Attention Misdirection? The Path to Strategic Value Extraction in Business Using Large Language Models | Aline Ioste et.al. | 2402.14002 | null |
2024-02-21 | Analysing The Impact of Sequence Composition on Language Model Pre-Training | Yu Zhao et.al. | 2402.13991 | link |
2024-02-21 | Towards Building Multilingual Language Model for Medicine | Pengcheng Qiu et.al. | 2402.13963 | link |
2024-02-21 | Measuring Social Biases in Masked Language Models by Proxy of Prediction Quality | Rahul Zalkikar et.al. | 2402.13954 | null |
2024-02-21 | Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning | Debjit Paul et.al. | 2402.13950 | null |
2024-02-21 | Do Efficient Transformers Really Save Computation? | Kai Yang et.al. | 2402.13934 | null |
2024-02-21 | Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content | Federico Bianchi et.al. | 2402.13926 | null |
2024-02-21 | SYNFAC-EDIT: Synthetic Imitation Edit Feedback for Factual Alignment in Clinical Summarization | Prakamya Mishra et.al. | 2402.13919 | link |
2024-02-21 | What Linguistic Features and Languages are Important in LLM Translation? | Ryandito Diandaru et.al. | 2402.13917 | null |
2024-02-21 | Calibrating Large Language Models with Sample Consistency | Qing Lyu et.al. | 2402.13904 | null |
2024-02-21 | Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | Chenyang Lyu et.al. | 2402.13887 | null |
2024-02-21 | | Haoyu Liu et.al. | 2402.13874 | link |
2024-02-21 | An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach | Mohammad Amaz Uddin et.al. | 2402.13871 | null |
2024-02-21 | Kuaiji: the First Chinese Accounting Large Language Model | Jiayuan Luo et.al. | 2402.13866 | null |
2024-02-21 | RealDex: Towards Human-like Grasping for Robotic Dexterous Hand | Yumeng Liu et.al. | 2402.13853 | null |
2024-02-21 | VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models | Jiawei Liang et.al. | 2402.13851 | null |
2024-02-20 | Towards audio language modeling -- an overview | Haibin Wu et.al. | 2402.13236 | null |
2024-02-20 | Unlocking Insights: Semantic Search in Jupyter Notebooks | Lan Li et.al. | 2402.13234 | null |
2024-02-20 | A Touch, Vision, and Language Dataset for Multimodal Alignment | Letian Fu et.al. | 2402.13232 | link |
2024-02-20 | Investigating Cultural Alignment of Large Language Models | Badr AlKhamissi et.al. | 2402.13231 | link |
2024-02-20 | Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive | Arka Pal et.al. | 2402.13228 | link |
2024-02-20 | AgentMD: Empowering Language Agents for Risk Prediction with Large-Scale Clinical Tool Learning | Qiao Jin et.al. | 2402.13225 | null |
2024-02-20 | RoCode: A Dataset for Measuring Code Intelligence from Problem Definitions in Romanian | Adrian Cosma et.al. | 2402.13222 | link |
2024-02-20 | How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts | Yusu Qian et.al. | 2402.13220 | null |
2024-02-20 | Softmax Probabilities (Mostly) Predict Large Language Model Correctness on Multiple-Choice Q&A | Benjamin Plaut et.al. | 2402.13213 | link |
2024-02-20 | Soft Self-Consistency Improves Language Model Agents | Han Wang et.al. | 2402.13212 | link |
2024-02-20 | Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation | Dongjin Kang et.al. | 2402.13211 | null |
2024-02-20 | Bayesian Reward Models for LLM Alignment | Adam X. Yang et.al. | 2402.13210 | null |
2024-02-20 | How do Hyenas deal with Human Speech? Speech Recognition and Translation with ConfHyena | Marco Gaido et.al. | 2402.13208 | link |
2024-02-20 | Question Calibration and Multi-Hop Modeling for Temporal Question Answering | Chao Xue et.al. | 2402.13188 | null |
2024-02-20 | What if LLMs Have Different World Views: Simulating Alien Civilizations with LLM-based Agents | Mingyu Jin et.al. | 2402.13184 | null |
2024-02-20 | DINOBot: Robot Manipulation via Retrieval and Alignment with Vision Foundation Models | Norman Di Palo et.al. | 2402.13181 | null |
2024-02-20 | Benchmarking Retrieval-Augmented Generation for Medicine | Guangzhi Xiong et.al. | 2402.13178 | link |
2024-02-20 | Defending Jailbreak Prompts via In-Context Adversarial Game | Yujun Zhou et.al. | 2402.13148 | null |
2024-02-20 | OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog | Adnen Abdessaied et.al. | 2402.13146 | null |
2024-02-20 | The Hidden Space of Transformer Language Adapters | Jesujoba O. Alabi et.al. | 2402.13137 | link |
2024-02-19 | Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | Zhuoming Chen et.al. | 2402.12374 | link |
2024-02-19 | AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies | Xiao Ye et.al. | 2402.12370 | link |
2024-02-19 | A Critical Evaluation of AI Feedback for Aligning Large Language Models | Archit Sharma et.al. | 2402.12366 | link |
2024-02-19 | Emergent Word Order Universals from Cognitively-Motivated Language Models | Tatsuki Kuribayashi et.al. | 2402.12363 | link |
2024-02-19 | Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge | Julien Delile et.al. | 2402.12352 | null |
2024-02-19 | GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations | Jinhao Duan et.al. | 2402.12348 | link |
2024-02-19 | Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! | Zhanhui Zhou et.al. | 2402.12343 | link |
2024-02-19 | Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models | Christian Schlarmann et.al. | 2402.12336 | link |
2024-02-19 | Query-Based Adversarial Prompt Generation | Jonathan Hayase et.al. | 2402.12329 | null |
2024-02-19 | Shall We Talk: Exploring Spontaneous Collaborations of Competing LLM Agents | Zengqing Wu et.al. | 2402.12327 | link |
2024-02-19 | ARKS: Active Retrieval in Knowledge Soup for Code Generation | Hongjin Su et.al. | 2402.12317 | link |
2024-02-19 | Is Open-Source There Yet? A Comparative Study on Commercial and Open-Source LLMs in Their Ability to Label Chest X-Ray Reports | Felix J. Dorfner et.al. | 2402.12298 | null |
2024-02-19 | KARL: Knowledge-Aware Retrieval and Representations aid Retention and Learning in Students | Matthew Shu et.al. | 2402.12291 | null |
2024-02-19 | DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | Xiaoyu Tian et.al. | 2402.12289 | null |
2024-02-19 | Adaptive Skeleton Graph Decoding | Shuowei Jin et.al. | 2402.12280 | null |
2024-02-19 | Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks | Nadezhda Chirkova et.al. | 2402.12279 | null |
2024-02-19 | Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from Large Language Models | Puxuan Yu et.al. | 2402.12276 | link |
2024-02-19 | High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models | Michela Lorandi et.al. | 2402.12267 | link |
2024-02-19 | Uncertainty quantification in fine-tuned LLMs using LoRA ensembles | Oleksandr Balabanov et.al. | 2402.12264 | null |
2024-02-19 | NEO-BENCH: Evaluating Robustness of Large Language Models with Neologisms | Jonathan Zheng et.al. | 2402.12261 | null |
2024-02-16 | PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter | Junfei Xiao et.al. | 2402.10896 | null |
2024-02-16 | RLVF: Learning from Verbal Feedback without Overgeneralization | Moritz Stephan et.al. | 2402.10893 | link |
2024-02-16 | Instruction Diversity Drives Generalization To Unseen Tasks | Dylan Zhang et.al. | 2402.10891 | null |
2024-02-16 | When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | Ziru Chen et.al. | 2402.10890 | link |
2024-02-16 | Multi-modal preference alignment remedies regression of visual instruction tuning on language model | Shengzhi Li et.al. | 2402.10884 | link |
2024-02-16 | EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models | Muhammad Shihab Rashid et.al. | 2402.10866 | link |
2024-02-16 | Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities | Mingyu Jin et.al. | 2402.10835 | null |
2024-02-16 | RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model | Jianhao Yuan et.al. | 2402.10828 | null |
2024-02-16 | Quantifying the Persona Effect in LLM Simulations | Tiancheng Hu et.al. | 2402.10811 | null |
2024-02-16 | Generative Cross-Modal Retrieval: Memorizing Images in Multimodal Language Models for Retrieval and Beyond | Yongqi Li et.al. | 2402.10805 | null |
2024-02-16 | EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge | Xuan Shen et.al. | 2402.10787 | link |
2024-02-16 | A Condensed Transition Graph Framework for Zero-shot Link Prediction with Large Language Models | Mingchen Li et.al. | 2402.10779 | null |
2024-02-16 | AutoGPT+P: Affordance-based Task Planning with Large Language Models | Timo Birr et.al. | 2402.10778 | null |
2024-02-16 | How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs? | Ehsan Doostmohammadi et.al. | 2402.10770 | null |
2024-02-16 | Distillation Enhanced Generative Retrieval | Yongqi Li et.al. | 2402.10769 | null |
2024-02-16 | Inference to the Best Explanation in Large Language Models | Dhairya Dalal et.al. | 2402.10767 | null |
2024-02-16 | When Dataflow Analysis Meets Large Language Models | Chengpeng Wang et.al. | 2402.10754 | null |
2024-02-16 | ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages | Junjie Ye et.al. | 2402.10753 | link |
2024-02-16 | GenRES: Rethinking Evaluation for Generative Relation Extraction in the Era of Large Language Models | Pengcheng Jiang et.al. | 2402.10744 | link |
2024-02-16 | Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning | Yinpeng Liu et.al. | 2402.10738 | link |
2024-02-15 | Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation | Huizhuo Yuan et.al. | 2402.10210 | null |
2024-02-15 | Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment | Rui Yang et.al. | 2402.10207 | link |
2024-02-15 | Chain-of-Thought Reasoning Without Prompting | Xuezhi Wang et.al. | 2402.10200 | null |
2024-02-15 | A Trembling House of Cards? Mapping Adversarial Attacks against Language Agents | Lingbo Mo et.al. | 2402.10196 | link |
2024-02-15 | BitDelta: Your Fine-Tune May Only Be Worth One Bit | James Liu et.al. | 2402.10193 | link |
2024-02-15 | Uncertainty Decomposition and Quantification for In-Context Learning of Large Language Models | Chen Ling et.al. | 2402.10189 | link |
2024-02-15 | Rethinking Information Structures in RLHF: Reward Generalization from a Graph Theory Perspective | Tianyi Qiu et.al. | 2402.10184 | null |
2024-02-15 | TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation | Yaoxiang Wang et.al. | 2402.10178 | null |
2024-02-15 | OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | Shubham Toshniwal et.al. | 2402.10176 | link |
2024-02-15 | Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence | Yinhong Liu et.al. | 2402.10175 | link |
2024-02-15 | OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models | Ali AhmadiTeshnizi et.al. | 2402.10172 | null |
2024-02-15 | Data Engineering for Scaling Language Models to 128K Context | Yao Fu et.al. | 2402.10171 | link |
2024-02-15 | Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients | Mahyar Abbasian et.al. | 2402.10153 | null |
2024-02-15 | ControlLM: Crafting Diverse Personalities for Language Models | Yixuan Weng et.al. | 2402.10151 | link |
2024-02-15 | TOAD: Task-Oriented Automatic Dialogs with Diverse Response Styles | Yinhong Liu et.al. | 2402.10137 | null |
2024-02-15 | Zero-Shot Reasoning: Personalized Content Generation Without the Cold Start Problem | Davor Hafnar et.al. | 2402.10133 | link |
2024-02-15 | Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning | Ming Li et.al. | 2402.10110 | link |
2024-02-15 | Quantized Embedding Vectors for Controllable Diffusion Language Models | Cheng Kang et.al. | 2402.10107 | null |
2024-02-15 | GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-Solving | Jiaxin Zhang et.al. | 2402.10104 | link |
2024-02-15 | Any-Shift Prompting for Generalization over Distributions | Zehao Xiao et.al. | 2402.10099 | null |
2024-02-14 | AQA-Bench: An Interactive Benchmark for Evaluating LLMs' Sequential Reasoning Ability | Siwei Yang et.al. | 2402.09404 | link |
2024-02-14 | Reinforcement Learning from Human Feedback with Active Queries | Kaixuan Ji et.al. | 2402.09401 | null |
2024-02-14 | Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference | Harry Dong et.al. | 2402.09398 | link |
2024-02-14 | LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset | Botao Yu et.al. | 2402.09391 | link |
2024-02-14 | HGOT: Hierarchical Graph of Thoughts for Retrieval-Augmented In-Context Learning in Factuality Evaluation | Yihao Fang et.al. | 2402.09390 | link |
2024-02-14 | Transformers Can Achieve Length Generalization But Not Robustly | Yongchao Zhou et.al. | 2402.09371 | null |
2024-02-14 | Pseudorandom Error-Correcting Codes | Miranda Christ et.al. | 2402.09370 | null |
2024-02-14 | Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking | Yi Fung et.al. | 2402.09369 | link |
2024-02-14 | Copyright Traps for Large Language Models | Matthieu Meeus et.al. | 2402.09363 | link |
2024-02-14 | HiRE: High Recall Approximate Top- | Yashas Samaga B L et.al. | 2402.09360 | null |
2024-02-14 | Developing a Framework for Auditing Large Language Models Using Human-in-the-Loop | Maryam Amirizaniani et.al. | 2402.09346 | null |
2024-02-14 | Mitigating Reward Hacking via Information-Theoretic Reward Modeling | Yuchun Miao et.al. | 2402.09345 | null |
2024-02-14 | AuditLLM: A Tool for Auditing Large Language Models Using Multiprobe Approach | Maryam Amirizaniani et.al. | 2402.09334 | null |
2024-02-14 | ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization | Feifan Song et.al. | 2402.09320 | link |
2024-02-14 | Embracing the black box: Heading towards foundation models for causal discovery from time series data | Gideon Stein et.al. | 2402.09305 | link |
2024-02-14 | Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code | Vahid Majdinasab et.al. | 2402.09299 | link |
2024-02-14 | Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey | Zhichen Dong et.al. | 2402.09283 | link |
2024-02-14 | Leveraging Large Language Models for Enhanced NLP Task Performance through Knowledge Distillation and Optimized Training Strategies | Yining Huang et.al. | 2402.09282 | null |
2024-02-14 | Personalized Large Language Models | Stanisław Woźniak et.al. | 2402.09269 | null |
2024-02-14 | Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation | Xiaoying Zhang et.al. | 2402.09267 | null |
2024-02-13 | Mitigating Object Hallucination in Large Vision-Language Models via Classifier-Free Guidance | Linxi Zhao et.al. | 2402.08680 | null |
2024-02-13 | COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability | Xingang Guo et.al. | 2402.08679 | link |
2024-02-13 | Human Curriculum Effects Emerge with In-Context Learning in Neural Networks | Jacob Russin et.al. | 2402.08674 | null |
2024-02-13 | Rec-GPT4V: Multimodal Recommendation with Large Vision-Language Models | Yuqing Liu et.al. | 2402.08670 | null |
2024-02-13 | Improving Generalization in Semantic Parsing by Increasing Natural Language Variation | Irina Saparina et.al. | 2402.08666 | link |
2024-02-13 | The Last JITAI? The Unreasonable Effectiveness of Large Language Models in Issuing Just-in-Time Adaptive Interventions: Fostering Physical Activity in a Prospective Cardiac Rehabilitation Setting | David Haag et.al. | 2402.08658 | null |
2024-02-13 | PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs | Michael Dorkenwald et.al. | 2402.08657 | null |
2024-02-13 | Tandem Transformers for Inference Efficient LLMs | Aishwarya P S et.al. | 2402.08644 | null |
2024-02-13 | SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages | Nedjma Ousidhoum et.al. | 2402.08638 | null |
2024-02-13 | Knowledge Editing on Black-box Large Language Models | Xiaoshuai Song et.al. | 2402.08631 | link |
2024-02-13 | Bayesian Multi-Task Transfer Learning for Soft Prompt Tuning | Haeju Lee et.al. | 2402.08594 | link |
2024-02-13 | Test-Time Backdoor Attacks on Multimodal Large Language Models | Dong Lu et.al. | 2402.08577 | link |
2024-02-13 | Online Foundation Model Selection in Robotics | Po-han Li et.al. | 2402.08570 | null |
2024-02-13 | Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | Xiangming Gu et.al. | 2402.08567 | link |
2024-02-13 | Artificial Intelligence for Literature Reviews: Opportunities and Challenges | Francisco Bolanos et.al. | 2402.08565 | null |
2024-02-13 | Higher Layers Need More LoRA Experts | Chongyang Gao et.al. | 2402.08562 | link |
2024-02-13 | Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback | Vineet Bhat et.al. | 2402.08546 | null |
2024-02-13 | The Application of ChatGPT in Responding to Questions Related to the Boston Bowel Preparation Scale | Xiaoqiang Liu et.al. | 2402.08492 | null |
2024-02-13 | Intriguing Differences Between Zero-Shot and Systematic Evaluations of Vision-Language Transformer Models | Shaeke Salman et.al. | 2402.08473 | null |
2024-02-13 | Large Language Models for the Automated Analysis of Optimization Algorithms | Camilo Chacón Sartori et.al. | 2402.08472 | link |
2024-02-12 | A systematic investigation of learnability from single child linguistic input | Yulu Qin et.al. | 2402.07899 | link |
2024-02-12 | Suppressing Pink Elephants with Direct Principle Feedback | Louis Castricato et.al. | 2402.07896 | null |
2024-02-12 | WildfireGPT: Tailored Large Language Model for Wildfire Analysis | Yangxinyu Xie et.al. | 2402.07877 | null |
2024-02-12 | Policy Improvement using Language Feedback Models | Victor Zhong et.al. | 2402.07876 | null |
2024-02-12 | PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Soroush Nasiriany et.al. | 2402.07872 | null |
2024-02-12 | Scaling Laws for Fine-Grained Mixture of Experts | Jakub Krajewski et.al. | 2402.07871 | link |
2024-02-12 | PoisonedRAG: Knowledge Poisoning Attacks to Retrieval-Augmented Generation of Large Language Models | Wei Zou et.al. | 2402.07867 | link |
2024-02-12 | Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | Siddharth Karamcheti et.al. | 2402.07865 | link |
2024-02-12 | AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy | Philipp Schoenegger et.al. | 2402.07862 | null |
2024-02-12 | Lissard: Long and Simple Sequential Reasoning Datasets | Mirelle Bueno et.al. | 2402.07859 | null |
2024-02-12 | Mercury: An Efficiency Benchmark for LLM Code Synthesis | Mingzhe Du et.al. | 2402.07844 | link |
2024-02-12 | Do Membership Inference Attacks Work on Large Language Models? | Michael Duan et.al. | 2402.07841 | link |
2024-02-12 | Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model | Ahmet Üstün et.al. | 2402.07827 | null |
2024-02-12 | Differentially Private Zeroth-Order Methods for Scalable Large Language Model Finetuning | Z Liu et.al. | 2402.07818 | null |
2024-02-12 | Injecting Wiktionary to improve token-level contextual representations using contrastive learning | Anna Mosolova et.al. | 2402.07817 | null |
2024-02-12 | Retrieval-Augmented Thought Process as Sequential Decision Making | Thomas Pouplin et.al. | 2402.07812 | null |
2024-02-12 | Empowering Federated Learning for Massive Models with NVIDIA FLARE | Holger R. Roth et.al. | 2402.07792 | null |
2024-02-12 | TELLER: A Trustworthy Framework for Explainable, Generalizable and Controllable Fake News Detection | Hui Liu et.al. | 2402.07776 | link |
2024-02-12 | Quantitative knowledge retrieval from large language models | David Selby et.al. | 2402.07770 | link |
2024-02-12 | Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model | Mikail Khona et.al. | 2402.07757 | null |
2024-02-09 | Feedback Loops With Language Models Drive In-Context Reward Hacking | Alexander Pan et.al. | 2402.06627 | link |
2024-02-09 | Understanding the Effects of Iterative Prompting on Truthfulness | Satyapriya Krishna et.al. | 2402.06625 | null |
2024-02-09 | Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning | Shivalika Singh et.al. | 2402.06619 | null |
2024-02-09 | FaBERT: Pre-training BERT on Persian Blogs | Mostafa Masumi et.al. | 2402.06617 | null |
2024-02-09 | On the Out-Of-Distribution Generalization of Multimodal Large Language Models | Xingxuan Zhang et.al. | 2402.06599 | null |
2024-02-09 | CigaR: Cost-efficient Program Repair with LLMs | Dávid Hidvégi et.al. | 2402.06598 | link |
2024-02-09 | Understanding the Weakness of Large Language Model Agents within a Complex Android Environment | Mingzhe Xing et.al. | 2402.06596 | link |
2024-02-09 | Self-consistent context aware conformer transducer for speech recognition | Konstantin Kolokolov et.al. | 2402.06592 | null |
2024-02-09 | G-SciEdBERT: A Contextualized LLM for Science Assessment Tasks in German | Ehsan Latif et.al. | 2402.06584 | null |
2024-02-09 | Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning | Amir Ziai et.al. | 2402.06560 | link |
2024-02-09 | The Quantified Boolean Bayesian Network: Theory and Experiments with a Logical Graphical Model | Gregory Coppola et.al. | 2402.06557 | link |
2024-02-09 | Bryndza at ClimateActivism 2024: Stance, Target and Hate Event Detection via Retrieval-Augmented GPT-4 and LLaMA | Marek Šuppa et.al. | 2402.06549 | link |
2024-02-09 | Calibrating Long-form Generations from Large Language Models | Yukun Huang et.al. | 2402.06544 | null |
2024-02-09 | Introspective Planning: Guiding Language-Enabled Agents to Refine Their Own Uncertainty | Kaiqu Liang et.al. | 2402.06529 | link |
2024-02-09 | Multimodal Clinical Trial Outcome Prediction with Large Language Models | Wenhao Zheng et.al. | 2402.06512 | link |
2024-02-09 | Iris-SAM: Iris Segmentation Using a Foundational Model | Parisa Farmanifard et.al. | 2402.06497 | link |
2024-02-09 | Large Language Models for Captioning and Retrieving Remote Sensing Images | João Daniel Silva et.al. | 2402.06475 | null |
2024-02-09 | V-STaR: Training Verifiers for Self-Taught Reasoners | Arian Hosseini et.al. | 2402.06457 | null |
2024-02-09 | StruQ: Defending Against Prompt Injection with Structured Queries | Sizhe Chen et.al. | 2402.06363 | null |
2024-02-09 | CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models | Peiyuan Gong et.al. | 2402.06360 | link |
2024-02-08 | SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Peng Gao et.al. | 2402.05935 | link |
2024-02-08 | Driving Everywhere with Large Language Model Policy Adaptation | Boyi Li et.al. | 2402.05932 | null |
2024-02-08 | WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | Xing Han Lù et.al. | 2402.05930 | link |
2024-02-08 | An Interactive Agent Foundation Model | Zane Durante et.al. | 2402.05929 | null |
2024-02-08 | On the Convergence of Zeroth-Order Federated Tuning in Large Language Models | Zhenqing Ling et.al. | 2402.05926 | link |
2024-02-08 | Efficient Stagewise Pretraining via Progressive Subnetworks | Abhishek Panigrahi et.al. | 2402.05913 | null |
2024-02-08 | FACT-GPT: Fact-Checking Augmentation via Claim Matching with LLMs | Eun Cheol Choi et.al. | 2402.05904 | link |
2024-02-08 | Large Language Model Meets Graph Neural Network in Knowledge Distillation | Shengxiang Hu et.al. | 2402.05894 | null |
2024-02-08 | Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking | Nikhil Sharma et.al. | 2402.05880 | null |
2024-02-08 | PromptCrypt: Prompt Encryption for Secure Communication with Large Language Models | Guo Lin et.al. | 2402.05868 | link |
2024-02-08 | How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis | Federico Bianchi et.al. | 2402.05863 | link |
2024-02-08 | Let Your Graph Do the Talking: Encoding Structured Data for LLMs | Bryan Perozzi et.al. | 2402.05862 | null |
2024-02-08 | Learning to Route Among Specialized Experts for Zero-Shot Generalization | Mohammed Muqeeth et.al. | 2402.05859 | link |
2024-02-08 | Limitations of Agents Simulated by Predictive Models | Raymond Douglas et.al. | 2402.05829 | null |
2024-02-08 | Is it Possible to Edit Large Language Models Robustly? | Xinbei Ma et.al. | 2402.05827 | link |
2024-02-08 | Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models | Lingzhi Wang et.al. | 2402.05813 | null |
2024-02-08 | Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning | Zhiheng Xi et.al. | 2402.05808 | link |
2024-02-08 | How do Transformers perform In-Context Autoregressive Learning? | Michael E. Sander et.al. | 2402.05787 | null |
2024-02-08 | Limits of Transformer Language Models on Algorithmic Learning | Jonathan Thomm et.al. | 2402.05785 | null |
2024-02-08 | Text-to-Code Generation with Modality-relative Pre-training | Fenia Christopoulou et.al. | 2402.05783 | null |
2024-02-07 | Opening the AI black box: program synthesis via mechanistic interpretability | Eric J. Michaud et.al. | 2402.05110 | link |
2024-02-07 | You Can REST Now: Automated Specification Inference and Black-Box Testing of RESTful APIs with Large Language Models | Alix Decrop et.al. | 2402.05102 | null |
2024-02-07 | Hydragen: High-Throughput LLM Inference with Shared Prefixes | Jordan Juravsky et.al. | 2402.05099 | link |
2024-02-07 | Language-Based Augmentation to Address Shortcut Learning in Object Goal Navigation | Dennis Hoftijzer et.al. | 2402.05090 | link |
2024-02-07 | A Roadmap to Pluralistic Alignment | Taylor Sorensen et.al. | 2402.05070 | link |
2024-02-07 | SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models | Lijun Li et.al. | 2402.05044 | link |
2024-02-07 | How BERT Speaks Shakespearean English? Evaluating Historical Bias in Contextual Language Models | Miriam Cuscito et.al. | 2402.05034 | null |
2024-02-07 | A Sober Look at LLMs for Material Discovery: Are They Actually Good for Bayesian Optimization Over Molecules? | Agustinus Kristiadi et.al. | 2402.05015 | link |
2024-02-07 | Pedagogical Alignment of Large Language Models | Shashank Sonkar et.al. | 2402.05000 | null |
2024-02-07 | An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated Collaboration | Yihao Li et.al. | 2402.04978 | null |
2024-02-07 | ChatScratch: An AI-Augmented System Toward Autonomous Visual Programming Learning for Children Aged 6-12 | Liuqing Chen et.al. | 2402.04975 | null |
2024-02-07 | Reconfidencing LLMs from the Grouping Loss Perspective | Lihu Chen et.al. | 2402.04957 | null |
2024-02-07 | Chatbots in Knowledge-Intensive Contexts: Comparing Intent and LLM-Based Systems | Samuel Kernan Freire et.al. | 2402.04955 | null |
2024-02-07 | Prompting Implicit Discourse Relation Annotation | Frances Yung et.al. | 2402.04918 | null |
2024-02-07 | Personalized Text Generation with Fine-Grained Linguistic Control | Bashar Alhafni et.al. | 2402.04914 | link |
2024-02-07 | L4Q: Parameter Efficient Quantization-Aware Training on Large Language Models via LoRA-wise LSQ | Hyesung Jeon et.al. | 2402.04902 | null |
2024-02-07 | Detecting Generated Native Ads in Conversational Search | Sebastian Schmidt et.al. | 2402.04889 | link |
2024-02-07 | Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback | Zheng Wang et.al. | 2402.04867 | null |
2024-02-07 | Automated Smart Contract Summarization via LLMs | Yingjie Mao et.al. | 2402.04863 | null |
2024-02-07 | CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay | Natasha Butt et.al. | 2402.04858 | link |
2024-02-06 | AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | Yu Du et.al. | 2402.04253 | link |
2024-02-06 | HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal | Mantas Mazeika et.al. | 2402.04249 | link |
2024-02-06 | Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks | Jongho Park et.al. | 2402.04248 | link |
2024-02-06 | Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science | Xiangru Tang et.al. | 2402.04247 | null |
2024-02-06 | CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations | Ji Qi et.al. | 2402.04236 | link |
2024-02-06 | Can Generative Agents Predict Emotion? | Ciaran Regan et.al. | 2402.04232 | null |
2024-02-06 | "Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors | Lin Guan et.al. | 2402.04210 | null |
2024-02-06 | Explaining Autonomy: Enhancing Human-Robot Interaction through Explanation Generation with Large Language Models | David Sobrín-Hidalgo et.al. | 2402.04206 | link |
2024-02-06 | SHIELD : An Evaluation Benchmark for Face Spoofing and Forgery Detection with Multimodal Large Language Models | Yichen Shi et.al. | 2402.04178 | link |
2024-02-06 | Scaling Laws for Downstream Task Performance of Large Language Models | Berivan Isik et.al. | 2402.04177 | null |
2024-02-06 | Harnessing the Plug-and-Play Controller by Prompting | Hao Wang et.al. | 2402.04160 | null |
2024-02-06 | Multi-line AI-assisted Code Authoring | Omer Dunay et.al. | 2402.04141 | null |
2024-02-06 | Advancing Legal Reasoning: The Integration of AI to Navigate Complexities and Biases in Global Jurisprudence with Semi-Automated Arbitration Processes (SAAPs) | Michael De'Shazer et.al. | 2402.04140 | null |
2024-02-06 | Scientific Language Modeling: A Quantitative Review of Large Language Models in Molecular Science | Pengfei Liu et.al. | 2402.04119 | link |
2024-02-06 | Measuring Implicit Bias in Explicitly Unbiased Large Language Models | Xuechunzi Bai et.al. | 2402.04105 | link |
2024-02-06 | The Use of a Large Language Model for Cyberbullying Detection | Bayode Ogunleye et.al. | 2402.04088 | null |
2024-02-06 | A Hard-to-Beat Baseline for Training-free CLIP-based Adaptation | Zhengbo Wang et.al. | 2402.04087 | link |
2024-02-06 | Provably learning a multi-head attention layer | Sitan Chen et.al. | 2402.04084 | null |
2024-02-06 | Iterative Prompt Refinement for Radiation Oncology Symptom Extraction Using Teacher-Student Large Language Models | Reza Khanmohammadi et.al. | 2402.04075 | null |
2024-02-06 | Retrieve to Explain: Evidence-driven Predictions with Language Models | Ravi Patel et.al. | 2402.04068 | link |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-07-25 | Harnessing Temporal Causality for Advanced Temporal Action Detection | Shuming Liu et.al. | 2407.17792 | link |
2024-07-23 | EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval | Thomas Hummel et.al. | 2407.16658 | link |
2024-07-22 | LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding | Haoning Wu et.al. | 2407.15754 | link |
2024-07-23 | End-to-End Video Question Answering with Frame Scoring Mechanisms and Adaptive Sampling | Jianxin Liang et.al. | 2407.15047 | null |
2024-07-21 | Audio-visual training for improved grounding in video-text LLMs | Shivprasad Sagare et.al. | 2407.15046 | null |
2024-07-19 | EVLM: An Efficient Vision-Language Model for Visual Understanding | Kaibing Chen et.al. | 2407.14177 | null |
2024-07-19 | Reexamining Racial Disparities in Automatic Speech Recognition Performance: The Role of Confounding by Provenance | Changye Li et.al. | 2407.13982 | null |
2024-07-18 | Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data | Wufei Ma et.al. | 2407.13094 | null |
2024-07-17 | Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | Kirolos Ataallah et.al. | 2407.12679 | null |
2024-07-16 | Scaling Sign Language Translation | Biao Zhang et.al. | 2407.11855 | null |
2024-07-23 | Video-Language Alignment via Spatio-Temporal Graph Transformer | Shi-Xue Zhang et.al. | 2407.11677 | link |
2024-07-04 | Purification Of Contaminated Convolutional Neural Networks Via Robust Recovery: An Approach with Theoretical Guarantee in One-Hidden-Layer Case | Hanxiao Lu et.al. | 2407.11031 | null |
2024-07-15 | TripletViNet: Mitigating Misinformation Video Spread Across Platforms | Petar Smolovic et.al. | 2407.10644 | null |
2024-07-12 | Open Vocabulary Multi-Label Video Classification | Rohit Gupta et.al. | 2407.09073 | null |
2024-07-11 | VideoMamba: Spatio-Temporal Selective State Space Model | Jinyoung Park et.al. | 2407.08476 | link |
2024-07-16 | Hypergraph Multi-modal Large Language Model: Exploiting EEG and Eye-tracking Modalities to Evaluate Heterogeneous Responses for Video Understanding | Minghui Wu et.al. | 2407.08150 | null |
2024-07-10 | Malicious Path Manipulations via Exploitation of Representation Vulnerabilities of Vision-Language Navigation Systems | Chashi Mahiul Islam et.al. | 2407.07392 | null |
2024-07-09 | Rethinking Image-to-Video Adaptation: An Object-centric Perspective | Rui Qian et.al. | 2407.06871 | null |
2024-07-09 | VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model | Xinhao Li et.al. | 2407.06491 | link |
2024-07-08 | Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision | Orr Zohar et.al. | 2407.06189 | link |
2024-07-06 | OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding | Tiancheng Zhao et.al. | 2407.04923 | null |
2024-07-20 | Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning | Thong Nguyen et.al. | 2407.03788 | null |
2024-07-04 | VDMA: Video Question Answering with Dynamically Generated Multi-Agents | Noriyuki Kugo et.al. | 2407.03610 | null |
2024-07-03 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | Pan Zhang et.al. | 2407.03320 | link |
2024-07-03 | KeyVideoLLM: Towards Large-scale Video Keyframe Selection | Hao Liang et.al. | 2407.03104 | null |
2024-07-03 | Align and Aggregate: Compositional Reasoning with Video Alignment and Answer Aggregation for Video Question-Answering | Zhaohe Liao et.al. | 2407.03008 | null |
2024-07-03 | PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition | Yanbin Hao et.al. | 2407.02934 | link |
2024-07-03 | Video Watermarking: Safeguarding Your Video from (Unauthorized) Annotations by Video-based LLMs | Jinmin Li et.al. | 2407.02411 | null |
2024-07-02 | The Solution for the ICCV 2023 Perception Test Challenge 2023 -- Task 6 -- Grounded videoQA | Hailiang Zhang et.al. | 2407.01907 | null |
2024-07-10 | Referring Atomic Video Action Recognition | Kunyu Peng et.al. | 2407.01872 | link |
2024-06-30 | Tarsier: Recipes for Training and Evaluating Large Video Description Models | Jiawei Wang et.al. | 2407.00634 | link |
2024-06-30 | Hierarchical Memory for Long Video QA | Yiqin Wang et.al. | 2407.00603 | null |
2024-06-28 | InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding | Kirolos Ataallah et.al. | 2406.19875 | link |
2024-06-27 | Fibottention: Inceptive Visual Representation Learning with Diverse Attention Across Heads | Ali Khaleghi Rahimian et.al. | 2406.19391 | link |
2024-06-27 | OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | Tao Zhang et.al. | 2406.19389 | null |
2024-06-27 | VideoMambaPro: A Leap Forward for Mamba in Video Understanding | Hui Lu et.al. | 2406.19006 | link |
2024-06-25 | Zero-Shot Long-Form Video Understanding through Screenplay | Yongliang Wu et.al. | 2406.17309 | null |
2024-06-24 | PVUW 2024 Challenge on Complex Video Understanding: Methods and Results | Henghui Ding et.al. | 2406.17005 | link |
2024-06-25 | OmAgent: A Multi-modal Agent Framework for Complex Video Understanding with Task Divide-and-Conquer | Lu Zhang et.al. | 2406.16620 | link |
2024-06-24 | Directed Domain Fine-Tuning: Tailoring Separate Modalities for Specific Training Tasks | Daniel Wen et.al. | 2406.16346 | null |
2024-06-24 | VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | Yuxuan Wang et.al. | 2406.16338 | null |
2024-06-22 | HCQA @ Ego4D EgoSchema Challenge 2024 | Haoyu Zhang et.al. | 2406.15771 | link |
2024-06-22 | video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | Guangzhi Sun et.al. | 2406.15704 | link |
2024-06-20 | MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | Xinyu Fang et.al. | 2406.14515 | link |
2024-06-20 | Live Video Captioning | Eduardo Blanco-Fernández et.al. | 2406.14206 | link |
2024-06-20 | Towards Event-oriented Long Video Understanding | Yifan Du et.al. | 2406.14129 | link |
2024-06-19 | Towards Holistic Language-video Representation: the language model-enhanced MSR-Video to Text Dataset | Yuchen Yang et.al. | 2406.13809 | null |
2024-06-21 | AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Alessandro Suglia et.al. | 2406.13807 | link |
2024-06-19 | GUI Action Narrator: Where and When Did That Action Take Place? | Qinchen Wu et.al. | 2406.13719 | null |
2024-06-19 | GVT2RPM: An Empirical Study for General Video Transformer Adaptation to Remote Physiological Measurement | Hao Wang et.al. | 2406.13136 | null |
2024-06-18 | DrVideo: Document Retrieval Based Long Video Understanding | Ziyu Ma et.al. | 2406.12846 | null |
2024-06-18 | VoCo-LLaMA: Towards Vision Compression with Large Language Models | Xubing Ye et.al. | 2406.12275 | link |
2024-06-26 | Slot State Space Models | Jindong Jiang et.al. | 2406.12272 | null |
2024-06-18 | Holmes-VAD: Towards Unbiased and Explainable Video Anomaly Detection via Multi-modal LLM | Huaxin Zhang et.al. | 2406.12235 | link |
2024-06-17 | Task Me Anything | Jieyu Zhang et.al. | 2406.11775 | link |
2024-06-17 | Hallucination Mitigation Prompts Long-term Video Understanding | Yiwei Sun et.al. | 2406.11333 | null |
2024-06-17 | VideoVista: A Versatile Benchmark for Video Understanding and Reasoning | Yunxin Li et.al. | 2406.11303 | null |
2024-06-17 | i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment | Daechul Ahn et.al. | 2406.11280 | link |
2024-06-16 | VELOCITI: Can Video-Language Models Bind Semantic Concepts through Time? | Darshana Saravanan et.al. | 2406.10889 | null |
2024-06-15 | EchoGuide: Active Acoustic Guidance for LLM-Based Eating Event Analysis from Egocentric Videos | Vineet Parikh et.al. | 2406.10750 | null |
2024-06-15 | Beyond Raw Videos: Understanding Edited Videos with Large Multimodal Model | Lu Xu et.al. | 2406.10484 | null |
2024-06-14 | Short Film Dataset (SFD): A Benchmark for Story-Level Video Understanding | Ridouane Ghermi et.al. | 2406.10221 | null |
2024-06-22 | Localizing Events in Videos with Multimodal Queries | Gengyuan Zhang et.al. | 2406.10079 | null |
2024-06-14 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | Yiqi Wu et.al. | 2406.09781 | null |
2024-06-14 | A Survey of Video Datasets for Grounded Event Understanding | Kate Sanders et.al. | 2406.09646 | link |
2024-06-13 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | Muhammad Maaz et.al. | 2406.09418 | link |
2024-06-17 | Too Many Frames, not all Useful: Efficient Strategies for Long-Form Video QA | Jongwoo Park et.al. | 2406.09396 | link |
2024-06-13 | Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs | Zijia Zhao et.al. | 2406.09367 | link |
2024-06-13 | MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos | Xuehai He et.al. | 2406.08407 | link |
2024-06-12 | Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams | Haoji Zhang et.al. | 2406.08085 | link |
2024-06-12 | LVBench: An Extreme Long Video Understanding Benchmark | Weihan Wang et.al. | 2406.08035 | link |
2024-06-12 | Fewer Tokens and Fewer Videos: Extending Video Understanding Abilities in Large Vision-Language Models | Shimin Chen et.al. | 2406.08024 | null |
2024-06-12 | Labeling Comic Mischief Content in Online Videos with a Multimodal Hierarchical-Cross-Attention Model | Elaheh Baharlouei et.al. | 2406.07841 | link |
2024-06-17 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Zesen Cheng et.al. | 2406.07476 | link |
2024-06-11 | MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD | Ioanna Ntinou et.al. | 2406.07191 | null |
2024-06-10 | NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative | Asmar Nadeem et.al. | 2406.06499 | null |
2024-06-10 | Vript: A Video Is Worth Thousands of Words | Dongjie Yang et.al. | 2406.06040 | link |
2024-06-08 | 1st Place Winner of the 2024 Pixel-level Video Understanding in the Wild (CVPR'24 PVUW) Challenge in Video Panoptic Segmentation and Best Long Video Consistency of Video Semantic Segmentation | Qingfeng Liu et.al. | 2406.05352 | null |
2024-06-07 | Semantic Segmentation on VSPW Dataset through Masked Video Consistency | Chen Liang et.al. | 2406.04979 | null |
2024-06-06 | ShareGPT4Video: Improving Video Understanding and Generation with Better Captions | Lin Chen et.al. | 2406.04325 | null |
2024-06-06 | MLVU: A Comprehensive Benchmark for Multi-Task Long Video Understanding | Junjie Zhou et.al. | 2406.04264 | link |
2024-06-07 | 3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation | Ruipu Wu et.al. | 2406.04002 | null |
2024-06-04 | Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges | Daniel A. P. Oliveira et.al. | 2406.02748 | null |
2024-06-04 | Contrastive Language Video Time Pre-training | Hengyue Liu et.al. | 2406.02631 | null |
2024-05-21 | Backpropogation-Free Multi-modal On-Device Model Adaptation via Cloud-Device Collaboration | Wei Ji et.al. | 2406.01601 | null |
2024-06-03 | Differentiable Task Graph Learning: Procedural Activity Representation and Online Mistake Detection from Egocentric Videos | Luigi Seminara et.al. | 2406.01486 | link |
2024-06-02 | Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering | Xingrui Wang et.al. | 2406.00622 | link |
2024-06-01 | 2nd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation | Biao Wu et.al. | 2406.00500 | null |
2024-06-06 | HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model | Khoa Vo et.al. | 2406.00307 | null |
2024-05-31 | Shotluck Holmes: A Family of Efficient Small-Scale Large Language Vision Models For Video Captioning and Summarization | Richard Luo et.al. | 2405.20648 | link |
2024-05-30 | Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera | Inpyo Song et.al. | 2405.19794 | null |
2024-05-30 | Encoding and Controlling Global Semantics for Long-form Video Question Answering | Thong Thanh Nguyen et.al. | 2405.19723 | null |
2024-05-30 | EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos | Ryo Fujii et.al. | 2405.19644 | link |
2024-05-29 | VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | Ziyang Wang et.al. | 2405.19209 | link |
2024-05-28 | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | Somnath Kumar et.al. | 2405.18358 | null |
2024-05-28 | Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions | Rui Zhang et.al. | 2405.17729 | null |
2024-05-27 | Video Enriched Retrieval Augmented Generation Using Aligned Video Captions | Kevin Dela Rosa et.al. | 2405.17706 | link |
2024-05-25 | Streaming Long Video Understanding with Large Language Models | Rui Qian et.al. | 2405.16009 | null |
2024-05-23 | MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Jiuming Liu et.al. | 2405.14338 | null |
2024-05-22 | Synchronized Video Storytelling: Generating Video Narrations with Structured Storyline | Dingyi Yang et.al. | 2405.14040 | null |
2024-05-22 | TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment | Wei Li et.al. | 2405.13911 | null |
2024-05-22 | Dense Connector for MLLMs | Huanjin Yao et.al. | 2405.13800 | link |
2024-05-22 | VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | Yongxin Guo et.al. | 2405.13382 | link |
2024-05-21 | Anticipating Object State Changes | Victoria Manousaki et.al. | 2405.12789 | null |
2024-05-17 | Open-Vocabulary Spatio-Temporal Action Detection | Tao Wu et.al. | 2405.10832 | null |
2024-05-14 | Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis | Yao Fu et.al. | 2405.08944 | null |
2024-05-14 | CinePile: A Long Video Question Answering Dataset and Benchmark | Ruchit Rawal et.al. | 2405.08813 | null |
2024-05-14 | No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding | Yingjie Zhai et.al. | 2405.08344 | link |
2024-05-13 | FreeVA: Offline MLLM as Training-Free Video Assistant | Wenhao Wu et.al. | 2405.07798 | link |
2024-05-11 | Memory-Maze: Scenario Driven Benchmark and Visual Language Navigation Model for Guiding Blind People | Masaki Kuribayashi et.al. | 2405.07060 | null |
2024-05-11 | Retrieval Enhanced Zero-Shot Video Captioning | Yunchuan Ma et.al. | 2405.07046 | null |
2024-05-11 | Global Motion Understanding in Large-Scale Video Object Segmentation | Volodymyr Fedynyak et.al. | 2405.07031 | null |
2024-05-09 | A Survey on Backbones for Deep Video Action Recognition | Zixuan Tang et.al. | 2405.05584 | null |
2024-05-08 | Transfer-LMR: Heavy-Tail Driving Behavior Recognition in Diverse Traffic Scenarios | Chirag Parikh et.al. | 2405.05354 | null |
2024-05-07 | Vision Mamba: A Comprehensive Survey and Taxonomy | Xiao Liu et.al. | 2405.04404 | link |
2024-05-06 | Foundation Models for Video Understanding: A Survey | Neelu Madan et.al. | 2405.03770 | link |
2024-05-08 | How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | Muhammad Uzair Khattak et.al. | 2405.03690 | null |
2024-05-06 | WorldQA: Multimodal World Knowledge in Videos through Long-Chain Reasoning | Yuanhan Zhang et.al. | 2405.03272 | null |
2024-04-30 | Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition | Zhendong Liu et.al. | 2404.19383 | null |
2024-05-01 | Capabilities of Gemini Models in Medicine | Khaled Saab et.al. | 2404.18416 | null |
2024-04-26 | Learning text-to-video retrieval from image captioning | Lucas Ventura et.al. | 2404.17498 | null |
2024-04-26 | MovieChat+: Question-aware Sparse Memory for Long Video Question Answering | Enxin Song et.al. | 2404.17176 | link |
2024-04-26 | Open-Set Video-based Facial Expression Recognition with Human Expression-sensitive Prompting | Yuanyuan Liu et.al. | 2404.17100 | null |
2024-04-29 | PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | Lin Xu et.al. | 2404.16994 | link |
2024-04-25 | SFMViT: SlowFast Meet ViT in Chaotic World | Jiaying Lin et.al. | 2404.16609 | link |
2024-04-23 | IPAD: Industrial Process Anomaly Detection Dataset | Jinfan Liu et.al. | 2404.15033 | null |
2024-04-23 | Pegasus-v1 Technical Report | Raehyuk Jung et.al. | 2404.14687 | null |
2024-04-26 | Narrative Action Evaluation with Prompt-Guided Multimodal Interaction | Shiyi Zhang et.al. | 2404.14471 | link |
2024-04-20 | Movie101v2: Improved Movie Narration Benchmark | Zihao Yue et.al. | 2404.13370 | null |
2024-04-18 | Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | Reka Team et.al. | 2404.12387 | null |
2024-04-18 | From Image to Video, what do we need in multimodal LLMs? | Suyuan Huang et.al. | 2404.11865 | null |
2024-04-17 | VG4D: Vision-Language Model Goes 4D Video Recognition | Zhichao Deng et.al. | 2404.11605 | link |
2024-04-15 | Leveraging Temporal Contextualization for Video Action Recognition | Minji Kim et.al. | 2404.09490 | null |
2024-04-15 | The 8th AI City Challenge | Shuo Wang et.al. | 2404.09432 | null |
2024-04-16 | Human-in-the-Loop Segmentation of Multi-species Coral Imagery | Scarlett Raine et.al. | 2404.09406 | link |
2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308 | null |
2024-04-14 | TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning | Quang Minh Dinh et.al. | 2404.09275 | link |
2024-04-14 | Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection | Jin Yang et.al. | 2404.09263 | link |
2024-04-12 | Enhancing Traffic Safety with Parallel Dense Video Captioning for End-to-End Event Analysis | Maged Shoman et.al. | 2404.08229 | link |
2024-04-11 | Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval | Minkuk Kim et.al. | 2404.07610 | link |
2024-04-10 | A Transformer-Based Model for the Prediction of Human Gaze Behavior on Videos | Suleyman Ozdel et.al. | 2404.07351 | null |
2024-04-10 | Gaze-Guided Graph Neural Network for Action Anticipation Conditioned on Intention | Suleyman Ozdel et.al. | 2404.07347 | null |
2024-04-09 | MoReVQA: Exploring Modular Reasoning Models for Video Question Answering | Juhong Min et.al. | 2404.06511 | null |
2024-04-07 | X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Jan Held et.al. | 2404.06332 | null |
2024-04-24 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Bo He et.al. | 2404.05726 | link |
2024-04-06 | SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos | Tao Wu et.al. | 2404.04565 | link |
2024-04-19 | Koala: Key frame-conditioned long video-LLM | Reuben Tan et.al. | 2404.04346 | null |
2024-04-05 | Neural-Symbolic VideoQA: Learning Compositional Spatio-Temporal Reasoning for Real-world Video Question Answering | Lili Liang et.al. | 2404.04007 | null |
2024-04-04 | OW-VISCap: Open-World Video Instance Segmentation and Captioning | Anwesa Choudhuri et.al. | 2404.03657 | null |
2024-04-04 | MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | Kirolos Ataallah et.al. | 2404.03413 | link |
2024-04-10 | LongVLM: Efficient Long Video Understanding via Large Language Models | Yuetian Weng et.al. | 2404.03384 | link |
2024-04-03 | DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement | Hao Wu et.al. | 2404.02755 | null |
2024-04-05 | SnAG: Scalable and Accurate Video Grounding | Fangzhou Mu et.al. | 2404.02257 | null |
2024-04-01 | TraveLER: A Multi-LMM Agent Framework for Video Question-Answering | Chuyi Shang et.al. | 2404.01476 | null |
2024-04-01 | CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes | Ting En Lam et.al. | 2404.01299 | link |
2024-04-01 | Streaming Dense Video Captioning | Xingyi Zhou et.al. | 2404.01297 | link |
2024-04-02 | Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward | Ruohong Zhang et.al. | 2404.01258 | link |
2024-04-01 | VideoDistill: Language-aware Vision Distillation for Video Question Answering | Bo Zou et.al. | 2404.00973 | null |
2024-03-31 | | Ye Liu et.al. | 2404.00801 | link |
2024-03-30 | Instrument-tissue Interaction Detection Framework for Surgical Video Understanding | Wenjun Lin et.al. | 2404.00322 | null |
2024-03-30 | ST-LLM: Large Language Models Are Effective Temporal Learners | Ruyang Liu et.al. | 2404.00308 | link |
2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031 | null |
2024-03-28 | Towards Multimodal Video Paragraph Captioning Models Robust to Missing Modality | Sishuo Chen et.al. | 2403.19221 | link |
2024-03-27 | An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM | Wonkyun Kim et.al. | 2403.18406 | link |
2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
2024-03-25 | Understanding Long Videos in One Multimodal Language Model Pass | Kanchana Ranasinghe et.al. | 2403.16998 | link |
2024-03-24 | AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue | Yunlong Tang et.al. | 2403.16276 | null |
2024-03-22 | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Yi Wang et.al. | 2403.15377 | link |
2024-03-25 | VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Ahmad Mahmood et.al. | 2403.14743 | link |
2024-03-21 | Language Repository for Long Video Understanding | Kumara Kahatapitiya et.al. | 2403.14622 | link |
2024-03-21 | Ranking Distillation for Open-Ended Video Question Answering with Insufficient Labels | Tianming Liang et.al. | 2403.14430 | null |
2024-03-18 | Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation | Zixin Zhu et.al. | 2403.12042 | link |
2024-03-18 | Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Wangbo Zhao et.al. | 2403.11808 | link |
2024-03-27 | LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model | Yuxin Cao et.al. | 2403.11656 | null |
2024-03-18 | VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | Yue Fan et.al. | 2403.11481 | null |
2024-03-15 | VideoAgent: Long-form Video Understanding with Large Language Model as Agent | Xiaohan Wang et.al. | 2403.10517 | null |
2024-03-14 | Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding | Guo Chen et.al. | 2403.09626 | link |
2024-03-25 | Don't Judge by the Look: Towards Motion Coherent Video Representation | Yitian Zhang et.al. | 2403.09506 | link |
2024-03-13 | DAM: Dynamic Adapter Merging for Continual Video QA Learning | Feng Cheng et.al. | 2403.08755 | link |
2024-03-11 | Action Reimagined: Text-to-Pose Video Editing for Dynamic Human Actions | Lan Wang et.al. | 2403.07198 | null |
2024-03-12 | VideoMamba: State Space Model for Efficient Video Understanding | Kunchang Li et.al. | 2403.06977 | link |
2024-03-25 | An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models | Liang Chen et.al. | 2403.06764 | link |
2024-03-08 | Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation | Joseph Cho et.al. | 2403.05131 | null |
2024-03-11 | Beyond MOT: Semantic Multi-Object Tracking | Yunhao Li et.al. | 2403.05021 | null |
2024-03-08 | Pix2Gif: Motion-Guided Diffusion for GIF Generation | Hitesh Kandala et.al. | 2403.04634 | null |
2024-03-05 | A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives | Simone Alberto Peirone et.al. | 2403.03037 | null |
2024-03-03 | MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies | Zhende Song et.al. | 2403.01422 | null |
2024-03-01 | Abductive Ego-View Accident Video Understanding for Safe Driving Perception | Jianwu Fang et.al. | 2403.00436 | null |
2024-02-29 | Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | Tsai-Shien Chen et.al. | 2402.19479 | null |
2024-03-11 | TV-TREES: Multimodal Entailment Trees for Neuro-Symbolic Video Reasoning | Kate Sanders et.al. | 2402.19467 | null |
2024-02-29 | Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition | Boyu Chen et.al. | 2402.18951 | null |
2024-02-27 | MCF-VC: Mitigate Catastrophic Forgetting in Class-Incremental Learning for Multimodal Video Captioning | Huiyu Xiong et.al. | 2402.17680 | null |
2024-02-25 | LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding | Yuxuan Wang et.al. | 2402.16050 | link |
2024-02-22 | Think before You Leap: Content-Aware Low-Cost Edge-Assisted Video Semantic Segmentation | Mingxuan Yan et.al. | 2402.14326 | null |
2024-02-21 | LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs | Yunxin Li et.al. | 2402.13546 | null |
2024-02-28 | Video ReCap: Recursive Captioning of Hour-Long Videos | Md Mohaiminul Islam et.al. | 2402.13250 | link |
2024-02-20 | VideoPrism: A Foundational Visual Encoder for Video Understanding | Long Zhao et.al. | 2402.13217 | null |
2024-02-20 | Slot-VLM: SlowFast Slots for Video-Language Modeling | Jiaqi Xu et.al. | 2402.13088 | null |
2024-02-19 | System Identification of Neural Systems: Going Beyond Images to Modelling Dynamics | Mai Gamal et.al. | 2402.12519 | null |
2024-02-19 | LVCHAT: Facilitating Long Video Comprehension | Yu Wang et.al. | 2402.12079 | link |
2024-02-28 | Are you Struggling? Dataset and Baselines for Struggle Determination in Assembly Videos | Shijia Feng et.al. | 2402.11057 | null |
2024-02-16 | Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | David Romero et.al. | 2402.10698 | null |
2024-02-13 | World Model on Million-Length Video And Language With RingAttention | Hao Liu et.al. | 2402.08268 | link |
2024-02-12 | BDIQA: A New Dataset for Video Question Answering to Explore Cognitive Reasoning through Theory of Mind | Yuanyuan Mao et.al. | 2402.07402 | null |
2024-02-09 | Video Annotator: A framework for efficiently building video classifiers using vision-language models and active learning | Amir Ziai et.al. | 2402.06560 | link |
2024-02-09 | Dynamic swarms regulate the morphology and distribution of soft membrane domains | Aakanksha Gubbala et.al. | 2402.06518 | null |
2024-02-08 | Memory Consolidation Enables Long-Context Video Understanding | Ivana Balažević et.al. | 2402.05861 | null |
2024-02-06 | Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Yang Jin et.al. | 2402.03161 | null |
2024-02-04 | Spatio-temporal Prompting Network for Robust Video Feature Extraction | Guanxiong Sun et.al. | 2402.02574 | link |
2024-02-02 | Simulator-Free Visual Domain Randomization via Video Games | Chintan Trivedi et.al. | 2402.01335 | link |
2024-01-30 | YTCommentQA: Video Question Answerability in Instructional Videos | Saelyne Yang et.al. | 2401.17343 | link |
2024-01-30 | Multi-granularity Correspondence Learning from Long-term Noisy Videos | Yijie Lin et.al. | 2401.16702 | null |
2024-01-29 | Cutup and Detect: Human Fall Detection on Cutup Untrimmed Videos Using a Large Foundational Video Understanding Model | Till Grutschus et.al. | 2401.16280 | null |
2024-01-25 | Knowledge Graph Supported Benchmark and Video Captioning for Basketball | Zeyu Xi et.al. | 2401.13888 | null |
2024-01-22 | ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition | Jiaming Zhou et.al. | 2401.11654 | null |
2024-01-21 | Exploring Missing Modality in Multimodal Egocentric Datasets | Merey Ramazanova et.al. | 2401.11470 | null |
2024-01-19 | Learning to Visually Connect Actions and their Effects | Eric Peh et.al. | 2401.10805 | null |
2024-01-28 | Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering | Haibo Wang et.al. | 2401.10711 | null |
2024-01-17 | CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding | Yunze Liu et.al. | 2401.09057 | null |
2024-01-16 | Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data | Yuhui Zhang et.al. | 2401.08567 | link |
2024-01-16 | Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization | Chongzhi Zhang et.al. | 2401.08232 | null |
2024-01-11 | Hierarchical Augmentation and Distillation for Class Incremental Audio-Visual Video Recognition | Yukun Zuo et.al. | 2401.06287 | link |
2024-01-10 | HaltingVT: Adaptive Token Halting Transformer for Efficient Video Recognition | Qian Wu et.al. | 2401.04975 | link |
2024-01-10 | SnapCap: Efficient Snapshot Compressive Video Captioning | Jianqiao Sun et.al. | 2401.04903 | null |
2024-01-08 | Efficient Selective Audio Masked Multimodal Bottleneck Transformer for Audio-Video Classification | Wentao Zhu et.al. | 2401.04154 | null |
2024-01-08 | Dr | Chen Zhao et.al. | 2401.04105 | link |
2024-01-08 | STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering | Yueqian Wang et.al. | 2401.03901 | link |
Publish Date | Title | Authors | arXiv ID | Code |
---|---|---|---|---|
2024-07-25 | Harnessing Temporal Causality for Advanced Temporal Action Detection | Shuming Liu et.al. | 2407.17792 | link |
2024-07-24 | PrevPredMap: Exploring Temporal Modeling with Previous Predictions for Online Vectorized HD Map Construction | Nan Peng et.al. | 2407.17378 | link |
2024-07-24 | DVPE: Divided View Position Embedding for Multi-View 3D Object Detection | Jiasen Wang et.al. | 2407.16955 | link |
2024-07-22 | A divide-and-conquer approach for spatio-temporal analysis of large house price data from Greater London | Kapil Gupta et.al. | 2407.15905 | null |
2024-07-03 | Digital Twin-based Driver Risk-Aware Intelligent Mobility Analytics for Urban Transportation Management | Tao Li et.al. | 2407.15025 | null |
2024-07-18 | Physics-guided Active Sample Reweighting for Urban Flow Prediction | Wei Jiang et.al. | 2407.13605 | link |
2024-07-15 | Human-Centric Transformer for Domain Adaptive Action Recognition | Kun-Yu Lin et.al. | 2407.10860 | null |
2024-07-15 | Spatio-temporal neural distance fields for conditional generative modeling of the heart | Kristine Sørensen et.al. | 2407.10663 | link |
2024-07-12 | Open Vocabulary Multi-Label Video Classification | Rohit Gupta et.al. | 2407.09073 | null |
2024-07-09 | Rethinking Image-to-Video Adaptation: An Object-centric Perspective | Rui Qian et.al. | 2407.06871 | null |
2024-07-07 | Efficient Bayesian dynamic closed skew-normal model preserving mean and covariance for spatio-temporal data | Hajime Kuno et.al. | 2407.05288 | link |
2024-07-03 | Graph and Skipped Transformer: Exploiting Spatial and Temporal Modeling Capacities for Efficient 3D Human Pose Estimation | Mengmeng Cui et.al. | 2407.02990 | null |
2024-07-03 | PosMLP-Video: Spatial and Temporal Relative Position Encoding for Efficient Video Recognition | Yanbin Hao et.al. | 2407.02934 | link |
2024-07-16 | Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion | Bohan Li et.al. | 2407.02077 | link |
2024-06-25 | SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition | Guanghao Zhu et.al. | 2406.17538 | null |
2024-06-23 | Multi-Scale Temporal Difference Transformer for Video-Text Retrieval | Ni Wang et.al. | 2406.16111 | null |
2024-06-20 | ExVideo: Extending Video Diffusion Models via Parameter-Efficient Post-Tuning | Zhongjie Duan et.al. | 2406.14130 | link |
2024-06-20 | LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction | Kuang Wu et.al. | 2406.13988 | null |
2024-06-18 | RIGL: A Unified Reciprocal Approach for Tracing the Independent and Group Learning Processes | Xiaoshan Yu et.al. | 2406.12465 | null |
2024-06-18 | Translation Equivariant Transformer Neural Processes | Matthew Ashman et.al. | 2406.12409 | null |
2024-06-18 | LiCAF: LiDAR-Camera Asymmetric Fusion for Gait Recognition | Yunze Deng et.al. | 2406.12355 | null |
2024-06-15 | X-Ray spectral and temporal properties of LMXB 4U 1608-52- observed with AstroSat and NICER | Sree Bhattacherjee et.al. | 2406.10666 | null |
2024-06-13 | OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation | Junke Wang et.al. | 2406.09399 | link |
2024-06-13 | Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs | Zijia Zhao et.al. | 2406.09367 | link |
2024-06-17 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Zesen Cheng et.al. | 2406.07476 | link |
2024-06-11 | RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation | Mirgahney Mohamed et.al. | 2406.07169 | null |
2024-06-11 | AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding | Xing Zhang et.al. | 2406.07091 | null |
2024-06-07 | Joint Spatial-Temporal Modeling and Contrastive Learning for Self-supervised Heart Rate Measurement | Wei Qian et.al. | 2406.04942 | null |
2024-06-07 | Bayesian inference of Latent Spectral Shapes | Hiu Ching Yip et.al. | 2406.04915 | null |
2024-06-07 | MTS-Net: Dual-Enhanced Positional Multi-Head Self-Attention for 3D CT Diagnosis of May-Thurner Syndrome | Yixin Huang et.al. | 2406.04680 | link |
2024-06-05 | Non-stationary Spatio-Temporal Modeling Using the Stochastic Advection-Diffusion Equation | Martin Outzen Berild et.al. | 2406.03400 | link |
2024-06-04 | I4VGen: Image as Stepping Stone for Text-to-Video Generation | Xiefan Guo et.al. | 2406.02230 | null |
2024-06-03 | UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Xiang Wang et.al. | 2406.01188 | null |
2024-06-01 | DSCA: A Digital Subtraction Angiography Sequence Dataset and Spatio-Temporal Model for Cerebral Artery Segmentation | Qihang Xie et.al. | 2406.00341 | null |
2024-06-01 | A Review of Pulse-Coupled Neural Network Applications in Computer Vision and Image Processing | Nurul Rafi et.al. | 2406.00239 | null |
2024-05-31 | Streamflow Prediction with Uncertainty Quantification for Water Management: A Constrained Reasoning and Learning Approach | Mohammed Amine Gharsallaoui et.al. | 2406.00133 | null |
2024-05-31 | 4Diffusion: Multi-view Video Diffusion Model for 4D Generation | Haiyu Zhang et.al. | 2405.20674 | null |
2024-05-30 | Streaming Video Diffusion: Online Video Editing with Diffusion Models | Feng Chen et.al. | 2405.19726 | link |
2024-05-30 | Unlocking the Power of Spatial and Temporal Information in Medical Multimodal Pre-training | Jinxia Yang et.al. | 2405.19654 | link |
2024-05-30 | **F |