Showing 1–50 of 136 results for author: Cui, B

Searching in archive cs.
  1. arXiv:2408.07543  [pdf, other]

    cs.CV cs.CL

    MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark

    Authors: Minxuan Zhou, Hao Liang, Tianpeng Li, Zhiyu Wu, Mingan Lin, Linzhuang Sun, Yaqi Zhou, Yan Zhang, Xiaoqin Huang, Yicong Chen, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchma…

    Submitted 15 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:2408.01122  [pdf, other]

    cs.CL

    CFBench: A Comprehensive Constraints-Following Benchmark for LLMs

    Authors: Tao Zhang, Yanjun Shen, Wenjing Luo, Yan Zhang, Hao Liang, Tao Zhang, Fan Yang, Mingan Lin, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: The adeptness of Large Language Models (LLMs) in comprehending and following natural language instructions is critical for their deployment in sophisticated real-world applications. Existing evaluations mainly focus on fragmented constraints or narrow scenarios, but they overlook the comprehensiveness and authenticity of constraints from the user's perspective. To bridge this gap, we propose CFBen…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 15 pages, 10 figures

  3. arXiv:2407.20756  [pdf, other]

    cs.CV cs.CL

    SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models

    Authors: Zheng Liu, Hao Liang, Xijie Huang, Wentao Xiong, Qinhan Yu, Linzhuang Sun, Chong Chen, Conghui He, Bin Cui, Wentao Zhang

    Abstract: Recently, with the rise of web images, managing and understanding large-scale image datasets has become increasingly important. Vision Large Language Models (VLLMs) have recently emerged due to their robust vision-understanding capabilities. However, training these models requires vast amounts of data, posing challenges to efficiency, effectiveness, data quality, and privacy. In this paper, we int…

    Submitted 10 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  4. arXiv:2407.20213  [pdf, other]

    cs.RO cs.CV

    Registering Neural 4D Gaussians for Endoscopic Surgery

    Authors: Yiming Huang, Beilei Cui, Ikemura Kei, Jiekai Zhang, Long Bai, Hongliang Ren

    Abstract: The recent advance in neural rendering has enabled the ability to reconstruct high-quality 4D scenes using neural networks. Although 4D neural reconstruction is popular, registration for such representations remains a challenging task, especially for dynamic scene registration in surgical planning and simulation. In this paper, we propose a novel strategy for dynamic surgical neural scene registra…

    Submitted 29 July, 2024; originally announced July 2024.

  5. arXiv:2407.12820  [pdf, other]

    cs.CL cs.AI cs.LG

    PQCache: Product Quantization-based KVCache for Long Context LLM Inference

    Authors: Hailin Zhang, Xiaodong Ji, Yilin Chen, Fangcheng Fu, Xupeng Miao, Xiaonan Nie, Weipeng Chen, Bin Cui

    Abstract: As the field of Large Language Models (LLMs) continues to evolve, the context length in inference is steadily growing. Key-Value Cache (KVCache), a crucial component in LLM inference, has now become the primary memory bottleneck due to limited GPU memory. Current methods selectively determine suitable keys and values for self-attention computation in LLMs to address the issue. However, they either…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2407.12117  [pdf, other]

    cs.LG cs.DC

    Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs

    Authors: Pinxue Zhao, Hailin Zhang, Fangcheng Fu, Xiaonan Nie, Qibin Liu, Fang Yang, Yuanbo Peng, Dian Jiao, Shuaipeng Li, Jinbao Xue, Yangyu Tao, Bin Cui

    Abstract: Nowadays, Large Language Models (LLMs) have been trained using extended context lengths to foster more creative applications. However, long context training poses great challenges considering the constraint of GPU memory. It not only leads to substantial activation memory consumption during training, but also incurs considerable memory fragmentation. To facilitate long context training, existing f…

    Submitted 16 July, 2024; originally announced July 2024.

  7. arXiv:2407.06027  [pdf, other]

    cs.CL

    PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

    Authors: Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficul…

    Submitted 7 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  8. arXiv:2407.03104  [pdf, other]

    cs.CV cs.CL cs.MM

    KeyVideoLLM: Towards Large-scale Video Keyframe Selection

    Authors: Hao Liang, Jiapeng Li, Tianyi Bai, Xijie Huang, Linzhuang Sun, Zhengren Wang, Conghui He, Bin Cui, Chong Chen, Wentao Zhang

    Abstract: Recently, with the rise of web videos, managing and understanding large-scale video datasets has become increasingly important. Video Large Language Models (VideoLLMs) have emerged in recent years due to their strong video understanding capabilities. However, training and inference processes for VideoLLMs demand vast amounts of data, presenting significant challenges to data management, particular…

    Submitted 10 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  9. arXiv:2407.02398  [pdf, other]

    cs.CV

    Consistency Flow Matching: Defining Straight Flows with Velocity Consistency

    Authors: Ling Yang, Zixiang Zhang, Zhilong Zhang, Xingchao Liu, Minkai Xu, Wentao Zhang, Chenlin Meng, Stefano Ermon, Bin Cui

    Abstract: Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate high-quality samples with fewer function evaluations, typically through iterative rectification methods or optimal transport solutions. In this paper, we introduce…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/YangLing0818/consistency_flow_matching

  10. arXiv:2407.01937  [pdf, other]

    cs.CL

    Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data

    Authors: Linzhuang Sun, Hao Liang, Jingxuan Wei, Linkun Sun, Bihui Yu, Bin Cui, Wentao Zhang

    Abstract: In recent years, with the rapid advancements in large language models (LLMs), achieving excellent empathetic response capability has become a crucial prerequisite. Consequently, managing and understanding large-scale video datasets has gained increasing importance. However, empathetic data are typically trained without any quality selection, leading to inefficient data usage and wasted computation…

    Submitted 9 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  11. arXiv:2406.13200  [pdf, other]

    cs.LG

    RobGC: Towards Robust Graph Condensation

    Authors: Xinyi Gao, Hongzhi Yin, Tong Chen, Guanhua Ye, Wentao Zhang, Bin Cui

    Abstract: Graph neural networks (GNNs) have attracted widespread attention for their impressive capability of graph representation learning. However, the increasing prevalence of large-scale graphs presents a significant challenge for GNN training due to their computational demands, limiting the applicability of GNNs in various scenarios. In response to this challenge, graph condensation (GC) is proposed as…

    Submitted 19 June, 2024; originally announced June 2024.

  12. arXiv:2406.13048  [pdf, other]

    cs.CV

    Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings

    Authors: Ruijie Tang, Beilei Cui, Hongliang Ren

    Abstract: As the significance of simulation in medical care and intervention continues to grow, it is anticipated that a simplified and low-cost platform can be set up to execute personalized diagnoses and treatments. 3D Slicer can not only perform medical image analysis and visualization but can also provide surgical navigation and surgical planning functions. In this paper, we have chosen 3D Slicer as our…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by ICBIR 2024

  13. arXiv:2406.04277  [pdf, other]

    cs.CV

    VideoTetris: Towards Compositional Text-to-Video Generation

    Authors: Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui

    Abstract: Diffusion models have demonstrated great success in text-to-video (T2V) generation. However, existing methods may face challenges when handling complex (long) video generation scenarios that involve multiple objects or dynamic changes in object numbers. To address these limitations, we propose VideoTetris, a novel framework that enables compositional T2V generation. Specifically, we propose spatio…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/YangLing0818/VideoTetris

  14. arXiv:2406.04271  [pdf, other]

    cs.CL

    Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

    Authors: Ling Yang, Zhaochen Yu, Tianjun Zhang, Shiyi Cao, Minkai Xu, Wentao Zhang, Joseph E. Gonzalez, Bin Cui

    Abstract: We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs). Specifically, we propose meta-buffer to store a series of informative high-level thoughts, namely thought-template, distilled from the problem-solving processes across various tasks. Then for each problem, we retrieve a…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project: https://github.com/YangLing0818/buffer-of-thought-llm

  15. arXiv:2405.16640  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    A Survey of Multimodal Large Language Model from A Data-centric Perspective

    Authors: Tianyi Bai, Hao Liang, Binwang Wan, Yanran Xu, Xi Li, Shiyu Li, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Ping Huang, Jiulong Shan, Conghui He, Binhang Yuan, Wentao Zhang

    Abstract: Multimodal large language models (MLLMs) enhance the capabilities of standard large language models by integrating and processing data from multiple modalities, including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Spec…

    Submitted 18 July, 2024; v1 submitted 26 May, 2024; originally announced May 2024.

  16. arXiv:2405.15193  [pdf, other]

    cs.DB cs.DS

    CuckooGraph: A Scalable and Space-Time Efficient Data Structure for Large-Scale Dynamic Graphs

    Authors: Zhuochen Fan, Yalun Cai, Zirui Liu, Jiarui Guo, Xin Fan, Tong Yang, Bin Cui

    Abstract: Graphs play an increasingly important role in various big data applications. However, existing graph data structures cannot simultaneously address the performance bottlenecks caused by the dynamic updates, large scale, and high query complexity of current graphs. This paper proposes a novel data structure for large-scale dynamic graphs called CuckooGraph. It does not require any prior knowledge of…

    Submitted 3 August, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  17. arXiv:2405.14578  [pdf, other]

    cs.LG

    Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

    Authors: Shuaipeng Li, Penghao Zhao, Hailin Zhang, Xingwu Sun, Hao Wu, Dian Jiao, Weiyan Wang, Chengjun Liu, Zheng Fang, Jinbao Xue, Yangyu Tao, Bin Cui, Di Wang

    Abstract: In current deep learning tasks, Adam style optimizers such as Adam, Adagrad, RMSProp, Adafactor, and Lion have been widely used as alternatives to SGD style optimizers. These optimizers typically update model parameters using the sign of gradients, resulting in more stable convergence curves. The learning rate and the batch size are the most critical hyperparameters for optimizers, which require c…

    Submitted 4 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  18. arXiv:2405.08672  [pdf, other]

    eess.IV cs.CV

    EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

    Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

    Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adapt…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: early accepted by MICCAI 2024

  19. arXiv:2405.04114  [pdf, other]

    cs.LG cs.AI

    Acceleration Algorithms in GNNs: A Survey

    Authors: Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui

    Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address the critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the resear…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: 9 pages, 3 figures

  20. arXiv:2405.00263  [pdf, other]

    cs.CL cs.AI cs.LG

    Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

    Authors: Bin Xiao, Chunan Shi, Xiaonan Nie, Fan Yang, Xiangwei Deng, Lei Su, Weipeng Chen, Bin Cui

    Abstract: Large language models (LLMs) suffer from low efficiency due to the mismatch between the requirement of auto-regressive decoding and the design of most contemporary GPUs. Specifically, billions to trillions of parameters must be loaded to the GPU cache through its limited memory bandwidth for computation, but only a small batch of tokens is actually computed. Consequently, the GPU spends most of its ti…

    Submitted 30 April, 2024; originally announced May 2024.

  21. arXiv:2404.17360  [pdf, other]

    cs.CV

    UniRGB-IR: A Unified Framework for Visible-Infrared Downstream Tasks via Adapter Tuning

    Authors: Maoxun Yuan, Bo Cui, Tianyi Zhao, Xingxing Wei

    Abstract: Semantic analysis on visible (RGB) and infrared (IR) images has gained attention for its ability to be more accurate and robust under low-illumination and complex weather conditions. Due to the lack of pre-trained foundation models on the large-scale infrared image datasets, existing methods prefer to design task-specific frameworks and directly fine-tune them with pre-trained foundation models on…

    Submitted 26 April, 2024; originally announced April 2024.

  22. arXiv:2404.05648  [pdf, other]

    cs.AR cs.AI cs.ET cs.NE

    Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model

    Authors: Jichang Yang, Hegan Chen, Jia Chen, Songqi Wang, Shaocong Wang, Yifei Yu, Xi Chen, Bo Wang, Xinyuan Zhang, Binbin Cui, Yi Li, Ning Lin, Meng Xu, Yi Li, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Han Wang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: Human brains image complicated scenes when reading a novel. Replicating this imagination is one of the ultimate goals of AI-Generated Content (AIGC). However, current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency. This deficiency is rooted in the difference between the brain and digital computers. Digital computers have physically separated st…

    Submitted 8 April, 2024; originally announced April 2024.

  23. arXiv:2403.07331  [pdf, other]

    cs.IR cs.DB

    LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries

    Authors: Ziqi Yin, Shanshan Feng, Shang Liu, Gao Cong, Yew Soon Ong, Bin Cui

    Abstract: With the proliferation of spatio-textual data, Top-k KNN spatial keyword queries (TkQs), which return a list of objects based on a ranking function that evaluates both spatial and textual relevance, have found many real-life applications. Existing geo-textual indexes for TkQs use traditional retrieval models like BM25 to compute text relevance and usually exploit a simple linear function to comput…

    Submitted 18 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  24. arXiv:2402.19473  [pdf, other]

    cs.CV

    Retrieval-Augmented Generation for AI-Generated Content: A Survey

    Authors: Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, Bin Cui

    Abstract: Advancements in model algorithms, the growth of foundational models, and access to high-quality datasets have propelled the evolution of Artificial Intelligence Generated Content (AIGC). Despite its notable successes, AIGC still faces hurdles such as updating knowledge, handling long-tail data, mitigating data leakage, and managing high training and inference costs. Retrieval-Augmented Generation…

    Submitted 21 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Citing 353 papers, 22 pages, 1 table, 12 figures. Project: https://github.com/PKU-DAIR/RAG-Survey

  25. arXiv:2402.17563  [pdf, other]

    cs.CV cs.AI cs.LG

    Structure-Guided Adversarial Training of Diffusion Models

    Authors: Ling Yang, Haotian Qian, Zhilong Zhang, Jingwei Liu, Bin Cui

    Abstract: Diffusion models have demonstrated exceptional efficacy in various generative applications. While existing models focus on minimizing a weighted sum of denoising score matching losses for data distribution modeling, their training primarily emphasizes instance-level optimization, overlooking valuable structural information within each mini-batch, indicative of pair-wise relationships among samples…

    Submitted 4 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024

  26. arXiv:2402.16627  [pdf, other]

    cs.CV cs.AI cs.LG

    Contextualized Diffusion Models for Text-Guided Image and Video Generation

    Authors: Ling Yang, Zhilong Zhang, Zhaochen Yu, Jingwei Liu, Minkai Xu, Stefano Ermon, Bin Cui

    Abstract: Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion models primarily focus on incorporating text-visual relationships exclusively into the reverse process, often disregarding their relevance in the forward process. This inconsistency between forward and reverse processes m…

    Submitted 3 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: ICLR 2024. Project: https://github.com/YangLing0818/ContextDiff

  27. arXiv:2402.12908  [pdf, other]

    cs.CV cs.AI cs.LG

    RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models

    Authors: Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kai-Ni Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui

    Abstract: Diffusion models have achieved remarkable advancements in text-to-image generation. However, existing models still have many difficulties when faced with multiple-object compositional generation. In this paper, we propose RealCompo, a new training-free and transferred-friendly text-to-image generation framework, which aims to leverage the respective advantages of text-to-image models and spatial-a…

    Submitted 24 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: Project: https://github.com/YangLing0818/RealCompo

  28. arXiv:2401.16416  [pdf, other]

    cs.CV

    Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting

    Authors: Yiming Huang, Beilei Cui, Long Bai, Ziqi Guo, Mengya Xu, Mobarakol Islam, Hongliang Ren

    Abstract: In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Neural Radiance Fields (NeRF)-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes but are hampered by slow inference speed, prolonged training, and inconsistent depth estimation. Some previo…

    Submitted 2 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

  29. arXiv:2401.11708  [pdf, other]

    cs.CV cs.AI cs.LG

    Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

    Authors: Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui

    Abstract: Diffusion models have exhibited exceptional performance in text-to-image generation and editing. However, existing methods often face challenges when handling complex text prompts that involve multiple objects with multiple attributes and relationships. In this paper, we propose a brand new training-free text-to-image generation/editing framework, namely Recaption, Plan and Generate (RPG), harnessin…

    Submitted 5 May, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: ICML 2024. Project: https://github.com/YangLing0818/RPG-DiffusionMaster

  30. arXiv:2401.06013  [pdf, other]

    cs.CV cs.AI

    Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

    Authors: Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren

    Abstract: Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction, surgical navigation and augmented reality visualization. Although the foundation model exhibits outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works observed its limitations in medical and surgical domain-specific applications. This work presents a low-ranked adaptation (LoR…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted by IPCAI 2024 (IJCAR Special Issue)

  31. arXiv:2401.02015  [pdf, other]

    cs.CV cs.AI cs.LG

    Improving Diffusion-Based Image Synthesis with Context Prediction

    Authors: Ling Yang, Jingwei Liu, Shenda Hong, Zhilong Zhang, Zhilin Huang, Zheming Cai, Wentao Zhang, Bin Cui

    Abstract: Diffusion models are a new class of generative models, and have dramatically promoted image generation with unprecedented quality and diversity. Existing diffusion models mainly try to reconstruct input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes. However, such point-based reconstruction may fail to make each predicted pixel/feature fully preserve its…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted by NeurIPS 2023

  32. arXiv:2312.10864  [pdf, ps, other]

    cs.IR

    On-Device Recommender Systems: A Tutorial on The New-Generation Recommendation Paradigm

    Authors: Hongzhi Yin, Tong Chen, Liang Qu, Bin Cui

    Abstract: Given the sheer volume of contemporary e-commerce applications, recommender systems (RSs) have gained significant attention in both academia and industry. However, traditional cloud-based RSs face inevitable challenges, such as resource-intensive computation, reliance on network access, and privacy breaches. In response, a new paradigm called on-device recommender systems (ODRSs) has emerged recen…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Technical tutorial; to appear at The Web Conference 2024

  33. arXiv:2312.03256  [pdf, other]

    cs.LG

    CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models

    Authors: Hailin Zhang, Zirui Liu, Boxuan Chen, Yikai Zhao, Tong Zhao, Tong Yang, Bin Cui

    Abstract: Recently, the growing memory demands of embedding tables in Deep Learning Recommendation Models (DLRMs) pose great challenges for model training and deployment. Existing embedding compression solutions cannot simultaneously meet three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distribution. This paper presents CAFE, a Compact, Adaptive, and Fast Embed…

    Submitted 26 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  34. arXiv:2311.18244  [pdf, other]

    cs.IR cs.CR cs.LG

    Unveiling Vulnerabilities of Contrastive Recommender Systems to Poisoning Attacks

    Authors: Zongwei Wang, Junliang Yu, Min Gao, Hongzhi Yin, Bin Cui, Shazia Sadiq

    Abstract: Contrastive learning (CL) has recently gained prominence in the domain of recommender systems due to its great ability to enhance recommendation accuracy and improve model robustness. Despite its advantages, this paper identifies a vulnerability of CL-based recommender systems that they are more susceptible to poisoning attacks aiming to promote individual items. Our analysis indicates that this v…

    Submitted 25 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: 12 pages, 7 figures

  35. arXiv:2311.15578  [pdf, other]

    cs.LG cs.DB cs.IR

    Experimental Analysis of Large-scale Learnable Vector Storage Compression

    Authors: Hailin Zhang, Penghao Zhao, Xupeng Miao, Yingxia Shao, Zirui Liu, Tong Yang, Bin Cui

    Abstract: Learnable embedding vector is one of the most important applications in machine learning, and is widely used in various database-related domains. However, the high dimensionality of sparse data in recommendation tasks and the huge volume of corpus in retrieval-related tasks lead to a large memory consumption of the embedding table, which poses a great challenge to the training and deployment of mo…

    Submitted 13 February, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

  36. arXiv:2311.15566  [pdf, other]

    cs.DC cs.CL cs.LG

    SpotServe: Serving Generative Large Language Models on Preemptible Instances

    Authors: Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

    Abstract: The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary cost for serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer accesses to spare GPUs at a much cheaper price than regular instances but may be preempted by the cloud at any time. Serving LLMs on pre…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: ASPLOS 2024

  37. arXiv:2311.07164  [pdf, other]

    cs.ET cs.AI cs.AR

    Pruning random resistive memory for optimizing analogue AI

    Authors: Yi Li, Songqi Wang, Yaping Zhao, Shaocong Wang, Woyu Zhang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Xiaoxin Xu, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: The rapid advancement of artificial intelligence (AI) has been marked by the large language models exhibiting human-like intelligence. However, these models also present unprecedented challenges to energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing and exploits emerging analogue electronic device…

    Submitted 13 November, 2023; originally announced November 2023.

  38. arXiv:2310.10998  [pdf, other]

    cs.LG cs.AI

    Accelerating Scalable Graph Neural Network Inference with Node-Adaptive Propagation

    Authors: Xinyi Gao, Wentao Zhang, Junliang Yu, Yingxia Shao, Quoc Viet Hung Nguyen, Bin Cui, Hongzhi Yin

    Abstract: Graph neural networks (GNNs) have exhibited exceptional efficacy in a diverse array of applications. However, the sheer size of large-scale graphs presents a significant challenge to real-time inference with GNNs. Although existing Scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure, these methods still suffer from scalability is…

    Submitted 9 December, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: 2024 IEEE 40th International Conference on Data Engineering (ICDE). arXiv admin note: substantial text overlap with arXiv:2211.00495

  39. arXiv:2309.15675  [pdf, other]

    cs.CV

    SJTU-TMQA: A quality assessment database for static mesh with texture map

    Authors: Bingyang Cui, Qi Yang, Kaifa Yang, Yiling Xu, Xiaozhong Xu, Shan Liu

    Abstract: In recent years, static meshes with texture maps have become one of the most prevalent digital representations of 3D shapes in various applications, such as animation, gaming, medical imaging, and cultural heritage applications. However, little research has been done on the quality assessment of textured meshes, which hinders the development of quality-oriented applications, such as mesh compressi…

    Submitted 27 September, 2023; originally announced September 2023.

  40. arXiv:2309.13335  [pdf, other]

    cs.IR

    Model-enhanced Vector Index

    Authors: Hailin Zhang, Yujing Wang, Qi Chen, Ruiheng Chang, Ting Zhang, Ziming Miao, Yingyan Hou, Yang Ding, Xupeng Miao, Haonan Wang, Bochen Pang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Xing Xie, Mao Yang, Bin Cui

    Abstract: Embedding-based retrieval methods construct vector indices to search for document representations that are most similar to the query representations. They are widely used in document retrieval due to low latency and decent recall performance. Recent research indicates that deep retrieval solutions offer better model quality, but are hindered by unacceptable serving latency and the inability to sup…

    Submitted 9 November, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

  41. arXiv:2309.13169  [pdf, other]

    cs.DC cs.NI

    Cloudy Forecast: How Predictable is Communication Latency in the Cloud?

    Authors: Owen Hilyard, Bocheng Cui, Marielle Webster, Abishek Bangalore Muralikrishna, Aleksey Charapko

    Abstract: Many systems and services rely on timing assumptions for performance and availability to perform critical aspects of their operation, such as various timeouts for failure detectors or optimizations to concurrency control mechanisms. Many such assumptions rely on the ability of different components to communicate on time -- a delay in communication may trigger the failure detector or cause the syst…

    Submitted 22 September, 2023; originally announced September 2023.

  42. arXiv:2309.12239  [pdf, other

    cs.DB

    ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems

    Authors: Jinqing Lian, Xinyi Zhang, Yingxia Shao, Zenglin Pu, Qingfeng Xiang, Yawen Li, Bin Cui

    Abstract: The past decade has seen rapid growth of distributed stream data processing systems. Under these systems, a stream application is realized as a Directed Acyclic Graph (DAG) of operators, where the level of parallelism of each operator has a substantial impact on its overall performance. However, finding optimal levels of parallelism remains challenging. Most existing methods are heavily coupled wi…

    Submitted 21 September, 2023; originally announced September 2023.

  43. Towards General and Efficient Online Tuning for Spark

    Authors: Yang Li, Huaijun Jiang, Yu Shen, Yide Fang, Xiaofeng Yang, Danqing Huang, Xinyi Zhang, Wentao Zhang, Ce Zhang, Peng Chen, Bin Cui

    Abstract: The distributed data analytic system -- Spark is a common choice for processing massive volumes of heterogeneous data, while it is challenging to tune its parameters to achieve high performance. Recent studies try to employ auto-tuning techniques to solve this problem but suffer from three issues: limited functionality, high overhead, and inefficient search. In this paper, we present a general a…

    Submitted 4 September, 2023; originally announced September 2023.

    Journal ref: Proceedings of the VLDB Endowment 2023

  44. arXiv:2308.09568  [pdf, other

    cs.CV cs.CL

    PUMGPT: A Large Vision-Language Model for Product Understanding

    Authors: Wei Xue, Zongyi Guo, Baoliang Cui, Zheng Xing, Xiaoyi Zeng, Xiufei Wang, Shuhui Wu, Weiming Lu

    Abstract: E-commerce platforms benefit from accurate product understanding to enhance user experience and operational efficiency. Traditional methods often focus on isolated tasks such as attribute extraction or categorization, posing adaptability issues to evolving tasks and leading to usability challenges with noisy data from the internet. Current Large Vision Language Models (LVLMs) lack domain-specific…

    Submitted 16 June, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

  45. arXiv:2308.08823  [pdf, other

    cs.LG

    Mitigating Semantic Confusion from Hostile Neighborhood for Graph Active Learning

    Authors: Tianmeng Yang, Min Zhou, Yujing Wang, Zhengjie Lin, Lujia Pan, Bin Cui, Yunhai Tong

    Abstract: Graph Active Learning (GAL), which aims to find the most informative nodes in graphs for annotation to maximize the Graph Neural Networks (GNNs) performance, has attracted many research efforts but remains non-trivial challenges. One major challenge is that existing GAL strategies may introduce semantic confusion to the selected training set, particularly when graphs are noisy. Specifically, most…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted by CIKM 2023

  46. arXiv:2308.02117  [pdf, other

    cs.LG cs.AI cs.CV

    VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs

    Authors: Ling Yang, Ye Tian, Minkai Xu, Zhongyi Liu, Shenda Hong, Wei Qu, Wentao Zhang, Bin Cui, Muhan Zhang, Jure Leskovec

    Abstract: GNN-to-MLP distillation aims to utilize knowledge distillation (KD) to learn computationally-efficient multi-layer perceptron (student MLP) on graph data by mimicking the output representations of teacher GNN. Existing methods mainly make the MLP to mimic the GNN predictions over a few class labels. However, the class space may not be expressive enough for covering numerous diverse local graph str…

    Submitted 6 March, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: ICLR 2024. Code: https://github.com/YangLing0818/VQGraph

  47. arXiv:2307.15244  [pdf, other

    cs.SI cs.AI

    BOURNE: Bootstrapped Self-supervised Learning Framework for Unified Graph Anomaly Detection

    Authors: Jie Liu, Mengting He, Xuequn Shang, Jieming Shi, Bin Cui, Hongzhi Yin

    Abstract: Graph anomaly detection (GAD) has gained increasing attention in recent years due to its critical application in a wide range of domains, such as social networks, financial risk management, and traffic analysis. Existing GAD methods can be categorized into node and edge anomaly detection models based on the type of graph objects being detected. However, these methods typically treat node and edge…

    Submitted 19 November, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  48. arXiv:2307.05898  [pdf, other

    cs.CV

    Rectifying Noisy Labels with Sequential Prior: Multi-Scale Temporal Feature Affinity Learning for Robust Video Segmentation

    Authors: Beilei Cui, Minqing Zhang, Mengya Xu, An Wang, Wu Yuan, Hongliang Ren

    Abstract: Noisy label problems are inevitably in existence within medical image segmentation causing severe performance degradation. Previous segmentation methods for noisy label problems only utilize a single image while the potential of leveraging the correlation between images has been overlooked. Especially for video segmentation, adjacent frames contain rich contextual information beneficial in cognizi…

    Submitted 12 July, 2023; originally announced July 2023.

    Comments: Accepted by MICCAI 2023

  49. arXiv:2307.02031  [pdf, other

    cs.LG cs.DB cs.DC

    Improving Automatic Parallel Training via Balanced Memory Workload Optimization

    Authors: Yujie Wang, Youhe Jiang, Xupeng Miao, Fangcheng Fu, Shenhan Zhu, Xiaonan Nie, Yaofeng Tu, Bin Cui

    Abstract: Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains, serving as the foundation for advanced large-scale deep learning (DL) models. However, efficiently training these models across multiple GPUs remains a complex challenge due to the abundance of parallelism options. Existing DL systems either require manual efforts…

    Submitted 24 February, 2024; v1 submitted 5 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.13878

  50. arXiv:2306.15902  [pdf, other

    cs.LG cs.AI cs.CV

    Individual and Structural Graph Information Bottlenecks for Out-of-Distribution Generalization

    Authors: Ling Yang, Jiayi Zheng, Heyuan Wang, Zhongyi Liu, Zhilin Huang, Shenda Hong, Wentao Zhang, Bin Cui

    Abstract: Out-of-distribution (OOD) graph generalization are critical for many real-world applications. Existing methods neglect to discard spurious or noisy features of inputs, which are irrelevant to the label. Besides, they mainly conduct instance-level class-invariant graph learning and fail to utilize the structural class relationships between graph instances. In this work, we endeavor to address these…

    Submitted 27 June, 2023; originally announced June 2023.

    Comments: Accepted by IEEE Transactions on Knowledge and Data Engineering (TKDE)