Showing 1–50 of 1,050 results for author: Liu, D

Searching in archive cs.

  1. arXiv:2408.08862  [pdf, other]

    cs.LG

    Visual Agents as Fast and Slow Thinkers

    Authors: Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Yongfeng Zhang, Dongfang Liu

    Abstract: Achieving human-level intelligence requires refining cognitive distinctions between System 1 and System 2 thinking. While contemporary AI, driven by large language models, demonstrates human-like traits, it falls short of genuine cognition. Transitioning from structured benchmarks to real-world scenarios presents challenges for visual agents, often leading to inaccurate and overly confident respon…

    Submitted 16 August, 2024; originally announced August 2024.

  2. arXiv:2408.08645  [pdf, other]

    cs.CV

    Extracting polygonal footprints in off-nadir images with Segment Anything Model

    Authors: Kai Li, Jingbo Chen, Yupeng Deng, Yu Meng, Diyou Liu, Junxian Ma, Chenhao Wang

    Abstract: Building Footprint Extraction (BFE) in off-nadir aerial images often relies on roof segmentation and roof-to-footprint offset prediction, then dragging roof-to-footprint via the offset. However, the results from this multi-stage inference are not applicable in data production, because of the low quality of masks given by prediction. To solve this problem, we proposed OBMv2 in this paper, which sup…

    Submitted 16 August, 2024; originally announced August 2024.

  3. arXiv:2408.08604  [pdf, other]

    cs.CV

    Bi-Directional Deep Contextual Video Compression

    Authors: Xihua Sheng, Li Li, Dong Liu, Shiqi Wang

    Abstract: Deep video compression has made remarkable progress in recent years, with the majority of advancements concentrated on P-frame coding. Although efforts to enhance B-frame coding are ongoing, their compression performance is still far behind that of traditional bi-directional video codecs. In this paper, we introduce a bi-directional deep contextual video compression scheme tailored for B-frames, te…

    Submitted 16 August, 2024; originally announced August 2024.

  4. arXiv:2408.08585  [pdf, other]

    cs.IR cs.LG

    OptDist: Learning Optimal Distribution for Customer Lifetime Value Prediction

    Authors: Yunpeng Weng, Xing Tang, Zhenhao Xu, Fuyuan Lyu, Dugang Liu, Zexu Sun, Xiuqiang He

    Abstract: Customer Lifetime Value (CLTV) prediction is a critical task in business applications. Accurately predicting CLTV is challenging in real-world business scenarios, as the distribution of CLTV is complex and mutable. Firstly, there is a large number of users without any consumption consisting of a long-tailed part that is too complex to fit. Secondly, the small set of high-value users spent orders o…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: CIKM 2024

  5. arXiv:2408.08105  [pdf, other]

    cs.CV cs.AI

    Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

    Authors: Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Weidong Cai

    Abstract: Large Language Models (LLMs) have showcased exceptional ability in causal reasoning from textual information. However, will these causalities remain straightforward for Vision Large Language Models (VLLMs) when only visual hints are provided? Motivated by this, we propose a novel Multimodal Causal Reasoning benchmark, namely MuCR, to challenge VLLMs to infer semantic cause-and-effect relationship…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 20 pages

  6. arXiv:2408.05533  [pdf, other]

    cs.CV

    Radiance Field Learners As UAV First-Person Viewers

    Authors: Liqi Yan, Qifan Wang, Junhan Zhao, Qiang Guan, Zheng Tang, Jianhui Zhang, Dongfang Liu

    Abstract: First-Person-View (FPV) holds immense potential for revolutionizing the trajectory of Unmanned Aerial Vehicles (UAVs), offering an exhilarating avenue for navigating complex building structures. Yet, traditional Neural Radiance Field (NeRF) methods face challenges such as sampling single points per iteration and requiring an extensive array of views for supervision. UAV videos exacerbate these iss…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024

    Journal ref: European Conference on Computer Vision (ECCV 2024)

  7. arXiv:2408.03220  [pdf, other]

    cs.LG cs.DC

    Masked Random Noise for Communication Efficient Federated Learning

    Authors: Shiwei Li, Yingyi Cheng, Haozhao Wang, Xing Tang, Shijie Xu, Weihong Luo, Yuhua Li, Dugang Liu, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed training paradigm that effectively safeguards data privacy. However, it may involve significant communication costs, which hinders training efficiency. In this paper, we aim to enhance communication efficiency from a new perspective. Specifically, we request the distributed clients to find optimal model updates relative to global model parameters withi…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by MM 2024

  8. arXiv:2408.03085  [pdf, ps, other]

    quant-ph cs.LG

    Matrix Multiplication on Quantum Computer

    Authors: Jiaqi Yao, Ding Liu

    Abstract: This paper introduces an innovative and practical approach to universal quantum matrix multiplication. We designed optimized quantum adders and multipliers based on Quantum Fourier Transform (QFT), which significantly reduced the number of gates used compared to classical adders and multipliers. Subsequently, we construct a basic universal quantum matrix multiplication and extend it to the Strasse…

    Submitted 6 August, 2024; originally announced August 2024.

  9. arXiv:2408.02693  [pdf, other]

    physics.comp-ph cs.AI

    Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models

    Authors: Chuan Liu, Chunshu Wu, Mingkai Chen, James Chenhao Liang, Ang Li, Michael Huang, Chuang Ren, Dongfang Liu, Ying Nian Wu, Tong Geng

    Abstract: Sustainable energy is a crucial global challenge, and recent breakthroughs in nuclear fusion ignition underscore the potential of harnessing energy extracted from nuclear fusion in everyday life, thereby drawing significant attention to fusion ignition research, especially Laser-Plasma Interaction (LPI). Unfortunately, the complexity of LPI at ignition scale renders theory-based analysis nearly im…

    Submitted 3 August, 2024; originally announced August 2024.

  10. arXiv:2408.02657  [pdf, other]

    cs.CV

    Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining

    Authors: Dongyang Liu, Shitian Zhao, Le Zhuo, Weifeng Lin, Yu Qiao, Hongsheng Li, Peng Gao

    Abstract: We present Lumina-mGPT, a family of multimodal autoregressive models capable of various vision and language tasks, particularly excelling in generating flexible photorealistic images from text descriptions. Unlike existing autoregressive image generation approaches, Lumina-mGPT employs a pretrained decoder-only transformer as a unified framework for modeling multimodal token sequences. Our key ins…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Code available at: https://github.com/Alpha-VLLM/Lumina-mGPT

  11. arXiv:2408.02634  [pdf, other]

    cs.GT q-fin.MF q-fin.TR

    CLVR Ordering of Transactions on AMMs

    Authors: Robert McLaughlin, Nir Chemaya, Dingyue Liu, Dahlia Malkhi

    Abstract: Trading on decentralized exchanges via an Automated Market Maker (AMM) mechanism has been massively adopted, with a daily trading volume reaching $1B. This trading method has also received close attention from researchers, central banks, and financial firms, who have the potential to adopt it to traditional financial markets such as foreign exchanges and stock markets. A critical challenge of AMM-…

    Submitted 5 August, 2024; originally announced August 2024.

  12. arXiv:2408.01779  [pdf, other]

    cs.CL

    MathLearner: A Large Language Model Agent Framework for Learning to Solve Mathematical Problems

    Authors: Wenbei Xie, Donglin Liu, Haoran Yan, Wenjie Wu, Zongyang Liu

    Abstract: With the development of artificial intelligence (AI), large language models (LLMs) are widely used in many fields. However, the reasoning ability of LLMs is still very limited when it comes to mathematical reasoning. Mathematics plays an important role in all aspects of human society and is a technical guarantee in the fields of healthcare, transport and aerospace; for this reason, the development o…

    Submitted 3 August, 2024; originally announced August 2024.

  13. arXiv:2408.01701  [pdf, other]

    cs.CV

    Signal-SGN: A Spiking Graph Convolutional Network for Skeletal Action Recognition via Learning Temporal-Frequency Dynamics

    Authors: Naichuan Zheng, Hailun Xia, Dapeng Liu

    Abstract: In skeletal-based action recognition, Graph Convolutional Networks (GCNs) based methods face limitations due to their complexity and high energy consumption. Spiking Neural Networks (SNNs) have gained attention in recent years for their low energy consumption, but existing methods combining GCNs and SNNs fail to fully utilize the temporal characteristics of skeletal sequences, leading to increased…

    Submitted 3 August, 2024; originally announced August 2024.

  14. arXiv:2408.00790  [pdf, other]

    cs.NE cs.AI

    Improving Air Mobility for Pre-Disaster Planning with Neural Network Accelerated Genetic Algorithm

    Authors: Kamal Acharya, Alvaro Velasquez, Yongxin Liu, Dahai Liu, Liang Sun, Houbing Song

    Abstract: Weather disaster related emergency operations pose a great challenge to air mobility in both aircraft and airport operations, especially when the impact is gradually approaching. We propose an optimized framework for adjusting airport operational schedules for such pre-disaster scenarios. We first aggregate operational data from multiple airports and then determine the optimal count of evacuation…

    Submitted 17 July, 2024; originally announced August 2024.

    Comments: 7 pages, 8 figures, ITSC 2024

  15. FlowGPT: Exploring Domains, Output Modalities, and Goals of Community-Generated AI Chatbots

    Authors: Xian Li, Yuanning Han, Di Liu, Pengcheng An, Shuo Niu

    Abstract: The advent of Generative AI and Large Language Models has not only enhanced the intelligence of interactive applications but also catalyzed the formation of communities passionate about customizing these AI capabilities. FlowGPT, an emerging platform for sharing AI prompts and use cases, exemplifies this trend, attracting many creators who develop and share chatbots with a broader community. Despi…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: To appear at CSCW Companion '24

  16. arXiv:2407.21714  [pdf, other]

    cs.AI q-bio.QM

    UMMAN: Unsupervised Multi-graph Merge Adversarial Network for Disease Prediction Based on Intestinal Flora

    Authors: Dingkun Liu, Hongjie Zhou, Yilu Qu, Huimei Zhang, Yongdong Xu

    Abstract: The abundance of intestinal flora is closely related to human diseases, but diseases are not caused by a single gut microbe. Instead, they result from the complex interplay of numerous microbial entities. This intricate and implicit connection among gut microbes poses a significant challenge for disease prediction using abundance information from OTU data. Recently, several methods have shown pote…

    Submitted 31 July, 2024; originally announced July 2024.

  17. arXiv:2407.21500  [pdf, other]

    cs.RO

    DIABLO: A 6-DoF Wheeled Bipedal Robot Composed Entirely of Direct-Drive Joints

    Authors: Dingchuan Liu, Fangfang Yang, Xuanhong Liao, Ximin Lyu

    Abstract: Wheeled bipedal robots offer the advantages of both wheeled and legged robots, combining the ability to traverse a wide range of terrains and environments with high efficiency. However, the conventional approach in existing wheeled bipedal robots involves motor-driven joints with high-ratio gearboxes. While this approach provides specific benefits, it also presents several challenges, including in…

    Submitted 1 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: This paper has already been accepted by IROS 2024

  18. arXiv:2407.20103  [pdf, other]

    cs.HC

    What Can Interactive Visualization do for Participatory Budgeting in Chicago?

    Authors: Alex Kale, Danni Liu, Maria Gabriela Ayala, Harper Schwab, Andrew McNutt

    Abstract: Participatory budgeting (PB) is a democratic approach to allocating municipal spending that has been adopted in many places in recent years, including in Chicago. Current PB voting resembles a ballot where residents are asked which municipal projects, such as school improvements and road repairs, to fund with a limited budget. In this work, we ask how interactive visualization can benefit PB by co…

    Submitted 29 July, 2024; originally announced July 2024.

  19. arXiv:2407.19460  [pdf, other]

    cs.CV

    White Matter Geometry-Guided Score-Based Diffusion Model for Tissue Microstructure Imputation in Tractography Imaging

    Authors: Yui Lo, Yuqian Chen, Fan Zhang, Dongnan Liu, Leo Zekelman, Suheyla Cetin-Karayumak, Yogesh Rathi, Weidong Cai, Lauren J. O'Donnell

    Abstract: Parcellation of white matter tractography provides anatomical features for disease prediction, anatomical tract segmentation, surgical brain mapping, and non-imaging phenotype classifications. However, parcellation does not always reach 100% accuracy due to various factors, including inter-individual anatomical variability and the quality of neuroimaging scan data. The failure to identify parcels…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 12 pages, 3 figures, 2 tables

  20. arXiv:2407.19402  [pdf, other]

    cs.CV eess.IV

    NVC-1B: A Large Neural Video Coding Model

    Authors: Xihua Sheng, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: The emerging large models have achieved notable progress in the fields of natural language processing and computer vision. However, large models for neural video coding are still unexplored. In this paper, we try to explore how to build a large neural video coding model. Based on a small baseline model, we gradually scale up the model sizes of its different coding parts, including the motion encod…

    Submitted 28 July, 2024; originally announced July 2024.

  21. arXiv:2407.19198  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Towards the Dynamics of a DNN Learning Symbolic Interactions

    Authors: Qihan Ren, Yang Xu, Junpeng Zhang, Yue Xin, Dongrui Liu, Quanshi Zhang

    Abstract: This study proves the two-phase dynamics of a deep neural network (DNN) learning interactions. Despite the long disappointing view of the faithfulness of post-hoc explanation of a DNN, in recent years, a series of theorems have been proven to show that given an input sample, a small number of interactions between input variables can be considered as primitive inference patterns, which can faithful…

    Submitted 27 July, 2024; originally announced July 2024.

  22. arXiv:2407.17172  [pdf, other]

    cs.SD cs.CL eess.AS

    Speech Editing -- a Summary

    Authors: Tobias Kässmann, Yining Liu, Danni Liu

    Abstract: With the rise of video production and social media, speech editing has become crucial for creators to address issues like mispronunciations, missing words, or stuttering in audio recordings. This paper explores text-based speech editing methods that modify audio via text transcripts without manual waveform editing. These approaches ensure edited audio is indistinguishable from the original by alte…

    Submitted 24 July, 2024; originally announced July 2024.

  23. arXiv:2407.14367  [pdf, other]

    cs.CV

    Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations

    Authors: Decheng Liu, Zongqi Wang, Chunlei Peng, Nannan Wang, Ruimin Hu, Xinbo Gao

    Abstract: Due to the successful development of deep image generation technology, forgery detection plays a more important role in social and economic security. Racial bias has not been explored thoroughly in the deep forgery detection field. In this paper, we first contribute a dedicated dataset called the Fair Forgery Detection (FairFD) dataset, where we prove the racial bias of public state-of-the-art (SOT…

    Submitted 19 July, 2024; originally announced July 2024.

  24. arXiv:2407.14245  [pdf, other]

    cs.CV

    Dataset Distillation by Automatic Training Trajectories

    Authors: Dai Liu, Jindong Gu, Hu Cao, Carsten Trinitis, Martin Schulz

    Abstract: Dataset Distillation is used to create a concise, yet informative, synthetic dataset that can replace the original dataset for training purposes. Some leading methods in this domain prioritize long-range matching, involving the unrolling of training trajectories with a fixed number of steps (NS) on the synthetic dataset to align with various expert training trajectories. However, traditional long-…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: The paper is accepted at ECCV 2024

  25. arXiv:2407.13246  [pdf, other]

    cs.CV

    STS MICCAI 2023 Challenge: Grand challenge on 2D and 3D semi-supervised tooth segmentation

    Authors: Yaqi Wang, Yifan Zhang, Xiaodiao Chen, Shuai Wang, Dahong Qian, Fan Ye, Feng Xu, Hongyuan Zhang, Qianni Zhang, Chengyu Wu, Yunxiang Li, Weiwei Cui, Shan Luo, Chengkai Wang, Tianhao Li, Yi Liu, Xiang Feng, Huiyu Zhou, Dongyun Liu, Qixuan Wang, Zhouhao Lin, Wei Song, Yuanlin Li, Bing Wang, Chunshi Wang, et al. (2 additional authors not shown)

    Abstract: Computer-aided design (CAD) tools are increasingly popular in modern dental practice, particularly for treatment planning or comprehensive prognosis evaluation. In particular, the 2D panoramic X-ray image efficiently detects invisible caries, impacted teeth and supernumerary teeth in children, while the 3D dental cone beam computed tomography (CBCT) is widely used in orthodontics and endodontics d…

    Submitted 18 July, 2024; originally announced July 2024.

  26. arXiv:2407.12870  [pdf, other]

    q-bio.QM cs.LG eess.IV

    Revisiting Adaptive Cellular Recognition Under Domain Shifts: A Contextual Correspondence View

    Authors: Jianan Fan, Dongnan Liu, Canran Li, Hang Chang, Heng Huang, Filip Braet, Mei Chen, Weidong Cai

    Abstract: Cellular nuclei recognition serves as a fundamental and essential step in the workflow of digital pathology. However, with disparate source organs and staining procedures among histology image clusters, the scanned tiles inherently conform to a non-uniform data distribution, which induces deteriorated promises for general cross-cohort usages. Despite the latest efforts leveraging domain adaptation…

    Submitted 19 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 main conference

  27. arXiv:2407.12423  [pdf, other]

    cs.HC cs.AI

    StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions

    Authors: Zixin Chen, Jiachen Wang, Meng Xia, Kento Shigyo, Dingdong Liu, Rong Zhang, Huamin Qu

    Abstract: The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this e…

    Submitted 21 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: 11 pages. To be published at IEEE Visualization 2024

  28. arXiv:2407.12344  [pdf, other]

    cs.CL cs.CY

    The Better Angels of Machine Personality: How Personality Relates to LLM Safety

    Authors: Jie Zhang, Dongrui Liu, Chen Qian, Ziyue Gan, Yong Liu, Yu Qiao, Jing Shao

    Abstract: Personality psychologists have analyzed the relationship between personality and safety behaviors in human society. Although Large Language Models (LLMs) demonstrate personality traits, the relationship between personality traits and safety abilities in LLMs still remains a mystery. In this paper, we discover that LLMs' personality traits are closely related to their safety abilities, i.e., toxici…

    Submitted 17 July, 2024; originally announced July 2024.

  29. arXiv:2407.12192  [pdf, other]

    cs.HC

    Towards Dataset-scale and Feature-oriented Evaluation of Text Summarization in Large Language Model Prompts

    Authors: Sam Yu-Te Lee, Aryaman Bahukhandi, Dongyu Liu, Kwan-Liu Ma

    Abstract: Recent advancements in Large Language Models (LLMs) and Prompt Engineering have made chatbot customization more accessible, significantly reducing barriers to tasks that previously required programming skills. However, prompt evaluation, especially at the dataset scale, remains complex due to the need to assess prompts across thousands of test instances within a dataset. Our study, based on a comp…

    Submitted 18 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  30. arXiv:2407.11541  [pdf, other]

    eess.IV cs.CV

    Uniformly Accelerated Motion Model for Inter Prediction

    Authors: Zhuoyuan Li, Yao Li, Chuanbo Tang, Li Li, Dong Liu, Feng Wu

    Abstract: Inter prediction is a key technology to reduce the temporal redundancy in video coding. In natural videos, there are usually multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods usually assume uniform speed motion between consecutive frames and use the linear…

    Submitted 21 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: 5 pages, 4 figures

  31. arXiv:2407.11098  [pdf, other]

    cs.LG cs.AI

    Inertial Confinement Fusion Forecasting via LLMs

    Authors: Mingkai Chen, Taowen Wang, James Chenhao Liang, Chuan Liu, Chunshu Wu, Qifan Wang, Ying Nian Wu, Michael Huang, Chuang Ren, Ang Li, Tong Geng, Dongfang Liu

    Abstract: Controlled fusion energy is deemed pivotal for the advancement of human civilization. In this study, we introduce Fusion-LLM, a novel integration of Large Language Models (LLMs) with classical reservoir computing paradigms tailored to address challenges in Inertial Confinement Fusion (ICF). Our approach offers several key contributions: Firstly, we propose the…

    Submitted 8 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  32. arXiv:2407.10988  [pdf, other]

    cs.LG

    Residual resampling-based physics-informed neural network for neutron diffusion equations

    Authors: Heng Zhang, Yun-Ling He, Dong Liu, Qin Hang, He-Min Yao, Di Xiang

    Abstract: The neutron diffusion equation plays a pivotal role in the analysis of nuclear reactors. Nevertheless, employing the Physics-Informed Neural Network (PINN) method for its solution entails certain limitations. Traditional PINN approaches often utilize fully connected network (FCN) architecture, which is susceptible to overfitting, training instability, and gradient vanishing issues as the network d…

    Submitted 23 June, 2024; originally announced July 2024.

  33. arXiv:2407.10954  [pdf, other]

    cs.GR cs.AI cs.LG

    A Unified Differentiable Boolean Operator with Fuzzy Logic

    Authors: Hsueh-Ti Derek Liu, Maneesh Agrawala, Cem Yuksel, Tim Omernick, Vinith Misra, Stefano Corazza, Morgan McGuire, Victor Zordan

    Abstract: This paper presents a unified differentiable boolean operator for implicit solid shape modeling using Constructive Solid Geometry (CSG). Traditional CSG relies on min and max operators to perform boolean operations on implicit shapes. Because these boolean operators are discontinuous and discrete in the choice of operations, optimization over the CSG representation is challenging. Drawing…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH'24

  34. arXiv:2407.10926  [pdf, other]

    eess.IV cs.CV

    In-Loop Filtering via Trained Look-Up Tables

    Authors: Zhuoyuan Li, Jiacheng Li, Yao Li, Li Li, Dong Liu, Feng Wu

    Abstract: In-loop filtering (ILF) is a key technology for removing the artifacts in image/video coding standards. Recently, neural network-based in-loop filtering methods have achieved remarkable coding gains beyond the capability of advanced video coding standards, making them a powerful coding tool candidate for future video coding standards. However, the utilization of deep neural networks brings heavy time…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures

  35. arXiv:2407.10806  [pdf, other]

    cs.CV

    Enhancing Robustness to Noise Corruption for Point Cloud Model via Spatial Sorting and Set-Mixing Aggregation Module

    Authors: Dingxin Zhang, Jianhui Yu, Tengfei Xue, Chaoyi Zhang, Dongnan Liu, Weidong Cai

    Abstract: Current models for point cloud recognition demonstrate promising performance on synthetic datasets. However, real-world point cloud data inevitably contains noise, impacting model robustness. While recent efforts focus on enhancing robustness through various strategies, there still remains a gap in comprehensive analyses from the standpoint of network architecture design. Unlike traditional method…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 22 pages, 9 figures

  36. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  37. arXiv:2407.08093  [pdf, other]

    eess.IV cs.AI cs.CV eess.SP

    MemWarp: Discontinuity-Preserving Cardiac Registration with Memorized Anatomical Filters

    Authors: Hang Zhang, Xiang Chen, Renjiu Hu, Dongdong Liu, Gaolei Li, Rongguang Wang

    Abstract: Many existing learning-based deformable image registration methods impose constraints on deformation fields to ensure they are globally smooth and continuous. However, this assumption does not hold in cardiac image registration, where different anatomical regions exhibit asymmetric motions during respiration and movements due to sliding organs within the chest. Consequently, such global constraint…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 pages, 2 figures, 2 tables

  38. arXiv:2407.07667  [pdf, other]

    cs.CV eess.IV

    VEnhancer: Generative Space-Time Enhancement for Video Generation

    Authors: Jingwen He, Tianfan Xue, Dongyang Liu, Xinqi Lin, Peng Gao, Dahua Lin, Yu Qiao, Wanli Ouyang, Ziwei Liu

    Abstract: We present VEnhancer, a generative space-time enhancement framework that improves the existing text-to-video results by adding more details in spatial domain and synthetic detailed motion in temporal domain. Given a generated low-quality video, our approach can increase its spatial and temporal resolution simultaneously with arbitrary up-sampling space and time scales through a unified video diffu…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: technical report

  39. arXiv:2407.07461  [pdf, other]

    cs.CV

    Drantal-NeRF: Diffusion-Based Restoration for Anti-aliasing Neural Radiance Field

    Authors: Ganlin Yang, Kaidong Zhang, Jingjing Fu, Dong Liu

    Abstract: Aliasing artifacts in renderings produced by Neural Radiance Field (NeRF) are a long-standing but complex issue in the field of 3D implicit representation, which arise from a multitude of intricate causes and were previously mitigated by designing more advanced but complex scene parameterization methods. In this paper, we present a Diffusion-based restoration method for anti-aliasing Neural Radiance Fi…

    Submitted 10 July, 2024; originally announced July 2024.

  40. arXiv:2407.07403  [pdf, other]

    cs.CV

    A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

    Authors: Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

    Abstract: With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to the multi-resource real-world applications and the compl…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  41. arXiv:2407.07328  [pdf, other]

    cs.LG

    CATP: Context-Aware Trajectory Prediction with Competition Symbiosis

    Authors: Jiang Wu, Dongyu Liu, Yuchen Lin, Yingcai Wu

    Abstract: Contextual information is vital for accurate trajectory prediction. For instance, the intricate flying behavior of migratory birds hinges on their analysis of environmental cues such as wind direction and air pressure. However, the diverse and dynamic nature of contextual information renders it an arduous task for AI models to comprehend its impact on trajectories and consequently predict them acc…

    Submitted 9 July, 2024; originally announced July 2024.

  42. arXiv:2407.06531  [pdf, other]

    cs.CV

    Decomposition Betters Tracking Everything Everywhere

    Authors: Rui Li, Dong Liu

    Abstract: Recent studies on motion estimation have advocated an optimized motion representation that is globally consistent across the entire video, preferably for every pixel. This is challenging as a uniform representation may not account for the complex and diverse motion and appearance of natural videos. We address this problem and propose a new test-time optimization method, named DecoMotion, for estim…

    Submitted 16 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 Camera Ready. Code and models will be available at https://github.com/qianduoduolr/DecoMotion

  43. arXiv:2407.06460  [pdf, other]

    cs.CL cs.AI

    MUSE: Machine Unlearning Six-Way Evaluation for Language Models

    Authors: Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang

    Abstract: Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approxim…

    Submitted 14 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  44. arXiv:2407.04939  [pdf, ps, other]

    cs.LG cs.CV

    Balance of Number of Embedding and their Dimensions in Vector Quantization

    Authors: Hang Chen, Sankepally Sainath Reddy, Ziwei Chen, Dianbo Liu

    Abstract: The dimensionality of the embedding and the number of available embeddings (also called codebook size) are critical factors influencing the performance of Vector Quantization (VQ), a discretization process used in many models such as the Vector Quantized Variational Autoencoder (VQ-VAE) architecture. This study examines the balance between the codebook sizes and dimensions of embeddings in VQ, whi…

    Submitted 5 July, 2024; originally announced July 2024.

  45. arXiv:2407.04208  [pdf, other]

    cs.CV

    AMD: Automatic Multi-step Distillation of Large-scale Vision Models

    Authors: Cheng Han, Qifan Wang, Sohail A. Dianat, Majid Rabbani, Raghuveer M. Rao, Yi Fang, Qiang Guan, Lifu Huang, Dongfang Liu

    Abstract: Transformer-based architectures have become the de-facto standard models for diverse vision tasks owing to their superior performance. As the size of the models continues to scale up, model distillation becomes extremely important in various real applications, particularly on devices limited by computational resources. However, prevailing knowledge distillation methods exhibit diminished efficacy…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 19 pages, 5 figures

  46. arXiv:2407.04121  [pdf, other]

    cs.CL cs.AI

    Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models

    Authors: Yuyan Chen, Qiang Fu, Yichen Yuan, Zhihao Wen, Ge Fan, Dayiheng Liu, Dongmei Zhang, Zhixu Li, Yanghua Xiao

    Abstract: Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks, including question answering and dialogue systems. However, a major drawback of LLMs is the issue of hallucination, where they generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences. In this paper, we propose a robust discriminator name…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2023 (Long Paper)

  47. arXiv:2407.04118  [pdf, other]

    cs.CL cs.AI

    MAPO: Boosting Large Language Model Performance with Model-Adaptive Prompt Optimization

    Authors: Yuyan Chen, Zhihao Wen, Ge Fan, Zhengyu Chen, Wei Wu, Dayiheng Liu, Zhixu Li, Bang Liu, Yanghua Xiao

    Abstract: Prompt engineering, as an efficient and effective way to leverage Large Language Models (LLM), has drawn a lot of attention from the research community. The existing research primarily emphasizes the importance of adapting prompts to specific tasks, rather than specific LLMs. However, a good prompt is not solely defined by its wording, but also binds to the nature of the LLM in question. In this w…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to EMNLP 2023 (Findings)

  48. arXiv:2407.04078  [pdf, other]

    cs.CL cs.AI cs.LG

    DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning

    Authors: Chengpeng Li, Guanting Dong, Mingfeng Xue, Ru Peng, Xiang Wang, Dayiheng Liu

    Abstract: Large language models (LLMs) have made impressive progress in handling simple math problems, yet they still struggle with more challenging and complex mathematical tasks. In this paper, we introduce a series of LLMs that employs the Decomposition of thought with code assistance and self-correction for mathematical reasoning, dubbed as DotaMath. DotaMath models tackle complex mathematical tasks by…

    Submitted 17 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: Work in progress

  49. arXiv:2407.03130  [pdf, other]

    cs.CV

    Towards Efficient Pixel Labeling for Industrial Anomaly Detection and Localization

    Authors: Hanxi Li, Jingqi Wu, Lin Yuanbo Wu, Hao Chen, Deyin Liu, Chunhua Shen

    Abstract: In the realm of practical Anomaly Detection (AD) tasks, manual labeling of anomalous pixels proves to be a costly endeavor. Consequently, many AD methods are crafted as one-class classifiers, tailored for training sets completely devoid of anomalies, ensuring a more cost-effective approach. While some pioneering work has demonstrated heightened AD accuracy by incorporating real anomaly samples in…

    Submitted 4 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

    Comments: 18 pages, 5 figures

  50. arXiv:2407.00280  [pdf, other]

    eess.IV cs.CV

    IVCA: Inter-Relation-Aware Video Complexity Analyzer

    Authors: Junqi Liao, Yao Li, Zhuoyuan Li, Li Li, Dong Liu

    Abstract: To meet the real-time analysis requirements of video streaming applications, we propose an inter-relation-aware video complexity analyzer (IVCA) as an extension to VCA. The IVCA addresses the limitation of VCA by considering inter-frame relations, namely motion and reference structure. First, we enhance the accuracy of temporal features by introducing feature-domain motion estimation into the IVCA…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: The report for the solution of the second prize winner in the ICIP 2024 Grand Challenge on Video Complexity (Team: USTC-iVC_Team1, USTC-iVC_Team2)