
Showing 1–50 of 745 results for author: Hu, S

Searching in archive cs.
  1. arXiv:2408.11085  [pdf, other]

    cs.CV

    GSLoc: Efficient Camera Pose Refinement via 3D Gaussian Splatting

    Authors: Changkun Liu, Shuai Chen, Yash Bhalgat, Siyan Hu, Zirui Wang, Ming Cheng, Victor Adrian Prisacariu, Tristan Braud

    Abstract: We leverage 3D Gaussian Splatting (3DGS) as a scene representation and propose a novel test-time camera pose refinement framework, GSLoc. This framework enhances the localization accuracy of state-of-the-art absolute pose regression and scene coordinate regression methods. The 3DGS model renders high-quality synthetic images and depth maps to facilitate the establishment of 2D-3D correspondences.…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: The project page is available at https://gsloc.active.vision

  2. arXiv:2408.10613  [pdf, other]

    cs.IR

    Task-level Distributionally Robust Optimization for Large Language Model-based Dense Retrieval

    Authors: Guangyuan Ma, Yongliang Ma, Xing Wu, Zhenpeng Su, Ming Zhou, Songlin Hu

    Abstract: Large Language Model-based Dense Retrieval (LLM-DR) optimizes over numerous heterogeneous fine-tuning collections from different domains. However, the discussion about its training data distribution is still minimal. Previous studies rely on empirically assigned dataset choices or sampling ratios, which inevitably leads to sub-optimal retrieval performances. In this paper, we propose a new task-le…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10555  [pdf, other]

    cs.LG cs.IR

    Target-Prompt Online Graph Collaborative Learning for Temporal QoS Prediction

    Authors: Shengxiang Hu, Guobing Zou, Song Yang, Shiyi Lin, Bofeng Zhang, Yixin Chen

    Abstract: In service-oriented architecture, accurately predicting the Quality of Service (QoS) is vital for maintaining reliability and enhancing user satisfaction. However, current methods often neglect high-order latent collaborative relationships and fail to dynamically adjust feature learning for specific user-service invocations, which are critical for precise feature extraction. Moreover, relying on R…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

    MSC Class: 68T99 ACM Class: H.4.0; I.2.0

  4. arXiv:2408.09807  [pdf, other]

    cs.AI

    World Models Increase Autonomy in Reinforcement Learning

    Authors: Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat, Edward S. Hu

    Abstract: Reinforcement learning (RL) is an appealing paradigm for training intelligent agents, enabling policy acquisition from the agent's own autonomously acquired experience. However, the training process of RL is far from automatic, requiring extensive human effort to reset the agent and environments. To tackle the challenging reset-free setting, we first demonstrate the superiority of model-based (MB)…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  5. arXiv:2408.08881  [pdf, other]

    eess.IV cs.AI cs.CV

    U-MedSAM: Uncertainty-aware MedSAM for Medical Image Segmentation

    Authors: Xin Wang, Xiaoyu Liu, Peng Huang, Pu Huang, Shu Hu, Hongtu Zhu

    Abstract: Medical Image Foundation Models have proven to be powerful tools for mask prediction across various datasets. However, accurately assessing the uncertainty of their predictions remains a significant challenge. To address this, we propose a new model, U-MedSAM, which integrates the MedSAM model with an uncertainty-aware loss function and the Sharpness-Aware Minimization (SharpMin) optimizer. The un…

    Submitted 3 August, 2024; originally announced August 2024.

  6. arXiv:2408.08435  [pdf, other]

    cs.AI

    Automated Design of Agentic Systems

    Authors: Shengran Hu, Cong Lu, Jeff Clune

    Abstract: Researchers are investing substantial effort in developing powerful general-purpose agents, wherein Foundation Models are used as modules within agentic systems (e.g. Chain-of-Thought, Self-Reflection, Toolformer). However, the history of machine learning teaches us that hand-designed solutions are eventually replaced by learned solutions. We formulate a new research area, Automated Design of Agen…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Website: https://shengranhu.com/ADAS

  7. arXiv:2408.05717  [pdf, other]

    cs.CV cs.AI

    Deformable Image Registration with Multi-scale Feature Fusion from Shared Encoder, Auxiliary and Pyramid Decoders

    Authors: Hongchao Zhou, Shunbo Hu

    Abstract: In this work, we propose a novel deformable convolutional pyramid network for unsupervised image registration. Specifically, the proposed network enhances the traditional pyramid network by adding an additional shared auxiliary decoder for image pairs. This decoder provides multi-scale high-level feature information from unblended image pairs for the registration task. During the registration proc…

    Submitted 11 August, 2024; originally announced August 2024.

  8. arXiv:2408.05126  [pdf]

    cs.HC cs.CL cs.SI

    Large Language Models and Thematic Analysis: Human-AI Synergy in Researching Hate Speech on Social Media

    Authors: Petre Breazu, Miriam Schirmer, Songbo Hu, Napoleon Kastos

    Abstract: In the dynamic field of artificial intelligence (AI), the development and application of Large Language Models (LLMs) for text analysis are of significant academic interest. Despite the promising capabilities of various LLMs in conducting qualitative analysis, their use in the humanities and social sciences has not been thoroughly examined. This article contributes to the emerging literature on LL…

    Submitted 9 August, 2024; originally announced August 2024.

  9. arXiv:2408.04799  [pdf, other]

    cs.HC

    Motion-based visual encoding can improve performance on perceptual tasks with dynamic time series

    Authors: Songwen Hu, Ouxun Jiang, Jeffrey Riedmiller, Cindy Xiong Bearfield

    Abstract: Dynamic data visualizations can convey large amounts of information over time, such as using motion to depict changes in data values for multiple entities. Such dynamic displays put a demand on our visual processing capacities, yet our perception of motion is limited. Several techniques have been shown to improve the processing of dynamic displays. Staging the animation to sequentially show steps…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE VIS 2024

  10. arXiv:2408.04300  [pdf, other]

    eess.IV cs.CV

    An Explainable Non-local Network for COVID-19 Diagnosis

    Authors: Jingfu Yang, Peng Huang, Jing Hu, Shu Hu, Siwei Lyu, Xin Wang, Jun Guo, Xi Wu

    Abstract: The CNN has achieved excellent results in the automatic classification of medical images. In this study, we propose a novel deep residual 3D attention non-local network (NL-RAN) to classify CT images into COVID-19, common pneumonia, and normal classes to perform rapid and explainable COVID-19 diagnosis. We built a deep residual 3D attention non-local network that could achieve end-to-end training. The…

    Submitted 8 August, 2024; originally announced August 2024.

  11. arXiv:2408.04163  [pdf, other]

    cs.SI

    Academic collaboration on large language model studies increases overall but varies across disciplines

    Authors: Lingyao Li, Ly Dinh, Songhua Hu, Libby Hemphill

    Abstract: Interdisciplinary collaboration is crucial for addressing complex scientific challenges. Recent advancements in large language models (LLMs) have shown significant potential in benefiting researchers across various fields. To explore the application of LLMs in scientific disciplines and their implications for interdisciplinary collaboration, we collect and analyze 50,391 papers from OpenAlex, an o…

    Submitted 7 August, 2024; originally announced August 2024.

  12. arXiv:2408.03825  [pdf, other]

    cs.RO cs.CV

    Towards Real-Time Gaussian Splatting: Accelerating 3DGS through Photometric SLAM

    Authors: Yan Song Hu, Dayou Mao, Yuhao Chen, John Zelek

    Abstract: Initial applications of 3D Gaussian Splatting (3DGS) in Visual Simultaneous Localization and Mapping (VSLAM) demonstrate the generation of high-quality volumetric reconstructions from monocular video streams. However, despite these promising advancements, current 3DGS integrations have reduced tracking performance and lower operating speeds compared to traditional VSLAM. To address these issues, w…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: This extended abstract has been submitted to be presented at an IEEE conference. It will be made available online by IEEE but will not be published in IEEE Xplore. Copyright may be transferred without notice, after which this version may no longer be accessible

  13. arXiv:2408.03624  [pdf, other]

    cs.CV

    AgentsCoMerge: Large Language Model Empowered Collaborative Decision Making for Ramp Merging

    Authors: Senkang Hu, Zhengru Fang, Zihan Fang, Yiqin Deng, Xianhao Chen, Yuguang Fang, Sam Kwong

    Abstract: Ramp merging is one of the bottlenecks in traffic systems, which commonly cause traffic congestion, accidents, and severe carbon emissions. In order to address this essential issue and enhance the safety and efficiency of connected and autonomous vehicles (CAVs) at multi-lane merging zones, we propose a novel collaborative decision-making framework, named AgentsCoMerge, to leverage large language…

    Submitted 7 August, 2024; originally announced August 2024.

  14. arXiv:2408.03337  [pdf, other]

    cs.HC cs.AI cs.CY cs.LG

    PsyDI: Towards a Personalized and Progressively In-depth Chatbot for Psychological Measurements

    Authors: Xueyan Li, Xinyan Chen, Yazhe Niu, Shuai Hu, Yu Liu

    Abstract: In the field of psychology, traditional assessment methods, such as standardized scales, are frequently critiqued for their static nature, lack of personalization, and reduced participant engagement, while comprehensive counseling evaluations are often inaccessible. The complexity of quantifying psychological traits further limits these methods. Despite advances with large language models (LLMs),…

    Submitted 15 August, 2024; v1 submitted 22 July, 2024; originally announced August 2024.

    Comments: 29 pages, 15 figures

  15. arXiv:2408.02574  [pdf, other]

    cs.HC

    DanModCap: Designing a Danmaku Moderation Tool for Video-Sharing Platforms that Leverages Impact Captions

    Authors: Siying Hu, Huanchen Wang, Yu Zhang, Piaohong Wang, Zhicong Lu

    Abstract: Online video platforms have gained increased popularity due to their ability to support information consumption and sharing and the diverse social interactions they afford. Danmaku, a real-time commentary feature that overlays user comments on a video, has been found to improve user engagement; however, the use of Danmaku can lead to toxic behaviors and inappropriate comments. To address these iss…

    Submitted 19 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  16. arXiv:2408.02285  [pdf, other]

    cs.CV

    Joint-Motion Mutual Learning for Pose Estimation in Videos

    Authors: Sifan Wu, Haipeng Chen, Yifang Yin, Sihao Hu, Runyang Feng, Yingying Jiao, Ziqi Yang, Zhenguang Liu

    Abstract: Human pose estimation in videos has long been a compelling yet challenging task within the realm of computer vision. Nevertheless, this task remains difficult because of the complex video scenes, such as video defocus and self-occlusion. Recent methods strive to integrate multi-frame visual features generated by a backbone network for pose estimation. However, they often ignore the useful joint in…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  17. arXiv:2408.01800  [pdf, other]

    cs.CV

    MiniCPM-V: A GPT-4V Level MLLM on Your Phone

    Authors: Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of par…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: preprint

  18. arXiv:2408.01428  [pdf, other]

    cs.CV cs.AI

    Transferable Adversarial Facial Images for Privacy Protection

    Authors: Minghui Li, Jiangxiong Wang, Hao Zhang, Ziqi Zhou, Shengshan Hu, Xiaobing Pei

    Abstract: The success of deep face recognition (FR) systems has raised serious privacy concerns due to their ability to enable unauthorized tracking of users in the digital world. Previous studies proposed introducing imperceptible adversarial noises into face images to deceive those face recognition models, thus achieving the goal of enhancing facial privacy protection. Nevertheless, they heavily rely on u…

    Submitted 17 July, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  19. arXiv:2407.20242  [pdf, other]

    cs.CY cs.AI cs.RO

    The Threats of Embodied Multimodal LLMs: Jailbreaking Robotic Manipulation in the Physical World

    Authors: Hangtao Zhang, Chenyu Zhu, Xianlong Wang, Ziqi Zhou, Yichen Wang, Lulu Xue, Minghui Li, Shengshan Hu, Leo Yu Zhang

    Abstract: Embodied artificial intelligence (AI) represents an artificial intelligence system that interacts with the physical world through sensors and actuators, seamlessly integrating perception and action. This design enables AI to learn from and operate within complex, real-world environments. Large Language Models (LLMs) deeply explore language instructions, playing a crucial role in devising plans for…

    Submitted 15 August, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Preliminary version (17 pages, 4 figures). Work in progress, revisions ongoing. Appreciate understanding and welcome any feedback

  20. arXiv:2407.13975  [pdf, other]

    cs.CV

    Personalized Privacy Protection Mask Against Unauthorized Facial Recognition

    Authors: Ka-Ho Chow, Sihao Hu, Tiansheng Huang, Ling Liu

    Abstract: Face recognition (FR) can be abused for privacy intrusion. Governments, private companies, or even individual attackers can collect facial images by web scraping to build an FR system identifying human faces without their consent. This paper introduces Chameleon, which learns to generate a user-centric personalized privacy protection mask, coined as P3-Mask, to protect facial images against unauth…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  21. arXiv:2407.13782  [pdf, other]

    eess.AS cs.AI cs.SD

    Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu

    Abstract: Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained models and their features into…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  22. arXiv:2407.13331  [pdf, other]

    cs.LG

    Reconstruct the Pruned Model without Any Retraining

    Authors: Pingjie Wang, Ziqing Fan, Shengchao Hu, Zhe Chen, Yanfeng Wang, Yu Wang

    Abstract: Structured pruning is a promising hardware-friendly compression technique for large language models (LLMs), which is expected to be retraining-free to avoid the enormous retraining cost. This retraining-free paradigm involves (1) pruning criteria to define the architecture and (2) distortion reconstruction to restore performance. However, existing methods often emphasize pruning criteria while usi…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 18 pages

  23. arXiv:2407.11421  [pdf, other]

    cs.CL

    States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

    Authors: Junhao Chen, Shengding Hu, Zhiyuan Liu, Maosong Sun

    Abstract: Large Language Models (LLMs) exhibit various emergent abilities. Among these abilities, some might reveal the internal working mechanisms of models. In this paper, we uncover a novel emergent capability in models: the intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought step-by-step solutions. Remarkably, the most advanced models can directly output t…

    Submitted 16 July, 2024; originally announced July 2024.

  24. arXiv:2407.10474  [pdf, other]

    cs.MM

    Multi-source Knowledge Enhanced Graph Attention Networks for Multimodal Fact Verification

    Authors: Han Cao, Lingwei Wei, Wei Zhou, Songlin Hu

    Abstract: Multimodal fact verification is an under-explored and emerging field that has gained increasing attention in recent years. The goal is to assess the veracity of claims that involve multiple modalities by analyzing the retrieved evidence. The main challenge in this area is to effectively fuse features from different modalities to learn meaningful multimodal representations. To this end, we propose…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ICME 2024

  25. arXiv:2407.09894  [pdf, other]

    cs.SI cs.AI cs.CL

    Transferring Structure Knowledge: A New Task to Fake news Detection Towards Cold-Start Propagation

    Authors: Lingwei Wei, Dou Hu, Wei Zhou, Songlin Hu

    Abstract: Many fake news detection studies have achieved promising performance by extracting effective semantic and structure features from both content and propagation trees. However, it is challenging to apply them to practical situations, especially when using the trained propagation-based models to detect news with no propagation data. Towards this scenario, we study a new task named cold-start fake new…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: ICASSP 2024

  26. arXiv:2407.09816  [pdf, other]

    cs.CL

    MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts

    Authors: Zhenpeng Su, Zijia Lin, Xue Bai, Xing Wu, Yizhe Xiong, Haoran Lian, Guangyuan Ma, Hui Chen, Guiguang Ding, Wei Zhou, Songlin Hu

    Abstract: Scaling the size of a model enhances its capabilities but significantly increases computation complexity. Mixture-of-Experts models (MoE) address the issue by allowing model size to scale up without substantially increasing training or inference costs. In MoE, there is an important module called the router, which is used to distribute each token to the experts. Currently, the mainstream routing me…

    Submitted 19 August, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

    Comments: Work in progress

  27. arXiv:2407.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Autoregressive Speech Synthesis without Vector Quantization

    Authors: Lingwei Meng, Long Zhou, Shujie Liu, Sanyuan Chen, Bing Han, Shujie Hu, Yanqing Liu, Jinyu Li, Sheng Zhao, Xixin Wu, Helen Meng, Furu Wei

    Abstract: We present MELLE, a novel continuous-valued tokens based language modeling approach for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel-spectrogram frames directly from text condition, bypassing the need for vector quantization, which was originally designed for audio compression and sacrifices fidelity compared to mel-spectrograms. Specifically, (i) instead of cross…

    Submitted 11 July, 2024; originally announced July 2024.

  28. arXiv:2407.06310  [pdf, other]

    cs.SD cs.AI cs.HC cs.LG eess.AS

    Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

    Authors: Mengzhe Geng, Xurong Xie, Jiajun Deng, Zengrui Jin, Guinan Li, Tianzi Wang, Shujie Hu, Zhaoqing Li, Helen Meng, Xunying Liu

    Abstract: The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: In submission to IEEE/ACM Transactions on Audio, Speech, and Language Processing

  29. arXiv:2407.05104  [pdf, other]

    cs.CY

    Crowdsourced reviews reveal substantial disparities in public perceptions of parking

    Authors: Lingyao Li, Songhua Hu, Ly Dinh, Libby Hemphill

    Abstract: Due to increased reliance on private vehicles and growing travel demand, parking remains a longstanding urban challenge globally. Quantifying parking perceptions is paramount as it enables decision-makers to identify problematic areas and make informed decisions on parking management. This study introduces a cost-effective and widely accessible data source, crowdsourced online reviews, to investig…

    Submitted 6 July, 2024; originally announced July 2024.

  30. arXiv:2407.04688  [pdf, other]

    cs.CV

    Enhancing Vehicle Re-identification and Matching for Weaving Analysis

    Authors: Mei Qiu, Wei Lin, Stanley Chien, Lauren Christopher, Yaobin Chen, Shu Hu

    Abstract: Vehicle weaving on highways contributes to traffic congestion, raises safety issues, and underscores the need for sophisticated traffic management systems. Current tools are inadequate in offering precise and comprehensive data on lane-specific weaving patterns. This paper introduces an innovative method for collecting non-overlapping video data in weaving zones, enabling the generation of quantit…

    Submitted 5 July, 2024; originally announced July 2024.

  31. arXiv:2407.02165  [pdf, other]

    cs.CV

    WildAvatar: Web-scale In-the-wild Video Dataset for 3D Avatar Creation

    Authors: Zihao Huang, Shoukang Hu, Guangcong Wang, Tianqi Liu, Yuhang Zang, Zhiguo Cao, Wei Li, Ziwei Liu

    Abstract: Existing human datasets for avatar creation are typically limited to laboratory environments, wherein high-quality annotations (e.g., SMPL estimation from 3D scans or multi-view images) can be ideally provided. However, their annotating requirements are impractical for real-world images or videos, posing challenges toward real-world applications on current avatar creation methods. To this end, we…

    Submitted 14 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Project page: https://wildavatar.github.io/

  32. arXiv:2407.00466  [pdf, other]

    cs.CL cs.AI

    BioKGBench: A Knowledge Graph Checking Benchmark of AI Agent for Biomedical Science

    Authors: Xinna Lin, Siqi Ma, Junjie Shan, Xiaojing Zhang, Shell Xu Hu, Tiannan Guo, Stan Z. Li, Kaicheng Yu

    Abstract: Pursuing artificial intelligence for biomedical science, a.k.a. AI Scientist, draws increasing attention, where one common approach is to build a copilot agent driven by Large Language Models (LLMs). However, to evaluate such systems, people either rely on direct Question-Answering (QA) to the LLM itself, or in a biomedical experimental manner. How to precisely benchmark biomedical agents from an…

    Submitted 29 June, 2024; originally announced July 2024.

  33. arXiv:2406.17338  [pdf, other]

    eess.IV cs.CV cs.LG

    Robustly Optimized Deep Feature Decoupling Network for Fatty Liver Diseases Detection

    Authors: Peng Huang, Shu Hu, Bo Peng, Jiashu Zhang, Xi Wu, Xin Wang

    Abstract: Current medical image classification efforts mainly aim for higher average performance, often neglecting the balance between different classes. This can lead to significant differences in recognition accuracy between classes and obvious recognition weaknesses. Without the support of massive data, deep learning faces challenges in fine-grained classification of fatty liver. In this paper, we propos…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: MICCAI 2024

  34. arXiv:2406.15718  [pdf, other]

    cs.CL

    Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

    Authors: Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu

    Abstract: As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that these LLMs can lis…

    Submitted 21 June, 2024; originally announced June 2024.

  35. arXiv:2406.15093  [pdf, other]

    cs.CR cs.CV eess.IV

    ECLIPSE: Expunging Clean-label Indiscriminate Poisons via Sparse Diffusion Purification

    Authors: Xianlong Wang, Shengshan Hu, Yechao Zhang, Ziqi Zhou, Leo Yu Zhang, Peng Xu, Wei Wan, Hai Jin

    Abstract: Clean-label indiscriminate poisoning attacks add invisible perturbations to correctly labeled training images, thus dramatically reducing the generalization capability of the victim models. Recently, some defense mechanisms have been proposed such as adversarial training, image transformation techniques, and image purification. However, these schemes are either susceptible to adaptive attacks, bui…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ESORICS 2024

  36. arXiv:2406.13356  [pdf, other]

    cs.LG

    Jogging the Memory of Unlearned Model Through Targeted Relearning Attack

    Authors: Shengyuan Hu, Yiwei Fu, Zhiwei Steven Wu, Virginia Smith

    Abstract: Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. However, in this work we show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can 'jog' the memory of unlearned models to r…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, 12 tables

  37. arXiv:2406.13294  [pdf, other]

    cs.MM cs.LG

    Enhancing Cross-Prompt Transferability in Vision-Language Models through Contextual Injection of Target Tokens

    Authors: Xikang Yang, Xuehai Tang, Fuqing Zhu, Jizhong Han, Songlin Hu

    Abstract: Vision-language models (VLMs) seamlessly integrate visual and textual data to perform tasks such as image classification, caption generation, and visual question answering. However, adversarial images often struggle to deceive all prompts effectively in the context of cross-prompt migration attacks, as the probability distribution of the tokens in these images tends to favor the semantics of the o…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

  38. arXiv:2406.12293  [pdf, other]

    cs.CV

    Unleashing the Potential of Open-set Noisy Samples Against Label Noise for Medical Image Classification

    Authors: Zehui Liao, Shishuai Hu, Yong Xia

    Abstract: The challenge of addressing mixed closed-set and open-set label noise in medical image classification remains largely unexplored. Unlike natural image classification where there is a common practice of segregation and separate processing of closed-set and open-set noisy samples from clean ones, medical image classification faces difficulties due to high inter-class similarity which complicates the…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 1 figure

  39. arXiv:2406.11077  [pdf, other]

    cs.CV

    Learning Relighting and Intrinsic Decomposition in Neural Radiance Fields

    Authors: Yixiong Yang, Shilin Hu, Haoyu Wu, Ramon Baldrich, Dimitris Samaras, Maria Vanrell

    Abstract: The task of extracting intrinsic components, such as reflectance and shading, from neural radiance fields is of growing interest. However, current methods largely focus on synthetic scenes and isolated objects, overlooking the complexities of real scenes with backgrounds. To address this gap, our research introduces a method that combines relighting with intrinsic decomposition. By leveraging ligh…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted by CVPR 2024 Workshop Neural Rendering Intelligence (NRI)

  40. arXiv:2406.10246  [pdf, other]

    cs.IR cs.AI

    Semantic-Enhanced Relational Metric Learning for Recommender Systems

    Authors: Mingming Li, Fuqing Zhu, Feng Yuan, Songlin Hu

    Abstract: Recently, relational metric learning methods, inspired by the translation mechanism in knowledge graphs, have received great attention in the recommendation community. Different from the knowledge graph where the entity-to-entity relations are given in advance, historical interactions lack explicit relations between users and items in recommender systems. Currently, many researchers have s…

    Submitted 7 June, 2024; originally announced June 2024.

  41. arXiv:2406.10160  [pdf, other]

    cs.SD cs.AI eess.AS

    One-pass Multiple Conformer and Foundation Speech Systems Compression and Quantization Using An All-in-one Neural Model

    Authors: Zhaoqing Li, Haoning Xu, Tianzi Wang, Shoukang Hu, Zengrui Jin, Shujie Hu, Jiajun Deng, Mingyu Cui, Mengzhe Geng, Xunying Liu

    Abstract: We propose a novel one-pass multiple ASR systems joint compression and quantization approach using an all-in-one neural model. A single compression cycle allows multiple nested systems with varying Encoder depths, widths, and quantization precision settings to be simultaneously constructed without the need to train and store individual target systems separately. Experiments consistently demonstrat…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  42. arXiv:2406.10152  [pdf, other]

    cs.SD eess.AS

    Joint Speaker Features Learning for Audio-visual Multichannel Speech Separation and Recognition

    Authors: Guinan Li, Jiajun Deng, Youjun Chen, Mengzhe Geng, Shujie Hu, Zhe Li, Zengrui Jin, Tianzi Wang, Xurong Xie, Helen Meng, Xunying Liu

    Abstract: This paper proposes joint speaker feature learning methods for zero-shot adaptation of audio-visual multichannel speech separation and recognition systems. xVector and ECAPA-TDNN speaker encoders are connected using purpose-built fusion blocks and tightly integrated with the complete system training. Experiments conducted on LRS3-TED data simulated multichannel overlapped speech suggest that joint…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  43. arXiv:2406.10034  [pdf, other]

    cs.SD cs.AI eess.AS

    Towards Effective and Efficient Non-autoregressive Decoding Using Block-based Attention Mask

    Authors: Tianzi Wang, Xurong Xie, Zhaoqing Li, Shoukang Hu, Zengrui Jing, Jiajun Deng, Mingyu Cui, Shujie Hu, Mengzhe Geng, Guinan Li, Helen Meng, Xunying Liu

    Abstract: This paper proposes a novel non-autoregressive (NAR) block-based Attention Mask Decoder (AMD) that flexibly balances performance-efficiency trade-offs for Conformer ASR systems. AMD performs parallel NAR inference within contiguous blocks of output labels that are concealed using attention masks, while conducting left-to-right AR prediction and history context amalgamation between blocks. A beam s…

    Submitted 16 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    Comments: 5 pages, 2 figures, 2 tables, Interspeech24 conference

  44. arXiv:2406.08751  [pdf, other]

    cs.AI

    3D Building Generation in Minecraft via Large Language Models

    Authors: Shiying Hu, Zengrong Huang, Chengpeng Hu, Jialin Liu

    Abstract: Recently, procedural content generation has exhibited considerable advancements in the domain of 2D game level generation such as Super Mario Bros. and Sokoban through large language models (LLMs). To further validate the capabilities of LLMs, this paper explores how LLMs contribute to the generation of 3D buildings in a sandbox game, Minecraft. We propose a Text to Building in Minecraft (T2BM) mo…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by IEEE Conference on Games

  45. arXiv:2406.06544  [pdf, other]

    cs.AR cs.AI

    TSB: Tiny Shared Block for Efficient DNN Deployment on NVCIM Accelerators

    Authors: Yifan Qin, Zheyu Yan, Zixuan Pan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi

    Abstract: Compute-in-memory (CIM) accelerators using non-volatile memory (NVM) devices offer promising solutions for energy-efficient and low-latency Deep Neural Network (DNN) inference execution. However, practical deployment is often hindered by the challenge of dealing with the massive amount of model weight parameters impacted by the inherent device variations within non-volatile computing-in-memory (NV…

    Submitted 21 August, 2024; v1 submitted 8 May, 2024; originally announced June 2024.

    Comments: 9 pages, accepted to IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2024)

  46. arXiv:2406.05510  [pdf, other]

    cs.LG cs.CL

    Representation Learning with Conditional Information Flow Maximization

    Authors: Dou Hu, Lingwei Wei, Wei Zhou, Songlin Hu

    Abstract: This paper proposes an information-theoretic representation learning framework, named conditional information flow maximization, to extract noise-invariant sufficient representations for the input data and target task. It encourages the learned representations to have good feature uniformity and sufficient predictive ability, which can enhance the generalization of pre-trained language models (PLMs) fo…

    Submitted 12 August, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: 16 pages, accepted to ACL 2024 (main conference), the code is available at https://github.com/zerohd4869/CIFM

  47. arXiv:2406.01489  [pdf, other]

    cs.CV

    DA-HFNet: Progressive Fine-Grained Forgery Image Detection and Localization Based on Dual Attention

    Authors: Yang Liu, Xiaofei Li, Jun Zhang, Shengze Hu, Jun Lei

    Abstract: The increasing difficulty in accurately detecting forged images generated by AIGC (Artificial Intelligence Generative Content) poses many risks, necessitating the development of effective methods to identify and further locate forged areas. In this paper, to facilitate research efforts, we construct a DA-HFNet forged image dataset guided by text or image-assisted GAN and Diffusion model. Our goal i…

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

  48. arXiv:2406.01151  [pdf, other]

    cs.AR

    A 0.96pJ/SOP, 30.23K-neuron/mm^2 Heterogeneous Neuromorphic Chip With Fullerene-like Interconnection Topology for Edge-AI Computing

    Authors: P. J. Zhou, Q. Yu, M. Chen, Y. C. Wang, L. W. Meng, Y. Zuo, N. Ning, Y. Liu, S. G. Hu, G. C. Qiao

    Abstract: Edge-AI computing requires high energy efficiency, low power consumption, and relatively high flexibility and compact area, challenging the AI-chip design. This work presents a 0.96 pJ/SOP heterogeneous neuromorphic system-on-chip (SoC) with fullerene-like interconnection topology for edge-AI computing. The neuromorphic core integrates different technologies to augment computing energy efficiency,…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 5 pages, 8 figures

  49. arXiv:2406.00783  [pdf, other]

    cs.CV

    AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark

    Authors: Li Lin, Santosh, Xin Wang, Shu Hu

    Abstract: AI-generated faces have enriched human life in areas such as entertainment, education, and art. However, they also pose misuse risks. Therefore, detecting AI-generated faces becomes crucial, yet current detectors show biased performance across different demographic groups. Mitigating biases can be done by designing algorithmic fairness methods, which usually require demographically annotated face datasets…

    Submitted 4 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  50. arXiv:2406.00337  [pdf, other]

    cs.HC

    The Odyssey Journey: Hemifacial Spasm Patients' Top-Tier Medical Resource Seeking in China from an Actor-Network Perspective

    Authors: Ka I Chan, Yuntao Wang, Siying Hu, Bo Hei, Zhicong Lu, Pei-Luen Patrick Rau, Yuanchun Shi

    Abstract: Health information-seeking behaviors are critical for individuals managing illnesses, especially in cases like hemifacial spasm (HFS), a condition familiar to specialists but not to general practitioners and the broader public. The limited awareness of HFS often leads to scarce online resources for self-diagnosis and a heightened risk of misdiagnosis. In China, the imbalance in the doctor-to-patie…

    Submitted 1 June, 2024; originally announced June 2024.