Showing 1–50 of 145 results for author: Cai, C

  1. arXiv:2408.17397  [pdf, other]

    cs.IT eess.SP

    End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework

    Authors: Chang Cai, Xiaojun Yuan, Ying-Jun Angela Zhang

    Abstract: This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formu…

    Submitted 30 August, 2024; originally announced August 2024.

    Comments: major revision in IEEE JSAC

  2. arXiv:2408.10899  [pdf, other]

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  3. arXiv:2408.07104  [pdf, other]

    cs.LG

    Model Based and Physics Informed Deep Learning Neural Network Structures

    Authors: Ali Mohammad-Djafari, Ning Chu, Li Wang, Caifang Cai, Liang Yu

    Abstract: Neural Networks (NNs) have been used in many areas with great success. When an NN's structure (Model) is given, during the training steps, the parameters of the model are determined using an appropriate criterion and an optimization algorithm (Training). Then, the trained model can be used for the prediction or inference step (Testing). As there are also many hyperparameters, related to the optimizat…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: key words: Deep Neural Network, Inverse problems; Bayesian inference; Model based DNN structure, MaxEnt2024 conference, Gent University, Gent, Belgium, July 1-5, 2024

  4. arXiv:2408.00802  [pdf, other]

    cs.IR cs.AI cs.CL cs.LG

    Leveraging LLM Reasoning Enhances Personalized Recommender Systems

    Authors: Alicia Y. Tsai, Adam Kraft, Long Jin, Chenwei Cai, Anahita Hosseini, Taibai Xu, Zemin Zhang, Lichan Hong, Ed H. Chi, Xinyang Yi

    Abstract: Recent advancements have showcased the potential of Large Language Models (LLMs) in executing reasoning tasks, particularly facilitated by Chain-of-Thought (CoT) prompting. While tasks like arithmetic reasoning involve clear, definitive answers and logical chains of thought, the application of LLM reasoning in recommendation systems (RecSys) presents a distinct challenge. RecSys tasks revolve arou…

    Submitted 22 July, 2024; originally announced August 2024.

    Comments: To be published at ACL 2024

  5. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  6. arXiv:2407.19753  [pdf, other]

    cs.CV eess.SP

    PredIN: Towards Open-Set Gesture Recognition via Prediction Inconsistency

    Authors: Chen Liu, Can Han, Chengfeng Zhou, Crystal Cai, Dahong Qian

    Abstract: Gesture recognition based on surface electromyography (sEMG) has achieved significant progress in human-machine interaction (HMI). However, accurately recognizing predefined gestures within a closed set is still inadequate in practice; a robust open-set system needs to effectively reject unknown gestures while correctly classifying known ones. To handle this challenge, we first report prediction i…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Under review

  7. arXiv:2407.15050  [pdf, other]

    cs.LG cs.AI cs.CR cs.MM

    Arondight: Red Teaming Large Vision Language Models with Auto-generated Multi-modal Jailbreak Prompts

    Authors: Yi Liu, Chengjun Cai, Xiaoli Zhang, Xingliang Yuan, Cong Wang

    Abstract: Large Vision Language Models (VLMs) extend and enhance the perceptual abilities of Large Language Models (LLMs). Despite offering new possibilities for LLM applications, these advancements raise significant security and ethical concerns, particularly regarding the generation of harmful content. While LLMs have undergone extensive security evaluations with the aid of red teaming frameworks, VLMs cu…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: To be published in ACM MM 2024

  8. arXiv:2407.12274  [pdf, other]

    cs.CV

    MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

    Authors: Cong Cai, Shan Liang, Xuefei Liu, Kang Zhu, Zhengqi Wen, Jianhua Tao, Heng Xie, Jizhou Cui, Yiming Ma, Zhenhua Cheng, Hanzhe Xu, Ruibo Fu, Bin Liu, Yongwei Li

    Abstract: Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and text. In addition, individual differences in deception production and detection are believed to play a crucial role. Although some studies have utilized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Code and data are available; Submitted to NeurIPS 2024 Datasets and Benchmarks Track

  9. arXiv:2407.05623  [pdf, other]

    cs.CV

    Momentum Auxiliary Network for Supervised Local Learning

    Authors: Junhao Su, Changpeng Cai, Feiyu Zhu, Chenghao He, Xiaojie Xu, Dongzhi Guan, Chenyang Si

    Abstract: Deep neural networks conventionally employ end-to-end backpropagation for their training process, which lacks biological credibility and triggers a locking dilemma during network parameter updates, leading to significant GPU memory use. Supervised local learning segments the network into multiple local blocks updated by independent auxiliary networks. However, these methods cannot replace e…

    Submitted 12 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024(Oral)

  10. arXiv:2407.03177  [pdf, other]

    cs.HC eess.SP

    EDPNet: An Efficient Dual Prototype Network for Motor Imagery EEG Decoding

    Authors: Can Han, Chen Liu, Crystal Cai, Jun Wang, Dahong Qian

    Abstract: Motor imagery electroencephalograph (MI-EEG) decoding plays a crucial role in developing motor imagery brain-computer interfaces (MI-BCIs). However, decoding intentions from MI remains challenging due to the inherent complexity of EEG signals relative to the small-sample size. In this paper, we propose an Efficient Dual Prototype Network (EDPNet) to enable accurate and fast MI decoding. EDPNet emp…

    Submitted 3 July, 2024; originally announced July 2024.

  11. arXiv:2407.00980  [pdf, other]

    cs.AI cs.RO

    Acceleration method for generating perception failure scenarios based on editing Markov process

    Authors: Canjie Cai

    Abstract: With the rapid advancement of autonomous driving technology, self-driving cars have become a central focus in the development of future transportation systems. Scenario generation technology has emerged as a crucial tool for testing and verifying the safety performance of autonomous driving systems. Current research in scenario generation primarily focuses on open roads such as highways, with rela…

    Submitted 1 July, 2024; originally announced July 2024.

  12. arXiv:2406.19905  [pdf, other]

    cs.CV

    Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

    Authors: Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Size Li, Di Zhang, Xi Li

    Abstract: The Mixture-of-Experts (MoE) has gained increasing attention in studying Large Vision-Language Models (LVLMs). It uses a sparse model to replace the dense model, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing the inference cost. Existing MoE methods in LVLMs encourage different experts to handle different tokens, and they usually em…

    Submitted 5 August, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

  13. arXiv:2406.16633  [pdf, other]

    cs.CV

    MLAAN: Scaling Supervised Local Learning with Multilaminar Leap Augmented Auxiliary Network

    Authors: Yuming Zhang, Shouxin Zhang, Peizhe Wang, Feiyu Zhu, Dongzhi Guan, Junhao Su, Jiabin Liu, Changpeng Cai

    Abstract: Deep neural networks (DNNs) typically employ an end-to-end (E2E) training paradigm which presents several challenges, including high GPU memory consumption, inefficiency, and difficulties in model parallelization during training. Recent research has sought to address these issues, with one promising approach being local learning. This method involves partitioning the backbone network into gradient…

    Submitted 15 August, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  14. arXiv:2406.11340  [pdf, other]

    cs.CV cs.LG

    CM2-Net: Continual Cross-Modal Mapping Network for Driver Action Recognition

    Authors: Ruoyu Wang, Chen Cai, Wenqian Wang, Jianjun Gao, Dan Lin, Wenyang Liu, Kim-Hui Yap

    Abstract: Driver action recognition has significantly advanced in enhancing driver-vehicle interactions and ensuring driving safety by integrating multiple modalities, such as infrared and depth. Nevertheless, compared to RGB modality only, it is always laborious and costly to collect extensive data for all types of non-RGB modalities in car cabin environments. Therefore, previous works have suggested indep…

    Submitted 3 August, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  15. arXiv:2406.03150  [pdf, other]

    cs.LG cs.CV

    Sample-specific Masks for Visual Reprogramming-based Prompting

    Authors: Chengyi Cai, Zesheng Ye, Lei Feng, Jianzhong Qi, Feng Liu

    Abstract: Visual reprogramming (VR) is a prompting technique that aims to re-purpose a pre-trained model (e.g., a classifier on ImageNet) to target tasks (e.g., medical data prediction) by learning a small-scale pattern added into input images instead of tuning considerable parameters within the model. The location of the pattern within input samples is usually determined by a pre-defined mask shared across…

    Submitted 5 June, 2024; originally announced June 2024.

  16. arXiv:2406.01900  [pdf, other]

    cs.CV

    Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

    Authors: Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

    Abstract: We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences. The main challenge of portrait animation is to preserve the identity of the reference portrait and transfer the target expression to this portrait while maintaining temporal consistency and fidelity. To address these challenges, Follow-Your-Emoji equ…

    Submitted 6 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: Project Page: https://follow-your-emoji.github.io/

  17. arXiv:2406.00446  [pdf, other]

    cs.CV cs.AI

    GLCAN: Global-Local Collaborative Auxiliary Network for Local Learning

    Authors: Feiyu Zhu, Yuming Zhang, Changpeng Cai, Guinan Guo, Jiao Li, Xiuyuan Guo, Quanwei Zhang, Peizhe Wang, Chenghao He, Junhao Su

    Abstract: Traditional deep neural networks typically use end-to-end backpropagation, which often places a big burden on GPU memory. Another promising training method is local learning, which involves splitting the network into blocks and training them in parallel with the help of an auxiliary network. Local learning has been widely studied and applied to image classification tasks, and its performance is co…

    Submitted 1 June, 2024; originally announced June 2024.

  18. arXiv:2405.14075  [pdf, other]

    cs.CL cs.AI cs.LG

    $T^2$ of Thoughts: Temperature Tree Elicits Reasoning in Large Language Models

    Authors: Chengkun Cai, Xu Zhao, Yucheng Du, Haoliang Liu, Lei Li

    Abstract: Large Language Models (LLMs) have emerged as powerful tools in artificial intelligence, especially in complex decision-making scenarios, but their static problem-solving strategies often limit their adaptability to dynamic environments. We explore the enhancement of reasoning capabilities in LLMs through Temperature Tree ($T^2$) prompting via Particle Swarm Optimization, termed as $T^2$ of Thought…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 10 pages, 5 figures

  19. arXiv:2405.10570  [pdf]

    eess.IV cs.AI

    Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

    Authors: Yirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo Qu

    Abstract: In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features…

    Submitted 29 May, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 6 tables

  20. arXiv:2405.07257   

    cs.CV

    Listen, Disentangle, and Control: Controllable Speech-Driven Talking Head Generation

    Authors: Changpeng Cai, Guinan Guo, Jiao Li, Junhao Su, Chenghao He, Jing Xiao, Yuanxu Chen, Lei Dai, Feiyu Zhu

    Abstract: Most earlier investigations on talking face generation have focused on the synchronization of lip motion and speech content. However, human head pose and facial emotions are equally important characteristics of natural human faces. While audio-driven talking face generation has seen notable advancements, existing methods either overlook facial emotions or are limited to specific individuals and ca…

    Submitted 27 August, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Due to our negligence, there are factual errors in the experimental results, so we are considering resubmitting the paper after an overhaul

    ACM Class: I.4.5; I.4.9

  21. arXiv:2405.03806  [pdf, other]

    cs.HC

    In Situ AI Prototyping: Infusing Multimodal Prompts into Mobile Settings with MobileMaker

    Authors: Savvas Petridis, Michael Xieyang Liu, Alexander J. Fiannaca, Vivian Tsai, Michael Terry, Carrie J. Cai

    Abstract: Recent advances in multimodal large language models (LLMs) have lowered the barriers to rapidly prototyping AI-powered features via prompting, especially for mobile-intended use cases. Despite the value of situated user feedback, the process of soliciting early, mobile-situated user feedback on AI prototypes remains challenging. The broad scope and flexibility of LLMs mean that, for a given use-c…

    Submitted 6 May, 2024; originally announced May 2024.

  22. arXiv:2404.18249  [pdf, other]

    cs.PL

    Tenspiler: A Verified Lifting-Based Compiler for Tensor Operations

    Authors: Jie Qiu, Colin Cai, Sahil Bhatia, Niranjan Hasabnis, Sanjit A. Seshia, Alvin Cheung

    Abstract: Tensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and optimized. These infrastructures provide powerful and expressive abstractions while ensuring high performance. However, to utilize them, code must be written specifical…

    Submitted 28 July, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  23. "We Need Structured Output": Towards User-centered Constraints on Large Language Model Output

    Authors: Michael Xieyang Liu, Frederick Liu, Alexander J. Fiannaca, Terry Koo, Lucas Dixon, Michael Terry, Carrie J. Cai

    Abstract: Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective…

    Submitted 10 April, 2024; originally announced April 2024.

    Journal ref: "We Need Structured Output": Towards User-centered Constraints on LLM Output. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '24), May 11-16, 2024, Honolulu, HI, USA

  24. arXiv:2403.08268  [pdf, other]

    cs.CV

    Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts

    Authors: Yue Ma, Yingqing He, Hongfa Wang, Andong Wang, Chenyang Qi, Chengfei Cai, Xiu Li, Zhifeng Li, Heung-Yeung Shum, Wei Liu, Qifeng Chen

    Abstract: Despite recent advances in image-to-video generation, better controllability and local animation are less explored. Most existing image-to-video methods are not locally aware and tend to move the entire scene. However, human artists may need to control the movement of different objects or regions. Additionally, current I2V methods require users not only to describe the target motion but also to pr…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Project Page: https://follow-your-click.github.io/ Github Page: https://github.com/mayuelala/FollowYourClick

  25. arXiv:2402.18771  [pdf, other]

    cs.CV cs.RO

    NARUTO: Neural Active Reconstruction from Uncertain Target Observations

    Authors: Ziyue Feng, Huangying Zhan, Zheng Chen, Qingan Yan, Xiangyu Xu, Changjiang Cai, Bing Li, Qilun Zhu, Yi Xu

    Abstract: We present NARUTO, a neural active reconstruction system that combines a hybrid neural representation with uncertainty learning, enabling high-fidelity surface reconstruction. Our approach leverages a multi-resolution hash-grid as the mapping backbone, chosen for its exceptional convergence speed and capacity to capture high-frequency local features. The centerpiece of our work is the incorporation…

    Submitted 16 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR2024. Project page: https://oppo-us-research.github.io/NARUTO-website/. Code: https://github.com/oppo-us-research/NARUTO

  26. arXiv:2402.15939  [pdf]

    eess.IV cs.LG

    Deep Separable Spatiotemporal Learning for Fast Dynamic Cardiac MRI

    Authors: Zi Wang, Min Xiao, Yirong Zhou, Chengyan Wang, Naiming Wu, Yi Li, Yiwen Gong, Shufu Chang, Yinyin Chen, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Di Guo, Guang Yang, Xiaobo Qu

    Abstract: Dynamic magnetic resonance imaging (MRI) plays an indispensable role in cardiac diagnosis. To enable fast imaging, the k-space data can be undersampled, but the image reconstruction poses a great challenge of high-dimensional processing. This challenge necessitates extensive training data in many deep learning reconstruction methods. This work proposes a novel and efficient approach, levera…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 10 pages, 11 figures, 3 tables

  27. arXiv:2401.01065  [pdf, other]

    cs.CV cs.AI

    BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving

    Authors: Tao Tang, Dafeng Wei, Zhengyu Jia, Tian Gao, Changwei Cai, Chengkai Hou, Peng Jia, Kun Zhan, Haiyang Sun, Jingchen Fan, Yixing Zhao, Fu Liu, Xiaodan Liang, Xianpeng Lang, Yang Wang

    Abstract: The rapid development of the autonomous driving industry has led to a significant accumulation of autonomous driving data. Consequently, there comes a growing demand for retrieving data to provide specialized optimization. However, directly applying previous image retrieval methods faces several challenges, such as the lack of global feature representation and inadequate text retrieval ability for…

    Submitted 18 June, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  28. arXiv:2401.00871  [pdf, other]

    cs.CV

    PlanarNeRF: Online Learning of Planar Primitives with Neural Radiance Fields

    Authors: Zheng Chen, Qingan Yan, Huangying Zhan, Changjiang Cai, Xiangyu Xu, Yuzhong Huang, Weihan Wang, Ziyue Feng, Lantao Liu, Yi Xu

    Abstract: Identifying spatially complete planar primitives from visual data is a crucial task in computer vision. Prior methods are largely restricted to either 2D segment recovery or simplifying 3D structures, even with extensive plane annotations. We present PlanarNeRF, a novel framework capable of detecting dense 3D planes through online learning. Drawing upon the neural field representation, PlanarNeRF…

    Submitted 29 December, 2023; originally announced January 2024.

  29. arXiv:2312.14375  [pdf, ps, other]

    cs.CR

    R-Pool and Settlement Markets for Recoverable ERC-20R Tokens

    Authors: Kaili Wang, Qinchen Wang, Calvin Cai, Dan Boneh

    Abstract: ERC-20R is a wrapper around ERC-20 that supports asset recovery within a limited time window after an asset is transferred. It is designed to reduce theft and losses on the blockchain by allowing a victim to recover their stolen or lost assets during the recovery window. When an honest recipient receives an ERC-20R asset, they must wait until the recovery window elapses (say, 24 hours), before th…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: in 2023 ACM Workshop on Decentralized Finance and Security (ACM DeFi 2023)

  30. arXiv:2312.02535  [pdf, other]

    cs.CV

    Towards Open-set Gesture Recognition via Feature Activation Enhancement and Orthogonal Prototype Learning

    Authors: Chen Liu, Can Han, Chengfeng Zhou, Crystal Cai, Suncheng Xiang, Hualiang Ni, Dahong Qian

    Abstract: Gesture recognition is a foundational task in human-machine interaction (HMI). While there has been significant progress in gesture recognition based on surface electromyography (sEMG), accurate recognition of predefined gestures only within a closed set is still inadequate in practice. It is essential to effectively discern and reject unknown gestures of disinterest in a robust system. Numerous m…

    Submitted 5 December, 2023; originally announced December 2023.

  31. arXiv:2311.08687  [pdf, other]

    cs.CL cs.AI cs.LG

    An Eye on Clinical BERT: Investigating Language Model Generalization for Diabetic Eye Disease Phenotyping

    Authors: Keith Harrigian, Tina Tang, Anthony Gonzales, Cindy X. Cai, Mark Dredze

    Abstract: Diabetic eye disease is a major cause of blindness worldwide. The ability to monitor relevant clinical trajectories and detect lapses in care is critical to managing the disease and preventing blindness. Alas, much of the information necessary to support these goals is found only in the free text of the electronic medical record. To fill this information gap, we introduce a system for extracting e…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 24 pages

  32. Gradient-Based Dovetail Joint Shape Optimization for Stiffness

    Authors: Xingyuan Sun, Chenyue Cai, Ryan P. Adams, Szymon Rusinkiewicz

    Abstract: It is common to manufacture an object by decomposing it into parts that can be assembled. This decomposition is often required by size limits of the machine, the complex structure of the shape, etc. To make it possible to easily assemble the final object, it is often desirable to design geometry that enables robust connections between the subcomponents. In this project, we study the task of doveta…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: ACM SCF 2023: Proceedings of the 8th Annual ACM Symposium on Computational Fabrication

  33. arXiv:2310.15435  [pdf, other]

    cs.HC cs.AI

    PromptInfuser: How Tightly Coupling AI and UI Design Impacts Designers' Workflows

    Authors: Savvas Petridis, Michael Terry, Carrie J. Cai

    Abstract: Prototyping AI applications is notoriously difficult. While large language model (LLM) prompting has dramatically lowered the barriers to AI prototyping, designers are still prototyping AI functionality and UI separately. We investigate how coupling prompt and UI design affects designers' workflows. Grounding this research, we developed PromptInfuser, a Figma plugin that enables users to create se…

    Submitted 23 October, 2023; originally announced October 2023.

  34. arXiv:2310.15428  [pdf, other]

    cs.HC cs.AI

    ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles

    Authors: Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry

    Abstract: Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively…

    Submitted 23 October, 2023; originally announced October 2023.

  35. arXiv:2310.07464  [pdf]

    eess.IV cs.LG q-bio.QM

    Deep Learning Predicts Biomarker Status and Discovers Related Histomorphology Characteristics for Low-Grade Glioma

    Authors: Zijie Fang, Yihan Liu, Yifeng Wang, Xiangyang Zhang, Yang Chen, Changjing Cai, Yiyang Lin, Ying Han, Zhi Wang, Shan Zeng, Hong Shen, Jun Tan, Yongbing Zhang

    Abstract: Biomarker detection is an indispensable part of the diagnosis and treatment of low-grade glioma (LGG). However, current LGG biomarker detection methods rely on expensive and complex molecular genetic testing, for which professionals are required to analyze the results, and intra-rater variability is often reported. To overcome these challenges, we propose an interpretable deep learning pipeline, a…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: 47 pages, 6 figures

  36. arXiv:2310.03661  [pdf, other]

    cs.CV

    Robustness-Guided Image Synthesis for Data-Free Quantization

    Authors: Jianhong Bai, Yuchen Yang, Huanpeng Chu, Hualiang Wang, Zuozhu Liu, Ruizhe Chen, Xiaoxuan He, Lianrui Mu, Chengfei Cai, Haoji Hu

    Abstract: Quantization has emerged as a promising direction for model compression. Recently, data-free quantization has been widely studied as a promising method to avoid privacy concerns, which synthesizes images as an alternative to real training data. Existing methods use classification loss to ensure the reliability of the synthesized images. Unfortunately, even if these images are well-classified by th…

    Submitted 20 February, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted at AAAI 2024

  37. arXiv:2308.00929  [pdf, other]

    cs.CV

    Towards Discriminative Representation with Meta-learning for Colonoscopic Polyp Re-Identification

    Authors: Suncheng Xiang, Qingzhong Chen, Shilun Cai, Chengfeng Zhou, Crystal Cai, Sijia Du, Zhengjie Zhang, Yunshi Zhong, Dahong Qian

    Abstract: Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras and plays an important role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, traditional methods for object ReID directly adopting CNN models trained on the ImageNet dataset usually produce unsatisfactory ret…

    Submitted 28 November, 2023; v1 submitted 2 August, 2023; originally announced August 2023.

  38. arXiv:2307.13220  [pdf]

    eess.IV cs.AI physics.med-ph

    One for Multiple: Physics-informed Synthetic Data Boosts Generalizable Deep Learning for Fast MRI Reconstruction

    Authors: Zi Wang, Xiaotong Yu, Chengyan Wang, Weibo Chen, Jiazheng Wang, Ying-Hua Chu, Hongwei Sun, Rushuai Li, Peiyong Li, Fan Yang, Haiwei Han, Taishan Kang, Jianzhong Lin, Chen Yang, Shufu Chang, Zhang Shi, Sha Hua, Yan Li, Juan Hu, Liuhong Zhu, Jianjun Zhou, Meijing Lin, Jiefeng Guo, Congbo Cai, Zhong Chen, et al. (3 additional authors not shown)

    Abstract: Magnetic resonance imaging (MRI) is a widely used radiological modality renowned for its radiation-free, comprehensive insights into the human body, facilitating medical diagnoses. However, the drawback of prolonged scan times hinders its accessibility. The k-space undersampling offers a solution, yet the resultant artifacts necessitate meticulous removal during image reconstruction. Although Deep…

    Submitted 28 February, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 38 pages, 19 figures, 5 tables

  39. arXiv:2307.08991  [pdf, other]

    cs.CV cs.RO

    EgoVM: Achieving Precise Ego-Localization using Lightweight Vectorized Maps

    Authors: Yuzhe He, Shuang Liang, Xiaofei Rui, Chengying Cai, Guowei Wan

    Abstract: Accurate and reliable ego-localization is critical for autonomous driving. In this paper, we present EgoVM, an end-to-end localization network that achieves comparable localization accuracy to prior state-of-the-art methods, but uses lightweight vectorized maps instead of heavy point-based maps. To begin with, we extract BEV features from online multi-view images and LiDAR point cloud. Then, we em…

    Submitted 18 July, 2023; originally announced July 2023.

    Comments: 8 pages

  40. arXiv:2307.07135  [pdf, other]

    cs.CL

    MMSD2.0: Towards a Reliable Multi-modal Sarcasm Detection System

    Authors: Libo Qin, Shijue Huang, Qiguang Chen, Chenran Cai, Yudi Zhang, Bin Liang, Wanxiang Che, Ruifeng Xu

    Abstract: Multi-modal sarcasm detection has attracted much recent attention. Nevertheless, the existing benchmark (MMSD) has some shortcomings that hinder the development of reliable multi-modal sarcasm detection system: (1) There are some spurious cues in MMSD, leading to the model bias learning; (2) The negative samples in MMSD are not always reasonable. To solve the aforementioned issues, we introduce MM…

    Submitted 13 July, 2023; originally announced July 2023.

    Comments: Accepted by ACL2023 Findings

  41. arXiv:2306.07490  [pdf, other]

    cs.CV

    Top-Down Framework for Weakly-supervised Grounded Image Captioning

    Authors: Chen Cai, Suchen Wang, Kim-hui Yap, Yi Wang

    Abstract: Weakly-supervised grounded image captioning (WSGIC) aims to generate the caption and ground (localize) predicted object words in the input image without using bounding box supervision. Recent two-stage solutions mostly apply a bottom-up pipeline: (1) encode the input image into multiple region features using an object detector; (2) leverage region features for captioning and grounding. However, ut…

    Submitted 2 March, 2024; v1 submitted 12 June, 2023; originally announced June 2023.

  42. arXiv:2306.07096  [pdf, other]

    cs.CV

    Global and Local Semantic Completion Learning for Vision-Language Pre-training

    Authors: Rong-Cheng Tu, Yatai Ji, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu

    Abstract: Cross-modal alignment plays a crucial role in vision-language pre-training (VLP) models, enabling them to capture meaningful associations across different modalities. For this purpose, numerous masked modeling tasks have been proposed for VLP to further promote cross-modal interactions. The core idea of previous masked modeling tasks is to focus on reconstructing the masked tokens based on visible…

    Submitted 5 December, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.13437

  43. arXiv:2306.06547  [pdf, other]

    cs.LG stat.ML

    Local-to-global Perspectives on Graph Neural Networks

    Authors: Chen Cai

    Abstract: This thesis presents a local-to-global perspective on graph neural networks (GNN), the leading architecture to process graph-structured data. After categorizing GNN into local Message Passing Neural Networks (MPNN) and global Graph transformers, we present three pieces of work: 1) study the convergence property of a type of global GNN, Invariant Graph Networks, 2) connect the local MPNN and global…

    Submitted 18 June, 2023; v1 submitted 10 June, 2023; originally announced June 2023.

    Comments: Ph.D. thesis from UC San Diego that includes three works arXiv:2201.10129, arXiv:2301.11956, arXiv:2102.01350

  44. arXiv:2306.04874  [pdf, other]

    cs.CL cs.AI cs.CR cs.LG

    Expanding Scope: Adapting English Adversarial Attacks to Chinese

    Authors: Hanyu Liu, Chengyuan Cai, Yanjun Qi

    Abstract: Recent studies have revealed that NLP predictive models are vulnerable to adversarial attacks. Most existing studies focused on designing attacks to evaluate the robustness of NLP models in the English language alone. Literature has seen an increasing need for NLP solutions for other languages. We, therefore, ask one natural question: whether state-of-the-art (SOTA) attack methods generalize to ot…

    Submitted 7 June, 2023; originally announced June 2023.

    Comments: 11 pages; in ACL23 TrustNLP 2023: TrustNLP: Third Workshop on Trustworthy Natural Language Processing Colocated with the Annual Conference of the Association for Computational Linguistics (ACL 2023)

  45. PolarDB-IMCI: A Cloud-Native HTAP Database System at Alibaba

    Authors: Jianying Wang, Tongliang Li, Haoze Song, Xinjun Yang, Wenchao Zhou, Feifei Li, Baoyue Yan, Qianqian Wu, Yukun Liang, Chengjun Ying, Yujie Wang, Baokai Chen, Chang Cai, Yubin Ruan, Xiaoyi Weng, Shibin Chen, Liang Yin, Chengzhong Yang, Xin Cai, Hongyan Xing, Nanlong Yu, Xiaofei Chen, Dapeng Huang, Jianling Sun

    Abstract: Cloud-native databases have become the de-facto choice for mission-critical applications on the cloud due to the need for high availability, resource elasticity, and cost efficiency. Meanwhile, driven by the increasing connectivity between data generation and analysis, users prefer a single database to efficiently process both OLTP and OLAP workloads, which enhances data freshness and reduces the…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: 14 pages, 16 figures, to be published in ACM SIGMOD 2023

  46. arXiv:2304.10547  [pdf, ps, other]

    cs.AI cs.HC

    The Design Space of Generative Models

    Authors: Meredith Ringel Morris, Carrie J. Cai, Jess Holbrook, Chinmay Kulkarni, Michael Terry

    Abstract: Card et al.'s classic paper "The Design Space of Input Devices" established the value of design spaces as a tool for HCI analysis and invention. We posit that developing design spaces for emerging pre-trained, generative AI models is necessary for supporting their integration into human-centered systems and practices. We explore what it means to develop an AI model design space by proposing two de…

    Submitted 15 April, 2023; originally announced April 2023.

    Journal ref: NeurIPS 2022 Human-Centered AI Workshop

  47. arXiv:2304.06178  [pdf, other]

    cs.CV cs.GR

    Dynamic Voxel Grid Optimization for High-Fidelity RGB-D Supervised Surface Reconstruction

    Authors: Xiangyu Xu, Lichang Chen, Changjiang Cai, Huangying Zhan, Qingan Yan, Pan Ji, Junsong Yuan, Heng Huang, Yi Xu

    Abstract: Direct optimization of interpolated features on multi-resolution voxel grids has emerged as a more efficient alternative to MLP-like modules. However, this approach is constrained by higher memory expenses and limited representation capabilities. In this paper, we introduce a novel dynamic grid optimization method for high-fidelity 3D surface reconstruction that incorporates both RGB and depth obs…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: For the project, see https://yanqingan.github.io/

  48. arXiv:2304.03442  [pdf, other]

    cs.HC cs.AI cs.LG

    Generative Agents: Interactive Simulacra of Human Behavior

    Authors: Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein

    Abstract: Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; t…

    Submitted 5 August, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

  49. arXiv:2303.15671  [pdf, other]

    cs.CV

    Colo-SCRL: Self-Supervised Contrastive Representation Learning for Colonoscopic Video Retrieval

    Authors: Qingzhong Chen, Shilun Cai, Crystal Cai, Zefang Yu, Dahong Qian, Suncheng Xiang

    Abstract: Colonoscopic video retrieval, which is a critical part of polyp treatment, has great clinical significance for the prevention and treatment of colorectal cancer. However, retrieval models trained on action recognition datasets usually produce unsatisfactory retrieval results on colonoscopic datasets due to the large domain gap between them. To seek a solution to this problem, we construct a large-…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted by ICME 2023

  50. arXiv:2303.12647  [pdf, other]

    cs.HC

    A Word is Worth a Thousand Pictures: Prompts as AI Design Material

    Authors: Chinmay Kulkarni, Stefania Druga, Minsuk Chang, Alex Fiannaca, Carrie Cai, Michael Terry

    Abstract: Recent advances in Machine-Learning have led to the development of models that generate images based on a text description. Such large prompt-based text to image models (TTIs), trained on a considerable amount of data, allow the creation of high-quality images by users with no graphics or design training. This paper examines the role such TTI models can play in collaborative, goal-oriented design. T…

    Submitted 22 March, 2023; originally announced March 2023.

    Comments: 22 pages, 5 figures