Showing 1–50 of 110 results for author: Ren, T

Searching in archive cs.
  1. arXiv:2408.01615  [pdf, other]

    cs.RO

    Three-dimensional Morphological Reconstruction of Millimeter-Scale Soft Continuum Robots based on Dual-Stereo-Vision

    Authors: Tian-Ao Ren, Wenyan Liu, Tao Zhang, Lei Zhao, Hongliang Ren, Jiewen Lai

    Abstract: Continuum robots can be miniaturized to just a few millimeters in diameter. Among these, notched tubular continuum robots (NTCR) show great potential in many delicate applications. Existing works in robotic modeling focus on kinematics and dynamics but still face challenges in reproducing the robot's morphology -- a significant factor that can expand the research landscape of continuum robots, esp…

    Submitted 15 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: 6 pages, 6 figures, submitted to Robio 2024

  2. arXiv:2407.16291  [pdf, other]

    cs.CV cs.RO

    TAPTRv2: Attention-based Position Update Improves Tracking Any Point

    Authors: Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Feng Li, Tianhe Ren, Bohan Li, Lei Zhang

    Abstract: In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection TRansformer (DETR) and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its reliance on cos…

    Submitted 23 July, 2024; originally announced July 2024.

  3. arXiv:2407.10448  [pdf, other]

    cs.LG stat.ML

    Spectral Representation for Causal Estimation with Hidden Confounders

    Authors: Tongzheng Ren, Haotian Sun, Antoine Moulin, Arthur Gretton, Bo Dai

    Abstract: We address the problem of causal effect estimation where hidden confounders are present, with a focus on two settings: instrumental variable regression with additional observed confounders, and proxy causal learning. Our approach uses a singular value decomposition of a conditional expectation operator, followed by a saddle-point optimization problem, which, in the context of IV regression, can be…

    Submitted 15 July, 2024; originally announced July 2024.

  4. OMR-NET: a two-stage octave multi-scale residual network for screen content image compression

    Authors: Shiqi Jiang, Ting Ren, Congrui Fu, Shuai Li, Hui Yuan

    Abstract: Screen content (SC) differs from natural scene (NS) with unique characteristics such as noise-free, repetitive patterns, and high contrast. Aiming at addressing the inadequacies of current learned image compression (LIC) methods for SC, we propose an improved two-stage octave convolutional residual blocks (IToRB) for high and low-frequency feature extraction and a cascaded two-stage multi-scale re…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 7 figures, 2 tables

    Journal ref: IEEE Signal Processing Letters, 2024

  5. arXiv:2405.10300  [pdf, other]

    cs.CV

    Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

    Authors: Tianhe Ren, Qing Jiang, Shilong Liu, Zhaoyang Zeng, Wenlong Liu, Han Gao, Hongjie Huang, Zhengyu Ma, Xiaoke Jiang, Yihao Chen, Yuda Xiong, Hao Zhang, Feng Li, Peijun Tang, Kent Yu, Lei Zhang

    Abstract: This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection models developed by IDEA Research, which aims to advance the "Edge" of open-set object detection. The suite encompasses two models: Grounding DINO 1.5 Pro, a high-performance model designed for stronger generalization capability across a wide range of scenarios, and Grounding DINO 1.5 Edge, an efficient model o…

    Submitted 31 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: homepage: https://deepdataspace.com/home

  6. arXiv:2405.02654  [pdf, ps, other]

    cs.MA cs.AI cs.GT

    Enhancing Cooperation through Selective Interaction and Long-term Experiences in Multi-Agent Reinforcement Learning

    Authors: Tianyu Ren, Xiao-Jun Zeng

    Abstract: The significance of network structures in promoting group cooperation within social dilemmas has been widely recognized. Prior studies attribute this facilitation to the assortment of strategies driven by spatial interactions. Although reinforcement learning has been employed to investigate the impact of dynamic interaction on the evolution of cooperation, there remains a lack of understanding abo…

    Submitted 18 August, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: Accepted at IJCAI 2024 (33rd International Joint Conference on Artificial Intelligence - Jeju)

    Journal ref: IJCAI (2024) 193-201

  7. arXiv:2403.20014  [pdf, other]

    cs.DB cs.AI cs.CL

    PURPLE: Making a Large Language Model a Better SQL Writer

    Authors: Tonghui Ren, Yuankai Fan, Zhenying He, Ren Huang, Jiaqi Dai, Can Huang, Yinan Jing, Kai Zhang, Yifan Yang, X. Sean Wang

    Abstract: Large Language Model (LLM) techniques play an increasingly important role in Natural Language to SQL (NL2SQL) translation. LLMs trained by extensive corpora have strong natural language understanding and basic SQL generation abilities without additional tuning specific to NL2SQL tasks. Existing LLMs-based NL2SQL approaches try to improve the translation by enhancing the LLMs with an emphasis on us…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: 12 pages, accepted by ICDE 2024 (40th IEEE International Conference on Data Engineering)

  8. arXiv:2403.14610  [pdf, other]

    cs.CV

    T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

    Authors: Qing Jiang, Feng Li, Zhaoyang Zeng, Tianhe Ren, Shilong Liu, Lei Zhang

    Abstract: We present T-Rex2, a highly practical model for open-set object detection. Previous open-set object detection methods relying on text prompts effectively encapsulate the abstract concept of common objects, but struggle with rare or complex object representation due to data scarcity and descriptive limitations. Conversely, visual prompts excel in depicting novel objects through concrete visual exam…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Technical Report

  9. arXiv:2403.13042  [pdf, other]

    cs.CV cs.RO

    TAPTR: Tracking Any Point with Transformers as Detection

    Authors: Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Lei Zhang

    Abstract: In this paper, we propose a simple and strong framework for Tracking Any Point with TRansformers (TAPTR). Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP. In the proposed framework, in each video frame, each tracking point is represented as a point query, which consists o…

    Submitted 19 March, 2024; originally announced March 2024.

  10. arXiv:2403.05525  [pdf, other]

    cs.AI

    DeepSeek-VL: Towards Real-World Vision-Language Understanding

    Authors: Haoyu Lu, Wen Liu, Bo Zhang, Bingxuan Wang, Kai Dong, Bo Liu, Jingxiang Sun, Tongzheng Ren, Zhuoshu Li, Hao Yang, Yaofeng Sun, Chengqi Deng, Hanwei Xu, Zhenda Xie, Chong Ruan

    Abstract: We present DeepSeek-VL, an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications. Our approach is structured around three key dimensions: We strive to ensure our data is diverse, scalable, and extensively covers real-world scenarios including web screenshots, PDFs, OCR, charts, and knowledge-based content, aiming for a comprehensive represe…

    Submitted 11 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: https://github.com/deepseek-ai/DeepSeek-VL

  11. arXiv:2403.00318  [pdf, other]

    cs.AI cs.LG

    Deep Reinforcement Learning for Solving Management Problems: Towards A Large Management Mode

    Authors: Jinyang Jiang, Xiaotian Liu, Tao Ren, Qinghao Wang, Yi Zheng, Yufu Du, Yijie Peng, Cheng Zhang

    Abstract: We introduce a deep reinforcement learning (DRL) approach for solving management problems including inventory management, dynamic pricing, and recommendation. This DRL approach has the potential to lead to a large management model based on certain transformer neural network structures, resulting in an artificial general intelligence paradigm for various management tasks. Traditional methods have l…

    Submitted 1 March, 2024; originally announced March 2024.

  12. arXiv:2402.17144  [pdf, other]

    cs.DB cs.AI

    Metasql: A Generate-then-Rank Framework for Natural Language to SQL Translation

    Authors: Yuankai Fan, Zhenying He, Tonghui Ren, Can Huang, Yinan Jing, Kai Zhang, X. Sean Wang

    Abstract: The Natural Language Interface to Databases (NLIDB) empowers non-technical users with database access through intuitive natural language (NL) interactions. Advanced approaches, utilizing neural sequence-to-sequence models or large-scale language models, typically employ auto-regressive decoding to generate unique SQL queries sequentially. While these translation models have greatly improved the ov…

    Submitted 26 February, 2024; originally announced February 2024.

  13. arXiv:2402.15813  [pdf, other]

    cs.CL cs.GT

    Measuring Bargaining Abilities of LLMs: A Benchmark and A Buyer-Enhancement Method

    Authors: Tian Xia, Zhiwei He, Tong Ren, Yibo Miao, Zhuosheng Zhang, Yang Yang, Rui Wang

    Abstract: Bargaining is an important and unique part of negotiation between humans. As LLM-driven agents learn to negotiate and act like real humans, how to evaluate agents' bargaining abilities remains an open problem. For the first time, we formally described the Bargaining task as an asymmetric incomplete information game, defining the gains of the Buyer and Seller in multiple bargaining processes. It al…

    Submitted 4 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 Findings. The dataset AmazonHistoryPrice and our code are available at https://github.com/TianXiaSJTU/AmazonPriceHistory

  14. arXiv:2402.07354  [pdf, other]

    eess.IV cs.CV

    Re-DiffiNet: Modeling discrepancies in tumor segmentation using diffusion models

    Authors: Tianyi Ren, Abhishek Sharma, Juampablo Heras Rivera, Harshitha Rebala, Ethan Honey, Agamdeep Chopra, Jacob Ruzevick, Mehmet Kurt

    Abstract: Identification of tumor margins is essential for surgical decision-making for glioblastoma patients and provides reliable assistance for neurosurgeons. Despite improvements in deep learning architectures for tumor segmentation over the years, creating a fully autonomous system suitable for clinical floors remains a formidable challenge because the model predictions have not yet reached the desired…

    Submitted 10 April, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

  15. arXiv:2402.07008  [pdf, other]

    eess.IV cs.CV cs.LG

    An Optimization Framework for Processing and Transfer Learning for the Brain Tumor Segmentation

    Authors: Tianyi Ren, Ethan Honey, Harshitha Rebala, Abhishek Sharma, Agamdeep Chopra, Mehmet Kurt

    Abstract: Tumor segmentation from multi-modal brain MRI images is a challenging task due to the limited samples, high variance in shapes and uneven distribution of tumor morphology. The performance of automated medical image segmentation has been significantly improved by the recent advances in deep learning. However, the model predictions have not yet reached the desired level for clinical use in terms of…

    Submitted 10 February, 2024; originally announced February 2024.

  16. arXiv:2402.04485  [pdf, other]

    cs.LG cs.GT

    Incentivized Truthful Communication for Federated Bandits

    Authors: Zhepei Wei, Chuanhao Li, Tianze Ren, Haifeng Xu, Hongning Wang

    Abstract: To enhance the efficiency and practicality of federated bandit learning, recent advances have introduced incentives to motivate communication among clients, where a client participates only when the incentive offered by the server outweighs its participation cost. However, existing incentive mechanisms naively assume the clients are truthful: they all report their true cost and thus the higher cos…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 20 pages, 2 figures. Accepted at ICLR 2024

  17. arXiv:2401.14159  [pdf, other]

    cs.CV

    Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

    Authors: Tianhe Ren, Shilong Liu, Ailing Zeng, Jing Lin, Kunchang Li, He Cao, Jiayu Chen, Xinyu Huang, Yukang Chen, Feng Yan, Zhaoyang Zeng, Hao Zhang, Feng Li, Jie Yang, Hongyang Li, Qing Jiang, Lei Zhang

    Abstract: We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment anything model (SAM). This integration enables the detection and segmentation of any regions based on arbitrary text inputs and opens a door to connecting various vision models. As shown in Fig.1, a wide range of vision tasks can be achieved by using the versatile Grounded SAM pipeline.…

    Submitted 25 January, 2024; originally announced January 2024.

  18. arXiv:2401.11782  [pdf, other]

    physics.soc-ph cs.SI q-bio.PE

    Temporal Interaction and its Role in the Evolution of Cooperation

    Authors: Yujie He, Tianyu Ren, Xiao-Jun Zeng, Huawen Liang, Liukai Yu, Junjun Zheng

    Abstract: This research investigates the impact of dynamic, time-varying interactions on cooperative behaviour in social dilemmas. Traditional research has focused on deterministic rules governing pairwise interactions, yet the impact of interaction frequency and synchronization in groups on cooperation remains underexplored. Addressing this gap, our work introduces two temporal interaction mechanisms to mo…

    Submitted 18 August, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted at Physical Review E

    Journal ref: Physical Review E (2024), 110, 024210

  19. arXiv:2401.02954  [pdf, other]

    cs.CL cs.AI cs.LG

    DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

    Authors: DeepSeek-AI: Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, et al. (63 additional authors not shown)

    Abstract: The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate scaling of large scale models in two commonly used open-source configurations, 7B…

    Submitted 5 January, 2024; originally announced January 2024.

  20. arXiv:2401.01189  [pdf, other]

    cs.RO cs.AI

    NID-SLAM: Neural Implicit Representation-based RGB-D SLAM in dynamic environments

    Authors: Ziheng Xu, Jianwei Niu, Qingfeng Li, Tao Ren, Chen Chen

    Abstract: Neural implicit representations have been explored to enhance visual SLAM algorithms, especially in providing high-fidelity dense map. Existing methods operate robustly in static scenes but struggle with the disruption caused by moving objects. In this paper we present NID-SLAM, which significantly improves the performance of neural SLAM in dynamic environments. We propose a new approach to enhanc…

    Submitted 16 May, 2024; v1 submitted 2 January, 2024; originally announced January 2024.

  21. arXiv:2312.04547  [pdf, other]

    cs.CV cs.AI cs.GR cs.HC

    Digital Life Project: Autonomous 3D Characters with Social Intelligence

    Authors: Zhongang Cai, Jianping Jiang, Zhongfei Qing, Xinying Guo, Mingyuan Zhang, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang, Ziwei Liu

    Abstract: In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing with articulated body motions, thereby simulating life in a digital environment. Our framework comprises two primary components: 1) SocioMind: a meticulously crafted digital brain that models perso…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: Homepage: https://digital-life-project.com/

  22. arXiv:2312.02949  [pdf, other]

    cs.CV

    LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

    Authors: Hao Zhang, Hongyang Li, Feng Li, Tianhe Ren, Xueyan Zou, Shilong Liu, Shijia Huang, Jianfeng Gao, Lei Zhang, Chunyuan Li, Jianwei Yang

    Abstract: With the recent significant advancements in large multi-modal models (LMMs), the importance of their grounding capability in visual chat is increasingly recognized. Despite recent efforts to enable LMMs to support grounding, their capabilities for grounding and chat are usually separate, and their chat performance drops dramatically when asked to ground. The problem is the lack of a dataset for gr…

    Submitted 5 December, 2023; originally announced December 2023.

  23. arXiv:2311.13601  [pdf, other]

    cs.CV cs.AI cs.LG

    Visual In-Context Prompting

    Authors: Feng Li, Qing Jiang, Hao Zhang, Tianhe Ren, Shilong Liu, Xueyan Zou, Huaizhe Xu, Hongyang Li, Chunyuan Li, Jianwei Yang, Lei Zhang, Jianfeng Gao

    Abstract: In-context prompting in large language models (LLMs) has become a prevalent approach to improve zero-shot capabilities, but this idea is less explored in the vision domain. Existing visual prompting methods focus on referring segmentation to segment the most relevant object, falling short of addressing many generic vision tasks like open-set segmentation and detection. In this paper, we introduce…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: technical report

  24. arXiv:2311.13596  [pdf, other]

    cs.CV

    T-Rex: Counting by Visual Prompting

    Authors: Qing Jiang, Feng Li, Tianhe Ren, Shilong Liu, Zhaoyang Zeng, Kent Yu, Lei Zhang

    Abstract: We introduce T-Rex, an interactive object counting model designed to first detect and then count any objects. We formulate object counting as an open-set object detection task with the integration of visual prompts. Users can specify the objects of interest by marking points or boxes on a reference image, and T-Rex then detects all objects with a similar pattern. Guided by the visual feedback from…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Technical report. Work in progress

  25. arXiv:2311.12244  [pdf, other]

    cs.LG cs.AI stat.ML

    Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning

    Authors: Hongming Zhang, Tongzheng Ren, Chenjun Xiao, Dale Schuurmans, Bo Dai

    Abstract: In most real-world reinforcement learning applications, state information is only partially observable, which breaks the Markov decision process assumption and leads to inferior performance for algorithms that conflate observations with state. Partially Observable Markov Decision Processes (POMDPs), on the other hand, provide a general framework that allows for partial observability to be accounte…

    Submitted 10 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Comments: The first two authors contribute equally

  26. arXiv:2311.05437  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

    Authors: Shilong Liu, Hao Cheng, Haotian Liu, Hao Zhang, Feng Li, Tianhe Ren, Xueyan Zou, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang, Jianfeng Gao, Chunyuan Li

    Abstract: LLaVA-Plus is a general-purpose multimodal assistant that expands the capabilities of large multimodal models. It maintains a skill repository of pre-trained vision and vision-language models and can activate relevant tools based on users' inputs to fulfill real-world tasks. LLaVA-Plus is trained on multimodal instruction-following data to acquire the ability to use tools, covering visual understa…

    Submitted 9 November, 2023; originally announced November 2023.

    Comments: 25 pages, 25M file size. Project Page: https://llava-vl.github.io/llava-plus/

  27. arXiv:2309.15461  [pdf, other]

    cs.CL

    ChatCounselor: A Large Language Models for Mental Health Support

    Authors: June M. Liu, Donghao Li, He Cao, Tianhe Ren, Zeyi Liao, Jiamin Wu

    Abstract: This paper presents ChatCounselor, a large language model (LLM) solution designed to provide mental health support. Unlike generic chatbots, ChatCounselor is distinguished by its foundation in real conversations between consulting clients and professional psychologists, enabling it to possess specialized knowledge and counseling skills in the field of psychology. The training dataset, Psych8k, was…

    Submitted 27 September, 2023; originally announced September 2023.

  28. arXiv:2308.15368  [pdf, other]

    cs.RO cs.AI eess.SY

    RED: A Systematic Real-Time Scheduling Approach for Robotic Environmental Dynamics

    Authors: Zexin Li, Tao Ren, Xiaoxi He, Cong Liu

    Abstract: Intelligent robots are designed to effectively navigate dynamic and unpredictable environments laden with moving mechanical elements and objects. Such environment-induced dynamics, including moving obstacles, can readily alter the computational demand (e.g., the creation of new tasks) and the structure of workloads (e.g., precedence constraints among tasks) during runtime, thereby adversely affect…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: Accepted by RTSS 2023

  29. arXiv:2307.12972  [pdf, other]

    cs.CV

    DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting

    Authors: Hongyang Li, Hao Zhang, Zhaoyang Zeng, Shilong Liu, Feng Li, Tianhe Ren, Lei Zhang

    Abstract: In this paper, we propose a new operator, called 3D DeFormable Attention (DFA3D), for 2D-to-3D feature lifting, which transforms multi-view 2D image features into a unified 3D space for 3D object detection. Existing feature lifting approaches, such as Lift-Splat-based and 2D attention-based, either use estimated depth to get pseudo LiDAR features and then splat them to a 3D space, which is a one-p…

    Submitted 24 July, 2023; originally announced July 2023.

  30. arXiv:2306.17504  [pdf, other]

    cs.AI

    Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer

    Authors: Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Tianshuo Xu, Xiaoshuai Sun, Tongliang Liu, Rongrong Ji, Dacheng Tao

    Abstract: Deep neural networks often suffer from poor generalization due to complex and non-convex loss landscapes. Sharpness-Aware Minimization (SAM) is a popular solution that smooths the loss landscape by minimizing the maximized change of training loss when adding a perturbation to the weight. However, indiscriminate perturbation of SAM on all parameters is suboptimal and results in excessive computatio…

    Submitted 30 June, 2023; originally announced June 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.05177

  31. arXiv:2306.07265  [pdf, other]

    cs.CV

    detrex: Benchmarking Detection Transformers

    Authors: Tianhe Ren, Shilong Liu, Feng Li, Hao Zhang, Ailing Zeng, Jie Yang, Xingyu Liao, Ding Jia, Hongyang Li, He Cao, Jianan Wang, Zhaoyang Zeng, Xianbiao Qi, Yuhui Yuan, Jianwei Yang, Lei Zhang

    Abstract: The DEtection TRansformer (DETR) algorithm has received considerable attention in the research community and is gradually emerging as a mainstream approach for object detection and other perception tasks. However, the current field lacks a unified and comprehensive benchmark specifically tailored for DETR-based models. To address this issue, we develop a unified, highly modular, and lightweight co…

    Submitted 13 June, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

    Comments: project link: https://github.com/IDEA-Research/detrex

  32. arXiv:2305.15023  [pdf, other]

    cs.CV

    Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models

    Authors: Gen Luo, Yiyi Zhou, Tianhe Ren, Shengxin Chen, Xiaoshuai Sun, Rongrong Ji

    Abstract: Recently, growing interest has been aroused in extending the multimodal capability of large language models (LLMs), e.g., vision-language (VL) learning, which is regarded as the next milestone of artificial general intelligence. However, existing solutions are prohibitively expensive, which not only need to optimize excessive parameters, but also require another large-scale pre-training before VL…

    Submitted 24 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  33. arXiv:2305.11686  [pdf, other]

    eess.IV cs.CV cs.RO

    Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards Robot-assisted Intubation

    Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

    Abstract: Robotic-assisted tracheal intubation requires the robot to distinguish anatomical features like an experienced physician using deep-learning techniques. However, real datasets of oropharyngeal organs are limited due to patient privacy issues, making it challenging to train deep-learning models for accurate image segmentation. We hereby consider generating a new data modality through a virtual envi…

    Submitted 27 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Extended abstract in IEEE ICRA 2023 Workshop (New Evolutions in Surgical Robotics: Embracing Multimodal Imaging Guidance, Intelligence, and Bio-inspired Mechanisms). arXiv admin note: text overlap with arXiv:2305.10883

  34. arXiv:2305.10883  [pdf, other]

    cs.AI cs.CV eess.IV

    Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs

    Authors: Guankun Wang, Tian-Ao Ren, Jiewen Lai, Long Bai, Hongliang Ren

    Abstract: Video-assisted transoral tracheal intubation (TI) necessitates using an endoscope that helps the physician insert a tracheal tube into the glottis instead of the esophagus. The growing trend of robotic-assisted TI would require a medical robot to distinguish anatomical features like an experienced physician which can be imitated by utilizing supervised deep-learning techniques. However, the real d…

    Submitted 27 July, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: The manuscript is accepted by Medical & Biological Engineering & Computing. Code and dataset: https://github.com/gkw0010/EISOST-Sim2Real-Dataset-Release

  35. arXiv:2304.13027  [pdf, other]

    cs.CV

    A Strong and Reproducible Object Detector with Only Public Datasets

    Authors: Tianhe Ren, Jianwei Yang, Shilong Liu, Ailing Zeng, Feng Li, Hao Zhang, Hongyang Li, Zhaoyang Zeng, Lei Zhang

    Abstract: This work presents Focal-Stable-DINO, a strong and reproducible object detection model which achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev using only 700M parameters without any test time augmentation. It explores the combination of the powerful FocalNet-Huge backbone with the effective Stable-DINO detector. Different from existing SOTA models that utilize an extensive number of pa…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: 64.8 AP on COCO test-dev

  36. arXiv:2304.04742  [pdf, other]

    cs.CV

    Detection Transformer with Stable Matching

    Authors: Shilong Liu, Tianhe Ren, Jiayu Chen, Zhaoyang Zeng, Hao Zhang, Feng Li, Hongyang Li, Jun Huang, Hang Su, Jun Zhu, Lei Zhang

    Abstract: This paper is concerned with the matching stability problem across different decoder layers in DEtection TRansformers (DETR). We point out that the unstable matching in DETR is caused by a multi-optimization path problem, which is highlighted by the one-to-one matching design in DETR. To address this problem, we show that the most important design is to use and only use positional metrics (like IO…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: SOTA detector. Project page: https://github.com/IDEA-Research/Stable-DINO

  37. arXiv:2304.03907  [pdf, other]

    cs.LG math.OC

    Stochastic Nonlinear Control via Finite-dimensional Spectral Dynamic Embedding

    Authors: Tongzheng Ren, Zhaolin Ren, Haitong Ma, Na Li, Bo Dai

    Abstract: This paper presents an approach, Spectral Dynamics Embedding Control (SDEC), to optimal control for nonlinear stochastic systems. This method leverages an infinite-dimensional feature to linearly represent the state-action value function and exploits finite-dimensional truncation approximation for practical implementation. To characterize the effectiveness of these finite dimensional approximation…

    Submitted 20 December, 2023; v1 submitted 8 April, 2023; originally announced April 2023.

    Comments: Compared to v1, added analysis of Nystrom features, more streamlined proofs, and more extensive numerical studies; compared to v2, corrected a small error in ordering of author list

  38. arXiv:2304.01672  [pdf, other]

    cs.CV cs.AI

    Motion-R3: Fast and Accurate Motion Annotation via Representation-based Representativeness Ranking

    Authors: Jubo Yu, Tianxiang Ren, Shihui Guo, Fengyi Fang, Kai Wang, Zijiao Zeng, Yazhan Zhang, Andreas Aristidou, Yipeng Qin

    Abstract: In this paper, we follow a data-centric philosophy and propose a novel motion annotation method based on the inherent representativeness of motion data in a given dataset. Specifically, we propose a Representation-based Representativeness Ranking R3 method that ranks all motion data in a given dataset according to their representativeness in a learned motion representation space. We further propos…

    Submitted 4 April, 2023; originally announced April 2023.

  39. arXiv:2303.14651  [pdf, other]

    cs.CV

    You Only Segment Once: Towards Real-Time Panoptic Segmentation

    Authors: Jie Hu, Linyan Huang, Tianhe Ren, Shengchuan Zhang, Rongrong Ji, Liujuan Cao

    Abstract: In this paper, we propose YOSO, a real-time panoptic segmentation framework. YOSO predicts masks via dynamic convolutions between panoptic kernels and image feature maps, in which you only need to segment once for both instance and semantic segmentation tasks. To reduce the computational overhead, we design a feature pyramid aggregator for the feature map extraction, and a separable dynamic decode…

    Submitted 26 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  40. arXiv:2303.14457  [pdf, other]

    cs.CV cs.AI cs.GR

    Diverse Motion In-betweening with Dual Posture Stitching

    Authors: Tianxiang Ren, Jubo Yu, Shihui Guo, Ying Ma, Yutao Ouyang, Zijiao Zeng, Yazhan Zhang, Yipeng Qin

    Abstract: In-betweening is a technique for generating transitions given initial and target character states. The majority of existing works require multiple (often $>$10) frames as input, which are not always accessible. Our work deals with a focused yet challenging problem: to generate the transition when given exactly two frames (only the first and last). To cope with this challenging scenario, we impleme…

    Submitted 25 March, 2023; originally announced March 2023.

    Comments: 10 pages, 5 figures

  41. arXiv:2303.05499  [pdf, other]

    cs.CV

    Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

    Authors: Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, Jie Yang, Qing Jiang, Chunyuan Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang

    Abstract: In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively f…

    Submitted 19 July, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: Code will be available at https://github.com/IDEA-Research/GroundingDINO

  42. arXiv:2301.03749  [pdf, other]

    stat.ML cs.LG

    Markovian Sliced Wasserstein Distances: Beyond Independent Projections

    Authors: Khai Nguyen, Tongzheng Ren, Nhat Ho

    Abstract: Sliced Wasserstein (SW) distance suffers from redundant projections due to independent uniform random projecting directions. To partially overcome the issue, max K sliced Wasserstein (Max-K-SW) distance ($K\geq 1$) seeks the best discriminative orthogonal projecting directions. Despite being able to reduce the number of projections, the metricity of Max-K-SW cannot be guaranteed in practice due t…

    Submitted 31 December, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

    Comments: Accepted to NeurIPS 2023, 29 pages, 8 figures, 5 tables

  43. arXiv:2212.13771  [pdf, other]

    cs.CV

    Exploring Vision Transformers as Diffusion Learners

    Authors: He Cao, Jianan Wang, Tianhe Ren, Xianbiao Qi, Yihao Chen, Yuan Yao, Lei Zhang

    Abstract: Score-based diffusion models have captured widespread attention and fueled fast progress in recent vision generative tasks. In this paper, we focus on the diffusion model backbone, which has been much neglected before. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements, the performance of the vanilla ViT-based backbone (IU-ViT) is boosted…

    Submitted 28 December, 2022; originally announced December 2022.

  44. arXiv:2212.08765  [pdf, other]

    cs.LG stat.ML

    Latent Variable Representation for Reinforcement Learning

    Authors: Tongzheng Ren, Chenjun Xiao, Tianjun Zhang, Na Li, Zhaoran Wang, Sujay Sanghavi, Dale Schuurmans, Bo Dai

    Abstract: Deep latent variable models have achieved significant empirical successes in model-based reinforcement learning (RL) due to their expressiveness in modeling complex transition dynamics. On the other hand, it remains unclear theoretically and empirically how latent variable models may facilitate learning, planning, and exploration to improve the sample efficiency of RL. In this paper, we provide a…

    Submitted 7 March, 2023; v1 submitted 16 December, 2022; originally announced December 2022.

    Comments: ICLR 2023. The first two authors contributed equally. Project Website: https://rlrep.github.io/lvrep/

  45. arXiv:2211.16641   

    econ.GN cs.CE cs.CY

    Predicting China's CPI by Scanner Big Data

    Authors: Zhenkun Zhou, Zikun Song, Tao Ren

    Abstract: Scanner big data has the potential to be used to construct the Consumer Price Index (CPI). This work uses scanner data on supermarket retail sales, provided by the China Ant Business Alliance (CAA), to construct the Scanner-data Food Consumer Price Index (S-FCPI) in China, and verifies the index's reliability against other macro indicators, especially China's CPI. In addition, we build multiple ma…

    Submitted 6 October, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: We have updated the paper with more results

  46. arXiv:2210.05794  [pdf, other]

    cs.LG cs.CL cs.CV

    Designing Robust Transformers using Robust Kernel Density Estimation

    Authors: Xing Han, Tongzheng Ren, Tan Minh Nguyen, Khai Nguyen, Joydeep Ghosh, Nhat Ho

    Abstract: Recent advances in Transformer architectures have empowered their empirical success in a variety of tasks across different domains. However, existing works mainly focus on predictive accuracy and computational cost, without considering other practical issues, such as robustness to contaminated samples. Recent work by Nguyen et al. (2022) has shown that the self-attention mechanism, which is the c…

    Submitted 8 November, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: Accepted by NeurIPS 2023 as a poster; 23 pages, 5 figures, 11 tables

  47. arXiv:2210.05177  [pdf, other]

    cs.LG cs.AI cs.CV math.OC

    Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach

    Authors: Peng Mi, Li Shen, Tianhe Ren, Yiyi Zhou, Xiaoshuai Sun, Rongrong Ji, Dacheng Tao

    Abstract: Deep neural networks often suffer from poor generalization caused by complex and non-convex loss landscapes. One of the popular solutions is Sharpness-Aware Minimization (SAM), which smooths the loss landscape via minimizing the maximized change of training loss when adding a perturbation to the weight. However, we find the indiscriminate perturbation of SAM on all parameters is suboptimal, which…

    Submitted 23 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

    Comments: 20 pages, 5 figures, accepted by NeurIPS 2022

  48. arXiv:2209.13570  [pdf, other]

    stat.ML cs.LG

    Hierarchical Sliced Wasserstein Distance

    Authors: Khai Nguyen, Tongzheng Ren, Huy Nguyen, Litu Rout, Tan Nguyen, Nhat Ho

    Abstract: Sliced Wasserstein (SW) distance has been widely used in different application scenarios since it can be scaled to a large number of supports without suffering from the curse of dimensionality. The value of the sliced Wasserstein distance is the average transportation cost between one-dimensional representations (projections) of the original measures, which are obtained by the Radon Transform (RT). Despite i…

    Submitted 6 February, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted to ICLR 2023, 29 pages, 8 figures, 3 tables

  49. arXiv:2208.09515  [pdf, other]

    cs.LG stat.ML

    Spectral Decomposition Representation for Reinforcement Learning

    Authors: Tongzheng Ren, Tianjun Zhang, Lisa Lee, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

    Abstract: Representation learning often plays a critical role in reinforcement learning by managing the curse of dimensionality. A representative class of algorithms exploits a spectral decomposition of the stochastic transition dynamics to construct representations that enjoy strong theoretical properties in an idealized setting. However, current spectral methods suffer from limited applicability because t…

    Submitted 7 March, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

    Comments: ICLR 2023. The first two authors contributed equally

  50. arXiv:2207.07150  [pdf, other]

    cs.LG stat.ML

    Making Linear MDPs Practical via Contrastive Representation Learning

    Authors: Tianjun Zhang, Tongzheng Ren, Mengjiao Yang, Joseph E. Gonzalez, Dale Schuurmans, Bo Dai

    Abstract: It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we…

    Submitted 7 December, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: ICML 2022. The first two authors contributed equally