Showing 1–50 of 57 results for author: Gong, K

  1. arXiv:2408.07385  [pdf, other]

    cs.IT eess.SP

    Iterative Equalization of CPM With Unitary Approximate Message Passing

    Authors: Zilong Liu, Yi Song, Qinghua Guo, Peng Sun, Kexian Gong, Zhongyong Wang

    Abstract: Continuous phase modulation (CPM) has extensive applications in wireless communications due to its high spectral and power efficiency. However, its nonlinear characteristics pose significant challenges for detection in frequency selective fading channels. This paper proposes an iterative receiver tailored for the detection of CPM signals over frequency selective fading channels. This design levera…

    Submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:2408.01732  [pdf, other]

    cs.CV cs.AI

    Landmark-guided Diffusion Model for High-fidelity and Temporally Coherent Talking Head Generation

    Authors: Jintao Tan, Xize Cheng, Lingyu Xiong, Lei Zhu, Xiandong Li, Xianjia Wu, Kai Gong, Minglei Li, Yi Cai

    Abstract: Audio-driven talking head generation is a significant and challenging task applicable to various fields such as virtual avatars, film production, and online conferences. However, the existing GAN-based models emphasize generating well-synchronized lip shapes but overlook the visual quality of generated frames, while diffusion-based models prioritize generating high-quality frames but neglect lip s…

    Submitted 3 August, 2024; originally announced August 2024.

  3. arXiv:2407.09486  [pdf, other]

    cs.DC cs.AI

    ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving

    Authors: Tao Huang, Pengfei Chen, Kyoka Gong, Jocky Hawk, Zachary Bright, Wenxin Xie, Kecheng Huang, Zhi Ji

    Abstract: With the increasing popularity of large language model (LLM) backend systems, it is common and necessary to deploy stable serverless serving of LLM on multi-GPU clusters with autoscaling. However, there exist challenges because the diversity and co-location of applications in multi-GPU clusters will lead to low service quality and GPU utilization. To address them, we build ENOVA, a deployment, mo…

    Submitted 17 May, 2024; originally announced July 2024.

  4. arXiv:2405.14802  [pdf, other]

    eess.IV cs.CV

    Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation

    Authors: Hongxu Jiang, Muhammad Imran, Linhai Ma, Teng Zhang, Yuyin Zhou, Muxuan Liang, Kuang Gong, Wei Shao

    Abstract: Denoising diffusion probabilistic models (DDPMs) have achieved unprecedented success in computer vision. However, they remain underutilized in medical imaging, a field crucial for disease diagnosis and treatment planning. This is primarily due to the high computational cost associated with (1) the use of a large number of time steps (e.g., 1,000) in diffusion processes and (2) the increased dimensio…

    Submitted 23 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  5. arXiv:2401.17593  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Head and Neck Tumor Segmentation from [18F]F-FDG PET/CT Images Based on 3D Diffusion Model

    Authors: Yafei Dong, Kuang Gong

    Abstract: Head and neck (H&N) cancers are among the most prevalent types of cancer worldwide, and [18F]F-FDG PET/CT is widely used for H&N cancer management. Recently, the diffusion model has demonstrated remarkable performance in various image-generation tasks. In this work, we proposed a 3D diffusion model to accurately perform H&N tumor segmentation from 3D PET and CT volumes. The 3D diffusion model was…

    Submitted 30 January, 2024; originally announced January 2024.

    Comments: 28 pages, 5 figures

  6. arXiv:2401.14405  [pdf, other]

    cs.CV cs.AI cs.LG

    Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities

    Authors: Yiyuan Zhang, Xiaohan Ding, Kaixiong Gong, Yixiao Ge, Ying Shan, Xiangyu Yue

    Abstract: We propose to improve transformers of a specific modality with irrelevant data from other modalities, e.g., improve an ImageNet model with audio or point cloud datasets. We would like to highlight that the data samples of the target modality are irrelevant to the other modalities, which distinguishes our method from other works utilizing paired (e.g., CLIP) or interleaved data of different modalit…

    Submitted 18 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: CVPR 2024. Code and models are available at https://github.com/AILab-CVC/M2PT

  7. arXiv:2401.11115  [pdf, other]

    cs.CV

    MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation

    Authors: Nhat M. Hoang, Kehong Gong, Chuan Guo, Michael Bi Mi

    Abstract: Controllable generation of 3D human motions becomes an important topic as the world embraces digital transformation. Existing works, though making promising progress with the advent of diffusion models, heavily rely on meticulously captured and annotated (e.g., text) high-quality motion corpus, a resource-intensive endeavor in the real world. This motivates our proposed MotionMix, a simple yet eff…

    Submitted 24 January, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted at the 38th Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, Main Conference

  8. arXiv:2312.10877  [pdf, other]

    cs.CV

    Mimic: Speaking Style Disentanglement for Speech-Driven 3D Facial Animation

    Authors: Hui Fu, Zeqing Wang, Ke Gong, Keze Wang, Tianshui Chen, Haojie Li, Haifeng Zeng, Wenxiong Kang

    Abstract: Speech-driven 3D facial animation aims to synthesize vivid facial animations that accurately synchronize with speech and match the unique speaking style. However, existing works primarily focus on achieving precise lip synchronization while neglecting to model the subject-specific speaking style, often resulting in unrealistic facial animations. To the best of our knowledge, this work makes the fi…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 7 pages, 6 figures, accepted by AAAI-24

  9. arXiv:2312.04963  [pdf, other]

    cs.CV cs.AI

    Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

    Authors: Lihe Ding, Shaocong Dong, Zhanpeng Huang, Zibin Wang, Yiyuan Zhang, Kaixiong Gong, Dan Xu, Tianfan Xue

    Abstract: Most 3D generation research focuses on up-projecting 2D foundation models into the 3D space, either by minimizing 2D Score Distillation Sampling (SDS) loss or fine-tuning on multi-view datasets. Without explicit 3D priors, these methods often lead to geometric anomalies and multi-view inconsistency. Recently, researchers have attempted to improve the genuineness of 3D objects by directly training…

    Submitted 7 December, 2023; originally announced December 2023.

  10. arXiv:2312.03700  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    OneLLM: One Framework to Align All Modalities with Language

    Authors: Jiaming Han, Kaixiong Gong, Yiyuan Zhang, Jiaqi Wang, Kaipeng Zhang, Dahua Lin, Yu Qiao, Peng Gao, Xiangyu Yue

    Abstract: Multimodal large language models (MLLMs) have gained significant attention due to their strong multimodal understanding capability. However, existing works rely heavily on modality-specific encoders, which usually differ in architecture and are limited to common modalities. In this paper, we present OneLLM, an MLLM that aligns eight modalities to language using a unified framework. We achieve this…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: Code: https://github.com/csuhan/OneLLM

  11. arXiv:2310.10008  [pdf, other]

    cs.CV cs.AI cs.LG

    Towards Unified and Effective Domain Generalization

    Authors: Yiyuan Zhang, Kaixiong Gong, Xiaohan Ding, Kaipeng Zhang, Fangrui Lv, Kurt Keutzer, Xiangyu Yue

    Abstract: We propose $\textbf{UniDG}$, a novel and $\textbf{Uni}$fied framework for $\textbf{D}$omain $\textbf{G}$eneralization that is capable of significantly enhancing the out-of-distribution generalization performance of foundation models regardless of their architectures. The core idea of UniDG is to finetune models during the inference stage, which saves the cost of iterative training. Specifically, w…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Project Website: https://invictus717.github.io/Generalization/

  12. arXiv:2310.02776  [pdf, other]

    cs.CV

    Dynamic Shuffle: An Efficient Channel Mixture Method

    Authors: Kaijun Gong, Zhuowen Yin, Yushu Li, Kailing Guo, Xiangmin Xu

    Abstract: The redundancy of Convolutional neural networks not only depends on weights but also depends on inputs. Shuffling is an efficient operation for mixing channel information but the shuffle order is usually pre-defined. To reduce the data-dependent redundancy, we devise a dynamic shuffle module to generate data-dependent permutation matrices for shuffling. Since the dimension of permutation matrix is…

    Submitted 4 October, 2023; originally announced October 2023.

  13. arXiv:2308.14480  [pdf, other]

    cs.CV cs.MM

    Priority-Centric Human Motion Generation in Discrete Latent Space

    Authors: Hanyang Kong, Kehong Gong, Dongze Lian, Michael Bi Mi, Xinchao Wang

    Abstract: Text-to-motion generation is a formidable task, aiming to produce human motions that align with the input text while also adhering to human capabilities and physical laws. While there have been advancements in diffusion models, their application in discrete spaces remains underexplored. Current methods often overlook the varying significance of different motions, treating them uniformly. It is ess…

    Submitted 30 August, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted by ICCV2023

  14. arXiv:2307.10802  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    Meta-Transformer: A Unified Framework for Multimodal Learning

    Authors: Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue

    Abstract: Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it still remains challenging to design a unified network for processing various modalities ($\textit{e.g.}$ natural language, 2D images, 3D point clouds, audio, video, time series, tabular data) due to the inherent gaps among them. In this work, we…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Project website: https://kxgong.github.io/meta_transformer/

  15. arXiv:2306.11984  [pdf, ps, other]

    eess.IV cs.AI cs.CV

    TauPETGen: Text-Conditional Tau PET Image Synthesis Based on Latent Diffusion Models

    Authors: Se-In Jang, Cristina Lois, Emma Thibault, J. Alex Becker, Yafei Dong, Marc D. Normandin, Julie C. Price, Keith A. Johnson, Georges El Fakhri, Kuang Gong

    Abstract: In this work, we developed a novel text-guided image synthesis technique which could generate realistic tau PET images from textual descriptions and the subject's MR image. The generated tau PET images have the potential to be used in examining relations between different measures and also increasing the public availability of tau PET datasets. The method was based on latent diffusion models. Both…

    Submitted 20 June, 2023; originally announced June 2023.

  16. arXiv:2304.02419  [pdf, other]

    cs.CV

    TM2D: Bimodality Driven 3D Dance Generation via Music-Text Integration

    Authors: Kehong Gong, Dongze Lian, Heng Chang, Chuan Guo, Zihang Jiang, Xinxin Zuo, Michael Bi Mi, Xinchao Wang

    Abstract: We propose a novel task for generating 3D dance movements that simultaneously incorporate both text and music modalities. Unlike existing works that generate dance movements using a single modality such as music, our goal is to produce richer dance movements guided by the instructive information provided by the text. However, the lack of paired motion data with both music and text modalities limit…

    Submitted 1 October, 2023; v1 submitted 5 April, 2023; originally announced April 2023.

    Comments: Accepted by ICCV2023

  17. arXiv:2302.03861  [pdf]

    eess.IV cs.CV

    SwinCross: Cross-modal Swin Transformer for Head-and-Neck Tumor Segmentation in PET/CT Images

    Authors: Gary Y. Li, Junyu Chen, Se-In Jang, Kuang Gong, Quanzheng Li

    Abstract: Radiotherapy (RT) combined with cetuximab is the standard treatment for patients with inoperable head and neck cancers. Segmentation of head and neck (H&N) tumors is a prerequisite for radiotherapy planning but a time-consuming process. In recent years, deep convolutional neural networks have become the de facto standard for automated image segmentation. However, due to the expensive computational…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 9 pages, 3 figures. Med Phys. 2023

  18. arXiv:2212.10724  [pdf]

    eess.IV cs.CV

    Investigation of Network Architecture for Multimodal Head-and-Neck Tumor Segmentation

    Authors: Ye Li, Junyu Chen, Se-in Jang, Kuang Gong, Quanzheng Li

    Abstract: Inspired by the recent success of Transformers for Natural Language Processing and vision Transformer for Computer Vision, many researchers in the medical imaging community have flocked to Transformer-based networks for various mainstream medical tasks such as classification, segmentation, and estimation. In this study, we analyze two recently published Transformer-based network architectures fo…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted for oral presentation by IEEE Medical Imaging Conference 2022

  19. arXiv:2209.06167  [pdf, other]

    eess.IV cs.CV physics.med-ph

    PET image denoising based on denoising diffusion probabilistic models

    Authors: Kuang Gong, Keith A. Johnson, Georges El Fakhri, Quanzheng Li, Tinsu Pan

    Abstract: Due to various physical degradation factors and limited counts received, PET image quality needs further improvements. The denoising diffusion probabilistic models (DDPM) are distribution learning-based models, which try to transform a normal distribution into a specific data distribution based on iterative refinements. In this work, we proposed and evaluated different DDPM-based methods for PET i…

    Submitted 14 September, 2022; v1 submitted 13 September, 2022; originally announced September 2022.

    Comments: 8 figures

  20. arXiv:2209.03300  [pdf, ps, other]

    eess.IV cs.CV

    Spach Transformer: Spatial and Channel-wise Transformer Based on Local and Global Self-attentions for PET Image Denoising

    Authors: Se-In Jang, Tinsu Pan, Ye Li, Pedram Heidari, Junyu Chen, Quanzheng Li, Kuang Gong

    Abstract: Positron emission tomography (PET) is widely used in clinics and research due to its quantitative merits and high sensitivity, but suffers from low signal-to-noise ratio (SNR). Recently convolutional neural networks (CNNs) have been widely used to improve PET image quality. Though successful and efficient in local feature extraction, CNN cannot capture long-range dependencies well due to its limit…

    Submitted 10 December, 2023; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: 15 pages

  21. arXiv:2204.14195  [pdf, other]

    cs.CV

    Improving Transferability for Domain Adaptive Detection Transformers

    Authors: Kaixiong Gong, Shuang Li, Shugang Li, Rui Zhang, Chi Harold Liu, Qiang Chen

    Abstract: DETR-style detectors stand out amongst in-domain scenarios, but their properties in domain shift settings are under-explored. This paper aims to build a simple but effective baseline with a DETR-style detector on domain shift settings based on two findings. For one, mitigating the domain shift on the backbone and the decoder output features excels in getting favorable results. For another, advance…

    Submitted 2 August, 2022; v1 submitted 29 April, 2022; originally announced April 2022.

  22. arXiv:2203.15625  [pdf, other]

    cs.CV

    PoseTriplet: Co-evolving 3D Human Pose Estimation, Imitation, and Hallucination under Self-supervision

    Authors: Kehong Gong, Bingbing Li, Jianfeng Zhang, Tao Wang, Jing Huang, Michael Bi Mi, Jiashi Feng, Xinchao Wang

    Abstract: Existing self-supervised 3D human pose estimation schemes have largely relied on weak supervisions like consistency loss to guide the learning, which, inevitably, leads to inferior results in real-world scenarios with unseen poses. In this paper, we propose a novel self-supervised approach that allows us to explicitly generate 2D-3D pose pairs for augmenting supervision, through a self-enhancing d…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: CVPR 2022 Oral Paper, code available: https://github.com/Garfield-kh/PoseTriplet

  23. arXiv:2203.08034  [pdf]

    eess.IV cs.CV cs.LG physics.med-ph

    A Noise-level-aware Framework for PET Image Denoising

    Authors: Ye Li, Jianan Cui, Junyu Chen, Guodong Zeng, Scott Wollenweber, Floris Jansen, Se-In Jang, Kyungsang Kim, Kuang Gong, Quanzheng Li

    Abstract: In PET, the amount of relative (signal-dependent) noise present in different body regions can be significantly different and is inherently related to the number of counts present in that region. The number of counts in a region depends, in principle and among other factors, on the total administered activity, scanner sensitivity, image acquisition duration, radiopharmaceutical tracer uptake in the…

    Submitted 15 March, 2022; originally announced March 2022.

  24. arXiv:2202.07971  [pdf, other]

    math.PR cs.PF

    Large-System Insensitivity of Zero-Waiting Load Balancing Algorithms

    Authors: Xin Liu, Kang Gong, Lei Ying

    Abstract: This paper studies the sensitivity (or insensitivity) of a class of load balancing algorithms that achieve asymptotic zero-waiting in the sub-Halfin-Whitt regime, named LB-zero. Most existing results on zero-waiting load balancing algorithms assume the service time distribution is exponential. This paper establishes the {\em large-system insensitivity} of LB-zero for jobs whose service time follow…

    Submitted 16 February, 2022; originally announced February 2022.

  25. arXiv:2201.01443  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Neural KEM: A Kernel Method with Deep Coefficient Prior for PET Image Reconstruction

    Authors: Siqi Li, Kuang Gong, Ramsey D. Badawi, Edward J. Kim, Jinyi Qi, Guobao Wang

    Abstract: Image reconstruction of low-count positron emission tomography (PET) data is challenging. Kernel methods address the challenge by incorporating image prior information in the forward model of iterative PET image reconstruction. The kernelized expectation-maximization (KEM) algorithm has been developed and demonstrated to be effective and easy to implement. A common approach for a further improveme…

    Submitted 24 October, 2022; v1 submitted 4 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: text overlap with arXiv:2110.01174

  26. arXiv:2112.04137  [pdf, other]

    cs.LG

    Pareto Domain Adaptation

    Authors: Fangrui Lv, Jian Liang, Kaixiong Gong, Shuang Li, Chi Harold Liu, Han Li, Di Liu, Guoren Wang

    Abstract: Domain adaptation (DA) attempts to transfer the knowledge from a labeled source domain to an unlabeled target domain that follows a different distribution from the source. To achieve this, DA methods include a source classification objective to extract the source knowledge and a domain alignment objective to diminish the domain shift, ensuring knowledge transfer. Typically, former DA methods adopt s…

    Submitted 9 December, 2021; v1 submitted 8 December, 2021; originally announced December 2021.

    Comments: Accepted in NeurIPS 2021

  27. arXiv:2109.09161  [pdf, other]

    cs.CL eess.AS

    Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

    Authors: Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin

    Abstract: Unifying acoustic and linguistic representation learning has become increasingly crucial to transfer the knowledge learned on the abundance of high-resource language data for low-resource speech recognition. Existing approaches simply cascade pre-trained acoustic and language models to learn the transfer from speech to text. However, how to solve the representation discrepancy of speech and text i…

    Submitted 9 October, 2021; v1 submitted 19 September, 2021; originally announced September 2021.

  28. arXiv:2106.10359  [pdf, other]

    eess.IV cs.CV physics.med-ph

    Direct Reconstruction of Linear Parametric Images from Dynamic PET Using Nonlocal Deep Image Prior

    Authors: Kuang Gong, Ciprian Catana, Jinyi Qi, Quanzheng Li

    Abstract: Direct reconstruction methods have been developed to estimate parametric images directly from the measured PET sinograms by combining the PET imaging model and tracer kinetics in an integrated framework. Due to limited counts received, signal-to-noise-ratio (SNR) and resolution of parametric images produced by direct reconstruction frameworks are still limited. Recently supervised deep learning me…

    Submitted 18 June, 2021; originally announced June 2021.

    Comments: 10 pages, 10 figures

  29. arXiv:2105.02465  [pdf, other]

    cs.CV

    PoseAug: A Differentiable Pose Augmentation Framework for 3D Human Pose Estimation

    Authors: Kehong Gong, Jianfeng Zhang, Jiashi Feng

    Abstract: Existing 3D human pose estimators suffer poor generalization performance to new datasets, largely due to the limited diversity of 2D-3D pose pairs in the training data. To address this problem, we present PoseAug, a new auto-augmentation framework that learns to augment the available training poses towards a greater diversity and thus improve generalization of the trained 2D-to-3D pose estimator.…

    Submitted 6 May, 2021; originally announced May 2021.

    Comments: CVPR 2021 Oral Paper, code available: https://github.com/jfzhang95/PoseAug

  30. arXiv:2103.12579  [pdf, other]

    cs.CV

    MetaSAug: Meta Semantic Augmentation for Long-Tailed Visual Recognition

    Authors: Shuang Li, Kaixiong Gong, Chi Harold Liu, Yulin Wang, Feng Qiao, Xinjing Cheng

    Abstract: Real-world training data usually exhibits long-tailed distribution, where several majority classes have a significantly larger number of samples than the remaining minority classes. This imbalance degrades the performance of typical supervised learning algorithms designed for balanced training sets. In this paper, we address this issue by augmenting minority classes with a recently proposed implic…

    Submitted 7 April, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

    Comments: Accepted at CVPR 2021

  31. arXiv:2103.12562  [pdf, other]

    cs.CV

    Transferable Semantic Augmentation for Domain Adaptation

    Authors: Shuang Li, Mixue Xie, Kaixiong Gong, Chi Harold Liu, Yulin Wang, Wei Li

    Abstract: Domain adaptation has been widely explored by transferring the knowledge from a label-rich source domain to a related but unlabeled target domain. Most existing domain adaptation algorithms attend to adapting feature representations across two domains with the guidance of a shared source-supervised classifier. However, such classifier limits the generalization ability towards unlabeled target reco…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: Accepted as CVPR 2021. The code is publicly available at https://github.com/BIT-DA/TSA

  32. arXiv:2103.10047  [pdf, other]

    cs.CV

    Similarity Transfer for Knowledge Distillation

    Authors: Haoran Zhao, Kun Gong, Xin Sun, Junyu Dong, Hui Yu

    Abstract: Knowledge distillation is a popular paradigm for learning portable neural networks by transferring the knowledge from a large model into a smaller one. Most existing approaches enhance the student model by utilizing the similarity information between the categories of instance level provided by the teacher model. However, these works ignore the similarity correlation between different instances th…

    Submitted 18 March, 2021; originally announced March 2021.

  33. arXiv:2101.10620  [pdf, other]

    cs.CV

    Graphonomy: Universal Image Parsing via Graph Reasoning and Transfer

    Authors: Liang Lin, Yiming Gao, Ke Gong, Meng Wang, Xiaodan Liang

    Abstract: Prior highly-tuned image parsing models are usually studied in a certain domain with a specific set of semantic labels and can hardly be adapted into other scenarios (e.g., sharing discrepant label granularity) without extensive re-training. Learning a single universal parsing model by unifying label annotations from different domains or at various levels of granularity is a crucial but rarely add…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: To appear in IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (T-PAMI) 2021. We propose a graph reasoning and transfer learning framework, which incorporates human knowledge and label taxonomy into the intermediate graph representation learning beyond local convolutions. arXiv admin note: substantial text overlap with arXiv:1904.04536

  34. arXiv:2012.11896  [pdf, other]

    cs.CL cs.SD eess.AS

    Adversarial Meta Sampling for Multilingual Low-Resource Speech Recognition

    Authors: Yubei Xiao, Ke Gong, Pan Zhou, Guolin Zheng, Xiaodan Liang, Liang Lin

    Abstract: Low-resource automatic speech recognition (ASR) is challenging, as the low-resource target language data cannot well train an ASR model. To solve this issue, meta-learning formulates ASR for each source language into many small ASR tasks and meta-learns a model initialization on all tasks from different source languages to access fast adaptation on unseen target languages. However, for different s…

    Submitted 12 April, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: accepted in AAAI2021

  35. arXiv:2009.06129  [pdf, other]

    eess.IV cs.LG physics.med-ph

    Super Resolution of Arterial Spin Labeling MR Imaging Using Unsupervised Multi-Scale Generative Adversarial Network

    Authors: Jianan Cui, Kuang Gong, Paul Han, Huafeng Liu, Quanzheng Li

    Abstract: Arterial spin labeling (ASL) magnetic resonance imaging (MRI) is a powerful imaging technology that can measure cerebral blood flow (CBF) quantitatively. However, since only a small portion of blood is labeled compared to the whole tissue volume, conventional ASL suffers from low signal-to-noise ratio (SNR), poor spatial resolution, and long acquisition time. In this paper, we proposed a super-res…

    Submitted 13 September, 2020; originally announced September 2020.

    Comments: Accepted to 2020 MICCAI MLMI workshop

  36. arXiv:2009.05901  [pdf]

    physics.med-ph cs.LG eess.IV

    Clinically Translatable Direct Patlak Reconstruction from Dynamic PET with Motion Correction Using Convolutional Neural Network

    Authors: Nuobei Xie, Kuang Gong, Ning Guo, Zhixing Qin, Jianan Cui, Zhifang Wu, Huafeng Liu, Quanzheng Li

    Abstract: Patlak model is widely used in 18F-FDG dynamic positron emission tomography (PET) imaging, where the estimated parametric images reveal important biochemical and physiology information. Because of better noise modeling and more information extracted from raw sinogram, direct Patlak reconstruction gains its popularity over the indirect approach which utilizes reconstructed dynamic PET images alone.…

    Submitted 12 September, 2020; originally announced September 2020.

    Comments: Accepted to MICCAI 2020

  37. arXiv:2006.06935  [pdf, ps, other]

    physics.soc-ph cs.SI

    Effects of heterogeneous self-protection awareness on resource-epidemic coevolution dynamics

    Authors: Xiaolong Chen, Kai Gong, Ruijie Wang, Shimin Cai, Wei Wang

    Abstract: Recent studies have demonstrated that the allocation of individual resources has a significant influence on the dynamics of epidemic spreading. In the real scenario, individuals have a different level of awareness for self-protection when facing the outbreak of an epidemic. To investigate the effects of the heterogeneous self-awareness distribution on the epidemic dynamics, we propose a resource-e…

    Submitted 12 June, 2020; originally announced June 2020.

  38. arXiv:2005.04777  [pdf, other]

    cs.CV

    Photometric Multi-View Mesh Refinement for High-Resolution Satellite Images

    Authors: Mathias Rothermel, Ke Gong, Dieter Fritsch, Konrad Schindler, Norbert Haala

    Abstract: Modern high-resolution satellite sensors collect optical imagery with ground sampling distances (GSDs) of 30-50cm, which has sparked a renewed interest in photogrammetric 3D surface reconstruction from satellite data. State-of-the-art reconstruction methods typically generate 2.5D elevation data. Here, we present an approach to recover full 3D surface meshes from multi-view satellite imagery. The…

    Submitted 12 May, 2020; v1 submitted 10 May, 2020; originally announced May 2020.

    Comments: Accepted for publication in ISPRS Journal of Photogrammetry and Remote Sensing

  39. arXiv:2004.06272  [pdf, other]

    cs.CV cs.LG eess.IV

    Bidirectional Graph Reasoning Network for Panoptic Segmentation

    Authors: Yangxin Wu, Gengwei Zhang, Yiming Gao, Xiajun Deng, Ke Gong, Xiaodan Liang, Liang Lin

    Abstract: Recent research on panoptic segmentation resorts to a single end-to-end network to combine the tasks of instance segmentation and semantic segmentation. However, prior models only unified the two related tasks at the architectural level via a multi-branch scheme or revealed the underlying correlation between them by unidirectional feature fusion, which disregards the explicit semantic and co-occu…

    Submitted 13 April, 2020; originally announced April 2020.

    Comments: CVPR2020

  40. arXiv:1912.07180  [pdf]

    physics.med-ph cs.LG eess.IV

    Penalized-likelihood PET Image Reconstruction Using 3D Structural Convolutional Sparse Coding

    Authors: Nuobei Xie, Kuang Gong, Ning Guo, Zhixin Qin, Zhifang Wu, Huafeng Liu, Quanzheng Li

    Abstract: Positron emission tomography (PET) is widely used for clinical diagnosis. As PET suffers from low resolution and high noise, numerous efforts try to incorporate anatomical priors into PET image reconstruction, especially with the development of hybrid PET/CT and PET/MRI systems. In this work, we proposed a novel 3D structural convolutional sparse coding (CSC) concept for penalized-likelihood PET i…

    Submitted 15 December, 2019; originally announced December 2019.

    Comments: 11 pages, 12 figures

  41. arXiv:1910.01923  [pdf, other]

    cs.CV

    Layout-Graph Reasoning for Fashion Landmark Detection

    Authors: Weijiang Yu, Xiaodan Liang, Ke Gong, Chenhan Jiang, Nong Xiao, Liang Lin

    Abstract: Detecting dense landmarks for diverse clothes, as a fundamental technique for clothes analysis, has attracted increasing research attention due to its huge application potential. However, due to the lack of modeling underlying semantic layout constraints among landmarks, prior works often detect ambiguous and structure-inconsistent landmarks of multiple overlapped clothes in one person. In this pa…

    Submitted 4 October, 2019; originally announced October 2019.

    Comments: 9 pages, 5 figures, CVPR2019

    MSC Class: I.4.9

  42. The citation advantage of foreign language references for Chinese social science papers

    Authors: Kaile Gong, Juan Xie, Ying Cheng, Vincent Larivière, Cassidy R. Sugimoto

    Abstract: Contemporary scientific exchanges are international, yet language continues to be a persistent barrier to scientific communication, particularly for non-native English-speaking scholars. Since the ability to absorb knowledge has a strong impact on how researchers create new scientific knowledge, a comprehensive access to and understanding of both domestic and international scientific publications…

    Submitted 20 August, 2019; originally announced August 2019.

    Comments: 24 pages, 9 figures, 10 tables

    Journal ref: Scientometrics,2019,120(3):1439-1460

  43. arXiv:1906.03639  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Consensus Neural Network for Medical Imaging Denoising with Only Noisy Training Samples

    Authors: Dufan Wu, Kuang Gong, Kyungsang Kim, Quanzheng Li

    Abstract: Deep neural networks have been proved efficient for medical image denoising. Current training methods require both noisy and clean images. However, clean images cannot be acquired for many practical medical applications due to naturally noisy signal, such as dynamic imaging, spectral computed tomography, arterial spin labeling magnetic resonance imaging, etc. In this paper we proposed a training m…

    Submitted 9 June, 2019; originally announced June 2019.

    Comments: 9 pages, 2 figures, accepted by MICCAI 2019

  44. arXiv:1904.04536  [pdf, other]

    cs.CV

    Graphonomy: Universal Human Parsing via Graph Transfer Learning

    Authors: Ke Gong, Yiming Gao, Xiaodan Liang, Xiaohui Shen, Meng Wang, Liang Lin

    Abstract: Prior highly-tuned human parsing models tend to fit towards each dataset in a specific domain or with discrepant label granularity, and can hardly be adapted to other human parsing tasks without extensive re-training. In this paper, we aim to learn a single universal human parsing model that can tackle all kinds of human parsing needs by unifying label annotations from different domains or at vari…

    Submitted 9 April, 2019; originally announced April 2019.

    Comments: Accepted to CVPR 2019. The Code is available at https://github.com/Gaoyiminggithub/Graphonomy

  45. arXiv:1901.10623  [pdf, other]

    cs.CL

    End-to-End Knowledge-Routed Relational Dialogue System for Automatic Diagnosis

    Authors: Lin Xu, Qixian Zhou, Ke Gong, Xiaodan Liang, Jianheng Tang, Liang Lin

    Abstract: Beyond current conversational chatbots or task-oriented dialogue systems that have attracted increasing attention, we move forward to develop a dialogue system for automatic medical diagnosis that converses with patients to collect additional symptoms beyond their self-reports and automatically makes a diagnosis. Besides the challenges for conversational dialogue systems (e.g. topic transition coh…

    Submitted 18 March, 2019; v1 submitted 29 January, 2019; originally announced January 2019.

    Comments: 8 pages, 5 figures, AAAI

  46. arXiv:1810.11610  [pdf, other]

    cs.CV

    Soft-Gated Warping-GAN for Pose-Guided Person Image Synthesis

    Authors: Haoye Dong, Xiaodan Liang, Ke Gong, Hanjiang Lai, Jia Zhu, Jian Yin

    Abstract: Despite remarkable advances in image synthesis research, existing works often fail in manipulating images under the context of large geometric transformations. Synthesizing person images conditioned on arbitrary poses is one of the most representative examples where the generation quality largely relies on the capability of identifying and modeling arbitrary transformations on different body parts…

    Submitted 11 January, 2019; v1 submitted 27 October, 2018; originally announced October 2018.

    Comments: 17 pages, 14 figures

  47. arXiv:1808.00661  [pdf, other]

    cs.CV

    Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

    Authors: Qixian Zhou, Xiaodan Liang, Ke Gong, Liang Lin

    Abstract: Beyond the existing single-person and multiple-person human parsing tasks in static images, this paper makes the first attempt to investigate a more realistic video instance-level human parsing that simultaneously segments out each person instance and parses each instance into more fine-grained parts (e.g., head, leg, dress). We introduce a novel Adaptive Temporal Encoding Network (ATEN) that alte…

    Submitted 10 August, 2018; v1 submitted 2 August, 2018; originally announced August 2018.

    Comments: To appear in ACM MM 2018. Code link: https://github.com/HCPLab-SYSU/ATEN. Dataset link: https://sysu-hcp.net/lip

  48. arXiv:1808.00157  [pdf, other]

    cs.CV

    Instance-level Human Parsing via Part Grouping Network

    Authors: Ke Gong, Xiaodan Liang, Yicheng Li, Yimin Chen, Ming Yang, Liang Lin

    Abstract: Instance-level human parsing towards real-world human analysis scenarios is still under-explored due to the absence of sufficient data resources and technical difficulty in parsing multiple instances in a single pass. Several related works all follow the "parsing-by-detection" pipeline that heavily relies on separately trained detection models to localize instances and then performs human parsing…

    Submitted 31 July, 2018; originally announced August 2018.

    Comments: Accepted by ECCV 2018 (Oral)

  49. arXiv:1807.11042  [pdf, other]

    cs.CV

    Towards Good Practices on Building Effective CNN Baseline Model for Person Re-identification

    Authors: Fu Xiong, Yang Xiao, Zhiguo Cao, Kaicheng Gong, Zhiwen Fang, Joey Tianyi Zhou

    Abstract: Person re-identification is indeed a challenging visual recognition task due to the critical issues of human pose variation, human body occlusion, camera view variation, etc. To address this, most of the state-of-the-art approaches are proposed based on deep convolutional neural network (CNN), being leveraged by its strong feature learning power and classification boundary fitting capacity. Althou…

    Submitted 29 July, 2018; originally announced July 2018.

  50. arXiv:1807.01759  [pdf, other]

    cs.CV cs.LG physics.med-ph

    Learning Personalized Representation for Inverse Problems in Medical Imaging Using Deep Neural Network

    Authors: Kuang Gong, Kyungsang Kim, Jianan Cui, Ning Guo, Ciprian Catana, Jinyi Qi, Quanzheng Li

    Abstract: Recently deep neural networks have been widely and successfully applied in computer vision tasks and attracted growing interests in medical imaging. One barrier for the application of deep neural networks to medical imaging is the need of large amounts of prior training pairs, which is not always feasible in clinical practice. In this work we propose a personalized representation learning framewor…

    Submitted 4 July, 2018; originally announced July 2018.

    Comments: 11 pages, 7 figures