Showing 1–50 of 116 results for author: Matas, J

Searching in archive cs.
  1. arXiv:2408.13632  [pdf, other]

    cs.CV

    FungiTastic: A multi-modal dataset and benchmark for image categorization

    Authors: Lukas Picek, Klara Janouskova, Milan Sulc, Jiri Matas

    Abstract: We introduce a new, highly challenging benchmark and a dataset -- FungiTastic -- based on data continuously collected over a twenty-year span. The dataset originates in fungal records labeled and curated by experts. It consists of about 350k multi-modal observations that include more than 650k photographs from 5k fine-grained categories and diverse accompanying information, e.g., acquisition metad…

    Submitted 24 August, 2024; originally announced August 2024.

  2. arXiv:2408.12934  [pdf, other]

    cs.CV

    WildFusion: Individual Animal Identification with Calibrated Similarity Fusion

    Authors: Vojtěch Cermak, Lukas Picek, Lukáš Adam, Lukáš Neumann, Jiří Matas

    Abstract: We propose a new method - WildFusion - for individual identification of a broad range of animal species. The method fuses deep scores (e.g., MegaDescriptor or DINOv2) and local matching similarity (e.g., LoFTR and LightGlue) to identify individual animals. The global and local information fusion is facilitated by similarity score calibration. In a zero-shot setting, relying on local similarity sco…

    Submitted 23 August, 2024; originally announced August 2024.

  3. arXiv:2408.12930  [pdf, other]

    cs.CV

    Animal Identification with Independent Foreground and Background Modeling

    Authors: Lukas Picek, Lukas Neumann, Jiri Matas

    Abstract: We propose a method that robustly exploits background and foreground in visual identification of individual animals. Experiments show that their automatic separation, made easy with methods like Segment Anything, together with independent foreground and background-related modeling, improves results. The two predictions are combined in a principled way, thanks to novel Per-Instance Temperature Scal…

    Submitted 23 August, 2024; originally announced August 2024.

  4. arXiv:2408.06899  [pdf, other]

    cs.CV

    EEPPR: Event-based Estimation of Periodic Phenomena Rate using Correlation in 3D

    Authors: Jakub Kolář, Radim Špetlík, Jiří Matas

    Abstract: We present a novel method for measuring the period of phenomena like rotation, flicker and vibration, by an event camera, a device asynchronously reporting brightness changes at independently operating pixels with high temporal resolution. The approach assumes that for a periodic phenomenon, a highly similar set of events is generated within a spatio-temporal window at a time difference correspond…

    Submitted 19 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 13 paper pages + 11 suppl pages, 15 figures, 3 tables

    ACM Class: I.4.8

  5. arXiv:2407.15707  [pdf, other]

    cs.CV cs.AI eess.IV

    Predicting the Best of N Visual Trackers

    Authors: Basit Alawode, Sajid Javed, Arif Mahmood, Jiri Matas

    Abstract: We observe that the performance of SOTA visual trackers varies surprisingly strongly across different video attributes and datasets. No single tracker remains the best performer across all tracking attributes and datasets. To bridge this gap, for a given video sequence, we predict the "Best of the N Trackers", called the BofN meta-tracker. At its core, a Tracking Performance Prediction Network (TP…

    Submitted 22 July, 2024; originally announced July 2024.

  6. arXiv:2406.16204  [pdf, other]

    cs.CV

    Breaking the Frame: Image Retrieval by Visual Overlap Prediction

    Authors: Tong Wei, Philipp Lindenberger, Jiri Matas, Daniel Barath

    Abstract: We propose a novel visual place recognition approach, VOP, that efficiently addresses occlusions and complex scenes by shifting from traditional reliance on global image similarities and local features to image overlap prediction. The proposed method enables the identification of visible image sections without requiring expensive feature detection and matching. By focusing on obtaining patch-level…

    Submitted 23 June, 2024; originally announced June 2024.

  7. arXiv:2405.19882  [pdf, other]

    cs.CV

    PixOOD: Pixel-Level Out-of-Distribution Detection

    Authors: Tomáš Vojíř, Jan Šochman, Jiří Matas

    Abstract: We propose a dense image prediction out-of-distribution detection algorithm, called PixOOD, which does not require training on samples of anomalous data and is not designed for a specific application, which avoids traditional training biases. In order to model the complex intra-class variability of the in-distribution data at the pixel level, we propose an online data condensation algorithm which i…

    Submitted 5 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: under review

  8. arXiv:2403.09799  [pdf, other]

    cs.CV cs.RO

    BOP Challenge 2023 on Detection, Segmentation and Pose Estimation of Seen and Unseen Rigid Objects

    Authors: Tomas Hodan, Martin Sundermeyer, Yann Labbe, Van Nguyen Nguyen, Gu Wang, Eric Brachmann, Bertram Drost, Vincent Lepetit, Carsten Rother, Jiri Matas

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2023, the fifth in a series of public competitions organized to capture the state of the art in model-based 6D object pose estimation from an RGB/RGB-D image and related tasks. Besides the three tasks from 2022 (model-based 2D detection, 2D segmentation, and 6D localization of objects seen during training), the 2023 c…

    Submitted 16 April, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.13075

  9. arXiv:2402.14958  [pdf, other]

    cs.CV

    EE3P: Event-based Estimation of Periodic Phenomena Properties

    Authors: Jakub Kolář, Radim Špetlík, Jiří Matas

    Abstract: We introduce a novel method for measuring properties of periodic phenomena with an event camera, a device asynchronously reporting brightness changes at independently operating pixels. The approach assumes that for fast periodic phenomena, in any spatial window where it occurs, a very similar set of events is generated at the time difference corresponding to the frequency of the motion. To estimat…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: 9 pages, 55 figures, accepted and presented at CVWW24, published in Proceedings of the 27th Computer Vision Winter Workshop, 2024

    ACM Class: I.4.8

    Journal ref: Proceedings of the 27th Computer Vision Winter Workshop, February 14-16, 2024, Terme Olimia, Slovenia, pages 66-74, CIP data: COBISS.SI-ID 185271043 ISBN 978-961-96564-0-2

  10. arXiv:2402.11287  [pdf, other]

    cs.CV

    Dense Matchers for Dense Tracking

    Authors: Tomáš Jelínek, Jonáš Šerých, Jiří Matas

    Abstract: Optical flow is a useful input for various applications, including 3D reconstruction, pose estimation, tracking, and structure-from-motion. Despite its utility, the field of dense long-term tracking, especially over wide baselines, has not been extensively explored. This paper extends the concept of combining multiple optical flows over logarithmically spaced intervals as proposed by MFT. We demon…

    Submitted 17 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 27th Computer Vision Winter Workshop. Ljubljana: Slovenian Pattern Recognition Society, 2024. p. 18-28

  11. arXiv:2401.03872  [pdf, other]

    cs.CV

    A New Dataset and a Distractor-Aware Architecture for Transparent Object Tracking

    Authors: Alan Lukezic, Ziga Trojer, Jiri Matas, Matej Kristan

    Abstract: Performance of modern trackers degrades substantially on transparent objects compared to opaque objects. This is largely due to two distinct reasons. Transparent objects are unique in that their appearance is directly affected by the background. Furthermore, transparent object scenes often contain many visually similar objects (distractors), which often lead to tracking failure. However, developme…

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: Under review. arXiv admin note: substantial text overlap with arXiv:2210.03436

  12. arXiv:2309.14052  [pdf, other]

    cs.CV

    Single Image Test-Time Adaptation for Segmentation

    Authors: Klara Janouskova, Tamir Shor, Chaim Baskin, Jiri Matas

    Abstract: Test-Time Adaptation (TTA) methods improve the robustness of deep neural networks to domain shift on a variety of tasks such as image classification or segmentation. This work explores adapting segmentation models to a single unlabelled image with no other data available at test-time. In particular, this work focuses on adaptation by optimizing self-supervised losses at test-time. Multiple baselin…

    Submitted 2 July, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

    Comments: TMLR accepted paper

  13. arXiv:2308.15816  [pdf, other]

    cs.CV

    Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement

    Authors: Basit Alawode, Fayaz Ali Dharejo, Mehnaz Ummar, Yuhang Guo, Arif Mahmood, Naoufel Werghi, Fahad Shahbaz Khan, Jiri Matas, Sajid Javed

    Abstract: This paper presents a new dataset and general tracker enhancement method for Underwater Visual Object Tracking (UVOT). Despite its significance, underwater tracking has remained unexplored due to data inaccessibility. It poses distinct challenges; the underwater environment exhibits non-uniform lighting conditions, low visibility, lack of sharpness, low contrast, camouflage, and reflections from s…

    Submitted 31 August, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  14. Improving 2D Human Pose Estimation in Rare Camera Views with Synthetic Data

    Authors: Miroslav Purkrabek, Jiri Matas

    Abstract: Methods and datasets for human pose estimation focus predominantly on side- and front-view scenarios. We overcome the limitation by leveraging synthetic data and introduce RePoGen (RarE POses GENerator), an SMPL-based method for generating synthetic humans with comprehensive control over pose and view. Experiments on top-view datasets and a new dataset of real images with diverse poses show that a…

    Submitted 20 April, 2024; v1 submitted 13 July, 2023; originally announced July 2023.

    Comments: https://mirapurkrabek.github.io/RePoGen-paper/

  15. arXiv:2305.12998  [pdf, other]

    cs.CV

    MFT: Long-Term Tracking of Every Pixel

    Authors: Michal Neoral, Jonáš Šerých, Jiří Matas

    Abstract: We propose MFT -- Multi-Flow dense Tracker -- a novel method for dense, pixel-level, long-term tracking. The approach exploits optical flows estimated not only between consecutive frames, but also for pairs of frames at logarithmically spaced intervals. It selects the most reliable sequence of flows on the basis of estimates of its geometric accuracy and the probability of occlusion, both provided…

    Submitted 10 November, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: accepted to WACV 2024. Code at https://github.com/serycjon/MFT

  16. arXiv:2304.06419  [pdf, other]

    cs.CV cs.GR

    Tracking by 3D Model Estimation of Unknown Objects in Videos

    Authors: Denys Rozumnyi, Jiri Matas, Marc Pollefeys, Vittorio Ferrari, Martin R. Oswald

    Abstract: Most model-free visual object tracking methods formulate the tracking task as object location estimation given by a 2D segmentation or a bounding box in each video frame. We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation, namely the textured 3D shape and 6DoF pose in each video frame. Our representation tackles…

    Submitted 13 April, 2023; originally announced April 2023.

  17. arXiv:2303.13148  [pdf, other]

    cs.CV

    Calibrated Out-of-Distribution Detection with a Generic Representation

    Authors: Tomas Vojir, Jan Sochman, Rahaf Aljundi, Jiri Matas

    Abstract: Out-of-distribution detection is a common issue in deploying vision models in practice and solving it is an essential building block in safety critical applications. Most of the existing OOD detection solutions focus on improving the OOD robustness of a classification model trained exclusively on in-distribution (ID) data. In this work, we take a different approach and propose to leverage generic…

    Submitted 5 September, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: 10 pages, accepted to Workshop on Uncertainty Quantification for Computer Vision, ICCV 2023

  18. arXiv:2303.10247  [pdf, other]

    cs.CV

    Video shutter angle estimation using optical flow and linear blur

    Authors: David Korcak, Jiri Matas

    Abstract: We present a method for estimating the shutter angle, a.k.a. exposure fraction - the ratio of the exposure time and the reciprocal of frame rate - of videoclips containing motion. The approach exploits the relation of the exposure fraction, optical flow, and linear motion blur. Robustness is achieved by selecting image patches where both the optical flow and blur estimates are reliable, checking t…

    Submitted 17 April, 2024; v1 submitted 17 March, 2023; originally announced March 2023.

    Journal ref: Proceedings of the 27th Computer Vision Winter Workshop, 2024, 57-65

  19. arXiv:2303.04700  [pdf, other]

    cs.RO

    Efficient Visuo-Haptic Object Shape Completion for Robot Manipulation

    Authors: Lukas Rustler, Jiri Matas, Matej Hoffmann

    Abstract: For robot manipulation, a complete and accurate object shape is desirable. Here, we present a method that combines visual and haptic reconstruction in a closed-loop pipeline. From an initial viewpoint, the object shape is reconstructed using an implicit surface deep neural network. The location with highest uncertainty is selected for haptic exploration, the object is touched, the new information…

    Submitted 8 March, 2023; originally announced March 2023.

  20. arXiv:2302.13075  [pdf, other]

    cs.CV

    BOP Challenge 2022 on Detection, Segmentation and Pose Estimation of Specific Rigid Objects

    Authors: Martin Sundermeyer, Tomas Hodan, Yann Labbe, Gu Wang, Eric Brachmann, Bertram Drost, Carsten Rother, Jiri Matas

    Abstract: We present the evaluation methodology, datasets and results of the BOP Challenge 2022, the fourth in a series of public competitions organized with the goal to capture the status quo in the field of 6D object pose estimation from an RGB/RGB-D image. In 2022, we witnessed another significant improvement in the pose estimation accuracy -- the state of the art, which was 56.9 AR$_C$ in 2019 (Vidal et…

    Submitted 25 February, 2023; originally announced February 2023.

    Comments: arXiv admin note: text overlap with arXiv:2009.07378

  21. arXiv:2302.09997  [pdf, other]

    cs.CV

    A Large Scale Homography Benchmark

    Authors: Daniel Barath, Dmytro Mishkin, Michal Polic, Wolfgang Förstner, Jiri Matas

    Abstract: We present a large-scale dataset of Planes in 3D, Pi3D, of roughly 1000 planes observed in 10 000 images from the 1DSfM dataset, and HEB, a large-scale homography estimation benchmark leveraging Pi3D. The applications of the Pi3D dataset are diverse, e.g. training or evaluating monocular depth, surface normal estimation and image matching algorithms. The HEB dataset consists of 226 260 homographie…

    Submitted 20 February, 2023; originally announced February 2023.

  22. arXiv:2302.05658  [pdf, other]

    cs.CL cs.AI cs.LG

    DocILE Benchmark for Document Information Localization and Extraction

    Authors: Štěpán Šimsa, Milan Šulc, Michal Uřičář, Yash Patel, Ahmed Hamdi, Matěj Kocián, Matyáš Skalický, Jiří Matas, Antoine Doucet, Mickaël Coustaty, Dimosthenis Karatzas

    Abstract: This paper introduces the DocILE benchmark with the largest dataset of business documents for the tasks of Key Information Localization and Extraction and Line Item Recognition. It contains 6.7k annotated business documents, 100k synthetically generated documents, and nearly 1M unlabeled documents for unsupervised pre-training. The dataset has been built with knowledge of domain- and task-specific…

    Submitted 3 May, 2023; v1 submitted 11 February, 2023; originally announced February 2023.

    Comments: Accepted to ICDAR 2023

  23. arXiv:2301.10057  [pdf, other]

    cs.CV

    Planar Object Tracking via Weighted Optical Flow

    Authors: Jonas Serych, Jiri Matas

    Abstract: We propose WOFT -- a novel method for planar object tracking that estimates a full 8 degrees-of-freedom pose, i.e. the homography w.r.t. a reference view. The method uses a novel module that leverages dense optical flow and assigns a weight to each optical flow correspondence, estimating a homography by weighted least squares in a fully differentiable manner. The trained module assigns zero weight…

    Submitted 24 January, 2023; originally announced January 2023.

    Comments: WACV 2023

  24. arXiv:2212.13185  [pdf, other]

    cs.CV

    Generalized Differentiable RANSAC

    Authors: Tong Wei, Yash Patel, Alexander Shekhovtsov, Jiri Matas, Daniel Barath

    Abstract: We propose $\nabla$-RANSAC, a generalized differentiable RANSAC that allows learning the entire randomized robust estimation pipeline. The proposed approach enables the use of relaxation techniques for estimating the gradients in the sampling distribution, which are then propagated through a differentiable solver. The trainable quality function marginalizes over the scores from all the models esti…

    Submitted 8 September, 2023; v1 submitted 26 December, 2022; originally announced December 2022.

  25. arXiv:2210.03436  [pdf, other]

    cs.CV

    Trans2k: Unlocking the Power of Deep Models for Transparent Object Tracking

    Authors: Alan Lukezic, Ziga Trojer, Jiri Matas, Matej Kristan

    Abstract: Visual object tracking has focused predominantly on opaque objects, while transparent object tracking received very little attention. Motivated by the uniqueness of transparent objects in that their appearance is directly affected by the background, the first dedicated evaluation dataset has emerged recently. We contribute to this effort by proposing the first transparent object tracking training…

    Submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted to BMVC 2022. Project page: https://github.com/trojerz/Trans2k

  26. arXiv:2208.04717  [pdf, other]

    cs.CV cs.GR

    Cascaded and Generalizable Neural Radiance Fields for Fast View Synthesis

    Authors: Phong Nguyen-Ha, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

    Abstract: We present CG-NeRF, a cascade and generalizable neural radiance fields method for view synthesis. Recent generalizing view synthesis methods can render high-quality novel views using a set of nearby input views. However, the rendering speed is still slow due to the nature of uniformly-point sampling of neural radiance fields. Existing scene-specific methods can train and render novel views efficie…

    Submitted 19 November, 2023; v1 submitted 9 August, 2022; originally announced August 2022.

    Comments: Accepted at IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  27. arXiv:2207.14660  [pdf, other]

    cs.CV

    Matching with AffNet based rectifications

    Authors: Václav Vávra, Dmytro Mishkin, Jiří Matas

    Abstract: We consider the problem of two-view matching under significant viewpoint changes with view synthesis. We propose two novel methods, minimizing the view synthesis overhead. The first one, named DenseAffNet, uses dense affine shapes estimates from AffNet, which allows it to partition the image, rectifying each partition with just a single affine map. The second one, named DepthAffNet, combines infor…

    Submitted 29 July, 2022; originally announced July 2022.

    Comments: 13 pages, 9 figures

  28. Human keypoint detection for close proximity human-robot interaction

    Authors: Jan Docekal, Jakub Rozlivek, Jiri Matas, Matej Hoffmann

    Abstract: We study the performance of state-of-the-art human keypoint detectors in the context of close proximity human-robot interaction. The detection in this scenario is specific in that only a subset of body parts such as hands and torso are in the field of view. In particular, (i) we survey existing datasets with human pose annotation from the perspective of close proximity images and prepare and make…

    Submitted 9 February, 2023; v1 submitted 15 July, 2022; originally announced July 2022.

    Comments: 8 pages, 8 figures

    ACM Class: I.2.9; I.4.9; I.2.10

    Journal ref: IEEE-RAS International Conference on Humanoid Robots (Humanoids 2022)

  29. arXiv:2204.03688  [pdf, other]

    cs.CV cs.AI

    DAD-3DHeads: A Large-scale Dense, Accurate and Diverse Dataset for 3D Head Alignment from a Single Image

    Authors: Tetiana Martyniuk, Orest Kupyn, Yana Kurlyak, Igor Krashenyi, Jiři Matas, Viktoriia Sharmanska

    Abstract: We present DAD-3DHeads, a dense and diverse large-scale dataset, and a robust model for 3D Dense Head Alignment in the wild. It contains annotations of over 3.5K landmarks that accurately represent 3D head shape compared to the ground-truth scans. The data-driven model, DAD-3DNet, trained on our dataset, learns shape, expression, and pose parameters, and performs 3D reconstruction of a FLAME mesh.…

    Submitted 11 April, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

  30. arXiv:2203.01994  [pdf, other]

    cs.CV

    Fast Neural Architecture Search for Lightweight Dense Prediction Networks

    Authors: Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

    Abstract: We present LDP, a lightweight dense prediction neural architecture search (NAS) framework. Starting from a pre-defined generic backbone, LDP applies the novel Assisted Tabu Search for efficient architecture exploration. LDP is fast and suitable for various dense estimation problems, unlike previous NAS methods that are either computationally demanding or deployed only for a single subtask. The perfo…

    Submitted 9 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: 15 pages, 11 figures, 8 tables. arXiv admin note: substantial text overlap with arXiv:2108.11105

  31. arXiv:2112.11846  [pdf, other]

    cs.CV

    A Discriminative Single-Shot Segmentation Network for Visual Object Tracking

    Authors: Alan Lukežič, Jiří Matas, Matej Kristan

    Abstract: Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but are restricted to bounding box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker -- D3S2, which narrows the gap between visual object tracking and video object segmentation. A si…

    Submitted 27 December, 2021; v1 submitted 22 December, 2021; originally announced December 2021.

    Comments: Extended version of the D3S tracker (CVPR2020). Accepted to IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1911.08862

  32. arXiv:2112.07957  [pdf, other]

    cs.CV

    FEAR: Fast, Efficient, Accurate and Robust Visual Tracker

    Authors: Vasyl Borsuk, Roman Vei, Orest Kupyn, Tetiana Martyniuk, Igor Krashenyi, Jiři Matas

    Abstract: We present FEAR, a family of fast, efficient, accurate, and robust Siamese visual trackers. We present a novel and efficient way to benefit from dual-template representation for object model adaption, which incorporates temporal information with only a single learnable parameter. We further improve the tracker architecture with a pixel-wise fusion block. By plugging-in sophisticated backbones with…

    Submitted 19 July, 2022; v1 submitted 15 December, 2021; originally announced December 2021.

  33. arXiv:2112.02838  [pdf, other]

    cs.CV

    Visual Object Tracking with Discriminative Filters and Siamese Networks: A Survey and Outlook

    Authors: Sajid Javed, Martin Danelljan, Fahad Shahbaz Khan, Muhammad Haris Khan, Michael Felsberg, Jiri Matas

    Abstract: Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location, and segmentation, or its rough approximation in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as dominating t…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: Tracking Survey

  34. arXiv:2111.14093  [pdf, other]

    cs.CV

    Adaptive Reordering Sampler with Neurally Guided MAGSAC

    Authors: Tong Wei, Jiri Matas, Daniel Barath

    Abstract: We propose a new sampler for robust estimators that always selects the sample with the highest probability of consisting only of inliers. After every unsuccessful iteration, the inlier probabilities are updated in a principled way via a Bayesian approach. The probabilities obtained by the deep network are used as prior (so-called neural guidance) inside the sampler. Moreover, we introduce a new lo…

    Submitted 8 September, 2023; v1 submitted 28 November, 2021; originally announced November 2021.

  35. arXiv:2111.11280  [pdf, other]

    cs.CV

    Point Cloud Color Constancy

    Authors: Xiaoyan Xing, Yanlin Qian, Sibo Feng, Yuhan Dong, Jiri Matas

    Abstract: In this paper, we present Point Cloud Color Constancy, in short PCCC, an illumination chromaticity estimation algorithm exploiting a point cloud. We leverage the depth information captured by the time-of-flight (ToF) sensor mounted rigidly with the RGB sensor, and form a 6D cloud where each point contains the coordinates and RGB intensities, noted as (x,y,z,r,g,b). PCCC applies the PointNet archit…

    Submitted 28 July, 2024; v1 submitted 22 November, 2021; originally announced November 2021.

    Comments: CVPR 2022

  36. arXiv:2109.02763  [pdf, other]

    cs.SD cs.CV eess.AS

    Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds

    Authors: Dengxin Dai, Arun Balajee Vasudevan, Jiri Matas, Luc Van Gool

    Abstract: Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene understanding purely based on binaural sounds. The considered tasks include predicting the semantic masks of sound-making objects, the motion of sound-making objects, a…

    Submitted 27 February, 2022; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted by TPAMI. arXiv admin note: substantial text overlap with arXiv:2003.04210

  37. arXiv:2108.11179  [pdf, other]

    cs.CV

    Recall@k Surrogate Loss with Large Batches and Similarity Mixup

    Authors: Yash Patel, Giorgos Tolias, Jiri Matas

    Abstract: This work focuses on learning deep visual representation models for retrieval by exploring the interplay between a new loss function, the batch size, and a new regularization approach. Direct optimization, by gradient descent, of an evaluation metric, is not possible when it is non-differentiable, which is the case for recall in retrieval. A differentiable surrogate loss for the recall is proposed…

    Submitted 25 March, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: CVPR 2022 camera-ready version

  38. arXiv:2108.11105  [pdf, other]

    cs.CV

    Lightweight Monocular Depth with a Novel Neural Architecture Search Method

    Authors: Lam Huynh, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila

    Abstract: This paper presents a novel neural architecture search method, called LiDNAS, for generating lightweight monocular depth estimation models. Unlike previous neural architecture search (NAS) approaches, where finding optimized networks is computationally highly demanding, the introduced novel Assisted Tabu Search leads to efficient architecture exploration. Moreover, we construct the search space o…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: 11 pages, 10 figures

  39. arXiv:2108.11098  [pdf, other]

    cs.CV

    Monocular Depth Estimation Primed by Salient Point Detection and Normalized Hessian Loss

    Authors: Lam Huynh, Matteo Pedone, Phong Nguyen, Jiri Matas, Esa Rahtu, Janne Heikkila

    Abstract: Deep neural networks have recently thrived on single image depth estimation. That being said, current developments on this topic highlight an apparent compromise between accuracy and network size. This work proposes an accurate and lightweight framework for monocular depth estimation based on a self-attention mechanism stemming from salient point detection. Specifically, we utilize a sparse set of…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: 11 pages, 7 figures

  40. arXiv:2106.11695  [pdf, other]

    cs.CV cs.LG

    The Hitchhiker's Guide to Prior-Shift Adaptation

    Authors: Tomas Sipka, Milan Sulc, Jiri Matas

    Abstract: In many computer vision classification tasks, class priors at test time often differ from priors on the training set. In the case of such prior shift, classifiers must be adapted correspondingly to maintain close to optimal performance. This paper analyzes methods for adaptation of probabilistic classifiers to new priors and for estimating new priors on an unlabeled test set. We propose a novel me…

    Submitted 3 December, 2021; v1 submitted 22 June, 2021; originally announced June 2021.

    Comments: WACV 2022, 16 pages, 7 figures

  41. arXiv:2106.10240  [pdf, other]

    cs.CV

    VSAC: Efficient and Accurate Estimator for H and F

    Authors: Maksym Ivashechkin, Daniel Barath, Jiri Matas

    Abstract: We present VSAC, a RANSAC-type robust estimator with a number of novelties. It benefits from the introduction of the concept of independent inliers that improves significantly the efficacy of the dominant plane handling and, also, allows near error-free rejection of incorrect models, without false positives. The local optimization process and its application is improved so that it is run on averag…

    Submitted 13 September, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

  42. arXiv:2104.05044  [pdf, other]

    cs.CV

    USACv20: robust essential, fundamental and homography matrix estimation

    Authors: Maksym Ivashechkin, Daniel Barath, Jiri Matas

    Abstract: We review the most recent RANSAC-like hypothesize-and-verify robust estimators. The best performing ones are combined to create a state-of-the-art version of the Universal Sample Consensus (USAC) algorithm. A recent objective is to implement a modular and optimized framework, making future RANSAC modules easy to be included. The proposed method, USACv20, is tested on eight publicly available real-…

    Submitted 11 April, 2021; originally announced April 2021.

    Comments: arXiv admin note: text overlap with arXiv:1912.05909

  43. arXiv:2103.13875  [pdf, other]

    cs.CV

    Finding Geometric Models by Clustering in the Consensus Space

    Authors: Daniel Barath, Denys Rozumny, Ivan Eichhardt, Levente Hajder, Jiri Matas

    Abstract: We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies. The problem is formalized as finding dominant model instances progressively without forming crisp point-to-model assignments. Dominant instances are found via a RANSAC-like sampling and a consolidation process driven by a model quality function considering previously proposed instances. New ones are f…

    Submitted 17 April, 2023; v1 submitted 25 March, 2021; originally announced March 2021.

  44. Danish Fungi 2020 -- Not Just Another Image Recognition Dataset

    Authors: Lukáš Picek, Milan Šulc, Jiří Matas, Jacob Heilmann-Clausen, Thomas S. Jeppesen, Thomas Læssøe, Tobias Frøslev

    Abstract: We introduce a novel fine-grained dataset and benchmark, the Danish Fungi 2020 (DF20). The dataset, constructed from observations submitted to the Atlas of Danish Fungi, is unique in its taxonomy-accurate class labels, small number of errors, highly unbalanced long-tailed class distribution, rich observation metadata, and well-defined class hierarchy. DF20 has zero overlap with ImageNet, allowing…

    Submitted 20 August, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

  45. arXiv:2103.04635

    cs.CV

    FEDS -- Filtered Edit Distance Surrogate

    Authors: Yash Patel, Jiri Matas

    Abstract: This paper proposes a procedure to train a scene text recognition model using a robust learned surrogate of edit distance. The proposed method borrows from self-paced learning and filters out the training examples that are hard for the surrogate. The filtering is performed by judging the quality of the approximation, using a ramp function, enabling end-to-end training. Following the literature, th…

    Submitted 26 May, 2021; v1 submitted 8 March, 2021; originally announced March 2021.

    Comments: ICDAR 2021 camera-ready version

  46. arXiv:2012.10296

    cs.CV

    Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion

    Authors: Lam Huynh, Phong Nguyen-Ha, Jiri Matas, Esa Rahtu, Janne Heikkila

    Abstract: In this paper, we propose enhancing monocular depth estimation by adding 3D points as depth guidance. Unlike existing depth completion methods, our approach performs well on extremely sparse and unevenly distributed point clouds, which makes it agnostic to the source of the 3D points. We achieve this by introducing a novel multi-scale 3D point fusion network that is both lightweight and efficient.…

    Submitted 25 August, 2021; v1 submitted 18 December, 2020; originally announced December 2020.

    Comments: 10 pages, 9 figures

  47. FMODetect: Robust Detection of Fast Moving Objects

    Authors: Denys Rozumnyi, Jiri Matas, Filip Sroubek, Marc Pollefeys, Martin R. Oswald

    Abstract: We propose the first learning-based approach for fast moving objects detection. Such objects are highly blurred and move over large distances within one video frame. Fast moving objects are associated with a deblurring and matting problem, also called deblatting. We show that the separation of deblatting into consecutive matting and deblurring allows achieving real-time performance, i.e. an order…

    Submitted 17 August, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: Accepted to International Conference on Computer Vision (ICCV) 2021

    Journal ref: 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

  48. DeFMO: Deblurring and Shape Recovery of Fast Moving Objects

    Authors: Denys Rozumnyi, Martin R. Oswald, Vittorio Ferrari, Jiri Matas, Marc Pollefeys

    Abstract: Objects moving at high speed appear significantly blurred when captured with cameras. The blurry appearance is especially ambiguous when the object has complex shape or texture. In such cases, classical methods, or even humans, are unable to recover the object's appearance and motion. We propose a method that, given a single image with its estimated background, outputs the object's appearance and…

    Submitted 30 March, 2021; v1 submitted 1 December, 2020; originally announced December 2020.

    Comments: CVPR 2021 camera-ready

    Journal ref: 2021 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

  49. arXiv:2011.14398

    cs.CV cs.GR

    RGBD-Net: Predicting color and depth images for novel views synthesis

    Authors: Phong Nguyen-Ha, Animesh Karnewar, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila

    Abstract: We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network. The former one predicts depth maps of the target views by using adaptive depth scaling, while the latter one leverages the predicted depths and renders spatially and temporally consistent target images.…

    Submitted 9 July, 2021; v1 submitted 29 November, 2020; originally announced November 2020.

    Comments: 19 pages, 15 figures. Code will be available at: https://github.com/phongnhhn92/RGBDNet

  50. arXiv:2011.11986

    cs.CV

    Efficient Initial Pose-graph Generation for Global SfM

    Authors: Daniel Barath, Dmytro Mishkin, Ivan Eichhardt, Ilia Shipachev, Jiri Matas

    Abstract: We propose ways to speed up the initial pose-graph generation for global Structure-from-Motion algorithms. To avoid forming tentative point correspondences by FLANN and geometric verification by RANSAC, which are the most time-consuming steps of the pose-graph creation, we propose two new methods - built on the fact that image pairs usually are matched consecutively. Thus, candidate relative poses…

    Submitted 26 November, 2020; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: Added supplementary material