This repo contains the code from the Medium post Understanding and Coding DETR.

Reproduce the notebook detr.ipynb on Colab.

Download my training checkpoints:

- coco-tiny, 40k iterations: ckpt, loss-history
- coco-complete, 150 epochs: ckpt, loss-history

Introduction

AI is on fire these days, and transformer-based architectures are one of the key factors behind its recent successes. In computer vision, the DEtection TRansformer (DETR) from Carion et al. (2020) was one of the first works to demonstrate that transformers can also be used for vision tasks, reaching results competitive with established baselines.

Another meaningful contribution of DETR was a new paradigm for object detection that treats it as a set prediction problem. This paradigm introduced a new element called object queries and freed object detection pipelines from hand-designed mechanisms such as anchors and non-maximum suppression (NMS); a minimal sketch of the matching idea follows below. The paradigm was later applied to segmentation tasks in state-of-the-art models like Mask2Former (Cheng et al., 2022) and OneFormer (Jain et al., 2023).
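
To make the set-prediction idea concrete, here is a minimal sketch of how a fixed set of learned object queries yields predictions that are matched one-to-one to ground-truth boxes with the Hungarian algorithm, so duplicates never survive and no NMS is needed. All shapes, weights, and names here are illustrative assumptions, not the repo's actual code; DETR's real matching cost also includes a generalized-IoU term, omitted for brevity.

```python
import torch
from scipy.optimize import linear_sum_assignment

# Illustrative sizes: 100 object queries, 92 classes (91 COCO + "no object").
num_queries, num_classes = 100, 92

# Object queries are simply learned embeddings fed to the transformer decoder.
object_queries = torch.nn.Embedding(num_queries, 256)

# Pretend the decoder produced per-query class logits and boxes (cx, cy, w, h).
pred_logits = torch.randn(num_queries, num_classes)
pred_boxes = torch.rand(num_queries, 4)

# Ground truth for one image: 3 objects with class labels and boxes.
gt_labels = torch.tensor([17, 17, 63])
gt_boxes = torch.rand(3, 4)

# Cost matrix (queries x targets): classification cost + L1 box cost.
prob = pred_logits.softmax(-1)
cost_class = -prob[:, gt_labels]                    # (100, 3)
cost_bbox = torch.cdist(pred_boxes, gt_boxes, p=1)  # (100, 3)
cost = cost_class + 5.0 * cost_bbox

# Hungarian matching: each ground-truth object is assigned to exactly one
# query; unmatched queries are trained to predict "no object".
query_idx, target_idx = linear_sum_assignment(cost.detach().numpy())
print(list(zip(query_idx, target_idx)))  # e.g. [(12, 0), (47, 1), (88, 2)]
```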

This article walks through the main mechanisms of the DETR implementation, and by the end you'll be able to train the network on the COCO dataset using nothing but PyTorch, with no external scripts.