I have tried train a FasterRCNN model from stratch using images frames from videos and manually annotated bounding box values.
Actual project is to train an object detection model which can detect humans in a video and results in all the bounding boxes with humans upon which I have calculated the distance between boxes. Bounding boxes closer than a threshold value are classified with different bounding box color and percentage of humans closer are dispalyed in the video. Also created Bird's eye view using which we can avoid overlapping.
Sucessfully implemented this project idea using restnet50 pretrained model.