improve documentation
Signed-off-by: Yuxuan Liu <[email protected]>
HinsRyu committed May 29, 2024
1 parent 5ac1226 commit 04a7629
Showing 2 changed files with 19 additions and 11 deletions.
20 changes: 14 additions & 6 deletions readme.md
@@ -9,16 +9,21 @@ You can check out the ROS1 version of each inference package:
- [Monodepth ROS1](https://github.com/Owen-Liuyuxuan/monodepth_ros)
- [Mono3D ROS1](https://github.com/Owen-Liuyuxuan/visualDet3D_ros)

**Update 1**: We have made a ROS1 version of this repo. Use it with `git checkout ros1`.

**Update 2**: We have added support for [Metric3D](https://github.com/YvanYin/Metric3D/tree/main), a state-of-the-art model for depth with metric scale. It predicts depth with reliable scale while generalizing well across scenarios. We provide a reliable ONNX export of the model that runs with TensorRT even on Jetson machines, and we extend its output to point clouds to facilitate robotic applications.

In this repo, we fully restructure the code and message formats for ROS2 (Humble) and integrate multi-threaded inference for three vision tasks.

- Currently all pretrained models are trained with the [visionfactory](https://github.com/Owen-Liuyuxuan/visionfactory) repo, so they focus on outdoor autonomous driving scenarios, but you can plug in any ONNX model that satisfies the [interface](#onnx-model-interface). Published models:

| Model | Type | Link | Description |
| ------------------------------ | ---------------- | --------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| monodepth_res101_384_1280.onnx | MonoDepth | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.0/monodepth_res101_384_1280.onnx) | FSNet, res101 backbone, model input shape (384x1280) trained on KITTI/KITTI360/nuscenes |
| metric_3d.onnx                 | MonoDepth        | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.1/metric_3d.onnx)                  | Metric3Dv2, ViT backbone, supervised depth with scale; the export contains the full pipeline from depth image to point cloud. |
| bisenetv1.onnx | Segmentation | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.0/bisenetv1.onnx) | BiSeNetV1, model input shape (512x768) trained on remapped KITTI360/ApolloScene/CityScapes/BDD100k/a2d2 |
| mono3d_yolox_576_768.onnx | Mono3D Detection | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.0/mono3d_yolox_576_768.onnx) | YoloX-m MonoFlex, model input (576x768) trained on KITTI/nuscenes/ONCE/bdd100k/cityscapes |
| dla34_deform_576_768.onnx | Mono3D Detection | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.0.1/dla34_deform_576_768.onnx) | DLA34 Deformable Upsample MonoFlex, model input (576x768) trained on KITTI/nuscenes/ONCE/bdd100k/cityscapes |


## Getting Started
@@ -74,6 +79,9 @@ Segmentation ONNX: `def forward(self, normalized_images[1, 3, H, W]):->long[1, H, W]`

Mono3D ONNX: `def forward(self, normalized_images[1, 3, H, W], P[1, 3, 4]):->scores[N], bboxes[N,12], cls_indexes[N]`

Metric3D ONNX: `def forward(self, unnormalized_images[1, 3, 616, 1064], P[1, 3, 4], P_inv[1, 4, 4], T[1, 4, 4], mask[1, 616, 1064]):->float[1, 1, H, W], float[HW, 6], bool[HW, 6]`
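
As a quick illustration of this interface, the sketch below drives the Metric3D model directly with `onnxruntime`, using the input names that appear in `ros2_vision_inference.py`. The intrinsics and the blank image are placeholders; the actual node additionally resizes, pads, and color-converts the camera image before building these tensors.

```python
# Minimal sketch (not the node itself): run metric_3d.onnx with onnxruntime.
# K and the zero image below are placeholders for a real camera.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("metric_3d.onnx", providers=["CPUExecutionProvider"])

H, W = 616, 1064
K = np.array([[720.0, 0.0, W / 2],
              [0.0, 720.0, H / 2],
              [0.0, 0.0, 1.0]])                 # placeholder intrinsics of the padded image

P = np.zeros([3, 4], dtype=np.float32)
P[:3, :3] = K                                   # 3x4 projection matrix
P_44 = np.concatenate([P, [[0.0, 0.0, 0.0, 1.0]]], axis=0)
P_inv = np.linalg.inv(P_44).astype(np.float32)  # back-projection used to lift depth to 3D points
T = np.eye(4, dtype=np.float32)                 # extrinsic transform applied to the output points

image = np.zeros([1, 3, H, W], dtype=np.float32)  # unnormalized RGB image, NCHW
mask = np.ones([1, H, W], dtype=bool)             # True where the pixel is real content, not padding

depth, point_cloud, valid = session.run(None, {
    "image": image,
    "P": P[None],
    "P_inv": P_inv[None],
    "T": T[None],
    "mask": mask,
})
points = point_cloud[valid].reshape(-1, 6)  # (x, y, z, r, g, b) for the valid pixels
print(depth.shape, points.shape)
```

The session can be created once and reused for every frame; only `image` (and `mask`, if the padding changes) needs to be rebuilt per frame.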


Class definitions are from the [visionfactory](https://github.com/Owen-Liuyuxuan/visionfactory) repo.

## Data, Domain
10 changes: 5 additions & 5 deletions ros2_vision_inference/ros2_vision_inference.py
@@ -151,10 +151,10 @@ def run(self):
# Perform inference
outputs = self.ort_session.run(None, onnx_input)
depth_image = outputs[0][0, 0] # [1, 1, H, W] -> [H, W]
point_cloud = outputs[1] # [HW, 6]
mask = outputs[2] # [HW]
print(point_cloud.shape, mask.shape)
point_cloud = point_cloud[mask] # [HW, 6]
point_cloud = point_cloud.reshape([-1, 6])

depth_image = depth_image[pad_info[0] : depth_image.shape[0] - pad_info[1], pad_info[2] : depth_image.shape[1] - pad_info[3]] # [H, W] -> [h, w]
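# pad_info = [top, bottom, left, right]; this crop removes the padding added in prepare_input and restores the original image size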
@@ -195,11 +195,11 @@ def prepare_input(self, rgb_image: np.ndarray)->Tuple[torch.Tensor, List[int]]:
mask[pad_info[0] : H - pad_info[1], pad_info[2] : W - pad_info[3]] = 1

onnx_input = {
'image': np.ascontiguousarray(np.transpose(rgb, (2, 0, 1))[None], dtype=np.float32) , # 1, 3, H, W
'P': P.astype(np.float32)[None], # 1, 3, 4
'P_inv': P_inv.astype(np.float32)[None], # 1, 4, 4
'T': T.astype(np.float32)[None], # 1, 4, 4
'mask' : mask.astype(np.bool)[None] # 1, H, W
}
return onnx_input, pad_info

