improve documentation
Signed-off-by: Yuxuan Liu <[email protected]>
HinsRyu committed May 29, 2024
1 parent 5ac1226 commit 04a7629
Showing 2 changed files with 19 additions and 11 deletions.
20 changes: 14 additions & 6 deletions readme.md
@@ -9,16 +9,21 @@ You can check out the ROS1 version of each inference package:
- [Monodepth ROS1](https://github.com/Owen-Liuyuxuan/monodepth_ros)
- [Mono3D ROS1](https://github.com/Owen-Liuyuxuan/visualDet3D_ros)

**Update 1**: We have made a ROS1 version of this repo. Use it with `git checkout ros1`.

**Update 2**: We have added support for [Metric3D](https://github.com/YvanYin/Metric3D/tree/main), a state-of-the-art model for depth with metric scale. It predicts depth with reliable scale while generalizing well across scenarios. We provide a reliable ONNX export of the model that runs with TensorRT even on Jetson machines, and we extend its output to point clouds to facilitate robotic applications.

In this repo, we fully restructure the code and message formats for ROS2 (Humble) and integrate multi-threaded inference for three vision tasks.

- Currently all pretrained models are trained with the [visionfactory](https://github.com/Owen-Liuyuxuan/visionfactory) repo, so they focus on outdoor autonomous driving scenarios, but you can plug in any ONNX model that satisfies the [interface](#onnx-model-interface). Published models:

| Model | Type | Link | Description |
| ------------------------------ | ---------------- | --------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| monodepth_res101_384_1280.onnx | MonoDepth | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.0/monodepth_res101_384_1280.onnx) | FSNet, res101 backbone, model input shape (384x1280) trained on KITTI/KITTI360/nuscenes |
| metric_3d.onnx                 | MonoDepth        | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.1/metric_3d.onnx)                  | Metric3Dv2, ViT backbone, supervised depth with scale; the export contains the full pipeline from depth image to point cloud. |
| bisenetv1.onnx | Segmentation | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.0/bisenetv1.onnx) | BiSeNetV1, model input shape (512x768) trained on remapped KITTI360/ApolloScene/CityScapes/BDD100k/a2d2 |
| mono3d_yolox_576_768.onnx | Mono3D Detection | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.0/mono3d_yolox_576_768.onnx) | YoloX-m MonoFlex, model input (576x768) trained on KITTI/nuscenes/ONCE/bdd100k/cityscapes |
| dla34_deform_576_768.onnx | Mono3D Detection | [link](https://github.com/Owen-Liuyuxuan/ros2_vision_inference/releases/download/v1.0.1/dla34_deform_576_768.onnx) | DLA34 Deformable Upsample MonoFlex, model input (576x768) trained on KITTI/nuscenes/ONCE/bdd100k/cityscapes |


## Getting Started
@@ -74,6 +79,9 @@ Segmentation ONNX: `def forward(self, normalized_images[1, 3, H, W]):->long[1, H, W]`

Mono3D ONNX: `def forward(self, normalized_images[1, 3, H, W], P[1, 3, 4]):->scores[N], bboxes[N,12], cls_indexes[N]`

Metric3D ONNX: `def forward(self, unnormalized_images[1, 3, 616, 1064], P[1, 3, 4], P_inv[1, 4, 4], T[1, 4, 4], mask[1, 616, 1064]):->float[1, 1, H, W], float[HW, 6], bool[HW, 6]`
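
As a quick illustration of this interface, the sketch below drives the Metric3D model directly with `onnxruntime`, using the input names that appear in `ros2_vision_inference.py`. The intrinsics and the blank image are placeholders; the actual node additionally resizes, pads, and color-converts the camera image before building these tensors.

```python
# Minimal sketch (not the node itself): run metric_3d.onnx with onnxruntime.
# K and the zero image below are placeholders for a real camera.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("metric_3d.onnx", providers=["CPUExecutionProvider"])

H, W = 616, 1064
K = np.array([[720.0, 0.0, W / 2],
              [0.0, 720.0, H / 2],
              [0.0, 0.0, 1.0]])                 # placeholder intrinsics of the padded image

P = np.zeros([3, 4], dtype=np.float32)
P[:3, :3] = K                                   # 3x4 projection matrix
P_44 = np.concatenate([P, [[0.0, 0.0, 0.0, 1.0]]], axis=0)
P_inv = np.linalg.inv(P_44).astype(np.float32)  # back-projection used to lift depth to 3D points
T = np.eye(4, dtype=np.float32)                 # extrinsic transform applied to the output points

image = np.zeros([1, 3, H, W], dtype=np.float32)  # unnormalized RGB image, NCHW
mask = np.ones([1, H, W], dtype=bool)             # True where the pixel is real content, not padding

depth, point_cloud, valid = session.run(None, {
    "image": image,
    "P": P[None],
    "P_inv": P_inv[None],
    "T": T[None],
    "mask": mask,
})
points = point_cloud[valid].reshape(-1, 6)  # (x, y, z, r, g, b) for the valid pixels
print(depth.shape, points.shape)
```

The session can be created once and reused for every frame; only `image` (and `mask`, if the padding changes) needs to be rebuilt per frame.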


Class definitions are from the [visionfactory](https://github.com/Owen-Liuyuxuan/visionfactory) repo.

## Data, Domain
10 changes: 5 additions & 5 deletions ros2_vision_inference/ros2_vision_inference.py
@@ -151,10 +151,10 @@ def run(self):
# Perform inference
outputs = self.ort_session.run(None, onnx_input)
depth_image = outputs[0][0, 0] # [1, 1, H, W] -> [H, W]
point_cloud = outputs[1] # [HW, 6]
mask = outputs[2] # [HW]
print(point_cloud.shape, mask.shape)
point_cloud = point_cloud[mask] # [HW, 6]
point_cloud = point_cloud.reshape([-1, 6])

depth_image = depth_image[pad_info[0] : depth_image.shape[0] - pad_info[1], pad_info[2] : depth_image.shape[1] - pad_info[3]] # [H, W] -> [h, w]
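# pad_info = [top, bottom, left, right]; this crop removes the padding added in prepare_input and restores the original image size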
@@ -195,11 +195,11 @@ def prepare_input(self, rgb_image: np.ndarray)->Tuple[torch.Tensor, List[int]]:
mask[pad_info[0] : H - pad_info[1], pad_info[2] : W - pad_info[3]] = 1

onnx_input = {
'image': np.ascontiguousarray(np.transpose(rgb, (2, 0, 1))[None], dtype=np.float32) , # 1, 3, H, W
'P': P.astype(np.float32)[None], # 1, 3, 4
'P_inv': P_inv.astype(np.float32)[None], # 1, 4, 4
'T': T.astype(np.float32)[None], # 1, 4, 4
'mask' : mask.astype(np.bool)[None] # 1, H, W
}
return onnx_input, pad_info

