support mobile_sam in label_anything (#132)
* support mobile_sam in label_anything

* update readme

* update readme

* remove some unused files

* modify readme.md

* update mobile_sam en doc
YanxingLiu authored Jul 30, 2023
1 parent 7c9564a commit 5846726
Showing 22 changed files with 3,533 additions and 85 deletions.
6 changes: 6 additions & 0 deletions label_anything/docker/Dockerfile
@@ -0,0 +1,6 @@
```dockerfile
ARG PYTORCH="2.0.0"
ARG CUDA="11.7"
ARG CUDNN="8"

FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel
```

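For reference, one typical way to build and run an image from this Dockerfile; the image tag, build context, and GPU flag here are illustrative assumptions, not part of the commit:

```shell
# Build from the repository root; "label_anything" is a hypothetical tag
docker build -t label_anything -f label_anything/docker/Dockerfile .
# Run with GPU access (requires the NVIDIA Container Toolkit)
docker run --gpus all -it label_anything /bin/bash
```
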
43 changes: 30 additions & 13 deletions label_anything/readme.md
@@ -12,7 +12,6 @@ This article introduces a semi-automatic annotation solution combining Label-Stu
<img src="https://user-images.githubusercontent.com/25839884/233969712-0d9d6f0a-70b0-4b3e-b054-13eda037fb20.gif" width="80%">
</div>


<br>

- SAM (Segment Anything) is a segmentation model launched by Meta AI, designed to segment everything.
@@ -78,12 +77,16 @@
```shell
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
# For better segmentation results, use the sam_vit_h_4b8939.pth weights
# wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
# wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# download the mobile_sam pretrained model
wget https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/weights/mobile_sam.pt
# or manually download mobile_sam.pt from https://github.com/ChaoningZhang/MobileSAM/blob/master/weights/ and put it into path/to/playground/label_anything

```

PS: If you are having trouble with the wget/curl commands, please download the target file manually (copy the URL into a browser or download tool). The same applies to the instructions below.
For example: https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
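
If wget is unavailable but curl is, an equivalent download looks like this (a sketch; substitute any of the checkpoint URLs above):

```shell
# -L follows redirects; -o writes to the filename the rest of the guide expects
curl -L -o sam_vit_b_01ec64.pth https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```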


Install Label-Studio and label-studio-ml-backend

@@ -95,21 +98,36 @@
```shell
pip install label-studio-ml==1.0.9
```
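
A quick sanity check that both packages installed correctly (assuming the CLI entry points landed on your PATH):

```shell
# Both commands should run without import errors after a successful install
label-studio --version
label-studio-ml --help
```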

## Start the service

⚠ label_anything requires the SAM inference backend to be started first and the web service second before the model can be loaded (two startup steps in total).

1. Start the backend inference service:

label_anything currently supports two inference models, sam and mobile_sam; choose according to your needs, and note that the model must match the weights downloaded in the previous step. Compared with sam, mobile_sam offers faster inference and lower memory usage with only a slight drop in segmentation quality, so mobile_sam is recommended for CPU inference.

```shell
cd path/to/playground/label_anything

# inference on sam
label-studio-ml start sam --port 8003 --with \
model_name=sam \
sam_config=vit_b \
sam_checkpoint_file=./sam_vit_b_01ec64.pth \
out_mask=True \
out_bbox=True \
device=cuda:0
# device=cuda:0 is for using GPU inference. If you want to use CPU inference, replace cuda:0 with cpu.
# out_poly=True returns the annotation of the bounding polygon.

# inference on mobile_sam
label-studio-ml start sam --port 8003 --with \
model_name=mobile_sam \
sam_config=vit_t \
sam_checkpoint_file=./mobile_sam.pt \
out_mask=True \
out_bbox=True \
device=cpu
# device=cpu runs inference on CPU; replace cpu with cuda:0 to use the GPU.
# out_poly=True returns the annotation of the bounding polygon.

```
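
Once started, the backend can be probed from a second terminal. label-studio-ml backends generally expose a /health endpoint; treat the exact route as an assumption to verify against your version:

```shell
# Expect a small JSON status payload if the backend is up on port 8003
curl http://localhost:8003/health
```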

PS: In a Windows environment, entering the following in the Anaconda Powershell Prompt is equivalent to the input above:
@@ -131,12 +149,13 @@
```shell
sam_checkpoint_file=$env:sam_checkpoint_file `
out_mask=$env:out_mask `
out_bbox=$env:out_bbox `
device=$env:device
# mobile_sam has not been tested on Windows; if you are interested in it, adapt the script above accordingly.
```

![image](https://user-images.githubusercontent.com/25839884/233821553-0030945a-8d83-4416-8edd-373ae9203a63.png)


At this point, the SAM backend inference service has started.

⚠ The above terminal window needs to be kept open.
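
If keeping an interactive window open is inconvenient, one common workaround is to detach the backend with standard shell tools (a sketch reusing the sam arguments from above):

```shell
# nohup detaches the process from the terminal; output is captured in sam_backend.log
nohup label-studio-ml start sam --port 8003 --with \
model_name=sam \
sam_config=vit_b \
sam_checkpoint_file=./sam_vit_b_01ec64.pth \
out_mask=True \
out_bbox=True \
device=cuda:0 > sam_backend.log 2>&1 &
```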

@@ -162,6 +181,7 @@
```shell
set ML_TIMEOUT_SETUP=40
```

Start Label-Studio web service:

```shell
label-studio start
```
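
Label Studio serves on http://localhost:8080 by default; if that port is occupied, the standard --port flag selects another:

```shell
label-studio start --port 8081
```
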
@@ -196,18 +216,18 @@ wget https://download.openmmlab.com/mmyolo/data/cat_dataset.zip && unzip cat_dat
![](https://cdn.vansin.top/picgo20230330133715.png)
2. Use images stored on the server:

This is done through 'Cloud Storages'.

① Set the environment variable before launching the SAM backend:
```
export LOCAL_FILES_DOCUMENT_ROOT=path/to/playground/label_anything
```
② Set the environment variables before launching the Label Studio backend so that Label Studio can use local files:
```
export LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
export LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=path/to/playground/label_anything
```
@@ -253,7 +273,8 @@ Configure Label-Studio keypoint, Mask, and other annotations in Settings/Labeling
```xml
</BrushLabels>
</View>
```
In the above XML, we have configured the annotations, where KeyPointLabels are for keypoint annotations, BrushLabels are for Mask annotations, PolygonLabels are for bounding polygon annotations, and RectangleLabels are for rectangle annotations.
This example uses two categories, cat and person. If community users want to add more categories, they need to add the corresponding categories in KeyPointLabels, BrushLabels, PolygonLabels, and RectangleLabels respectively.
@@ -283,12 +304,10 @@ To use this feature, enable the Auto-Annotation toggle and it is recommended to
![image](https://user-images.githubusercontent.com/25839884/233833200-a44c9c5f-66a8-491a-b268-ecfb6acd5284.png)
Point2Label: As can be seen from the following gif animation, by simply clicking a point on the object, the SAM algorithm is able to segment and detect the entire object.
![SAM8](https://user-images.githubusercontent.com/25839884/233835410-29896554-963a-42c3-a523-3b1226de59b6.gif)
Bbox2Label: As can be seen from the following gif animation, by simply annotating a bounding box, the SAM algorithm is able to segment and detect the entire object.
![SAM10](https://user-images.githubusercontent.com/25839884/233969712-0d9d6f0a-70b0-4b3e-b054-13eda037fb20.gif)
@@ -305,7 +324,6 @@ You can use VS Code to open the extracted folder and see the annotated dataset,
![](https://cdn.vansin.top/picgo20230330140321.png)
### Label Studio Output Conversion to RLE Format Masks
The COCO file exported by Label Studio does not support RLE instance annotation, only polygon instances.
@@ -325,10 +343,10 @@ python tools/convert_to_rle_mask_coco.py --json_file_path path/to/LS_json --out_

--json_file_path Input JSON exported from Label Studio

--out_dir Output path

After generation, the script prints a list of the corresponding category ids in the terminal, which can be copied into the config for training.

Under the output path there are two folders: annotations contains the COCO-format JSON, and images contains the organized dataset.
```
Your dataset
├── annotations
├── ...
```
@@ -357,7 +375,6 @@ cd mmdetection; pip install -e .; cd ..
Then use this script to output the config for training on demand, where the template `mask-rcnn_r50_fpn` is provided in `label_anything/config_template`.
```shell
# Install Jinja2
pip install Jinja2
```
@@ -401,7 +418,6 @@ The following is the result of the transformation using the transformed dataset
The previous step generated a config that can be used for mmdetection training, at `data/my_set/config_name.py`, which we can now use.
```shell
python tools/train.py data/my_set/mask-rcnn_r50_fpn.py
```
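
By default mmdetection writes checkpoints and logs under work_dirs/; the train script's --work-dir flag redirects them (a sketch):

```shell
# Store checkpoints and logs in a custom directory instead of the default
python tools/train.py data/my_set/mask-rcnn_r50_fpn.py --work-dir work_dirs/my_set
```
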
@@ -413,6 +429,7 @@ After training, you can use `tools/test.py` for testing.
```shell
python tools/test.py data/my_set/mask-rcnn_r50_fpn.py path/of/your/checkpoint --show --show-dir my_show
```
The visualization images will be saved in `work_dir/{timestamp}/my_show`.
When finished, we can view the model test visualizations. On the left is the annotated image, and on the right is the model output.
50 changes: 33 additions & 17 deletions label_anything/readme_zh.md
@@ -16,7 +16,6 @@

<br>


- SAM (Segment Anything) is a segmentation model launched by Meta AI, designed to segment everything.
- [Label Studio](https://github.com/heartexlabs/label-studio) is an excellent annotation tool covering dataset labeling for image classification, object detection, segmentation, and more.

@@ -65,19 +64,24 @@
```shell
pip install torch==1.10.1 torchvision==0.11.2 torchaudio==0.10.1
```

Install SAM and download the pretrained models (both currently supported models are shown below)

```shell
cd path/to/playground/label_anything
# On Windows, run the following command before proceeding
# conda install pycocotools -c conda-forge
pip install opencv-python pycocotools matplotlib onnxruntime onnx timm
pip install git+https://github.com/facebookresearch/segment-anything.git

# download the sam pretrained model
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
# For better segmentation results, use the sam_vit_h_4b8939.pth weights
# wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_l_0b3195.pth
# wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# download the mobile_sam pretrained model
wget https://raw.githubusercontent.com/ChaoningZhang/MobileSAM/master/weights/mobile_sam.pt
# If the download fails, manually download mobile_sam.pt from https://github.com/ChaoningZhang/MobileSAM/blob/master/weights/ and place it under path/to/playground/label_anything
```

PS: On Windows, skip the wget commands and download the target files manually (copy the URL into a browser or download tool).
@@ -97,25 +101,40 @@ pip install label-studio-ml==1.0.9

⚠ label_anything requires starting the SAM inference backend first and then the web service before the model can be configured (two startup steps in total).

1. Start the backend inference service:

label_anything currently supports two inference models, sam and mobile_sam; choose according to your needs, and note that the model must match the weights downloaded in the previous step. Compared with sam, mobile_sam offers faster inference and lower GPU memory usage with only a slight drop in segmentation quality, so mobile_sam is recommended for CPU inference.

```shell
cd path/to/playground/label_anything

# run backend inference with SAM
label-studio-ml start sam --port 8003 --with \
model_name=sam \
sam_config=vit_b \
sam_checkpoint_file=./sam_vit_b_01ec64.pth \
out_mask=True \
out_bbox=True \
device=cuda:0
# device=cuda:0 uses GPU inference; to run on CPU, replace cuda:0 with cpu
# out_poly=True returns bounding-polygon annotations

# run backend inference with mobile_sam
label-studio-ml start sam --port 8003 --with \
model_name=mobile_sam \
sam_config=vit_t \
sam_checkpoint_file=./mobile_sam.pt \
out_mask=True \
out_bbox=True \
device=cpu
# device=cpu runs inference on CPU; replace cpu with cuda:0 to use the GPU
# out_poly=True returns bounding-polygon annotations
```
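
For CPU inference, thread count can noticeably affect latency. PyTorch honors OMP_NUM_THREADS, so pinning it is a reasonable tuning knob; the value below is an arbitrary example, not a recommendation from the original docs:

```shell
# Cap the CPU threads PyTorch uses before starting the backend
export OMP_NUM_THREADS=8
label-studio-ml start sam --port 8003 --with \
model_name=mobile_sam \
sam_config=vit_t \
sam_checkpoint_file=./mobile_sam.pt \
out_mask=True \
out_bbox=True \
device=cpu
```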

PS: In a Windows environment, entering the following in the Anaconda Powershell Prompt is equivalent to the input above:

```shell

cd path/to/playground/label_anything

$env:sam_config = "vit_b"
device=$env:device
```
@@ -136,7 +155,6 @@

![image](https://user-images.githubusercontent.com/25839884/233821553-0030945a-8d83-4416-8edd-373ae9203a63.png)


At this point, the SAM backend inference service has started.

⚠ The above terminal window needs to be kept open.
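
Before moving on to the web service, you can confirm the backend is actually listening on its port (ss is part of iproute2 on most Linux systems):

```shell
# A LISTEN entry on :8003 confirms the inference backend is up
ss -ltn | grep 8003
```
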
@@ -200,17 +218,18 @@ wget https://download.openmmlab.com/mmyolo/data/cat_dataset.zip && unzip cat_dat

![](https://cdn.vansin.top/picgo20230330133715.png)


2. Use image data stored directly on the server:

This is done through 'Cloud Storages'.

① Before starting the SAM backend, set the environment variable:

```
export LOCAL_FILES_DOCUMENT_ROOT=path/to/playground/label_anything
```

② Before starting Label Studio, set the environment variables:

```
export LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true
export LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=path/to/playground/label_anything
```
### Configure XML

---

Configure Label-Studio keypoint and Mask annotations in `Settings/Labeling Interface`.

```xml
@@ -255,6 +275,7 @@
```xml
</BrushLabels>
</View>
```

In the above XML we configured the annotations: `KeyPointLabels` for keypoint annotation, `BrushLabels` for Mask annotation, `PolygonLabels` for bounding-polygon annotation, and `RectangleLabels` for rectangle annotation.

This example uses the two categories `cat` and `person`. Community users who want more categories need to add them to `KeyPointLabels`, `BrushLabels`, `PolygonLabels`, and `RectangleLabels` respectively.
@@ -283,17 +304,16 @@ export LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=path/to/playground/label_anything

Turn on the `Auto-Annotation` toggle and preferably check `Auto accept annotation suggestions`. Then click the Smart tool on the right, switch it to Point, and select the label of the object to annotate below, in this case cat. If using a BBox as the prompt, switch the Smart tool to Rectangle.


![image](https://user-images.githubusercontent.com/25839884/233833200-a44c9c5f-66a8-491a-b268-ecfb6acd5284.png)

Point2Label: As the gif below shows, by simply clicking a point on the object, the SAM algorithm segments and detects the entire object.

![SAM8](https://user-images.githubusercontent.com/25839884/233835410-29896554-963a-42c3-a523-3b1226de59b6.gif)


Bbox2Label: As the gif below shows, by simply annotating a bounding box, the SAM algorithm segments and detects the entire object.

![SAM10](https://user-images.githubusercontent.com/25839884/233969712-0d9d6f0a-70b0-4b3e-b054-13eda037fb20.gif)

## Exporting the dataset in COCO format

### Export from the Label Studio web UI
@@ -307,7 +327,6 @@ Bbox2Label: As the gif below shows, by simply annotating a bounding

![](https://cdn.vansin.top/picgo20230330140321.png)


### Converting Label Studio output to RLE format masks

The COCO file exported by Label Studio does not support RLE instance annotation, only polygon instances.
@@ -322,14 +341,15 @@ The polygon instance format makes the number of points hard to control, and too many points are inconvenient to fine-tune (unlike
```shell
cd path/to/playground/label_anything
python tools/convert_to_rle_mask_coco.py --json_file_path path/to/LS_json --out_dir path/to/output/file
```

--json_file_path Input JSON exported from Label Studio

--out_dir Output path


After generation, the script prints a list of the corresponding category ids in the terminal, which can be copied into the config for training.

Under the output path there are two folders: annotations contains the COCO-format JSON, and images contains the organized dataset.

```
Your dataset
├── annotations
├── ...
```
@@ -383,7 +403,6 @@ playground


Next, we use `tools/analysis_tools/browse_dataset.py` to visualize the dataset.

```shell
python tools/analysis_tools/browse_dataset.py data/my_set/mask-rcnn_r50_fpn.py -
```
@@ -402,7 +421,6 @@

The previous step generated a config that can be used for mmdetection training, at `data/my_set/config_name.py`, which we can now use.


```shell
python tools/train.py data/my_set/mask-rcnn_r50_fpn.py
```
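
If training is interrupted, recent mmdetection versions let tools/train.py pick up from the latest checkpoint via --resume (verify the flag against your installed version):

```shell
# Resume from the newest checkpoint in the work directory
python tools/train.py data/my_set/mask-rcnn_r50_fpn.py --resume
```
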
@@ -414,6 +432,7 @@
```shell
python tools/test.py data/my_set/mask-rcnn_r50_fpn.py path/of/your/checkpoint --show --show-dir my_show
```

The visualization images will be saved in `work_dir/{timestamp}/my_show`.

When finished, we can view the model test visualizations. On the left is the annotated image, and on the right is the model output.
@@ -452,6 +471,3 @@ device=cuda:0 \
The results are shown below:

![图片](https://github.com/JimmyMa99/playground/assets/101508488/c134e579-2f1b-41ed-a82b-8211f8df8b94)


