
Language Grasp-Anything

Language Grasp-Anything is an open-source project that combines speech recognition, open-vocabulary object detection, and 6-DoF robotic grasping to generate feasible grasp poses for specific objects in cluttered scenes. It builds on the Whisper speech recognition model, the Grounding DINO detection model, and the Graspness grasping model.
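The three-stage pipeline described above can be sketched as follows. Note that every function name and signature here is a hypothetical placeholder invented for illustration, not the project's actual API; a real implementation would call into Whisper, Grounding DINO, and Graspness at the marked points.

```python
# Hypothetical sketch of the speech -> detection -> grasping pipeline.
# These names are placeholders, not the project's real code.

def transcribe_command(audio_path: str) -> str:
    """Stage 1 (Whisper): turn a spoken command into a text prompt.
    Placeholder: a real version would run a Whisper model on the audio."""
    return "pick up the red mug"

def detect_target(image, text_prompt: str):
    """Stage 2 (Grounding DINO): locate the named object in the scene.
    Placeholder: returns one bounding box as (x1, y1, x2, y2)."""
    return (120, 80, 220, 200)

def predict_grasp(point_cloud, box):
    """Stage 3 (Graspness): predict a 6-DoF grasp pose for the region
    inside the box. Placeholder: returns (translation xyz, quaternion)."""
    return ((0.10, 0.00, 0.30), (0.0, 0.0, 0.0, 1.0))

def run_pipeline(audio_path, image, point_cloud):
    """Chain the three stages: the transcribed command becomes the
    detection prompt, and the detected box crops the grasp prediction."""
    command = transcribe_command(audio_path)
    box = detect_target(image, command)
    return predict_grasp(point_cloud, box)
```

The key design point is that each stage's output is exactly the next stage's input: the transcript is used verbatim as the detector's text prompt, and the detector's box restricts grasp prediction to the requested object.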

🚀 A short video demonstration [Bilibili]


🛠️Installation

The code requires Python >= 3.8, as well as PyTorch >= 1.7 and torchvision >= 0.8. Please follow the instructions here to install both the PyTorch and torchvision dependencies. Installing both with CUDA support is strongly recommended.

Install Whisper:

pip install -U openai-whisper

See the official Whisper page if you have further questions about the installation.

Install Grounding DINO:

python -m pip install -e GroundingDINO

Install Graspness:

For Graspness, please refer here.

🏃Getting Started

  • Download the checkpoint for Grounding DINO:

    cd Grounded-Segment-Anything
    
    wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
  • Download the checkpoint for Graspness:

    https://drive.google.com/file/d/1HXhO6z2XNAnGW4BiGHBy83cVa-d32AeV/view?usp=share_link
  • Run demo:

    export CUDA_VISIBLE_DEVICES=0
    python real_grasp.py --infer --vis

😮Acknowledgments

This project is based on the following repositories: Whisper, Grounding DINO, and Graspness.
