
GroundingDINO module needs to be built for every prediction request? #333

Open

aganesh9 opened this issue May 28, 2024 · 2 comments

aganesh9 commented May 28, 2024

Hi, I am following the example code here to set up GroundingDINO inference in Triton. I am trying to run this line just once in my deployment, since the model shouldn't have to be rebuilt for every single prediction request. However, inference only works correctly if I rebuild the model for each request. If I build it just once, the first request succeeds, and subsequent requests return invalid bounding boxes.

What happens inside the build_model method that makes the model unusable for the next request? It doesn't appear to take any input-specific parameters.

Snippet I am using for reference:

    def __init__(self):
        self.cpu_only = False
        self.model_config_path = current_directory + '/GroundingDINO/groundingdino/config/GroundingDINO_SwinB_cfg.py'
        self.model_checkpoint_path = current_directory + '/GroundingDINO/weights/groundingdino_swinb_cogcoor.pth'
        self.model = None
        self.device = 'cuda'

    def initialize(self, args):
        # Build the model once at startup; note that Triton's `args` is
        # shadowed by the parsed SLConfig below.
        args = SLConfig.fromfile(self.model_config_path)
        args.device = "cuda" if not self.cpu_only else "cpu"
        self.model = build_model(args)
        checkpoint = torch.load(self.model_checkpoint_path, map_location=self.device)
        load_res = self.model.load_state_dict(clean_state_dict(checkpoint["model"]), strict=False)
        _ = self.model.eval()
        self.model = self.model.to(self.device)

    def execute(self, requests):
        # Parse request...
        # Encode the text prompts
        tokenized = self.model.tokenizer(captions, padding="longest", return_tensors="pt").to(self.device)
        special_tokens = self.model.tokenizer.convert_tokens_to_ids(["[CLS]", "[SEP]", ".", "?"])

        (
            text_self_attention_masks,
            position_ids,
            cate_to_token_mask_list,
        ) = generate_masks_with_special_tokens_and_transfer_map(
            tokenized, special_tokens, self.model.tokenizer)

        # Truncate everything to the model's maximum text length
        if text_self_attention_masks.shape[1] > self.model.max_text_len:
            text_self_attention_masks = text_self_attention_masks[
                :, : self.model.max_text_len, : self.model.max_text_len]

        position_ids = position_ids[:, : self.model.max_text_len]
        tokenized["input_ids"] = tokenized["input_ids"][:, : self.model.max_text_len]
        tokenized["attention_mask"] = tokenized["attention_mask"][:, : self.model.max_text_len]
        tokenized["token_type_ids"] = tokenized["token_type_ids"][:, : self.model.max_text_len]

        with torch.no_grad():
            outputs = self.model(image[None], tokenized["input_ids"], tokenized["attention_mask"],
                                 position_ids, tokenized["token_type_ids"], text_self_attention_masks)
        # Continue processing...
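
One way to confirm that the built model carries state across requests (a hedged diagnostic sketch, not part of the original handler; the inputs are whatever execute already prepares above) is to run the exact same inputs twice after a single initialize and compare the predicted boxes:

    import torch

    # Hedged diagnostic: feed identical inputs twice through a model that was
    # built once. If the second set of boxes differs, the model mutates
    # internal state between forward calls.
    def check_statefulness(model, image, input_ids, attention_mask,
                           position_ids, token_type_ids, text_masks):
        with torch.no_grad():
            out1 = model(image[None], input_ids, attention_mask,
                         position_ids, token_type_ids, text_masks)
            out2 = model(image[None], input_ids, attention_mask,
                         position_ids, token_type_ids, text_masks)
        # GroundingDINO's output dict exposes boxes under "pred_boxes"
        return torch.allclose(out1["pred_boxes"], out2["pred_boxes"])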
@NielsRogge

Hi,

See #321 for easy inference; it supports batched inference.

zfigov commented Jun 20, 2024

I think this may be due to a bug in groundingdino.py:

[screenshot of the relevant code in groundingdino.py]

The image features are only set the first time. On the next call the features already exist, so the new image isn't used to update them.
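
If that is the pattern, a minimal sketch of the failure mode and a per-request workaround might look like this (the attribute name `features` is an assumption for illustration, not necessarily GroundingDINO's actual field):

    # Hypothetical sketch of the caching bug described above; `features` is an
    # assumed attribute name, not necessarily what groundingdino.py uses.
    class CachedBackbone:
        def __init__(self, backbone):
            self.backbone = backbone
            self.features = None  # image features cached on first call

        def forward(self, image):
            if self.features is None:      # computed only on the first call...
                self.features = self.backbone(image)
            return self.features           # ...so later images are ignored

    # Possible workaround in execute(): clear the cache before each request
    # so the backbone re-runs on the new image.
    def run_request(model, image):
        model.features = None
        return model.forward(image)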
