- Sigmind Limited
- Dhaka, Bangladesh
- https://sigmind.ai
- @sigmindAI
Starred repositories
Real-time face swap and one-click video deepfake using only a single image
C library to manage the GPIO header of the Nvidia Jetson boards
A C++ library that enables the use of Jetson's GPIOs
A simple co-pilot for Linux to interpret human language queries into useful Linux terminal commands and execute them
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback
This project implements retrieval-augmented generation (RAG) on Jetson for video formats.
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
🔥🔥High-Performance Face Recognition Library on PaddlePaddle & PyTorch🔥🔥
QualityScaler - image/video AI upscaler app
A reference example for integrating NanoOwl with Metropolis Microservices for Jetson
Instant voice cloning by MIT and MyShell.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…
🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.
Okkhor-Diffusion: Bangla Handwritten Character Generation using DDPM
Interact with your documents using the power of GPT, 100% privately, no data leaks
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Awesome Large Action Model (LAM): Models that could help get things done.
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Linux using TensorRT-LLM
Foundational model for human-like, expressive TTS
PyTorch code and models for V-JEPA self-supervised learning from video.