Starred repositories
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch
CLIPort: What and Where Pathways for Robotic Manipulation
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning
Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images
Latent Text-to-Image Diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
Implementation of "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents"
Transformer: PyTorch Implementation of "Attention Is All You Need"
This repo is the official implementation of "MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control"
JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models
Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
One-step image-to-image with Stable Diffusion turbo: sketch2image, day2night, and more
Unofficial implementation of Palette: Image-to-Image Diffusion Models in PyTorch
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
[Arxiv 2024] From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation
🔊 Text-Prompted Generative Audio Model
ModelScope: bring the notion of Model-as-a-Service to life.
LlamaIndex is a data framework for your LLM applications
PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Get up and running with Llama 3, Mistral, Gemma 2, and other large language models.
Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
DHUAVY / BIALBEF
Forked from salesforce/ALBEF
Code for ALBEF: a new vision-language pre-training method
A simplified version for DMC (Deep Multimodal Clustering for Unsupervised Audiovisual Learning)