Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
Updated Jun 27, 2024 - Python
A flexible package for multimodal deep learning that combines tabular data with text and images using Wide and Deep models in PyTorch
[ICLR 2024] Official implementation of "🦙 Time-LLM: Time Series Forecasting by Reprogramming Large Language Models"
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Towards Generalist Biomedical AI
CVPR'24, Official Codebase of our Paper: "Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation".
Implementation of CVPR 2023 paper "Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering".
[CVPR'22 Best Paper Finalist] Official PyTorch implementation of the method presented in "Learning Multi-View Aggregation In the Wild for Large-Scale 3D Semantic Segmentation"
Code and Pretrained Models for ICLR 2023 Paper "Contrastive Audio-Visual Masked Autoencoder".
CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
This repository contains the code for a video captioning system inspired by "Sequence to Sequence -- Video to Text". The system takes a video as input and generates an English caption describing it.
This repository contains the official implementation code of the paper Improving Multimodal Fusion with Hierarchical Mutual Information Maximization for Multimodal Sentiment Analysis, accepted at EMNLP 2021.
A Python package housing a collection of deep-learning multi-modal data fusion method pipelines! From data loading, to training, to evaluation - fusilli's got you covered 🌸
My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"
Seed, Code, Harvest: Grow Your Own App with Tree of Thoughts!
[CVPR 2022] Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
An intelligent multimodal-learning-based system for video, product, and ad analysis. It can serve as a foundation for downstream applications such as product recommendation and video retrieval.
Implementation of PALI3 from the paper "PaLI-3 Vision Language Models: Smaller, Faster, Stronger"
Compose multimodal datasets 🎹
A tool for extracting multimodal features from videos.