Stars

Multimodal LLM

19 repositories

OmniFusion — a multimodal model to communicate using text and images

Python 216 21 Updated Apr 28, 2024

Radiology Objects in COntext (ROCO): A Multimodal Image Dataset

Python 160 17 Updated Apr 5, 2022
Python 156 50 Updated Jan 14, 2024

The official start-up code for the paper "FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark."

Python 43 2 Updated Nov 15, 2022

HI-ML toolbox for deep learning for medical imaging and Azure integration

Python 244 55 Updated Jul 3, 2024

The official code for "Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data".

Python 304 31 Updated Apr 3, 2024

GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition

Python 155 27 Updated Feb 6, 2023
Python 3 1 Updated Jun 14, 2024

A collection of resources on applications of multi-modal learning in medical imaging.

382 39 Updated Jul 2, 2024

✨✨ Latest Advances on Multimodal Large Language Models

10,403 697 Updated Jul 2, 2024
Python 5 1 Updated Jan 27, 2024

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 23,453 3,105 Updated Jun 4, 2024

Official source code for the paper: "Reading Between the Frames Multi-Modal Non-Verbal Depression Detection in Videos"

Python 27 2 Updated May 16, 2024

Implementation of Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

Python 258 42 Updated Sep 17, 2022

Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retrieval (Lerner et al., ECIR'24)

Python 23 2 Updated Jan 16, 2024

tiny vision language model

Jupyter Notebook 4,463 392 Updated Jul 3, 2024

Fine-tuning the multimodal LLM "Idefics 9B" on the Pokemon Go dataset available on Hugging Face.

Jupyter Notebook 14 7 Updated Jan 15, 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Python 63 5 Updated Jun 13, 2024