A comprehensive survey on Internal Consistency and Self-Feedback in Large Language Models.

Jupyter Notebook 151 3 Updated Sep 19, 2024

Dynamic 2D Gaussians: Geometrically accurate radiance fields for dynamic objects

Python 62 2 Updated Sep 25, 2024

A system for agentic LLM-powered data processing and ETL

Python 797 78 Updated Oct 8, 2024

Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Python 5,056 407 Updated Oct 2, 2024

Implementation of paper 'Towards Coherent Image Inpainting Using Denoising Diffusion Implicit Models'

Python 60 6 Updated Apr 3, 2024

High accuracy RAG for answering questions from scientific documents with citations

Python 6,045 568 Updated Oct 7, 2024

GeoCalib: Learning Single-image Calibration with Geometric Optimization (ECCV 2024)

Python 384 13 Updated Oct 7, 2024

Open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.

Python 2,762 254 Updated Sep 25, 2024

A project that optimizes OWL-ViT for real-time inference with NVIDIA TensorRT.

Python 236 43 Updated Jul 30, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 895 40 Updated Sep 30, 2024

Clipora is a powerful toolkit for fine-tuning OpenCLIP models using Low Rank Adapters (LoRA).

Python 15 Updated Aug 15, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,170 850 Updated Sep 13, 2024

A tool to modify ONNX models in a visualization fashion, based on Netron and Flask.

JavaScript 1,299 162 Updated Jun 29, 2024

A very simple news crawler with a funny name

Python 283 74 Updated Oct 8, 2024

Code for the paper "Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models"

Jupyter Notebook 857 78 Updated Jun 22, 2024

Code for *Eventfulness for Interactive Video Alignment*

Python 7 2 Updated Dec 13, 2023

Bring portraits to life!

Python 12,186 1,279 Updated Oct 7, 2024

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python 5,345 551 Updated Sep 29, 2024

Accepted as [NeurIPS 2024] Spotlight Presentation Paper

Jupyter Notebook 5,842 584 Updated Sep 26, 2024

A tool to project equirectangular panorama into perspective images

Python 272 55 Updated Oct 20, 2021

Code for "Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion"

Python 523 21 Updated Oct 6, 2024

A tool for compiling trained SKLearn models into other representations (such as SQL, Sympy or Excel formulas)

Python 171 10 Updated Nov 17, 2022

Local model support for Microsoft's graphrag using ollama (llama3, mistral, gemma2, phi3): LLM & embedding extraction

Python 662 95 Updated Sep 30, 2024

[NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of existing MLLMs to comprehend long multimodal documents.

Python 80 5 Updated Sep 26, 2024

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 3,449 283 Updated Aug 14, 2024

Convert ONNX models to PyTorch.

Python 595 69 Updated Aug 15, 2024

Official implementation of the CVPR24 highlight paper: Matching Anything by Segmenting Anything

Python 970 63 Updated Sep 18, 2024

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Python 306 15 Updated Oct 8, 2024

Source code of paper "NVS-Solver: Video Diffusion Model as Zero-Shot Novel View Synthesizer"

Python 248 4 Updated Aug 15, 2024

[BMVC2021] The first image composition assessment dataset. Used in the paper "Image Composition Assessment with Saliency-augmented Multi-pattern Pooling". Useful for image composition assessment, i…

Python 111 13 Updated May 5, 2022