Stars
A family of compressed models obtained via pruning and knowledge distillation
Open weights language model from Google DeepMind, based on Griffin.
GUICourse: From General Vision Language Models to Versatile GUI Agents
Sparse Backpropagation for Mixture-of-Expert Training
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
The official evaluation suite and dynamic data release for MixEval.
[CVPR2024 Highlight] VBench - We Evaluate Video Generation
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
The homepage of OneBit model quantization framework.
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
Reaching LLaMA2 Performance with 0.1M Dollars
[ICML2024] Adaptive decoding balances the diversity and coherence of open-ended text generation.
VisualWebArena is a benchmark for multimodal agents.
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
The model, data and code for the visual GUI Agent SeeClick
Open-Sora: Democratizing Efficient Video Production for All
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).