flazerain / AI-Scientist
Forked from SakanaAI/AI-Scientist
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
[NeurIPS 2023] "Flow Factorized Representation Learning"
Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
Building open version of OpenAI o1 via reasoning traces (Groq, ollama, Anthropic, Gemini, OpenAI, Azure supported) Demo: https://huggingface.co/spaces/pseudotensor/open-strawberry
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
The first multimodal search engine pipeline and benchmark for LMMs
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Empowering RAG with a memory-based data interface for all-purpose applications!
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
LLM-based autonomous agent that conducts in-depth web research on any given topic
Kolmogorov-Arnold Transformer: A PyTorch Implementation with CUDA kernel
StoryMaker: Towards consistent characters in text-to-image generation
Writing AI Conference Papers: A Handbook for Beginners
[EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner
Tracking anything based on a text prompt
A benchmark for cross-domain few-shot object detection (ECCV24 paper: Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector)
[ICML 2024] "MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts"
OpenMMLab's next-generation platform for general 3D object detection.
State-of-the-art zero-shot voice conversion & singing voice conversion with in-context learning
Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need (IJCV 2024)
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.