
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents

139 5 Updated Nov 9, 2024

[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

Python 779 37 Updated Jun 2, 2024

Bridging Large Vision-Language Models and End-to-End Autonomous Driving

152 4 Updated Oct 31, 2024

Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models

136 6 Updated Nov 4, 2024

Disclosure of evidence in the 田柯宇 (Tian Keyu) malicious cluster attack incident

570 38 Updated Oct 20, 2024

MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

Python 50 2 Updated Sep 14, 2024

Tools for merging pretrained large language models.

Python 4,795 437 Updated Nov 5, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance

Python 5,967 462 Updated Oct 29, 2024

Collection of open datasets in computer vision.

32 10 Updated Jun 9, 2018

OCR, layout analysis, reading order, table recognition in 90+ languages

Python 13,945 864 Updated Nov 7, 2024

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

84 2 Updated Sep 12, 2024

Config files for my GitHub profile.

130 5 Updated Aug 7, 2024

✨✨Latest Advances on Multimodal Large Language Models

12,571 803 Updated Nov 10, 2024

Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment

Python 1,021 41 Updated May 31, 2024

[AAAI 2023] DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

56 2 Updated Nov 28, 2022

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

599 20 Updated Jun 5, 2023

PEFT for PyTorch 1.12

Python 2 Updated Jul 19, 2023

骆驼 (Luotuo): Open-Source Chinese Language Models. Developed by 陈启源 @ Central China Normal University & 李鲁鲁 @ SenseTime & 冷子昂 @ SenseTime

Jupyter Notebook 3,638 247 Updated Sep 3, 2023

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Python 9,257 716 Updated Aug 5, 2024

starter from "How to Train a GAN?" at NIPS2016

11,454 1,664 Updated Jan 9, 2022

[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.

Python 3,350 545 Updated Aug 15, 2024

pclp tutorial

Python 49 12 Updated Oct 10, 2021

One Million Scenes for Autonomous Driving

Python 178 32 Updated Jul 8, 2022

PyTorch implementation of MAE https://arxiv.org/abs/2111.06377

Python 7,323 1,219 Updated Jul 23, 2024

Effective Python: Second Edition — Source Code and Errata for the Book

Python 2,246 719 Updated Oct 14, 2024

Python bindings to the pointcloud library (pcl)

Cython 2,013 700 Updated Dec 30, 2023

Point Cloud Library (PCL)

C++ 9,982 4,615 Updated Nov 9, 2024

Geometric Computer Vision Library for Spatial AI

Python 9,950 969 Updated Nov 10, 2024

Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the came…

Jupyter Notebook 2,235 263 Updated Jul 20, 2022