☸️
Focusing on Cloud Native

A tool to abuse Exchange services

Go 2,167 357 Updated Jun 10, 2024
Python 195 15 Updated Nov 7, 2024

Heterogeneous AI Computing Virtualization Middleware

Go 896 185 Updated Nov 5, 2024

eBPF distributed networking observability tool for Kubernetes

Go 2,725 207 Updated Nov 7, 2024

k8spacket - collects TCP traffic and TLS connection metadata in the Kubernetes cluster using eBPF and visualizes in Grafana

Go 1,013 52 Updated Nov 7, 2024

An open-source script tool that adds CPU, NVMe, and SSD temperature and load information to the Proxmox VE web admin interface.

Shell 115 31 Updated Feb 27, 2024

Tools for merging pretrained large language models.

Python 4,787 434 Updated Nov 5, 2024
Python 156 19 Updated Oct 1, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python 5,921 482 Updated Nov 7, 2024

[NeurIPS'24 Spotlight] To speed up long-context LLM inference, computes attention approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filling on an A100 whil…

Python 776 36 Updated Nov 2, 2024

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 347 35 Updated Aug 19, 2024

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 4,057 428 Updated Nov 7, 2024

This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?

Python 693 46 Updated Oct 24, 2024

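PbootCMS is a permanently open-source, free PHP system for developing, building, and managing enterprise websites, built on a brand-new core. It is an efficient, concise, and powerful PHP CMS codebase that is free for commercial use and can meet the needs of all kinds of enterprise website development. The system uses extremely simple template tags, so anyone who knows HTML can quickly build an enterprise website. The project provides a large number of website templates for free download and use, and aims to offer developers and enterprises the best website development solution.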

PHP 103 19 Updated Sep 24, 2024

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python 611 47 Updated Sep 4, 2024

Distributed ML Training and Fine-Tuning on Kubernetes

Go 1,604 697 Updated Nov 4, 2024

A clean, elegant, beautiful and powerful admin template built on the latest front-end stack, including Vue3, Vite5, TypeScript, Pinia, NaiveUI and UnoCSS.

TypeScript 10,149 1,824 Updated Nov 7, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 6,647 363 Updated Jul 11, 2024

OpenAIOS vGPU device plugin for Kubernetes originated from the OpenAIOS project to virtualize GPU device memory, in order to allow applications to access a larger memory space than its physical ca…

Go 515 93 Updated May 21, 2024

[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

Python 657 53 Updated Sep 10, 2024

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Python 2,179 143 Updated Nov 7, 2024

A blazing fast inference solution for text embeddings models

Rust 2,810 176 Updated Nov 5, 2024

📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.

2,789 193 Updated Nov 1, 2024

The Triton TensorRT-LLM Backend

Python 700 103 Updated Nov 5, 2024

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

C++ 236 24 Updated Mar 15, 2024

[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.

Python 4,604 252 Updated Aug 22, 2024

Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"

Python 252 24 Updated Sep 3, 2024

The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.

Python 711 100 Updated Oct 30, 2024