Currently, it's another gateway based on Istio/Envoy. (TODO: give it a better description)

Go 62 16 Updated Aug 28, 2024

MSCCL++: A GPU-driven communication stack for scalable AI applications

C++ 217 30 Updated Aug 27, 2024

oneAPI Collective Communications Library (oneCCL)

C++ 189 67 Updated Aug 22, 2024

DeepLearning Framework Performance Profiling Toolkit

Python 275 27 Updated Mar 28, 2022

mperf is an operator performance tuning toolbox for mobile/embedded platforms

C++ 168 26 Updated Aug 17, 2023

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 35,926 5,580 Updated Aug 19, 2024

A library to analyze PyTorch traces.

Python 262 36 Updated Aug 24, 2024

ROCm Communication Collectives Library (RCCL)

C++ 246 113 Updated Aug 28, 2024

Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators

C++ 81 26 Updated Aug 16, 2024

Python 75 35 Updated Dec 11, 2019

NCCL Profiling Kit

Python 100 9 Updated Jul 1, 2024

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)

C 1,099 416 Updated Aug 28, 2024

TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches

Python 50 7 Updated Jul 25, 2023

Open-source observability for your LLM application, based on OpenTelemetry

Python 1,715 138 Updated Aug 28, 2024

Collective communications library with various primitives for multi-machine training.

C++ 1,182 295 Updated Jun 26, 2024

Awesome LLM Plaza: daily tracking of all sorts of awesome LLM topics, e.g. LLMs for coding, robotics, reasoning, multimodal, etc.

115 8 Updated Aug 20, 2024

Unified Collective Communication Library

C 188 93 Updated Aug 22, 2024

A fast GPU memory copy library based on NVIDIA GPUDirect RDMA technology

C++ 841 143 Updated Jul 8, 2024

Ongoing research training transformer models at scale

Python 9,787 2,203 Updated Aug 28, 2024

A PyTorch Native LLM Training Framework

Python 559 27 Updated Aug 25, 2024

Blink+: Increase GPU group bandwidth by utilizing cross-tenant NVLink.

Jupyter Notebook 5 2 Updated Jun 22, 2022

Character-selection CAPTCHA cracking; tested against NetEase and Geetest, with a reported crack rate of 99%

C 648 239 Updated Dec 17, 2020

Tools for monitoring NVIDIA GPUs on Linux

C 1,012 302 Updated Nov 2, 2021

Infrastructure Programmer Development Kit (IPDK) is an open source, vendor agnostic framework of drivers and APIs for infrastructure offload and management that runs on a CPU, IPU, DPU or switch.

Shell 183 68 Updated Feb 2, 2024

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Python 14,130 2,222 Updated Aug 1, 2024

Grumpy is a Python to Go source code transcompiler and runtime.

Go 10,548 649 Updated Jan 18, 2022

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.

Python 27,844 3,335 Updated Aug 22, 2024

GPUDirect Async support for IB Verbs

C++ 88 13 Updated Nov 10, 2022

High-performance, GPU-aware communication library

C++ 84 21 Updated Jul 22, 2024