GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Python 1,256 133 Updated Jun 3, 2024

A PyTorch Native LLM Training Framework

Python 489 19 Updated May 31, 2024

The Modern C++ Challenge, published by Packt

C 304 104 Updated Jan 30, 2023
C++ 245 36 Updated Sep 15, 2023

A lightweight LLM inference framework

C++ 659 79 Updated Apr 7, 2024

MegEngine is a fast, scalable, easy-to-use deep learning framework with support for automatic differentiation

C++ 4,740 538 Updated Jul 6, 2024

A communication library for deep learning

C++ 49 9 Updated Oct 30, 2023

What would you do with 1000 H100s...

Jupyter Notebook 789 47 Updated Jan 10, 2024

The full minitorch student suite.

Python 1,456 281 Updated Mar 1, 2024

LeetCode solutions, complete edition with 151 problems.

TeX 11,233 3,433 Updated Jul 10, 2024

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Python 29,820 6,306 Updated Jul 4, 2024

Provides symbolic API for model creation in PyTorch.

Jupyter Notebook 60 3 Updated Apr 27, 2023

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.

Python 22,023 9,494 Updated Jul 4, 2024

This repository houses VERY simple example code for each STL algorithm and explains what each does.

C++ 113 39 Updated Dec 13, 2018

📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/

C++ 23,505 2,959 Updated Jun 1, 2024

Open source FPGA-based NIC and platform for in-network compute

Verilog 1,555 388 Updated Jul 5, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 1,632 259 Updated Jul 11, 2024
C++ 57 17 Updated Feb 29, 2024

Prints values and types during compilation!

Python 54 Updated Nov 7, 2022

Fast C++ logging library.

C++ 23,041 4,373 Updated Jul 9, 2024

Lightning fast C++/CUDA neural network framework

C++ 3,549 438 Updated Jul 10, 2024

MLX: An array framework for Apple silicon

C++ 15,755 898 Updated Jul 11, 2024

A toolkit for 3D computer vision tasks.

Python 161 13 Updated Jul 7, 2024

ring-attention experiments

Python 75 10 Updated Apr 10, 2024

Transformers with Arbitrarily Large Context

Python 572 43 Updated Jul 8, 2024

📚 Essential fundamentals for technical interviews, Leetcode, computer operating systems, computer networks, system design

173,647 50,774 Updated Jul 5, 2024

The road to hack SysML and become a system expert

Emacs Lisp 377 46 Updated Jul 1, 2024

A tiny ring attention implementation for learning purposes

Python 4 1 Updated Feb 14, 2024

Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS

Cuda 93 10 Updated Jun 18, 2024