- Fudan University
- Shanghai
Stars
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…
How to optimize some algorithms in CUDA.
Video+code lecture on building nanoGPT from scratch
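C++ learning resources, newly compiled in 2021, covering C++ 11 / 14 / 17 / 20 / 23 new features, introductory tutorials, recommended books, quality articles, study notes, teaching videos, and more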
HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training
The open-source repo for my published LaTeX book, 《简单粗暴 LaTeX》.
Model deployment white paper (CUDA | ONNX | TensorRT | C++) 🚀🚀🚀
12 Weeks, 24 Lessons, AI for All!
An amateur personal translation of 《C++ Templates The Complete Guide - second edition》
High-speed Large Language Model Serving on PCs with Consumer-grade GPUs
A high-throughput and memory-efficient inference and serving engine for LLMs
An introductory LLM tutorial for developers; the Chinese edition of Andrew Ng's large language model course series
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
Ongoing research training transformer models at scale
Making large AI models cheaper, faster and more accessible
A Cloud Native Batch System (Project under CNCF)
Header-only C++/Python library for fast approximate nearest neighbors
简单粗暴 TensorFlow 2 | A Concise Handbook of TensorFlow 2 | A concise introductory guide to TensorFlow 2
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.