Stars
Implementing a ChatGPT-like LLM in PyTorch from scratch, step by step
Hackable and optimized Transformers building blocks, supporting a composable construction.
A tiny yet powerful LLM inference system tailored for research purposes. vLLM-equivalent performance with only 2k lines of code (2% of vLLM).
Chinese NLP solutions (large models, data, models, training, inference)
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
Material for cuda-mode lectures
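"Large Model White-Box Construction Guide": a Tiny-Universe built entirely by hand
A tutorial on developing large-model applications, aimed at beginner developers. Read it online at: https://datawhalechina.github.io/llm-universe/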
Projects for an undergraduate OS course
A collection of gdb tips. "100" may just mean "many" here.
A minimal GPU design in Verilog to learn how GPUs work from the ground up
Fast and memory-efficient exact attention
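Mirror of the restoration of the 1st Edition UNIX kernel sources from a PDF document.
High-performance distributed C++ server framework: webserver, websocket server, custom tcp_server (includes a logging module, configuration module, thread module, coroutine module, coroutine scheduling module, I/O coroutine scheduling module, hook module, socket module, bytearray serialization, http module, TcpServer module, Websocket module, Https module, etc., plus an Smtp mail module, MySQL, SQLite3, ORM, Red…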
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
Collaborative Collection of C++ Best Practices. This online resource is part of Jason Turner's collection of C++ Best Practices resources. See README.md for more information.
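"Hello Algo" (Hello 算法): a data structures and algorithms tutorial with animated illustrations and one-click runnable code. Supports code in Python, Java, C++, C, C#, JS, Go, Swift, Rust, Ruby, Kotlin, TS, and Dart. The Simplified and Traditional Chinese editions are updated in sync; English version ongoing.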
A minimal, responsive, and feature-rich Jekyll theme for technical writing.
🎉 CUDA Learn Notes with PyTorch: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot prod, elementwise, softmax, layernorm, rmsnorm, hist, etc.