Stars
A throughput-oriented high-performance serving framework for LLMs
LLMs interview notes and answers: this repository mainly collects interview questions and reference answers for large language model (LLM) algorithm engineers.
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
A simplified flash-attention implementation using cutlass, intended for teaching purposes
An fp8 flash attention implementation on the Ada architecture, built with the cutlass repository
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Ring attention implementation with flash attention
A simple attempt at knowledge distillation.
A face recognition algorithm usable in browsers and WeChat mini programs. This is a WASM implementation of the Retinaface face detection algorithm.
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
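A detailed C++ concurrency tutorial (C++ Concurrent Programming Guide)
A Chinese-language tutorial on C++ templates. Unlike the well-known book C++ Templates, this series teaches C++ templates as a Turing-complete language, aiming to give readers a thorough command of metaprogramming. (Work in progress)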
Simple samples for TensorRT programming
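Face detection and face recognition. Includes face detection (retinaface, yolov5face, yolov7face, yolov8face), face detection tracking (ByteTracker), face angle calculation (Face_Angle), face alignment (Face_Aligner), face recognition (Arcface), mask detection (MaskRecognitiion), age and gender detection (Gender_age), 静…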
Code and information for face image quality assessment with SER-FIQ
Solutions and Notes for Labs of Computer Systems: A Programmer's Perspective, 3rd Edition // lab files, solutions, and notes
A personally tested, working VPN for bypassing network restrictions. Supports Windows, Mac, Linux, iOS, and Android, and provides browser extensions for Chrome, Firefox, Opera, and others.
Examples from Programming in Parallel with CUDA