- Tesla, Sunnyvale
- zhuangh.github.io
- https://linkedin.com/zhuangh
- @zhuangh
Stars
Reference implementation of the Transformer architecture optimized for Apple Neural Engine (ANE)
A playbook for systematically maximizing the performance of deep learning models.
Submanifold sparse convolutional networks
AISystem refers to AI systems broadly, covering full-stack foundational technologies such as AI chips, AI compilers, and AI inference and training frameworks
Fast and memory-efficient exact attention
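The key idea behind fast, memory-efficient exact attention is to process keys and values in tiles with an online softmax, so the full n×n score matrix is never materialized. A minimal NumPy sketch of that tiling idea follows; the function names and block size are illustrative, not the repo's actual API.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference: materializes the full n x n score matrix (O(n^2) memory).
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Online-softmax tiling: only one (n_q x block) score tile is live
    # at a time; running max, denominator, and output are updated per tile.
    n_q, d = Q.shape
    m = np.full(n_q, -np.inf)            # running row-wise max
    l = np.zeros(n_q)                    # running softmax denominator
    acc = np.zeros((n_q, V.shape[-1]))   # unnormalized output accumulator
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)
        m_new = np.maximum(m, S.max(axis=-1))
        scale = np.exp(m - m_new)        # rescale previous partial results
        P = np.exp(S - m_new[:, None])
        acc = acc * scale[:, None] + P @ Vb
        l = l * scale + P.sum(axis=-1)
        m = m_new
    return acc / l[:, None]
```

Both functions compute the same exact attention output; only the memory footprint differs.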
Roadmap to becoming an Artificial Intelligence Expert in 2022
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
List of Computer Science courses with video lectures.
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
Master C++ :punch:. A study repository for C++ Primer (5th Edition, Chinese edition), including notes and answers to the end-of-chapter exercises.
RapidStream TAPA compiles task-parallel HLS programs into high-frequency FPGA accelerators.
By ex-Googlers, for ex-Googlers: a lookup table of similar tech & services
Development repository for the Triton language and compiler
程序员延寿指南 | A programmer's guide to living longer
Open deep learning compiler stack for cpu, gpu and specialized accelerators
FlexASR: A Reconfigurable Hardware Accelerator for Attention-based Seq-to-Seq Networks
Reinforcement learning environments for compiler and program optimization tasks
Brevitas: neural network quantization in PyTorch
Stencil with Optimized Dataflow Architecture Compiler
Training neural models with structured signals.
HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
A toolkit for Keras and TensorFlow that optimizes ML models for deployment, including quantization and pruning.
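Post-training quantization, one of the techniques such toolkits apply, maps float weights to low-bit integers via a scale and zero point. A minimal NumPy sketch of affine (asymmetric) uint8 quantization follows; the function names are illustrative and not any toolkit's actual API.

```python
import numpy as np

def quantize_affine(w, num_bits=8):
    # Affine quantization: q = clip(round(w / scale) + zero_point),
    # where scale and zero_point map the float range onto [0, 2^bits - 1].
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original floats.
    return (q.astype(np.float32) - zero_point) * scale
```

The round trip quantize-then-dequantize introduces an error of at most about one quantization step, which is the trade-off these deployment toolkits manage.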