liyancas

Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.

Jupyter Notebook · 1,517 stars · 93 forks · Updated Feb 16, 2024

Universal LLM Deployment Engine with ML Compilation

Python · 18,610 stars · 1,506 forks · Updated Sep 11, 2024

Practical GPU Sharing Without Memory Size Constraints

C · 200 stars · 22 forks · Updated Aug 13, 2024

Paragraph-by-paragraph close readings of classic and recent deep learning papers

26,151 stars · 2,384 forks · Updated Aug 8, 2024

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ · 2,564 stars · 400 forks · Updated Sep 11, 2024

Intel staging area for llvm.org contributions. Home for Intel LLVM-based projects.

LLVM · 1,211 stars · 726 forks · Updated Sep 11, 2024

Transformer-related optimization, including BERT and GPT

C++ · 5,768 stars · 884 forks · Updated Mar 27, 2024

🐶 Kubernetes CLI To Manage Your Clusters In Style!

Go · 26,443 stars · 1,655 forks · Updated Sep 9, 2024

A Cloud Native Batch System (Project under CNCF)

Go · 4,063 stars · 937 forks · Updated Sep 11, 2024

MindSpore is a new open-source deep learning training and inference framework that can be used in mobile, edge, and cloud scenarios.

C++ · 4,231 stars · 700 forks · Updated Jul 29, 2024

PaddlePaddle High Performance Deep Learning Inference Engine for Mobile and Edge (PaddlePaddle high-performance on-device deep learning inference engine)

C++ · 6,921 stars · 1,605 forks · Updated Aug 30, 2024

SqueezeNet v1.1 on a Cyclone V SoC-FPGA at 450 ms/image, 20x faster than the ARM A9 processor alone. A project for the 2017 Innovate FPGA design contest.

Objective-C · 90 stars · 39 forks · Updated Jun 27, 2018

Bolt is a deep learning library with high performance and heterogeneous flexibility.

C++ · 908 stars · 158 forks · Updated Jul 30, 2024

Model analyzer in PyTorch

Python · 1,457 stars · 144 forks · Updated Mar 19, 2023

Code for our CVPR 2019 paper, Selective Kernel Networks. See Zhihu: https://zhuanlan.zhihu.com/p/59690223

C++ · 587 stars · 107 forks · Updated Mar 26, 2019

State-of-the-art 2D and 3D Face Analysis Project

Python · 22,788 stars · 5,343 forks · Updated Aug 30, 2024

MNN is a blazing-fast, lightweight deep learning framework, battle-tested by business-critical use cases at Alibaba

C++ · 8,580 stars · 1,653 forks · Updated Sep 3, 2024

CUDA Templates for Linear Algebra Subroutines

C++ · 5,325 stars · 900 forks · Updated Sep 11, 2024

Tengine is a lightweight, high-performance, modular inference engine for embedded devices

C++ · 4,611 stars · 998 forks · Updated Dec 24, 2023

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

C++ · 4,910 stars · 820 forks · Updated Jun 17, 2024

FAIR's research platform for object detection, implementing popular algorithms like Mask R-CNN and RetinaNet.

Python · 26,229 stars · 5,451 forks · Updated Nov 20, 2023

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn

C++ · 1,160 stars · 308 forks · Updated Sep 10, 2024

Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators

C · 1,520 stars · 219 forks · Updated Aug 28, 2019

Caffe for Sparse and Low-rank Deep Neural Networks

C++ · 375 stars · 134 forks · Updated Mar 8, 2020

Intel® Nervana™ reference deep learning framework committed to the best performance on all hardware

Python · 3,870 stars · 811 forks · Updated Dec 23, 2020

Caffe: a fast open framework for deep learning.

C++ · 34,032 stars · 18,700 forks · Updated Jul 31, 2024

Vector Math Library

C · 74 stars · 27 forks · Updated Dec 18, 2016