Stars
A code generator for array-based code on CPUs and GPUs
The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to…
CUSP : A C++ Templated Sparse Matrix Library
CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and NVIDIA Tegra K1)
A C++ implementation of the QR decomposition algorithm.
Undergraduate project implementing a tiled QR decomposition on GPUs with CUDA
A new QR decomposition algorithm implemented in CUDA
Implementation and analysis of five different GPU based SPMV algorithms in CUDA
Quantum chemistry and solid state physics software package
Implement asm gemm on vega64 for 4096x4096 fp32 matrix
14 basic topics for VEGA64 performance optimization
A simple high performance CUDA GEMM implementation.
Galois: C++ library for multi-core and multi-node parallelization
Automatically exported from code.google.com/p/cuda-shortest-path
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
Robot vision, mobile robots, VS-SLAM, ORB-SLAM2, deep learning object detection, yolov3, behavior detection, opencv, PCL, machine learning, autonomous driving
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network
ECG classification using MIT-BIH data, a deep CNN learning implementation of Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network,…
Third party assembler and GEMM library for NVIDIA Kepler GPU
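PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning & machine learning)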
Must-read papers on graph neural networks (GNN)
ncic-sugon / Spark-GATK
Forked from PAA-NCIC/Spark-GATK
Spark-GATK is a genomics analysis framework based on Apache Spark and ADAM.