A code generator for array-based code on CPUs and GPUs

Python 586 72 Updated Nov 2, 2024

The Radeon Compute Profiler (RCP) is a performance analysis tool that gathers data from the API run-time and GPU for OpenCL™ and ROCm/HSA applications. This information can be used by developers to…

C++ 85 19 Updated Jun 16, 2020

CUSP : A C++ Templated Sparse Matrix Library

C++ 402 133 Updated Nov 4, 2024

CSR-based SpMV on Heterogeneous Processors (Intel Broadwell, AMD Kaveri and nVidia Tegra K1)

C++ 26 6 Updated May 12, 2015

A C++ implementation of the QR decomposition algorithm.

C++ 12 7 Updated Oct 14, 2017

GPU implementation of QR decomposition

Cuda 3 1 Updated Aug 27, 2020

Undergraduate project implementing a tiled QR decomposition on GPUs with CUDA

C++ 3 2 Updated Jul 24, 2013

A new QR decomposition algorithm implemented in CUDA

Cuda 15 3 Updated Jun 24, 2024

Implementation and analysis of five different GPU based SPMV algorithms in CUDA

Cuda 35 12 Updated Feb 5, 2019

Quantum chemistry and solid state physics software package

Fortran 849 388 Updated Nov 3, 2024

Implement asm gemm on vega64 for 4096x4096 fp32 matrix

C++ 20 7 Updated Oct 12, 2019

14 basic topics for VEGA64 performance optimization

C++ 50 23 Updated Mar 18, 2021

A simple high performance CUDA GEMM implementation.

Cuda 333 36 Updated Jan 4, 2024

Galois: C++ library for multi-core and multi-node parallelization

C++ 314 133 Updated May 16, 2024

Automatically exported from code.google.com/p/cuda-shortest-path

TeX 8 6 Updated Apr 9, 2015

Optimized half precision gemm assembly kernels (deprecated due to ROCm)

Perl 47 11 Updated Jun 16, 2017

Robot vision, mobile robots, VS-SLAM, ORB-SLAM2, deep learning object detection, yolov3, behavior detection, opencv, PCL, machine learning, autonomous driving

C++ 8,010 2,778 Updated Jul 9, 2024

【火炉炼AI】 (Forging AI in the Furnace): a series of machine learning articles

Jupyter Notebook 204 93 Updated Apr 13, 2019

ncnn is a high-performance neural network inference framework optimized for the mobile platform

C++ 20,410 4,160 Updated Oct 29, 2024

Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network

Python 689 265 Updated Mar 24, 2023

Implement SSSP on CPU, GPU, and Hybrid

Cuda 3 Updated Mar 20, 2020
Cuda 1 Updated Nov 12, 2019

ECG classification using MIT-BIH data, a deep CNN learning implementation of Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural network,…

Python 190 60 Updated May 1, 2023

Awesome resources for GPUs

489 49 Updated Jul 1, 2023

Third party assembler and GEMM library for NVIDIA Kepler GPU

CSS 77 20 Updated Oct 8, 2019

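PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)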

C++ 22,233 5,582 Updated Nov 4, 2024

Must-read papers on graph neural networks (GNN)

15,993 2,992 Updated Dec 20, 2023

Assembler for NVIDIA Maxwell architecture

Sass 949 162 Updated Jan 3, 2023

CUDA Templates for Linear Algebra Subroutines

C++ 5,613 956 Updated Oct 29, 2024

Spark-GATK is a genomics analysis framework based on Apache Spark and ADAM.

Scala 80 65 Updated Jul 16, 2020