Stars
Introduction to Parallel Programming class code
Implement asm gemm on vega64 for 4096x4096 fp32 matrix
14 basic topics for VEGA64 performance optimization
ROCm / tensorflow-upstream
Forked from tensorflow/tensorflow
TensorFlow ROCm port
Kubernetes (k8s) device plugin to enable registration of AMD GPU to a container cluster
Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)
Code for training py-faster-rcnn and py-R-FCN on multiple GPUs in caffe
YOLO reimplementation in caffe, written with a python layer.
A tensorflow implementation for SqueezeDet, a convolutional neural network for object detection.
weiliu89 / caffe
Forked from BVLC/caffe
Caffe: a fast open framework for deep learning.