bertmaher
  • triton Public

    Forked from triton-lang/triton

    Development repository for the Triton language and compiler

    C++ MIT License Updated Oct 10, 2024
  • FBGEMM-1 Public

    Forked from pytorch/FBGEMM

    FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

    C++ Other Updated Oct 1, 2024
  • benchmark Public

    Forked from pytorch/benchmark
    Python BSD 3-Clause "New" or "Revised" License Updated Sep 27, 2024
  • Fast and memory-efficient exact attention

    Python BSD 3-Clause "New" or "Revised" License Updated Sep 6, 2024
  • A simplistic approach to high-performance GPU matmul

    Cuda 1 Updated Jul 31, 2024
  • pytorch Public

    Forked from pytorch/pytorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

    C++ 3 Other Updated Jul 10, 2024
  • tf32_gemm Public

    Example of binding a TF32 CUTLASS GEMM kernel to PyTorch

    Python 6 1 Updated Jun 7, 2024
  • llama2.so Public

    Forked from karpathy/llama2.c

    Inference Llama 2 with a model compiled to native code by TorchInductor

    C++ 10 4 MIT License Updated Feb 8, 2024
  • HTML Updated Jul 11, 2023
  • multipy Public

    Forked from pytorch/multipy

    torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters in a single C++ process.

    C++ Other Updated Nov 1, 2022
  • torchdynamo Public

    Forked from pytorch/torchdynamo

    A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

    Python BSD 3-Clause "New" or "Revised" License Updated Oct 7, 2022
  • A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python 🚀

    Python 1 The Unlicense Updated Oct 26, 2021
  • functorch Public

    Forked from pytorch/functorch

    functorch is a prototype of JAX-like composable function transforms for PyTorch.

    C++ BSD 3-Clause "New" or "Revised" License Updated Oct 18, 2021
  • cs344 Public

    C++ Updated Oct 6, 2021
  • learn_cuda Public

    Simple programs for learning CUDA

    Cuda Updated Sep 27, 2021
  • builder Public

    Forked from pytorch/builder

    Continuous builder and binary build scripts for pytorch

    Shell BSD 2-Clause "Simplified" License Updated Sep 7, 2021
  • membench Public

    C++ 2 2 Updated May 24, 2021
  • tvm Public

    Forked from apache/tvm

    Open deep learning compiler stack for CPU, GPU, and specialized accelerators

    Python Apache License 2.0 Updated Mar 30, 2021
  • Python 1 MIT License Updated Oct 18, 2020
  • nvprof2json Public

    Forked from ezyang/nvprof2json

    Convert nvprof profiles into about:tracing compatible JSON files

    Python Updated Sep 13, 2020
  • lmdave Public

    Forked from MaiZure/lmdave

    Let's Make: Dangerous Dave

    C Updated Sep 5, 2020
  • hub Public

    Forked from pytorch/hub

    Submission to https://pytorch.org/hub/

    Python Updated Sep 1, 2020
  • Google AI 2018 BERT pytorch implementation

    Python Apache License 2.0 Updated Aug 28, 2020
  • Background Matting: The World is Your Green Screen

    Python Updated Aug 28, 2020
  • fastNLP Public

    Forked from zhangguanheng66/fastNLP

    fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.

    Python Apache License 2.0 Updated Aug 25, 2020
  • Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

    Python MIT License Updated Aug 20, 2020
  • glow Public

    Forked from pytorch/glow

    Compiler for Neural Network hardware accelerators

    C++ Apache License 2.0 Updated Feb 14, 2020
  • tvm-1 Public

    Forked from pytorch/tvm

    TVM integration into PyTorch

    C++ Updated Aug 23, 2019
  • bitserial Public

    Hacking around with ultra-low precision GEMM using TVM

    LLVM 2 3 Updated Aug 6, 2019
  • ds2 Public

    Forked from facebookarchive/ds2

    Debug server for lldb.

    C++ Other Updated Dec 12, 2017