Ring attention implementation with flash attention

Python 578 45 Updated Oct 30, 2024
Python 7 1 Updated Sep 25, 2024

Puzzles for learning Triton

Jupyter Notebook 1,061 73 Updated Sep 25, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 1,389 127 Updated Nov 6, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 5,618 959 Updated Oct 29, 2024

Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton

Python 1,316 67 Updated Nov 6, 2024

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.

Python 85 8 Updated Aug 9, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 250 10 Updated Oct 11, 2024

Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL.

HTML 194 78 Updated Feb 7, 2024

Finetune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory

Python 17,836 1,236 Updated Nov 6, 2024

Supercharge Your LLM Application Evaluations 🚀

Python 7,132 723 Updated Nov 6, 2024
Python 312 16 Updated Jul 16, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 29,686 4,481 Updated Nov 6, 2024

[ACL'24 Outstanding] Data and code for L-Eval, a comprehensive long context language models evaluation benchmark

Python 357 14 Updated Jul 9, 2024

[ECIR'24] Implementation of "Large Language Models are Zero-Shot Rankers for Recommender Systems"

Python 232 21 Updated Jan 25, 2024

[ACL 2024] LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

Python 657 53 Updated Sep 10, 2024

A PHP based search engine that filters results from various sources like Google, Yahoo and Bing, based on the relevance of web pages with searched keywords. To evaluate the results various technics…

PHP 3 1 Updated Oct 25, 2017

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Python 2,624 273 Updated Aug 14, 2024

🌟 Chrome extension that enables users to chat with ChatGPT by opening a sidebar on any website

TypeScript 73 19 Updated Nov 5, 2024

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 35,351 4,102 Updated Nov 6, 2024

LLM inference in C/C++

C++ 67,375 9,673 Updated Nov 6, 2024
Python 102 17 Updated Oct 15, 2024

The ChatGPT Retrieval Plugin lets you easily find personal or work documents by asking questions in natural language.

Python 21,060 3,687 Updated Jul 4, 2024

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Python 33,777 5,747 Updated Nov 6, 2024

Training and serving large-scale neural networks with auto parallelization.

Python 3,072 357 Updated Dec 9, 2023

Let us control diffusion models!

Python 30,273 2,722 Updated Feb 25, 2024

Source Code of Paper "GPTScore: Evaluate as You Desire"

Python 229 16 Updated Feb 21, 2023
TypeScript 154 23 Updated Jul 27, 2023

Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation

Python 193 26 Updated Feb 10, 2024

NLTK library wrapper for .NET

C# 46 8 Updated Feb 26, 2021