- Seoul, Republic of Korea
- https://cpm0722.github.io
- in/hansu-kim-b15b2920b
Stars
llama3.np is a pure NumPy implementation of the Llama 3 model.
llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.
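"Pure NumPy" means every Transformer building block is written as plain array math. As a hedged illustration (not code taken from llama3.np itself), RMSNorm, the normalization Llama-family models apply before attention and MLP blocks, reduces to a few NumPy lines; the `eps` value is a typical default, not necessarily the repo's:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """RMSNorm as used in Llama-family models: scale by the inverse
    root-mean-square of the activations, then apply a learned gain."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Toy usage: a batch of 2 token embeddings of width 4.
x = np.array([[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 0.5, -0.5]])
print(rms_norm(x, np.ones(4)))
```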
Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
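Several of those topics (AWQ, SmoothQuant, WINT8/4) revolve around weight quantization. As a rough NumPy sketch of only the basic idea — the real methods add activation-aware scaling and calibration — symmetric per-channel int8 quantization maps each weight row to 8-bit integers plus one float scale:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-output-channel int8 quantization: q = round(w / scale)."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per row
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())
```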
A framework for few-shot evaluation of language models.
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
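The LoRA structure is what makes serving thousands of adapters feasible: each adapter is just two small matrices added on top of a shared frozen weight, so one base model plus per-request low-rank deltas covers every adapter. A minimal NumPy sketch of a LoRA forward pass (illustrative names, not S-LoRA's API):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """y = x @ W + (alpha / r) * x @ A @ B, where A (d x r) and B (r x d_out)
    are the small per-adapter matrices and W is the shared frozen weight."""
    r = A.shape[1]
    return x @ W + (alpha / r) * (x @ A) @ B

d, d_out, r = 64, 64, 8
x = np.random.randn(2, d)
W = np.random.randn(d, d_out)          # shared across all adapters
A = np.random.randn(d, r) * 0.01       # adapter-specific
B = np.zeros((r, d_out))               # B starts at zero in LoRA
print(lora_forward(x, W, A, B).shape)  # (2, 64)
```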
Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch.
Architecture decision record (ADR) examples for software planning, IT leadership, and template documentation
Write scalable load tests in plain Python 🚗💨
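"Plain Python" is literal here — this is Locust, where a load test is an ordinary Python class. A small example in Locust's documented style (the endpoints are placeholders); run it with `locust -f locustfile.py --host https://example.com`:

```python
from locust import HttpUser, task, between

class WebsiteUser(HttpUser):
    """Each simulated user waits 1-3 s between tasks and hits two endpoints."""
    wait_time = between(1, 3)

    @task(3)          # weight 3: visited three times as often as /about
    def index(self):
        self.client.get("/")

    @task
    def about(self):
        self.client.get("/about")
```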
Official implementation of project Honeybee (CVPR 2024)
A series of GPU optimization topics introducing in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including elementwise, reduce, s…
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
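The generation loop itself is the simple part: feed the tokens so far, take the argmax (or a sample) over the last position's logits, append, repeat. A toy sketch with a stand-in model — gpt-fast's actual speed comes from compilation and KV caching, which this deliberately omits:

```python
import numpy as np

VOCAB = 50257

def stub_model(tokens):
    """Stand-in for a real model: returns random logits per position."""
    rng = np.random.default_rng(sum(tokens))
    return rng.standard_normal((len(tokens), VOCAB))

def generate(prompt, max_new_tokens=8):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = stub_model(tokens)
        tokens.append(int(logits[-1].argmax()))  # greedy: pick the top logit
    return tokens

print(generate([1, 2, 3]))
```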
Machine Learning Engineering Open Book
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Explains complex systems using visuals and simple terms. Helps you prepare for system design interviews.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
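The attention-sink idea: when the KV cache overflows, keep the first few "sink" tokens (which absorb disproportionate attention) plus a sliding window of recent tokens, and evict the middle. A hedged sketch of that eviction policy on a list of cached entries — not StreamingLLM's actual implementation:

```python
def evict_kv_cache(cache, n_sink=4, window=1020):
    """Keep the first n_sink entries plus the last `window` entries;
    drop everything in between once the cache exceeds n_sink + window."""
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

cache = list(range(2000))              # stand-in for per-token KV entries
kept = evict_kv_cache(cache)
print(len(kept), kept[:6], kept[-2:])  # 1024 [0, 1, 2, 3, 980, 981] [1998, 1999]
```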
Fast inference engine for Transformer models
Hackable and optimized Transformers building blocks, supporting a composable construction.
C++ Library Manager for Windows, Linux, and macOS
An unnecessarily tiny implementation of GPT-2 in NumPy.
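In that spirit, the heart of GPT-2 fits in a few NumPy lines: causal self-attention is two matmuls around a masked softmax. A minimal single-head sketch (an illustration, not that repo's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask."""
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(np.tri(T, dtype=bool), scores, -1e9)  # mask future tokens
    return softmax(scores) @ v

T, d = 5, 16
q, k, v = (np.random.randn(T, d) for _ in range(3))
print(causal_attention(q, k, v).shape)  # (5, 16)
```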