- Samsung Research HQ
- in/shyram
- https://bento.me/shyram
🔤 NLP
Robust Speech Recognition via Large-Scale Weak Supervision
Collection of papers and resources for data augmentation for NLP.
Natural Language Processing Tutorial for Deep Learning Researchers
🐍 pymecab-ko. You can find the original versions here: https://bitbucket.org/eunjeon/mecab-ko, https://github.com/SamuraiT/mecab-python3
TensorFlow code and pre-trained models for BERT
fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend.
JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform
A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.
OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333
A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)
Evolve LLM training instructions from English to any language.
A high-throughput and memory-efficient inference and serving engine for LLMs
Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"
GEMBA — GPT Estimation Metric Based Assessment
A preliminary evaluation of ChatGPT/GPT-4 for machine translation.