shyram

Organizations

@Gubuzeong
Stars

🔤 NLP

Natural Language Processing
89 repositories

(Korean) Study materials for text mining

Jupyter Notebook · 204 stars · 61 forks · Updated Apr 7, 2020

Robust Speech Recognition via Large-Scale Weak Supervision

Python · 64,900 stars · 7,581 forks · Updated Jul 22, 2024

Jupyter Notebook · 43 stars · 11 forks · Updated Jul 5, 2024

Collection of papers and resources for data augmentation for NLP.

822 stars · 78 forks · Updated Aug 12, 2022

Data augmentation for NLP

Jupyter Notebook · 4,358 stars · 455 forks · Updated Jun 24, 2024

Soft Contextual Data Augmentation

Python · 39 stars · 9 forks · Updated Jun 21, 2022

Natural Language Processing Tutorial for Deep Learning Researchers

Jupyter Notebook · 13,935 stars · 3,892 forks · Updated Feb 21, 2024

🐍 pymecab-ko. You can find the original versions here: https://bitbucket.org/eunjeon/mecab-ko, https://github.com/SamuraiT/mecab-python3

C++ · 13 stars · 1 fork · Updated Feb 14, 2023

Jupyter Notebook · 13 stars · 2 forks · Updated Mar 28, 2022

TensorFlow code and pre-trained models for BERT

Python · 37,527 stars · 9,541 forks · Updated Jul 20, 2024

Korean honorific correction

Python · 25 stars · 3 forks · Updated Dec 31, 2022

Korean HateSpeech Dataset

371 stars · 38 forks · Updated Jul 18, 2020

fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend.

C · 407 stars · 26 forks · Updated Jun 2, 2023

Python · 1,443 stars · 123 forks · Updated Apr 27, 2023

List of Korean pre-trained language models.

183 stars · 16 forks · Updated Aug 31, 2023

Crawl BookCorpus

Python · 796 stars · 111 forks · Updated Jul 14, 2023

JARVIS, a system to connect LLMs with ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf

Python · 23,397 stars · 1,946 forks · Updated Apr 24, 2024

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

Python · 10,806 stars · 1,158 forks · Updated Jun 30, 2023

A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB team.

Python · 242 stars · 37 forks · Updated Dec 15, 2023

Website

Python · 46 stars · 11 forks · Updated Jan 24, 2023

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset

7,287 stars · 371 forks · Updated Jul 16, 2023

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python · 11,220 stars · 758 forks · Updated Jul 10, 2024

[NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes. https://arxiv.org/abs/2305.17333

Python · 1,002 stars · 57 forks · Updated Jan 11, 2024

A collection of open-source datasets to train instruction-following LLMs (ChatGPT, LLaMA, Alpaca)

1,044 stars · 58 forks · Updated Jan 4, 2024

Evolve LLM training instructions, from English instructions to any language.

Python · 105 stars · 12 forks · Updated Sep 15, 2023

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 23,297 stars · 3,311 forks · Updated Jul 22, 2024

Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment"

Python · 208 stars · 23 forks · Updated Feb 4, 2024

GEMBA — GPT Estimation Metric Based Assessment

Python · 84 stars · 13 forks · Updated Feb 18, 2024

A preliminary evaluation of ChatGPT/GPT-4 for machine translation.

Python · 235 stars · 16 forks · Updated Nov 3, 2023