- Seoul, Republic of Korea
- https://www.linkedin.com/in/hyunsoo-ha-872aaa134/
Stars
AI-data warehouse to enrich, transform, and analyze data from cloud storage
Collection of training data management explorations for large language models
The RedPajama-Data repository contains code for preparing large datasets for training large language models.
The Universe of Evaluation. All about the evaluation for LLMs.
The Universe of Data. All about data, data science, and data engineering
A natural language interface for computers
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
KoAlpaca: An open-source language model that understands Korean instructions
Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
This dataset contains re-annotations of 4 popular Latin/English scene text recognition datasets.
This repository is intended to help machine learning beginners and those preparing a study group.
scikit-learn cross validators for iterative stratification of multilabel data
A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.
Kex is a Python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public datasets.
Our project performs trend analysis on a crawled Korea Herald dataset using SRL-BERT and Sentence-BERT.
A library that automatically extracts words and keywords from Korean text using unsupervised learning.
Korean BERT pre-trained cased (KoBERT)
Deep Keyphrase Extraction using BERT
An open-source NLP research library, built on PyTorch.
A repository that organizes the main research topics, must-read papers, and available models and data in the field of text summarization, together with recommended resources.