CryptoSalamander

Hyunsoo Luke HA CryptoSalamander

AI Research Engineer at Upstage. Previously researched OCR. Currently handling various data tasks in the Data-Centric LLM team for training Upstage Solar LLM.

50 followers · 35 following

@UpstageAI
Seoul, Republic Of Korea
https://www.linkedin.com/in/hyunsoo-ha-872aaa134/

Stars

iterative / datachain

AI-data warehouse to enrich, transform and analyze data from cloud storages

Python 938 55 Updated Nov 1, 2024

ZigeW / data_management_LLM

Collection of training data management explorations for large language models

278 28 Updated Aug 2, 2024

togethercomputer / RedPajama-Data

The RedPajama-Data repository contains code for preparing large datasets for training large language models.

Python 4,559 350 Updated Oct 17, 2024

UpstageAI / evalverse

The Universe of Evaluation. All about the evaluation for LLMs.

Python 213 21 Updated Jul 9, 2024

UpstageAI / dataverse

The Universe of Data. All about data, data science, and data engineering

Python 516 52 Updated Jul 18, 2024

OpenInterpreter / open-interpreter

A natural language interface for computers

Python 54,701 4,782 Updated Oct 31, 2024

Yuliang-Liu / MultimodalOCR

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Python 463 29 Updated Oct 16, 2024

Beomi / KoAlpaca

KoAlpaca: An open-source language model that understands Korean instructions

Jupyter Notebook 1,542 237 Updated Oct 25, 2024

ku21fan / STR-Fewer-Labels

Scene Text Recognition (STR) methods trained with fewer real labels (CVPR 2021)

Jupyter Notebook 174 27 Updated Dec 23, 2023

FangShancheng / ABINet-PP

ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting

Python 78 5 Updated Feb 11, 2023

sparkfish / augraphy

Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes

Python 348 46 Updated Sep 17, 2024

baudm / parseq

Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)

Python 577 127 Updated May 29, 2024

Jyouhou / Case-Sensitive-Scene-Text-Recognition-Datasets

This dataset contains re-annotations of 4 popular Latin/English scene text recognition datasets.

49 9 Updated Mar 24, 2020

adeline-cs / GTR

Scene text recognition

Python 105 14 Updated Jul 7, 2022

autonise / CRAFT-Remade

Implementation of CRAFT Text Detection

Python 191 47 Updated Jul 6, 2023

tony9402 / Heart_Disease_AI_Datathon_2021

Echocardiogram/electrocardiogram AI model Datathon

Python 1 Updated Dec 2, 2021

teddylee777 / machine-learning

A repository intended to help those who are new to machine learning or preparing a machine learning study group.

Jupyter Notebook 2,626 848 Updated Apr 5, 2024

boost-devs / ai-tech-interview

👩‍💻👨‍💻 AI engineer technical interview study group (⭐️ 1k+)

1,852 450 Updated Oct 12, 2024

trent-b / iterative-stratification

scikit-learn cross validators for iterative stratification of multilabel data

Python 850 75 Updated Oct 12, 2024

aneesha / RAKE

A Python implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm

Python 974 594 Updated Sep 4, 2020

huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python 134,291 26,850 Updated Nov 1, 2024

google-deepmind / mathematics_dataset

This dataset code generates mathematical question and answer pairs, from a range of question types at roughly school-level difficulty.

Python 1,795 249 Updated Jul 24, 2024

asahi417 / kex

Kex is a Python library for unsupervised keyword extraction from a document, providing an easy interface and benchmarks on 15 public datasets.

Python 53 6 Updated Feb 17, 2022

wjdghks950 / Trend-Analysis-using-SRL--and-Sentence-BERT

This project performs trend analysis on a crawled Korea Herald dataset using SRL-BERT and Sentence-BERT.

Jupyter Notebook 2 1 Updated Jun 6, 2021

lovit / KR-WordRank

A library that automatically extracts words and keywords from Korean text using unsupervised learning.

Python 352 57 Updated Apr 13, 2022

SKTBrain / KoBERT

Korean BERT pre-trained cased (KoBERT)

Jupyter Notebook 1,296 368 Updated Oct 3, 2024

MaartenGr / KeyBERT

Minimal keyword extraction with BERT

Python 3,524 348 Updated Jul 16, 2024

ibatra / BERT-Keyword-Extractor

Deep Keyphrase Extraction using BERT

Jupyter Notebook 255 71 Updated Feb 21, 2022

allenai / allennlp

An open-source NLP research library, built on PyTorch.

Python 11,754 2,251 Updated Nov 22, 2022

uoneway / Text-Summarization-Repo

A repository that organizes the main research topics, must-read papers, and available models and data in text summarization, together with recommended resources.

332 47 Updated Feb 21, 2022
