snoop2head's portfolio

Capture Questions, Answer with Code

Name: Young Jin Ahn
Email: [email protected]
Blog: https://snoop2head.github.io/

📖 Education

Korea Advanced Institute of Science and Technology (KAIST)

Master of Science in Artificial Intelligence

Machine Learning for AI (A+)
Advanced Deep Learning (A-)
Programming for AI (A+)
Scientific Writing (P)
Advanced Machine Learning for AI
Deep Reinforcement Learning
Machine Learning for Healthcare
Large Language Models

Yonsei University

Bachelor of Arts in Economics & Minor in Applied Statistics

INTRODUCTION TO STATISTICS (A0)
STATISTICAL METHOD (A+)
CALCULUS (B+)
LINEAR ALGEBRA (B+)
MATHEMATICAL STATISTICS 1 (A+)
LINEAR REGRESSION (B+)
R AND PYTHON PROGRAMMING (A+)
DATA STRUCTURE (B+)
SPECIAL PROBLEMS IN COMPUTING (A0)
SOCIAL INFORMATICS (A+)
TIME SERIES ANALYSIS (A+)
THEORY AND PRACTICE OF DEEP LEARNING (A+)

🏆 Competition Awards

Host / Platform	Topic / Task	Result	Repository	Year
National IT Industry Promotion Agency	Machine Reading Comprehension	🥈 2nd (2/26)	MRC_Baseline	2022
Ministry of Statistics	Korean Standard Industrial Classification	🎖 7th (7/311)	-	2022
Dacon	KLUE benchmark Natural Language Inference	🥇 1st (1/468)	🌐 KLUE NLI	2022
Dacon & AI Frenz	Python Code Clone Detection	🥉 3rd (3/337)	CloneDetection	2022
Dacon & CCEI Korea	Stock Price Forecast on KOSPI & KOSDAQ	🎖 6th (6/205)	elastic-stock-prediction	2021

**Dacon is a Kaggle-like competition platform in Korea.

🛠 Multimodal Projects

KoDALLE: Text to Fashion (2021)

Generating fashion outfit images from input text | 📄 Presentation

Built the training pipeline from VQGAN through DALLE.
Maintained versions of an image-caption dataset of 1 million pairs.
Trained the VQGAN and DALLE models from scratch.
Deployed a live KoDALLE demo on Huggingface Spaces via FastAPI.

🔐 Differential Privacy

Language Model Memorization (2022)

Implementation of Carlini et al. (2020), "Extracting Training Data from Large Language Models"

Accelerated inference with parallel multi-GPU generation.
Mitigated the paper's "low-quality repeated generations" problem with a repetition penalty and an n-gram repetition restriction.
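The two decoding fixes above can be sketched in plain Python. Hugging Face's `generate()` exposes the same ideas as `repetition_penalty` and `no_repeat_ngram_size`; the functions below (`apply_repetition_penalty`, `banned_ngram_tokens`, `next_token`) are hypothetical helpers showing one greedy decoding step, not the project's actual code, and the penalty value 1.2 is an assumed default.

```python
from typing import Dict, List, Set, Tuple


def apply_repetition_penalty(logits: List[float], generated: List[int], penalty: float) -> List[float]:
    """CTRL-style penalty: make already-generated tokens less likely."""
    out = list(logits)
    for tok in set(generated):
        # Divide positive logits and multiply negative ones, so the
        # penalized token always loses probability mass.
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def banned_ngram_tokens(generated: List[int], n: int) -> Set[int]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if n < 1 or len(generated) < n - 1:
        return set()
    prefix = tuple(generated[len(generated) - n + 1:])  # last n-1 tokens
    seen: Dict[Tuple[int, ...], Set[int]] = {}
    for i in range(len(generated) - n + 1):
        ngram = tuple(generated[i:i + n])
        seen.setdefault(ngram[:-1], set()).add(ngram[-1])
    return seen.get(prefix, set())


def next_token(logits: List[float], generated: List[int], penalty: float = 1.2, no_repeat_ngram: int = 3) -> int:
    """One greedy step with both fixes applied (hypothetical helper)."""
    scores = apply_repetition_penalty(logits, generated, penalty)
    for tok in banned_ngram_tokens(generated, no_repeat_ngram):
        scores[tok] = float("-inf")  # hard ban on repeating a seen n-gram
    return max(range(len(scores)), key=scores.__getitem__)
```

The penalty only softens repetition, while the n-gram ban forbids it outright; combining them trades a little fluency for far fewer degenerate loops.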

Membership Inference Attack (2022)

Implementation of Shokri et al. (2016), "Membership Inference Attacks Against Machine Learning Models"

Prevented overfitting of the shadow models with early stopping, weight-decay regularization, and separate train/validation/test splits.
Followed Carlini et al. (2021) to further study different model types and evaluation metrics.
Reproduced the attack metrics as follows:

MIA Attack Metrics	Accuracy	Precision	Recall	F1 Score
CIFAR10	0.7761	0.7593	0.8071	0.7825
CIFAR100	0.9746	0.9627	0.9875	0.9749

[Figures: MIA ROC curves on CIFAR10 and CIFAR100]
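The table's metrics treat the attack as a binary classifier (member vs. non-member). A minimal sketch of how they are derived from predictions, assuming label 1 means "member"; `threshold_attack` is a hypothetical confidence-threshold baseline, a simpler stand-in for Shokri et al.'s shadow-model attack model, and its 0.9 threshold is an assumption.

```python
from typing import Dict, List, Sequence


def threshold_attack(confidences: Sequence[float], threshold: float = 0.9) -> List[int]:
    """Hypothetical baseline: guess 'member' when the target model is very confident."""
    return [1 if c >= threshold else 0 for c in confidences]


def f1_from(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def attack_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 with 1 = member, 0 = non-member."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }
```

As a consistency check, plugging the CIFAR10 row's precision (0.7593) and recall (0.8071) into `f1_from` gives about 0.7825, matching the reported F1.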

💬 Natural Language Processing Projects

KoQuillBot (2022) & T5 Translation (2022)

Paraphrasing tool based on round-trip translation with a T5 machine translation model | 🤗 KoQuillBot Demo & 🤗 Translator Demo

Direction	BLEU Score	Translation Result
Korean ➡️ English	45.15	🔗 Inference Result
English ➡️ Korean	-	-
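For reference, BLEU combines clipped n-gram precisions (n = 1 to 4) with a brevity penalty. The document does not say which toolkit produced the scores above (a standard one such as sacreBLEU is typical), so this sketch, which assumes whitespace tokens, one reference per sentence, and no smoothing, will not reproduce them exactly.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per sentence, scaled to 0-100."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clipped precision: a hypothesis n-gram counts at most as
            # many times as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)
```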

Deep Encoder Shallow Decoder (2022)

Implementation of Kasai et al. (2020), "Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation" | 📄 Translation Output

Wrote a custom dataset, trainer, and inference code with PyTorch and Huggingface.
Trained and hosted an encoder-decoder Transformer model on Huggingface.

Direction	BLEU Score	Translation Result
Korean ➡️ English	35.82	🔗 Inference Result
English ➡️ Korean	-	-

KLUE-RBERT (2021)

Extracting relations between subject and object entities in the KLUE benchmark dataset | ✍️ Blog Post

Fine-tuned a RoBERTa model following the R-BERT architecture in PyTorch.
Applied stratified k-fold cross-validation in the custom trainer.
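Stratified k-fold keeps each fold's label distribution close to the full dataset's, which matters for skewed relation labels. In practice scikit-learn's `StratifiedKFold` does this; the stdlib sketch below (hypothetical helper `stratified_kfold`, no shuffling) only illustrates the idea and is not the project's trainer code.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def stratified_kfold(labels: Sequence[Hashable], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_idx, val_idx) pairs with per-class ratios preserved."""
    by_label: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for indices in by_label.values():
        # Deal each class round-robin so every fold receives about 1/k of it.
        for j, idx in enumerate(indices):
            folds[(j + offset) % k].append(idx)
        offset += len(indices)  # continue where the previous class stopped
    splits = []
    for f in range(k):
        val = sorted(folds[f])
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        splits.append((train, val))
    return splits
```

Each validation fold then mirrors the class mix of the whole set; real pipelines usually shuffle indices within each class first.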

Conditional Generation with KoGPT (2021)

Sentence generation conditioned on given emotions | 🤗 Huggingface Demo

Fine-tuned KoGPT-Trinity with emotion labels as generation conditions.
Maintained the Huggingface-hosted model and live demo.

Machine Reading Comprehension in Naver Boostcamp (2021)

Retrieving Wikipedia passages and extracting answers for given questions | ✍️ Blog Post

Attached bidirectional LSTM layers on top of the backbone Transformer model to extract answer spans.
Split evaluation into separate start-token and end-token prediction accuracy.
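Extractive QA models output one score per token for the answer start and one for the answer end. The sketch below shows a common way to decode a span from those scores and to report start and end accuracy separately; `best_span`, `span_accuracies`, and the 30-token length cap are hypothetical illustrations, since the document does not describe the project's decoding.

```python
from typing import Dict, Sequence, Tuple


def best_span(start_logits: Sequence[float], end_logits: Sequence[float], max_answer_len: int = 30) -> Tuple[int, int]:
    """Highest-scoring (start, end) pair with start <= end and a length cap."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_logits):
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_score + end_logits[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def span_accuracies(pred: Sequence[Tuple[int, int]], gold: Sequence[Tuple[int, int]]) -> Dict[str, float]:
    """Start and end accuracy reported separately, plus exact-span accuracy."""
    n = len(gold)
    return {
        "start_acc": sum(p[0] == g[0] for p, g in zip(pred, gold)) / n,
        "end_acc": sum(p[1] == g[1] for p, g in zip(pred, gold)) / n,
        "span_exact": sum(tuple(p) == tuple(g) for p, g in zip(pred, gold)) / n,
    }
```

Splitting the metric this way shows whether errors come mostly from locating where an answer begins or where it ends.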

Mathpresso Corporation Joint Project (2020)

Joint corporate project on classifying mathematics problems | 📄 Presentation

Preprocessed the Korean mathematics problem dataset based on EDA.
Maintained versions of the preprocessing module.

Constructing Emotional Instagram Posts Dataset (2019)

Created the Emotional Instagram Posts (글스타그램, "Geulstagram") dataset | 📄 Presentation

Managed version control for the project's GitHub repository.
Converted Korean text in image files into text files using the Google Cloud Vision API.

👀 Computer Vision Projects

DotNeuralNet (2023)

Lightweight neural network for optical Braille recognition in the wild and in books | 🤗 Huggingface Demo

Classified raised Braille patterns with multi-label, one-hot encoded dot labels.
Pseudo-labeled the Natural Scene Braille dataset.
Trained single-stage YOLO object detection models for Braille cell recognition.
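A multi-label encoding of a Braille cell can be sketched as one binary label per dot. This assumes standard 6-dot cells (dots 1-3 down the left column, 4-6 down the right); the helper names are hypothetical and DotNeuralNet's exact label format is not stated here. The Unicode Braille Patterns block (U+2800) uses the same convention, with dot k setting bit k-1, which makes decoded cells easy to display.

```python
from typing import List, Set


def dots_to_multihot(dots: Set[int]) -> List[int]:
    """Raised dots (numbered 1-6) -> six binary labels, one per dot."""
    return [1 if d in dots else 0 for d in range(1, 7)]


def decode_predictions(probs: List[float], threshold: float = 0.5) -> List[int]:
    """Multi-label decoding: each dot is an independent sigmoid output."""
    return [1 if p >= threshold else 0 for p in probs]


def multihot_to_unicode(vec: List[int]) -> str:
    # In the Unicode Braille Patterns block, dot k sets bit (k - 1) above U+2800.
    return chr(0x2800 + sum(bit << i for i, bit in enumerate(vec)))
```

Treating dots as independent labels keeps the output layer at six units instead of a 64-way softmax over every possible cell.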

ElimNet (2021)

Elimination based Lightweight Neural Net with Pretrained Weights | 📄 Presentation

Built lightweight CNN models with fewer than 1M parameters by removing top layers from pretrained CNN models.
Evaluated on a 6-class sample of the Trash Annotations in Context (TACO) dataset with 20,851 images.
Compared metrics across VGG11, MobileNetV3 and EfficientNetB0.

Face Mask, Age, Gender Classification in Naver Boostcamp (2021)

Identifying 18 classes from given images: Age Range (3 classes), Biological Sex (2 classes), Face Mask (3 classes) | ✍️ Blog Post

Optimized the combination of backbone models, losses, and optimizers.
Created an additional labeled dataset (age, sex, mask) to mitigate class imbalance.
Cropped face regions with MTCNN and RetinaFace to reduce background noise in the images.

Realtime Desktop Posture Classification (2020)

Real-time desk posture classification via webcam | 📷 Demo Video

Built a real-time detection window using opencv-python.
Converted the image dataset into numerical yaw/pitch/roll features using the RetinaFace model.
Trained and tuned a random forest classifier, reaching 93% precision.

🕸 Web Projects

Exchange Program Overview Website (2020)

Overview of student life at foreign universities | ✈️ Website Demo

3,400 visitors within a year (2021.07 ~ 2022.07)
22,000 pageviews within a year (2021.07 ~ 2022.07)
Average retention time of 3+ minutes

Collected and preprocessed 11,200 text reviews from the Yonsei website using pandas.
Visualized department distribution and weather information using matplotlib.
Analyzed sentiment on satisfaction with foreign universities using a pretrained BERT model.
Clustered universities by their offered curricula using K-means clustering.
Hosted reports on universities using Gatsby.js, GraphQL, and Netlify.
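K-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its members. A stdlib sketch of Lloyd's algorithm follows; the project's curriculum features are not described, so the numeric tuples here are hypothetical, and scikit-learn's `KMeans` would be the usual production choice.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Point], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Lloyd's algorithm; returns the cluster index of each point."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]  # random init
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assign
```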

fitcuration website (2020)

Search-based exercise retrieval web service | 📷 Demo Video

Built a keyword-based retrieval algorithm using TF-IDF.
Deployed the website using Docker, AWS RDS, AWS S3, and AWS EBS.
Built the backend with Django, Django ORM, and PostgreSQL.
Built the client side with Sass, Tailwind, and HTML5.
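Keyword retrieval with TF-IDF can be sketched as: weight each term by its frequency in a document times its inverse document frequency, L2-normalize, and rank documents by cosine similarity to the query. The IDF formula below mirrors scikit-learn's smoothed default; `tfidf_search` and the exercise descriptions in the test are hypothetical, not fitcuration's data or code.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_search(query: str, docs: List[str], top_k: int = 3) -> List[Tuple[int, float]]:
    """Rank documents by cosine similarity of TF-IDF vectors."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, so no term gets zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}

    def vectorize(tokens: List[str]) -> Dict[str, float]:
        tf = Counter(tokens)
        vec = {t: c * idf[t] for t, c in tf.items() if t in idf}  # unseen terms dropped
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    doc_vecs = [vectorize(tokens) for tokens in tokenized]
    q_vec = vectorize(query.lower().split())
    scores = [(i, sum(w * d.get(t, 0.0) for t, w in q_vec.items())) for i, d in enumerate(doc_vecs)]
    scores.sort(key=lambda s: s[1], reverse=True)
    return scores[:top_k]
```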

💰 Quantitative Finance Projects

Forecasting Federal Rate with Lasso Regression Model (2022)

Predicting the federal funds rate at the next FOMC meeting

Wrangled the quantitative dataset with FinanceDataReader.
Computed metrics and compared candidate regression models to find an adequate fit.
Optimized hyperparameters for the candidate models.
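Lasso adds an L1 penalty that can shrink uninformative coefficients to exactly zero, and its strength `alpha` is the main hyperparameter to tune. A minimal coordinate-descent sketch, assuming scikit-learn's objective (1/2n)·||y − Xw − b||² + alpha·||w||₁; `lasso_cd` is a hypothetical helper, and the project's actual features and data are not described here.

```python
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X: Sequence[Sequence[float]], y: Sequence[float], alpha: float,
             iters: int = 1000, tol: float = 1e-8) -> Tuple[List[float], float]:
    """Minimize (1/2n)||y - Xw - b||^2 + alpha * ||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    # Center features and target; the intercept is recovered at the end.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # equals yc - Xc @ w while w = 0
    w = [0.0] * p
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(iters):
        max_delta = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant feature: leave its weight at zero
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                w[j] = new_wj
            max_delta = max(max_delta, abs(delta))
        if max_delta < tol:
            break
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, intercept
```

Sweeping `alpha` and comparing validation error is one simple way to carry out the hyperparameter search described above.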

Korean Spinoff Event Tracker (2020)

Financial data of public companies involved in spinoff events, on Google Sheets | 🧩 Dataset Demo

Wrangled the financial dataset displayed on Google Sheets.

🏷 Open-Source Contributions

NVlabs/stylegan2-ada-pytorch (2021)

Fixed a torch version comparison fallback error in the NVIDIA Research source repository | ✍️ Pull Request

Skills: torch, torchvision
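The PR itself is not described here, so the following only illustrates a common cause of version-check bugs in general: comparing version strings lexicographically, where "1.10" sorts before "1.9". Parsing into integer tuples avoids this; `parse_version` and `at_least` are hypothetical helpers, and `packaging.version.Version` is the more robust standard option.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """'1.10.0+cu113' -> (1, 10, 0); local tags and pre-release suffixes are dropped."""
    release = version.split("+")[0]
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(version: str, minimum: str) -> bool:
    # Tuples compare element by element as integers, so (1, 10) > (1, 9).
    return parse_version(version) >= parse_version(minimum)
```

With plain strings, `"1.10.0" < "1.9"` evaluates to True, which is exactly the kind of silent fallback this avoids.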

docker/docker.github.io (2020)

Updated PostgreSQL initialization for "Quickstart: dockerizing django" documentation | ✍️ Pull Request

Skills: Docker, docker-compose, Django

🗄 ETCs

Covid19 Confirmed Cases Prediction (2020)

Predicting the early-stage spread of COVID-19 after its arrival in the country.

Fixed existing errors in the GitHub repository.
Wrote footnotes in both English and Korean.
Within ±5% error for one-day predictions.
Within ±10% error for 30-day predictions.

Indigo (2019)

Never miss concerts by your favorite artists with a KakaoTalk chatbot | 📷 Demo Video

Built an API server for the KakaoTalk chatbot with Flask, PyMongo and MongoDB.
Deployed the API server on AWS EC2.
Visualized concert schedules on the user's Google Calendar.
Created and updated events in Google Calendar.

🛠 Skillsets

Data Analysis and Machine Learning

Data Analysis Library: pandas, numpy
Deep Learning: pytorch, transformers
Machine Learning: scikit-learn, gensim, xgboost

Backend

Python / Django - Django ORM, CRUD, OAuth
Python / FastAPI(uvicorn) - CRUD API
Python / Flask - CRUD API

Client

HTML / Pug.js
CSS / Sass, Tailwind, Bulma
JavaScript / ES6

Deployment

Docker, docker-compose
AWS EC2, Google Cloud App Engine
AWS S3, RDS (PostgreSQL)
AWS Elastic Beanstalk, CodePipeline
