snoop2head/portfolio

snoop2head's portfolio

Capture Questions, Answer with Code


📖 Education

Korea Advanced Institute of Science and Technology (KAIST)

Master of Science in Artificial Intelligence
  • Machine Learning for AI (A+)
  • Advanced Deep Learning (A-)
  • Programming for AI (A+)
  • Scientific Writing (P)
  • Advanced Machine Learning for AI
  • Deep Reinforcement Learning
  • Machine Learning for Healthcare
  • Large Language Models

Yonsei University

Bachelor of Arts in Economics & Minor in Applied Statistics
  • INTRODUCTION TO STATISTICS (A0)
  • STATISTICAL METHOD (A+)
  • CALCULUS (B+)
  • LINEAR ALGEBRA (B+)
  • MATHEMATICAL STATISTICS 1 (A+)
  • LINEAR REGRESSION (B+)
  • R AND PYTHON PROGRAMMING (A+)
  • DATA STRUCTURE (B+)
  • SPECIAL PROBLEMS IN COMPUTING (A0)
  • SOCIAL INFORMATICS (A+)
  • TIME SERIES ANALYSIS (A+)
  • THEORY AND PRACTICE OF DEEP LEARNING (A+)

πŸ† Competition Awards

| Host / Platform | Topic / Task | Result | Repository | Year |
| --- | --- | --- | --- | --- |
| National IT Industry Promotion Agency | Machine Reading Comprehension | 🥈 2nd (2/26) | MRC_Baseline | 2022 |
| Ministry of Statistics | Korean Standard Industry Classification | 🎖 7th (7/311) | - | 2022 |
| Dacon | KLUE benchmark Natural Language Inference | 🥇 1st (1/468) | 🌐 KLUE NLI | 2022 |
| Dacon & AI Frenz | Python Code Clone Detection | 🥉 3rd (3/337) | CloneDetection | 2022 |
| Dacon & CCEI Korea | Stock Price Forecast on KOSPI & KOSDAQ | 🎖 6th (6/205) | elastic-stock-prediction | 2021 |

* Dacon is a Kaggle-like competition platform in Korea.


🛠 Multimodal Projects


Generating dress outfit images based on given input text | 📄 Presentation

  • Created a training pipeline from VQGAN through DALLE.
  • Maintained versions of an image-caption dataset of 1 million pairs.
  • Trained VQGAN and DALLE models from scratch.
  • Established a live demo for KoDALLE on Huggingface Space via FastAPI.

🔐 Differential Privacy

Implementation of Carlini et al. (2020), Extracting Training Data from Large Language Models

  • Accelerated inference with parallel multi-GPU usage.
  • Suppressed the 'low-quality repeated generations' problem noted in the paper by applying a repetition penalty and an n-gram restriction.
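
As an illustration of the two decoding constraints mentioned above, here is a minimal pure-Python sketch. It is not the project's code: the function names `apply_repetition_penalty` and `banned_next_tokens`, the dictionary-based logits, and the default penalty of 1.2 are all assumptions made for this example. The penalty follows the common convention of dividing positive scores and multiplying negative ones for tokens that were already generated.

```python
from typing import Dict, List, Set


def apply_repetition_penalty(
    logits: Dict[int, float], generated: List[int], penalty: float = 1.2
) -> Dict[int, float]:
    """Lower the score of every token that already appears in `generated`."""
    adjusted = dict(logits)
    for token in set(generated):
        if token in adjusted:
            score = adjusted[token]
            # Dividing a positive score or multiplying a negative one both lower it.
            adjusted[token] = score / penalty if score > 0 else score * penalty
    return adjusted


def banned_next_tokens(generated: List[int], n: int) -> Set[int]:
    """Return tokens that would complete an n-gram already present in `generated`."""
    if n < 1 or len(generated) < n:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - n + 1:])
    banned = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


if __name__ == "__main__":
    history = [1, 2, 3, 1, 2]
    print(banned_next_tokens(history, n=3))
    print(apply_repetition_penalty({0: 2.4, 1: -1.0, 2: 0.5}, [0, 1]))
```

In a sampling loop, the banned tokens would typically have their scores set to negative infinity before the next token is drawn.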

Implementation of Shokri et al. (2016), Membership Inference Attacks Against Machine Learning Models

  • Prevented the shadow models from overfitting by adding early stopping, regularizing with weight decay, and splitting the data into train/val/test sets.
  • Referenced Carlini et al. (2021) to conduct further research on different types of models and metrics.
  • Reproduced the attack metrics as follows.
| MIA Attack Metrics | Accuracy | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| CIFAR10 | 0.7761 | 0.7593 | 0.8071 | 0.7825 |
| CIFAR100 | 0.9746 | 0.9627 | 0.9875 | 0.9749 |

[Figures: MIA ROC curve for CIFAR10 and MIA ROC curve for CIFAR100]
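
As a quick consistency check on the reported metrics, the F1 score can be recomputed from precision and recall with the standard harmonic-mean formula. The short sketch below uses only the numbers from the table. The helper name `f1_score` is chosen for this example, and because the inputs are rounded to four decimals the result is compared within a small tolerance.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the metrics table.
reported = {
    "CIFAR10": (0.7593, 0.8071, 0.7825),
    "CIFAR100": (0.9627, 0.9875, 0.9749),
}

for name, (precision, recall, reported_f1) in reported.items():
    recomputed = f1_score(precision, recall)
    # Inputs are rounded, so allow half a unit in the fourth decimal place.
    consistent = abs(recomputed - reported_f1) < 5e-4
    print(f"{name}: recomputed={recomputed:.4f} reported={reported_f1} consistent={consistent}")
```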

💬 Natural Language Processing Projects

Paraphrasing tool with round-trip translation utilizing T5 machine translation | 🤗 KoQuillBot Demo & 🤗 Translator Demo

| Direction | BLEU Score | Translation Result |
| --- | --- | --- |
| Korean ➡️ English | 45.15 | 🔗 Inference Result |
| English ➡️ Korean | - | - |

Implementation of Kasai et al. (2020), Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation | 📄 Translation Output

  • Composed custom dataset, trainer, and inference code in pytorch and huggingface.
  • Trained and hosted an encoder-decoder transformer model using huggingface.

| Direction | BLEU Score | Translation Result |
| --- | --- | --- |
| Korean ➡️ English | 35.82 | 🔗 Inference Result |
| English ➡️ Korean | - | - |

Extracting relations between subject and object entities in the KLUE Benchmark dataset | ✍️ Blog Post

  • Finetuned a RoBERTa model according to the RBERT structure in pytorch.
  • Applied stratified k-fold cross-validation in the custom trainer.
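
To make the stratified k-fold idea concrete, here is a small pure-Python sketch that builds folds with roughly equal class proportions. It is only an illustration, not the project's trainer: the function name `stratified_k_fold` is hypothetical, the assignment is a deterministic round-robin without shuffling, and a library implementation would normally be preferred in practice.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def stratified_k_fold(
    labels: Sequence[Hashable], k: int
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, val_indices) splits that preserve class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's samples across the folds in round-robin order.
    folds: List[List[int]] = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, val))
    return splits


if __name__ == "__main__":
    toy_labels = ["a"] * 6 + ["b"] * 3  # toy labels, not project data
    for train_idx, val_idx in stratified_k_fold(toy_labels, k=3):
        print("train:", train_idx, "val:", val_idx)
```

Each validation fold in the toy run contains two samples of class "a" and one of class "b", mirroring the 2:1 ratio of the full label list.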

Sentence generation with given emotion conditions | 🤗 Huggingface Demo

  • Finetuned KoGPT-Trinity with conditional emotion labels.
  • Maintained huggingface hosted model and live demo.

Retrieving and extracting answers from Wikipedia texts for a given question | ✍️ Blog Post

  • Attached bidirectional LSTM layers to the backbone transformers model to extract answers.
  • Split the evaluation into start-token prediction accuracy and end-token prediction accuracy.
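
Scoring start and end positions separately can be sketched as follows. The function name `span_accuracies`, the (start, end) tuple format, and the extra exact-match figure are assumptions for this example rather than details taken from the project.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start_token_index, end_token_index)


def span_accuracies(predicted: List[Span], gold: List[Span]) -> Dict[str, float]:
    """Compute start-token, end-token, and exact-span accuracy."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    total = len(gold)
    start_hits = sum(p[0] == g[0] for p, g in zip(predicted, gold))
    end_hits = sum(p[1] == g[1] for p, g in zip(predicted, gold))
    exact_hits = sum(p == g for p, g in zip(predicted, gold))
    return {
        "start": start_hits / total,
        "end": end_hits / total,
        "exact": exact_hits / total,  # added for illustration only
    }


if __name__ == "__main__":
    # Toy spans, not project results.
    predictions = [(1, 3), (2, 5), (0, 0), (4, 6)]
    references = [(1, 3), (2, 4), (1, 0), (4, 6)]
    print(span_accuracies(predictions, references))
```

Looking at the two numbers separately shows whether a model tends to miss the beginning or the end of an answer span, which a single exact-match score hides.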

Corporate joint project for a mathematics problem classification task | 📄 Presentation

  • Preprocessed a Korean mathematics problems dataset based on EDA.
  • Maintained versions of the preprocessing module.

Created Emotional Instagram Posts (글스타그램, "Geulstagram") dataset | 📄 Presentation

  • Managed version control for the project Github Repository.
  • Converted Korean text in image files into text files using the Google Cloud Vision API.

👀 Computer Vision Projects

Light-weight Neural Network for Optical Braille Recognition in the wild & on the book | 🤗 Huggingface Demo


  • Classified raised braille patterns using multi-label, one-hot encoded targets.
  • Pseudo-labeled the Natural Scene Braille dataset.
  • Trained single-stage YOLO object detection models for braille cell recognition.

Elimination-based Lightweight Neural Net with Pretrained Weights | 📄 Presentation

  • Constructed a lightweight CNN model with fewer than 1M parameters by removing top layers from pretrained CNN models.
  • Assessed on the Trash Annotations in Context (TACO) dataset, sampled for 6 classes with 20,851 images.
  • Compared metrics across VGG11, MobileNetV3 and EfficientNetB0.

Identifying 18 classes from given images: Age Range (3 classes), Biological Sex (2 classes), Face Mask (3 classes) | ✍️ Blog Post

  • Optimized the combination of backbone models, losses and optimizers.
  • Created an additional dataset with labels (age, sex, mask) to resolve class imbalance.
  • Cropped facial characteristics with MTCNN and RetinaFace to reduce noise in the image.

Real-time desk posture classification through webcam | 📷 Demo Video

  • Created a real-time detection window using opencv-python.
  • Converted the image dataset into a Yaw/Pitch/Roll numerical dataset using the RetinaFace model.
  • Trained and optimized a random forest classification model with a precision of 93%.

🕸 Web Projects

Overview of student life at foreign universities | ✈️ Website Demo

  • 3400 Visitors within a year (2021.07 ~ 2022.07)
  • 22000 Pageviews within a year (2021.07 ~ 2022.07)
  • 3 minutes+ of Average Retention Time


  • Collected and preprocessed 11,200 text reviews from the Yonsei website using pandas.
  • Visualized department distribution and weather information using matplotlib.
  • Ran sentiment analysis on satisfaction levels for foreign universities with a pretrained BERT model.
  • Clustered universities by their provided curricula using K-means clustering.
  • Hosted reports on universities using Gatsby.js, GraphQL, and Netlify.

Search-based exercise retrieval web service | 📷 Demo Video

  • Built a retrieval algorithm based on search keywords using TF-IDF.
  • Deployed the website using Docker, AWS RDS, AWS S3, AWS EBS.
  • Constructed the backend using Django, Django ORM & PostgreSQL.
  • Composed the client side using Sass, Tailwind, HTML5.
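
Keyword retrieval with TF-IDF can be illustrated with a small standard-library sketch. This is not the service's code: the whitespace tokenizer, the smoothed IDF formula, the cosine ranking, the function names, and the sample exercise descriptions are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()


def build_index(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Return one TF-IDF vector per document plus the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n_docs = len(docs)
    # Smoothed IDF so that no term ends up with zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity between TF-IDF vectors."""
    vectors, idf = build_index(docs)
    counts = Counter(t for t in tokenize(query) if t in idf)
    total = sum(counts.values())
    if total == 0:
        return []  # no query term appears in the corpus
    query_vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [(docs[i], score) for score, i in ranked[:top_k]]


if __name__ == "__main__":
    exercises = [  # made-up sample data
        "barbell squat for legs",
        "dumbbell curl for biceps",
        "bodyweight squat at home",
    ]
    for doc, score in search("squat legs", exercises):
        print(f"{score:.3f}  {doc}")
```

For a real catalog the index would be built once and stored rather than recomputed on every query.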


💰 Quantitative Finance Projects

Federal Rate Prediction for the next FOMC Meeting

  • Wrangled quantitative dataset with Finance Data Reader.
  • Computed metrics and compared candidate regression models to find an adequate fit.
  • Performed hyperparameter optimization for the candidate models.

Get financial data of public companies involved in spinoff events on Google Spreadsheet | 🧩 Dataset Demo

  • Wrangled the finance dataset that is displayed on Google Sheets.

🏷 Opensource Contributions

Fixed torch version comparison fallback error for source repo of NVIDIA Research | ✍️ Pull Request

  • Skills: torch, torchvision

Updated PostgreSQL initialization for "Quickstart: dockerizing django" documentation | ✍️ Pull Request

  • Skills: Docker, docker-compose, Django

🗄 ETCs

Predicting the spread of COVID-19 in the early stage after its entry into a country.

  • Fixed existing errors on the Github Repository.
  • Wrote footnotes in both English and Korean.
  • ±5% accuracy for one-day prediction.
  • ±10% accuracy for 30-day prediction.

Don't miss concerts for your favorite artists with KakaoTalk Chatbot | 📷 Demo Video

  • Created API server for KakaoTalk chatbot with Flask, Pymongo and MongoDB.
  • Deployed the API server on AWS EC2.
  • Visualized concert schedules on user's Google Calendar.
  • Created / Updated events in Google Calendar.

🛠 Skillsets

Data Analysis and Machine Learning

  • Data Analysis Library: pandas, numpy
  • Deep Learning: pytorch, transformers
  • Machine Learning: scikit-learn, gensim, xgboost

Backend

  • Python / Django - Django ORM, CRUD, OAuth
  • Python / FastAPI(uvicorn) - CRUD API
  • Python / Flask - CRUD API

Client

  • HTML / Pug.js
  • CSS / Sass, Tailwind, Bulma
  • JavaScript / ES6

Deployment

  • Docker, docker-compose
  • AWS EC2, Google Cloud App Engine
  • AWS S3, RDS (PostgreSQL)
  • AWS Elastic Beanstalk, CodePipeline
