veredsil/TCGA-Data-Analysis

 
 

💻 TCGA Genome Data Analysis project 💻

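This project was carried out as an entry in Dataon's 2022 Research Data Analysis and Utilization Competition.

Competition URL: https://dataon-con.kr/pages/about_new.php

Topic - Gexp (Genemarker Expert): machine-learning-based biomarker detection software for multi-class analysis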

👩‍👩‍👧‍👧 Team Info.

Name | Role
김예지 | Ranking and feature importance measurement using modeling
한채은 | Accuracy measurement and comparison using modeling and visualization
이선우 | Data download and file extraction using web crawling
강서연 | Data visualization using heatmaps and clustering

🏆 Awards

🎉 Received the Excellence Award (우수상) 🎉

📋 Pipeline

Scripts

gexp        
├── download_cancer.py     
├── load_labeled_data.py     
├── biomarker_rank.py       
├── plot_stepwise_accuracy.py     
├── describe_genes.py
├── normalize.py        
└── plot_heatmap.py

download_cancer.py

Downloads cancer mRNAseq data from the FireBrowse site (https://firebrowse.org/).

Optional Arguments
 --cancer_list
 --data_source

Example
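A minimal sketch of how these two flags might be parsed. It assumes `--cancer_list` takes one or more TCGA cohort codes and `--data_source` takes a single string; the value formats, the default, and the `build_parser` name are illustrative and not taken from the script itself.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the two documented flags."""
    parser = argparse.ArgumentParser(
        description="Download mRNAseq data per cancer cohort"
    )
    # Assumption: one or more cohort codes separated by spaces, e.g. BRCA LUAD.
    parser.add_argument("--cancer_list", nargs="+", default=[])
    # Assumption: a free-form string naming the source; the default is illustrative.
    parser.add_argument("--data_source", default="firebrowse")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["--cancer_list", "BRCA", "LUAD"])
print(args.cancer_list, args.data_source)
```

The actual download logic is left out on purpose, since the README does not describe how the script talks to FireBrowse.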

load_labeled_data.py

Creates the target variable (class label) during preprocessing.

Optional Arguments
 --data_dir
 --label_list
 --patient_type

Example
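A minimal sketch of deriving a target label from a TCGA sample barcode. It assumes `--patient_type` separates tumor from normal samples using the two-digit sample-type code in the fourth barcode field (01-09 tumor, 10-19 normal), which is the public TCGA barcode convention. Whether the script applies this rule is not confirmed by this README. The function name, label strings, and barcodes are illustrative.

```python
def label_from_barcode(barcode: str) -> str:
    """Return 'tumor', 'normal', or 'other' for a TCGA barcode (hypothetical helper)."""
    fields = barcode.split("-")
    if len(fields) < 4 or not fields[3][:2].isdigit():
        raise ValueError(f"not a sample-level TCGA barcode: {barcode!r}")
    code = int(fields[3][:2])  # e.g. '01A' -> 1
    if 1 <= code <= 9:
        return "tumor"
    if 10 <= code <= 19:
        return "normal"
    return "other"


# Illustrative barcodes: the first has sample type 01, the second 11.
samples = ["TCGA-A1-A0SB-01A-11R-A144-07", "TCGA-BH-A0B3-11A-23R-A089-07"]
targets = {sample: label_from_barcode(sample) for sample in samples}
print(targets)
```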

biomarker_rank.py

Measures and ranks feature importance per model (RandomForest, ExtraTrees, XGBoost, AdaBoost, DecisionTree).

Optional Arguments
 --cancer_df
 --models

Example
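A minimal sketch of combining per-model feature importances into a single ranking. It assumes each model yields a gene-to-importance mapping and that the final order is by mean rank across models. The README does not state how the script aggregates, so this rule, the `rank_genes` name, and the toy scores are all assumptions.

```python
from statistics import mean


def rank_genes(importances):
    """Order genes by their mean rank across models (1 = most important)."""
    per_gene_ranks = {}
    for scores in importances.values():
        # Sort genes by descending importance within one model.
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            per_gene_ranks.setdefault(gene, []).append(rank)
    mean_ranks = {gene: mean(ranks) for gene, ranks in per_gene_ranks.items()}
    # Lowest mean rank first.
    return sorted(mean_ranks.items(), key=lambda item: item[1])


# Toy importances standing in for fitted models' outputs.
toy = {
    "RandomForest": {"GENE_A": 0.5, "GENE_B": 0.3, "GENE_C": 0.2},
    "XGBoost": {"GENE_A": 0.4, "GENE_B": 0.1, "GENE_C": 0.5},
}
ranking = rank_genes(toy)
print(ranking)
```

Averaging ranks rather than raw importances keeps models with differently scaled scores comparable, which is one reason to prefer it in a sketch like this.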

plot_stepwise_accuracy.py

Plots accuracy at each step for each model (RandomForest, MLP).

Optional Arguments
 --cancer_df    
 --ranking_df     
 --model     
 --step_num    
 --metric     
 --multi_class    

Example
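A minimal sketch of the stepwise evaluation loop that could sit behind such a plot. It assumes `--step_num` is the number of genes added per step and that each step scores the top-k genes from the ranking. `evaluate` is a hypothetical stand-in for training and scoring a model, and plotting is omitted.

```python
from typing import Callable, List, Sequence, Tuple


def stepwise_scores(
    ranked_genes: Sequence[str],
    step_num: int,
    evaluate: Callable[[Sequence[str]], float],
) -> List[Tuple[int, float]]:
    """Score the top-k genes for k = step_num, 2*step_num, ... up to all genes."""
    if step_num < 1:
        raise ValueError("step_num must be positive")
    results = []
    for k in range(step_num, len(ranked_genes) + step_num, step_num):
        # The last step is capped so it never exceeds the available genes.
        subset = ranked_genes[: min(k, len(ranked_genes))]
        results.append((len(subset), evaluate(subset)))
    return results


def toy_eval(genes: Sequence[str]) -> float:
    """Toy scorer: pretends accuracy saturates once three genes are included."""
    return min(len(genes), 3) / 3


curve = stepwise_scores(["g1", "g2", "g3", "g4", "g5"], step_num=2, evaluate=toy_eval)
print(curve)
```

The resulting `(gene count, score)` pairs are what one would plot on the x and y axes, one curve per model.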

describe_genes.py

Based on the performance evaluation, shows gene information for as many top-ranked genes as the best-performing model uses.

Optional Arguments
 --score_df       
 --ranking_df        
 --gene_descrip       

Example
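A minimal sketch of this lookup. It assumes `--score_df` maps a gene count to a score, `--ranking_df` is an ordered gene list, and `--gene_descrip` maps gene symbols to descriptions. Plain dictionaries and lists stand in for the data frames, and the gene names and descriptions are placeholders.

```python
def describe_top_genes(scores, ranking, descriptions):
    """Return (gene, description) pairs for the gene count with the best score."""
    # Pick the number of genes whose model scored highest.
    best_n = max(scores, key=scores.get)
    return [
        (gene, descriptions.get(gene, "no description available"))
        for gene in ranking[:best_n]
    ]


# Placeholder inputs: gene count -> score, ranked genes, symbol -> description.
scores = {1: 0.80, 2: 0.95, 3: 0.93}
ranking = ["GENE_A", "GENE_B", "GENE_C"]
descriptions = {"GENE_A": "placeholder description A"}

report = describe_top_genes(scores, ranking, descriptions)
print(report)
```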

plot_heatmap.py

Plots a clustermap of normalized data for the top N biomarker genes.

Optional Arguments
 --cancer_df       
 --ranking_df    
 --topN       
 --vmin      
 --vmax      

Example
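A minimal sketch of a normalization step that could precede the clustermap. It assumes per-gene z-score normalization and that `--vmin` and `--vmax` bound the color scale. Clipping only imitates that bound for illustration, the plotting call is omitted, and the function name and toy values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_clip(values, vmin, vmax):
    """Z-score one gene's expression values, then clip to [vmin, vmax]."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant gene carries no contrast; map it to the center of the scale.
        return [0.0 for _ in values]
    return [max(vmin, min(vmax, (v - mu) / sigma)) for v in values]


# Toy expression values for two genes across three samples.
expr = {"GENE_A": [1.0, 2.0, 3.0], "GENE_B": [5.0, 5.0, 5.0]}
normalized = {
    gene: zscore_clip(vals, vmin=-1.0, vmax=1.0) for gene, vals in expr.items()
}
print(normalized)
```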

About

cancer mRNAseq data analysis