From 1c08ebed3ac7bca787cfb305728e6774347ed9ea Mon Sep 17 00:00:00 2001
From: ASAP_k1ky <46879264+Bue-von-hon@users.noreply.github.com>
Date: Sun, 1 Mar 2020 12:54:43 +0900
Subject: [PATCH] Update README.ko.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

finish!😎😎
---
 README.ko.md | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/README.ko.md b/README.ko.md
index 6325648..672689f 100644
--- a/README.ko.md
+++ b/README.ko.md
@@ -6,19 +6,19 @@ ProteinNet is built on the protein structure prediction competition (CASP)
 CASP is a competition that runs blind tests on protein structures that have already been determined but not yet made public.
 To allow new methods to be evaluated in both data-rich and data-poor regimes, ProteinNet provides a series of data sets spanning a range of sizes.

-#### This is still under development
+**This is still under development**

 The raw data used to build the data sets are not yet available. However, the raw MSA data (4TB) used for ProteinNet 12 can be provided on request. For more details, click [here](https://github.com/aqlaboratory/proteinnet/blob/master/docs/raw_data.md).

-## Motivation
+### Motivation
 Protein structure prediction is one of the hardest problems in biochemistry. It is an important topic in biology and chemistry, but a relatively unfamiliar one in the machine learning community. We suspect there are two reasons for this:
 1. A high barrier to entry
 2. A lack of standardization
 If these two problems are solved, protein structure prediction could become a major area of machine learning, alongside computer vision and speech recognition.
 Just as [ImageNet](http://www.image-net.org) became a driving force behind advances in computer vision, ProteinNet aims to make it easy for anyone to start working on protein structure in machine learning by providing a standardized data set along with standardized training, validation, and test splits.

-## Approach
+### Approach
 The CASP competition is held once every two years. In it, participants from around the world are given protein sequences whose structures have recently been solved but not yet made public. Participants make blind predictions of these structures, which are then assessed for accuracy. The CASP structures therefore provide a standardized benchmark of how well prediction methods perform at a given point in time. The basic idea behind ProteinNet is to piggyback on CASP by using the CASP test sets as its own test sets. ProteinNet complements these test sets with training and validation data that reset the historical record to the conditions preceding each CASP experiment. In particular, ProteinNet restricts the available sequences and structures to those released before each CASP began. This matters because standard databases such as [BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi) do not keep historical versions.
@@ -29,7 +29,7 @@ The CASP competition is held once every two years. In it, participants from around
 These validation sets pose transferability challenges that test how well a model copes with shifts in the data set distribution.
 We have found that ProteinNet's most difficult validation sets are harder than the CASP FM targets.

-## Download
+### Download
 ProteinNet records are provided in two formats: text files that both humans and machines can read (and that can be processed programmatically), and TensorFlow-specific TFRecord files.
 For more information on the file formats, click [here](https://github.com/aqlaboratory/proteinnet/blob/master/docs/proteinnet_records.md#file-formats).
@@ -40,18 +40,24 @@ ProteinNet records are provided in two formats: text files that both humans and
 * The CASP 12 test set is incomplete because some structures are under embargo. It will be released once the embargo ends.

-## Documentation
+### Documentation
 * [ProteinNet Records](docs/proteinnet_records.md)
 * [Splitting Methodology](docs/splitting_methodology.md)
 * [Raw Data](docs/raw_data.md)
 * [FAQ](docs/FAQ.md)

-## PyTorch Parser
-ProteinNet provides an official TensorFlow-based parser.
+### PyTorch Parser
+ProteinNet provides an official TensorFlow-based parser.
 A PyTorch-based parser was written by [Jeppe Hallgren](https://github.com/JeppeHallgren) and is available [here](https://github.com/OpenProtein/openprotein/blob/master/preprocessing.py).

-# One Line a Day Translation
+### Citation
+To cite ProteinNet, please use the paper [here](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-2932-0) (a link to BMC Bioinformatics).

-A translation of a Git page, done for study purposes
+### Acknowledgements
+Building this data set was possible entirely thanks to the [HMS Laboratory of Systems Pharmacology](http://hits.harvard.edu/the-program/laboratory-of-systems-pharmacology/about/), the [Harvard Program in Therapeutic Science](http://hits.harvard.edu/the-program/program-in-regulatory-science/about/), and the [Research Computing](https://rc.hms.harvard.edu) group at [Harvard Medical School](https://hms.harvard.edu). We also thank [Martin Steinegger](https://github.com/martin-steinegger) and [Milot Mirdita](https://github.com/milot-mirdita) for their extensive help with the MMseqs2 and HHblits software packages, [Sergey Ovchinnikov](http://site.solab.org/) for providing metagenomic sequences, [Andriy Kryshtafovych](http://predictioncenter.org/people/kryshtafovych/index.cgi) for his help with CASP data, and [Sean Eddy](https://github.com/cryptogenomicon) for his help with the HMMer software package.
+This data set is hosted by the [HMS Research Information Technology Solutions](https://rits.hms.harvard.edu) group at Harvard University.
+
+### Funding
+This project is funded by NIGMS grant P50GM107618 and NCI grant U54-CA225088.

 # Translated to Korean by Bue-Von-hon, hoping it will be helpful...