Skip to content

human genome on tokyo cabinet for sub-sequence extraction

Notifications You must be signed in to change notification settings

nakao/tc-subseq

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

41 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

= tc-subseq =

Human genome sequence on tokyo cabinet for fast sub-sequence exstraction

== Usage ==

# Download chromFa.zip from UCSC.
rake data:hg

# Create TCF db from chr*.fa
rake tcf:create

# Check data on chr1.tcf TCF db
rake tcf:demo

# Extract sub-sequence specified by chromosome:START,STOP
ruby tcf_subseq.rb chr1:100010,100100
GTGTTGAGCTTAGTAAGTCACCAAACACCTTCTGCTCAGCAGCATAAAGGACATTTCCATGAAACCTCCCAGGGATAATCTTATTTACTCT


== Installation ==

rake test:subseq
rake data:hg
rake tcf:all

rake benchmark:tcf 
rake benchmark:chr1


== Description ==


== Statistics ==

* Size of nucleotide: 
* Size of fasta files: 3098800kb
* Size of tsv files:   3357404kb
* Size of index files:  3099084kb

 239M chr1.fa
 275M chr1.tsv
 239M chr1.tcf


* Benchmark

** TCFDB

$ rake benchmark:tcf:length -s
                       user     system      total        real
TCF   1 nt/440:    0.010000   0.020000   0.030000 (  0.045681)
TCF  10 nt/440:    0.020000   0.030000   0.050000 (  0.046877)
TCF 100 nt/440:    0.040000   0.030000   0.070000 (  0.071411)
TC F 1k nt/440:    0.080000   0.030000   0.110000 (  0.110665)
TCF 10k nt/440:    0.370000   0.030000   0.400000 (  0.423921)


$ rake benchmark:tcf:chr1 -s
                   user     system      total        real
14928 gets:    1.520000   0.820000   2.340000 (  2.369265)


** TCHDB

$ rake benchmark:tch:length -s
                       user     system      total        real
TCH   1 nt/440:    0.020000   0.030000   0.050000 (  0.079173)
TCH  10 nt/440:    0.030000   0.040000   0.070000 (  0.075760)
TCH 100 nt/440:    0.050000   0.040000   0.090000 (  0.105880)
TCH  1k nt/440:    0.100000   0.040000   0.140000 (  0.152413)
TCH 10k nt/440:    0.430000   0.070000   0.500000 (  0.531710)


$ rake benchmark:tch:chr1 -s
                   user     system      total        real
14928 gets:    1.970000   1.310000   3.280000 (  3.647801)



== Dependency ==

1. Ruby https://ruby-lang.org
2. Tokyo Cabinet https://tokyocabinet.sourceforge.net/
2. Tokyo Cabinet Ruby https://tokyocabinet.sourceforge.net/perlpkg/
3. curl
4. unzip

== Author ==

Mitsuteru Nakao <n AT bioruby.org>

== Licence ==

Same as Tokyo Cabinet, LGPL.

== Acknowledgement ==

The author would like to thank Dr. Itoshi Nikaido for his valuable comments.

About

human genome on tokyo cabinet for sub-sequence extraction

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages