Julia parser for the Stockholm file format (.sto) used for multiple sequence alignments (Pfam, Rfam, etc)

marcom/BioStockholm.jl

BioStockholm.jl

Julia parser for the Stockholm file format (.sto), which is used for multiple sequence alignments of protein, RNA, or DNA sequences by databases such as Pfam and Rfam. This package uses Automa.jl under the hood to generate a finite state machine parser.
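
To give a rough idea of what a Stockholm parser has to recognise, the sketch below classifies the lines of a small Stockholm-formatted string by record type. It is a hand-written illustration in plain Julia, not the Automa-generated parser used by this package; `classify_line` and the example text are hypothetical and exist only for this sketch.

```julia
# Classify one line of a Stockholm file by its record type.
# Hypothetical helper for illustration; not part of BioStockholm.jl.
function classify_line(line::AbstractString)
    s = strip(line)
    if isempty(s)
        return :blank
    elseif startswith(s, "# STOCKHOLM")
        return :header        # format/version line, e.g. "# STOCKHOLM 1.0"
    elseif s == "//"
        return :terminator    # marks the end of the alignment
    elseif startswith(s, "#=GF")
        return :GF            # per-file annotation
    elseif startswith(s, "#=GS")
        return :GS            # per-sequence annotation
    elseif startswith(s, "#=GR")
        return :GR            # per-residue annotation for one sequence
    elseif startswith(s, "#=GC")
        return :GC            # per-column annotation for the alignment
    elseif startswith(s, "#")
        return :comment
    else
        return :sequence      # "<name> <aligned sequence>"
    end
end

# A small example alignment (assumed content, for illustration only).
example = """
# STOCKHOLM 1.0
#=GF DE example alignment
human        ACACGCGAAA.GCGCAA.CAAACGUGCACGG
chimp        GAAUGUGAAAAACACCA.CUCUUGAGGACCU
#=GC SS_cons ...<<<.....>>>....<<....>>.....
//
"""

kinds = [classify_line(l) for l in split(example, '\n'; keepempty=false)]
println(kinds)
```

A real parser also has to validate each record and collect its contents, which is the part a generated state machine handles.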

Installation

From the Julia REPL, press ] to enter package mode, then install with:

add BioStockholm

Usage

using BioStockholm

msa = MSA{Char}(;
    seq = Dict("human"   => "ACACGCGAAA.GCGCAA.CAAACGUGCACGG",
               "chimp"   => "GAAUGUGAAAAACACCA.CUCUUGAGGACCU",
               "bigfoot" => "UUGAG.UUCG..CUCGUUUUCUCGAGUACAC"),
     GC = Dict("SS_cons" => "...<<<.....>>>....<<....>>.....")
)

# locate example2.sto, an example Stockholm file shipped in the
# package's test directory, and print its raw contents
msa_path = joinpath(dirname(pathof(BioStockholm)), "..",
                    "test", "example2.sto")
msa_str = read(msa_path, String)
print(msa_str)

# read from a file or parse from a String
msa = read(msa_path, MSA)
msa = parse(MSA, msa_str)

# write to a file
write("foobar.sto", msa)

# pretty-print
print(msa)
print(stdout, msa)

Limitations / TODO

  • when writing, long sequences and long text are never split over multiple lines
  • integrate with BioJulia string types
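
For the first limitation, the sketch below shows what splitting long sequences into fixed-width blocks (the interleaved Stockholm layout) could look like. It is plain Julia and independent of this package: `wrap_columns` and `interleave` are hypothetical helpers, not BioStockholm.jl functions, and the sketch assumes all sequences have the same aligned length.

```julia
# Split a sequence into consecutive chunks of at most `width` characters.
# Hypothetical helper; not provided by BioStockholm.jl.
function wrap_columns(s::AbstractString, width::Integer)
    width > 0 || throw(ArgumentError("width must be positive"))
    chars = collect(s)
    return [String(chars[i:min(i + width - 1, end)]) for i in 1:width:length(chars)]
end

# Render name => sequence pairs as interleaved Stockholm text,
# with at most `width` alignment columns per block.
# Assumes every sequence has the same (aligned) length.
function interleave(seqs, width::Integer)
    wrapped = [(name, wrap_columns(seq, width)) for (name, seq) in seqs]
    nblocks = maximum(length(chunks) for (_, chunks) in wrapped)
    namewidth = maximum(length(name) for (name, _) in wrapped)
    io = IOBuffer()
    println(io, "# STOCKHOLM 1.0")
    for b in 1:nblocks
        for (name, chunks) in wrapped
            println(io, rpad(name, namewidth), " ", chunks[b])
        end
        b < nblocks && println(io)   # blank line between blocks
    end
    println(io, "//")
    return String(take!(io))
end

seqs = ["human" => "ACACGCGAAA.GCGCAA.CAAACGUGCACGG",
        "chimp" => "GAAUGUGAAAAACACCA.CUCUUGAGGACCU"]
chunks = wrap_columns(last(seqs[1]), 10)
text = interleave(seqs, 10)
print(text)
```

Concatenating the chunks of each sequence in block order recovers the original aligned sequence, so no alignment columns are lost by wrapping.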

Related packages

MIToS.jl is a package for analysing protein sequences that also supports parsing the Stockholm format, among many other features.
