Unofficial Julia interface to the ViennaRNA library for RNA secondary structure prediction and analysis. Please cite the original ViennaRNA publications if you use this library.
Install ViennaRNA from the Julia package REPL, which can be accessed
by pressing ]
from the Julia REPL:
add ViennaRNA
using ViennaRNA, Unitful
The Unitful
library is needed to be able to specify units with
@u_str
, e.g. 4u"kcal/mol"
or 37u"°C"
. You can get the degree
symbol °
by typing \degree
and pressing the TAB key in the REPL or
in an editor with Julia syntax support.
The original C API functions can be found in the submodule
ViennaRNA.LibRNA
. Most functions can be called with a String
containing the RNA sequence instead of a FoldCompound
,
e.g. mfe("GGGAAACCC")
.
A FoldCompound
encapsulates nucleic acid strands and model details,
such as energy parameters, temperature, etc.
fc = FoldCompound("GGGGGAAAAACCCCCC";
options=[:mfe, :pf],
temperature=37u"°C",
uniq_ML=true,
circular=false)
Important keyword arguments
-
options
is a subset of[:default, :eval_only, :hybrid, :mfe, :pf, :window]
. -
temperature
is used to rescale the free energies with the formulaΔG = ΔH - TΔS
(the energy parameter sets contain enthalpy and entropy contributions). The default is37u"°C"
Model details (additional keyword arguments):
circular
: determines if the RNA strand is circular, i.e. the 5'-end and 3'-end are covalently bonded. Default isfalse
.dangles
: how to treat dangling base pairs in multiloops and the exterior loop. Can be 0, 1, 2, or 3. See ViennaRNA docs for details. Default is2
.gquadruplex
: allow G-quadruplexes in predictions. Default isfalse
.log_ML
: use logarithmic energy model for multiloops. Default isfalse
.max_bp_span
: maximum number of bases over which a basepair can span. Default value is-1
(which means unlimited).min_loop_length
: the minimum size of a loop (without the closing base pair). Default is3
.no_GU_basepairs
: disallow G-U basepairs. Default isfalse
.no_GU_closure
: disallow G-U basepairs as closing pairs for loops. Default isfalse
.no_lonely_pairs
: disallow isolated base pairs. Default isfalse
.special_hairpins
: use special hairpin energies for certain tri-, tetra- and hexloops. Default istrue
.uniq_ML
: use unique decomposition for multiloops, needed forsample_structures
andsubopt
. Default isfalse
.window_size
: window size to be used for local calculations performed in a window moving over the sequence. This value is ignored unless the:window
option is set in theFoldCompound
options
. The default value forwindow_size
is-1
.
ViennaRNA stores energy parameters in global variables after loading
them from a file. Each time a new FoldCompound
is created, the
parameters are copied from the global variables and saved inside the
FoldCompound
.
The global variables storing energy parameters can be changed by
calling a function specific to each parameter set, or via a Symbol
with ViennaRNA.params_load(:RNA_Turner1999)
. Subsequent calls to
FoldCompound
will use the new parameters and store a copy of the
parameters in the newly created FoldCompound
.
The default energy set loaded on startup is :RNA_Turner2004
.
ViennaRNA.params_load_defaults() # default is :RNA_Turner2004
ViennaRNA.params_load_DNA_Mathews1999()
ViennaRNA.params_load_DNA_Mathews2004()
ViennaRNA.params_load_RNA_Andronescu2007()
ViennaRNA.params_load_RNA_Langdon2018()
ViennaRNA.params_load_RNA_Turner1999()
ViennaRNA.params_load_RNA_Turner2004()
ViennaRNA.params_load_RNA_misc_special_hairpins()
ViennaRNA.params_load(:DNA_Mathews2004)
# Options are
# :DNA_Mathews1999, :DNA_Mathews2004,
# :RNA_Andronescu2007, :RNA_Langdon2018,
# :RNA_Turner1999, :RNA_Turner2004
Mutiple strands can be given by separating them with an &
, e.g.
FoldCompound("GGGG&CCCC")
.
Pass multiple sequences to the FoldCompound constructor for comparative mode (alifold):
FoldCompound(["GG-GAAAACCCC", "GCCGAAA-CGGC"])
.
It is currently not possible to have multiple strands in alifold mode.
# please excuse the excess precision printed when displaying -9.4 kcal/mol
mfe(fc) # => ("(((((.....))))).", -9.399999618530273 kcal mol^-1)
partfn(fc) # => ("(((((.....})))),", -9.81672180213034 kcal mol^-1)
energy(fc, "((((.......)))).") # => -6.199999809265137 kcal mol^-1
bpp(fc) # => 16×16 Matrix{Float64}
prob_of_structure(fc, "(((((.....))))).") # => 0.5085737925408758
ensemble_defect(fc, "(((((.....))))).") # => 0.33085374128228884
Sample from Boltzmann ensemble of secondary structures
sample_structures(fc) # => [ "((((......)))).." ]
sample_structures(fc; options=:nonredundant,
num_samples=20) # => 20-element Vector{String}
All suboptimal structures with energies delta above the MFE structure
subopt(fc; delta=4u"kcal/mol") # => Vector{Tuple{String, Quantity}}
Suboptimal structures with the method of Zuker
subopt_zuker(fc) # => Vector{Tuple{String, Quantity}}
mfe_window
saves all the results in a Vector
.
seq = "G"^50 * "A"^4 * "C"^50
mfe_window(seq; window_size=30)
fc = FoldCompound(seq; options=[:default, :window], window_size=30)
mfe_window(fc) # => Vector{ResultWindowMFE}
mfe_window_channel
returns a Channel
that can be used to
iteratively process the results.
seq = "G"^50 * "A"^4 * "C"^50
chan = mfe_window_channel(seq; window_size=30)
take!(chan)
fc = FoldCompound(seq; options=[:default, :window], window_size=30)
chan = mfe_window_channel(fc)
take!(chan)
Move set to reach neighboring structures of a given structure
neighbors(fc, Pairtable("((.....))")) # => Vector{Vector{Tuple{Int,Int}}}
bp_distance("....", "(())") # => 2
tree_edit_dist("(..)", "....") # => 4.0f0
Mean basepair distance of all structures to each other, weighted by the structure's Boltzmann probabilities
mean_bp_distance(fc) # => 5.266430215905888
Centroid structure of ensemble: structure with smallest sum of base-pair distances weighted by Boltzmann probabilities:
centroid(fc) # => ("(((((.....))))).", 4.799131457924728)
The gamma parameter trades off specificity (low gamma) and sensitivity (high gamma).
mea(fc; gamma=1.0) # => ("(((((.....))))).", 10.706348f0)
# starting temperature, end temperature, temperature increment
heat_capacity(fc, 10u"°C", 60u"°C") # => Vector{Tuple{Quantity,Quantity}}
heat_capacity(fc, 10u"°C", 60u"°C", 1u"°C"; mpoints=5)
# plot coordinates of a secondary structure, returns two arrays with
# x and y coordinates
plot_coords("(((...)))") # => Tuple{Float32[], Float32[]}
See PlotRNA.jl for more secondary structure plotting functionality.
inverse_fold("AAAAAAA", "((...))") a# => ("GCAAAGC", 2.0f0)
inverse_pf_fold("AAAAAAA", "((...))") # => ("GCCAAGC", 2.0244526863098145 kcal mol^-1)
ViennaRNA.init_rand_seed(42)
Energy parameters for modified bases can be used via ViennaRNA's soft constraints mechanism.
using ViennaRNA
fc = FoldCompound("AAACCCUUU")
partfn(fc) # -0.0025467473022687203 kcal mol^-1
ViennaRNA.sc_mod_pseudouridine!(fc, [7,8,9]) # modify positions 7, 8, 9
partfn(fc) # -0.004713416050703315 kcal mol^-1
These functions are currently available:
sc_mod_7DA!
sc_mod_dihydrouridine!
sc_mod_inosine!
sc_mod_m6A!
sc_mod_pseudouridine!
sc_mod_purine!
Please refer to the ViennaRNA section on modified bases for more details.
When creating many FoldCompound
s, running finalize
manually will
avoid excessive memory buildup.
for i = 1:100_000
fc = FoldCompound("ACGU")
# do something with fc
finalize(fc)
end