Skip to content

Commit

Permalink
add statistical measurement function
Browse files Browse the repository at this point in the history
  • Loading branch information
wangqinhu committed Jul 2, 2014
1 parent 8aeaf26 commit a901e7b
Show file tree
Hide file tree
Showing 3 changed files with 159 additions and 10 deletions.
Binary file modified doc/manual.pdf
Binary file not shown.
21 changes: 13 additions & 8 deletions doc/src/manual.tex
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ \section{How to install}

\subsection{Dependencies}

tsRFinder depended on the following programmes, please check and install the following \footnote{The dependency versions were based on the oldest test enviroment we have.} at first:
tsRFinder depends on the following programmes, please check and install them \footnote{The dependency versions were based on the oldest test enviroment we have.} at first:


\begin{itemize}
Expand All @@ -39,7 +39,7 @@ \subsection{Dependencies}
\item tRNAscan-SE, greater than v1.3.1, required, for tRNA prediction. \\https://lowelab.ucsc.edu/tRNAscan-SE
\item bowtie, greater than v1.0.0, required, for small RNA mapping. \\https://github.com/BenLangmead/bowtie
\item R, greater than v2.15.2, required, for small RNA data analysis and illustration. \\https://www.r-project.org
\item fastx\_toolkit, greater than v0.0.14, optional if you have already processed the raw sequencing data yourself. \\http:https://hannonlab.cshl.edu/fastx\_toolkit/
\item fastx\_toolkit, greater than v0.0.14, optional if you have already processed the raw sequencing data yourself. \\https:https://github.com/agordon/fastx\_toolkit

\end{itemize}

Expand All @@ -62,7 +62,7 @@ \subsection{Installation}

and then unzip master file.

When tsRFinder is cloned or unpacked, move the entire directory to an proper place and add the tsRFinder path to the environment. For example, if tsRFinder is placed in /Users/wangqinhu/tsRFinder, then typing following in the terminal if you are using bash.
When tsRFinder is cloned or unpacked, move the entire directory to an proper place and add the tsRFinder path to the environment. For example, if tsRFinder is placed in /your/path/of/tsRFinder, then type the following in the terminal if you are using bash.

{\small \begin{verbatim}
echo export PATH="/your/path/of/tsRFinder:$PATH" >> ~/.bashrc
Expand All @@ -84,7 +84,7 @@ \subsection{Running the pipeline}

tsRFinder supplies two ways for inputs, you can use both configuration file or command line option. We recommend you use a command line option for debugging and building your configuration file. Once your inputs had been determined, you can write it to a configuration file for your analysis.

If tsRFinder is properly installed, you can run tsRFinder from your terminal directly, see the usage of tsRFinder.
If tsRFinder is properly installed, you can run tsRFinder from your terminal directly, see the usage below.

{\small \begin{verbatim}
tsRFinder usage:
Expand All @@ -97,8 +97,8 @@ \subsection{Running the pipeline}
-t Reference tRNA sequence
-s Small RNA sequence
-a Adaptor sequence
-n min read length
-x max read length
-n Min read length
-x Max read length
-h Help
-v Version
Expand Down Expand Up @@ -189,14 +189,19 @@ \subsection{Demo output}
distribution : /Users/wangqinhu/tsRFinder/Abc/distribution.pdf
stat. by BDI :
Sensitivity : 0.940025252525252
Specificity : 0.783809523809524
Accuracy : 0.876447574334898
---------
\end{verbatim}
}

\begin{figure}[htbp]
\begin{center}
\includegraphics[width=12cm]{distribution.pdf}
\caption{Screenshot of color tmap}
\caption{Small RNA and tRNA reads distribution}
\label{distribution}
\end{center}
\end{figure}
Expand All @@ -223,7 +228,7 @@ \subsection{Visualization of tmap data}
\end{center}
\end{figure}

If you preferred a plain text view without highlighting, just open tsRNA.tmap with any kind of text editors you have.
If you prefer plain text view without highlighting, just open tsRNA.tmap with any kind of text editors you have.

\section{FAQ}

Expand Down
148 changes: 146 additions & 2 deletions tsRFinder.pl
Original file line number Diff line number Diff line change
Expand Up @@ -981,6 +981,9 @@ sub write_report {
# sRNA/tRNA distribution
print_log("distribution : $tsR_dir/$label/distribution.pdf\n");

# statistical measurement
stat_index();

print_log("---------");

}
Expand Down Expand Up @@ -1046,6 +1049,147 @@ sub srna_len_stat {

}

sub stat_index {

my ($sm_hash) = statistical_measure("$tsR_dir/$label/tsRNA.tmap");

my @sen = ();
my @spe = ();
my @acc = ();
my $i = 0;

foreach (keys $sm_hash) {
my $stat = $sm_hash->{$_};
my ($tp, $fn, $fp, $tn) = split /\t/, $stat;
$sen[$i] = sensitivity($tp, $fn);
$spe[$i] = specificity($tn, $fp);
$acc[$i] = accuracy($tp, $fn, $fp, $tn);
$i++;
}

my $sen = mean(@sen);
my $spe = mean(@spe);
my $acc = mean(@acc);

print_log("stat. by BDI :");
print_log(" Sensitivity : $sen");
print_log(" Specificity : $spe");
print_log(" Accuracy : $acc\n");

}

# Math: mean
sub mean {

my @num = @_;

my $mean = 0;
foreach (@num) {
$mean += $_;
}
$mean /= @num;

return $mean;

}

# True postive rate
sub sensitivity {

my ($tp, $fn) = @_;
my $sen = $tp / ($tp + $fn);
return $sen;

}

# True nagtive rate
sub specificity {

my ($tn, $fp) = @_;
my $spe = $tn / ($fp + $tn);
return $spe;

}

# Prediction accuracy
sub accuracy {

my ($tp, $fn, $fp, $tn) = @_;
my $acc = ($tp + $tn) / ($tp + $fn + $fp + $tn);
return $acc;

}

# Caculate ture/false postive/negative
sub statistical_measure {

my ( $tmap ) = @_;

my %tmap = ();
my $id = undef;
my ($tp, $fn, $fp, $tn) = (0, 0, 0, 0);

open (IN, $tmap) or die "Cannot open file $tmap: $!\n";
while (<IN>) {
if ( /^\<tmap/ ) {
$id = <IN>;
chomp $id;
my ($discard, $id) = split /\t/, $id;
my $dicard = <IN>;
my $seq1 = <IN>;
my $seq2 = undef;
my $bdi = undef;
my $next = <IN>;
if ( $next =~ /\*/) {
$seq2 = $next;
$bdi = <IN>;
} else {
$bdi = $next;
}
($seq1, $discard) = split /\t/, $seq1;
if (defined($seq2)) {
($seq2, $discard) = split /\t/, $seq2;
}
($bdi, $discard) = split /\t/, $bdi;
for (my $i = 0; $i< length($seq1); $i++) {
my $cbdi = substr($bdi,$i,1);
# if bdi > 1
if ( $cbdi >= 1 ) {
# if has tsRNA base => true postive
if (substr($seq1,$i,1) ne "*" ) {
$tp++;
} elsif (defined($seq2) && substr($seq2,$i,1) ne "*") {
$tp++;
# if has no tsRNA base => false negative
} else {
$fn++;
}
# if bdi = 0
} else {
# if has tsRNA base => false postive
if (substr($seq1,$i,1) ne "*" ) {
$fp++;
} elsif (defined($seq2) && substr($seq2,$i,1) ne "*") {
$fp++;
# if has no tsRNA base => true negative
} else {
$tn++;
}
}

#reset current bdi
$cbdi = 0;
}
$tmap{$id} = "$tp\t$fn\t$fp\t$tn";
# reset vars
($tp, $fn, $fp, $tn) = (0, 0, 0, 0);
}
}

return (\%tmap);

}

# usage
sub usage {

Expand All @@ -1061,8 +1205,8 @@ sub usage {
-t Reference tRNA sequence
-s Small RNA sequence
-a Adaptor sequence
-n min read length
-x max read length
-n Min read length
-x Max read length
-h Help
-v Version
Expand Down

0 comments on commit a901e7b

Please sign in to comment.