
mostafa-ti/Bioinformatics

Dependencies:
Install Python 3 and pip3.

Install matplotlib (this provides matplotlib.pyplot; mpl_toolkits is bundled with matplotlib, so it needs no separate install):
$ pip3 install matplotlib

Install SciPy:
$ pip3 install scipy

Install NumPy:
$ pip3 install numpy

Bioinformatics

Description:
This Python script parses a genome using two integers, a window size and a step size. For each window it calculates the coordinates of a node from the Xn, Yn and Zn components, and the resulting nodes together form a unique representation of the DNA sequence.

Definition of Z-curve components:
Xn is the distribution of Purines (A,G) and Pyrimidines (C,T)
Xn = (An+Gn) - (Cn+Tn)

Yn is the distribution of Amino (A,C) and Keto (G,T)
Yn = (An+Cn) - (Gn+Tn)

Zn is the distribution of Weak hydrogen bonds (A,T) and Strong hydrogen bonds (C,G)
Zn = (An+Tn) - (Cn+Gn)
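The three formulas can be evaluated in a single pass over the sequence. The sketch below is not the repository's Z_curve function: it assumes that An, Cn, Gn and Tn are cumulative counts of each base in the first n nucleotides (the usual Z-curve convention) and that characters other than A, C, G and T (such as N) are skipped. The helper name z_curve is hypothetical.

```python
def z_curve(seq):
    """Return lists of Xn, Yn, Zn for each valid base of seq.

    Assumption: An, Cn, Gn, Tn are cumulative counts over the first n
    bases; non-ACGT characters (e.g. N) are skipped and do not advance n.
    """
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    xs, ys, zs = [], [], []
    for base in seq.upper():
        if base not in counts:
            continue  # ignore ambiguous bases such as N
        counts[base] += 1
        a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
        xs.append((a + g) - (c + t))  # purines minus pyrimidines
        ys.append((a + c) - (g + t))  # amino minus keto
        zs.append((a + t) - (c + g))  # weak minus strong hydrogen bonds
    return xs, ys, zs


x, y, z = z_curve("ATGC")
print(x, y, z)
```

For "ATGC" this gives Xn = [1, 0, 1, 0], Yn = [1, 0, -1, 0] and Zn = [1, 2, 1, 0]; each new base moves the curve by exactly one unit in every component.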

List of functions:
The program defines four functions:

  • readfasta: reads the FASTA file.
  • frequency: counts the abundance of each nucleotide.
  • slide_win: splits the sequence into windows of a given size, moving
    the window forward by a given step. Both values are supplied by the user.
  • Z_curve: calculates the X, Y and Z components of the Z-curve from the formulas above.
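A minimal sketch of how the four helpers could fit together. The signatures are guesses for illustration, not the script's actual code: readfasta is assumed to take an open file handle, frequency to return a dictionary of base counts, slide_win to yield (start, window) pairs and drop any trailing partial window, and Z_curve to apply the formulas to the counts of a single window.

```python
import io


def readfasta(handle):
    """Join all sequence lines of a FASTA file, skipping '>' header lines."""
    return "".join(
        line.strip().upper()
        for line in handle
        if line.strip() and not line.startswith(">")
    )


def frequency(seq):
    """Count A, C, G and T in a sequence (other characters are ignored)."""
    return {base: seq.count(base) for base in "ACGT"}


def slide_win(seq, window, step):
    """Yield (start, subsequence) for every full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def Z_curve(counts):
    """Return (X, Y, Z) for one window from its base counts."""
    a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
    return (a + g) - (c + t), (a + c) - (g + t), (a + t) - (c + g)


# In-memory FASTA so the example is self-contained.
fasta = io.StringIO(">toy\nAAGT\nCCGA\n")
seq = readfasta(fasta)
for start, win in slide_win(seq, window=4, step=2):
    print(start, Z_curve(frequency(win)))
```

With a real file, the handle would come from open("fasta_file.fna"). Here each node uses counts within its own window rather than cumulative counts; this README does not say which convention the script follows.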

Finally, the Z-curves are smoothed using the interpolation module (scipy.interpolate) from the SciPy library.
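The README does not say which SciPy interpolation routine is used. The sketch below assumes a cubic B-spline built with scipy.interpolate.make_interp_spline and evaluated on a denser grid of window start positions; the helper name smooth_curve and its parameters are hypothetical.

```python
import numpy as np
from scipy.interpolate import make_interp_spline


def smooth_curve(positions, values, points=300, k=3):
    """Return a denser (x, y) curve that passes through every input point.

    positions must be strictly increasing and contain at least k + 1 values.
    """
    x = np.asarray(positions, dtype=float)
    y = np.asarray(values, dtype=float)
    spline = make_interp_spline(x, y, k=k)  # interpolating cubic B-spline
    dense_x = np.linspace(x.min(), x.max(), points)
    return dense_x, spline(dense_x)


# Hypothetical per-window X values at window start positions 0, 2, 4, 6.
starts = [0, 2, 4, 6]
x_component = [2, -2, 0, 1]
dense_x, dense_y = smooth_curve(starts, x_component)
```

Because an interpolating spline passes through every node, this smooths the appearance of the plotted curve rather than filtering noise out of the values.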

Procedure:
Program name: Z_curve.py
Input: a FASTA file, e.g. fasta_file.fna

Usage:
./Z_curve.py input.fna [integer window size] [integer step]
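The argument handling inside Z_curve.py is not shown in this README. A hedged sketch of parsing the three positional arguments, rejecting non-integer or non-positive window and step values (the function name parse_args is hypothetical):

```python
import sys


def parse_args(argv=None):
    """Parse `Z_curve.py input.fna <window size> <step>` from an argv-style list."""
    argv = sys.argv if argv is None else argv
    if len(argv) != 4:
        raise SystemExit("usage: ./Z_curve.py input.fna <window size> <step>")
    path = argv[1]
    try:
        window, step = int(argv[2]), int(argv[3])
    except ValueError:
        raise SystemExit("window size and step must be integers") from None
    if window <= 0 or step <= 0:
        raise SystemExit("window size and step must be positive")
    return path, window, step
```

For example, running ./Z_curve.py genome.fna 1000 500 (genome.fna is a placeholder) would make parse_args() return ("genome.fna", 1000, 500), i.e. 1000-nt windows moved forward 500 nt at a time.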
