Triplet Codon Block Shannon Entropy (TCBShE) in terms of GC(1,2,3)% equates to Napier's Constant for Model Organisms, and Harmonically Averages to same approximately: a Penta-Clado-genic Quantitative Survey across ~14.45 million Transcripts Clustered by 1118 Species

bioinformer/GC123e

GC123eHM

Triplet Codon Block Shannon Entropy (TCBShE) in terms of GC(1,2,3)% equates to Napier’s Constant for Model Organisms, and Harmonically Averages to same approximately: a Penta-Clado-genic Quantitative Survey across ~14.45 million Transcripts Clustered by 1118 Species

https://mathworld.wolfram.com/e.html

Significance of HM (the harmonic mean of a set of entropy values): it can be expressed as the ratio of mutual information to the complement of the normalized variation of information (reference: https://stats.stackexchange.com/questions/393386/why-not-normalizing-mutual-information-with-harmonic-mean-of-entropies).
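As a minimal sketch of what harmonically averaging a set of entropy values means, the snippet below computes the harmonic mean both by hand and with Python's standard library. The entropy values are hypothetical placeholders, not results from this survey, and the function name is an illustration only.

```python
import statistics

def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of reciprocals (all values must be > 0)."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs a non-empty set of positive values")
    return len(values) / sum(1.0 / v for v in values)

# Hypothetical per-species entropy values in bits (placeholders, not survey data).
hypothetical_entropies = [2.70, 2.72, 2.74]

hm_manual = harmonic_mean(hypothetical_entropies)
hm_stdlib = statistics.harmonic_mean(hypothetical_entropies)
print(f"harmonic mean (manual): {hm_manual:.5f}")
print(f"harmonic mean (stdlib): {hm_stdlib:.5f}")
```

The harmonic mean is never larger than the arithmetic mean of the same positive values, so it is pulled toward the smaller entropies in the set.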

An explanation of the triplet-block entropy concept using a "biased" coin toss with probability P of heads (H) = 1/4 and P of tails (T) = 3/4, giving 2^3 = 8 binary states for a block of three tosses (0 = H, 1 = T): https://youtu.be/B3dVuP0Kzg0
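The coin-toss setup can be sketched in a few lines of Python. This is an illustration only: it assumes the three tosses in a block are independent (an assumption not stated here), and the variable names are hypothetical rather than taken from the pipeline.

```python
import math
from itertools import product

def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

P_HEADS, P_TAILS = 0.25, 0.75  # biased coin: 0 = H, 1 = T

# Entropy of a single toss.
single_toss = shannon_entropy([P_HEADS, P_TAILS])

# Enumerate the 2^3 = 8 triplet states and their probabilities,
# ASSUMING the three tosses are independent.
block_probs = {}
for block in product("HT", repeat=3):
    prob = 1.0
    for symbol in block:
        prob *= P_HEADS if symbol == "H" else P_TAILS
    block_probs["".join(block)] = prob

block_entropy = shannon_entropy(block_probs.values())

print(f"entropy per toss     : {single_toss:.4f} bits")
print(f"3 x entropy per toss : {3 * single_toss:.4f} bits")
print(f"triplet-block entropy: {block_entropy:.4f} bits")
```

Under the independence assumption, the unrounded block entropy equals three times the per-toss entropy (about 2.434 bits); figures quoted elsewhere for this example may reflect rounding or a different estimation procedure.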

Base paper for this work (source of the GC-1%, GC-2% and GC-3% datasets): Dapeng Wang, GCevobase: an evolution-based database for GC content in eukaryotic genomes, Bioinformatics, Volume 34, Issue 12, 15 June 2018, Pages 2129–2131, https://doi.org/10.1093/bioinformatics/bty068 and https://www.nextgenbioinformatics.org/GCevobase/

First author's self-citations in this context, and the IEEE Information Theory Society profile of the ongoing work: https://www.itsoc.org/profile/9590

Run the TCBShE pipeline yourself! Omit the leading "$" when typing the commands at the command line (CLI); it is just the shell prompt.


$ git clone https://github.com/bioinformer/GC123e.git
Cloning into 'GC123e'...
remote: Enumerating objects: 271, done.
remote: Counting objects: 100% (271/271), done.
remote: Compressing objects: 100% (268/268), done.
remote: Total 271 (delta 146), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (271/271), 74.25 KiB | 1.95 MiB/s, done.
Resolving deltas: 100% (146/146), done.

$ cd GC123e/
$ chmod a+x 0_TCBShE_Run_PipeLine.sh 
$ bash 0_TCBShE_Run_PipeLine.sh 

Run the "9d" Rscript one line at a time, preferably in RStudio with R 4.0.4 (exclude the '>' prompt).

NOTE: In the classic example outlined in the YouTube video above, the overall triplet-block entropy is 2.45, which is greater than 3 × 0.81 = 2.43. Hence what we calculated is an overestimate. By analogy, this leads to our bold hypothesis that the actual (expected) TCBShE after error correction is (e/3) × 3 = e, Napier's constant (2.71828 to 5 decimal places), which we take to be the exact estimate.
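To make the comparison with Napier's constant concrete, here is a hedged sketch. The TCBShE formula is not spelled out in this README, so the code ASSUMES, by analogy with the coin toss, that each codon position is a binary GC/AT source and that the block entropy is the sum of the three per-position binary entropies. The function names and the GC percentages are hypothetical and are not taken from the pipeline or from GCevobase.

```python
import math

def binary_entropy(p):
    """Shannon entropy in bits of a two-outcome source with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def tcbshe_sketch(gc1_pct, gc2_pct, gc3_pct):
    """ASSUMED form: sum of binary entropies of GC% at codon positions 1, 2, 3."""
    return sum(binary_entropy(pct / 100.0) for pct in (gc1_pct, gc2_pct, gc3_pct))

# Hypothetical GC(1,2,3)% values for one transcript set (placeholders only).
example = tcbshe_sketch(55.0, 40.0, 60.0)

# Per-position entropy implied by the hypothesis (e/3) x 3 = e.
per_position_target = math.e / 3

print(f"sketch TCBShE        : {example:.5f} bits")
print(f"Napier's constant e  : {math.e:.5f}")
print(f"difference from e    : {example - math.e:+.5f}")
print(f"e/3 per codon position: {per_position_target:.5f}")
```

Under this assumed form the value is bounded above by 3 bits (reached at 50% GC in every position), so e would correspond to an average of about 0.906 bits per codon position.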

If you wish to use this work and its further developments on GitHub (GC123e), please cite the original research works below:

Praharshit Sharma, & Kuralayanapalya Puttahonnappa Suresh. (2021). Source-Code: Triplet Codon Block Shannon Entropy (TCBShE) in terms of GC(1,2,3)% equates to Napier's Constant for Model Organisms, and Harmonically Averages to same approximately: a Penta-Clado-genic Quantitative Survey across ~14.45 million Transcripts Clustered by 1118 Species (v-2.71828). Zenodo. https://doi.org/10.5281/zenodo.5137183

Praharshit Sharma, Kuralayanapalya Puttahonnappa Suresh, Divakar Hemadri, Sharanagouda Patil, & Anirban Guha. (2021). PRE-PRINT: Triplet Codon Block Shannon Entropy (TCBShE) in terms of GC(1,2,3)% equates to Napier Constant for Model Organisms, and Harmonically Averages to same approximately: a Penta-Clado-genic Quantitative Survey across ~14.45 million Transcripts Clustered by 1118 Species (Pre-Print_Stage). Zenodo. https://doi.org/10.5281/zenodo.5195094