
Commit ba0dc92

Merge pull request #25 from jacobjinkelly/master
small typo
alquraishi committed Aug 19, 2020
2 parents 7500620 + 089fb14 commit ba0dc92
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/proteinnet_records.md
@@ -7,7 +7,7 @@ ProteinNet is comprised of ProteinNet Records which can be used to train machine
* Tertiary Structure
* Mask

-**Sequences** are the primary amino acid chains that constitue a protein. They are represented by a string of characters with an alphabet size of 20. Our standard [parser](../code/tf_parser.py) converts this into a variable-length tensor comprised of 20-dimensional one-hot vectors; one dimension per amino acid, ordered alphabetically.
+**Sequences** are the primary amino acid chains that constitute a protein. They are represented by a string of characters with an alphabet size of 20. Our standard [parser](../code/tf_parser.py) converts this into a variable-length tensor comprised of 20-dimensional one-hot vectors; one dimension per amino acid, ordered alphabetically.

**PSSMs**, a.k.a. [position-specific scoring matrices](https://en.wikipedia.org/wiki/Position_weight_matrix), summarize the propensity of each residue position along the protein chain to mutate to other amino acids. They are represented by a sequence of real-valued 20-dimensional vectors (one dimension for each amino acid, ordered alphabetically), normalized to range in value between 0 and 1. An additional dimension, corresponding to the information content of a residue, is concatenated with each vector to bring the total dimensionality to 21. We will provide multiple types of PSSMs, but this preliminary release of ProteinNet contains PSSMs derived using [JackHMMer](https://hmmer.org) from UniParc and metagenomic sequences.

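To make the one-hot sequence encoding described in the changed paragraph concrete, here is a minimal NumPy sketch. It assumes the alphabet is the 20 standard one-letter amino acid codes taken in alphabetical order; it is an illustration, not the repository's `tf_parser.py`.

```python
import numpy as np

# Assumed alphabet: the 20 standard amino-acid one-letter codes, ordered alphabetically.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_sequence(seq: str) -> np.ndarray:
    """Convert a primary sequence of length L into an (L, 20) one-hot array."""
    encoding = np.zeros((len(seq), len(AMINO_ACIDS)), dtype=np.float32)
    for pos, aa in enumerate(seq):
        encoding[pos, AA_INDEX[aa]] = 1.0  # one dimension per amino acid
    return encoding

# Example: a 5-residue peptide becomes a (5, 20) array with a single 1.0 per row.
print(one_hot_sequence("MKTAY").shape)  # (5, 20)
```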
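A similarly hedged sketch of the PSSM representation described above: 20 substitution propensities per residue, assumed already normalized to [0, 1], plus one information-content column, giving an (L, 21) array. The function name and column order here are illustrative assumptions, not the parser's actual interface.

```python
import numpy as np

def assemble_pssm(profile: np.ndarray, info_content: np.ndarray) -> np.ndarray:
    """Combine an (L, 20) PSSM profile and an (L,) information-content vector into (L, 21).

    `profile` is assumed to already be normalized to the range [0, 1];
    `info_content` holds one value per residue position.
    """
    assert profile.ndim == 2 and profile.shape[1] == 20
    assert info_content.shape == (profile.shape[0],)
    return np.concatenate([profile, info_content[:, None]], axis=1)

# Example with placeholder values for a 5-residue chain.
rng = np.random.default_rng(0)
pssm = assemble_pssm(rng.random((5, 20)), rng.random(5))
print(pssm.shape)  # (5, 21)
```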
