Commit
small typo
jacobjinkelly committed Aug 19, 2020
1 parent 7500620 commit 089fb14
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/proteinnet_records.md
@@ -7,7 +7,7 @@ ProteinNet is comprised of ProteinNet Records which can be used to train machine
* Tertiary Structure
* Mask

- **Sequences** are the primary amino acid chains that constitue a protein. They are represented by a string of characters with an alphabet size of 20. Our standard [parser](../code/tf_parser.py) converts this into a variable-length tensor comprised of 20-dimensional one-hot vectors; one dimension per amino acid, ordered alphabetically.
+ **Sequences** are the primary amino acid chains that constitute a protein. They are represented by a string of characters with an alphabet size of 20. Our standard [parser](../code/tf_parser.py) converts this into a variable-length tensor comprised of 20-dimensional one-hot vectors; one dimension per amino acid, ordered alphabetically.
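
The one-hot encoding described for sequences can be sketched in plain Python. This is an illustration only, not the code in `tf_parser.py`: the names `AMINO_ACIDS` and `one_hot_sequence` are hypothetical, and the sketch assumes "ordered alphabetically" refers to the one-letter amino acid codes.

```python
# Assumption: the 20 standard amino acids, ordered alphabetically by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_sequence(sequence):
    """Return one 20-dimensional one-hot vector per residue (hypothetical helper)."""
    encoded = []
    for residue in sequence:
        vector = [0] * len(AMINO_ACIDS)
        # Raises KeyError for characters outside the 20-letter alphabet.
        vector[AA_INDEX[residue]] = 1
        encoded.append(vector)
    return encoded


# A three-residue sequence yields a 3 x 20 nested list.
encoded = one_hot_sequence("MKV")
```

The result has one row per residue, so its length varies with the sequence, matching the "variable-length tensor" in the description.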

**PSSMs**, a.k.a. [position-specific scoring matrices](https://en.wikipedia.org/wiki/Position_weight_matrix), summarize the propensity of each residue position along the protein chain to mutate to other amino acids. They are represented by a sequence of real-valued 20-dimensional vectors (one dimension for each amino acid, ordered alphabetically), normalized to range in value between 0 and 1. An additional dimension, corresponding to the information content of a residue, is concatenated with each vector to bring the total dimensionality to 21. We will provide multiple types of PSSMs, but this preliminary release of ProteinNet contains PSSMs derived using [JackHMMer](http://hmmer.org) from UniParc and metagenomic sequences.
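
The 21-dimensional PSSM layout can be illustrated with a short sketch. It assumes the information-content value is appended as the last element of each vector, which the description does not specify. The helper name `append_information_content` and all numeric values are hypothetical placeholders, not data from ProteinNet.

```python
def append_information_content(pssm_rows, information_content):
    """Concatenate one information-content value onto each 20-dim PSSM row.

    Hypothetical helper; assumes the extra dimension comes last.
    """
    if len(pssm_rows) != len(information_content):
        raise ValueError("need exactly one information-content value per residue")
    combined = []
    for row, ic in zip(pssm_rows, information_content):
        if len(row) != 20:
            raise ValueError("each PSSM row must have 20 entries")
        if any(not 0.0 <= value <= 1.0 for value in row):
            raise ValueError("PSSM entries are expected to lie in [0, 1]")
        combined.append(list(row) + [ic])
    return combined


# Placeholder values for a two-residue chain (not real ProteinNet data).
pssm_rows = [[0.05] * 20, [0.0] * 19 + [1.0]]
information_content = [0.1, 0.9]
pssm_21 = append_information_content(pssm_rows, information_content)
```

Each row of `pssm_21` then holds 20 normalized scores followed by one information-content value, for 21 entries per residue.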
