This repository has been archived by the owner on Dec 13, 2019. It is now read-only.

Commit

Merge pull request #81 from DoctorBud/docs-site
Prettier PhenoPackets.org website using GH Pages

I'm going to cowboy-merge this, since it just adds a /docs directory and doesn't change anything else.
DoctorBud committed Oct 15, 2016
2 parents 9f6b2c0 + fb7c5f6 commit 692592f
Showing 53 changed files with 12,996 additions and 0 deletions.
1 change: 1 addition & 0 deletions docs/CNAME
@@ -0,0 +1 @@
phenopackets.org
23 changes: 23 additions & 0 deletions docs/README.md
@@ -0,0 +1,23 @@
### About this `/docs` directory

The contents of this directory are deployed via GitHub Pages.

The file `index.html` is primarily boilerplate HTML, with most of the actual
web-visible content specified in Markdown files that are dynamically loaded and rendered in the browser.


#### Local Development

```
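cd phenopacket-format/
# Serve docs/ with the npm http-server package (npm install -g http-server);
# -c-1 disables caching so edited Markdown shows up on reload.
http-server -c-1 docs/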
```


### Credits

- https://commons.wikimedia.org/wiki/File:DNA_com_GGN.jpg

- [Start Bootstrap](http://startbootstrap.com/)

- [Stylish Portfolio](http://startbootstrap.com/template-overviews/stylish-portfolio/)
20 changes: 20 additions & 0 deletions docs/content/Community.md
@@ -0,0 +1,20 @@
# Community

Historically, successful standards evolve gradually over time. They are not designed in the abstract, springing fully-formed from committee, but rather are developed incrementally as they are taken into the field and proven to successfully meet real-world challenges. Level 1 of the PXF is intentionally simple in order to ease wide adoption of the standard and thereby increase the value of the network of systems (see Figure 1) aiming to share Phenopackets for computational use.
The requirements for such a standard are:
1. Computable. The standard must be both human and machine-interpretable, enabling computing operations and validation on the basis of defined relationships between diagnoses, lab measurements, genotypic information, and medications.
2. Transferable. The standard must enable seamless transfer of data from a data source (e.g., a document describing the phenotype) to a data receiver (e.g., an application that receives and uses it). The standard can have multiple serializations, such as tab-separated-values, XML, or JSON.
3. Utilize an ontology for phenotypes. The standard must enable “fuzzy matching”, that is, the use of algorithms that leverage the logic within an ontology to match sets of phenotypes that are related but not exact matches. This is currently mission-critical for rare disease, and we believe it will also greatly facilitate precision medicine.
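A minimal sketch of what “fuzzy matching” can mean in practice, assuming a toy is-a hierarchy with hypothetical term names (not real HPO identifiers): each profile is expanded with all of its ancestor terms, and two profiles are compared by the Jaccard index of the expanded sets. Exact string matching would score ‘hypoglycemia’ against ‘hyperglycemia’ as 0; the ontology-aware score is 0.5 because both share two ancestors. Real phenotype-matching tools typically use information-content-weighted measures; this is only the simplest instance of the idea.

```python
# Toy is-a hierarchy (child -> parents). The term names are hypothetical,
# chosen for illustration; they are not real HPO identifiers.
PARENTS: dict[str, list[str]] = {
    "abnormal_metabolism": [],
    "abnormal_glucose_level": ["abnormal_metabolism"],
    "hypoglycemia": ["abnormal_glucose_level"],
    "hyperglycemia": ["abnormal_glucose_level"],
    "abnormal_nervous_system": [],
    "seizure": ["abnormal_nervous_system"],
}


def ancestors(term: str) -> set[str]:
    """Return the term together with all of its ancestors in PARENTS."""
    seen: set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def expand(profile: set[str]) -> set[str]:
    """Union of the ancestor closures of every term in a phenotype profile."""
    expanded: set[str] = set()
    for term in profile:
        expanded |= ancestors(term)
    return expanded


def similarity(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two profiles after ancestor expansion."""
    ea, eb = expand(a), expand(b)
    union = ea | eb
    return len(ea & eb) / len(union) if union else 0.0


print(similarity({"hypoglycemia"}, {"hyperglycemia"}))  # 0.5
print(similarity({"hypoglycemia"}, {"seizure"}))        # 0.0
```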

Journals can aid adoption of the PXF standard by supporting data citation of Phenopackets. A Phenopacket is essentially a metadata record, which will be made available as a separate online document resolvable by a Digital Object Identifier (DOI)[18]. Phenopackets can be deposited in the journal, in a public phenotype data repository such as the Monarch Initiative, or in generic data repositories such as FigShare. This approach ensures that the phenotype data described within a manuscript are made computable outside journal paywalls and can be cited within the original article via the DOI. The PXF has been adopted as a recommended or mandatory standard by journals including CSH Molecular Case Studies, the Orphanet Journal of Rare Diseases, and XXX.

It is hoped that public data repositories will begin to accept phenotype data provided in PXF. For example, the Monarch Initiative[19] is already pulling Phenopacket data from the aforementioned journals and also provides an online editor tool for creating them. A variety of international efforts that aim to standardize genotype-phenotype data, such as the International Rare Diseases Research Consortium (IRDiRC) and the Global Alliance for Genomics and Health (GA4GH), support the use of this new PXF standard for sharing phenotypic data related to variants and other genomic health data.



# Discussion

In many ways, the phenotype data exchange community is in a position similar to that of the genetics community in the early days of public sequence databases. Although the content of sequence descriptions has changed over the years, this evolution is a sign of success, not failure. Early descriptions played key roles both in promoting the effective use of sequence data and in understanding how that data should be recorded and communicated. The Phenopacket standard proposed here is tailored to function in the context of rare disease, and for precision medicine in cancer and other common diseases. We are currently in an exciting position: the standardization and exchange of a broad range of phenotype data can trigger a new wave of advances in medical discovery and help realize the goal of precision medicine. Further, patient-centered phenotyping approaches offer the opportunity, if not the necessity, for affected individuals and their families to be involved and integrated into the wider context that is the future of precision medicine.

The documentation and use of data for patients with challenging-to-diagnose rare and genetic conditions is different from that for more common diseases. Realizing this vision and meeting the phenotype exchange requirements described here will require substantial effort. Given the relative immaturity of existing efforts, further research and prototypes of data capture and exchange systems will be necessary to better understand the issues. Such explorations will likely be undertaken by ongoing research efforts, many of which do not have the luxury of waiting for an emerging consensus model to be completed.



Binary file added docs/content/DiseasePhenotypeAssociation.png
20 changes: 20 additions & 0 deletions docs/content/Documentation.md
@@ -0,0 +1,20 @@
# References
1. Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–8 (2011).
2. National Research Council. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease. (2011). at <http://www.nap.edu/catalog/13284/toward-precision-medicine-building-a-knowledge-network-for-biomedical-research>
3. Zuk, O. et al. Searching for missing heritability: designing rare variant association studies. Proc. Natl. Acad. Sci. U. S. A. 111, E455–64 (2014).
4. Krawitz, P., Buske, O., Zhu, N., Brudno, M. & Robinson, P. N. The genomic birthday paradox: how much is enough? Hum. Mutat. 36, 989–97 (2015).
5. Robinson, P. N. Deep phenotyping for precision medicine. Hum. Mutat. 33, 777–80 (2012).
6. Köhler, S. et al. The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 42, D966–74 (2014).
7. Bayés, A. et al. Characterization of the proteome, diseases and evolution of the human postsynaptic density. Nat. Neurosci. 14, 19–21 (2011).
8. Robinson, P. N. et al. Improved exome prioritization of disease genes through cross-species phenotype comparison. Genome Res. 24, 340–8 (2014).
9. Singleton, M. V. et al. Phevor Combines Multiple Biomedical Ontologies for Accurate Identification of Disease-Causing Alleles in Single Individuals and Small Nuclear Families. Am. J. Hum. Genet. 94, 599–610 (2014).
10. Javed, A., Agrawal, S. & Ng, P. C. Phen-Gen: combining phenotype and genotype to analyze rare disorders. Nat. Methods 11, 935–937 (2014).
11. Sifrim, A. et al. eXtasy: variant prioritization by genomic data fusion. Nat. Methods 10, 1083–4 (2013).
12. Soden, S. E. et al. Effectiveness of exome and genome sequencing guided by acuity of illness for diagnosis of neurodevelopmental disorders. Sci. Transl. Med. 6, 265ra168 (2014).
13. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
14. Castellano, S. et al. Patterns of coding variation in the complete exomes of three Neandertals. Proc. Natl. Acad. Sci. U. S. A. 111, 6666–71 (2014).
15. Haendel, M. Why the Human Phenotype Ontology? (2015). at <http://monarch-initiative.blogspot.com/2015/05/why-human-phenotype-ontology.html>
16. Winnenburg, R. & Bodenreider, O. Coverage of Phenotypes in Standard Terminologies. In Phenotype Day, ISMB (2014). at <http://phenoday2014.bio-lark.org/pdf/5.pdf>
17. 100,000 Genomes Project. at <http://www.genomicsengland.co.uk/the-100000-genomes-project/>
18. The Digital Object Identifier system.
19. Mungall, C. J. et al. Use of model organism and disease databases to support matchmaking for human disease gene discovery. Hum. Mutat. 36, 979–84 (2015).
15 changes: 15 additions & 0 deletions docs/content/Idea.md
@@ -0,0 +1,15 @@
# Conceptual Model and Motivation

The health of an individual organism results from a complex interplay between its genes and environment. Although great strides have been made in standardizing the representation of genetic information for exchange, there are no comparable standards to represent phenotypes (e.g. patient symptoms and disease features) and environmental factors. Phenotypic abnormalities of individual organisms are currently described in diverse places and in diverse formats: publications, databases, health records, registries, clinical trials, and even social media. However, the lack of standardization, accessibility, and computability among these contexts makes it extremely difficult to effectively extract and utilize these data, hindering the understanding of genetic and environmental contributions to disease.

While great strides have been made in exchange formats for sequence and variation data (e.g. Variant Call Format; VCF[1]), complementary standards for phenotypes and environment are urgently needed. For individuals with rare and undiagnosed diseases, such standards could improve the speed and accuracy of diagnosis. For patients with common but hard-to-treat diseases, such standards can help us design personalized interventions and learn more about shared disease mechanisms[2].

The development of a clinical phenotype data exchange standard is both necessary and timely. It is necessary because study sizes of well over 100,000 patients are thought to be required to effectively assess the role of rare variation in common disease[3] or to discover the genomic basis for a substantial portion of Mendelian diseases[4]. It is timely because studies of this power are now becoming financially and technologically tractable.

![](./phenopacket-ecosystem_2016-02-18a.png)

**Figure 1** *Phenopacket data exchange in the biomedical ecosystem. Multiple providers of phenotypic data include patients and clinicians, via a variety of mechanisms. Such Phenopackets can be created by a variety of tools and consumed by journals, databases, patient matchmaking services, EHR systems, and genomic analysis tools.*

Phenotypic abnormalities of individuals are currently described in diverse places in diverse formats: publications, databases, health records, and even in social media. We propose that these descriptions (a) contain a minimum set of fields and (b) are transmitted alongside genomic sequence data, such as VCF, between clinics, authors, journals, and data repositories. The structure of the data in the exchange standard will be optimized for integration from these distributed contexts. The implementation of such a system will allow the sharing of phenotype data prospectively, as well as retrospectively. Increasing the volume of computable data across a diversity of systems will support large-scale computational disease analysis using the combined genotype and phenotype data.

The terms ‘disease’ and ‘phenotype’ are often conflated. Here we use ‘phenotype’ to refer to a phenotypic feature, such as hyperglycemia, that is a component of a disease, such as diabetes mellitus type II. The Phenotype Exchange Formalism (PXF) proposed here is designed to support “deep phenotyping”, a process wherein individual components of each phenotype are observed and documented[5]. The PXF requires the use of a common ontology, a logically defined hierarchy of terms that allows sophisticated algorithmic analysis over medically relevant abnormalities. The Human Phenotype Ontology[6] (HPO) was built for this purpose and has been used for genomic diagnostics, translational research, genomic matchmaking, and systems biology applications[7–14]. The HPO is developed in the context of the Monarch Initiative, an international team of computer scientists, clinicians, and biologists in the United States, Europe, and Australia; HPO is being translated into multiple languages to support international interoperability. Due to its extensive phenotypic coverage beyond other terminologies[15,16], HPO has recently been integrated into the Unified Medical Language System (UMLS) to support deep phenotyping in a variety of mainstream health care IT systems.
Binary file added docs/content/SubjectObjectAssociation.png
172 changes: 172 additions & 0 deletions docs/content/Technology.md
@@ -0,0 +1,172 @@
# Technology - Standards and Implementation

The online supplementary material to this article presents version 1.0 of the PXF standard proposed in this article. The format defines the required information expected to be transmitted about each individual, aka a “Phenopacket”; it includes items such as a patient identifier (non-PHI), age or age group, sex, and a list of one or more phenotypic abnormalities represented by ontology terms. The use of HPO is recommended, but if that is not possible, an alternative terminology as represented in the International Committee of Human Phenotype Terminologies (ICHPT), an activity of the International Rare Diseases Research Consortium (IRDiRC), is acceptable. Figure 1 provides a summary of the Phenopacket exchange ecosystem, and the online supplement provides concrete examples of PXF encoded in several exchange formats such as XML, JSON, and RDF. Note that PXF is designed to be compatible with a variety of rare disease phenotyping efforts, such as the 100,000 Genomes Project[17].
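The minimum content listed in this paragraph can be sketched as a Python dataclass. The field names here are hypothetical (the normative names are defined by the PXF specification in the supplementary material), and the check enforces only what the paragraph states: a non-PHI identifier plus at least one phenotypic abnormality given as a prefixed ontology term ID.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MinimalPhenopacket:
    # Hypothetical field names, for illustration only.
    patient_id: str                      # a local, non-PHI identifier
    sex: Optional[str] = None
    age: Optional[str] = None            # e.g. an ISO 8601 duration or an age-group term
    phenotypes: list[str] = field(default_factory=list)  # e.g. ["HP:0001943"]


def problems(packet: MinimalPhenopacket) -> list[str]:
    """Return violations of the minimum content; an empty list means OK."""
    issues = []
    if not packet.patient_id:
        issues.append("missing patient identifier")
    if not packet.phenotypes:
        issues.append("at least one phenotypic abnormality is required")
    for term in packet.phenotypes:
        prefix, sep, local = term.partition(":")
        if not (sep and prefix and local):
            issues.append(f"not a prefixed ontology term ID: {term!r}")
    return issues


print(problems(MinimalPhenopacket("#1", sex="M", phenotypes=["HP:0001943"])))  # []
print(problems(MinimalPhenopacket("#2")))
```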

### What is in a Phenopacket?

Achieving a functional and community-adopted PXF standard will require addressing several critical requirements, which are only partially fulfilled by this first release of Level 1 of the standard. Here we detail the basic Level 1 components, and urge the community to participate in extending and evaluating the PXF:

### Phenotypes
Representation of phenotypic features using an ontology term with a resolvable and versionable identifier.

`http://purl.obolibrary.org/obo/HP_0001943` ‘Hypoglycemia’, defined as: “A decreased concentration of glucose in the blood.”
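The resolvable identifier follows the OBO PURL pattern, in which the compact form `HP:0001943` (the style used in the examples throughout this document) corresponds to `http://purl.obolibrary.org/obo/HP_0001943`. A minimal sketch of the round trip, assuming simple OBO-style prefixes; it does not handle other IRI schemes:

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie: str) -> str:
    """Expand an OBO-style compact ID such as "HP:0001943" to its PURL."""
    prefix, sep, local = curie.partition(":")
    if not (sep and prefix and local):
        raise ValueError(f"not a compact ID: {curie!r}")
    return f"{OBO_PURL_BASE}{prefix}_{local}"


def purl_to_curie(iri: str) -> str:
    """Inverse of curie_to_purl for OBO PURLs."""
    if not iri.startswith(OBO_PURL_BASE):
        raise ValueError(f"not an OBO PURL: {iri!r}")
    prefix, sep, local = iri[len(OBO_PURL_BASE):].partition("_")
    if not (sep and prefix and local):
        raise ValueError(f"unexpected OBO PURL shape: {iri!r}")
    return f"{prefix}:{local}"


print(curie_to_purl("HP:0001943"))  # http://purl.obolibrary.org/obo/HP_0001943
```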

### Age of onset
The exact age, or age range, at which each phenotype first manifested can be indicated. Ages should follow the ISO 8601 conventions for years and months; alternatively, HPO onset terms are recommended for age ranges.

`P43Y08M`
or
`Adult onset (HP:0003581)`
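A minimal sketch of how a consumer might accept both forms, assuming ages are restricted to the year/month subset of ISO 8601 durations (days, weeks, and ranges are not handled) and that anything else is treated as an onset term:

```python
import re

# Year/month subset of ISO 8601 durations, e.g. "P43Y08M", "P5Y", "P10M".
AGE_PATTERN = re.compile(r"P(?:(\d+)Y)?(?:(\d+)M)?")


def parse_age(value: str) -> tuple[int, int]:
    """Parse a year/month duration into (years, months)."""
    match = AGE_PATTERN.fullmatch(value)
    if match is None or (match.group(1) is None and match.group(2) is None):
        raise ValueError(f"not a year/month ISO 8601 duration: {value!r}")
    return int(match.group(1) or 0), int(match.group(2) or 0)


def describe_onset(value: str) -> str:
    """Accept either an ISO 8601 duration or an ontology onset term."""
    try:
        years, months = parse_age(value)
    except ValueError:
        return f"onset term: {value}"
    return f"{years} years {months} months ({years * 12 + months} months)"


print(describe_onset("P43Y08M"))                   # 43 years 8 months (524 months)
print(describe_onset("Adult onset (HP:0003581)"))  # onset term: Adult onset (HP:0003581)
```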

### Negation of phenotypes

Notable absence of a particular phenotype or phenotypic class.

`NOT Aortic regurgitation (HP:0001659)`
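An explicit negation is different from simply not listing a term. A minimal sketch of keeping that distinction with a three-valued lookup; the `negated` flag is a hypothetical field name, not taken from the PXF specification, and the sketch takes the first matching call without checking for contradictions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeCall:
    term_id: str
    label: str
    negated: bool = False  # hypothetical flag: True = explicitly observed to be absent


def status(term_id: str, calls: list[PhenotypeCall]) -> str:
    """Three-valued lookup: 'present', 'absent', or 'unknown'."""
    for call in calls:
        if call.term_id == term_id:
            return "absent" if call.negated else "present"
    return "unknown"  # not mentioned: no inference either way


calls = [
    PhenotypeCall("HP:0003560", "Muscular dystrophy"),
    PhenotypeCall("HP:0001659", "Aortic regurgitation", negated=True),
]
print(status("HP:0001659", calls))  # absent
print(status("HP:0001943", calls))  # unknown
```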

### Genomic data
Ability to link to a VCF file describing the patient’s genomic variants, or to variants given in HGVS notation.

### Family history
Ability to link to a PED file describing familial linkage to other PXF files.
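The document does not specify which PED variant is meant. Assuming the common six-column layout used by PLINK and LINKAGE-style tools (family ID, individual ID, paternal ID, maternal ID, sex, affection status), a minimal sketch of reading a pedigree and recovering the parent-child links it encodes; the trio in the usage line is hypothetical:

```python
from dataclasses import dataclass

# Codes used by the common six-column PED layout.
SEX_CODES = {"1": "male", "2": "female"}             # any other value: unknown
STATUS_CODES = {"1": "unaffected", "2": "affected"}  # anything else (e.g. 0, -9): missing


@dataclass(frozen=True)
class PedRecord:
    family_id: str
    individual_id: str
    father_id: str  # "0" means the father is not in the file
    mother_id: str  # "0" means the mother is not in the file
    sex: str
    status: str


def parse_ped(text: str) -> list[PedRecord]:
    """Read the first six whitespace-separated columns of each non-blank line."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns: {line!r}")
        fam, ind, father, mother, sex, status = fields[:6]
        records.append(PedRecord(fam, ind, father, mother,
                                 SEX_CODES.get(sex, "unknown"),
                                 STATUS_CODES.get(status, "missing")))
    return records


def children(records: list[PedRecord], family_id: str, parent_id: str) -> list[str]:
    """Individual IDs whose father or mother is parent_id within one family."""
    return [r.individual_id for r in records
            if r.family_id == family_id and parent_id in (r.father_id, r.mother_id)]


# Hypothetical trio: unaffected parents, affected son.
rows = parse_ped("FAM1 dad 0 0 1 1\nFAM1 mom 0 0 2 1\nFAM1 kid dad mom 1 2\n")
print([r.status for r in rows])        # ['unaffected', 'unaffected', 'affected']
print(children(rows, "FAM1", "dad"))   # ['kid']
```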

### Quantitative specification
Quantitative phenotypes expressed in relative terms should be accompanied by the reference population and reference values. The format should be able to transmit not only qualitative ontology terms such as “Hyperglycemia” but also specific values such as “blood glucose 178 mg/dl”.
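A minimal sketch of carrying a value together with its unit, so that a receiver can convert it rather than guess. The `Measurement` type and its fields are hypothetical; the conversion uses the molar mass of glucose (about 180.16 g/mol), under which 178 mg/dL corresponds to about 9.88 mmol/L.

```python
from dataclasses import dataclass
from typing import Optional

GLUCOSE_MOLAR_MASS = 180.16  # g/mol, which equals mg/mmol


@dataclass(frozen=True)
class Measurement:
    # Hypothetical structure, for illustration only.
    analyte: str
    value: float
    unit: str
    reference_population: Optional[str] = None  # context needed for relative terms


def glucose_to_mmol_per_l(m: Measurement) -> Measurement:
    """Convert a glucose measurement from mg/dL to mmol/L."""
    if m.unit != "mg/dL":
        raise ValueError(f"expected mg/dL, got {m.unit}")
    mg_per_l = m.value * 10            # 1 dL = 0.1 L
    mmol_per_l = mg_per_l / GLUCOSE_MOLAR_MASS
    return Measurement(m.analyte, round(mmol_per_l, 2), "mmol/L", m.reference_population)


reading = Measurement("blood glucose", 178, "mg/dL")
print(glucose_to_mmol_per_l(reading).value)  # 9.88
```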

### Evidence
Any of the above elements may be linked to one or more evidence assertions.
Evidence could include items of the following:
* EHR record numbers
* published papers
* functional assays
* computational models
* population studies
* clinical trials


# Tutorial


Phenopackets can be encoded in either JSON or YAML; the two encodings
carry the same information. We will use YAML here for compactness.

Our example involves a case study with three people. (Phenopackets can
also be used to describe other kinds of entities, such as variants;
examples of these cases will follow.)

We list all people inside a `persons` block:

```yaml
persons:
  - id: "#1"
    date_of_birth: 1999-01-01
    sex: M
  - id: "#2"
    sex: M
  - id: "#3"
    sex: M
```

Note that in YAML, a `-` denotes an element in a list. The value of the `persons`
property is always a list.

Here we are providing a DOB for the first person, and biological sexes
for all persons.

Note the identifiers used. There are strict rules on the structure of
identifiers used in phenopackets, and on how these map
to real-world entities. We will return to these in more detail
later. In this particular case we are using hash identifiers; we use
these when the identifiers are local to the packet and are not
intended to be referenced from outside. If we had a global identifier
for the person, we could use this instead.
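The `persons` block maps directly onto a list of dictionaries. A minimal sketch, assuming only the rule stated here (hash identifiers are local to the packet) plus the requirement that identifiers be unique within a packet; the fuller identifier rules mentioned above are not modelled, and the function names are hypothetical:

```python
persons = [
    {"id": "#1", "date_of_birth": "1999-01-01", "sex": "M"},
    {"id": "#2", "sex": "M"},
    {"id": "#3", "sex": "M"},
]


def is_local(identifier: str) -> bool:
    """Hash identifiers are local to the packet, not for outside reference."""
    return identifier.startswith("#")


def person_ids(persons: list[dict]) -> set[str]:
    """Collect person identifiers, rejecting duplicates within the packet."""
    seen: set[str] = set()
    for person in persons:
        pid = person["id"]
        if pid in seen:
            raise ValueError(f"duplicate person id: {pid}")
        seen.add(pid)
    return seen


ids = person_ids(persons)
print(sorted(ids))                     # ['#1', '#2', '#3']
print(all(is_local(i) for i in ids))   # True
```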

Next we will describe the conditions for these persons. In
phenopackets, there are two distinct types of conditions: phenotypes
and diseases.

We first list any disease diagnoses. We only have one, for person
number 1:

```yaml
disease_diagnoses:
  - entity: "#1"
    disease_occurrence:
      types:
        - id: OMIM:615426
          label: amyotrophic lateral sclerosis type 20
```

Note that the `disease_diagnoses` block is separate from the `persons`
block. We refer back to individuals from this block using their id, rather
than nesting the disease inside the person block. This allows more
flexibility in how persons and diagnoses are exchanged as messages.

The value for `types` is a list. Although there will
typically only be one disease here, there are reasons for having a
uniform list representation, which we will return to later.

We have not specified any diseases for person 2 and 3. Although we
might assume these are not disease carriers, this is not explicitly
stated and cannot be known for sure. We will return to how to make
negative assertions later on.
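Because diagnoses point at persons by id rather than being nested inside them, a consumer should check that every reference resolves. A minimal sketch over the example above, written as Python dictionaries; the function names are hypothetical:

```python
persons = [{"id": "#1"}, {"id": "#2"}, {"id": "#3"}]

disease_diagnoses = [
    {
        "entity": "#1",
        "disease_occurrence": {
            "types": [{"id": "OMIM:615426",
                       "label": "amyotrophic lateral sclerosis type 20"}],
        },
    },
]


def dangling_entities(persons, *association_blocks):
    """Entity ids referenced by associations but not declared as persons."""
    declared = {p["id"] for p in persons}
    return {a["entity"] for block in association_blocks for a in block} - declared


def diseases_for(entity_id, diagnoses):
    """Disease IDs stated for one entity; [] means none stated, not 'unaffected'."""
    return [t["id"] for d in diagnoses if d["entity"] == entity_id
            for t in d["disease_occurrence"]["types"]]


print(dangling_entities(persons, disease_diagnoses))  # set()
print(diseases_for("#1", disease_diagnoses))          # ['OMIM:615426']
print(diseases_for("#2", disease_diagnoses))          # []
```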

Next up are phenotype associations. Let's start with two phenotypes
for person 1:

```yaml
phenotype_profile:
  - entity: "#1"
    phenotype:
      types:
        - id: HP:0003560
          label: Muscular dystrophy
  - entity: "#1"
    phenotype:
      types:
        - id: HP:0007354
          label: Amyotrophic lateral sclerosis
```

Each element of the list is a phenotype *association*. The concept of
an association is a recurring feature of the phenopacket format.

Although the structure may appear overly nested here, this is because we
use a highly normalized model that allows maximal hooks for
extensibility. For example, we can add `onset` into the phenotype
object, where onset is described either with an ontology term or with
a quantitative range. We can also attach a natural language
description to the phenotype, to complement or extend the ontological
one.
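Because each association is a flat record pointing at an entity, a consumer can rebuild a per-person phenotype profile in a single pass. A minimal sketch over the two associations above, written as the plain Python dictionaries a YAML or JSON parser would typically produce; the function name is hypothetical:

```python
from collections import defaultdict

phenotype_profile = [
    {"entity": "#1",
     "phenotype": {"types": [{"id": "HP:0003560", "label": "Muscular dystrophy"}]}},
    {"entity": "#1",
     "phenotype": {"types": [{"id": "HP:0007354",
                              "label": "Amyotrophic lateral sclerosis"}]}},
]


def terms_by_entity(associations):
    """Group phenotype term IDs by the entity each association points to."""
    grouped = defaultdict(list)
    for assoc in associations:
        for term in assoc["phenotype"]["types"]:
            grouped[assoc["entity"]].append(term["id"])
    return dict(grouped)


print(terms_by_entity(phenotype_profile))
# {'#1': ['HP:0003560', 'HP:0007354']}
```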

Similarly, the association itself can have evidence and additional
provenance and audit information associated with it. This is shown in
the following example:

```yaml
phenotype_profile:
  - entity: "#1"
    phenotype:
      types:
        - id: HP:0003560
          label: Muscular dystrophy
      onset:
        types:
          - id: HP:0003584
            label: Late onset
      description: additional notes on this phenotype here
    evidence:
      - types:
          - id: TAS
        source:
          id: PMID:23455423
          title: Mutations in prion-like domains in hnRNPA2B1 and hnRNPA1 cause multisystem proteinopathy and ALS
  - entity: "#1"
    phenotype:
      types:
        - id: HP:0007354
          label: Amyotrophic lateral sclerosis
```

The example can be visualized as:

![](./person-example.png)
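Since `onset`, `description`, and `evidence` are optional extensions, consumer code should read them defensively. A minimal sketch over the first association of the example, written as Python dictionaries and assuming `types` is a list in every position (as the tutorial states for `types` elsewhere) and `source` is a single mapping; the function names are hypothetical:

```python
association = {
    "entity": "#1",
    "phenotype": {
        "types": [{"id": "HP:0003560", "label": "Muscular dystrophy"}],
        "onset": {"types": [{"id": "HP:0003584", "label": "Late onset"}]},
        "description": "additional notes on this phenotype here",
    },
    "evidence": [
        {"types": [{"id": "TAS"}],
         "source": {"id": "PMID:23455423",
                    "title": "Mutations in prion-like domains in hnRNPA2B1 and "
                             "hnRNPA1 cause multisystem proteinopathy and ALS"}},
    ],
}


def onset_ids(assoc):
    """Onset term IDs attached to the phenotype, if any (onset is optional)."""
    onset = assoc["phenotype"].get("onset")
    return [t["id"] for t in onset["types"]] if onset else []


def evidence_summary(assoc):
    """(evidence type ID, source ID) pairs; evidence and source are optional."""
    pairs = []
    for ev in assoc.get("evidence", []):
        source_id = ev.get("source", {}).get("id")
        for ev_type in ev.get("types", []):
            pairs.append((ev_type["id"], source_id))
    return pairs


print(onset_ids(association))         # ['HP:0003584']
print(evidence_summary(association))  # [('TAS', 'PMID:23455423')]
```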
Binary file added docs/content/partners/logo-biolark.png
Binary file added docs/content/partners/logo-do.png
Binary file added docs/content/partners/logo-force11.png
Binary file added docs/content/partners/logo-globalalliance.jpg
Binary file added docs/content/partners/logo-go.png
Binary file added docs/content/partners/logo-hpo.jpg
Binary file added docs/content/partners/logo-irdirc.png
Binary file added docs/content/partners/logo-jackson.png
Binary file added docs/content/partners/logo-lbnl.jpeg
Binary file added docs/content/partners/logo-mgi.png
Binary file added docs/content/partners/logo-monarch.jpg
Binary file added docs/content/partners/logo-mygene.png
Binary file added docs/content/partners/logo-orphanet.png
Binary file added docs/content/person-example.png