small requirements document for networKit
patflick committed Nov 11, 2013
1 parent 2ecfe7d commit 9a275cb
Showing 3 changed files with 151 additions and 0 deletions.
9 changes: 9 additions & 0 deletions docs/NetworKit_reqs/biblio.bib
@@ -0,0 +1,9 @@

@INPROCEEDINGS{ChanWang:functional_modules,
author={Gang Chen and Jianxin Wang},
booktitle={Bioinformatics and Biomedicine Workshops (BIBMW), 2012 IEEE International Conference on},
title={Identifying functional modules in tissue specific protein interaction network},
year={2012},
pages={581-586},
keywords={biochemistry;biological tissues;biology computing;genetics;genomics;molecular biophysics;proteins;CFinder clustering algorithm;GO enrichment analysis;cell cycle;computational biologists;network-based identification;original PPI network;protein functional modules;static protein interaction network;tissue specific gene expression data;topological analysis;virtual tissue specific protein interaction network;Biological tissues;Clustering algorithms;Gene expression;Humans;Protein engineering;Proteins;Biological Network;Clustering;Protein Interaction;Tissue Specificity},
doi={10.1109/BIBMW.2012.6470204},}
8 changes: 8 additions & 0 deletions docs/NetworKit_reqs/makefile
@@ -0,0 +1,8 @@
all: reqs.pdf


reqs.pdf: reqs.tex biblio.bib
	pdflatex reqs.tex
	bibtex reqs.aux
	pdflatex reqs.tex
	pdflatex reqs.tex
134 changes: 134 additions & 0 deletions docs/NetworKit_reqs/reqs.tex
@@ -0,0 +1,134 @@
\documentclass{article}
\title{Analysis of tissue/cell type specific protein-protein interaction networks}
\author{Patrick Flick}

\begin{document}

\maketitle

\section{Short introduction}

\subsection{A very short introduction to protein-protein interaction networks}

Proteins are large macromolecules which can perform a variety of
biological and chemical functions.
In order to fulfill their function, proteins bind to other substances
(molecules, ions, DNA, etc.) or to other proteins.

A protein-protein interaction network (\emph{PPI network} for short) is a graph
in which each protein is represented by a node and each interaction
between two proteins is represented by an edge.

A number of databases of these protein-protein interactions are available,
most of which are publicly accessible. The interaction information
comes from very different kinds of sources. Some PPI networks
are the result of high-throughput lab experiments, which test whether
the proteins interact in vivo or in vitro. Other PPI networks are
literature curated (taken from many small-scale experiments and
literature-based knowledge), while yet others consist of purely
computationally predicted interactions. A few PPI databases exist in which
a number of available PPI networks are combined and each interaction is
labeled with a reliability score that depends on the source (and the number
of agreeing sources) of the interaction.


\subsection{A very short introduction to protein expression}

The human organism is made up of many different cell and tissue types,
for example skin cells, liver cells, neural cells etc.
Not all proteins exist in all cell and tissue types. On the contrary,
many proteins have specific functions and only exist in one class
of cells or tissues. A protein is called \emph{expressed} in a certain cell
or tissue type if it is present in that cell or tissue type. Accordingly, a
protein is called \emph{non-expressed} in a certain cell or tissue type
if it does not exist in that cell or tissue type.

A protein expression database (of which there are many different kinds)
supplies this kind of information. For each protein and each tissue or cell
type, such a database holds an expression value, which is usually supplied as
an expression level specific to the data source. This expression level is then
classified as either \emph{expressed} or \emph{non-expressed}.

The resulting binary (expressed vs.\ non-expressed) expression pattern,
combined with a PPI network, implicitly defines sub-networks/sub-graphs
of the \emph{global} PPI network.


\subsection{A formal definition}

Let $\mathcal{P}$ be the set of all proteins with
$\left| \mathcal{P} \right| = n$. A PPI network is the graph $G=(V,E)$ with
$V = \mathcal{P}$ and the edges $E$ given by the protein interactions,
i.e. $(p_i, p_j) \in E \Leftrightarrow p_i$ interacts with $p_j$.

In order to formalize the expression data, each protein $p \in \mathcal{P}$
is associated with a binary expression vector
$expr(p) = (e_0, e_1, e_2, \ldots, e_{k-1})$, where $e_t \in \{0,1\}$,
$k$ is the number of tissues or cell types in the current expression data
set, and $e_t = 1 \Leftrightarrow $ the protein $p$ is expressed in the
tissue/cell type $t$.

The tissue/cell type specific PPI network $G_t$ of the tissue/cell type $t$ is
defined as the graph $G_t = (V_t, E_t)$, where
$p_i \in V_t \Leftrightarrow expr(p_i)_t = 1$
and
$(p_i, p_j) \in E_t \Leftrightarrow (p_i, p_j) \in E \wedge expr(p_i)_t = 1 \wedge expr(p_j)_t = 1$.
Here $expr(p)_t$ denotes the entry $e_t$ of the expression vector $expr(p)$.
That is, a protein is part of the tissue/cell type specific PPI network if it is
expressed in that tissue/cell type, and an edge of the global network is part
of the specific PPI network if both interacting proteins are expressed in that
tissue/cell type.
In other words, nodes that represent non-expressed proteins are deleted
from the graph together with their incident edges, so $G_t$ is the subgraph
of $G$ induced by $V_t$.
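
As a small illustration (a made-up toy instance, not taken from any database),
let $\mathcal{P} = \{p_1, p_2, p_3\}$, $E = \{(p_1, p_2), (p_2, p_3)\}$ and
$k = 2$ with
\[
expr(p_1) = (1, 1), \quad expr(p_2) = (1, 0), \quad expr(p_3) = (0, 1).
\]
For $t = 0$ only $p_1$ and $p_2$ are expressed, so
\[
V_0 = \{p_1, p_2\}, \quad E_0 = \{(p_1, p_2)\}.
\]
For $t = 1$ only $p_1$ and $p_3$ are expressed. Both edges of the global
network involve the protein $p_2$, which is not expressed for $t = 1$, so
\[
V_1 = \{p_1, p_3\}, \quad E_1 = \emptyset.
\]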


\section{Adaptations/optimizations to network analysis}

\subsection{Network/Graph analysis}

In order to answer biological questions and to find new insights,
network analysis and clustering algorithms are used on the global
and tissue/cell type specific networks.

A straightforward way of doing this is to generate a subgraph from the global
graph for each tissue/cell type and analyze this graph.

A more efficient approach would be to use the binary expression profile
($expr(p) = (e_0, e_1, e_2, \ldots, e_{k-1})$) as node labels and adapt
the analysis algorithms to use those labels to analyze all subgraphs
concurrently. For one, this avoids generating every subgraph explicitly; it can
also reduce the total number of passes over the graph that are needed.
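
The label-based approach can be sketched for one simple statistic, the node
degree in every tissue/cell type specific network. The following Python sketch
is only an illustration under assumptions that are not part of this document:
the function name \texttt{tissue\_degrees}, the encoding of $expr(p)$ as an
integer bit mask (bit $t$ is set iff $e_t = 1$) and the edge-list input are
hypothetical, and no NetworKit API is used. Each edge of the global graph is
visited once and counted for exactly those $t$ in which both endpoints are
expressed.

\begin{verbatim}
```python
# Hypothetical sketch: names and data layout are assumptions,
# not part of NetworKit or of the original requirements.

def tissue_degrees(edges, expr, k):
    """Degree of every protein in each of the k tissue-specific networks.

    edges: iterable of (p, q) pairs of the global PPI network
           (undirected, each interaction listed once, no self-loops)
    expr:  dict mapping protein -> int bit mask; bit t is set iff the
           protein is expressed in tissue/cell type t
    k:     number of tissue/cell types
    """
    deg = {p: [0] * k for p in expr}
    for p, q in edges:              # single pass over the global edge set
        shared = expr[p] & expr[q]  # tissues in which both ends are expressed
        for t in range(k):
            if (shared >> t) & 1:   # the edge belongs to G_t
                deg[p][t] += 1
                deg[q][t] += 1
    return deg


# Toy instance with three proteins and two tissue/cell types.
edges = [("A", "B"), ("B", "C")]
expr = {"A": 0b11, "B": 0b01, "C": 0b10}
degrees = tissue_degrees(edges, expr, k=2)
```
\end{verbatim}

In this toy input, only the edge between A and B has both endpoints expressed
in a common tissue/cell type (type 0), so no subgraph is ever materialized and
all $k$ degree vectors are obtained from one pass over the edges.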

\subsection{Clustering}

In order to identify modules/communities of proteins in the PPI network,
graph clustering and community detection can be used. Previous research
\cite{ChanWang:functional_modules}
has shown that \emph{clique percolation clustering} finds more functionally
related clusters (verified via GO terms) in the tissue/cell type specific
network than in the global network.

Instead of clustering only in the specific networks, it might be interesting
to cluster in the global network but with regard to the proteins' expression
patterns (the binary vectors). This means that proteins with similar expression
patterns are clustered together with higher priority than proteins with
very different expression patterns. The exact definition of what \emph{similar}
and \emph{higher priority} mean in this context has yet to be developed.
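
Purely as an illustration of what such a definition could look like, and not
as a choice made in this document, one hypothetical candidate for
\emph{similar} is the Jaccard index of the sets of tissue/cell types in which
two proteins are expressed. With
$T(p) = \{t : e_t = 1 \mbox{ in } expr(p)\}$, it reads
\[
J(p, q) = \frac{\left| T(p) \cap T(q) \right|}{\left| T(p) \cup T(q) \right|},
\]
which is only defined if at least one of the two proteins is expressed in some
tissue/cell type. For example, $expr(p) = (1, 1, 0)$ and $expr(q) = (1, 0, 0)$
give $T(p) = \{0, 1\}$ and $T(q) = \{0\}$, and therefore $J(p, q) = 1/2$.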

The graph analysis and clustering tool \emph{NetworKit} could be a
good starting point for such adaptations.

The output of the clustering algorithm should be small local clusters that
are highly connected internally and whose proteins are expressed similarly.
Hopefully these clusters are more functionally related than the results of
clustering in the global network and in the specific PPI networks.


%\subsubsection{Possible co-expression scores}
%
%So far I have come up with the following possibilities to ``score''
%how similar two proteins are expressed (i.e. their co-expression).
%

\bibliographystyle{plain}
\bibliography{biblio}

\end{document}
