% -*- mode: noweb; noweb-default-code-mode: R-mode; -*-
%\VignetteIndexEntry{1. MMG primer}
%\VignetteKeywords{MMG}
%\VignettePackage{MMG}
\documentclass[a4paper]{article}

\IfFileExists{lmodern.sty}{%
  \usepackage[T1]{fontenc}
  \usepackage{lmodern}
}{}
\IfFileExists{geometry.sty}{%
  \usepackage{geometry}
}{}
\IfFileExists{hyperref.sty}{%
  \usepackage{hyperref}
}{%
  \newcommand*{\url}[1]{\texttt{#1}}
}

\newcommand*{\code}[1]{\texttt{#1}}

\author{J.\ Noirel, G.\ Sanguinetti, and P.\ C.\ Wright\\
        Chemical and Process Engineering --- Computer Science\\
        University of Sheffield, United Kingdom}
\title {Description of the package MMG\\
        (version 1.2.2)}
\year = 2008 \month = 5 \day = 20

\usepackage{/Library/Frameworks/R.framework/Resources/share/texmf/Sweave}
\begin{document}

\maketitle

\tableofcontents

\section{Introduction}

The \code{MMG} package is part of the BioConductor\footnote{\url{http://www.bioconductor.org/}} project.

\medbreak

\begin{list}{}{}
  \item[\itshape Abstract of the orginal paper]
A fundamental task in systems biology is the identification of groups of genes that are involved in the cellular response to particular signals. At its simplest level, this often reduces to identifying biological quantities (mRNA abundance, enzyme concentrations, etc.) which are differentially expressed in two different conditions. Popular approaches involve using t-test statistics, based on modelling the data as arising from a mixture distribution. A common assumption of these approaches is that the data are independent and identically distributed; however, biological quantities are usually related through a complex (weighted) network of interactions, and often the more pertinent question is which subnetworks are differentially expressed, rather than which genes. Furthermore, in many interesting cases (such as high-throughput proteomics and metabolomics), only very partial observations are available, resulting in the need for efficient imputation techniques.

We introduce Mixture Model on Graphs (MMG), a novel probabilistic model to identify differentially expressed submodules of biological networks and pathways. The method can easily incorporate information about weights in the network, is robust against missing data and can be easily generalized to directed networks. We propose an efficient sampling strategy to infer posterior probabilities of differential expression, as well as posterior probabilities over the model parameters. We assess our method on artificial data demonstrating significant improvements over standard mixture model clustering. Analysis of our model results on quantitative high-throughput proteomic data leads to the identification of biologically significant subnetworks, as well as the prediction of the expression level of a number of enzymes, some of which are then verified experimentally.
\end{list}

\medbreak

MMG relies on a~Gibbs sampler to evaluate the posterior probability of certain genes to be down-regulated, up-regulated or merely unchanged depending on their location in the metabolic network and proteomic measurements.  (Other networks and other sources of relative measurement may be considered.)  This package furthermore implements a~function to retrieve subnetworks that seem to behave consistently (which are overall down-regulated or up-regulated, for instance).

\bigbreak

\noindent \textbf{Note:} If you use this package please cite
%
\begin{list}{}{}
\item
  Sanguinetti,~G.\ \emph{et~al.},
  ``MMG: a probabilistic tool to identify submodules of metabolic pathways'',
  \emph{Bioinformatics}, \textbf{8}, pp.~1078-1084, 2008.
\end{list}

\section{Quick start}

There are three important functions:
%
\begin{description}
\item[MMG.compute]   Runs the Gibbs sampler.
\item[MMG.cut.graph] Identify parts of the network that behave consistently.
\item[MMG.make.dot]  Produces a DOT~file in order to visualise the result of \code{MMG.cut.grahp}
\end{description}

The inputs are as follows:
%
\begin{itemize}
\item \code{MMG.compute} uses an~input file that described the network of interest and the measurements that are available.  The network may be directed.  The $k$th line must resemble:
%
\begin{list}{}{}
\item
  $k$\qquad $m_k$\qquad $n_{k1}$\ $w_{k1}$\qquad
  $n_{k2}$\ $w_{k2}$\qquad \dots\qquad
  $n_{kN}$\ $w_{kN}$
\end{list}

The interpretation of the line is the following:
%
\begin{itemize}
\item $k$ is provided for legibility's sake (it could be any number),
\item $m_k$ is the relative measurement ($\log_2$-ratio, or \code{NA} when not available),
\item there are $N$ neighbours within the network, $n_{ki}$ ($1 \leq i\leq N$),
\item the weight of the connexion $i \to k$ is $w_{ki}$ ($1 \leq i\leq N$) and must be positive.
\end{itemize}
\end{itemize}

\subsection{Calling the package}

\begin{Schunk}
\begin{Sinput}
> library("MMG")
\end{Sinput}
\end{Schunk}

\subsection{Running the Gibbs sampler}

The following file represents a~hexagonal network:
%
\begin{verbatim}
1  NA   6 1  2 1
2  NA   1 1  3 1
3  -1.0 2 1  4 1
4  NA   3 1  5 1
5  NA   4 1  6 1
6  +1.0 5 1  1 1
\end{verbatim}
%
Two nodes have measurements, one up, the other down.  The file containing the data is \code{hexag.dat}.  We run \code{MMG.compute} on this file:

\begin{Schunk}
\begin{Sinput}
> r <- MMG.compute(file.name = "hexag.dat", steps = 1e+05, alpha = 0.1)