\name{run.analysis}
\alias{run.analysis}
\title{Test for Significant Peaks in FT-ICR MS by Controlling FDR}

\description{
Takes the file generated by \code{\link{run.cluster.matrix}} and tests the
peaks for significance, using the Benjamini-Hochberg procedure to control the
false discovery rate (FDR).}

\usage{
run.analysis(form, covariates, FDR = 0.1, norm.post.repl = FALSE, 
             norm.peaks = c("common", "all", "none"), normalization, 
             add.norm = TRUE,  repl.method = "max", use.model = "lm",
             pval.fcn = "default", lrg.only = TRUE, masses = NA,
             isotope.dist = 7, root.dir = ".", lrg.dir,
             lrg.file = "lrg_peaks.RData", res.dir,
             res.file = "analyzed.RData", overwrite = FALSE,
             use.par.file = FALSE, par.file = "parameters.RData",
             bhbysubj = TRUE, subs, ...)
}

\arguments{
    \item{form}{object of class \dQuote{\code{\link{formula}}} to be used by \code{use.model} for testing using \code{covariates}}
    \item{covariates}{data frame containing covariates used in analysis}
    \item{FDR}{false discovery rate level to control in the Benjamini-Hochberg procedure}
    \item{norm.post.repl}{logical; whether to normalize after combining replicates}
    \item{norm.peaks}{which peaks to use in normalization}
    \item{normalization}{type of normalization to use on spectra before statistical analysis; kept for compatibility (see below)}
    \item{add.norm}{logical; whether to normalize additively or multiplicatively on the log scale}
    \item{repl.method}{function or string representing the name of a function; how to deal with replicates}
    \item{use.model}{function or string representing the name of a function; what test to apply to data}
    \item{pval.fcn}{function to extract \emph{p}-values; default is overall \emph{p}-value of test}
    \item{lrg.only}{logical; whether to consider only peaks that have at least one \dQuote{large} peak; i.e., identified by \code{run.lrg.peaks}}
    \item{masses}{specific masses to test}
    \item{isotope.dist}{maximum distance for declaring isotopes}
    \item{root.dir}{directory for parameters file and raw data}
    \item{lrg.dir}{directory for large peaks file; default is \code{paste(root.dir, "/Large_Peaks", sep = "")}}
    \item{lrg.file}{name of file in which large peaks are stored}
    \item{res.dir}{directory for results file; default is \code{paste(root.dir, "/Results", sep = "")}}
    \item{res.file}{name for results file}
    \item{overwrite}{logical; whether to replace existing files with new ones}
    \item{use.par.file}{logical; if \code{TRUE}, then parameters are read from \code{par.file} in directory \code{root.dir}}
    \item{par.file}{string containing name of parameters file}
    \item{bhbysubj}{logical; whether to count large peaks by subject (i.e., combining replicates) or by spectrum}
    \item{subs}{subset of spectra to use for analysis; see below}
    \item{...}{additional parameters to be passed to \code{use.model}}
}

\details{
Reads in information from file created by \code{\link{run.cluster.matrix}} and
creates a file named \code{res.file} in directory \code{res.dir} which contains
the following variables:
\tabular{ll}{ \tab \cr
    \code{amps} \tab matrix of transformed amplitudes of alignment peaks \cr
    \code{bysubjvar} \tab vector indicating which rows of \code{covariates} belong to the same subject \cr
    \code{centers} \tab matrix of calculated masses of alignment peaks \cr
    \code{clust.mat} \tab matrix of transformed amplitudes of peaks used in statistical testing \cr
    \code{min.FDR} \tab FDR level required to get at least one significant test given the starting set of peaks \cr
    \code{sigs} \tab matrix containing all tests which are significant under at least one scenario \cr
    \code{which.sig} \tab matrix containing all peaks tested \cr
    \code{parameter.list} \tab if \code{use.par.file = TRUE}, a list generated by \code{\link{extract.pars}}; otherwise not defined \cr
}
}

\value{
No value returned; the file is simply created.
}

\references{
Barkauskas, D.A. and Rocke, D.M. (2009a) \dQuote{A general-purpose baseline
estimation algorithm for spectroscopic data}.  To appear in \emph{Analytica
Chimica Acta}.  doi:10.1016/j.aca.2009.10.043

Barkauskas, D.A. \emph{et al.} (2009b) \dQuote{Analysis of MALDI FT-ICR mass
spectrometry data: A time series approach}.  \emph{Analytica Chimica Acta},
\bold{648}:2, 207--214.

Barkauskas, D.A. \emph{et al.} (2009c) \dQuote{Detecting glycan cancer
biomarkers in serum samples using MALDI FT-ICR mass spectrometry data}.
\emph{Bioinformatics}, \bold{25}:2, 251--257.

Benjamini, Y. and Hochberg, Y. (1995) \dQuote{Controlling the false discovery
rate: a practical and powerful approach to multiple testing}.  \emph{J. Roy.
Statist. Soc. Ser. B}, \bold{57}:1, 289--300.
}

\author{Don Barkauskas (\email{barkda@wald.ucdavis.edu})}

\note{
If \code{use.par.file == TRUE} and other parameters are entered into the function
call, then the parameters entered in the function call override those read in
from the file.  Note that this is the opposite of the behavior for
\pkg{\link{FTICRMS}} versions 0.7 and earlier.

\code{norm.peaks} determines the peaks used for normalization: \code{"common"}
normalizes each spectrum using the average peak height of the alignment peaks
from that spectrum in \code{amps}; \code{"all"} normalizes each spectrum using
the average peak height of all peaks in that spectrum; and \code{"none"}
applies no normalization.

\code{normalization} is obsolete but is included for compatibility with previous 
versions of the package.  The valid normalization schemes translate to the new
scheme as follows: \code{"common"} is \code{norm.post.repl = FALSE} and 
\code{norm.peaks = "common"}; \code{"postbase"} is \code{norm.post.repl = FALSE} 
and \code{norm.peaks = "all"}; \code{"postrepl"} is \code{norm.post.repl = TRUE} 
and \code{norm.peaks = "all"}; and \code{"none"} is \code{norm.peaks = "none"} 
(and \code{norm.post.repl = FALSE}, although this value is irrelevant).
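
The translation above can be sketched as a small lookup.  The helper
\code{translate.normalization} is hypothetical and is not part of
\pkg{FTICRMS}; it only restates the mapping given in this note.
\preformatted{
# Hypothetical helper (not part of FTICRMS): map an obsolete
# normalization value to the equivalent pair of current arguments.
translate.normalization <- function(normalization) {
    switch(normalization,
        common   = list(norm.post.repl = FALSE, norm.peaks = "common"),
        postbase = list(norm.post.repl = FALSE, norm.peaks = "all"),
        postrepl = list(norm.post.repl = TRUE,  norm.peaks = "all"),
        none     = list(norm.post.repl = FALSE, norm.peaks = "none"),
        stop("unknown normalization scheme: ", normalization))
}

new.args <- translate.normalization("postrepl")
}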

Replicates for the same subject are assumed to be determined by the unique
values of \code{covariates$subj}.  (Future implementations will allow for
other methods of defining this.)  To analyze replicates as independent samples,
use \code{repl.method = "none"}.  This will also speed up the run time if there
are no replicates in the data set.
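
As a rough illustration of combining replicates, the sketch below applies
\code{max} within each value of \code{covariates$subj}.  The toy data and the
layout (peaks in rows, spectra in columns) are assumptions made for this
example; the package's internal implementation may differ.
\preformatted{
# Toy data: 4 spectra (columns) from 2 subjects, 3 peaks (rows).
covariates <- data.frame(subj = c("A", "A", "B", "B"),
                         group = c("case", "case", "control", "control"))
amps <- matrix(c(1, 5, 2,
                 3, 4, 6,
                 7, 1, 1,
                 2, 8, 0), nrow = 3,
               dimnames = list(paste0("peak", 1:3), paste0("spec", 1:4)))

# For each peak, apply the replicate method (here max) within each subject.
by.subject <- t(apply(amps, 1,
                      function(row) tapply(row, covariates$subj, max)))
}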

The argument \code{subs} may be a logical, numeric, or character vector; if it
is defined, then \code{covariates} is modified to
\code{covariates[subs, , drop = FALSE]}.
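
The three kinds of \code{subs} behave as in ordinary data frame row indexing.
The following sketch uses a made-up \code{covariates} data frame with
illustrative row names.
\preformatted{
covariates <- data.frame(subj = c("A", "A", "B", "C"),
                         group = c("case", "case", "control", "case"),
                         row.names = paste0("spec", 1:4))

# Logical: keep the rows where the condition holds.
by.logical <- covariates[covariates$group == "case", , drop = FALSE]
# Numeric: keep rows by position.
by.numeric <- covariates[c(1, 3), , drop = FALSE]
# Character: keep rows by row name.
by.name <- covariates[c("spec2", "spec4"), , drop = FALSE]

# drop = FALSE keeps a data frame even when a single column remains.
one.col <- covariates[, "subj", drop = FALSE]
}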

If \code{masses} is specified (the default shown in the usage is \code{NA}),
then the listed masses, plus anything that could fall within the first
\code{isotope.dist - 1} isotope peaks of each mass, are tested.

If something other than the \emph{p}-value for the overall test statistic is
needed, then the user-defined function for \code{pval.fcn} should have the form
\code{pval.fcn = function(x){\dots}}, where \code{x} is a model object of the
type returned by \code{use.model}, and it should return the desired
\emph{p}-value.
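
For instance, with \code{use.model = "lm"}, a user-defined extractor might
return the \emph{p}-value of a single coefficient.  The function name
\code{coef.pval} and the simulated data below are illustrative assumptions,
not part of the package.
\preformatted{
# Hypothetical extractor: p-value of the first non-intercept coefficient
# of an lm fit, rather than the overall F-test p-value.
coef.pval <- function(x) {
    summary(x)$coefficients[2, 4]
}

# Toy check on simulated data (not FTICRMS output).
set.seed(1)
toy <- data.frame(group = rep(c(0, 1), each = 10))
toy$y <- 2 * toy$group + rnorm(20)
fit <- lm(y ~ group, data = toy)
p <- coef.pval(fit)
}
Such a function would then be passed as \code{pval.fcn = coef.pval}.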

If \code{use.model} evaluates to \code{\link{t.test}}, then the difference
between the two groups for each peak is recorded in \code{which.sig$Delta} and
\code{sigs$Delta}; otherwise, these columns consist entirely of \code{NA}
entries.

Each rowname of \code{sigs} and \code{which.sig} represents the range of masses
that were used to form that peak.  The columns of those objects give the
\emph{p}-value of the peaks in each row, the number of samples that had large
peaks for each row, and the significance of each test, coded as
\tabular{ll}{ \tab \cr
    \code{NA} \tab peak not eligible for B-H \cr
    \code{0} \tab peak eligible for B-H but not declared significant \cr
    \code{1} \tab peak declared significant \cr
}    
The \dQuote{\code{S}} labels refer to the number of large peaks that were
necessary for a row to be eligible.  For example, the column labeled \code{S5}
in \code{sigs} used as its starting set of \emph{p}-values all rows which had
\code{which.sig$num.lrg >= 5}.  If \code{bhbysubj == TRUE}, then the entries of
\code{num.lrg} are obtained by going subject-by-subject and for each mass
counting the number of subjects who had at least one spectrum with a large peak
at that mass; otherwise, \code{num.lrg} for each mass is simply the total number
of spectra that had a large peak at that mass.
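
The coding of one \dQuote{\code{S}} column can be sketched with
\code{\link{p.adjust}}.  The \emph{p}-values and counts are made up, and the
use of \code{p.adjust} is an assumption for illustration; the package may
carry out the Benjamini-Hochberg step differently.
\preformatted{
# Toy p-values and large-peak counts for six rows (not FTICRMS output).
pvals   <- c(0.001, 0.04, 0.03, 0.20, 0.002, 0.50)
num.lrg <- c(6, 2, 5, 7, 9, 3)
FDR     <- 0.1

# Column S5: only rows with at least 5 large peaks enter the B-H procedure.
eligible <- num.lrg >= 5
S5 <- rep(NA_integer_, length(pvals))
# 1 = declared significant, 0 = eligible but not significant, NA = ineligible.
S5[eligible] <- as.integer(p.adjust(pvals[eligible], method = "BH") <= FDR)
}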
}

\seealso{\code{\link{run.strong.peaks}}}

\examples{}
