%\VignetteIndexEntry{Analysing interactions of fitted models}
\documentclass[english]{article}
\usepackage{float,graphicx,geometry}
\geometry{top=2.5cm,bottom=2.5cm,left=3cm,right=3cm}
\usepackage{amsmath}
\usepackage{Sweave}
\usepackage{babel}

\newcommand{\pkg}[1]{{\textbf{#1}}}
\newcommand{\R}{\textsf{R}}
\newcommand{\code}[1]{{\texttt{#1}}}

<<echo=FALSE>>=
options(useFancyQuotes = FALSE, width=85)
car.citation <- citation("car")
car.citation$key <- "Fox-CAR2011"
nlme.citation <- citation("nlme")[1]
nlme.citation$key <- "Pinheiro-nlme"
lme4.citation <- citation("lme4")[1]
lme4.citation$key <- "Bates-lme4"
@

\begin{filecontents}{bibextra.bib}
<<echo=FALSE, results=tex>>=
toBibtex(car.citation)
toBibtex(nlme.citation)
toBibtex(lme4.citation)
@
\end{filecontents}

\begin{document}
\setkeys{Gin}{height=0.4\textheight, keepaspectratio=true}
\DefineVerbatimEnvironment{Sinput}{Verbatim}{fontsize=\small,fontshape=sl}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{fontsize=\small,fontshape=n}
\DefineVerbatimEnvironment{Scode}{Verbatim}{fontsize=\small,fontshape=n}

\title{Analysing interactions of fitted models}


\author{Helios De Rosario Mart\'inez}
\maketitle
\begin{abstract}
This vignette presents a brief review about the existing approaches
for the \emph{post-hoc} analysis of interactions in factorial experiments,
and describes how to perform some of the cited calculations and tests
with the functions of the package \pkg{phia} in \R{}. Those
functions include the calculation and plotting of cell means, and
testing simple effects, residual effects, and interaction contrasts,
among other possibilities. They can be applied to linear and generalized
linear models, with or without covariates, and to mixed or multivariate linear
models for repeated measures experiments.
\end{abstract}

\section{Introduction}

The \emph{post-hoc} analysis of interactions in factorial ANOVA is
a controversial issue, that has generated many discussions and a variety
of methods. Traditionally, the most frequent practice has been the
analysis of \emph{simple main effects}, i.e. the main effect of one
factor at fixed values of the other factors. This type of analysis,
however, has severly been criticized for its mixing both main and
interaction effects. Marascuilo and Levin stated in 1970 that analysing
the simple effects of a significant interaction was a typical case
of the so-called ``Type IV error'': a wrong interpretation
of a correctly rejected null hypothesis, since that analysis does
not investigate the hypothesis that is presumably being tested \cite{marascuilo_appropriate_1970}.
On the other hand, they proposed the analysis of \emph{interaction
contrasts} (crossed contrasts of different factors) or the \emph{interaction
effects} (the value of the interaction after removing low-order effects).
The latter option was avidly supported by Rosnow and Rosenthal as well,
who called that concept \emph{residual} or \emph{leftover contrasts}
\cite{rosenthal_definition_1989,rosnow_things_1995}.

However, such proposals have not produced an established ``correct''
practice at all. In fact, the analysis of interactions has been a
heated field of debate for years; see Games' defence of simple effect
tests \cite{games_type_1973} and Levin's and Marascuilo's response
\cite{levin_type_1973}, or the criticisms of Meyer and Petty et
al. to Rosnow's and Rosenthal's proposals \cite{meyer_misinterpretation_1991,petty_understanding_1996},
as well as the answer to the latter \cite{rosnow_contrasts_1996}.
Although the theoretical issues of simple effects tests are generally
acknowledged, an eventual consensus about the ``best'' alternative
method is probably difficult to achieve. A general valid procedure
is not possible in the first place, since the correct test depends
on the specific problem addressed by the experiment.

Unfortunately, many researchers do not choose a method
depending on the question they want to answer. In spite of the criticism
received by the analysis of simple effects, various reviews of published
research in the three last decades have shown that it still is by
far the most frequent practice \cite{rosenthal_definition_1989,ottenbacher_interpretation_1991,pardo_interaccion_2007}.
According to Pardo's et al. interpretation, this is partly due to
the limitations of commonplace software packages, which do not provide
direct facilities for analysing the contrasts that isolate interaction
effects \cite{pardo_interaccion_2007}.

On the other hand, the flexibility of \R{} does allow to analyse
any kind of contrast across the factors of fitted models, even beyond
the two- or three-factorial designs that are normally discussed in the literature.
After all, any contrast can be described as a linear combination of
the model coefficients. Thus, since the mathematical details of fitted
models (like their matrices of coefficients and covariances) are easily
available in \R{}, the values and errors of those contrasts
can be calculated without difficulty, and moreover there are contributed functions
that facilitate the statistical tests based on such combinations of
coefficients, like \code{linearHypothesis} in the package \pkg{car}
\cite{Fox-CAR2011}, or \code{glht} in \pkg{multcomp}
\cite{multcomp-Hothorn}, which is specially suited for testing main
effects.

The package \pkg{phia} (Post-Hoc Interaction Analysis) provides
a usable interface for calculating different types of contrasts,
that are mentioned in the literature related to the analysis of interactions,
as well as other combinations of factors that could be of interest
for the researcher. The functionality of this package also extends
to more complex models, like generalized linear models, mixed-effects models,
multivariate linear models for repeated measures designs, and models with covariates.
The testing procedures provided by this package are the ones covered
by \code{linearHypothesis}, i.e. tests based on \emph{F} and $\chi^{2}$
statistics, with adjusted \emph{p}\nobreakdash-values if needed.
The following sections of this paper give a more detailed description
of the main types of contrasts that can be used for analysing interactions,
and examples of using the functions of this package for calculating
and testing them.


\section{Mathematical formulation of interactions\label{sec:maths-interactions}}

Interactions are often described in terms of a linear, two-way factorial
model, where the response variable $Y$ is a function of the factors $A$ and $B$.
Interactions are said to exist when a change in the level of one factor has
different effects on the response variable, depending on the value of the other
factor \cite{dodge_concise_2008}. This is often represented by the
following formula:

\begin{equation}
Y_{ijk}=\mu+\alpha_{i}+\beta_{j}+\left(\alpha\beta\right)_{ij}+\varepsilon_{ijk},\end{equation}
where $\alpha_{i}$, $\beta_{j}$ represent the {}``main effect''
of the $i$\nobreakdash-th and $j$\nobreakdash-th levels of $A$
and $B$, respectively, $\left(\alpha\beta\right)_{ij}$ is the effect
of the interaction in that combination of levels, and $\varepsilon_{ijk}$
is the error term of the $k$\nobreakdash-th observation in that
combination.

\R{} analyses such models in the more general framework of linear
models, defined by the following matrix equation:

\begin{equation}
\underset{(n\times m)}{\mathbf{Y}}=\underset{(n\times(r+1))}{\mathbf{X}}\underset{((r+1)\times m)}{\mathbf{B}}+\underset{(n\times m)}{\mathbf{E}},\label{eq:glm-matrix}\end{equation}
where $\mathbf{Y}$ contains the $n$ observations of the $m$\nobreakdash-dimensional
response variable (with $m$ often equal to 1), $\mathbf{X}$ is the
model matrix that only depends on the observed values of the predictor
variables and the structure of the model, $\mathbf{B}$ is the coefficient
matrix for that model structure and data (with $r$ degrees of freedom
--- d.o.f.), and $\mathbf{E}$ contains the error term.

The structure of $\mathbf{X}$ and $\mathbf{B}$ is very simple for
linear regression models, where all predictors are numeric variables.
Let us take a regression model with an univariate response and two
regressors ($X_{1}$ and $X_{2}$), including their interaction. In
this case \eqref{eq:glm-matrix} would just be the formulation in
matrix form of the regression equation:

\begin{equation}
\left(\begin{array}{c}
Y_{1}\\
\vdots\\
Y_{n}\end{array}\right)=\left(\begin{array}{cccc}
1 & X_{11} & X_{21} & X_{11}X_{21}\\
\vdots & \vdots & \vdots & \vdots\\
1 & X_{1n} & X_{2n} & X_{1n}X_{2n}\end{array}\right)\left(\begin{array}{c}
\beta_{0}\\
\beta_{1}\\
\beta_{2}\\
\beta_{12}\end{array}\right)+\left(\begin{array}{c}
\varepsilon_{1}\\
\vdots\\
\varepsilon_{n}\end{array}\right)\label{eq:regression-matrix}\end{equation}


This model has 3 terms, each with one d.o.f., that are represented
by different columns of $\mathbf{X}$ and coefficients of $\mathbf{B}$
(besides the intercept represented by the column of ones and $\beta_{0}$).
Two of these terms are the main effects of the regressors, represented by
their values in $\mathbf{X}$ and the ``slopes'' $\beta_{1}$,
$\beta_{2}$; the other one is the interaction term, represented by
the product $X_{1i}X_{2i}$ and the coefficient $\beta_{12}$. For
more complex regression models, there may be as many terms as possible
products of regressors, such that if there are $k$ regressors, there may
be up to $k^{2}-1$ terms.

If some or all the predictor variables are factors, the same equations
hold, but the representation of the terms in the model matrix would
not be scalar values as $X_{1i}$, $X_{2i}$, or $X_{1i}X_{2i}$.
The main term of each factor would be represented by a set of ``dummy
variables'', whose number would be equal to the d.o.f. of the factor
(the number of levels minus 1), and the interactions would be represented
by all the possible products of the corresponding dummy variables.
Thus, for instance, if there are two factors $A$ and $B$ with $3$
and $4$ levels, respectively, the term of $A$ would be represented
by $2$ dummy variables, $B$ by $3$ of them, and their interaction
by $2\times3=6$ dummy variables.

The problem is that the coefficients that define the interactions
in this framework are not always useful for describing the model in
practical terms. The products of regressors are usually meaningless
variables, and this poses a difficulty in the interpretation of the
associated coefficients (see section \ref{sec:covariates} below). This issue
is further aggravated when there are factors involved in the interactions,
since the meaning of the dummy variables may be even more opaque. That is one
of the reasons that motivate the different ways of describing interactions,
which are commented on next.


\section{Analysis of simple effects in factorial models}


\subsection{Calculation and plots of ``cell means''}

Let us take, for this and later sections, an example data set based
on R.J. Boik's hypothetical data \cite{boik_interactions_1979}, which
he used for demonstrating how to analyse interaction contrasts in linear models,
although it will be used here for a larger variety of interaction analyses.
It represents a hypothetical experiment, where people affected by
hemophobia were treated with different fear reduction therapies and
different doses of antianxiety medication, in a balanced factorial
design, and the effect of these combined treatments was measured by
their electrodermal response in an experimental session. 

First we need to create the linear model from the data. We will use
the data frame \code{Boik} also included in the package \pkg{phia},
that has the response \code{edr}, and two factors (\code{therapy},
with levels \code{control}, \code{T1}, and \code{T2}; and \code{medication},
with levels \code{placebo}, \code{D1}, \code{D2}, and \code{D3}).
We use the function \code{some} of the package \pkg{car} (imported
by \pkg{phia}) to see some cases:

<<>>=
library(phia)
some(Boik)
@

Before proceeding with detail analyses of the interactions, we should
first check if the factorial model is coherent with the data, and if the interaction
between both factors is actually significant. We can do this by examining
the residuals of the model (see figure \ref{fig:mod-boik-diagnos})
and the ANOVA table.

<<modboikresid>>=
mod.boik <- lm(edr ~ therapy*medication, data=Boik)
par(mfcol=c(1,2))
plot(mod.boik, 1:2) # Plot diagnostics for the model
Anova(mod.boik)
@

%
\begin{figure}
\begin{centering}
<<fig=TRUE, echo=FALSE, results=hide>>=
<<modboikresid>>
@
\par\end{centering}
\caption{Residuals vs. fitted values and Q-Q plot of \code{mod.boik}\label{fig:mod-boik-diagnos}}
\end{figure}

Although the plots of figure \ref{fig:mod-boik-diagnos} show a minor departure
of normality for the residuals, specially due to a couple of extremely
low observations, for the sake of balance we will keep all data, and
assume that the model assumptions hold. Then we see in the ANOVA table
that the interaction between \code{therapy} and \code{medication}
is significant, so it does makes sense to investigate this effect.%
\footnote{The data set is based on the results reported in Boik's paper for
the different tests, but not directly copied from his original work
(that actually gives no data set). Thus, the residual plots are irrespective
of Boik's paper, and due to rounding inaccuracies, the figures presented in this
vignette and the ones of Boik's tables differ in the last decimals.
Regarding the ANOVA calculations, the \code{Anova} function from the package
\pkg{car} has been used to be consistent with later
sections, although for this set of balanced data the results would
be the same if we had used \code{anova} from the base \R{}
package.%
}

In factorial experiments like this one, the dependency between factor
levels and the response variable is usually represented in a contingency
table, where the rows and columns are related to the different levels
of both treatments, and each cell contains the adjusted mean of the
response for the corresponding interaction of factors.
When there is an interaction effect, the cell means are taken as the
most straightforward way of representing this effect. These values and their
standard errors can be obtained from the model coefficients with the function
\code{interactionMeans} in the package \pkg{phia}, using the fitted
model as first (and in this case only) argument:

<<>>=
(boik.means <- interactionMeans(mod.boik))
@

This function calculates by default the cell means for the interactions of
highest order between factors. To obtain means of lower-order
interactions, the optional argument \code{factors} admits a character vector
with the names of the factors that are included in the desired interaction.%
\footnote{But consider the contradiction of this approach with the marginality
principle discussed in the next section.%
} If this argument gives only one factor, the result will be the means
of its zeroth-order interaction (i.e. the marginal means for that factor).
Thus, for instance:

<<>>=
interactionMeans(mod.boik, factors="therapy")
@

The output of \code{interactionMeans} can be plotted via the generic
\code{plot} method, that produces the set of plots shown in figure
\ref{fig:mod-boik-plot}.
The off-diagonal panels are the typical interaction plots, that can also
be created by \code{interaction.plot} from the columns of \code{boik.means},
where the lack of parallelism between lines reveals
how one factor changes the effect of the other one.
In this case, we see that the control group hardly obtains
any benefit from the medication, whereas with the other therapies
(\code{T1} and \code{T2}) the fear to blood is reduced proportionally
to the medication dose, and more markedly for the former. On the other
hand, the diagonal panels represent the marginal means of each factor.

%
\begin{figure}
\begin{centering}
<<fig=TRUE, echo=FALSE>>=
plot(boik.means)
@
\par\end{centering}
\caption{Result of \code{plot(boik.means)}\label{fig:mod-boik-plot}}
\end{figure}


If the interaction involved more than two factors, the graphical device
would have as many rows and columns as factors, and the off-diagonal
panels would show the first-order interaction means for each pair
of factors. For interactions with many factors, the matrix of panels
may be cluttered, so it would be more convenient to show them in separate
figures. The argument \code{multiple} (\code{TRUE} by default)
can be modified for this purpose:

<<>>=
plot(boik.means, multiple=FALSE) # Not printed in this paper
@


\subsection{\emph{Caveat}: low-order interactions and the marginality principle}

The basic methods of statistical analysis in \R{} favour the
so-called ``marginality principle'', whereby the main effects
of factors with non-null interactions should not be interpreted or
tested \cite{nelder_reformulation_1977}. Now, the same warning applies
to interactions that are themselves contained in interactions of higher
order. Thus, although the plots described in the previous section
are commonplace in the study of interactions, they are not necessarily
meaningful in all circumstances.

For instance, since the interaction between \code{therapy} and \code{medication}
is significant, according to the marginality principle we should not be
concerned with the main effects of those factors, so the diagonal
panels of figure \ref{fig:mod-boik-plot} would be irrelevant for this model.
Likewise, if a model has more than two factors and an interaction
of second or higher order is significant, no plot containing those
factors would actually be of interest, since the method \code{plot}
on the result of \code{interactionMeans} only represents main effects
and first-order interactions. (This is a limitation of the graphic
representation alone, since the data frame can represent higher-order
interactions).

A suitable alternative for representing higher-order interactions
are the functions \code{effect} and\\ \code{allEffects} from the package
\pkg{effects} \cite{Fox:Hong:2009:JSSOBK:v32i01}.  Those functions
are specially devised to analyse and plot the interactions of highest
order in the model. For factorial
models, their outputs contain the same type of data as \code{interactionMeans},
although they handle the numeric predictors of the model in another
way, and the types of models that can be analysed is different (see
sections \ref{sec:repeated-measures} and \ref{sec:covariates}).


\subsection{Testing simple effects}

The tabulation or graphical representation of cell means may give
us a hint of the underlying structure of interactions, but they do
not suffice to verify whether a specific change in the factors plays
a significant role in an interaction. As commented on above, the most
frequent approach for solving this issue consists in testing the simple effects,
as an extension of the \emph{post-hoc} methods that are widely applied to 
the study of main factor effects.

The available methods for the \emph{post-hoc} analysis of main effects
are manifold. The most basic procedure consists in evaluating multiple
contrasts between factor levels, possibly with corrections of the
\emph{p}\nobreakdash-value in order to protect the family-wise error
rate. Pairwise comparisons between levels are usually a default strategy
when the researcher has no previous plan, although this is inefficient
when the factor has many levels. Tukey's method for testing pairwise
contrasts, and Scheff\'e's method for all possible contrasts within 
a factor, are probably the most popular ones.
The package \pkg{multcomp} provides useful tools for this
kind of main effects contrasts.

Testing simple main effects for interactions consists in evaluating
contrasts across the levels of one factor, when the values of the
other interaction factors are fixed at certain levels. This test
is then repeated at other fixed levels, and the results are compared.
For instance, we could test the effect of \code{medication} at the
different levels of \code{therapy}. This can be done with the function
\code{testInteractions}, using the arguments \code{fixed} and
\code{across} to define the factors that are fixed and tested across
their levels in each test:

<<>>=
testInteractions(mod.boik, fixed="therapy", across="medication")
@

The columns \code{medication1}, $\ldots$ \code{medication3} in
the resulting table contain the value of the three orthogonal contrasts
across the levels of \code{medication}, for each level of \code{therapy}
(the only fixed factor in this example).%
\footnote{The specific contrasts that are calculated depend on various elements.
In this case, since the original data frame defines \code{medication}
as an ordered factor, polynomial contrasts are computed by default.
For unordered factors they would have been {}``sum-to-zero contrasts''.
This default behaviour can be overriden by setting other contrast
in the original data frame, the fitted model, or with additional arguments
in \code{testInteractions}.%
} The rest of columns show the information of the multivariate test
applied to those contrasts. These tests just quantify the qualitative
interpretation that was made from the plots: the medication does not
have a significant effect for the control therapy group, but its effect
is remarkable for the other groups.

The criticism often posed to this method is that interactions are
mixed with main effects (or lower-order interactions within), so the
tests are not really related to the term that is supposedly under
investigation. Using this example, the \emph{post-hoc} analysis of
the term \code{therapy:medication} is being performed because the ANOVA
told us that it is significant; and this means that the coefficients of
the matrix $\mathbf{B}$ related to this term are unlikely to be null.
However, the tests of simple effects that have just been described
do not only involve those coefficients, but also the coefficients
related to the lower-order terms \code{therapy} and \code{medication}.%
\footnote{\code{interactionTest} does multiple calls to the function \code{testFactors},
which in its turn defines a linear combination of the model coefficients
and passes it down to \code{linearHypothesis} from \pkg{car}. The hypothesis matrices
used in these tests can be looked at to see what coefficients are
actually involved.%
}

On the other hand, many researchers like simple effects for their
relatively straightforward interpretation. Moreover, the interference
of lower-order coefficients may be regarded a lesser issue when the
marginality principle is considered. In this theoretical framework,
the presence of a high-order interaction makes lower order terms meaningless,
so that their effects are absorbed by the interaction. Therefore,
the coefficients of lower order terms would partially be related to
the interaction effect as well.


\section{Analysis of residual effects}

To address the conceptual problems of simple effects, Rosnow and Rosenthal
encouraged the analysis of residual effects, by ``peeling away''
the lower-order effects from cell means \cite{rosenthal_definition_1989}.
For instance, let us see the cell means calculated in \code{boik.means},
in a table with the marginal means for both factors and the grand
mean:

<<>>=
boik.mtable <- xtabs(boik.means$"adjusted mean" ~ therapy+medication, boik.means)
boik.mtable <- addmargins(boik.mtable, FUN=mean, quiet=TRUE)
print(boik.mtable, digits=4)
@

The ``corrected means'' would be obtained by subtracting the lowest-order
effect (the grand mean) from the rest of values of the table, and
then sweeping out the corrected marginal means from the individual
cells.

<<>>=
boik.resid <- boik.mtable - boik.mtable[4,5] # Subtract the mean
boik.resid <- sweep(boik.resid, 1, boik.resid[,5]) # Subtract row means
boik.resid <- sweep(boik.resid, 2, boik.resid[4,]) # Subtract column means
print(boik.resid, digits=4)
@

These values can also be calculated (and moreover tested) by \code{testInteractions}
via the argument \code{residuals}, instead of \code{fixed} or
\code{across}:

<<>>=
testInteractions(mod.boik,residual=c("therapy","medication"))
@

However, these results may cause confusion, and their actual interest
may be dubious in many cases, including the analysis of this model.
We can plot the corrected means (ommiting the margins of the table)
for a clearer inspection of what is happening; see figure \ref{fig:residual-plot}:

<<residualsplot>>=
matplot(t(boik.resid[-4,-5]), type="b", xaxt="n", ylab="Interaction residuals")
axis(1, at=1:4, labels=levels(Boik$medication))
@

%
\begin{figure}
\begin{centering}
<<fig=TRUE, echo=FALSE>>=
<<residualsplot>>
@
\par\end{centering}
\caption{Residual effects of \code{mod.boik}\label{fig:residual-plot}}
\end{figure}


The lines of this plot (representing the three therapies) show the
typical symmetry of residual effects. The line labelled with 1 (the
control group without specific therapy) shows a negative residual
effect of the placebo (a lower electrodermal response), that goes
up to positive values as the medication dose increases. The residual
effects of the group treated with therapy \code{T1} are the opposite,
whereas the the \code{T2} group has a trend similar to the controls,
but quite smaller.

The first problem is that a hasty interpretation of these values would
lead to nonsense. Of course, they do not mean that the hemophobia
of people that did not participate in any theraphy worsened with the
medication! The correct interpretation is that for these people, the
effect of increasing medication was lower than value expected \emph{by
the average of marginal means}, and the opposite happened with the
group treated with therapy \code{T1}.

But this reasoned interpretation, albeit mathematically sound, only
has sense as long as the expectations based on the marginal averages
mean anything, and this is contrary to the principle of marginality
commented on above. Since the factors \code{therapy} and \code{medication}
do interact, the marginal means depend on the distribution of the
sample across the factors; they are not reliable information about
the response that we can expect for the different factor levels, except
for a population with the same distribution as the sample (in this
case a balanced distribution). See, for instance, Games' and Meyer's
discussion on this issue (although they do not explicitly refer to
the ``principle of marginality'') \cite{games_type_1973,meyer_misinterpretation_1991}.

That principle can be neglected if there is a good reason
to study the marginal means that are obtained with a specific experimental
design, but these circumstances are rare and special \cite{Venables00exegeseson}.
In the current example there is no compelling reason to do so, therefore
the significant differences found for the placebo and the highest
dose in different therapies are rather uninformative.


\section{Interaction contrasts}

Another alternative to simple effects is the study of interaction
contrasts, which were in fact the subject of the paper where our
working data is derived from, although Boik used a slightly different
procedure for their analysis. Like in the analysis of interaction residuals,
the hypothesis tested by interaction contrasts is not affected by the
coefficients of main effects, but this approach overcomes the commented issue
of interaction residuals, because it does not make use
of marginal means, it only uses the data of the cells \cite{levin_type_1973,meyer_misinterpretation_1991}.
Interaction contrasts are defined as ``differential effects'',
or more descriptively as ``differences of differences'', or ``contrasts
between contrasts''. They basically consist in calculating one or
more contrasts across a factor, and then iterating on the results
of that operation across the remaining factors.

For instance, the test of simple effects previously calculated for
\code{mod.boik} could be transformed into a test of interaction
contrasts, if instead of fixing the levels of \code{therapy} for
evaluating the contrasts across \code{medication}, we do pairwise
contrasts between \code{therapy} levels. For this we must use the
argument \code{pairwise} instead of \code{fixed}:

<<>>=
testInteractions(mod.boik, pairwise="therapy", across="medication")
@

These tests show how the contrasts across \code{medication} differ
between pairs of \code{therapy} groups. We can see that the medication effect
with the therapy \code{T1} differs from the effect in controls and for the
other therapy; the medication effect with \code{T2} also differs from the effect
in controls, but this variation is not so remarkable. This conclusion was not so clear
from the simple effects tests. Moreover, these results are not disturbed
by the main effects of factors at all, because the calculation of
contrasts has removed them for both factors, without having defined
them explicitly.

The most basic interaction contrasts involve pairwise contrasts for
all factors. That is what \code{testInteractions} does by default,
when only the model is specified.

<<>>=
testInteractions(mod.boik)
@

If all the factors of the model had 2 levels, this would have been
an optimal strategy for analysing the interaction, since the result
would have been reduced to one test, corresponding to the single d.o.f.
of such an interaction. But the factors with more levels heavily
increase the number of tests, so that for our $3\times4$ factorial
design, with $2\times3=6$ d.o.f., we obtain 18 overredundant tests.
Such a high number of tests is difficult to interpret, let aside the
lack of reliability of the \emph{p}\nobreakdash-values (with or without
corrections, that can be set by the argument \code{adjustment} in
\code{testInteractions}).

A more sensible strategy consists in defining a small number of meaningful
contrasts that can be of interest for the researcher. For instance,
we might be interested in knowing the effect of crossing the following
contrasts for each factor
\begin{enumerate}
\item For \code{therapy}: controls vs. any therapy, and one therapy vs.
the other.
\item For \code{medication}: placebo vs. any real dose, the minimum dose
vs. the maximum, and the medium dose vs. the average of all doses.
\end{enumerate}

The function \code{testInteractions} also allows to define such custom
contrasts, via the argument \code{custom}. This argument must be
a named list of matrices, one per factor, with the vectors of coefficients
that define the contrasts arranged in columns. The auxiliary function
\code{contrastCoefficients} provides a convenient interface to generate that
list, from symbolic formulas that represent the contrasts:%
\footnote{These matrices are transformed combinations of Helmert and polynomial
contrasts, so they could have been defined by the functions \code{contr.helmert}
and \code{contr.poly} as well.%
}

<<>>=
(custom.contr <- contrastCoefficients(
therapy ~ control - (T1 + T2)/2,          # Control vs. both therapies
therapy ~ T1 - T2,                        # Therapy T1 vs. T2
medication ~ placebo - (D1 + D2 + D3)/3,  # Placebo vs. all doses
medication ~ D1 - D3,                     # Min. dose vs. max
medication ~ D2 - (D1 + D2 + D3)/3,       # Med. dose vs. average
data=Boik, normalize=TRUE))               # Normalize to homogeinize the scale
@

Then use this list to define the contrasts in \code{testInteractions}
(after renaming the columns of the matrices for a clearer interpretation of
the output):

<<>>=
colnames(custom.contr$therapy) <- c("cntrl.vs.all", "T1.vs.T2")
colnames(custom.contr$medication) <- c("plcb.vs.all", "D1.vs.D3", "D2.vs.avg")
testInteractions(mod.boik,custom=custom.contr)
@

This table of results is much clearer than the former. Moreover, all
these contrasts are orthogonal to each other (none of them can be
obtained by combination of the others), so the tests are independent,
and the adjustment of \emph{p}\nobreakdash-values is reliable. Taking
some care about the meaning of positive and negative figures of the
\code{Value} column, we can obtain the following conclusions:
\begin{enumerate}
\item According to the first two tests, the benefit of taking medication
(pooling over the three doses) is greater if the subject also receives
some therapy, and this effect is specially marked for therapy \code{T1}.
\item And according to the second two tests, the therapies interact in the
same manner with the benefit of increasing the medication from the
minimum to the maximum. On the other hand, we cannot say that the
therapy influences the difference between the medium dose and the average
of all doses.
\end{enumerate}

\section{Multivariate models for repeated-measures\label{sec:repeated-measures}}

Repeated-measures experiments are common in many disciplines, including
psychology and agriculture, although in the latter they are usually
found with the specific strucutre and name of ``split-plot'' designs.
The classical approach for analysing this kind of experiments is via
multi-strata ANOVA or univariate mixed-effects models, where the subjects
or plots are introduced as factors with random effects, added to the
error term (see section \ref{sec:mixed-models}). However, when the design is
balanced and adequately sized, the multivariate approach is recommended if it
possible, since it does not depend on the sphericity assumption and the results
are more robust \cite{keselman_testing_1998}.

An example of such an analysis in \R{} is published in the web appendices to
Fox's and Weisberg's \emph{R Companion to Applied Regression} \cite{Fox-appendix-mlm-car}.
We will use that example, based on the \code{OBrienKaiser} data set
from \pkg{car} \cite{obrien_manova_1985}.

<<>>=
some(OBrienKaiser, 6)
@

That data set has 16 rows with observations of people classified by
two between-subjects factor (\code{gender}, with levels \code{F},
\code{M}; and \code{treatment}, with levels \code{control},
\code{A}, and \code{B}), so that each subject has 15 measures
distributed in columns. These 15 columns correspond to 5 consecutive
observations in 3 different phases (pre-test, post-test, and follow-up);
this within-subjects model may be coded in a data frame with two crossed
factors:

<<>>=
(idata <- expand.grid(hour=ordered(1:5), phase=c("pre", "post","fup")))
@

The between-subjects factor \code{treatment}, however, is not balanced
in this case, as can be seen in the following frequency table:

<<>>=
addmargins(table(OBrienKaiser[c("gender","treatment")]))
@

We skip the exploration of the data that is already done in \cite{Fox-appendix-mlm-car},
and proceed to defining the multivariate model.

<<>>=
mod.ok <- lm(cbind(pre.1, pre.2, pre.3, pre.4, pre.5,
    post.1, post.2, post.3, post.4, post.5,
    fup.1, fup.2, fup.3, fup.4, fup.5) ~ treatment*gender,
    data=OBrienKaiser)
@

The multivariate ANOVA with response transformation for repeated measures
may be done with the function \code{Anova} in \pkg{car}, using
the auxiliary data frame \code{idata}, and the formula \code{idesign}
with the within-subjects design. For the sake of coherence with the
published example, we report a type\nobreakdash-III test.

<<>>=
Anova(mod.ok, idata=idata, idesign=~phase*hour, type=3)
@

Besides the intercept, the only significant effects at $\alpha=0.05$
are the main effects of \code{phase} and \code{hour}. Nevertheless,
let us suppose that we have reasons to be more liberal, and want to
investigate the interaction \code{treatment:phase} that is near
the $\alpha$ level of significance. (The main effect \code{treatment}
is also near that level, but we may ignore it since we will focus
on its interaction with another factor).

The operations previously presented for univariate linear models can also be
used in this case, as convenient wrappers of the procedures recommended for
the \emph{post-hoc} analysis of multivariate models \cite{R-help:141760,R-help:235586}.

First we may explore and
plot the cell means of this interaction with \code{interactionMeans},
using the auxiliary data frame \code{idata} to specify the within-subjects
model (\code{idesign} is not needed). We will use the argument \code{errorbar}
to draw the asymptotic $95\%$ confidence intervals of the adjusted means,
instead of their standard errors.%
\footnote{These confidence intervals are not exact, but an approximation assuming that
the parameters of the model are random results, normally distributed around their
``true'' values. This assumption is  met asymptotically, if the samples
of data are large enough.%
} The figure in the string \code{"ci95"} might be changed to calculate other confidence
intervals, like \code{"ci90"}, \code{"ci99"} for $90\%$, $99\%$, etc.

%
<<okplot>>=
ok.means <- interactionMeans(mod.ok, c("treatment","phase"), idata=idata)
plot(ok.means, errorbar="ci95")
@
\begin{figure}
\begin{centering}
<<fig=TRUE, echo=FALSE, results=hide>>=
<<okplot>>
@
\par\end{centering}
\caption{Means of the \code{treatment:phase} interaction for the O'Brien
and Kaiser model\label{fig:ok-treatmentxphase}}
\end{figure}


The plot of figure \ref{fig:ok-treatmentxphase} shows that in the
post-test and follow-up phases, the response of the control group
more or less remains at the same level as in the pre-test phase, whereas
the response for the other treatments increases. However, the confidence
intervals are relatively large, compared with the variations between adjusted
means, and there is a lot of overlap.

An analysis of all the possible interaction pairwise contratsts between \code{treatment}
and \code{phase} help us tell what differences are really significant:

<<>>=
testInteractions(mod.ok, pairwise=c("treatment", "phase"), idata=idata)
@

Although the interaction contrasts result in one-dimensional values,
we have trasformated a multivariate response to obtain them, so the
rows of this table report Pillai tests of MANOVA. These tests show
that the only significant contrast is between the pre-test and follow-up
phases, when compared between the control group and treatment \code{B}.
Given the similarity betwen the means of treatments \code{A} and \code{B},
we could have expected a significant difference between controls and
treatment \code{A} as well, but the test does not reject the null
hypothesis in that case, because the number of observations for treatment
\code{A} is lower, and therefore the confidence intervals for the estimations
are greater.

The main effect of \code{hour} could be analysed similarly. And 
there are other possible ways of analysing this interaction, using
other options of \code{testInteractions}, or by other methods
as proposed by Keselman \cite{keselman_testing_1998}. The reader
is encouraged to try these alternatives.


\section{Linear models with covariates\label{sec:covariates}}

Data sets may include numeric predictors, as mentioned in section \ref{sec:maths-interactions}.
When combined with factors, they are called \emph{covariates} and
ANOVA is called ANCOVA (Analysis of Covariance). The analysis of interactions
that involve these variables is also different. A factorial model
has a finite number of factor combinations where the adjusted mean
of the response can be evaluated, but the possible values of a covariate
are infinite. Therefore, the effects of covariates are usually represented
as continuous functions within the range of their observed values
(the model may allow the calculation of effects beyond that range,
but such predictions would normally have little reliability).
% For instance, the functions \code{effect} and \code{allEffects} from the
% package \pkg{effects} evaluate the adjusted mean of the response
% at a sample of values of the covariates, either automatically calculated
% by the function or specified by the user.

We have seen that in factorial models, the effects of factors may be analysed
by the contrasts between their levels. Obviously, the number of ``contrasts''
between possible values of a covariate would be infinite, although they are
constrained by the d.o.f. of the model. Let us take a pure linear regression model
without factors, and two covariates that do not interact:

\begin{equation}
Y_{i}=\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\varepsilon_{i}\end{equation}


For the effect of $X_{1}$, there are infinite pairs of values $X_{1a}\neq X_{1b}$
at which we could estimate such ``contrasts'', but the expected value of the
result would always be proportional to the difference between $X_{1a}$
and $X_{1b}$. The ratio between both differences would be equal to
the derivative of $E\left(Y\right)$ with respect to $X_{1}$, which
is equal to the model coefficient for $X_{1}$:

\begin{equation}
\frac{\Delta E\left(Y\right)}{\Delta X_{1}}=\frac{\partial E\left(Y\right)}{\partial X_{1}}=\beta_{1}\end{equation}


Thus, when covariates do not interact, their effects can just be described
by the values of their corresponding model coefficients. They are
a measure of the ``slope'' along the covariate, or the increment
in the expected value of the response, when the covariate increases
in one unit. Thus, if the researcher also wants the adjusted value
of the response for different values of the covariate, 
% as given by the functions of \pkg{effects},
the only additional information
that he or she needs is the adjusted mean at an arbitrary point.

The functions of the package \pkg{phia} can report the values of those
slopes. Let us take the model for the prestige of Canadian occupations,
defined in \cite{Fox-CAR2011}, p. 165. That model uses the data set \code{Prestige}
from \pkg{car}, that contains several variables related to 102 different occupations:

<<>>=
str(Prestige)
@	

In this model, the prestige score of each profession (\code{prestige}) is fitted
to a linear model that depends on \code{education} (average years of education),
\code{income} (average income), and the factor \code{type}, that has three levels:
\code{bc} (blue collar), \code{prof} (professional), and \code{wc} (white
collar). The two former variables are continuous covariates (\code{income}
transformed to logarithmic scale to improve the normality of the residuals),
with different responses for the three types
of ocupation (an interaction with \code{type}). We fit this model
and do an ANOVA of it:

<<>>=
mod.prestige <- lm(prestige ~ (log2(income)+education)*type, data=Prestige)
Anova(mod.prestige)
@

This analysis detects significant main effects of all the predictors, plus a
significant interaction between \code{log2(income)} and the factor \code{type}.
This interaction can explored with \code{interactionMeans} (see figure \ref{fig:interaction-covariates})
and tested with \code{testInteractions}, by just giving the name of the
relevant covariate(s) in the argument \code{slope}. 
We also tell the name of the factor plotted at the \emph{X}\nobreakdash-axis
and for which the pairwise contrasts are calculated (\code{type}), although
it might be ommited in this case because there is no other factor in the model.

<<prestigeplot>>=
plot(interactionMeans(mod.prestige, atx="type", slope="log2(income)"))
testInteractions(mod.prestige, pairwise="type", slope="log2(income)")
@
%
\begin{figure}
\begin{centering}
<<fig=TRUE, echo=FALSE, results=hide>>=
<<prestigeplot>>
@
\par\end{centering}
\caption{Plot of the interaction \code{log2(income):type\label{fig:interaction-covariates}}}
\end{figure}

The plot and ANOVA table show the adjusted values of the slope with
respect to \code{log2(income)} (instead of the adjusted mean of the
response), for the levels and contrasts of the factor \code{type}.
That slope, i.e. the proportional relation between the occupation's
income and its prestige, is greater for blue collar occupations, whereas
the difference is smaller between the other two types. However, the
tests only reveal significant differences between blue collar and
professionals, due to the unbalancedness of data. We can see in the
ANOVA table that although the the adjusted slope of \code{bc-prof}
is relatively similar to \code{bc-wc}, the sums of squares are far
greater in the former case, since there are many more cases of \code{prof}:

<<>>=
table(Prestige$type) # Frequencies of occupation types
@

If there had been a significant interaction between the covariate \code{education}
and \code{type}, we could have analysed it independently, since
both covariates have an additive effect (they do not interact). Things
may become more complicated if there are interactions between covariates.
In that case, the slopes are not constant for given combinations of
factors, but they are a function of the interacting covariates. For
instance, in a regression with two interacting variables:

\begin{equation}
Y_{i}=\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\beta_{12}X_{1}X_{2}+\varepsilon_{i},\label{eq:model-covariates}\end{equation}
the slope with respect to $X_{1}$ is $\left(\beta_{1}+\beta_{12}X_{2}\right)$,
and the slope for $X_{2}$ is $\left(\beta_{2}+\beta_{12}X_{1}\right)$.

This means that the results of previous calculations may depend on
the values of the other covariates. For instance, let us take another
model for the \code{Prestige} data, where we consider that \code{log2(income)}
and \code{education} can interact (the influence of the occupation's
income may vary with the average years of education), and we discard
the data of white collar occupations to have a more balanced design
and simpify things.

<<>>=
mod.prestige2 <- update(mod.prestige, formula=.~.+(log2(income):education)*type,
subset=(Prestige$type!="wc"))
Anova(mod.prestige2)
@

The significant effects shown by the ANOVA table are the same as for
the previous model, including the interaction \code{log2(income):type}.
And since \code{type} now has only two levels, the post-hoc test
of that interaction gives virtually the same result as that table:

<<>>=
testInteractions(mod.prestige2, pairwise="type", slope="log2(income)")
@

However, the algorithm used by this test (the function \code{testFactors})
sets the covariates at definite values, and the interaction between
covariates considered by the model makes the test sensitive to those
values. The default values of the covariates are the averages of the
cases observed in the model fit. Nevertheless, the user may choose
other arbitrary values, using the extra argument \code{covariates}
that is passed down to \code{testFactors}. This argument may contain
a named numeric vector with the custom values of any covariate. For
instance, we might want to set \code{education} at its 75th quantile,
and see what happens with the interaction \code{log2(income):type}.

<<>>=
# Look quantiles of the model frame (a subset of the original data)
quantile(model.frame(mod.prestige2)$education)
testInteractions(mod.prestige2, pairwise="type", slope="log2(income)",
covariates=c(education=14))
@

Altering the value of \code{education} has substantially changed
the result of the test on a term that contains \code{log2(income)},
because of the interaction between both covariates. See, however,
that this interaction, and other terms that contain it, are not significant
according to the ANOVA of the model. So it could be wise to simplify
the model (turning back to the original one), and remove this needlessly
problematic interaction.

Now, if an interaction between covariates were really significant,
the interest of the researcher should focus on it. The high-order
interactions in linear models have a constant effect at any combination
of the covariates, so the problem of the arbitrary values used by
the tests disappears. If the argument \code{slope} of \code{interactionMeans}
or \code{testInteractions} has the names of two or more covariates
of the model, the calculations will be done on the values of the coefficients
related to those interactions (and higher-order terms that might contain
it). As commented on at the beginning of this document, these coefficients
may be more difficult to understand, since the {}``slope'' along
a product of covariates may seem meaningless, but it can just be interpreted
as an extension of the analogy previously used between factor contrasts
and derivatives with respect to covariates.

We have already seen that when two factors interact, their effect
can be evaluated by means of {}``contrasts between contrasts''.
Likewise, when two covariates interact, we can study a {}``derivative
of the derivative'', i.e. the second-order partial derivative of
the response, which is represented by the interaction coefficient.
For the regression model of equation \eqref{eq:model-covariates}:

\begin{equation}
\frac{\partial^{2}E\left(Y\right)}{\partial X_{1}\partial X_{2}}=\beta_{12}\end{equation}


This interpretation can be extended to third-order or higher interactions
between covariates as well, although models with that complexity are
rarer, and the very meaning of such interactions will probably more
difficult to interpret than the coefficients used for representing
them.


\section{Generalized linear models\label{sec:glm}}

Generalized linear models (GLM) are very much like classical (Gaussian)
linear models in most aspects, let aside the distribution of the error
term and the expected value of the response, $\mu=E\left(Y\right)$,
that is related to the linear predictor by means of a \emph{link function}
$\eta\left(\mu\right)$. Accordingly, the interactions of such models
can be analysed by the methods explained in previous sections, although
the interpretaton of adjusted values in cell means and contrasts may
be a  bit more involved.

Let us take the example of the AMS survey about PhDs in mathematical
sciences. This example uses the data set \code{AMSsurvey} of the package
\pkg{car}, that contains a cross-classification of all the PhDs
awarded in the mathematical sciences for different periods in US, assigned
to 24 different categories depending on various characteristics of
the doctorate students \cite{Phipps2010}.

<<>>=
str(AMSsurvey)
@

The data are structured in a data frame, with one row per category.
The variable \code{counts} tells the number of PhDs of each category in 2008-2009,
and the previous variables are factors that identify the category.%
\footnote{\code{AMSsurvey} also contains data of other periods, but we will focus
on the period 2008-2009. The analysis will be based on a parsimonious model for
those data, which is already fitted in \cite[pp. 253--255]{Fox-CAR2011}.%
} One of these factors is \code{type},
the type of institution the doctorate was affiliated with, and has
6 levels: \code{I(Pr)}, \code{I(Pu)}, \code{II}, \code{III},
\code{IV}, and \code{Va};\code{ I} to \code{III} are math
departments in universities of progressively lower-quality --- with
\emph{pr}ivate and \emph{pu}blic institutions distinguished by the
parenthetical abbreviations, \code{IV} are statistics and biostatistics
departments, and \code{Va} are applied mathematics departments.
The other factors are \code{sex} (the gender of the
doctorate, a factor with levels \code{Female} and \code{Male}),
and \code{citizen} (the citizenship status, a factor with levels
\code{Non-US} and \code{US}). The contents of the data frame are clearer if shown
as a contingency table:

<<>>=
ftable(xtabs(count ~ sex + citizen + type, AMSsurvey))
@

A saturated model of these data would represent \code{count} as
a function of the three-way interaction of \code{type}, \code{sex},
and \code{citizen}. But the influences of \code{sex} and \code{citizen}
are independent to each other, so we can use a simplified model \cite{Fox-CAR2011}.
Since we are dealing with count data, the appropriate family would be a
Poisson distribution. All in all, the model could be defined as follows:

<<>>=
mod.ams <- glm(count ~ type*(sex+citizen), family=poisson, data=AMSsurvey)
@

The ANOVA of this model confirms that the high-order interaction terms
that remain are significant:

<<>>=
Anova(mod.ams)
@

The terms of interest are the interactions of \code{type} with \code{sex}
and \code{citizen}. We may have a look on the adjusted means with
\code{interactionMeans}, as in other models.
We tell that we want \code{type} in the \emph{X}\nobreakdash-axis, and
the other factors coded as lines, with the arguments \code{atx} and
\code{traces}, respectively:

<<amsplot>>=
ams.means <- interactionMeans(mod.ams)
plot(ams.means, atx="type", traces=c("sex","citizen"))
@

%
\begin{figure}
\begin{centering}
<<fig=TRUE, echo=FALSE, results=hide>>=
<<amsplot>>
@
\par\end{centering}
\caption{Adjusted (geometric) means of the interactions in \code{mod.ams}\label{fig:plot-ams}}
\end{figure}


The resulting plots (see figure \ref{fig:plot-ams}) are very similar
to the ones shown in previous section, with the exception of the \emph{Y}\nobreakdash-axis
scale, which is nonlinear in this case. The reason is that for GLM,
\code{interactionMeans} does not average over values of the response
variable (the counts of awarded PhDs); the calculations are actually
done in the link function domain, and the plots are drawn in its scale,
although the resulting averages and the \emph{Y}\nobreakdash-axis
labels are eventually transformed to show values of the response.
In this case the link function is $\eta=\log\left(\mu\right)$; therefore,
the adjusted means are \emph{geometric} (not arithmetic) means of
the response variable,%
\footnote{The adjusted link function \emph{is} an arithmetic mean, but then:
$\mu\left[\sum\left(\eta_{i}\right)/n\right]=\exp\left[\sum\left(\log\mu_{i}\right)/n\right]=\prod\mu_{i}^{1/n}$.%
} and the \emph{Y}\nobreakdash-axis is plotted in a logarithmic scale.

Nevertheless, the interpretation of the plot is not very affected
by this issue. The mean number of male doctorates is higher in all
institutions, but the difference seems larger in ``first-class''
universities (both public and private), and smaller in `third-class''
universities or statistics departments (group \code{IV}). On the
other hand, the influence of US-citizenship seems negligible except
for statistics departments, where there are more foreign doctorates.
The tables of \code{testInteractions} just confirm that these differences
are significant:

<<>>=
testInteractions(mod.ams, pairwise=c("type","sex")) # test type:sex
testInteractions(mod.ams, pairwise=c("type","citizen")) #test type:citizen
@

Notice, however, that the interpretation of the column \code{Value}
in these tables requires considering the relation between the link
function and the response variable. We have combined pairwise contrasts,
that result in differences in the domain of the link function; for
instance if we focus on the interaction contrast \code{I(Pu)-II
: Female-Male}, the adjusted value of the link is:

\begin{equation}
\left(\eta_{I(Pu),F}-\eta_{I(Pu)M}\right)-\left(\eta_{II,F}-\eta_{II,M}\right)\end{equation}


But $\eta_{i,j}=\log\left(\mu_{i,j}\right)$, therefore the previous
equation is equivalent to:

\begin{equation}
\log\left(\frac{\mu_{I(Pu),F}}{\mu_{I(Pu),M}}\div\frac{\mu_{II,F}}{\mu_{II,M}}\right)\end{equation}


And if we go back to the response variable domain, the logarithm
is cancelled, so we get that the interaction contrasts is, in this
case, a ``ratio of ratios'', rather than a ``difference of
differences''. In other words, the figure \code{0.47} in the column
\code{Value} for the mentioned interaction contrast means that the
proportion of females vs. males in {}``first-class'' public universities
is $0.47$ times (less than half) the proportion in {}``second-class''
universities. For other families of GLM, the interpretation of the contrasts
in the domain of the response variable may be more obscure; so if desired,
the argument \code{link=TRUE} may be used in all the functions of \pkg{phia}
to force the representation of the adjusted values in the domain of the link function.

GLM may include covariates, and the analysis of interactions may involve
terms that contain them. In that case, the means and contrasts reported
by \code{interactionMeans} or \code{testInteractions} are always
in the link function domain, since the response variable keeps no
linear relation with the predictors at all, and therefore there is
no such thing as {}``slopes'' of that variable, except in very local,
differential environments.

Anyway, if there is some reason to do it, it is possible to obtain
the derivatives of the response variable with respect to the covariates,
via the {}``chain rule''. The slopes reported by the functions of
\pkg{phia} are derivatives of the link function, $\partial\eta/\partial X_{i}$,
and if \code{model} is the name of the object with the fitted GLM,
we can obtain the derivative of the expected response with respect
to the link, $d\mu/d\eta$, as a function:

<<eval=FALSE>>=
dm.de <- family(model)$mu.eta
@

Now, to get the derivative of $\mu$ with respect to the covariate
$X_{i}$ at a specific value $X$, we just have to evaluate \code{de.dm}
at that value --- type \code{eval(dm.de(X))}, with the desired value
bound to \code{X} ---, and then mutliply:

\begin{equation}
\frac{\partial\mu}{\partial X_{i}}\left(X\right)=\frac{\partial\eta}{\partial X_{i}}\frac{d\mu}{d\eta}\left(X\right)\end{equation}


More generally, all the calculations that may be done with classical
linear models can also be applied to GLM, but it is necessary to consider
that the model is defined in terms of the link function, whereas the
outcomes are usually reported in units the response variable, and
the calculations must be defined according to the domain where they
operate.

\section{Mixed-effects models \label{sec:mixed-models}}
\subsection{Differences with fixed-effects models}

Linear and generalized linear models assume error independency between
observations, i.e. that there is no correlation between the values of the residual
error. However, this assumption may be violated in many experiments, where the
observations are clustered in groups associated to noise factors, e.g. sets of
measures taken in different environments, with different instruments, or in
different subjects. Such factors are normally uninteresting for the
purposes of the study, but their influence may nontheless be significant.
This conflict is solved by including the influence of those factors
in the model, so that \eqref{eq:glm-matrix} is expanded as follows
(simplifying the model to an univariate response):

\begin{equation}
\underset{(n\times{}1)}{\mathbf{Y}}=\underset{(n\times{}(r+1))}{\mathbf{X}}\underset{((r+1)\times{}1)}{\mathbf{\beta}}+\underset{(n\times{}s)}{\mathbf{Z}}\underset{(s\times{}1)}{\mathbf{u}}+\underset{(n\times{}1)}{\mathbf{\varepsilon}},\label{eq:mixed-matrix}\end{equation}
where the additional coefficients $\mathbf{u}$ and their corresponding
model matrix $\mathbf{Z}$ are related to the $s$ groups of observations associated
to the noise factors (plus their possible interctions with other factors
and covariates).

Now, if it is possible to assume that the effects of the noise
factors (i.e. the coefficients represented in $\mathbf{u}$) are random
values from a normal distribution, the model may be simplified, so
that instead of fitting the individual values of $\mathbf{u}$, it
is only necessary to fit their covariance matrix $\mathbf{\Psi}$, that
is usually constrained as a function of one or a few parameters. This
is what is known as a \emph{mixed-effects model} (with a combination
of both ``fixed'' and ``random'' effects).%
\footnote{The structure of the random part of the model is similar to the error
term. In fact, in the limit case where all the groups of random factors
only contain one observation, $\mathbf{Z}$ is equivalent
to the identity matrix, and the coefficients $\mathbf{u}$ cannot
be distinguished from the residual errors $\mathbf{\varepsilon}$.
Obviously, in that case no special technique would be necessary for
fitting the model, and the linear regression method would just do
the work (although with an increased residual error).
It is also possible to work with \emph{generalized mixed-effects models}, where
both the coefficients of random effects and the residual error follow a non-normal
distribution.%
} These models may be analysed in \R{} with the functions included in the packages
\pkg{nlme} \cite{Pinheiro-nlme} and \pkg{lme4} \cite{Bates-lme4}.
In the following examples we will use the latter, which is
more flexible and efficient, although the procedure for analysing interactions
is the same in both cases.

The functions of \pkg{phia} for the post-hoc analysis of mixed-effects
models are used just like in the case of linear and generalized linear models.
Multivariate responses in mixed-effects models are not supported, however,
although in some cases this may be worked around by a modification
of the data structure, as will be shown in the examples. 
From a theoretical point of view, another difference is that the tests are not
exact, so there may be concerns about the reliability of the reported
\emph{p}\nobreakdash-values, as will be discussed in the examples as well.

Note, moreover, that the analysis is limited to the fixed effects, for good reasons.
Random effects are normally noise that must be controlled, but are not interesting
for the conclusions of the study, and that is in fact why their effect is
generally simplified as a set of random values.
Accordingly, the parameters that are fitted in the model are not the specific
values of the random effects, but their covariances or other parameters that
define their distribution. In other words: the model does not really tell
anything of those specific values, so we should not interrogate it about them.

Although the level of abstraction needed to understand this principle may be
high in a complex model, it may be better grasped in a trivial model,
defined just as the average of a set of random values, plus the variance of the residual error.
Doing a post-hoc about a random effect in a mixed-effects model is like asking in
this case if two arbitrary observations are too far apart. This may be a legitimate
diagnostic question at best, but a positive answer would just mean that the
chosen model is inadequate for those data.

\subsection{Fitting and analysing a mixed-effects model}

As an example to illustrate the post-hoc analysis of mixed-effects models,
we will use Snijders' and Bosker's data with language achievements of 2287
high-school students \cite{Snijders1999}, included in the package \pkg{nlme}.
Following one of the examples worked by Snijders and Bosker,
we will study the scores of a language test as a regression of the students' IQ
in verbal tasks and their socio-economical status (SES), possibly influenced by
their gender, and the average SES and IQ of their schoolmates.
In addition, we will explore the strategy to analyse repeated-measures with a
mixed-effects model, by comparing the achievements in two repetitions of the
language test: at the start and end of the school year.
All in all, the variables of interest in the original data frame \code{bdf} are:

\begin{description}
\item[\code{langPRET}:] Score in the first repetition of the test.
\item[\code{langPRET}:] Score in the second repetition of the test.
\item[\code{pupilNR}:] Student code.
\item[\code{IQ.ver.cen}:] Student IQ (school-centered).
\item[\code{ses}:] Student socio-economical status (SES).
\item[\code{sex}:] Student gender.
\item[\code{schoolNR}:] School code.
\item[\code{schoolSES}:] School average SES
\item[\code{avg.IQ.ver.cen}:] School average IQ.
\end{description}

We first create a subset of the data frame with those variables,
taken from the namespace of \pkg{nlme} without attaching it
(with the operator \code{::}).%
\footnote{ Since the namespaces of \pkg{nlme} and \pkg{lme4} partially overlap,
it is not advisable to load both packages in the same session.%
} We will also add meaningful labels to the \code{sex} factor, and simplify
some variable names for the sake of clarity.

<<>>=
Snijders <- nlme::bdf[c("langPRET","langPOST",   # Outcomes
    "pupilNR", "IQ.ver.cen", "ses", "sex",       # Student-related variables
    "schoolNR", "schoolSES", "avg.IQ.ver.cen")]  # School-related variables
Snijders$sex <- factor(Snijders$sex, labels=c("F","M"))
names(Snijders) <-
    c("score.1","score.2","student","IQ","SES","sex","school","avgSES","avgIQ")
@

Following Snijders and Bosker, for a better model fit we will also include
quadratic components of IQ around $0$ (remember that the \code{IQ} variable is
centered) \cite[p.~113]{Snijders1999}.

<<>>=
Snijders$IQ2.pos <- with(Snijders, (IQ > 0)*IQ^2)
Snijders$IQ2.neg <- with(Snijders, (IQ < 0)*IQ^2)
@

The model proposed by Snijders and Bosker was focused on the result of one of
the tests (say \code{score.2}, and contained the following fixed terms:
the crossed linear effect of IQ and SES, both at student level
(\code{IQ*ses}) and at school level (\code{avgIQ*avgSES}), the quadratic
effects of IQ (\code{IQ2.pos}, \code{IQ2.neg}), and the effect of gender
(\code{sex}). They also considered that the average scores and the linear
effect of IQ could be grouped by school. Such a model could be fitted as follows:

<<results=hide>>=
library(lme4)
@
<<eval=FALSE>>=
form1 <- score.2 ~ 
    IQ * SES + IQ2.pos + IQ2.neg + sex + avgIQ * avgSES +    # Fixed part
    (IQ | school)                                            # Random part
mod.snijders.1 <- lmer(form1, data=Snijders)
@

Now, this is one of the cases where we cannot use the multivariate approach
described in section \ref{sec:repeated-measures} for analysing the effect of repeating
the test. We could attempt a similar analysis step-by-step, fitting two models
with different dependent variables, corresponding to the response transformations
that are used in the analysis of within-subjects effects.
Both models would have the same right-hand side of the formula as \code{form1}
above. The dependent variable in one of them would be the average of \code{score.1}
and \code{score.2}; in the other one it would be their difference:

<<eval=FALSE>>=
form2.1 <- update(form1, (score.1+score.2)/2~.)
form2.2 <- update(form1, (score.2-score.1)~.)
@

If we consider that \code{score.1} and \code{score.2} are two subsets of the same
variable, distinguished by a two-level factor (say \code{repetition}),
the model fitted with \code{form2.1} would tell us the effects of the terms defined
in that formula on the pooled variable, whereas the model fitted with \code{form2.2}
would define the effects of the interaction between \code{repetition} and those
terms (i.e. \code{repetition:IQ}, \code{repetition:SES}, etc.).

On the other hand, we may fit a single model that accounts for all the terms that
we want to analyse with more flexibility. For that, we first have to transform the
``wide'' data frame with \code{score.1} and \code{score.2} as different columns,
into a ``long'' data frame with both of them in one column.
We will also need the \code{repetition} factor explicitly defined in the data frame,
and a variable that identifies each original row (the already existing variable
\code{student} in the original frame can do that work).
We can get all this with the \code{reshape} function:

<<>>=
Snijders.long <- reshape(Snijders, direction="long", idvar="student",
    varying=list(c("score.1","score.2")), v.names="score", timevar="repetition")
# The within-subjects factor must be coded as a factor
Snijders.long$repetition <- as.factor(Snijders.long$repetition)
# See the variables of the long data frame
str(Snijders.long)
@

Now we can use this data frame to fit a model based on the formula \code{form1},
adding the interactions of \code{repetition} with the terms we want (in the
multivariate approach, all the interactions of between-subjects and
within-subjects terms are always included). The price that is paid for that is
the addition of the random effects of \code{student}, since now we have more than
one observation for each value of this factor.

We will consider that the repetition only interacts with the student-related
terms. Regarding the random effects, the \code{student} factor can only influence
the \emph{intercept} term, since all the other terms of the model are constant
for the two measures of each student. Thus, the new formula and the model fitted
with it will be:

<<>>=
form3 <- score ~ 
    repetition * (IQ * SES + IQ2.pos + IQ2.neg + sex) +   # Student-related
    avgIQ * avgSES +                                      # School-related
    (IQ | school) + (1 | student)                         # Random part
mod.snijders.3 <- lmer(form3, data=Snijders.long)
# See the main parameters of the model (ommit correlations table)
print(mod.snijders.3, correlation=FALSE)
@

As said above, the methods for analysing a model like this are similar to the
ones used in fixed-effects models. However, there are important \emph{caveats}
about the results that must be considered.

The main objective of many researchers in their using statistical techniques for
data analysis is to obtain \emph{p}\nobreakdash-values to accept or reject their
hypothesis, but the calculation of those values in mixed-effects models is
problematic. That is the reason why the table of coefficients of a model fitted
by \code{lmer} does not show \emph{p}\nobreakdash-values, as may be seen above, nor
are they presented in the ANOVA table given by the standard \code{anova} method
\cite{R-help:094765}.%
\footnote{Mixed models fitted with \code{lme} from the package \code{nlme} do
show \emph{p}\nobreakdash-values, but this does not mean that they should always be trusted.%
}

There are different techniques for working around this problem, like the
Kenward-Roger approximation or parametric bootstrap, which are implemented
in the package \pkg{pbkrtest} \cite{Halekoh-pbkrtest}. An alternative approach is assuming that the
covariances of the random part of the model are equal to the estimated values,
and then use standard tests \cite{Faraway2006}. The function \code{Anova} from
\pkg{car} produces tables with \emph{p}\nobreakdash-values based on
Wald tests, following the latter assumption \cite{R-sig-ME:014095}, and by
extension, the same happens with the test functions of the package \pkg{phia},
which are also dependent on the function \code{linearHypohtesis} from \pkg{car}.

This being said, we can first look what terms are significant according to the
Wald tests:

<<>>=
Anova(mod.snijders.3)
@

The ANOVA table shows that most terms of our model are significant for the
Wald test at $\alpha{}=0.05$. The exceptions are the terms related to \code{avgSES}, and
the second-order interaction of \code{repetition} with \code{IQ} and \code{SES}.

All the remaining main and interaction effects of factors and covariates could
further be investigated, as explained in the previous sections. Just as
an example, let us focus on the only interaction between factors:
\code{repetition:sex}. We can see the table of average scores, and calculate the
simple main effects and pairwise interactions:

<<>>=
# Cell means
interactionMeans(mod.snijders.3)
# Simple effect of sex at each repetition
testInteractions(mod.snijders.3, fixed="repetition", across="sex")
# Pairwise interactions (default)
testInteractions(mod.snijders.3)
@

These results show that female students had significantly lower scores than boys,
and that in the second repetition that difference was two times the initial one.
Note, however, that the size of the difference is small in spite of the statistical
significance (gender differences were between 1 and 3 points, for scores that
were normally over 30 points).

That is something that often happens when models are fitted to very large data
bases, and that reminds us that \emph{p}\nobreakdash-values must always be taken
carefully and with a critical thought.
In this case, where we used thousands of observations, we may suspect that many
of the effects with low \emph{p}\nobreakdash-values in the ANOVA table may be
irrelevant. In fact, if we look at the coefficients of the model (in the table
above) and the ranges of the associated covariates in the data frame, we can
see that most effects are negligible in size when compared with the
differences associated to \code{repetition} and the linear effect of \code{IQ}.
This means that they are not relevant from a practical point of view, even
though they may contribute to a significantly better fit of the model.


\section*{Acknowledgements}

I thank Professors John Fox and Michael Friendly, for their encouragement
and advise in the creation of the functions of \pkg{phia}.



\bibliographystyle{ieeetr}
\bibliography{bibliography,bibextra}

\end{document}
<<>>=
graphics.off()
@

