\name{mudfold}
\alias{mudfold}
\title{MUDFOLD: Van Schuur's nonparametric IRT model for dichotomous responses generated by an unfolding process}
\description{
This function fits a unidimensional unfolding scale to the responses of individuals on a set of dichotomously scored attitudinal items. Fitting is done through Van Schuur's scaling algorithm, which determines whether a set of items are indicators of the same unobserved latent construct, such as preference, attitude, or ideology. Central to this model are the scalability coefficients, which are used to assess the fit of the scale and of the individual items to the data.

Diagnostic statistics that are used to test the model assumptions are borrowed from the nonparametric unfolding model of Post (1992). Uncertainty estimates for the scalability coefficients and the diagnostic statistics, both for the scale and for the individual items, are obtained using the ordinary nonparametric bootstrap. A bootstrap estimate of the scale is obtained as the most frequently observed scale in \eqn{R} bootstrap iterations.
}
\usage{
mudfold( data, estimation, lambda1, lambda2, start.scale, 
nboot, missings, nmice, seed, mincor, ...)
}
\arguments{
\item{data}{: A binary matrix or data frame containing the responses of \code{nrow(data)} persons
to \code{ncol(data)} items. Missing values in \code{data} are treated as specified by the argument \code{missings}.
}
\item{estimation}{: This argument controls the nonparametric estimation method for person locations. By default this argument equals \code{"rank"}, which means that Van Schuur's estimator is used to estimate the person parameters. If the user sets this argument to \code{"quantile"}, an estimator proposed by Johnson is applied to obtain the person locations.
}
\item{lambda1}{: User specified numerical value that is used as a lower boundary for the scalability criterion of the first step of the item selection algorithm, and in the item scalability criterion at the end of the scale expansion. Default value is \eqn{\lambda_1=0.3} but it can be any value between \eqn{-\infty} and \eqn{1} (i.e., \eqn{\lambda_1 \in \left(-\infty,1\right]}). The higher the value of \eqn{\lambda_1} the stricter the scalability criteria of the algorithm. 
}
\item{lambda2}{: User-specified numerical value that explicitly controls the first scalability criterion of the scale expansion. The default is \eqn{\lambda_2=0}; the user can choose a negative value for \eqn{\lambda_2}, which leads to a less strict scalability criterion at the beginning of the scale expansion.
}
\item{start.scale}{: An ordered character vector with item names from \code{colnames(data)}. The length of this vector should be greater than or equal to \eqn{3} and less than or equal to \code{ncol(data)}. This ordered item set is used as the start set for the scale extension phase of the MUDFOLD method. If \code{start.scale=NULL} the standard MUDFOLD method is fitted to the data.
}

\item{nboot}{: Argument that controls the number of bootstrap iterations. If \code{nboot=NULL} (default) no bootstrap is applied. 
}

\item{missings}{: Argument that controls how missing values are treated. If \code{missings="omit"} (default) list-wise deletion is applied to \code{data}. If \code{missings="impute"} then the \code{mice} function from the package \pkg{mice} is applied to \code{data} in order to impute the missing values \code{nmice} times.
}

\item{nmice}{: Argument that controls the number of mice imputations. This argument is used only when \code{missings="impute"} and \code{nboot=NULL}.
}

\item{seed}{: Argument that is used for reproducibility of bootstrap results. 
}
\item{mincor}{: This can be a scalar, a numeric vector (of size \code{ncol(data)}) or a numeric matrix (square, of size \code{ncol(data)}) specifying the minimum threshold(s) against which the absolute correlation in the data is compared. See \code{?mice::quickpred} for more details.
}
\item{...}{: Any additional arguments that are passed to the \code{boot} function from the package \pkg{boot}. See \code{?boot::boot}.
}

}
\details{
This function incorporates a two-step algorithm that determines an unfolding scale from observed binary \code{data}. In the first step of the algorithm the best minimal scale, consisting of three items, is determined. In the second step, the minimal scale from the first step is expanded iteratively by adding the best fitting item in each iteration. The first step of the algorithm can be skipped with the argument \code{start.scale}, which can be used to set manually an item rank order that will be extended in the second step of the item selection algorithm. The resulting scale consists of the best \code{m} fitting items based on scalability criteria (where \code{m} \eqn{\le} \code{ncol(data)}). 

In the \code{mudfold} function, the user can specify a value \eqn{\lambda_1} that will be used as a lower bound in the scalability criteria of the MUDFOLD algorithm. By default, the lower bound for the scalability coefficients is \code{lambda1=0.3}. The user can choose a second value \eqn{\lambda_2} that will be used as a lower bound only in the second step of the algorithm (by default, \code{lambda2=0}). The parameter \eqn{\lambda_2} is mostly used to relax the first scalability criterion of the second step. Generally, values greater than \eqn{0.3} for \eqn{\lambda_1} and \eqn{\lambda_2} lead to very strict criteria, while negative values relax these criteria.

Uncertainty estimates of the MUDFOLD statistics can be calculated with the argument \code{nboot} of the \code{mudfold} function. When \code{nboot} is an integer, \code{nboot} bootstrap iterations are run to obtain the variance parameter for each MUDFOLD statistic. Missing values are either list-wise deleted or imputed \code{nmice} times when \code{nboot=NULL} and \code{missings="impute"}. If the argument \code{nboot} is not \code{NULL} and \code{missings="impute"}, then each resampled dataset in the bootstrap iterations is imputed once before a MUDFOLD scale is fitted.

Moreover, the user can choose between two nonparametric estimation methods for the person parameters, both of which use the item ranks from the MUDFOLD algorithm. The default setting (i.e., \code{estimation="rank"}) uses an estimator proposed by Van Schuur (1984) that is based on item ranks. Alternatively, an estimation method described by Johnson (2006), which uses item quantiles for estimating person parameters, can be used by setting \code{estimation="quantile"}.
}

\value{
The function \code{mudfold} returns a list of class \code{"mdf"} with the following components:
  \item{CALL}{A list whose components provide information on the function call.}
  \item{CHECK}{A list whose components provide information from the data checking step.}
  \item{DESCRIPTIVES}{A list with descriptive statistics for the \code{data}.}
  \item{MUDFOLD_INFO}{A list with three main components. The first component is called \code{triple_stats} and is a list in which each element contains the observed errors, expected errors, and scalability coefficients for each item triple. The second component is a list called \code{first_step} and contains the results of the first step of the MUDFOLD item selection algorithm. The third component is called \code{second_step} and is a list with the MUDFOLD statistics and parameter estimates for the given scale.}
If the bootstrap is applied, an additional component is included in the output. This component is called \code{BOOTSTRAP} and is a list that contains the output of \code{nboot} bootstrap iterations. 

}

\author{Spyros E. Balafas (auth.), Wim P. Krijnen (auth.), Wendy J. Post (contr.), Ernst C. Wit (auth.)

Maintainer: Spyros E. Balafas (\email{s.balafas@rug.nl})
}

\references{

W.H. Van Schuur. (1984). \emph{Structure in Political Beliefs: A New Model for
Stochastic Unfolding with Application to European Party Activists}. CT Press.

W.J. Post. (1992). \emph{Nonparametric Unfolding Models: A Latent Structure Approach}. M
& T series. DSWO Press.

W.J. Post and T.A.B. Snijders. (1993). Nonparametric unfolding models for dichotomous
data. \emph{Methodika}.

M.S. Johnson. (2006). Nonparametric Estimation of Item and Respondent Locations from Unfolding-type Items. \emph{Psychometrika}.


}

\examples{
\dontrun{
#####################################
#### MUDFOLD method on real data ####
#####################################



###########################################################################
###### MUDFOLD method on ANDRICH data (see Post and Snijders p. 147) ######
###########################################################################
data(ANDRICH)
## fit MUDFOLD on ANDRICH data ##
fit_andr <- mudfold(ANDRICH)

## generic functions for the S3 class .mdf object fit ##
## print.mdf
print(fit_andr)
## summary.mdf
summary(fit_andr)
## plot.mdf
plot(fit_andr)


## fit MUDFOLD on ANDRICH data with bootstrap ##
fit_andr_boot <- mudfold(ANDRICH, nboot=100)

## generic functions for the S3 class .mdf object fit ##
## print.mdf
print(fit_andr_boot)
## summary.mdf
summary(fit_andr_boot, boot=TRUE)
## plot.mdf
plot(fit_andr_boot)
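
## A hedged sketch of the tuning arguments described in the Details
## section. The values 0.4 and -0.1 are illustrative assumptions, not
## recommendations; whether a scale is found depends on the data.

## stricter lower bound for the scalability criteria
fit_andr_strict <- mudfold(ANDRICH, lambda1=0.4)

## default lambda1, relaxed first criterion of the scale expansion
fit_andr_relaxed <- mudfold(ANDRICH, lambda1=0.3, lambda2=-0.1)

## person parameters estimated with the quantile-based estimator
fit_andr_quant <- mudfold(ANDRICH, estimation="quantile")
summary(fit_andr_quant)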

############################################
###### MUDFOLD method on EURPAR2 data ######
############################################
data("EURPAR2")

## fit MUDFOLD on EURPAR2 data ##
fit_eurp <- mudfold(EURPAR2)

## print
print(fit_eurp)

## summary
summary(fit_eurp)

## plot
plot(fit_eurp)
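
## Sketch of a user-supplied start set (argument start.scale), which
## skips the first step of the item selection algorithm.
## ASSUMPTION: the first three column names are used purely for
## illustration; this is not a substantively motivated item ordering.
start_items <- colnames(EURPAR2)[1:3]
fit_eurp_start <- mudfold(EURPAR2, start.scale=start_items)
print(fit_eurp_start)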

###########################################
###### MUDFOLD method on Plato7 data ######
###########################################

data("Plato7")

## transform to binary data
## using as threshold the mean
## per row of Plato7

dat_plato <- pick(Plato7)

## fit MUDFOLD on Plato7 data ##
fit_plato <- mudfold(dat_plato, nboot=1000)

## print
print(fit_plato)

## summary
summary(fit_plato, boot=TRUE)

## plot
plot(fit_plato, plot.type="scale")
plot(fit_plato, plot.type="IRF")
plot(fit_plato, plot.type="persons")


##########################################
#### MUDFOLD method on simulated data ####
##########################################

### Data with the responses of
### n=3000 on N=20 items

simulation1 <- mudfoldsim(N=20, n=3000, gamma1=2, gamma2=-10, zeros=FALSE, seed=1)
dat_sim1 <- simulation1$dat

## fit MUDFOLD on simulated data ##
fit.sim1 <- mudfold(dat_sim1)

# print
fit.sim1

# summary
summary(fit.sim1)

# plot
plot(fit.sim1)
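
## Sketch of the missing-data options described in the Details section.
## ASSUMPTION: the missing values below are introduced artificially for
## illustration; the number of NAs and nmice=5 are arbitrary choices.
dat_sim1_na <- dat_sim1
dat_sim1_na[1:50, 1] <- NA

## list-wise deletion (default)
fit.sim1.omit <- mudfold(dat_sim1_na, missings="omit")

## imputation with mice, nmice times (used when nboot=NULL)
fit.sim1.imp <- mudfold(dat_sim1_na, missings="impute", nmice=5)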

### Data with the responses of
### n=3000 on N=26 items

simulation2 <- mudfoldsim(N=26, n=3000, gamma1=2, gamma2=-10, zeros=FALSE, seed=1)
dat_sim2 <- simulation2$dat

## fit MUDFOLD on simulated data ##
fit.sim2 <- mudfold(dat_sim2)

# print
fit.sim2

# summary
summary(fit.sim2)

# plot
plot(fit.sim2, plot.type="scale")
plot(fit.sim2, plot.type="IRF")
plot(fit.sim2, plot.type="persons")

}
}

