% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/RcppExports.R
\name{optimPibbleCollapsed}
\alias{optimPibbleCollapsed}
\title{Function to Optimize the Collapsed Pibble Model}
\usage{
optimPibbleCollapsed(
  Y,
  upsilon,
  ThetaX,
  KInv,
  AInv,
  init,
  n_samples = 2000L,
  calcGradHess = TRUE,
  b1 = 0.9,
  b2 = 0.99,
  step_size = 0.003,
  epsilon = 1e-06,
  eps_f = 1e-10,
  eps_g = 1e-04,
  max_iter = 10000L,
  verbose = FALSE,
  verbose_rate = 10L,
  decomp_method = "cholesky",
  optim_method = "lbfgs",
  eigvalthresh = 0,
  jitter = 0,
  multDirichletBoot = -1,
  useSylv = TRUE,
  ncores = -1L,
  seed = -1L
)
}
\arguments{
\item{Y}{D x N matrix of counts}

\item{upsilon}{prior degrees of freedom for the covariance (must be > D)}

\item{ThetaX}{D-1 x N matrix formed by Theta*X (Theta is the prior mean
for the regression coefficients)}

\item{KInv}{D-1 x D-1 precision matrix (inverse of Xi, the covariance matrix
written K in details)}

\item{AInv}{N x N precision matrix given by \eqn{(I_N + X'*Gamma*X)^{-1}}}

\item{init}{D-1 x N matrix of initial guess for eta used for optimization}

\item{n_samples}{number of samples to draw from the Laplace approximation
(n_samples=0 is very fast, as no inversion or decomposition of the Hessian is
required)}

\item{calcGradHess}{if n_samples=0, should the gradient and Hessian
still be calculated using closed-form solutions?}

\item{b1}{(ADAM) 1st moment decay parameter (recommend 0.9) "aka momentum"}

\item{b2}{(ADAM) 2nd moment decay parameter (recommend 0.99 or 0.999)}

\item{step_size}{(ADAM) step size for descent (recommend 0.001-0.003)}

\item{epsilon}{(ADAM) parameter to avoid divide by zero}

\item{eps_f}{(ADAM) normalized function improvement stopping criterion}

\item{eps_g}{(ADAM) normalized gradient magnitude stopping criterion}

\item{max_iter}{(ADAM) maximum number of iterations before stopping}

\item{verbose}{(ADAM) if true will print stats for stopping criteria and
iteration number}

\item{verbose_rate}{(ADAM) rate to print verbose stats to screen}

\item{decomp_method}{decomposition of the Hessian for the Laplace approximation:
'eigen' (slightly more stable, slower) or 'cholesky' (less stable, faster, default)}

\item{optim_method}{optimization method: "lbfgs" (default) or "adam"}

\item{eigvalthresh}{threshold for negative eigenvalues in
decomposition of negative inverse hessian (should be <=0)}

\item{jitter}{(default: 0) if >0 then adds that amount to the diagonal of the Hessian
before decomposition (to improve matrix conditioning)}

\item{multDirichletBoot}{if >0 then it overrides the Laplace approximation and samples
eta efficiently at the MAP estimate from a pseudo Multinomial-Dirichlet posterior.}

\item{useSylv}{(default: true) if N<D-1 uses Sylvester Determinant Identity
to speed up calculation of log-likelihood and gradients.}

\item{ncores}{(default: -1) number of cores to use; if ncores==-1 then
uses the OpenMP default, which is typically all available cores.}

\item{seed}{integer random seed for the Laplace approximation}
}
\value{
List containing (all with respect to the optimum found)
\enumerate{
\item LogLik - Log Likelihood of collapsed model (up to proportionality constant)
\item Gradient - (if \code{calcGradHess=TRUE})
\item Hessian - (if \code{calcGradHess=TRUE}) of the POSITIVE LOG POSTERIOR
\item Pars - Parameter value of eta at the optimum
\item Samples - (D-1) x N x n_samples array containing posterior samples of eta
based on Laplace approximation (if n_samples>0)
\item Timer - Vector of Execution Times
\item logInvNegHessDet - the log determinant of the covariance of the Laplace
approximation, useful for calculating the marginal likelihood
\item logMarginalLikelihood - the log marginal likelihood calculated from
the Laplace approximation
}
}
\description{
See details for the model. A call to this function is typically followed by
\code{\link{uncollapsePibble}}. Notation: \code{N} is the number of samples,
\code{D} is the number of multinomial categories, and \code{Q} is the number
of covariates.
}
\details{
Notation: Let \eqn{Z_j} denote the j-th column of a matrix Z.
Model:
\deqn{Y_j \sim Multinomial(\pi_j)}{Y_j \sim Multinomial(Pi_j)}
\deqn{\pi_j = \Phi^{-1}(\eta_j)}{Pi_j = Phi^(-1)(Eta_j)}
\deqn{\eta \sim T_{D-1, N}(\upsilon, \Theta X, K, A)}{Eta \sim T_{D-1, N}(upsilon, Theta*X, K, A)}
Where \eqn{A = I_N + X' \Gamma X}{A = I_N + X' Gamma X}, K is a (D-1)x(D-1) covariance
matrix, \eqn{\Gamma}{Gamma} is a Q x Q covariance matrix, and \eqn{\Phi^{-1}}{Phi^(-1)} is the ALRInv_D
transform.

Gradient and Hessian calculations are fast as they are computed using closed
form solutions. That said, the Hessian matrix can be quite large
[N*(D-1) x N*(D-1)] and storage may be an issue.

Note: Warnings about large negative eigenvalues can either signal
that the optimizer did not reach an optimum or (more commonly in my experience)
that the prior / degrees of freedom for the covariance (given by parameters
\code{upsilon} and \code{KInv}) were too specific and at odds with the observed data.
If you get this warning, try the following:
\enumerate{
\item Try restarting the optimization using a different initial guess for eta
\item Try decreasing (or even increasing) \code{step_size} (by increments of 0.001 or 0.002)
and increasing the \code{max_iter} parameter of the optimizer. You can also try
increasing \code{b1} to 0.99 and decreasing \code{eps_f} by a few orders
of magnitude
\item Try relaxing prior assumptions regarding the covariance matrix (e.g., consider
decreasing \code{upsilon} toward its minimum value of
D)
\item Try adding a small amount of jitter (e.g., set \code{jitter=1e-5}) to address
potential floating point errors.
}
}
\examples{
sim <- pibble_sim()

# Fit model for eta
fit <- optimPibbleCollapsed(sim$Y, sim$upsilon, sim$Theta\%*\%sim$X, sim$KInv, 
                             sim$AInv, random_pibble_init(sim$Y))  
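
# A minimal base-R sketch of how the AInv argument relates to X and Gamma,
# assuming X is stored as a Q x N matrix (so that Theta*X is (D-1) x N).
# The dimensions and the names toy_N, toy_D, toy_Q, toy_X, toy_Gamma, toy_A,
# toy_AInv, hess_dim and hess_bytes are hypothetical and used only for
# illustration; they are not part of the package.
set.seed(1)
toy_N <- 5L
toy_D <- 4L
toy_Q <- 2L
toy_X <- matrix(rnorm(toy_Q * toy_N), nrow = toy_Q, ncol = toy_N)
toy_Gamma <- diag(toy_Q)  # Q x Q covariance matrix (identity for simplicity)

# crossprod(a, b) computes t(a) times b, so this is I_N + t(X) Gamma X
toy_A <- diag(toy_N) + crossprod(toy_X, crossprod(toy_Gamma, toy_X))
toy_AInv <- solve(toy_A)  # N x N precision matrix, the shape AInv expects
dim(toy_AInv)

# The Hessian has N*(D-1) rows and columns. Stored as a dense matrix of
# doubles it needs about 8 bytes per entry, which grows quickly with N and D.
hess_dim <- toy_N * (toy_D - 1L)
hess_bytes <- 8 * hess_dim^2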
}
\references{
S. Ruder (2016) \emph{An overview of gradient descent
optimization algorithms}. arXiv 1609.04747

JD Silverman, K Roche, ZC Holmes, LA David, S Mukherjee.
\emph{Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes}.
2022, Journal of Machine Learning Research
}
\seealso{
\code{\link{uncollapsePibble}}
}
