% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/tune_grid.R
\name{tune_grid}
\alias{tune_grid}
\alias{tune_grid.model_spec}
\alias{tune_grid.workflow}
\title{Model tuning via grid search}
\usage{
tune_grid(object, ...)

\method{tune_grid}{model_spec}(
  object,
  preprocessor,
  resamples,
  ...,
  param_info = NULL,
  grid = 10,
  metrics = NULL,
  control = control_grid()
)

\method{tune_grid}{workflow}(
  object,
  resamples,
  ...,
  param_info = NULL,
  grid = 10,
  metrics = NULL,
  control = control_grid()
)
}
\arguments{
\item{object}{A \code{parsnip} model specification or a \code{\link[workflows:workflow]{workflows::workflow()}}.}

\item{...}{Not currently used.}

\item{preprocessor}{A traditional model formula or a recipe created using
\code{\link[recipes:recipe]{recipes::recipe()}}.}

\item{resamples}{An \code{rset()} object.}

\item{param_info}{A \code{\link[dials:parameters]{dials::parameters()}} object or \code{NULL}. If none is given,
a parameter set is derived from the other arguments. Passing this argument can
be useful when parameter ranges need to be customized.}

\item{grid}{A data frame of tuning combinations or a positive integer. The
data frame should have columns for each parameter being tuned and rows for
tuning parameter candidates. An integer denotes the number of candidate
parameter sets to be created automatically.}

\item{metrics}{A \code{\link[yardstick:metric_set]{yardstick::metric_set()}} or \code{NULL}.}

\item{control}{An object used to modify the tuning process, typically created
by \code{\link[=control_grid]{control_grid()}}.}
}
\value{
An updated version of \code{resamples} with extra list columns for \code{.metrics} and
\code{.notes} (optional columns are \code{.predictions} and \code{.extracts}). \code{.notes}
contains warnings and errors that occur during execution.
}
\description{
\code{\link[=tune_grid]{tune_grid()}} computes a set of performance metrics (e.g. accuracy or RMSE)
for a pre-defined set of tuning parameters that correspond to a model or
recipe across one or more resamples of the data.
}
\details{
Suppose there are \emph{m} tuning parameter combinations. \code{\link[=tune_grid]{tune_grid()}} may not
need to perform all \emph{m} model/recipe fits in each resample. For example:

\itemize{
\item In cases where a single model fit can be used to make predictions
for different parameter values in the grid, only one fit is used.
For example, for some boosted tree models, if 100 boosting iterations
are requested, the model object for 100 iterations can be used to
make predictions for fewer than 100 iterations (if all other
parameters are equal).
\item When the model is being tuned in conjunction with pre-processing
and/or post-processing parameters, the minimum number of fits is
used. For example, if the number of PCA components in a recipe step
is being tuned over three values (along with model tuning
parameters), only three recipes are trained. The alternative
would be to re-train the same recipe once for every combination of
model tuning parameters.
}

\code{tune_grid()} uses the \code{foreach} package. To execute the resampling
iterations in parallel, register a parallel backend. See the documentation for
\code{\link[foreach:foreach]{foreach::foreach()}} for examples.
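
As a minimal sketch, the resamples could be processed in parallel with the
\code{doParallel} package. This is an assumption: \code{doParallel} is not
mentioned above, and it applies only to versions of \code{tune_grid()} that rely
on \code{foreach} as described here. The two-worker cluster, the \code{mtcars}
folds, and the three-value spline grid are illustrative choices.

\preformatted{
  library(doParallel)  # also attaches foreach and parallel
  library(parsnip)
  library(recipes)
  library(rsample)
  library(tune)

  set.seed(6735)
  folds <- vfold_cv(mtcars, v = 5)
  spline_rec <- step_ns(recipe(mpg ~ ., data = mtcars), disp, deg_free = tune())
  lin_mod <- set_engine(linear_reg(), "lm")

  # Start two local worker processes (illustrative count) and register them
  cl <- makePSOCKcluster(2)
  registerDoParallel(cl)

  # The resampling iterations are now spread over the registered workers
  par_res <- tune_grid(lin_mod, spline_rec, resamples = folds,
                       grid = data.frame(deg_free = 2:4))

  # Shut down the workers and return to sequential execution
  stopCluster(cl)
  registerDoSEQ()
}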

For the most part, warnings generated during training are shown as they occur
and are associated with a specific resample when \code{control_grid(verbose = TRUE)}.
They are (usually) not aggregated until the end of processing.
}
\section{Parameter Grids}{


If no tuning grid is provided, a semi-random grid (via
\code{\link[dials:grid_max_entropy]{dials::grid_latin_hypercube()}}) is created with 10 candidate parameter
combinations.

When provided, the grid should have column names for each parameter and
these should be named by the parameter name or \code{id}. For example, if a
parameter is marked for optimization using \code{penalty = tune()}, there should
be a column named \code{penalty}. If the optional identifier is used, such as
\code{penalty = tune(id = 'lambda')}, then the corresponding column name should
be \code{lambda}.
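
For instance, with a hypothetical lasso specification whose penalty carries the
identifier \code{lambda}, a hand-made grid could look like the sketch below. The
object names and penalty values are illustrative, and actually tuning this model
would require the \code{glmnet} package.

\preformatted{
  library(parsnip)
  library(tune)

  # penalty is marked for tuning under the identifier "lambda"
  lasso_mod <- set_engine(
    linear_reg(penalty = tune(id = "lambda"), mixture = 1),
    "glmnet"
  )

  # so the grid column is named lambda, not penalty (values are illustrative)
  lasso_grid <- data.frame(lambda = 10^seq(-3, 0, length.out = 4))
}

The grid would then be supplied as \code{grid = lasso_grid} in a call to
\code{tune_grid()}.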

In some cases, the tuning parameter values depend on the dimensions of the
data. For example, \code{mtry} in random forest models depends on the number of
predictors. In this case, the default tuning parameter object has an unknown
upper bound that must be set. \code{\link[dials:finalize]{dials::finalize()}} can be used to derive the data-dependent
parameters. Otherwise, a parameter set can be created (via
\code{\link[dials:parameters]{dials::parameters()}}) and the \code{dials} \code{update()} function can be used to
change the parameter ranges. This updated parameter set can be passed to the function
via the \code{param_info} argument.
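
Both routes can be sketched with \code{dials}. This sketch assumes the
\code{mtcars} data used in the examples below, which has 10 predictors once
\code{mpg} is removed. Pairing \code{mtry()} with \code{min_n()} and using the
range \code{c(1L, 10L)} are illustrative choices.

\preformatted{
  library(dials)

  # Route 1: finalize() sets the unknown upper bound of mtry() from the
  # predictor columns (mtcars without the outcome mpg, i.e. 10 columns)
  mtry_final <- finalize(mtry(), x = mtcars[, -1])

  # Route 2: build a parameter set, then change the mtry range with update()
  rf_params <- parameters(list(mtry(), min_n()))
  rf_params <- update(rf_params, mtry = mtry(c(1L, 10L)))
}

The resulting \code{rf_params} could then be passed as
\code{param_info = rf_params} when tuning a specification such as
\code{rand_forest(mtry = tune(), min_n = tune())}.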
}

\section{Performance Metrics}{


To use your own performance metrics, the \code{\link[yardstick:metric_set]{yardstick::metric_set()}} function
can be used to pick what should be measured for each model. If multiple
metrics are desired, they can be bundled. For example, to estimate the area
under the ROC curve as well as the sensitivity and specificity (under the
typical probability cutoff of 0.50), the \code{metrics} argument could be given:

\preformatted{
  metrics = metric_set(roc_auc, sens, spec)
}

Each metric is calculated for each candidate model.

If no metric set is provided, one is created:
\itemize{
\item For regression models, the root mean squared error and coefficient
of determination are computed.
\item For classification, the area under the ROC curve and overall accuracy
are computed.
}

Note that the metrics also determine what type of predictions are estimated
during tuning. For example, in a classification problem, if all of the metrics
use hard class predictions, the class probabilities are not computed.
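
The following minimal sketch illustrates this. It assumes a two-class subset of
\code{iris} and an illustrative spline term on \code{Petal.Width}. Because
\code{accuracy} only needs hard class predictions, the saved predictions should
contain \code{.pred_class} but no class probability columns.

\preformatted{
  library(parsnip)
  library(recipes)
  library(rsample)
  library(tune)
  library(yardstick)

  # Illustrative two-class problem: versicolor vs. virginica
  iris2 <- droplevels(iris[iris$Species != "setosa", ])

  set.seed(2981)
  iris_folds <- vfold_cv(iris2, v = 5)

  iris_rec <- step_ns(recipe(Species ~ ., data = iris2), Petal.Width,
                      deg_free = tune())
  log_mod <- set_engine(logistic_reg(), "glm")

  # Only a hard-class metric is requested, so no probabilities are predicted
  class_res <- tune_grid(log_mod, iris_rec, resamples = iris_folds,
                         grid = data.frame(deg_free = 2:3),
                         metrics = metric_set(accuracy),
                         control = control_grid(save_pred = TRUE))

  names(collect_predictions(class_res))
}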

The out-of-sample estimates of these metrics are contained in a list column
called \code{.metrics}. Each element of this column is a tibble with a row for
each combination of metric and tuning parameter candidate, and columns for the
value, the estimator type, and so on.

\code{\link[=collect_metrics]{collect_metrics()}} can be used for these objects to collapse the results
over the resamples (to obtain the final resampling estimates per tuning
parameter combination).
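
The relationship between the \code{.metrics} column and \code{collect_metrics()}
is shown in the small sketch below, which builds on the spline recipe from the
examples. Tuning only \code{disp}, using three \code{deg_free} values, and
choosing \code{metric_set(rmse, mae)} are simplifying assumptions.

\preformatted{
  library(parsnip)
  library(recipes)
  library(rsample)
  library(tune)
  library(yardstick)

  set.seed(6735)
  folds <- vfold_cv(mtcars, v = 5)

  # Tune the spline degrees of freedom for disp over three values
  spline_rec <- step_ns(recipe(mpg ~ ., data = mtcars), disp, deg_free = tune())
  lin_mod <- set_engine(linear_reg(), "lm")

  res <- tune_grid(lin_mod, spline_rec, resamples = folds,
                   grid = data.frame(deg_free = 2:4),
                   metrics = metric_set(rmse, mae))

  # Per-resample estimates for the first fold: one row per candidate and metric
  res$.metrics[[1]]

  # Averaged over the five folds, with n and std_err columns
  collect_metrics(res)
}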
}

\section{Obtaining Predictions}{


When \code{control_grid(save_pred = TRUE)}, the output tibble contains a list column
called \code{.predictions} that has the out-of-sample predictions for each
parameter combination in the grid and each fold. This column can be very large.

Each element of this list column is a tibble with columns for the tuning
parameters, the row number from the original data object (\code{.row}), the
outcome data (with the same name(s) as in the original data), and any columns
created by the predictions. For example, for simple regression problems, a
column called \code{.pred} is generated. As noted above, the
prediction columns that are returned are determined by the type of metric(s)
requested.

This list column can be unnested using \code{\link[tidyr:nest]{tidyr::unnest()}} or by using the
convenience function \code{\link[=collect_predictions]{collect_predictions()}}.
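
The following short sketch saves and collects predictions. It reuses the
spline recipe idea from the examples, with an illustrative grid of three
\code{deg_free} values. The predictions are kept via
\code{control_grid(save_pred = TRUE)}.

\preformatted{
  library(parsnip)
  library(recipes)
  library(rsample)
  library(tune)

  set.seed(6735)
  folds <- vfold_cv(mtcars, v = 5)
  spline_rec <- step_ns(recipe(mpg ~ ., data = mtcars), disp, deg_free = tune())
  lin_mod <- set_engine(linear_reg(), "lm")

  pred_res <- tune_grid(lin_mod, spline_rec, resamples = folds,
                        grid = data.frame(deg_free = 2:4),
                        control = control_grid(save_pred = TRUE))

  # One row per held-out observation and candidate; columns include the
  # fold id, .pred, .row, deg_free, and the outcome mpg
  preds <- collect_predictions(pred_res)
}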
}

\section{Extracting Information}{


The \code{extract} control option results in an additional list column called
\code{.extracts} being returned. Each element is a tibble containing the results
of the user's function for each tuning parameter combination. This makes it
possible to return each model and/or recipe object that is created during
resampling. Note that this could result in a large return object, depending on
what is returned.

The control function contains an option (\code{extract}) that can be used to
retain any model or recipe that was created within the resamples. This
argument should be a function with a single argument. The value of the
argument that is given to the function in each resample is a workflow
object (see \code{\link[workflows:workflow]{workflows::workflow()}} for more information). There are two
helper functions that can be used to easily pull out the recipe (if any)
and/or the fitted model: \code{\link[=extract_recipe]{extract_recipe()}} and \code{\link[=extract_fit_parsnip]{extract_fit_parsnip()}}.

As an example, if there is interest in getting each model back, one could use:
\preformatted{
  extract = function (x) extract_fit_parsnip(x)
}

Note that the function given to the \code{extract} argument is evaluated on
every model that is \emph{fit} (as opposed to every model that is \emph{evaluated}).
As noted above, in some cases, model predictions can be derived for
sub-models so that, in these cases, not every row in the tuning parameter
grid has a separate R object associated with it.
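
The following minimal sketch uses the \code{extract} option with the same spline
recipe and linear model as the examples. Because only recipe parameters are
tuned here, no sub-model shortcut applies. Each fold should therefore hold one
extracted \code{model_fit} per \code{deg_free} value. The grid values and object
names are illustrative.

\preformatted{
  library(parsnip)
  library(recipes)
  library(rsample)
  library(tune)
  library(workflows)

  set.seed(6735)
  folds <- vfold_cv(mtcars, v = 5)
  spline_rec <- step_ns(recipe(mpg ~ ., data = mtcars), disp, deg_free = tune())
  lin_mod <- set_engine(linear_reg(), "lm")

  # Keep the fitted parsnip model from every workflow that is fit
  ctrl <- control_grid(extract = function(x) extract_fit_parsnip(x))

  ext_res <- tune_grid(lin_mod, spline_rec, resamples = folds,
                       grid = data.frame(deg_free = 2:4), control = ctrl)

  # First fold: a tibble with deg_free, .extracts, and .config columns
  ext_res$.extracts[[1]]
}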
}

\examples{
\donttest{
library(recipes)
library(rsample)
library(parsnip)
library(workflows)
library(ggplot2)

# ---------------------------------------------------------------------------

set.seed(6735)
folds <- vfold_cv(mtcars, v = 5)

# ---------------------------------------------------------------------------

# tuning recipe parameters:

spline_rec <-
  recipe(mpg ~ ., data = mtcars) \%>\%
  step_ns(disp, deg_free = tune("disp")) \%>\%
  step_ns(wt, deg_free = tune("wt"))

lin_mod <-
  linear_reg() \%>\%
  set_engine("lm")

# manually create a grid
spline_grid <- expand.grid(disp = 2:5, wt = 2:5)

# Warnings will occur from making spline terms on the holdout data that are
# extrapolations.
spline_res <-
  tune_grid(lin_mod, spline_rec, resamples = folds, grid = spline_grid)
spline_res


show_best(spline_res, metric = "rmse")

# ---------------------------------------------------------------------------

# tune model parameters only (example requires the `kernlab` package)

car_rec <-
  recipe(mpg ~ ., data = mtcars) \%>\%
  step_normalize(all_predictors())

svm_mod <-
  svm_rbf(cost = tune(), rbf_sigma = tune()) \%>\%
  set_engine("kernlab") \%>\%
  set_mode("regression")

# Use a space-filling design with 7 points
set.seed(3254)
svm_res <- tune_grid(svm_mod, car_rec, resamples = folds, grid = 7)
svm_res

show_best(svm_res, metric = "rmse")

autoplot(svm_res, metric = "rmse") +
  scale_x_log10()

# ---------------------------------------------------------------------------

# Using a variables preprocessor with a workflow

# Rather than supplying a preprocessor (like a recipe) and a model directly
# to `tune_grid()`, you can also wrap them up in a workflow and pass
# that along instead (note that this does not preprocess the variables;
# it passes them along as-is).
wf <- workflow() \%>\%
  add_variables(outcomes = mpg, predictors = everything()) \%>\%
  add_model(svm_mod)

set.seed(3254)
svm_res_wf <- tune_grid(wf, resamples = folds, grid = 7)
}
}
\seealso{
\code{\link[=control_grid]{control_grid()}}, \code{\link[=tune]{tune()}}, \code{\link[=fit_resamples]{fit_resamples()}},
\code{\link[=autoplot.tune_results]{autoplot.tune_results()}}, \code{\link[=show_best]{show_best()}}, \code{\link[=select_best]{select_best()}},
\code{\link[=collect_predictions]{collect_predictions()}}, \code{\link[=collect_metrics]{collect_metrics()}}
}
