% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/PipeOpTaskPreproc.R
\docType{data}
\name{PipeOpTaskPreproc}
\alias{PipeOpTaskPreproc}
\title{PipeOpTaskPreproc}
\format{Abstract \code{\link{R6Class}} inheriting from \code{\link{PipeOp}}.}
\description{
Base class for handling most "preprocessing" operations. These
are operations that have exactly one \code{\link[mlr3:Task]{Task}} input and one \code{\link[mlr3:Task]{Task}} output,
and that expect the column layout of the input \code{\link[mlr3:Task]{Task}} to be the same during training and prediction,
and likewise for the output \code{\link[mlr3:Task]{Task}}.

Inheriting classes must implement \code{$train_task()} and \code{$predict_task()}, which take a \code{\link[mlr3:Task]{Task}}
as input and should return that same \code{\link[mlr3:Task]{Task}}. If possible, the \code{\link[mlr3:Task]{Task}} should be
manipulated in-place rather than cloned.
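
As a minimal sketch of the \code{$train_task()} route, the hypothetical class \code{PipeOpDropMissingCols} below (not part of
mlr3pipelines) removes, at \code{\link[mlr3:Task]{Task}} level, every feature that contained missing values during training,
and removes the same features during prediction. It assumes the interface documented on this page, where \code{train_task()}
and \code{predict_task()} are public methods (other mlr3pipelines versions may name them differently), and that the
\code{"pima"} example task is available from \code{mlr_tasks}.
\preformatted{
library(mlr3)
library(mlr3pipelines)
library(R6)

# Hypothetical PipeOp (illustration only): drops every feature that
# contained missing values during training.
PipeOpDropMissingCols = R6Class("PipeOpDropMissingCols",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "dropmissingcols", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals)
    },
    train_task = function(task) {
      # number of missing values per feature
      miss = task$missings(task$feature_names)
      # the state must be a list; PipeOpTaskPreproc adds its own entries
      self$state = list(dropped = names(miss)[miss > 0])
      # modify the Task in-place; select() returns the Task itself
      task$select(setdiff(task$feature_names, self$state$dropped))
    },
    predict_task = function(task) {
      # drop exactly the features that were dropped during training
      task$select(setdiff(task$feature_names, self$state$dropped))
    }
  )
)

task = mlr_tasks$get("pima")
op = PipeOpDropMissingCols$new()
trained = op$train(list(task))[[1]]
trained$feature_names
}
Working on the \code{\link[mlr3:Task]{Task}} directly keeps this cheap: only the column selection changes, and no data is copied.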

Alternatively, the \code{$train_dt()} and \code{$predict_dt()} functions can be implemented, which operate on
\code{\link[data.table:data.table]{data.table}} objects instead. This should generally only be done if all
data is altered in some way (e.g. PCA replacing all columns by principal components), and not if only
a few columns are added or removed (e.g. feature selection); such operations should be done at the \code{\link[mlr3:Task]{Task}} level
with \code{$train_task()}. The \code{$select_cols()} function can be overloaded to restrict \code{$train_dt()} and \code{$predict_dt()}
to a subset of the \code{\link[mlr3:Task]{Task}}'s columns, e.g. only the numeric columns.
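
The following minimal sketch illustrates the \code{$train_dt()} route combined with \code{$select_cols()}. The class
\code{PipeOpCenter} is hypothetical (not part of mlr3pipelines): it subtracts the training mean from every numeric or integer
feature. It assumes the interface documented on this page, where \code{select_cols()}, \code{train_dt()} and \code{predict_dt()}
are public methods and their return values are converted with \code{as.data.table()}.
\preformatted{
library(mlr3)
library(mlr3pipelines)
library(R6)

# Hypothetical PipeOp (illustration only): centers numeric features
# using the means seen during training.
PipeOpCenter = R6Class("PipeOpCenter",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "center", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals)
    },
    # restrict train_dt() / predict_dt() to numeric and integer features
    select_cols = function(task) {
      ft = task$feature_types
      ft$id[is.element(ft$type, c("numeric", "integer"))]
    },
    train_dt = function(dt, levels, target) {
      # store the training means in the state (must be a list)
      self$state = list(means = colMeans(dt))
      self$predict_dt(dt, levels)
    },
    predict_dt = function(dt, levels) {
      means = self$state$means
      # subtract each column's training mean, matching columns by name
      as.data.frame(lapply(setNames(names(means), names(means)),
        function(col) dt[[col]] - means[[col]]))
    }
  )
)

task = mlr_tasks$get("iris")
op = PipeOpCenter$new()
centered = op$train(list(task))[[1]]
colMeans(centered$data(cols = centered$feature_names))
}
Calling \code{predict_dt()} from \code{train_dt()} keeps the transformation in one place, so training and prediction cannot drift apart.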

If the \code{can_subset_cols} argument of the constructor is \code{TRUE} (the default), the hyperparameter \code{affect_columns}
is added, which, using a \code{\link{Selector}} function, limits which columns of the \code{\link[mlr3:Task]{Task}} are modified
by the \code{\link{PipeOpTaskPreproc}}. Note that this functionality is entirely independent of the \code{$select_cols()} functionality.

\code{\link{PipeOpTaskPreproc}} is useful for operations that behave differently during training and prediction. For operations
that do essentially the same thing during training and prediction and only need extra work during training to build a \code{$state},
the \code{\link{PipeOpTaskPreprocSimple}} class can be used instead.
}
\section{Construction}{
\preformatted{PipeOpTaskPreproc$new(id, param_set = ParamSet$new(), param_vals = list(), can_subset_cols = TRUE, packages = character(0), task_type = "Task")
}
\itemize{
\item \code{id} :: \code{character(1)}\cr
Identifier of the resulting object. See the \code{$id} slot of \code{\link{PipeOp}}.
\item \code{param_set} :: \code{\link[paradox:ParamSet]{ParamSet}}\cr
Parameter space description. This should be created by the subclass and given to \code{super$initialize()}.
\item \code{param_vals} :: named \code{list}\cr
List of hyperparameter settings, overwriting the hyperparameter settings given in \code{param_set}. The
subclass should have its own \code{param_vals} parameter and pass it on to \code{super$initialize()}. Default \code{list()}.
\item \code{can_subset_cols} :: \code{logical(1)}\cr
Whether the \code{affect_columns} parameter should be added which lets the user limit the columns that are
modified by the \code{\link{PipeOpTaskPreproc}}. This should generally be \code{FALSE} if the operation adds or removes
rows from the \code{\link[mlr3:Task]{Task}}, and \code{TRUE} otherwise. Default is \code{TRUE}.
\item \code{packages} :: \code{character}\cr
Set of all required packages for the \code{\link{PipeOp}}'s \code{$train} and \code{$predict} methods. See the \code{$packages} slot
of \code{\link{PipeOp}}. Default is \code{character(0)}.
\item \code{task_type} :: \code{character(1)}\cr
The class of \code{\link[mlr3:Task]{Task}} that should be accepted as input and will be returned as output. This
should generally be a \code{character(1)} identifying a type of \code{\link[mlr3:Task]{Task}}, e.g. \code{"Task"}, \code{"TaskClassif"} or
\code{"TaskRegr"} (or another subclass introduced by other packages). Default is \code{"Task"}.
}
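
A minimal sketch of a subclass constructor that passes each construction argument on to \code{super$initialize()}. The class
\code{PipeOpDropNARows} is hypothetical (not part of mlr3pipelines): it removes training rows with missing feature values and,
because it removes rows, sets \code{can_subset_cols = FALSE}, so that no \code{affect_columns} hyperparameter is created. It
assumes the paradox interface of this era (\code{ParamSet$new()}) and the public \code{train_task()}/\code{predict_task()}
methods documented on this page.
\preformatted{
library(mlr3)
library(mlr3pipelines)
library(paradox)
library(R6)

# Hypothetical PipeOp (illustration only): removes training rows that
# contain missing feature values.
PipeOpDropNARows = R6Class("PipeOpDropNARows",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "dropnarows", param_vals = list()) {
      super$initialize(id = id,
        param_set = ParamSet$new(),  # no hyperparameters of its own
        param_vals = param_vals,
        can_subset_cols = FALSE,     # rows are removed: no affect_columns
        packages = character(0),
        task_type = "Task")
    },
    train_task = function(task) {
      self$state = list()
      dt = task$data(rows = task$row_ids, cols = task$feature_names)
      # keep only rows without missing feature values
      task$filter(task$row_ids[complete.cases(dt)])
    },
    predict_task = function(task) {
      task  # rows are never removed during prediction
    }
  )
)

op = PipeOpDropNARows$new()
op$param_set$ids()   # affect_columns is absent
}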
}

\section{Input and Output Channels}{

\code{\link{PipeOpTaskPreproc}} has one input channel named \code{"input"}, taking a \code{\link[mlr3:Task]{Task}} (or the subclass of
\code{\link[mlr3:Task]{Task}} given by the \code{task_type} construction argument), both during training and prediction.

\code{\link{PipeOpTaskPreproc}} has one output channel named \code{"output"}, producing a \code{\link[mlr3:Task]{Task}} of the same
type as the input, both during training and prediction.

The output \code{\link[mlr3:Task]{Task}} is the input \code{\link[mlr3:Task]{Task}}, modified according to the overloaded
\code{$train_task()}/\code{$predict_task()} or \code{$train_dt()}/\code{$predict_dt()} functions.
}

\section{State}{

The \code{$state} is a named \code{list}; besides members added by inheriting classes, the members are:
\itemize{
\item \code{affect_cols} :: \code{character}\cr
Names of features being selected by the \code{affect_columns} parameter, if present; names of \emph{all} present features otherwise.
\item \code{intasklayout} :: \code{\link{data.table}}\cr
Copy of the training \code{\link[mlr3:Task]{Task}}'s \code{$feature_types} slot. This is used during prediction to ensure that
the prediction \code{\link[mlr3:Task]{Task}} has the same features, feature layout, and feature types as during training.
\item \code{outtasklayout} :: \code{\link{data.table}}\cr
Copy of the \code{$feature_types} slot of the \code{\link[mlr3:Task]{Task}} returned by training. This is used during prediction to ensure that
the \code{\link[mlr3:Task]{Task}} resulting from the prediction operation has the same features, feature layout, and feature types as after training.
\item \code{dt_columns} :: \code{character}\cr
Names of features selected by the \code{$select_cols()} call during training. This is only present if the \code{$train_dt()} functionality is used,
and not present if the \code{$train_task()} function is overloaded instead.
}
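
To see these entries, one can train a built-in preprocessing \code{\link{PipeOp}} and look at its \code{$state}. This sketch
assumes that the \code{"scale"} \code{\link{PipeOp}} inherits from \code{\link{PipeOpTaskPreproc}} and is implemented via
\code{$train_dt()}, so that \code{dt_columns} should also be present; the \code{"iris"} task is taken from \code{mlr_tasks}.
\preformatted{
library(mlr3)
library(mlr3pipelines)

task = mlr_tasks$get("iris")
op = mlr_pipeops$get("scale")
op$train(list(task))

names(op$state)          # includes the entries listed above
op$state$affect_cols     # all features, since affect_columns is NULL
op$state$intasklayout    # feature types of the training input Task
}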
}

\section{Parameters}{

\itemize{
\item \code{affect_columns} :: \code{function} | \code{\link{Selector}} | \code{NULL} \cr
Which columns the \code{\link{PipeOpTaskPreproc}} should operate on. This parameter is only present if the constructor is called with
the \code{can_subset_cols} argument set to \code{TRUE} (the default).\cr
The parameter must be a \code{\link{Selector}} function, which takes a \code{\link[mlr3:Task]{Task}} as argument and returns a \code{character}
vector of the features to use.\cr
See \code{\link{Selector}} for example functions. Defaults to \code{NULL}, which selects all features.
}
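
A short usage sketch, assuming the built-in \code{"scale"} \code{\link{PipeOp}} (which inherits from \code{\link{PipeOpTaskPreproc}})
and the \code{\link{Selector}} function \code{selector_grep()}: only the features whose names start with \code{"Petal"} are
scaled, while the other features pass through unchanged.
\preformatted{
library(mlr3)
library(mlr3pipelines)

task = mlr_tasks$get("iris")
op = mlr_pipeops$get("scale")
# restrict the operation to the "Petal" features
op$param_set$values$affect_columns = selector_grep("^Petal")
scaled = op$train(list(task))[[1]]

op$state$affect_cols                       # the selected features
head(scaled$data(cols = "Sepal.Length"))   # left unchanged
}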
}

\section{Internals}{

\code{\link{PipeOpTaskPreproc}} is an abstract class inheriting from \code{\link{PipeOp}}. It implements the \code{$train_internal()} and
\code{$predict_internal()} functions, which perform checks and then call \code{$train_task()} and \code{$predict_task()}, respectively.
A subclass of \code{\link{PipeOpTaskPreproc}} may overload \code{$train_task()} and \code{$predict_task()}, or implement \code{$train_dt()}
and \code{$predict_dt()} instead. The latter works because the default implementations of \code{$train_task()} and \code{$predict_task()}
call \code{$train_dt()} and \code{$predict_dt()}, respectively.

The \code{affect_columns} functionality works by removing the \code{"feature"} column role from all features not selected by
\code{affect_columns} before processing, and setting their column role back to \code{"feature"} afterwards.
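
The column-role mechanism can be illustrated with a plain mlr3 \code{\link[mlr3:Task]{Task}}. This is a simplified sketch, not
the package source: it assumes that \code{task$col_roles$feature} can be assigned directly, and the selected feature names
stand in for the result of a \code{\link{Selector}}.
\preformatted{
library(mlr3)

# Simplified illustration, not the package source: hide the features
# not selected by affect_columns, then restore their "feature" role.
task = mlr_tasks$get("iris")
affected = c("Petal.Length", "Petal.Width")   # e.g. the result of a Selector
unaffected = setdiff(task$feature_names, affected)

# 1. remove the "feature" role from the unaffected columns
task$col_roles$feature = setdiff(task$col_roles$feature, unaffected)
seen_by_operation = task$feature_names      # only the affected features

# 2. ... the preprocessing would modify the Task here ...

# 3. give the unaffected columns their "feature" role back
task$col_roles$feature = union(task$col_roles$feature, unaffected)
task$feature_names
}
Because only column roles change, the hidden columns stay in the \code{\link[mlr3:Task]{Task}}'s backend and need not be copied.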
}

\section{Fields}{

Fields inherited from \code{\link{PipeOp}}.
}

\section{Methods}{

Methods inherited from \code{\link{PipeOp}}, as well as:
\itemize{
\item \code{train_task}\cr
(\code{\link[mlr3:Task]{Task}}) -> \code{\link[mlr3:Task]{Task}}\cr
Called by the \code{\link{PipeOpTaskPreproc}}'s implementation of \code{$train_internal()}. Takes a single \code{\link[mlr3:Task]{Task}} as input
and modifies it (ideally in-place without cloning) while storing information in the \code{$state} slot. Note that unlike
\code{$train_internal()}, the argument is \emph{not} a list but a singular \code{\link[mlr3:Task]{Task}}, and the return object is also \emph{not} a list but
a singular \code{\link[mlr3:Task]{Task}}. Also, contrary to \code{$train_internal()}, the \code{$state} being generated must be a \code{list}, to which
\code{\link{PipeOpTaskPreproc}} will add further elements (see Section \emph{State}). Care should be taken to avoid name collisions between
\code{$state} elements added by \code{$train_task()} and those added by \code{\link{PipeOpTaskPreproc}}.\cr
By default this function calls the \code{$train_dt()} function, but it can be overloaded to perform operations on the \code{\link[mlr3:Task]{Task}}
directly.
\item \code{predict_task}\cr
(\code{\link[mlr3:Task]{Task}}) -> \code{\link[mlr3:Task]{Task}}\cr
Called by the \code{\link{PipeOpTaskPreproc}}'s implementation of \code{$predict_internal()}. Takes a single \code{\link[mlr3:Task]{Task}} as input
and modifies it (ideally in-place without cloning) while using information in the \code{$state} slot. Works analogously to
\code{$train_task()}. \code{$predict_task()} should only be overloaded if \code{$train_task()} is overloaded (i.e. \code{$train_dt()} is \emph{not} used).
\item \code{train_dt(dt, levels, target)} \cr
(\code{\link{data.table}}, named \code{list}, \code{any}) -> \code{\link{data.table}} | \code{data.frame} | \code{matrix} \cr
Train \code{\link{PipeOpTaskPreproc}} on \code{dt}, transform it and store a state in \code{$state}. A transformed object must be returned
that can be converted to a \code{data.table} using \code{\link{as.data.table}}. \code{dt} does not need to be copied; it may, and preferably
should, be modified in-place.\cr
The \code{levels} argument is a named list of factor levels for factor or character features. The \code{target} argument
contains the \code{$truth()} information of the training \code{\link[mlr3:Task]{Task}}; its type depends on the \code{\link[mlr3:Task]{Task}}
type being trained on.\cr
This method can be overloaded when inheriting from \code{\link{PipeOpTaskPreproc}}, together with \code{$predict_dt()} and optionally
\code{$select_cols()}; alternatively, \code{$train_task()} and \code{$predict_task()} can be overloaded.
\item \code{predict_dt(dt, levels)} \cr
(\code{\link{data.table}}, named \code{list}) -> \code{\link{data.table}} | \code{data.frame} | \code{matrix} \cr
Predict on new data in \code{dt}, possibly using the stored \code{$state}. A transformed object must be returned
that can be converted to a \code{data.table} using \code{\link{as.data.table}}. \code{dt} does not need to be copied; it may, and preferably
should, be modified in-place.\cr
The \code{levels} argument is a named list of factor levels for factor or character features.\cr
This method can be overloaded when inheriting from \code{\link{PipeOpTaskPreproc}}, together with \code{$train_dt()} and optionally
\code{$select_cols()}; alternatively, \code{$train_task()} and \code{$predict_task()} can be overloaded.
\item \code{select_cols(task)} \cr
(\code{\link[mlr3:Task]{Task}}) -> \code{character} \cr
Selects which columns the \code{\link{PipeOp}} operates on, if \code{$train_dt()} and \code{$predict_dt()} are overloaded. This function
is not called if \code{$train_task()} and \code{$predict_task()} are overloaded. In contrast to
the \code{affect_columns} parameter, \code{$select_cols()} is for the \emph{inheriting class} to determine which columns
the operator should operate on, e.g. based on feature type, while \code{affect_columns} is a way for the \emph{user}
to limit the columns that a \code{\link{PipeOpTaskPreproc}} should operate on.\cr
This method can optionally be overloaded when inheriting from \code{\link{PipeOpTaskPreproc}}, together with \code{$train_dt()} and
\code{$predict_dt()}; alternatively, \code{$train_task()} and \code{$predict_task()} can be overloaded.\cr
If this method is not overloaded, it defaults to selecting all columns.
}
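
To illustrate what \code{$train_dt()} receives, the hypothetical class \code{PipeOpRecordArgs} below (illustration only, not
part of mlr3pipelines) stores the \code{levels} argument and the class of \code{target} in its \code{$state} and returns \code{dt}
unchanged; its \code{$select_cols()} restricts it to factor features. It assumes the public \code{select_cols()},
\code{train_dt()} and \code{predict_dt()} interface documented on this page, and uses a small made-up classification task.
\preformatted{
library(mlr3)
library(mlr3pipelines)
library(R6)

# Hypothetical PipeOp (illustration only): records its train_dt() inputs.
PipeOpRecordArgs = R6Class("PipeOpRecordArgs",
  inherit = PipeOpTaskPreproc,
  public = list(
    initialize = function(id = "recordargs", param_vals = list()) {
      super$initialize(id = id, param_vals = param_vals)
    },
    # operate only on factor features
    select_cols = function(task) {
      ft = task$feature_types
      ft$id[ft$type == "factor"]
    },
    train_dt = function(dt, levels, target) {
      self$state = list(levels = levels, target_class = class(target))
      dt  # returned unchanged
    },
    predict_dt = function(dt, levels) {
      dt
    }
  )
)

toy = data.frame(
  colour = factor(c("red", "blue", "red", "green")),
  size = c(1.5, 2.0, 0.5, 1.0),
  y = factor(c("a", "b", "a", "b"))
)
task = TaskClassif$new(id = "toy", backend = toy, target = "y")
op = PipeOpRecordArgs$new()
op$train(list(task))
op$state$levels         # levels of the selected factor feature(s)
op$state$target_class   # class of task$truth(), a factor here
}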
}

\seealso{
Other mlr3pipelines backend related: \code{\link{Graph}},
  \code{\link{PipeOpTaskPreprocSimple}},
  \code{\link{PipeOp}}, \code{\link{mlr_pipeops}}

Other PipeOps: \code{\link{PipeOpEnsemble}},
  \code{\link{PipeOpImpute}}, \code{\link{PipeOp}},
  \code{\link{mlr_pipeops_boxcox}},
  \code{\link{mlr_pipeops_branch}},
  \code{\link{mlr_pipeops_chunk}},
  \code{\link{mlr_pipeops_classbalancing}},
  \code{\link{mlr_pipeops_classifavg}},
  \code{\link{mlr_pipeops_classweights}},
  \code{\link{mlr_pipeops_colapply}},
  \code{\link{mlr_pipeops_collapsefactors}},
  \code{\link{mlr_pipeops_copy}},
  \code{\link{mlr_pipeops_encodeimpact}},
  \code{\link{mlr_pipeops_encodelmer}},
  \code{\link{mlr_pipeops_encode}},
  \code{\link{mlr_pipeops_featureunion}},
  \code{\link{mlr_pipeops_filter}},
  \code{\link{mlr_pipeops_fixfactors}},
  \code{\link{mlr_pipeops_histbin}},
  \code{\link{mlr_pipeops_ica}},
  \code{\link{mlr_pipeops_imputehist}},
  \code{\link{mlr_pipeops_imputemean}},
  \code{\link{mlr_pipeops_imputemedian}},
  \code{\link{mlr_pipeops_imputenewlvl}},
  \code{\link{mlr_pipeops_imputesample}},
  \code{\link{mlr_pipeops_kernelpca}},
  \code{\link{mlr_pipeops_learner}},
  \code{\link{mlr_pipeops_missind}},
  \code{\link{mlr_pipeops_modelmatrix}},
  \code{\link{mlr_pipeops_mutate}},
  \code{\link{mlr_pipeops_nop}},
  \code{\link{mlr_pipeops_pca}},
  \code{\link{mlr_pipeops_quantilebin}},
  \code{\link{mlr_pipeops_regravg}},
  \code{\link{mlr_pipeops_removeconstants}},
  \code{\link{mlr_pipeops_scalemaxabs}},
  \code{\link{mlr_pipeops_scalerange}},
  \code{\link{mlr_pipeops_scale}},
  \code{\link{mlr_pipeops_select}},
  \code{\link{mlr_pipeops_smote}},
  \code{\link{mlr_pipeops_spatialsign}},
  \code{\link{mlr_pipeops_subsample}},
  \code{\link{mlr_pipeops_unbranch}},
  \code{\link{mlr_pipeops_yeojohnson}},
  \code{\link{mlr_pipeops}}
}
\concept{PipeOps}
\concept{mlr3pipelines backend related}
\keyword{datasets}
