% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/scream.R
\name{scream}
\alias{scream}
\title{\if{html}{\Sexpr[stage=render,results=rd]{"\U0001f631"}} Scream.}
\usage{
scream(data, ptype, allow_novel_levels = FALSE)
}
\arguments{
\item{data}{A data frame containing the new data whose structure should be
checked.}

\item{ptype}{A data frame prototype to cast \code{data} to. This is commonly
a 0-row slice of the training set.}

\item{allow_novel_levels}{Should novel factor levels in \code{data} be allowed?
The safest approach is the default, \code{FALSE}, which throws a warning when
novel levels are found and coerces them to \code{NA} values. Setting this
argument to \code{TRUE} skips the check and keeps all novel levels.}
}
\value{
A tibble containing the required columns after any necessary structural
modifications have been made.
}
\description{
\code{scream()} ensures that the structure of \code{data} is the same as
that of the prototype, \code{ptype}. Under the hood,
\code{\link[vctrs:vec_cast]{vctrs::vec_cast()}} is used to cast each column
of \code{data} to the same type as the corresponding column in \code{ptype}.

This casting enforces a number of important structural checks,
including but not limited to:
\itemize{
\item \emph{Data Classes} - Checks that the class of each column in \code{data} is the
same as the corresponding column in \code{ptype}.
\item \emph{Novel Levels} - Checks that the factor columns in \code{data} don't have any
\emph{new} levels when compared with the \code{ptype} columns. If there are new
levels, a warning is issued and they are coerced to \code{NA}. This check is
optional, and can be turned off with \code{allow_novel_levels = TRUE}.
\item \emph{Level Recovery} - Checks that the factor columns in \code{data} aren't
missing any factor levels when compared with the \code{ptype} columns. If
there are missing levels, then they are restored.
}
}
\details{
\code{scream()} is called by \code{\link[=forge]{forge()}} after \code{\link[=shrink]{shrink()}} but before the
actual processing is done. Generally, you don't need to call \code{scream()}
directly, as \code{forge()} will do it for you.

If \code{scream()} is used as a standalone function, it is good practice to call
\code{\link[=shrink]{shrink()}} right before it, because \code{scream()} does not
check that all of the required column names actually exist in \code{data}.
Those checks live in \code{shrink()}.
}
\examples{
# ---------------------------------------------------------------------------
# Setup

train <- iris[1:100, ]
test <- iris[101:150, ]

# mold() is run at model fit time
# and a formula preprocessing blueprint is recorded
x <- mold(log(Sepal.Width) ~ Species, train)

# Inside the result of mold() are the prototype tibbles
# for the predictors and the outcomes
ptype_pred <- x$blueprint$ptypes$predictors
ptype_out <- x$blueprint$ptypes$outcomes

# ---------------------------------------------------------------------------
# shrink() / scream()

# Pass the test data, along with a prototype, to
# shrink() to extract the prototype columns
test_shrunk <- shrink(test, ptype_pred)

# Now pass that to scream() to perform validation checks
# If no warnings / errors are thrown, the checks were
# successful!
scream(test_shrunk, ptype_pred)
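
# shrink() catches missing columns, but scream() does not check for
# them. As a sketch, assume a hypothetical two-column prototype and new
# data that contains only one of those columns. shrink() stops with an
# error before scream() ever sees the incomplete data.
ptype_two <- iris[0, c("Sepal.Length", "Species")]
incomplete <- iris[101:150, "Species", drop = FALSE]

missing_check <- tryCatch(
  shrink(incomplete, ptype_two),
  error = function(e) e
)
inherits(missing_check, "error")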

# ---------------------------------------------------------------------------
# Outcomes

# To also extract the outcomes, use the outcome prototype
test_outcome <- shrink(test, ptype_out)
scream(test_outcome, ptype_out)

# ---------------------------------------------------------------------------
# Casting

# scream() uses vctrs::vec_cast() to convert new data to the
# types recorded in the prototype. This means it can
# automatically perform certain conversions, like
# coercing character columns to factors.
test2 <- test
test2$Species <- as.character(test2$Species)

test2_shrunk <- shrink(test2, ptype_pred)
scream(test2_shrunk, ptype_pred)
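
# A similar sketch for numeric columns. The prototype here is an
# illustrative 0-row slice of iris, and Sepal.Length is deliberately
# converted to integer. scream() casts it back to the double type
# recorded in the prototype.
ptype_num <- iris[0, c("Sepal.Length", "Species")]
test_int <- iris[101:150, c("Sepal.Length", "Species")]
test_int$Sepal.Length <- as.integer(round(test_int$Sepal.Length))

test_int_fixed <- scream(test_int, ptype_num)
is.double(test_int_fixed$Sepal.Length)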

# It can also recover missing factor levels.
# For example, it is plausible that the test data only had the
# "virginica" level.
test3 <- test
test3$Species <- factor(test3$Species, levels = "virginica")

test3_shrunk <- shrink(test3, ptype_pred)
test3_fixed <- scream(test3_shrunk, ptype_pred)

# scream() recovered the missing levels
levels(test3_fixed$Species)
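
# ---------------------------------------------------------------------------
# Incompatible types

# Casting only covers conversions that vctrs supports. As a sketch, a
# factor column that arrives as plain numeric codes cannot be cast to
# the factor type in the prototype, so the data class check fails and
# scream() throws an error. The objects below are illustrative only.
ptype_species <- iris[0, "Species", drop = FALSE]
test5 <- iris[101:150, "Species", drop = FALSE]
test5$Species <- as.numeric(test5$Species)

test5_result <- tryCatch(
  scream(test5, ptype_species),
  error = function(e) e
)
inherits(test5_result, "error")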

# ---------------------------------------------------------------------------
# Novel levels

# When `data` contains novel levels that are used by at least one row,
# the default is to coerce those values to `NA` with a warning.
test4 <- test
test4$Species <- as.character(test4$Species)
test4$Species[1] <- "new_level"

test4$Species <- factor(
  test4$Species,
  levels = c(levels(test$Species), "new_level")
)

test4 <- shrink(test4, ptype_pred)

# Warning is thrown
test4_removed <- scream(test4, ptype_pred)

# Novel level is removed
levels(test4_removed$Species)

# No warning is thrown
test4_kept <- scream(test4, ptype_pred, allow_novel_levels = TRUE)

# Novel level is kept
levels(test4_kept$Species)

}
