% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/EdgeBoostFilter.R
\name{edgeBoostFilter}
\alias{edgeBoostFilter}
\alias{edgeBoostFilter.default}
\alias{edgeBoostFilter.formula}
\title{Edge Boosting Filter}
\usage{
\method{edgeBoostFilter}{formula}(formula, data, ...)

\method{edgeBoostFilter}{default}(x, m = 15, percent = 0.05,
  threshold = 0, classColumn = ncol(x), ...)
}
\arguments{
\item{formula}{A formula describing the classification variable and the attributes to be used.}

\item{data, x}{Data frame containing the training dataset to be filtered.}

\item{...}{Optional parameters to be passed to other methods.}

\item{m}{Number of boosting iterations.}

\item{percent}{Real number between 0 and 1. It sets the maximum proportion of instances to be removed;
an instance is removed only if its edge value also exceeds \code{threshold}.}

\item{threshold}{Real number between 0 and 1. It sets the minimum edge value that
an instance must exceed in order to be removed.}

\item{classColumn}{Positive integer indicating the column that contains the class labels (a factor).
By default, the last column is used.}
}
\value{
An object of class \code{filter}, which is a list with seven components:
\itemize{
   \item \code{cleanData} is a data frame containing the filtered dataset.
   \item \code{remIdx} is a vector of integers indicating the indices of the
   removed instances (i.e. their row numbers with respect to the original data frame).
   \item \code{repIdx} is a vector of integers indicating the indices of the
   repaired/relabelled instances (i.e. their row numbers with respect to the original data frame).
   \item \code{repLab} is a factor containing the new labels for repaired instances.
   \item \code{parameters} is a list containing the argument values.
   \item \code{call} contains the original call to the filter.
   \item \code{extraInf} is a character vector with additional
   information not covered by the previous items.
}
}
\description{
Ensemble-based filter for removing label noise from a dataset as a
preprocessing step for classification. For more information, see the 'Details' and
'References' sections.
}
\details{
The full description of the method can be found in the provided references.

An AdaBoost scheme (Freund & Schapire) is applied with a default C4.5 tree as the weak classifier.
After \code{m} iterations, the instances with the largest edge values (Wheway, Freund & Schapire),
subject to the constraints \code{percent} and \code{threshold}, are considered noisy
and are removed.

Note that setting an extreme value (i.e. \code{percent = 1} or \code{threshold = 0}) effectively
disables the corresponding removal constraint.
}
\examples{
# The next example is not run in order to save time
\dontrun{
data(iris)
out <- edgeBoostFilter(Species ~ ., data = iris, m = 10, percent = 0.05, threshold = 0)
print(out)
identical(out$cleanData, iris[setdiff(seq_len(nrow(iris)), out$remIdx), ])
}
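
# A rough sketch, in base R, of how 'percent' and 'threshold' might jointly
# select the instances to remove. It is illustrative only and is not the
# package's internal code: the edge values are made up, and rounding the
# removal budget with floor() is an assumption.
edges <- c(0.10, 0.80, 0.35, 0.60, 0.05, 0.45)  # hypothetical edge values
percent <- 0.5
threshold <- 0.5
nMax <- floor(percent * length(edges))  # at most this many removals
# Indices of the nMax instances with the largest edge values
candidates <- order(edges, decreasing = TRUE)[seq_len(nMax)]
# Keep only the candidates whose edge value exceeds the threshold
sketchRemIdx <- sort(candidates[edges[candidates] > threshold])
print(sketchRemIdx)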
}
\references{
Freund Y., Schapire R. E. (1997): A decision-theoretic generalization of on-line learning and
an application to boosting. \emph{Journal of Computer and System Sciences}, 55(1), 119-139.

Wheway V. (2001, January): Using boosting to detect noisy data. In \emph{Advances in Artificial Intelligence}.
PRICAI 2000 Workshop Reader (pp. 123-130). Springer Berlin Heidelberg.
}

