% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/exchange-method-kmeans-anticlustering.R
\name{fast_anticlustering}
\alias{fast_anticlustering}
\title{Fast anticlustering}
\usage{
fast_anticlustering(x, K, k_neighbours = Inf, categories = NULL)
}
\arguments{
\item{x}{A numeric vector, matrix or data.frame of data
points.  Rows correspond to elements and columns correspond to
features. A vector represents a single numeric feature.}

\item{K}{How many anticlusters should be created.}

\item{k_neighbours}{The number of nearest neighbours that serve as
exchange partners for each element. Defaults to \code{Inf}, i.e., each
element is tested for an exchange with every element in the other groups.}

\item{categories}{A vector, data.frame or matrix representing one or
several categorical constraints.}
}
\description{
The most efficient way to perform anticlustering with an exchange
method that optimizes the k-means variance criterion. Can be used for
very large data sets.
}
\details{
This function was created to make anticlustering applicable
to large data sets (e.g., 100,000 elements). It optimizes the k-means
variance objective, which only requires distances between elements and
cluster centroids, because computing all pairwise distances is not
feasible for many elements. Additionally, this function employs a
speed-optimized exchange method. For each element, the potential
exchange partners are generated using a nearest neighbor search with the
function \code{\link[RANN]{nn2}} from the \code{RANN} package. The nearest
neighbors then serve as exchange partners. This approach is inspired by the
preclustering heuristic, according to which good solutions are found
when similar elements are in different sets; swapping nearest
neighbors will often produce such solutions. The number of exchange
partners per element is set using the argument \code{k_neighbours}; by
default, it is set to \code{Inf}, meaning that all possible swaps are
tested. This default must be changed by the user for large data sets.
More exchange partners generally improve the output, but also increase
run time.

When the \code{categories} argument is set, only elements of the same
category serve as exchange partners. Note that when
\code{categories} has multiple columns (i.e., each element is
assigned to multiple categories), each combination of categories is
treated as a distinct category by the exchange method.
}
\examples{


## Use the four numeric iris features (drop the Species column):
features <- iris[, -5]

start <- Sys.time()
ac_exchange <- fast_anticlustering(features, K = 3)
Sys.time() - start

## The following call is equivalent to the call above:
start <- Sys.time()
ac_exchange <- anticlustering(features, K = 3, objective = "variance")
Sys.time() - start

## Improve run time by using fewer exchange partners:
start <- Sys.time()
ac_fast <- fast_anticlustering(features, K = 3, k_neighbours = 10)
Sys.time() - start

## Compare the feature means across anticlusters for both solutions:
by(features, ac_exchange, function(x) round(colMeans(x), 2))
by(features, ac_fast, function(x) round(colMeans(x), 2))
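
## Sketch (illustration only, not the internal code of
## fast_anticlustering()): a nearest neighbour search with
## RANN::nn2() yields candidate exchange partners. Requesting
## k = 11 neighbours and dropping the first column (usually the
## element itself) leaves 10 candidate partners per element,
## similar in spirit to k_neighbours = 10. The variable names
## nn and partners are arbitrary.
nn <- RANN::nn2(as.matrix(iris[, -5]), k = 11)
partners <- nn$nn.idx[, -1]
dim(partners)
head(partners, 3)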

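## Illustration (base R only, not the internal code of the package)
## of how several categorical variables are combined: each observed
## combination of levels forms one category. The median split of
## Sepal.Length below is a hypothetical second categorical variable.
sepal_size <- ifelse(iris$Sepal.Length > median(iris$Sepal.Length),
                     "long", "short")
combined <- interaction(iris$Species, sepal_size, drop = TRUE)
table(combined)
## Passing data.frame(iris$Species, sepal_size) as `categories` would
## restrict exchanges to elements within the same combined category.
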
}
\seealso{
\code{\link{anticlustering}}

\code{\link{variance_objective}}
}
\author{
Martin Papenberg \email{martin.papenberg@hhu.de}
}
