% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/wdpa_clean.R
\name{wdpa_clean}
\alias{wdpa_clean}
\title{Clean data}
\usage{
wdpa_clean(
  x,
  crs = paste("+proj=cea +lon_0=0 +lat_ts=30 +x_0=0",
    "+y_0=0 +datum=WGS84 +ellps=WGS84 +units=m +no_defs"),
  exclude_unesco = TRUE,
  retain_status = c("Designated", "Inscribed", "Established"),
  snap_tolerance = 1,
  simplify_tolerance = 0,
  geometry_precision = 1500,
  erase_overlaps = TRUE,
  verbose = interactive()
)
}
\arguments{
\item{x}{\code{\link[sf:sf]{sf::sf()}} object containing protected area data.}

\item{crs}{\code{character} or \code{integer} object representing a
coordinate reference system. Defaults to World Behrmann
(\emph{ESRI:54017}).}

\item{exclude_unesco}{\code{logical} should UNESCO Biosphere Reserves be excluded?
Defaults to \code{TRUE}.}

\item{retain_status}{\code{character} vector containing the statuses for
protected areas that should be retained during the cleaning process.
Available statuses include:
\code{"Proposed"}, \code{"Inscribed"}, \code{"Adopted"}, \code{"Designated"}, and
\code{"Established"}.
Additionally, a \code{NULL} argument can be specified to ensure that no
protected areas are excluded according to their status.
The default argument is a \code{character} vector containing \code{"Designated"},
\code{"Inscribed"}, and \code{"Established"}.
This default argument ensures that protected areas that are not currently
implemented are excluded.}

\item{snap_tolerance}{\code{numeric} tolerance for snapping geometry to a
grid for resolving invalid geometries. Defaults to 1 meter.}

\item{simplify_tolerance}{\code{numeric} simplification tolerance.
Defaults to 0 meters.}

\item{geometry_precision}{\code{numeric} level of precision for processing
the spatial data (used with \code{\link[sf:st_precision]{sf::st_set_precision()}}). The
default argument is 1500 (higher values indicate higher precision).
This level of precision is generally suitable for analyses at the
national scale. For analyses at finer-scale resolutions, please
consider using a greater value (e.g. 10000).}

\item{erase_overlaps}{\code{logical} should overlapping boundaries be
erased? This is useful for making comparisons between individual
protected areas and understanding their "effective" geographic coverage.
On the other hand, this processing step may not be needed
(e.g. if the protected area boundaries are going to be rasterized), and so
processing time can be substantially reduced by skipping this step and setting
the argument to \code{FALSE}. Defaults to \code{TRUE}.}

\item{verbose}{\code{logical} should progress on data cleaning be reported?
Defaults to \code{TRUE} in an interactive session, otherwise
\code{FALSE}.}
}
\value{
\code{\link[sf:sf]{sf::sf()}} object.
}
\description{
Clean data obtained from
\href{https://www.protectedplanet.net/en}{Protected Planet}.
Specifically, this function is designed to clean data obtained from
the World Database on Protected Areas
(WDPA) and the World Database on Other Effective Area-Based Conservation
Measures (WDOECM).
For recommended practices on cleaning large datasets
(e.g. datasets that span multiple countries or a large geographic area),
please see below.
}
\details{
This function cleans data following best practices
(Butchart \emph{et al.} 2015; Protected Planet 2021; Runge \emph{et al.} 2015).
To obtain accurate protected area coverage statistics for a country,
please note that you will need to manually clip the cleaned data to
the country's coastline and its Exclusive Economic Zone (EEZ).
The data are cleaned using the following steps.

\enumerate{

\item Exclude protected areas according to their status (i.e.
\code{"STATUS"} field). Specifically, protected areas that have
a status not specified in the argument to \code{retain_status} are excluded.
By default, only protected areas that have a
\code{"Designated"}, \code{"Inscribed"}, or \code{"Established"} status are retained.
This means that the default behavior is to exclude protected areas that
are not currently implemented.

\item Exclude United Nations Educational, Scientific and Cultural
Organization (UNESCO) Biosphere Reserves (Coetzer \emph{et al.} 2014).
This step is only performed if the argument to \code{exclude_unesco} is
\code{TRUE}.

\item Create a field (\code{"GEOMETRY_TYPE"}) indicating if areas are
represented as point localities (\code{"POINT"}) or as polygons
(\code{"POLYGON"}).

\item Exclude areas represented as point localities that do not
have a reported spatial extent (i.e. missing data for the field
\code{"REP_AREA"}).

\item Geometries are wrapped to the dateline (using
\code{\link[sf:st_transform]{sf::st_wrap_dateline()}} with the options
\code{"WRAPDATELINE=YES"} and \code{"DATELINEOFFSET=180"}).

\item Reproject data to the coordinate system specified in the argument to
\code{crs} (using \code{\link[sf:st_transform]{sf::st_transform()}}).

\item Repair any invalid geometries that have manifested
(using \code{\link[=st_repair_geometry]{st_repair_geometry()}}).

\item Buffer areas represented as point localities to circular areas
using their reported spatial extent (using data in the field
\code{"REP_AREA"} and \code{\link[sf:geos_unary]{sf::st_buffer()}}; see Visconti
\emph{et al.} 2013).

\item Snap the geometries to a grid to fix any remaining
geometry issues (using argument to \code{snap_tolerance} and
\code{\link[lwgeom:st_snap_to_grid]{lwgeom::st_snap_to_grid()}}).

\item Repair any invalid geometries that have manifested
(using \code{\link[=st_repair_geometry]{st_repair_geometry()}}).

\item Simplify the protected area geometries to reduce computational burden
(using argument to \code{simplify_tolerance} and
\code{\link[sf:geos_unary]{sf::st_simplify()}}).

\item Repair any invalid geometries that have manifested
(using \code{\link[=st_repair_geometry]{st_repair_geometry()}}).

\item The \code{"MARINE"} field is converted from integer codes
to descriptive names (i.e. \code{0} = \code{"terrestrial"},
\code{1} = \code{"partial"}, \code{2} = \code{"marine"}).

\item The \code{"PA_DEF"} field is converted from integer codes
to descriptive names (i.e. \code{0} = \code{"OECM"}, and \code{1} = \code{"PA"}).

\item Zeros in the \code{"STATUS_YR"} field are replaced with
missing values (i.e. \code{NA_real_} values).

\item Zeros in the \code{"NO_TK_AREA"} field are replaced with \code{NA}
values for areas where such data are not reported or applicable
(i.e. areas with the values \code{"Not Applicable"}
or \code{"Not Reported"} in the \code{"NO_TAKE"} field).

\item Overlapping geometries are erased from the protected area data
(discussed in Deguignet \emph{et al.} 2017). Geometries are erased such
that areas associated with more effective management
categories (\code{"IUCN_CAT"}) or that have historical precedence are retained
(using \code{\link[sf:geos_binary_ops]{sf::st_difference()}}).

\item Slivers are removed (geometries with areas less than 0.1 square
meters).

\item The size of each area is calculated in square kilometers and stored in
the field \code{"AREA_KM2"}.

}
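
As a rough illustration of the attribute-related steps above, the following
base R sketch applies them to a small, hypothetical table.
The values are invented, and the code is not the function's actual
implementation. It assumes that \code{"REP_AREA"} is reported in square
kilometers, and that a buffered point locality is given a circular area
equal to its reported extent.

\preformatted{
# hypothetical attribute table standing in for WDPA data (values invented)
pa <- data.frame(
  MARINE = c(0, 1, 2),
  PA_DEF = c(1, 1, 0),
  STATUS_YR = c(1998, 0, 2010),
  REP_AREA = c(12.5, 0.8, 300)
)

# convert integer codes to descriptive names
pa$MARINE <- c("terrestrial", "partial", "marine")[pa$MARINE + 1]
pa$PA_DEF <- c("OECM", "PA")[pa$PA_DEF + 1]

# replace zeros in STATUS_YR with missing values
pa$STATUS_YR[pa$STATUS_YR == 0] <- NA_real_

# radius (in meters) of a circle with an area equal to REP_AREA,
# assuming that REP_AREA is reported in square kilometers
pa$radius_m <- sqrt((pa$REP_AREA * 1e6) / pi)
}

A radius calculated in this manner could then be supplied as the buffer
distance to \code{\link[sf:geos_unary]{sf::st_buffer()}}, provided that the
data have been reprojected to a coordinate system with units in meters.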
}
\section{Recommended practices for large datasets}{

This function can be used to clean large datasets assuming that
sufficient computational resources and time are available.
Indeed, it can clean data spanning large countries, multiple
countries, and even the full global dataset.
When processing the full global dataset, it is recommended to use a
computer system with at least 32 GB of RAM available and to allow for at least
one full day for the data cleaning procedures to complete.
It is also recommended to avoid using the computer system for any other
tasks while the data cleaning procedures are being completed,
because they are very computationally intensive.
Additionally, when processing large datasets -- and especially
for the global dataset -- it is strongly recommended to disable the
procedure for erasing overlapping areas.
This is because the built-in procedure for erasing overlaps keeps track of
information on each protected area (e.g. IUCN category, year established),
and so it is very time consuming when processing many protected areas.
Instead, when cleaning large datasets, it is recommended to run
the data cleaning procedures with the procedure for erasing
overlapping areas disabled (i.e. with \code{erase_overlaps = FALSE}).
After the data cleaning procedures have completed,
the protected area data can be manually dissolved
to remove overlapping areas (e.g. using \code{\link[=wdpa_dissolve]{wdpa_dissolve()}}).
For an example of processing a large protected area dataset,
please see the vignette.
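
A minimal sketch of this workflow is shown below. It assumes that the
package is attached using \code{library(wdpar)}, and it uses data for
Liechtenstein only to keep the sketch small. The object names are
illustrative.

\preformatted{
library(wdpar)

# fetch data (a small country is used here for illustration)
raw_data <- wdpa_fetch("LIE", wait = TRUE)

# clean data with the procedure for erasing overlaps disabled
cleaned_data <- wdpa_clean(raw_data, erase_overlaps = FALSE)

# dissolve boundaries to remove overlapping areas
dissolved_data <- wdpa_dissolve(cleaned_data)
}

Note that dissolving the boundaries in this manner does not preserve
information on individual protected areas, so this approach is likely best
suited to calculating overall coverage.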
}

\examples{
\dontrun{
# fetch data for Liechtenstein
lie_raw_data <- wdpa_fetch("LIE", wait = TRUE)

# clean data
lie_data <- wdpa_clean(lie_raw_data)

# plot cleaned dataset
plot(lie_data)
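
# clean data, retaining protected areas regardless of their status
# (a NULL argument to retain_status disables the status filter;
# the object names below are illustrative)
lie_all_data <- wdpa_clean(lie_raw_data, retain_status = NULL)

# clean data using a greater level of precision, which may be more
# suitable for finer-scale analyses
lie_fine_data <- wdpa_clean(lie_raw_data, geometry_precision = 10000)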

}
}
\references{
Butchart SH, Clarke M, Smith RJ, Sykes RE, Scharlemann JP,
Harfoot M, ... & Brooks TM (2015) Shortfalls and solutions for
meeting national and global conservation area targets.
\emph{Conservation Letters}, \strong{8}: 329--337.

Coetzer KL, Witkowski ET & Erasmus BF (2014) Reviewing
Biosphere Reserves globally: Effective conservation action or bureaucratic
label? \emph{Biological Reviews}, \strong{89}: 82--104.

Deguignet M, Arnell A, Juffe-Bignoli D, Shi Y, Bingham H, MacSharry B &
Kingston N (2017) Measuring the extent of overlaps in protected area
designations. \emph{PloS One}, \strong{12}: e0188681.

Runge CA, Watson JEM, Butchart SHM, Hanson JO, Possingham HP & Fuller RA
(2015) Protected areas and global conservation of migratory birds.
\emph{Science}, \strong{350}: 1255--1258.

Protected Planet (2021) Calculating protected and OECM area coverage.
Available at:
\url{https://www.protectedplanet.net/en/resources/calculating-protected-area-coverage}.

Visconti P, Di Marco M, Alvarez-Romero JG, Januchowski-Hartley SR, Pressey
RL, Weeks R & Rondinini C (2013) Effects of errors and gaps in spatial data
sets on assessment of conservation progress. \emph{Conservation Biology},
\strong{27}: 1000--1010.
}
\seealso{
\code{\link[=wdpa_fetch]{wdpa_fetch()}}, \code{\link[=wdpa_dissolve]{wdpa_dissolve()}}.
}
