% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/06-data_evaluate.R
\name{dataset_evaluate}
\alias{dataset_evaluate}
\title{Generate a quality assessment report of a dataset}
\usage{
dataset_evaluate(
  dataset,
  data_dict = NULL,
  taxonomy = NULL,
  .dataset_name = NULL,
  as_data_dict_mlstr = TRUE
)
}
\arguments{
\item{dataset}{A tibble identifying the input dataset observations
associated with its data dictionary.}

\item{data_dict}{A list of tibble(s) representing the metadata of an
associated dataset. Automatically generated if not provided.}

\item{taxonomy}{A tibble identifying the scheme used for variables
classification.}

\item{.dataset_name}{A character string specifying the name of the dataset
(internally used in the function \code{\link[=dossier_evaluate]{dossier_evaluate()}}).}

\item{as_data_dict_mlstr}{Whether the output data dictionary has a Maelstrom
data dictionary structure (compatible with the Maelstrom Research ecosystem,
including Opal) or a simple data dictionary structure.
TRUE by default.}
}
\value{
A list of tibbles containing the assessment report for the dataset and its
data dictionary.
}
\description{
Assesses the content and structure of a dataset and reports possible issues
in the dataset and data dictionary to facilitate assessment of input data.
The report can be used to help assess data structure, presence of fields,
coherence across elements, and taxonomy or data dictionary formats. This
report is compatible with Excel and can be exported as an Excel spreadsheet.
}
\details{
A data dictionary contains metadata about variables and can be associated
with a dataset. It must be a list of data frame-like objects with elements
named 'Variables' (required) and 'Categories' (if any). To be usable in any
function, the 'Variables' element must contain at least the 'name' column,
and the 'Categories' element must contain at least the 'variable' and 'name'
columns. To be considered a minimum workable data dictionary, the 'name'
column in 'Variables' must also have unique and non-null entries, and in
'Categories' each combination of 'variable' and 'name' must also be unique.
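
The requirements above can be sketched in base R, using plain data frames as
stand-ins for tibbles. The variable names and category codes below are
hypothetical and only illustrate the expected shape. The checks are an
illustration of the stated rules, not the validation performed by the package.

\preformatted{
# Hypothetical data dictionary for a dataset with columns 'sex' and 'age'
data_dict <- list(
  Variables = data.frame(
    name = c("sex", "age"),
    stringsAsFactors = FALSE
  ),
  Categories = data.frame(
    variable = c("sex", "sex"),
    name = c("1", "2"),
    stringsAsFactors = FALSE
  )
)

# 'Variables': entries in 'name' must be unique and non-missing
var_names <- data_dict$Variables$name
variables_ok <- !anyNA(var_names) && anyDuplicated(var_names) == 0

# 'Categories': each ('variable', 'name') pair must be unique
cat_pairs <- data_dict$Categories[c("variable", "name")]
categories_ok <- anyDuplicated(cat_pairs) == 0
}

Both \code{variables_ok} and \code{categories_ok} are \code{TRUE} for this
sketch, so it meets the conditions described for a minimum workable data
dictionary.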

A dataset must be a data frame-like object and can be associated with a
data dictionary. If no data dictionary is provided, a minimum workable
data dictionary will be generated as needed by relevant functions.
An identifier \code{id} column for sorting can be specified by the user. If
specified, the \code{id} values must be non-missing and will be used in functions
that require it. If no identifier column is specified, indexing is handled
automatically by the function.

A taxonomy is a classification scheme that can be defined for variable
attributes. If defined, a taxonomy must be a data frame-like object. It must
be compatible with (and is generally extracted from) an Opal environment. To
work with certain functions, a valid taxonomy must contain at least the
columns 'taxonomy', 'vocabulary', and 'terms'. In addition, the taxonomy
may follow the Maelstrom Research taxonomy, in which case its content can be
evaluated against rules specific to Maelstrom Research, such as naming
convention restrictions, tagging elements, or scales. In this particular
case, the tibble must also contain 'vocabulary_short', 'taxonomy_scale',
'vocabulary_scale' and 'term_scale' to work with some specific functions.
}
\examples{
{

# use DEMO_files provided by the package
library(dplyr)
library(fabR) # add_index

# Example: any data frame (or tibble) can be evaluated
dataset <- iris['Sepal.Width']
dataset_evaluate(dataset)
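
# Sketch (assumption, not taken from the package examples): supply a
# user-defined data dictionary instead of the automatically generated one.
# Only the 'name' column is required in 'Variables' (see Details).
data_dict <- list(Variables = tibble(name = "Sepal.Width"))
dataset_evaluate(dataset, data_dict)

# Same call, requesting a simple data dictionary structure in the output
# rather than the default Maelstrom structure.
dataset_evaluate(dataset, data_dict, as_data_dict_mlstr = FALSE)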

}

}
\seealso{
\code{\link[=dossier_evaluate]{dossier_evaluate()}}
}
