% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/io2.R
\name{read_file_duckdb}
\alias{read_file_duckdb}
\alias{read_parquet_duckdb}
\alias{read_csv_duckdb}
\alias{read_json_duckdb}
\title{Read Parquet, CSV, and other files using DuckDB}
\usage{
read_parquet_duckdb(
  path,
  ...,
  prudence = c("thrifty", "lavish", "stingy"),
  options = list()
)

read_csv_duckdb(
  path,
  ...,
  prudence = c("thrifty", "lavish", "stingy"),
  options = list()
)

read_json_duckdb(
  path,
  ...,
  prudence = c("thrifty", "lavish", "stingy"),
  options = list()
)

read_file_duckdb(
  path,
  table_function,
  ...,
  prudence = c("thrifty", "lavish", "stingy"),
  options = list()
)
}
\arguments{
\item{path}{Path to files. The glob patterns \code{*} and \verb{?} are supported.}

\item{...}{These dots are for future extensions and must be empty.}

\item{prudence}{Memory protection: controls whether DuckDB may convert
intermediate results in DuckDB-managed memory to data frames in R memory.
\itemize{
\item \code{"thrifty"}: up to a maximum size of 1 million cells,
\item \code{"lavish"}: regardless of size,
\item \code{"stingy"}: never.
}

The default is \code{"thrifty"} for the ingestion functions,
and may be different for other functions.
See \code{vignette("prudence")} for more information.}

\item{options}{Arguments to the DuckDB function
indicated by \code{table_function}.}

\item{table_function}{The name of a table-valued
DuckDB function such as \code{"read_parquet"},
\code{"read_csv"}, \code{"read_csv_auto"} or \code{"read_json"}.}
}
\value{
A duckplyr frame, see \code{\link[=as_duckdb_tibble]{as_duckdb_tibble()}} for details.
}
\description{
These functions ingest data from a file.
In many cases, these functions return immediately because they only read the metadata.
The data itself is read only when it is processed.

\code{read_parquet_duckdb()} reads a Parquet file using DuckDB's \code{read_parquet()} table function.

\code{read_csv_duckdb()} reads a CSV file using DuckDB's \code{read_csv_auto()} table function.

\code{read_json_duckdb()} reads a JSON file using DuckDB's \code{read_json()} table function.

\code{read_file_duckdb()} uses arbitrary readers to read data.
See \url{https://duckdb.org/docs/data/overview} for documentation
of the available functions and their options.
To read multiple files with the same schema,
pass a wildcard or a character vector to the \code{path} argument.
}
\section{Fine-tuning prudence}{

\ifelse{html}{\href{https://lifecycle.r-lib.org/articles/stages.html#experimental}{\figure{lifecycle-experimental.svg}{options: alt='[Experimental]'}}}{\strong{[Experimental]}}

The \code{prudence} argument can also be a named numeric vector
with at least one of \code{cells} or \code{rows}
to limit the cells (values) and rows in the resulting data frame
after automatic materialization.
If both limits are specified, both are enforced.
The equivalent of \code{"thrifty"} is \code{c(cells = 1e6)}.
}

\examples{
# Create simple CSV file
path <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)

# Reading is immediate
df <- read_csv_duckdb(path)

# Names are always available
names(df)

# Materialization upon access is limited by default (prudence = "thrifty"),
# try() guards against an error if the limit is exceeded
try(print(df$a))

# Materialize explicitly
collect(df)$a

# Automatic materialization with prudence = "lavish"
df <- read_csv_duckdb(path, prudence = "lavish")
df$a
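
# A minimal sketch of the numeric form of prudence described under
# "Fine-tuning prudence": here automatic materialization is limited to
# results with at most 2 rows. The limit of 2 is an arbitrary value chosen
# for illustration. The file has 3 rows, so direct column access is expected
# to fail, hence the try().
df_limited <- read_csv_duckdb(path, prudence = c(rows = 2))
try(print(df_limited$a))

# Explicit materialization is not subject to the limit
collect(df_limited)$a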

# Specify column types
read_csv_duckdb(
  path,
  options = list(delim = ",", types = list(c("DOUBLE", "VARCHAR")))
)
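
# A sketch of the generic reader: read_file_duckdb() takes the name of the
# DuckDB table function explicitly. "read_csv" is one of the names listed
# for the table_function argument; the delim option is passed through to it.
read_file_duckdb(
  path,
  table_function = "read_csv",
  options = list(delim = ",")
)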

# Create and read a simple JSON file
path <- tempfile("duckplyr_test_", fileext = ".json")
writeLines('[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]', path)

# Reading needs the json extension
db_exec("INSTALL json")
db_exec("LOAD json")
read_json_duckdb(path)
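
# A sketch of reading multiple files with the same schema by passing
# a character vector to path. The names path1 and path2 are only used
# for this illustration.
path1 <- tempfile("duckplyr_test_", fileext = ".csv")
path2 <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path1, row.names = FALSE)
write.csv(data.frame(a = 4:6, b = letters[7:9]), path2, row.names = FALSE)
read_csv_duckdb(c(path1, path2))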
}
