% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/extract_metadata_from_pdf.R
\name{extract_pdf_metadata}
\alias{extract_pdf_metadata}
\title{Extract DOI and Metadata from PDF}
\usage{
extract_pdf_metadata(pdf_path, fields = "doi", return_all_dois = FALSE)
}
\arguments{
\item{pdf_path}{Character. Path to the PDF file.}

\item{fields}{Character vector. Metadata fields to extract. Options are:
"doi", "title", "authors", "journal", "year", "all". Default is "doi".}

\item{return_all_dois}{Logical. If \code{TRUE}, returns all DOIs found; if \code{FALSE}
(default), returns only the first article DOI found (excluding journal ISSNs).}
}
\value{
If \code{fields = "doi"} (the default), returns a character string containing
the DOI, or \code{NA_character_} if none is found. If multiple fields are
requested, returns a named list with the requested metadata. If
\code{return_all_dois = TRUE}, the DOI element is a character vector.
}
\description{
This function extracts the Digital Object Identifier (DOI) and other metadata
from a PDF file using pdftools::pdf_info(). It searches through all metadata
fields including the XMP metadata XML.
}
\details{
The function searches for DOIs in:
\itemize{
\item All fields in the keys list (prioritizing article DOI fields)
\item The XMP metadata XML field
}

Journal DOIs/ISSNs (values containing "(ISSN)" or taken from journal-specific
fields) are automatically filtered out so that only article DOIs are returned.

For other metadata:
\itemize{
\item Title: extracted from Title field or dc:title in XMP metadata
\item Authors: extracted from Author/Creator fields or dc:creator in XMP metadata
\item Journal: extracted from the Subject field or prism:publicationName in XMP metadata
\item Year: extracted from created/modified dates, prism:coverDate, or title
}

Common DOI prefixes are automatically removed. The function uses regex pattern
matching to validate DOI format and extract structured data from XMP XML.
}
\examples{
\dontrun{
# Extract only DOI
doi <- extract_pdf_metadata("path/to/paper.pdf")

# Extract multiple metadata fields
meta <- extract_pdf_metadata("path/to/paper.pdf",
                             fields = c("doi", "title", "journal"))

# Extract all available metadata
meta <- extract_pdf_metadata("path/to/paper.pdf", fields = "all")
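
# A minimal sketch of the DOI clean-up described in Details. It assumes the
# removed prefixes are "doi:" and the doi.org resolver URLs; the patterns used
# inside the package may differ. normalize_doi() is a hypothetical helper for
# illustration only and is not exported by this package.
normalize_doi <- function(x) {
  x <- trimws(x)
  # Drop a leading "doi:" label or resolver URL, ignoring case
  x <- sub("^(doi:[[:space:]]*|https?://(dx[.])?doi[.]org/)", "", x,
           ignore.case = TRUE)
  # Keep only strings shaped like "10.<registrant>/<suffix>"
  ok <- grepl("^10[.][0-9]{4,9}/[^[:space:]]+$", x)
  ifelse(ok, x, NA_character_)
}

# The ISSN-style string does not match the DOI pattern and becomes NA
normalize_doi(c("doi:10.1000/xyz123",
                "https://doi.org/10.1000/xyz123",
                "1234-5678 (ISSN)"))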
}

}
\seealso{
\code{\link[pdftools]{pdf_info}}
}
