% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/predict_alphafold_domain.R
\name{predict_alphafold_domain}
\alias{predict_alphafold_domain}
\title{Predict protein domains of AlphaFold predictions}
\usage{
predict_alphafold_domain(
  pae_list,
  pae_power = 1,
  pae_cutoff = 5,
  graph_resolution = 1,
  return_data_frame = FALSE,
  show_progress = TRUE
)
}
\arguments{
\item{pae_list}{a list of proteins in which each element contains the aligned errors of the corresponding
AlphaFold prediction. This list can be retrieved with the \code{fetch_alphafold_aligned_error()} function.
Each element should contain a column for the scored residue (\code{scored_residue}), the aligned residue
(\code{aligned_residue}) and the predicted aligned error (\code{error}).}

\item{pae_power}{a numeric value. Each edge in the graph is weighted proportionally to
\code{1 / pae^pae_power}. Default is \code{1}.}

\item{pae_cutoff}{a numeric value. Graph edges are only created for residue pairs with
\code{pae < pae_cutoff}. Default is \code{5}.}

\item{graph_resolution}{a numeric value that regulates how aggressive the clustering algorithm is. Smaller values
lead to larger clusters, while higher values lead to stricter (i.e. smaller) clusters. The value should be larger
than zero; values larger than 5 are unlikely to be useful. The value is passed to the Leiden clustering algorithm
of the \code{igraph} package as \code{graph_resolution / 100}. Default is \code{1}.}

\item{return_data_frame}{a logical value; if \code{TRUE}, a data frame is returned instead of a list.
This is only recommended when information for a few proteins is retrieved.
Default is \code{FALSE}.}

\item{show_progress}{a logical value that specifies whether a progress bar is shown. Default
is \code{TRUE}.}
}
\value{
A list of the provided proteins that contains domain assignments for each residue. If \code{return_data_frame} is
\code{TRUE}, a data frame with this information is returned instead. The data frame contains the
following columns:
\itemize{
\item residue: The protein residue number.
\item domain: A numeric value representing a distinct predicted domain in the protein.
\item accession: The UniProt protein identifier.
}
}
\description{
Uses the predicted aligned error (PAE) of AlphaFold predictions to find possible protein domains.
A graph-based community clustering algorithm (Leiden clustering) is used on the predicted error
(distance) between residues of a protein in order to infer pseudo-rigid groups in the protein. This is
useful, for example, to identify which parts of a predicted structure are likely in a fixed position
relative to each other and which might vary in distance.
This function is based on Python code written by Tristan Croll. The original code can be found on his
\href{https://github.com/tristanic/pae_to_domains}{GitHub page}.
}
\examples{
\donttest{
# Fetch aligned errors
aligned_error <- fetch_alphafold_aligned_error(
  uniprot_ids = c("F4HVG8", "O15552"),
  error_cutoff = 4
)

# Predict protein domains
af_domains <- predict_alphafold_domain(
  pae_list = aligned_error,
  return_data_frame = TRUE
)

head(af_domains, n = 10)
}
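
# Illustrative sketch only (not the package's internal code): how the
# documented pae_cutoff and pae_power arguments could translate into
# weighted graph edges. The toy values and the object names toy_pae and
# toy_edges are made up for this example.
toy_pae <- data.frame(
  scored_residue = c(1, 1, 2, 2, 3),
  aligned_residue = c(2, 3, 3, 4, 4),
  error = c(0.5, 2, 4, 6, 10)
)

pae_cutoff <- 5
pae_power <- 1

# Keep only residue pairs with pae < pae_cutoff
toy_edges <- toy_pae[toy_pae$error < pae_cutoff, ]

# Weight each remaining edge proportionally to 1 / pae^pae_power,
# so that low-error (confident) residue pairs get the strongest edges
toy_edges$weight <- 1 / toy_edges$error^pae_power

toy_edges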
}
