% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/coi5p.r
\name{indel_check}
\alias{indel_check}
\alias{indel_check.coi5p}
\title{Check if a coi5p sequence likely contains an error.}
\usage{
indel_check(x, ...)

\method{indel_check}{coi5p}(x, ..., indel_threshold = -358.88)
}
\arguments{
\item{x}{A coi5p class object for which frame() and translate() have been run.}

\item{...}{Additional arguments to be passed between methods.}

\item{indel_threshold}{The log likelihood threshold used to assess whether or not sequences
are likely to contain an indel. Default is -358.88. Sequences with values lower than this are classified
as likely to contain an indel, and sequences with higher values are classified as not likely to contain one.}
}
\value{
An object of class \code{"coi5p"}
}
\description{
Check if a coi5p sequence likely contains an error.
}
\details{
The indel check function analyzes the framed and translated DNA sequences in two ways in order to
allow users to make an informed decision about whether or not a DNA sequence contains a frameshift error.
This test is designed to detect insertion or deletion errors resulting from technical errors in DNA sequencing,
but can in some instances identify biological contaminants (i.e. if the contaminant sequence uses a different
genetic code than the target, or if the contaminants are sequences such as pseudogenes that are highly
divergent from animal COI-5P sequences).

The two tests performed are: (1) a query for stop codons in the amino acid sequence and (2) an evaluation of the
log likelihood value resulting from the comparison of the framed coi5p amino acid sequence against the COI-5P
amino acid PHMM. The default likelihood threshold for identifying a sequence as likely erroneous is -358.88. Sequences with
likelihood values lower than this will receive an indel_likely value of TRUE. The threshold of -358.88 was experimentally
determined to be the optimal likelihood threshold for separating full length sequences with and without errors when
the censored translation option is used. Sequences will have higher likelihood values when a specific genetic code is used.
Sequences will have lower likelihood values when they are not complete barcode sequences (i.e. <500bp in length). For these
reasons the likelihood threshold is not a fixed value but a parameter that can be altered based on the type of translation
and length of the sequences. Below are experimentally determined suggested values for different sequence length and
translation table combinations.

Short barcode sequences, known genetic code: indel_threshold = -354.44

Short barcode sequences, unknown genetic code: indel_threshold = -440.24

Full length barcode sequences, known genetic code: indel_threshold = -246.20

Full length barcode sequences, unknown genetic code: indel_threshold = -358.88
}
\examples{
#previously run functions:
dat = coi5p(example_nt_string)
dat = frame(dat)
dat = translate(dat)
#current function
dat = indel_check(dat)
#with custom indel threshold
dat = indel_check(dat, indel_threshold = -400)
#additional components in output coi5p object:
dat$stop_codons #Boolean - Indicates if there are stop codons in the amino acid sequence.
dat$indel_likely #Boolean - Indicates if the likelihood score is below the specified indel_threshold.
dat$aaScore #view the amino acid log likelihood score
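#a minimal sketch of choosing indel_threshold from the suggested values listed in Details.
#suggest_threshold is a hypothetical helper written for this example; it is not part of the package.
#full_length: TRUE for complete barcodes, FALSE for short sequences (i.e. <500bp in length).
#known_code: TRUE if a specific genetic code was used in translate(), FALSE for censored translation.
suggest_threshold = function(full_length = TRUE, known_code = FALSE) {
  if (full_length && known_code) return(-246.20)
  if (full_length) return(-358.88)
  if (known_code) return(-354.44)
  -440.24
}
#re-run the check with the value suggested for a short sequence with an unknown genetic code
dat = indel_check(dat, indel_threshold = suggest_threshold(full_length = FALSE, known_code = FALSE))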
}
\seealso{
\code{\link{coi5p}}

\code{\link{frame}}

\code{\link{translate}}
}
