--- title: "Introduction to rTPC" author: "Daniel Padfield" output: rmarkdown::html_vignette date: "`r Sys.Date()`" description: > vignette: > %\VignetteIndexEntry{Introduction to rTPC} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- #### An introduction to rTPC and how it can be used to fit thermal performance curves using nls.multstart. ------------------------------------------------------------------------ ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", #tidy.opts=list(width.cutoff=60), #tidy=TRUE, fig.align = 'center' ) ``` ```{r runtimer, include=FALSE} runstart <- lubridate::now() ``` ```{r setup, message=FALSE} # load packages library(rTPC) library(nls.multstart) library(broom) library(tidyverse) ``` **rTPC** provides a suite of functions to help fit thermal performance curves to empirical data. After searching the literature, **rTPC** contains `r length(rTPC::get_model_names())` different model formulations that have been used previously. These functions can be easily applied to methods in R that use non-linear least squares regression to estimate thermal performance curves. The available model formulations can be accessed using **get_model_names()**. ```{r get_mod_names} # list model names get_model_names() ``` They are generally named after the author of the paper (and hence the name of the model within the literature) and the year at which I found the model to be first used, separated by a **"\_"**. Some original model formulations have been altered so that all models take temperature in degrees centigrade and raw rate values as input. We can demonstrate the fitting procedure by taking a single curve from the example dataset **rTPC** - a dataset of 60 TPCs of respiration and photosynthesis of the aquatic algae, *Chlorella vulgaris*. We can plot the data using **ggplot2**. ```{r first_plot, fig.width=6, fig.height = 4} # load in data data("chlorella_tpc") # keep just a single curve d <- filter(chlorella_tpc, curve_id == 1) # show the data ggplot(d, aes(temp, rate)) + geom_point() + theme_bw(base_size = 12) + labs( x = 'Temperature (ºC)', y = 'Metabolic rate', title = 'Respiration across temperatures' ) ``` For each model, **rTPC** has helper functions that estimate sensible start values (**get_start_vals()**), lower (**get_lower_lims()**) and upper (**get_upper_lims()**) limits. To demonstrate this, we shall use the sharpe-schoolfield model for high temperature inactivation only. ```{r} # choose model mod = 'sharpschoolhigh_1981' # get start vals start_vals <- get_start_vals( d$temp, d$rate, model_name = 'sharpeschoolhigh_1981' ) # get limits low_lims <- get_lower_lims(d$temp, d$rate, model_name = 'sharpeschoolhigh_1981') upper_lims <- get_upper_lims( d$temp, d$rate, model_name = 'sharpeschoolhigh_1981' ) start_vals low_lims upper_lims ``` One problem with most methods of fitting models in R using non-linear least squares regression is that they are sensitive to the choice of starting parameters. This problem also occurs in previous specialist R packages that help fit thermal performance curves, such as [devRate](https://github.com/frareb/devRate/) and [temperatureresponse](https://github.com/low-decarie/temperatureresponse). These methods can fail entirely or give different parameter estimates between multiple runs of the same code. To overcome this, we recommend using the R package [**nls.multstart**](https://github.com/padpadpadpad/nls.multstart), which uses **minpackLM::nlsLM()**, but allows for multiple sets of starting parameters. It iterates through multiple starting values, attempting a fit with each set of start parameters. The best model is then picked using AIC scores. Using **nls_multstart()**, we will use Latin Hypercube Sampling (LHS), which can only be used when `iter` is set to a single number. Instead of sampling from a uniform distribution across the bounds of each parameter, these methods try to take a set of samples from the range of parameter values that covers the parameter space optimally for any given set of parameters. _This approach can result in less iterations being needed to get the same reliability of model fitting than either the shotgun or grid-start approaches_. ```{r fit_model} # fit model fit <- nls_multstart( rate ~ sharpeschoolhigh_1981(temp = temp, r_tref, e, eh, th, tref = 15), data = d, iter = 500, start_lower = start_vals - 10, start_upper = start_vals + 10, lower = low_lims, upper = upper_lims, supp_errors = 'Y', lhstype = 'random' ) fit ``` To calculate additional parameters of interest, we can use **rTPC::calc_params()**. This function uses high resolution predictions of the fitted model to estimate traits associated with a thermal performance curve. The currently available methods can be viewed by running `?calc_params`. For example, we may be interested in variation in the optimum temperature, $T_{opt}$, given that we adapted algae to different temperatures. ```{r} # calculate additional traits calc_params(fit) %>% # round for easy viewing mutate_all(round, 2) ``` Finally for this introduction, we can get predictions of our model using **broom::augment()**, which is similar to **predict()**. These are then plotted over our original data. ```{r pred_and_plot, fig.width=6, fig.height = 4} # predict new data new_data <- data.frame(temp = seq(min(d$temp), max(d$temp), 0.5)) preds <- augment(fit, newdata = new_data) # plot data and model fit ggplot(d, aes(temp, rate)) + geom_point() + geom_line(aes(temp, .fitted), preds, col = 'blue') + theme_bw(base_size = 12) + labs( x = 'Temperature (ºC)', y = 'Metabolic rate', title = 'Respiration across temperatures' ) ``` ```{r tot_time, include=FALSE} tot_time <- lubridate::as.duration(lubridate::now() - runstart) ``` [Built in `r tot_time`s]{style="opacity: 0.1;font-size: small;"}