---
title: "How to use normref"
author: "Marieke E. Timmerman, Klazien de Vries, and Hannah M. Heister"
date: "`r Sys.Date()`"
output:
  html_document:
    self_contained: true
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{How to use normref}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
is_not_cran <- identical(Sys.getenv("NOT_CRAN"), "true")
```

# Introduction

The `normref` package provides tools to calculate continuous norms for psychological test scores and to do sample size planning, where the norms depend on a single continuous variable. The latter is denoted as the norm predictor, which typically is age. Applying the norms results in standardized scores such as percentiles, T-scores, or IQ scores. Such standardized scores allow for norm-referenced interpretation, that is, comparing an individual’s performance to that of a relevant reference population. For example, an IQ score expresses an individual’s performance relative to the general population of the same age.

To estimate such norms, `normref` requires raw scores from a representative normative sample. This vignette illustrates how to derive norms from such a sample, following the approach described in a tutorial [@timmerman2021tutorial], and how to do sample size planning, following the approach based on a distribution-free polynomial regression model for group means [@egberink2026COTAN; @hessen2026richtlijnen].

---

# Workflow overview

The main workflow in `normref` consists of:

1. Shaping the data (`shape_data()`)
2. Fitting candidate models (`fb_select()`)
3. Inspecting model fit (BIC, worm plots, centile plots)
4. Creating norm tables (`normtable_create()`, `plot_normtable()`)
5. *(Optional)* Adding confidence intervals (CIs) for normed scores based on age-dependent reliability estimates

Advanced functionality includes: norming of composite scores, norms for new data, user-defined distributions, and sample size planning.

---

# Workflow

## Example: norming a single test

We illustrate the process using IDS data [@grob2018ids; @grob2018idsNL]. An anonymized and perturbed version of these data is included in `normref`. We focus on Test 14, with $n=1660$ children and raw scores ranging from 2 to 34.

Three candidate distributions are considered:

- Beta-Binomial (BB): suitable for bounded integer scores
- Box-Cox Power Exponential (BCPE): flexible for continuous scores
- Normal (NO): a simpler benchmark

To compare candidate models, we use the free order model selection procedure with the Bayesian Information Criterion (BIC) as selection criterion, which has been shown to be effective in model selection [@voncken2019model].

```{r loadlibrary, echo=TRUE, include=TRUE}
library(normref)
library(gamlss.dist) # also needed for vignette
```

First, we load the data.

```{r getdata, echo=TRUE, results='hide'}
get(data("ids_data"))
```

## Shape the data

Before fitting any distributions to model the raw scores of Test 14 as a function of age, we must ensure that the data are in the correct format for fitting with gamlss. This is handled by the function `shape_data()`.

The `shape_data()` function verifies whether the response variable is properly formatted and, if necessary, reshapes it. For binomial-type distributions like the BB, gamlss requires the response to be a two-column matrix: one column for the number of correct items and one for the number of incorrect items. For each individual, these columns sum to the maximum possible raw score on the test. By default, `shape_data()` uses the highest observed raw score in the data as the maximum. This can be overridden with the argument `max_score`. The function also removes any cases with missing values for the raw scores or age.
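To make this format concrete, the sketch below builds such a two-column response by hand for a few hypothetical raw scores. This is purely illustrative and not part of the `normref` workflow: `shape_data()` performs the reshaping for you.

```{r twocolumn_sketch, echo=TRUE, include=TRUE}
# Purely illustrative sketch of the two-column response that gamlss expects
# for binomial-type families such as the BB: correct items in one column,
# incorrect items in the other (hypothetical scores, maximum assumed to be 34).
y <- c(10, 25, 34)
max_score <- 34
shaped <- cbind(correct = y, incorrect = max_score - y)
rowSums(shaped) # every row sums to the maximum possible raw score
```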
Applying `shape_data()` to our data produces a new dataframe, `mydata_BB`, which includes the additional column `"shaped_score"` containing the scores in the correct format for BB models:

```{r shapedata, echo=TRUE, include=TRUE}
mydata_BB <- shape_data(data = ids_data,
                        age_name = "age",
                        score_name = "y14",
                        family = "BB",
                        max_score = 34)
```

We can also apply `shape_data()` to obtain scores in the correct format for the BCPE model, which requires strictly positive scores.

```{r selectmodel_BCPE_y14, echo=TRUE, include=TRUE}
mydata_BCPE <- shape_data(data = ids_data,
                          age_name = "age",
                          score_name = "y14",
                          family = "BCPE")
```

Because Test 14 has no zero scores, the scores were not transformed. A new column `"shaped_score"` was created, but it is identical to `"y14"`.

## Fit candidate models

To fit the candidate models, we use the model selection function `fb_select()`, which implements the free order model selection procedure [@voncken2019model]. We first apply this procedure to the BB distribution. Note that we pass `"shaped_score"` as the `score_name`, because this column contains the reshaped response variable created by `shape_data()`.

```{r selectmodel_BB_y14, echo=TRUE, include=TRUE}
mod_BB_y14 <- fb_select(data = mydata_BB,
                        age_name = "age",
                        score_name = "shaped_score",
                        family = "BB",
                        selcrit = "BIC")
```

The free order model selection procedure selects a second-degree polynomial of age for both the $\mu$ and $\sigma$ parameters of the BB model.

As alternative models, we also fit the BCPE and NO models.

```{r selectmodel_BCPENO_y14, echo=TRUE, include=TRUE}
mod_BCPE_y14 <- fb_select(data = mydata_BCPE,
                          age_name = "age",
                          score_name = "shaped_score",
                          family = "BCPE",
                          selcrit = "BIC")
mod_NO_y14 <- fb_select(data = ids_data,
                        age_name = "age",
                        score_name = "y14",
                        family = "NO",
                        selcrit = "BIC")
```

## Inspect model fit

After fitting the BB, BCPE, and NO models, we first compare their BIC values. Lower BIC values indicate better model fit.

```{r showfit, echo=TRUE, include=TRUE}
mod_BB_y14$selcrit
mod_BCPE_y14$selcrit
mod_NO_y14$selcrit
```

The results show that the BCPE model provides the best fit, as it has the lowest BIC. The NO model follows at some distance, while the BB model performs worst according to BIC.
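For a quick side-by-side comparison, the three criteria printed above can also be collected into a single sorted vector. This is a minimal sketch, assuming that the `selcrit` element of each fitted object is the single numeric BIC value shown above:

```{r compare_selcrit, echo=TRUE, include=TRUE}
# Collect the BIC values of the three candidate models and sort them from
# best (lowest BIC) to worst; assumes `selcrit` is a single numeric value.
sort(c(BB = mod_BB_y14$selcrit,
       BCPE = mod_BCPE_y14$selcrit,
       NO = mod_NO_y14$selcrit))
```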
To further evaluate model fit, we inspect worm plots [@buuren2001worm]. Worm plots display the relationship between empirical quantiles and model-implied quantiles.

* Ideally, the points (the “worm”) lie close to the red horizontal zero line.
* Most points fall inside the funnel-shaped confidence bands (dashed lines).

For Test 14, the worm plot confirms that the BCPE model has the best fit: the worm is relatively flat, and nearly all values lie within the confidence bands. By contrast, the BB and NO models show some deviations from the expected pattern in the intervals [-4, -2] and [-4, 0], respectively.

```{r show_wormplot, echo=TRUE, include=TRUE, fig.width=6, fig.height=4}
gamlss::wp(mod_BCPE_y14, ylim.all = 2)
gamlss::wp(mod_BB_y14, ylim.all = 2)
gamlss::wp(mod_NO_y14, ylim.all = 2)
```

Because both the BIC values and the worm plots point to the BCPE as the superior model, we select this model for further inspection and prediction.

To fully evaluate the fit, worm plots should also be created per age interval, since overall diagnostics can hide local misfit across the age range. This can be done with the arguments `xvar` (the age variable) and `n.inter` (the number of age intervals). For example, for the BCPE model:

```{r show_wormplot_age, echo=TRUE, include=TRUE, fig.width=6, fig.height=4, warning=FALSE, message=FALSE}
gamlss::wp(mod_BCPE_y14, xvar = age, ylim.worm = 2, n.inter = 4)
```

Note that a sample size of at least 200 per worm plot is recommended for stable results [@buuren2001worm].

Once the best-fitting model has been selected (BCPE in this case), we can visually inspect its fit using centile curves. These curves show the predicted centiles of test scores as a function of age, allowing us to see how scores are distributed across the age range.

```{r plot_model_BCPE, echo=TRUE, include=TRUE, fig.width=6, fig.height=4}
gamlss::centiles(mod_BCPE_y14, cent = c(5, 25, 50, 75, 95))
```

These centile plots complement worm plots and BIC comparisons by providing a direct, visual summary of the distribution of scores across ages, making it easier to assess model fit and to interpret age-dependent norms.

For most distributions, the `centiles()` function in gamlss works well. However, for binomial-type distributions such as the BB distribution, the standard `centiles()` function does not always render correctly. For these cases, `centiles_bin()` provides an alternative (e.g., `centiles_bin(mod_BB_y14, xvar = age, cent = c(5, 25, 50, 75, 95))`).

## Create norm tables

After selecting the best-fitting model, we can generate norm tables using the function `normtable_create()`. By default, the function produces z-scores (i.e., $\mu_{age} = 0, \sigma_{age} = 1$), but other norm types can also be requested:

- T-scores ($\mu_{age} = 50, \sigma_{age} = 10$)
- IQ-scores ($\mu_{age} = 100, \sigma_{age} = 15$)

The function returns a list containing:

- `cdf_matrix`: the percentiles for each raw score and age
- `norm_matrix`: the corresponding normed scores in the requested norm type
- `norm_sample` and `cdf_sample`: the normed scores and percentiles for the observed sample

Optionally, an Excel file can be generated by specifying `excel_name`. The Excel file contains:

1. Norm matrix tab: the first column shows ages, the first row shows raw scores, with the normed score at each age/score combination.
2. Norm sample tab: the ages and associated normed scores of the observed sample.

For example, in the norm matrix:

- At age 4.0, a raw score of 10 corresponds to an IQ of 129.1 (almost 2 SD above the mean).
- At age 6.2, the same raw score of 10 corresponds to an IQ of 97.0.

In this vignette, we save results to a temporary Excel file (`tempfile()`) to comply with CRAN policies. In real applications, users can specify a permanent path, e.g. `excel_name = "norms_y14_1.xlsx"`.

```{r normtable_BCPE, echo=TRUE, include=TRUE}
temp_file <- tempfile(fileext = ".xlsx")
norm_mod_BCPE_y14 <- normtable_create(model = mod_BCPE_y14,
                                      data = mydata_BCPE,
                                      age_name = "age",
                                      score_name = "shaped_score",
                                      normtype = "IQ",
                                      excel_name = temp_file,
                                      excel = TRUE)
head(norm_mod_BCPE_y14[["norm_matrix"]][, c(1, 11)], n = 15)
```

We can visualize the generated norm tables using the function `plot_normtable()`. This function plots the percentiles as a function of age for each raw score. Each line represents a specific raw score:

- Lines are ordered from the highest score (most difficult) at the top to the lowest score (easiest) at the bottom.
- Green circles indicate the combinations of raw scores and ages observed in the sample.

This visualization allows us to inspect how raw scores translate to normed scores across ages and to see whether the norms behave as expected.

```{r plot_normtable_y14, echo=TRUE, include=TRUE, fig.width=6, fig.height=4}
plot_normtable(normtable = norm_mod_BCPE_y14)
```

## Confidence intervals for normed scores

By default, `normtable_create()` returns point estimates of the norms. Optionally, you can also calculate confidence intervals (CIs) for the normed scores. The default confidence level is 0.95 (a 95% interval). The CIs are estimated using an extension of the group model from classical test theory, adapted for age-dependent scores [@heister2024item].

To compute the upper and lower boundaries, you need a dataframe containing two vectors (a sketch of such a dataframe is shown below):

- `age`: the ages at which the norms are evaluated
- `rel`: the estimated test reliability at each age
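The following sketch only illustrates the required structure; the reliability values are fictitious. In practice, the reliabilities would come from a dataset such as `ids_rel_data` (used below) or be estimated with `reliability_window()` (see *Estimating test reliability dependent on age*).

```{r rel_dataframe_sketch, echo=TRUE, include=TRUE}
# Purely illustrative: a reliability dataframe has one row per evaluated age,
# with an `age` column and a `rel` column (the values shown here are fictitious).
my_rel <- data.frame(
  age = c(6, 9, 12, 15, 18),
  rel = c(0.86, 0.88, 0.90, 0.89, 0.87)
)
my_rel
```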
The resulting CIs are stored in the output tables with the suffixes `_lower` and `_upper` (e.g., `norm_sample_lower` and `norm_sample_upper`).

For distributions without a closed-form mean and standard deviation, such as the BCPE distribution, computing the CIs can be slower because these quantities need to be estimated via resampling.

In the example below, we use the fictitious age-dependent reliabilities provided in `ids_rel_data`. Later, in the section *Estimating test reliability dependent on age*, we show how such reliabilities can be derived from individual item scores using a sliding window approach. Note that for Test 14, we cannot estimate these reliabilities from raw scores, as the individual items are not available.

```{r create_normtable_y14, echo=TRUE, results='hide'}
get(data("ids_rel_data"))
temp_file <- tempfile(fileext = ".xlsx")
norm_mod_BCPE_y14 <- normtable_create(model = mod_BCPE_y14,
                                      data = mydata_BCPE,
                                      age_name = "age",
                                      score_name = "shaped_score",
                                      datarel = ids_rel_data,
                                      normtype = "IQ",
                                      excel_name = temp_file,
                                      excel = TRUE)
```

# Advanced features

## Norming of composite scores

Composite scores are calculated by combining multiple subtest scores, typically as the sum of their standardized scores (z-scores). To illustrate this process, we first fit a BCPE model to Test 7. Based on the worm plot inspection (not shown), the model fits the data well. Note that fitting BCPE models can be computationally intensive.

```{r modelnorms_y7, echo=TRUE, include=TRUE}
if (is_not_cran) {
  mydata_BCPE_y7 <- shape_data(data = ids_data,
                               age_name = "age",
                               score_name = "y7",
                               family = "BCPE")
  mod_BCPE_y7 <- fb_select(data = mydata_BCPE_y7,
                           age_name = "age",
                           score_name = "shaped_score",
                           family = "BCPE",
                           selcrit = "BIC")
  norm_mod_BCPE_y7 <- normtable_create(model = mod_BCPE_y7,
                                       data = mydata_BCPE_y7,
                                       age_name = "age",
                                       score_name = "shaped_score")
} else {
  norm_mod_BCPE_y7 <- readRDS(system.file("extdata", "norm_mod_BCPE_y7.rds",
                                          package = "normref"))
}
```

Next, we calculate composite scores based on the norm tables of Test 7 and Test 14. The workflow is as follows:

1. Create a list of the relevant subtest norm tables.
2. Use the function `composite_shape()` to compute the composite scores as the sum of z-scores across the tests.
3. Fit a normal distribution to the composite scores, since sums of normally distributed variables are normally distributed.
4. Generate a norm table for the composite scores using `normtable_create()`.

```{r modelnormcompositescore, echo=TRUE, include=TRUE}
composite <- list(norm_mod_BCPE_y7, norm_mod_BCPE_y14)
composite_data <- composite_shape(normtables = composite)
modNO_comp <- fb_select(data = composite_data,
                        age_name = "age",
                        score_name = "z_sum",
                        family = "NO",
                        selcrit = "BIC")
norm_modNO_comp <- normtable_create(modNO_comp, composite_data,
                                    age_name = "age",
                                    score_name = "z_sum",
                                    cont_cor = TRUE)
```

## Norms for a new sample of individuals

It is possible to calculate normed scores directly for a new sample, using their raw scores and ages. In this example, we select the raw scores and ages of the first five individuals in our dataset. We then use the previously estimated model to obtain their normed scores and display the results.

```{r newsamplenorm, echo=TRUE, include=TRUE}
newdata <- ids_data[1:5, c("age", "y14")]
norm_mod_BCPE_newdata <- normtable_create(model = mod_BCPE_y14,
                                          data = newdata,
                                          age_name = "age",
                                          score_name = "y14",
                                          new_data = TRUE,
                                          datarel = ids_rel_data)
norm_mod_BCPE_newdata[["norm_sample"]]
```

## Norming with a truncated model

Free order model selection with `fb_select()` is also available for distributions that are defined by the user, for example truncated distributions generated with the `gamlss.tr` package. Using truncated distributions can improve model fit when continuous distributions are applied to scores that have natural bounds. By explicitly accounting for the minimum and maximum possible values, truncation prevents unrealistic predictions outside the valid score range and often leads to more accurate norm estimates.

We illustrate the use of a truncated NO distribution for the `ids_kn_data` data, which consists of 34 binary item scores. This example uses a NO distribution for demonstration purposes only. In practice, users should explore several candidate distributions to select the one that best fits their data.

```{r getdata_ids_kn_data, echo=TRUE, results='hide'}
get(data("ids_kn_data"))
```

To fit a truncated NO model, we first create a new distribution using the `gen.trun()` function. We specify truncation at both ends of the distribution (`type = "both"`) with a minimum score of 0 and a maximum score of 34 (`par = c(0, 34)`). We apply this to the NO family (`family = "NO"`) and name the new distribution `tr34`, resulting in a distribution called `NOtr34`. Next, we shape the data appropriately for the truncated distribution and fit the model.

```{r fit_truncated_model, echo=TRUE, include=TRUE}
gamlss.tr::gen.trun(par = c(0, 34), family = "NO", name = "tr34", type = "both")
mydata_NOtr34 <- shape_data(data = ids_kn_data,
                            age_name = "age_years",
                            score_name = "rawscore",
                            family = "NOtr34")
mod_NOtr34_KN <- fb_select(data = mydata_NOtr34,
                           age_name = "age_years",
                           score_name = "shaped_score",
                           family = "NOtr34")
```

## Estimating test reliability dependent on age

To calculate confidence intervals for the normed scores, we first estimate the reliability of the test as a function of age using a sliding window approach. This approach is implemented in the function `reliability_window()`, which computes Cronbach's alpha within age-specific groups [@heister2024item].

We again use the `ids_kn_data` dataset. To estimate the reliability of the test, information at the item level is needed. First, we identify the columns in the dataset corresponding to the items.
```{r item_variables_ids_kn_data, echo=TRUE, include=TRUE}
item_variables_kn <- which(substr(colnames(ids_kn_data), 1, 2) == "KN")
```

To choose appropriate parameters for estimating age-dependent reliabilities, we first explore a range of window widths and step sizes. This helps ensure that the estimated reliabilities are stable and vary smoothly with age. The function `different_rel()` allows us to compare the reliability estimates across different settings:

```{r selectwindowstep_ids_kn_data, echo=TRUE, include=TRUE}
rel_int <- different_rel(data = ids_kn_data,
                         item_variables = item_variables_kn,
                         step_window = c(0.5, 1, 2, 5, 10, 20),
                         age_name = "age_years",
                         min_agegroup = 5,
                         max_agegroup = 20,
                         step_agegroup = c(0.5, 1, 1.5, 2))
```

We then visualize the reliability estimates across age for the different window widths and step sizes to guide our selection. Ideally, the step size should be small enough to capture smooth changes in reliability, and the window width should be wide enough to assume approximately equal true scores within the window.

```{r plot_selectwindow_ids_kn_data, echo=TRUE, include=TRUE, fig.width=6, fig.height=4}
plot_drel(rel_int, ncol = 2)
```

For this example, we select a step size of 2 and a window width of 2.

```{r estrel_ids_kn_data, echo=TRUE, include=TRUE}
rel_est <- reliability_window(data = ids_kn_data,
                              age_name = "age_years",
                              item_variables = item_variables_kn,
                              window_width = 2)
```

When we estimate the normed scores using the truncated NO model, we can now construct reliability-based CIs for the normed scores.

```{r calculate_confidence_intervals, echo=TRUE, include=TRUE}
norm_mod_NOtr34_KN <- normtable_create(model = mod_NOtr34_KN,
                                       data = mydata_NOtr34,
                                       age_name = "age_years",
                                       score_name = "shaped_score",
                                       datarel = rel_est,
                                       normtype = "IQ")
```

## Sample size planning for continuous norming

To estimate the norms, `normref` requires raw scores from a representative normative sample. To achieve sufficient precision, the sample size needs to be large enough. For continuous norming, there are no guidelines for the minimum required sample size; for traditional norming (i.e., for a single group), there are.

An approach to sample size planning for continuous norming is to define groups based on reasonable intervals of the norm predictor and to require at least the same estimation accuracy for each group as would be required for traditional norming [@egberink2026COTAN]. One can then compute the optimal sample sizes per group under a distribution-free polynomial regression model for group means, following @hessen2026richtlijnen. In the assumed continuous norming model, group means are first estimated and then smoothed using a polynomial model that regresses group means on age.

The function `sample_size_poly()` determines the group sample sizes such that the precision of the estimated means (not higher-order moments) is at least as high as under traditional norming. There are two variants: one assuming homoscedasticity, and one using the estimated variances per group. Before data collection, one could estimate the sample sizes assuming homoscedasticity; during and after data collection, one could use the estimated variances per group.
The function requires as input:

- the number of groups (e.g., age groups),
- the polynomial degree for modeling the population mean as a function of group number,
- the reference sample size per group under traditional norming, where typical values are 400, 300, and 200 [@egberink2026COTAN],
- and, optionally, the estimated variances in each group.

Multiple optimal solutions may exist. By default, a single solution is returned, defined as the solution minimizing the difference between the largest and smallest group sample sizes. This yields a practically balanced design. Alternatively, users can inspect all optimal solutions (`solution_type = "all"`) or examine the range of optimal sample sizes per group (`solution_type = "range"`). Note that the function can take a while to compute.

Now, we estimate the sample sizes for 5 groups, assuming a polynomial degree of 3, for a reference sample size per group under traditional norming of 400. In the table, the sample size per group (n) is presented for each of the 5 groups.

```{r calculate_sample_size_homoscedasticity, echo=TRUE, include=TRUE}
if (is_not_cran) {
  res1 <- sample_size_poly(n_groups = 5,
                           poly_degree = 3,
                           n0 = 400)
} else {
  res <- readRDS(system.file("extdata", "sample_size_res.rds",
                             package = "normref"))
  res1 <- res$res1
}
res1
```

Here, we illustrate how to estimate the sample sizes based on sample data, assuming a polynomial degree of 2, for a reference sample size per group under traditional norming of 300. We define one-year age groups in the IDS data and compute the variance of the Test 7 scores within each group.

```{r calculate_sample_size_heteroscedasticity, echo=TRUE, include=TRUE}
if (is_not_cran) {
  ids_data$age_group <- cut(ids_data$age, breaks = seq(6, 18, by = 1))
  v <- tapply(ids_data$y7, ids_data$age_group, var, na.rm = TRUE)
  res2 <- sample_size_poly(n_groups = length(v),
                           poly_degree = 2,
                           n0 = 300,
                           variances = v)
} else {
  res <- readRDS(system.file("extdata", "sample_size_res.rds",
                             package = "normref"))
  res2 <- res$res2
}
res2
```

# References