---
title: 'Do Bigger Skaters Translate Better in the Playoffs?'
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Do Bigger Skaters Translate Better in the Playoffs?}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = '#>',
  fig.align = 'center',
  out.width = '92%',
  fig.width = 7,
  fig.height = 4.6
)

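# Render a captioned table. Row names are dropped so filtered or reordered
# rows do not print their leftover row indices as an extra column.
make_table <- function(x, caption, digits = 3) {
  knitr::kable(x, caption = caption, digits = digits, row.names = FALSE)
}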
```

## Question

The playoff-size cliché is familiar: heavier teams supposedly survive the grind,
win the walls, and keep scoring when space disappears. But that claim hides two
different questions:

- Do bigger skaters score more in the playoffs?
- Do bigger skaters lose less scoring when regular-season hockey becomes
  playoff hockey?

This article uses `skater_playoff_statistics()`, `skater_statistics()`, and
`players()` to compare career regular-season scoring, career playoff scoring,
and the gap between the two.

## Build Player Table

We keep salary-cap-era skaters (first playoff season 2005-06 or later) with a
listed height and weight, at least 20 playoff games, and at least 200
regular-season games. The unit is one player, not one season.

```{r data}
# Pull scoring and bio tables.
playoff_stats <- nhlscraper::skater_playoff_statistics()
career_stats <- nhlscraper::skater_statistics()[, c(
  'playerId',
  'rsGamesPlayed',
  'rsPoints',
  'positionCode'
)]
player_bios <- nhlscraper::players()[, c(
  'playerId',
  'playerFullName',
  'height',
  'weight'
)]

# Join player-level sources.
analysis_tbl <- merge(
  playoff_stats,
  career_stats,
  by = c('playerId', 'positionCode'),
  all.x = TRUE
)
analysis_tbl <- merge(
  analysis_tbl,
  player_bios,
  by = 'playerId',
  all.x = TRUE
)

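# Keep modern skaters with stable samples. which() drops rows whose test is NA
# (e.g., no regular-season match after the left join); a bare logical index
# would return those as all-NA rows.
keep_rows <- !is.na(analysis_tbl[['height']]) &
  !is.na(analysis_tbl[['weight']]) &
  analysis_tbl[['firstSeasonForGameType']] >= 20052006 &
  analysis_tbl[['gamesPlayed']] >= 20 &
  analysis_tbl[['rsGamesPlayed']] >= 200
analysis_tbl <- analysis_tbl[which(keep_rows), , drop = FALSE]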

# Fill names and compute rates.
analysis_tbl[['playerFullName']] <- ifelse(
  is.na(analysis_tbl[['playerFullName']]) |
    analysis_tbl[['playerFullName']] == '',
  paste(
    analysis_tbl[['skaterFirstName']],
    analysis_tbl[['skaterLastName']]
  ),
  analysis_tbl[['playerFullName']]
)
analysis_tbl[['regularPPG']] <-
  analysis_tbl[['rsPoints']] / analysis_tbl[['rsGamesPlayed']]
analysis_tbl[['playoffPPG']] <-
  analysis_tbl[['points']] / analysis_tbl[['gamesPlayed']]
analysis_tbl[['playoffLift']] <-
  analysis_tbl[['playoffPPG']] - analysis_tbl[['regularPPG']]
analysis_tbl[['positionBucket']] <- ifelse(
  analysis_tbl[['positionCode']] == 'D',
  'Defense',
  'Forward'
)

# Assign equal-count weight quartiles; tied weights are split by row order.
weight_rank <- rank(
  analysis_tbl[['weight']],
  ties.method = 'first'
) / nrow(analysis_tbl)
analysis_tbl[['weightQuartile']] <- cut(
  weight_rank,
  breaks = c(0, 0.25, 0.50, 0.75, 1),
  include.lowest = TRUE,
  labels = c('Lightest', 'Second', 'Third', 'Heaviest')
)
nrow(analysis_tbl)
```

The sample has `r nrow(analysis_tbl)` skaters. The 20-game playoff floor
filters out one-series mirages while still leaving enough players to compare
body types.
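
The quartile step ranks players by weight and cuts the scaled rank at 0.25,
0.50, and 0.75. A minimal sketch on made-up weights (`toy_weights` is
hypothetical, not package data) shows what `ties.method = 'first'` implies: the
bins come out equal in size, but players with the same listed weight can land in
adjacent quartiles.

```{r quartile-ties-sketch}
# Hypothetical weights in pounds; illustrative values only, not package data.
toy_weights <- c(180, 185, 185, 190, 195, 200, 200, 200, 205, 210, 215, 220)

# Same recipe as the data chunk: scaled rank, then cut at the quarter marks.
toy_rank <- rank(toy_weights, ties.method = 'first') / length(toy_weights)
toy_quartile <- cut(
  toy_rank,
  breaks = c(0, 0.25, 0.50, 0.75, 1),
  include.lowest = TRUE,
  labels = c('Lightest', 'Second', 'Third', 'Heaviest')
)

# Count players per bin.
table(toy_quartile)

# List weights whose tied players fall into more than one quartile.
toy_split <- tapply(toy_quartile, toy_weights, function(q) length(unique(q)))
toy_split[toy_split > 1]
```

In this toy vector the three 200-pound players straddle the Second and Third
bins. That is the price of equal counts; cutting on raw weight values would keep
ties together but produce uneven group sizes.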

## Level Versus Translation

First, compare regular-season scoring, playoff scoring, and playoff lift by
weight quartile.

```{r quartile-table}
# Summarize scoring by weight quartile.
quartile_summary <- aggregate(
  cbind(regularPPG, playoffPPG, playoffLift) ~ weightQuartile,
  data = analysis_tbl,
  FUN = mean
)
quartile_counts <- as.data.frame(table(analysis_tbl[['weightQuartile']]))
names(quartile_counts) <- c('weightQuartile', 'n')
quartile_summary <- merge(
  quartile_summary,
  quartile_counts,
  by = 'weightQuartile'
)
quartile_summary <- quartile_summary[
  match(levels(analysis_tbl[['weightQuartile']]), quartile_summary[['weightQuartile']]),
  c('weightQuartile', 'n', 'regularPPG', 'playoffPPG', 'playoffLift')
]
make_table(
  quartile_summary,
  caption = 'Regular-season scoring, playoff scoring, and playoff lift by weight quartile.',
  digits = 3
)
```

The most important distinction is level versus translation. Lighter skaters can
still post higher raw scoring rates. Bigger skaters can still translate a bit
better relative to their own regular-season baseline. Those are different
claims, and mixing them together is how playoff clichés become sloppy.
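
To make the distinction concrete, here is a small sketch with two made-up
career lines (`toy_players` and its numbers are hypothetical). It reuses the
same rate and lift arithmetic as the data chunk.

```{r level-vs-lift-sketch}
# Hypothetical career totals; names and numbers are made up for illustration.
toy_players <- data.frame(
  player = c('Light scorer', 'Heavy grinder'),
  rsPoints = c(400, 150),
  rsGamesPlayed = c(500, 500),
  points = c(42, 15),
  gamesPlayed = c(60, 60)
)

# Same definitions as the data chunk: per-game rates and their difference.
toy_players[['regularPPG']] <-
  toy_players[['rsPoints']] / toy_players[['rsGamesPlayed']]
toy_players[['playoffPPG']] <-
  toy_players[['points']] / toy_players[['gamesPlayed']]
toy_players[['playoffLift']] <-
  toy_players[['playoffPPG']] - toy_players[['regularPPG']]
toy_players[, c('player', 'regularPPG', 'playoffPPG', 'playoffLift')]
```

The lighter toy player still scores more in the playoffs (level), yet loses
more relative to their own regular-season rate (translation). Both statements
hold at once.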

```{r quartile-plot, fig.cap = 'Playoff scoring level and playoff lift by weight quartile.'}
# Plot playoff scoring and playoff lift.
old_par <- graphics::par(no.readonly = TRUE)
graphics::par(mfrow = c(1, 2), mar = c(8, 4, 3, 1))
graphics::boxplot(
  playoffPPG ~ weightQuartile,
  data = analysis_tbl,
  col = c('#d8f3dc', '#b7e4c7', '#74c69d', '#2d6a4f'),
  border = '#1b4332',
  las = 2,
  xlab = '',
  ylab = 'Playoff Points Per Game'
)
graphics::barplot(
  quartile_summary[['playoffLift']],
  names.arg = quartile_summary[['weightQuartile']],
  col = c('#fcbf49', '#f77f00', '#d62828', '#6a4c93'),
  border = NA,
  las = 2,
  xlab = '',
  ylab = 'Playoff Lift'
)
graphics::abline(h = 0, lty = 2, col = '#495057')
graphics::par(old_par)
```

Every group loses scoring on average. The interesting question is how much.

## Position Is Part of the Story

Weight and position are tangled. Defensemen tend to be heavier, score less, and
often play playoff minutes that are less offense-driven. Splitting forwards and
defensemen keeps a position effect from being read as a size effect.

```{r position-summary}
# Summarize rates by position and quartile.
position_summary <- aggregate(
  cbind(regularPPG, playoffPPG, playoffLift) ~ positionBucket + weightQuartile,
  data = analysis_tbl,
  FUN = mean
)
position_counts <- aggregate(
  playerId ~ positionBucket + weightQuartile,
  data = analysis_tbl,
  FUN = length
)
names(position_counts)[names(position_counts) == 'playerId'] <- 'n'
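position_summary <- merge(
  position_summary,
  position_counts,
  by = c('positionBucket', 'weightQuartile')
)
# merge() sorts keys alphabetically; restore lightest-to-heaviest order.
position_summary <- position_summary[
  order(
    position_summary[['positionBucket']],
    match(
      position_summary[['weightQuartile']],
      levels(analysis_tbl[['weightQuartile']])
    )
  ),
  c('positionBucket', 'weightQuartile', 'n', 'regularPPG', 'playoffPPG', 'playoffLift')
]
rownames(position_summary) <- NULL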
make_table(
  position_summary,
  caption = 'Scoring translation by position family and weight quartile.',
  digits = 3
)
```

The split keeps the story from becoming too neat. Size alone is not the answer.
Role, position, and baseline scoring level matter.
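
A minimal sketch with made-up numbers (`toy_confound` is hypothetical) shows
how the weight-position tangle can manufacture a size effect. Within each toy
position, lift has no trend in weight. But the toy defensemen are heavier and
drop less, so a pooled fit still finds a positive weight slope.

```{r position-confound-sketch}
# Made-up data: no weight trend inside either position, but defensemen are
# heavier and lose less scoring.
toy_confound <- data.frame(
  position = rep(c('Forward', 'Defense'), each = 4),
  weight = c(180, 190, 200, 210, 200, 210, 220, 230),
  playoffLift = c(-0.08, -0.12, -0.12, -0.08, -0.03, -0.05, -0.05, -0.03)
)

# Weight slope ignoring position versus holding position fixed.
toy_pooled_slope <- stats::coef(
  stats::lm(playoffLift ~ weight, data = toy_confound)
)[['weight']]
toy_adjusted_slope <- stats::coef(
  stats::lm(playoffLift ~ weight + position, data = toy_confound)
)[['weight']]
round(c(pooled = toy_pooled_slope, adjusted = toy_adjusted_slope), 5)
```

The pooled slope is positive while the adjusted slope is essentially zero. Real
data will not be this clean, but the mechanism is the reason to read the
position split before trusting any pooled size pattern.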

## Which Players Actually Rise?

Population averages are useful, but the player list is where the question feels
like hockey.

```{r risers}
# Show largest positive playoff lifts.
risers_tbl <- analysis_tbl[
  analysis_tbl[['gamesPlayed']] >= 40,
  c(
    'playerFullName',
    'positionBucket',
    'weight',
    'regularPPG',
    'playoffPPG',
    'playoffLift',
    'gamesPlayed'
  )
]
risers_tbl <- risers_tbl[order(-risers_tbl[['playoffLift']]), ]
risers_tbl <- utils::head(risers_tbl, 10)
make_table(
  risers_tbl,
  caption = 'Largest playoff scoring lifts among skaters with at least 40 playoff games.',
  digits = 3
)
```

```{r fallers}
# Show largest negative playoff lifts.
fallers_tbl <- analysis_tbl[
  analysis_tbl[['gamesPlayed']] >= 40,
  c(
    'playerFullName',
    'positionBucket',
    'weight',
    'regularPPG',
    'playoffPPG',
    'playoffLift',
    'gamesPlayed'
  )
]
fallers_tbl <- fallers_tbl[order(fallers_tbl[['playoffLift']]), ]
fallers_tbl <- utils::head(fallers_tbl, 10)
make_table(
  fallers_tbl,
  caption = 'Largest playoff scoring drops among skaters with at least 40 playoff games.',
  digits = 3
)
```

These tables are deliberately humbling. Playoff translation is not a body-type
sorting machine. Some lighter players hold up beautifully. Some bigger players
drop. The useful signal is a tendency, not a rule.
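
Part of that humility is sampling noise. As a rough sketch, assume a skater's
points in a game behave like Poisson counts. That is an assumption made here for
illustration, not something the analysis above tests. Under it, the standard
error of a per-game rate over `g` games is about `sqrt(rate / g)`. The rate of
0.7 below is a hypothetical value.

```{r lift-noise-sketch}
# Assumption for illustration: per-game points are roughly Poisson, so the
# standard error of a per-game rate over g games is about sqrt(rate / g).
toy_rate <- 0.7
toy_games <- c(20, 40, 80, 160)
toy_noise <- data.frame(
  gamesPlayed = toy_games,
  approxSE = sqrt(toy_rate / toy_games)
)
toy_noise
```

Under that assumption, a 0.7 points-per-game skater with 40 playoff games
carries a standard error of about 0.13 points per game on the playoff rate
alone. Leaderboard extremes at that sample size can owe a lot to luck.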

## Model the Gap

A simple model lets us ask whether height or weight carries a standalone slope
once position is included.

```{r model}
# Fit playoff-lift model.
lift_fit <- stats::lm(
  playoffLift ~ height + weight + I(positionCode == 'D'),
  data = analysis_tbl
)
lift_fit_tbl <- as.data.frame(summary(lift_fit)$coefficients)
rownames(lift_fit_tbl) <- NULL
# Label terms in model-formula order.
lift_fit_tbl[['term']] <- c(
  'Intercept',
  'Height',
  'Weight',
  'Defense indicator'
)
lift_fit_tbl <- lift_fit_tbl[, c(
  'term',
  'Estimate',
  'Std. Error',
  't value',
  'Pr(>|t|)'
)]
make_table(
  lift_fit_tbl,
  caption = 'Linear model of playoff scoring lift.',
  digits = 4
)
```

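The plain-language result is that size is not magic. Once position is in the
model, the height and weight slopes are not the headline. Height and weight are
also correlated with each other, so their individual slopes are harder to pin
down than their combined contribution. The playoff translation cliché is closer
to a role-and-usage story than a simple bigger-is-better story.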
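
Because height and weight move together, one way to ask the size question is
jointly: do the two size terms improve the fit beyond position alone? The sketch
below runs a nested-model F-test on simulated data. `toy_size` and every
coefficient in it are made up for illustration, and the simulated lift depends
only on position.

```{r joint-size-test-sketch}
# Simulated players; all means, slopes, and noise levels are hypothetical.
set.seed(1)
toy_n <- 200
toy_size <- data.frame(
  isDefense = rep(c(FALSE, TRUE), length.out = toy_n),
  height = stats::rnorm(toy_n, mean = 73, sd = 2)
)
# Weight tracks height, so the two predictors are correlated.
toy_size[['weight']] <-
  200 + 6 * (toy_size[['height']] - 73) + stats::rnorm(toy_n, sd = 10)
# Lift depends on position only, plus noise.
toy_size[['playoffLift']] <-
  -0.08 + 0.04 * toy_size[['isDefense']] + stats::rnorm(toy_n, sd = 0.10)

# Compare position-only fit against fit that adds both size terms.
toy_position_only <- stats::lm(playoffLift ~ isDefense, data = toy_size)
toy_with_size <- stats::lm(
  playoffLift ~ isDefense + height + weight,
  data = toy_size
)
toy_anova <- stats::anova(toy_position_only, toy_with_size)
toy_anova
```

The second row of the comparison tests both size terms at once on two degrees
of freedom. The same pair of models could be fit to `analysis_tbl` by swapping
in `I(positionCode == 'D')` for the toy indicator.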

## What We Learned

Bigger skaters do not automatically become better playoff scorers. The lighter
groups can still lead in raw playoff points per game. But the heaviest group can
look slightly sturdier relative to its own regular-season scoring baseline.

That split answer is the point. `nhlscraper` makes it easy to combine career
stats and player bios, but the interpretation still has to respect the shape of
the data. A good mini research question is not always one where the cliché is
true or false. Sometimes the best answer is: true in one sense, false in
another.