---
title: 'Do Bigger Skaters Translate Better in the Playoffs?'
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Do Bigger Skaters Translate Better in the Playoffs?}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = '#>',
  fig.align = 'center',
  out.width = '92%',
  fig.width = 7,
  fig.height = 4.6
)

make_table <- function(x, caption, digits = 3) {
  knitr::kable(x, caption = caption, digits = digits)
}
```

## Question

The playoff-size cliché is familiar: heavier teams supposedly survive the grind, win the walls, and keep scoring when space disappears. But that claim hides two different questions:

- Do bigger skaters score more in the playoffs?
- Do bigger skaters lose less scoring when regular-season hockey becomes playoff hockey?

This article uses `skater_playoff_statistics()`, `skater_statistics()`, and `players()` to compare career regular-season scoring, career playoff scoring, and the gap between the two.

## Build Player Table

We keep salary-cap-era skaters with meaningful regular-season and playoff samples. The unit is one player, not one season.

```{r data}
# Pull scoring and bio tables.
playoff_stats <- nhlscraper::skater_playoff_statistics()
career_stats <- nhlscraper::skater_statistics()[, c(
  'playerId', 'rsGamesPlayed', 'rsPoints', 'positionCode'
)]
player_bios <- nhlscraper::players()[, c(
  'playerId', 'playerFullName', 'height', 'weight'
)]

# Join player-level sources.
analysis_tbl <- merge(
  playoff_stats,
  career_stats,
  by = c('playerId', 'positionCode'),
  all.x = TRUE
)
analysis_tbl <- merge(
  analysis_tbl,
  player_bios,
  by = 'playerId',
  all.x = TRUE
)

# Keep modern skaters with stable samples.
# The left joins can leave rsGamesPlayed missing; the explicit NA check keeps
# those rows from turning into all-NA rows when subsetting.
analysis_tbl <- analysis_tbl[
  !is.na(analysis_tbl[['height']]) &
    !is.na(analysis_tbl[['weight']]) &
    !is.na(analysis_tbl[['rsGamesPlayed']]) &
    analysis_tbl[['firstSeasonForGameType']] >= 20052006 &
    analysis_tbl[['gamesPlayed']] >= 20 &
    analysis_tbl[['rsGamesPlayed']] >= 200,
  ,
  drop = FALSE
]

# Fill names and compute rates.
analysis_tbl[['playerFullName']] <- ifelse(
  is.na(analysis_tbl[['playerFullName']]) |
    analysis_tbl[['playerFullName']] == '',
  paste(
    analysis_tbl[['skaterFirstName']],
    analysis_tbl[['skaterLastName']]
  ),
  analysis_tbl[['playerFullName']]
)
analysis_tbl[['regularPPG']] <- analysis_tbl[['rsPoints']] /
  analysis_tbl[['rsGamesPlayed']]
analysis_tbl[['playoffPPG']] <- analysis_tbl[['points']] /
  analysis_tbl[['gamesPlayed']]
analysis_tbl[['playoffLift']] <- analysis_tbl[['playoffPPG']] -
  analysis_tbl[['regularPPG']]
analysis_tbl[['positionBucket']] <- ifelse(
  analysis_tbl[['positionCode']] == 'D',
  'Defense',
  'Forward'
)

# Assign equal-count weight quartiles.
weight_rank <- rank(
  analysis_tbl[['weight']],
  ties.method = 'first'
) / nrow(analysis_tbl)
analysis_tbl[['weightQuartile']] <- cut(
  weight_rank,
  breaks = c(0, 0.25, 0.50, 0.75, 1),
  include.lowest = TRUE,
  labels = c('Lightest', 'Second', 'Third', 'Heaviest')
)
nrow(analysis_tbl)
```

The sample has `r nrow(analysis_tbl)` skaters. The games-played thresholds filter out one-series mirages while still leaving enough players to compare body types.

## Level Versus Translation

First, compare regular-season scoring, playoff scoring, and playoff lift by weight quartile.

```{r quartile-table}
# Summarize scoring by weight quartile.
quartile_summary <- aggregate(
  cbind(regularPPG, playoffPPG, playoffLift) ~ weightQuartile,
  data = analysis_tbl,
  FUN = mean
)
quartile_counts <- as.data.frame(table(analysis_tbl[['weightQuartile']]))
names(quartile_counts) <- c('weightQuartile', 'n')
quartile_summary <- merge(
  quartile_summary,
  quartile_counts,
  by = 'weightQuartile'
)
quartile_summary <- quartile_summary[
  match(
    levels(analysis_tbl[['weightQuartile']]),
    quartile_summary[['weightQuartile']]
  ),
  c('weightQuartile', 'n', 'regularPPG', 'playoffPPG', 'playoffLift')
]
make_table(
  quartile_summary,
  caption = 'Regular-season scoring, playoff scoring, and playoff lift by weight quartile.',
  digits = 3
)
```

The most important distinction is level versus translation. Lighter skaters can still post higher raw scoring rates. Bigger skaters can still translate a bit better relative to their own regular-season baseline. Those are different claims, and mixing them together is how playoff clichés become sloppy.

```{r quartile-plot, fig.cap = 'Playoff scoring level and playoff lift by weight quartile.'}
# Plot playoff scoring and playoff lift.
old_par <- graphics::par(no.readonly = TRUE)
graphics::par(mfrow = c(1, 2), mar = c(8, 4, 3, 1))
graphics::boxplot(
  playoffPPG ~ weightQuartile,
  data = analysis_tbl,
  col = c('#d8f3dc', '#b7e4c7', '#74c69d', '#2d6a4f'),
  border = '#1b4332',
  las = 2,
  xlab = '',
  ylab = 'Playoff Points Per Game'
)
graphics::barplot(
  quartile_summary[['playoffLift']],
  names.arg = quartile_summary[['weightQuartile']],
  col = c('#fcbf49', '#f77f00', '#d62828', '#6a4c93'),
  border = NA,
  las = 2,
  xlab = '',
  ylab = 'Playoff Lift'
)
graphics::abline(h = 0, lty = 2, col = '#495057')
graphics::par(old_par)
```

Every group loses scoring on average. The interesting question is how much.

## Position Is Part of the Story

Weight and position are tangled. Defensemen are heavier, score less, and often play playoff minutes that are less offense-driven. Splitting forwards and defensemen helps keep the interpretation honest.

```{r position-summary}
# Summarize rates by position and quartile.
position_summary <- aggregate(
  cbind(regularPPG, playoffPPG, playoffLift) ~ positionBucket + weightQuartile,
  data = analysis_tbl,
  FUN = mean
)
position_counts <- aggregate(
  playerId ~ positionBucket + weightQuartile,
  data = analysis_tbl,
  FUN = length
)
names(position_counts)[names(position_counts) == 'playerId'] <- 'n'
position_summary <- merge(
  position_summary,
  position_counts,
  by = c('positionBucket', 'weightQuartile')
)
make_table(
  position_summary,
  caption = 'Scoring translation by position family and weight quartile.',
  digits = 3
)
```

The split keeps the story from becoming too neat. Size alone is not the answer. Role, position, and baseline scoring level matter.

## Which Players Actually Rise?

Population averages are useful, but the player list is where the question feels like hockey.

```{r risers}
# Show largest positive playoff lifts.
risers_tbl <- analysis_tbl[
  analysis_tbl[['gamesPlayed']] >= 40,
  c(
    'playerFullName', 'positionBucket', 'weight',
    'regularPPG', 'playoffPPG', 'playoffLift', 'gamesPlayed'
  )
]
risers_tbl <- risers_tbl[order(-risers_tbl[['playoffLift']]), ]
risers_tbl <- utils::head(risers_tbl, 10)
make_table(
  risers_tbl,
  caption = 'Largest playoff scoring lifts among skaters with at least 40 playoff games.',
  digits = 3
)
```

```{r fallers}
# Show largest negative playoff lifts.
fallers_tbl <- analysis_tbl[
  analysis_tbl[['gamesPlayed']] >= 40,
  c(
    'playerFullName', 'positionBucket', 'weight',
    'regularPPG', 'playoffPPG', 'playoffLift', 'gamesPlayed'
  )
]
fallers_tbl <- fallers_tbl[order(fallers_tbl[['playoffLift']]), ]
fallers_tbl <- utils::head(fallers_tbl, 10)
make_table(
  fallers_tbl,
  caption = 'Largest playoff scoring drops among skaters with at least 40 playoff games.',
  digits = 3
)
```

These tables are deliberately humbling. Playoff translation is not a body-type sorting machine. Some lighter players hold up beautifully. Some bigger players drop. The useful signal is a tendency, not a rule.

## Model the Gap

A simple model lets us ask whether height or weight carries a standalone slope once position is included.

```{r model}
# Fit playoff-lift model.
lift_fit <- stats::lm(
  playoffLift ~ height + weight + I(positionCode == 'D'),
  data = analysis_tbl
)
lift_fit_tbl <- as.data.frame(summary(lift_fit)$coefficients)
lift_fit_tbl[['term']] <- rownames(lift_fit_tbl)
rownames(lift_fit_tbl) <- NULL

# Replace the raw coefficient names with readable labels.
lift_fit_tbl[['term']] <- c(
  'Intercept', 'Height', 'Weight', 'Defense indicator'
)
lift_fit_tbl <- lift_fit_tbl[, c(
  'term', 'Estimate', 'Std. Error', 't value', 'Pr(>|t|)'
)]
make_table(
  lift_fit_tbl,
  caption = 'Linear model of playoff scoring lift.',
  digits = 4
)
```

The plain-language result is that size is not magic. Once position is in the model, the direct height/weight slopes are not the headline. The playoff translation cliché is closer to a role-and-usage story than a simple bigger-is-better story.

## What We Learned

Bigger skaters do not automatically become better playoff scorers. The lighter groups can still lead in raw playoff points per game. But the heaviest group can look slightly sturdier relative to its own regular-season scoring baseline.

That split answer is the point. `nhlscraper` makes it easy to combine career stats and player bios, but the interpretation still has to respect the shape of the data. A good mini research question is not always one where the cliché is true or false. Sometimes the best answer is: true in one sense, false in another.
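
To make "true in one sense, false in another" concrete, here is a minimal sketch with invented numbers. The two groups and every value in `toy_tbl` are made up for illustration and are not results from `nhlscraper` or from the tables above. The names `toy_tbl`, `toy_summary`, `level_leader`, and `translation_leader` are hypothetical. The only thing borrowed from the article is the definition of playoff lift as playoff points per game minus regular-season points per game.

```r
# Toy data: invented values for illustration only, not real player results.
toy_tbl <- data.frame(
  group = c('Lighter', 'Lighter', 'Heavier', 'Heavier'),
  regularPPG = c(0.80, 0.70, 0.50, 0.40),
  playoffPPG = c(0.68, 0.60, 0.46, 0.38)
)

# Same definition of lift as in the article: playoff rate minus regular rate.
toy_tbl[['playoffLift']] <- toy_tbl[['playoffPPG']] - toy_tbl[['regularPPG']]

# Average each measure within group.
toy_summary <- aggregate(
  cbind(regularPPG, playoffPPG, playoffLift) ~ group,
  data = toy_tbl,
  FUN = mean
)

# Level question: which group has the higher playoff scoring rate?
level_leader <- as.character(toy_summary[['group']])[
  which.max(toy_summary[['playoffPPG']])
]

# Translation question: which group drops least from its own baseline?
translation_leader <- as.character(toy_summary[['group']])[
  which.max(toy_summary[['playoffLift']])
]

print(toy_summary)
cat('Higher playoff level:', level_leader, '\n')
cat('Smaller playoff drop:', translation_leader, '\n')
```

In this invented case the lighter group leads on level while the heavier group leads on translation. That is the pattern the article describes as possible, not a finding in itself. The real answer comes from the quartile and position tables built from the actual data.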