---
title: The Sequential Model with kerasnip
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{The Sequential Model with kerasnip}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

This vignette provides a comprehensive guide to using `kerasnip` to define sequential Keras models within the `tidymodels` ecosystem. `kerasnip` bridges the gap between the imperative, layer-by-layer construction of Keras models and the declarative, specification-based approach of `tidymodels`.

Here, we will focus on `create_keras_sequential_spec()`, which is ideal for models whose layers form a plain stack, each layer having exactly one input tensor and one output tensor.

## Setup

We'll start by loading the necessary packages:

``` r
library(kerasnip)
library(tidymodels)
#> ── Attaching packages ──────────────────────────────────── tidymodels 1.5.0 ──
#> ✔ broom        1.0.12     ✔ recipes      1.3.2
#> ✔ dials        1.4.3      ✔ rsample      1.3.2
#> ✔ dplyr        1.2.1      ✔ tailor       0.1.0
#> ✔ ggplot2      4.0.3      ✔ tidyr        1.3.2
#> ✔ infer        1.1.0      ✔ tune         2.1.0
#> ✔ modeldata    1.5.1      ✔ workflows    1.3.0
#> ✔ parsnip      1.5.0      ✔ workflowsets 1.1.1
#> ✔ purrr        1.2.2      ✔ yardstick    1.4.0
#> ── Conflicts ───────────────────────────────────── tidymodels_conflicts() ──
#> ✖ purrr::discard() masks scales::discard()
#> ✖ dplyr::filter()  masks stats::filter()
#> ✖ dplyr::lag()     masks stats::lag()
#> ✖ recipes::step()  masks stats::step()
library(keras3)
#> 
#> Attaching package: 'keras3'
#> The following object is masked from 'package:yardstick':
#> 
#>     get_weights
#> The following object is masked from 'package:infer':
#> 
#>     generate
```

## When to use `create_keras_sequential_spec()`

A `Sequential` model in Keras is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. `kerasnip`'s `create_keras_sequential_spec()` function is designed to define such models in a `tidymodels`-compatible way.

Instead of building the model layer by layer imperatively, you define a named, ordered list of R functions called `layer_blocks`. Each `layer_block` function takes a Keras model object as its first argument and returns the modified model. `kerasnip` then uses these blocks to construct the full Keras `Sequential` model.

For models with more complex, non-linear topologies (e.g., multiple inputs/outputs, residual connections, or multi-branch models), you should use `create_keras_functional_spec()` instead.

## Creating a `kerasnip` Sequential Model Specification

Let's define a simple sequential model with three dense layers. First, we define our `layer_blocks`:

``` r
# The first block must initialize the model. `input_shape`
# is passed automatically.
input_block <- function(model, input_shape) {
  keras_model_sequential(input_shape = input_shape)
}

# A reusable block for hidden layers. `units` will become a tunable parameter.
hidden_block <- function(model, units = 32, activation = "relu") {
  model |>
    layer_dense(units = units, activation = activation)
}

# The output block. `num_classes` is passed automatically for classification.
output_block <- function(model, num_classes, activation = "softmax") {
  model |>
    layer_dense(units = num_classes, activation = activation)
}
```

Now, we use `create_keras_sequential_spec()` to generate our `parsnip` model specification function. We'll name our model `my_simple_mlp`.
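Before we do, note that each `layer_block` is an ordinary R function, so you can smoke-test the stack by hand. The sketch below is purely illustrative: the `input_shape = 4` and `num_classes = 3` values are placeholders, and calling the blocks directly like this is not required by `kerasnip`, which supplies the real values from your data at fit time.

``` r
# Illustrative smoke test only: call the blocks directly to confirm they
# assemble a valid stack. kerasnip performs these calls for you internally.
model <- input_block(NULL, input_shape = 4) |> # returns a fresh sequential model
  hidden_block(units = 16) |>                  # appends a 16-unit dense layer
  output_block(num_classes = 3)                # appends a 3-unit softmax output
summary(model)
```

With the blocks defined, we generate the specification: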
``` r
create_keras_sequential_spec(
  model_name = "my_simple_mlp",
  layer_blocks = list(
    input = input_block,
    hidden_1 = hidden_block,
    hidden_2 = hidden_block,
    output = output_block
  ),
  mode = "classification"
)
```

## A common debugging workflow: `compile_keras_grid()`

In the original Keras guide, a common workflow is to incrementally add layers and call `summary()` to inspect the architecture. With `kerasnip`, the model is defined declaratively, so we can't inspect it layer by layer in the same way.

However, `kerasnip` provides a powerful equivalent: `compile_keras_grid()`. This function checks whether your `layer_blocks` define a valid Keras model and returns the compiled model structure, all without running a full training cycle. This is perfect for debugging your architecture.

Let's see this in action with a CNN architecture:

``` r
# Define CNN layer blocks
cnn_input_block <- function(model, input_shape) {
  keras_model_sequential(input_shape = input_shape)
}

cnn_conv_block <- function(
  model,
  filters = 32,
  kernel_size = 3,
  activation = "relu"
) {
  model |>
    layer_conv_2d(
      filters = filters,
      kernel_size = kernel_size,
      activation = activation
    )
}

cnn_pool_block <- function(model, pool_size = 2) {
  model |>
    layer_max_pooling_2d(pool_size = pool_size)
}

cnn_flatten_block <- function(model) {
  model |>
    layer_flatten()
}

cnn_output_block <- function(model, num_classes, activation = "softmax") {
  model |>
    layer_dense(units = num_classes, activation = activation)
}

# Create the kerasnip spec function
create_keras_sequential_spec(
  model_name = "my_cnn",
  layer_blocks = list(
    input = cnn_input_block,
    conv1 = cnn_conv_block,
    pool1 = cnn_pool_block,
    flatten = cnn_flatten_block,
    output = cnn_output_block
  ),
  mode = "classification"
)

# Create a spec instance for a 28x28x1 image
cnn_spec <- my_cnn(
  conv1_filters = 32,
  conv1_kernel_size = 5,
  compile_loss = "categorical_crossentropy",
  compile_optimizer = "adam"
)

# Prepare dummy data with the correct shape.
# We create a list-column of 28x28x1 arrays.
x_dummy_list <- lapply(
  1:10,
  function(i) array(runif(28 * 28 * 1), dim = c(28, 28, 1))
)
x_dummy_df <- tibble::tibble(x = x_dummy_list)
y_dummy <- factor(sample(0:9, 10, replace = TRUE), levels = 0:9)
y_dummy_df <- tibble::tibble(y = y_dummy)
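
# (Added sketch) Sanity-check the nested predictor before compiling: each
# element of the list-column should be a 28x28x1 array, matching the input
# shape the CNN expects.
stopifnot(identical(dim(x_dummy_df$x[[1]]), c(28L, 28L, 1L)))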

# Use compile_keras_grid() to get the model summary.
# A one-row grid with no extra hyperparameters builds the spec as-is.
compilation_results <- compile_keras_grid(
  spec = cnn_spec,
  grid = tibble::tibble(.rows = 1L),
  x = x_dummy_df,
  y = y_dummy_df
)

# Print the summary
compilation_results |>
  select(compiled_model) |>
  pull() |>
  pluck(1) |>
  summary()
#> Model: "sequential"
#> ┌────────────────────────────────┬──────────────────────┬───────────┐
#> │ Layer (type)                   │ Output Shape         │   Param # │
#> ├────────────────────────────────┼──────────────────────┼───────────┤
#> │ conv2d (Conv2D)                │ (None, 24, 24, 32)   │       832 │
#> ├────────────────────────────────┼──────────────────────┼───────────┤
#> │ max_pooling2d (MaxPooling2D)   │ (None, 12, 12, 32)   │         0 │
#> ├────────────────────────────────┼──────────────────────┼───────────┤
#> │ flatten (Flatten)              │ (None, 4608)         │         0 │
#> ├────────────────────────────────┼──────────────────────┼───────────┤
#> │ dense (Dense)                  │ (None, 10)           │    46,090 │
#> └────────────────────────────────┴──────────────────────┴───────────┘
#> Total params: 46,922 (183.29 KB)
#> Trainable params: 46,922 (183.29 KB)
#> Non-trainable params: 0 (0.00 B)
```

``` r
compilation_results |>
  select(compiled_model) |>
  pull() |>
  pluck(1) |>
  plot(show_shapes = TRUE)
```

![model](images/model_plot_shapes_s.png){fig-alt="Plot of the compiled sequential CNN, showing each layer and its output shape."}
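
Because `compile_keras_grid()` takes a grid of hyperparameters, the same check can scale to several candidate architectures at once. The sketch below rests on assumptions: it marks the kernel size with `tune()` and presumes that grid columns are matched to spec arguments by name, mirroring standard `tune` semantics and the one-row usage above.

``` r
# Sketch under assumed grid semantics: compile two kernel sizes, train nothing.
cnn_spec_tunable <- my_cnn(
  conv1_filters = 32,
  conv1_kernel_size = tune(),
  compile_loss = "categorical_crossentropy",
  compile_optimizer = "adam"
)

kernel_grid <- tibble::tibble(conv1_kernel_size = c(3, 5))

candidates <- compile_keras_grid(
  spec = cnn_spec_tunable,
  grid = kernel_grid,
  x = x_dummy_df,
  y = y_dummy_df
)

# Print both architectures to compare output shapes and parameter counts.
candidates |>
  pull(compiled_model) |>
  purrr::walk(summary)
```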