`exnexstan`: Binary data and package overview

Author

Maximilian Rohde

Published

July 25, 2023

Background

The exnexstan package implements the EXNEX model for binary data introduced in “Robust exchangeability designs for early phase clinical trials with multiple strata” by Neuenschwander et al. (2015) (https://onlinelibrary.wiley.com/doi/10.1002/pst.1730) using Stan. The cmdstanr package is used to interface R with the Stan probabilistic programming language that fits the models using Markov Chain Monte Carlo (MCMC)¹.

¹ Installing and compiling cmdstan can be done through cmdstanr. See the “Installing CmdStan” documentation here: https://mc-stan.org/cmdstanr/articles/cmdstanr.html#installing-cmdstan-1

EXNEX models are an extension of Bayesian hierarchical models (BHMs). Bayesian hierarchical models are commonly used to analyze data from related studies, such as strata in a basket trial, since the partial pooling resulting from BHMs is often a good compromise between complete stratification and complete pooling. However, BHMs can perform poorly if some strata are not exchangeable with the other strata.

EXNEX is a mixture model that allows for each strata the possibility of being exchangeable (with probability $p_j$) with the other strata, or nonexchangeable with the other strata (with probability (1 - $p_j$)). This increases the robustness of the model to certain strata being not exchangeable with the others. More than two exchangeability groups may be specified in the model, although they can be difficult to fit depending on the amount of data available. Currently, exnexstan only supports a single exchangeability group.

We write out the model in mathematical notation below. For clarity, we use the names for the prior values as given in the code. \[\begin{align*} Z_{j} &\sim \operatorname{Bernoulli}(p_j) && \text{(Indicator variable of EX vs NEX)} \\ \theta_j &\sim \operatorname{Normal}(\text{mean} = \mu_{Z_{j}}, \text{sd} = \tau_{Z_{j}}) && \text{(Response probability on log-odds scale)} \\ \mu_0 &= \texttt{nex\_prior\_mean} && \text{(NEX mean)} \\ \tau_0 &= \texttt{nex\_prior\_sd} && \text{(NEX standard deviation)} \\ \mu_1 &\sim \operatorname{Normal}(\texttt{mu\_prior\_mean}, \texttt{mu\_prior\_sd}) && \text{(EX mean)} \\ \tau_1 &\sim \operatorname{Half-Normal}(\texttt{tau\_prior\_mean}, \texttt{tau\_prior\_sd}) && \text{(EX standard deviation)} \\ r_j &\sim \operatorname{Binomial}(n_j, \text{logit}^{-1}(\theta_j)) && \text{(Likelihood)} \end{align*}\]

We can also represent the model graphically as shown below.

Illustration of the binary EXNEX model showing the connections between the likelihood, priors, and variables.

To illustrate the exnexstan package, we use the data shown in Table 1 of Neuenschwander et al. (2015). The data come from a basket clinical trial conducted to assess the efficacy of the cancer therapy, Imatinib, in multiple disease strata, where the primary outcome variable is binary response to treatment.

A first example

We begin by loading the exnexstan package and supporting packages for data visualization.

library(exnexstan)

library(tidyverse)

library(bayesplot)  # MCMC Diagnostic plots
library(tidybayes)  # Bayesian geoms for ggplot
library(patchwork)  # Plotting tools
library(gt)         # HTML tables
library(extrafont)  # Allow custom fonts

# Set global ggplot theme
theme_set(cowplot::theme_cowplot(font_size=12,
                                 font_family = "Fira Sans"))

# Set bayesplot color scheme
bayesplot::color_scheme_set("viridis")

To fit EXNEX models, use the fit_exnex() function. We pass in as arguments the data:

r: the number of responses for each strata
n: the sample size for each strata

as well as the parameters to specify the priors. The data r and n must be integers; the model will raise an error if not. For this model, we assume a 0.5 prior probability of exchangeability for each strata, and we use the prior parameters used in the original analysis by Neuenschwander et al. (2015). We set adapt_delta = 0.99, to improve the accuracy of the MCMC sampling in regions of high posterior curvature (see https://mc-stan.org/rstanarm/reference/adapt_delta.html for more details).

# Fit the binary EXNEX model using the data from Neuenschwander et al. (2015)
mod <-
exnexstan::fit_exnex(
          r = c(2,0,1,6,7,3,5,1,0,3) |> as.integer(),
          n = c(15, 13, 12, 28, 29, 29, 26, 5, 2, 20) |> as.integer(),
          p_exch = rep(0.5, 10),
          mu_prior_mean = -1.73,
          mu_prior_sd = 2.616,
          tau_prior_sd = 1,
          nex_prior_mean = -1.73,
          nex_prior_sd = 2.801,
          adapt_delta = 0.99
          )

# Save the posterior draws to a data.frame
draws <- mod$draws(format = "df")

We fit the model and store the posterior draws as a data.frame. We can then take a look at the first 100 posterior draws from the draws data frame.

The data frame contains posterior draws for:

mu and tau, the parameters specifying the distribution of the exchangeable strata
theta[1] to theta[10], the response probability for each strata on the log-odds scale
p[1] to p[10], the response probability for each strata
p_mix[1] to p_mix[10], the posterior probability of being exchangeable (EX) for each strata

MCMC diagnostics

Before we interpret the results of the model, it is important to check the MCMC diagnostics. One of the advantages of using Stan and the Hamiltonian Monte Carlo sampler that it implements is that it provides automatic warnings and diagnostics when the sampler is not performing well, and the results may not be reliable.

Because the functions in exnexstan return a cmdstanr model object, we can use all the methods already implemented in cmdstanr. See the cmdstan documentation for more details. Note that cmdstanr uses the R6 object-oriented system in R, so the functions are called after the object using $.

cmdstan_diagnose() summarizes the automatic checks that Stan runs when fitting the model. Everything looks good since we observe no divergences and E-BFMI /R-hat values are satisfactory.

mod$cmdstan_diagnose()

Processing csv files: /var/folders/s_/gvf52_f51dq4lx00k_v05lp80000gn/T/RtmpQsgukS/exnex-202307251149-1-818709.csv, /var/folders/s_/gvf52_f51dq4lx00k_v05lp80000gn/T/RtmpQsgukS/exnex-202307251149-2-818709.csv, /var/folders/s_/gvf52_f51dq4lx00k_v05lp80000gn/T/RtmpQsgukS/exnex-202307251149-3-818709.csv, /var/folders/s_/gvf52_f51dq4lx00k_v05lp80000gn/T/RtmpQsgukS/exnex-202307251149-4-818709.csv

Checking sampler transitions treedepth.
Treedepth satisfactory for all transitions.

Checking sampler transitions for divergences.
No divergent transitions found.

Checking E-BFMI - sampler transitions HMC potential energy.
E-BFMI satisfactory.

Effective sample size satisfactory.

Split R-hat values satisfactory all parameters.

Processing complete, no problems detected.

We should also visually observe the trace plot using the bayesplot package. We will check a few of the relevant parameters.

# MCMC Trace plot
# `pars` select which variables to plot
# `window` specifies that we look at a certain range of samples
# `np` allows divergences to be plotted
mcmc_trace(
  x = draws,
  facet_args = list(ncol = 1),
  pars = c("tau", "theta[1]", "p[1]"),
  window = c(2000, 3000),
  np = nuts_params(mod))

More information on diagnosing MCMC sampling issues can be found here: https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html.

Summarizing results

Now that we have checked the diagnostics, we can view the results of the model.

Table summary

The summary_table() function provides an HTML table of the posterior summaries for each strata.

# Create summary table of posterior draws
exnexstan::summary_table(mod)

EXNEX Analysis Results

Graphics

Using the tidybayes extension to the ggplot package, we can create informative visualizations of the posterior distribution for each strata. First, we examine the response probabilities for each strata. We show both a density plot of the posterior distributions (top) and the posterior intervals for each strata (bottom).

Code

# Extract response probabilities from draws data.frame
# and pivot into long format for plotting
response_probabilities <-
draws |>
  select(`p[1]`:`p[10]`) |>
  pivot_longer(cols = everything(),
               names_pattern = "p\\[(.*)\\]",
               names_to = "Strata", 
               values_to = "Response Probability") |>
  mutate(Strata = Strata |> as.integer() |> as.factor())

Code

# Density plots
p1 <-
response_probabilities |>
  ggplot() +
  aes(x = `Response Probability`, group = Strata) +
  geom_density(bounds = c(0, Inf)) +
  coord_cartesian(xlim=c(0,1)) +
  labs(y = "Posterior Density")

# Credible intervals
p2 <-
response_probabilities |>
  ggplot() +
  aes(x = `Response Probability`, y = Strata) +
  stat_interval(.width = c(0.5, 0.8, 0.95, 0.99), linewidth=1.5) +
  coord_cartesian(xlim=c(0,1)) +
  labs(color = "Credible Interval") 

# Combine plots with patchwork
p1 / p2

We can also look at the posterior probability of being exchangeable, p_mix[j], for each strata.

Code

# Extract probability of exchangeability for each strata
exch_probabilities <-
draws |>
  select(`p_mix[1]`:`p_mix[10]`) |>
  pivot_longer(cols = everything(),
               names_pattern = "p_mix\\[(.*)\\]",
               names_to = "Strata", 
               values_to = "Exchangeability Probability") |>
  mutate(Strata = Strata |> as.integer() |> as.factor())

Code

# Density plots
p1 <-
exch_probabilities |>
  ggplot() +
  aes(x = `Exchangeability Probability`, group = Strata) +
  geom_density(bounds = c(0, Inf)) +
  coord_cartesian(xlim=c(0,1)) +
  labs(y = "Posterior Density")

# Credible intervals
p2 <-
exch_probabilities |>
  ggplot() +
  aes(x = `Exchangeability Probability`, y = Strata) +
  stat_interval(.width = c(0.5, 0.8, 0.95, 0.99), linewidth=1.5) +
  coord_cartesian(xlim=c(0,1)) +
  labs(color = "Credible Interval") 

# Combine the plots with patchwork
p1 / p2

Comparing different exchangeability assumptions

In this section, we fit EX, NEX, and EXNEX models and compare the results.

When fitting the EX model, there can be some MCMC convergence issues (e.g., divergences) because of the extreme posterior curvature that occurs when $\tau$ is close to zero. A common way to solve this issue is to use a non-centered parameterization of the model (see https://mc-stan.org/docs/stan-users-guide/reparameterization.html). The exnexstan::fit_exch() function implements a non-centered parameterization and can greatly improve the MCMC fitting of EX models. The model structure is equivalent to using exnexstan::fit_exnex() with p_exch equal to 1 for all strata – only the parameterization is different for the purpose of MCMC sampling.

Model fitting

# Data is the same for all the models
r = c(2,0,1,6,7,3,5,1,0,3) |> as.integer()
n = c(15, 13, 12, 28, 29, 29, 26, 5, 2, 20) |> as.integer()

# EXNEX (p_exch = 0.5 for all strata)
exnex <-
exnexstan::fit_exnex(
          r = r,
          n = n,
          p_exch = rep(0.5, 10),
          mu_prior_mean = -1.73,
          mu_prior_sd = 2.616,
          tau_prior_sd = 1,
          nex_prior_mean = -1.73,
          nex_prior_sd = 2.801,
          adapt_delta = 0.99
          )
          
# NEX (p_exch = 0 for all strata)
nex <-
exnexstan::fit_exnex(
          r = r,
          n = n,
          p_exch = rep(0, 10),
          mu_prior_mean = -1.73,
          mu_prior_sd = 2.616,
          tau_prior_sd = 1,
          nex_prior_mean = -1.73,
          nex_prior_sd = 2.801,
          adapt_delta = 0.99
          )

# EX (p_exch = 1 for all strata)
ex <-
exnexstan::fit_exch(
          r = r,
          n = n,
          mu_prior_mean = -1.73,
          mu_prior_sd = 2.616,
          tau_prior_sd = 1,
          adapt_delta = 0.99
          )

# Extract the posterior draws from each model and add a column
# to identify the model

exnex_draws <- exnex$draws(format = "df")
exnex_draws$model <- "EXNEX"

nex_draws <- nex$draws(format = "df") 
nex_draws$model <- "Stratified (NEX)"

ex_draws <- ex$draws(format = "df")
ex_draws$model <- "Exchangeable (EX)"

# Combine the draws together
df <- bind_rows(exnex_draws, nex_draws, ex_draws)

Code

# Extract and pivot the response probabilities
response_probabilities <-
df |>
  select(model, `p[1]`:`p[10]`) |>
  pivot_longer(cols = -model,
               names_pattern = "p\\[(.*)\\]",
               names_to = "Strata", 
               values_to = "Response Probability") |>
  mutate(Strata = Strata |> as.integer() |> as.factor())

Plotting

For each of the methods and each strata, we plot the posterior median along with the 50% and 95% credible intervals. Above each strata, we show the number of events out of the total, and the observed response probability.

Code

# Data frame with information for plotting the horizontal lines
# and the text at the top of the graph for each strata
line_data <-
  tibble(r = c(2,0,1,6,7,3,5,1,0,3),
         n = c(15, 13, 12, 28, 29, 29, 26, 5, 2, 20),
         Strata = as.factor(1:10),
         p = r/n)

# Create the labels (responses / total) for each strata
line_data$label <- map2_chr(line_data$r, line_data$n, ~glue::glue("{.x}/{.y}"))

response_probabilities |>
  ggplot() +
  aes(x = model, y = `Response Probability`) +
  stat_pointinterval(mapping = aes(color=model, shape=model),
                     point_interval = "median_hdci",
                     .width=c(0.5, 0.95)) +
  geom_hline(data=line_data,
             mapping = aes(yintercept = p),
             linetype=2,
             alpha=0.8) +
  geom_text(data=line_data |> mutate(model="EXNEX", `Response Probability` = 1),
            mapping = aes(label = label),
            nudge_x = 0.1) +
  geom_text(data=line_data |> mutate(model="EXNEX", `Response Probability` = 0.95),
            mapping = aes(label = round(p,2)),
            nudge_x = 0.1) +
  facet_wrap(~Strata, nrow = 1) +
  scale_y_continuous(breaks = seq(0, 1, by=0.2)) +
  coord_cartesian(ylim = c(0,1)) +
  labs(x = "",
       color = "") +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.line.x = element_blank(),
        axis.line.y = element_blank(),
        panel.background = element_rect(fill = "#f2f3f7"),
        text=element_text(family="Fira Sans")) +
  guides(shape = "none")

--- title: "`exnexstan`: Binary data and package overview" subtitle: "" author: - name: Maximilian Rohde date: "`r Sys.Date()`" cache: false fig-width: 8 fig-height: 6 fig-dpi: 500 knitr: opts_chunk: dev: "ragg_png" format: html: toc: true toc-depth: 4 toc-location: left standalone: true embed-resources: true code-tools: true code-fold: false code-link: true code-block-bg: "#f1f3f5" code-block-border-left: "#31BAE9" mainfont: Source Sans Pro theme: cosmo fontsize: 16px fig-format: retina fig-cap-location: bottom tbl-cap-location: top reference-location: margin citation-location: margin execute: warning: false message: false --- # Background The `exnexstan` package implements the EXNEX model for binary data introduced in "Robust exchangeability designs for early phase clinical trials with multiple strata" by Neuenschwander et al. (2015) (<https://onlinelibrary.wiley.com/doi/10.1002/pst.1730>) using Stan. The `cmdstanr` package is used to interface R with the Stan probabilistic programming language that fits the models using Markov Chain Monte Carlo (MCMC)^[Installing and compiling `cmdstan` can be done through `cmdstanr`. See the "Installing CmdStan" documentation here: <https://mc-stan.org/cmdstanr/articles/cmdstanr.html#installing-cmdstan-1>]. EXNEX models are an extension of Bayesian hierarchical models (BHMs). Bayesian hierarchical models are commonly used to analyze data from related studies, such as strata in a basket trial, since the partial pooling resulting from BHMs is often a good compromise between complete stratification and complete pooling. However, BHMs can perform poorly if some strata are not exchangeable with the other strata. EXNEX is a mixture model that allows for each strata the possibility of being exchangeable (with probability $p_j$) with the other strata, or nonexchangeable with the other strata (with probability (1 - $p_j$)). This increases the robustness of the model to certain strata being not exchangeable with the others. More than two exchangeability groups may be specified in the model, although they can be difficult to fit depending on the amount of data available. Currently, `exnexstan` only supports a single exchangeability group. We write out the model in mathematical notation below. For clarity, we use the names for the prior values as given in the code. \begin{align*} Z_{j} &\sim \operatorname{Bernoulli}(p_j) && \text{(Indicator variable of EX vs NEX)} \\ \theta_j &\sim \operatorname{Normal}(\text{mean} = \mu_{Z_{j}}, \text{sd} = \tau_{Z_{j}}) && \text{(Response probability on log-odds scale)} \\ \mu_0 &= \texttt{nex\_prior\_mean} && \text{(NEX mean)} \\ \tau_0 &= \texttt{nex\_prior\_sd} && \text{(NEX standard deviation)} \\ \mu_1 &\sim \operatorname{Normal}(\texttt{mu\_prior\_mean}, \texttt{mu\_prior\_sd}) && \text{(EX mean)} \\ \tau_1 &\sim \operatorname{Half-Normal}(\texttt{tau\_prior\_mean}, \texttt{tau\_prior\_sd}) && \text{(EX standard deviation)} \\ r_j &\sim \operatorname{Binomial}(n_j, \text{logit}^{-1}(\theta_j)) && \text{(Likelihood)} \end{align*} We can also represent the model graphically as shown below. ![Illustration of the binary EXNEX model showing the connections between the likelihood, priors, and variables.](./binary_diagram.png) To illustrate the `exnexstan` package, we use the data shown in Table 1 of [Neuenschwander et al. (2015)](https://onlinelibrary.wiley.com/doi/10.1002/pst.1730). The data come from a basket clinical trial conducted to assess the efficacy of the cancer therapy, Imatinib, in multiple disease strata, where the primary outcome variable is binary response to treatment. # A first example We begin by loading the `exnexstan` package and supporting packages for data visualization. ```{r} #| cache: false library(exnexstan) library(tidyverse) library(bayesplot) # MCMC Diagnostic plots library(tidybayes) # Bayesian geoms for ggplot library(patchwork) # Plotting tools library(gt) # HTML tables library(extrafont) # Allow custom fonts # Set global ggplot theme theme_set(cowplot::theme_cowplot(font_size=12, font_family = "Fira Sans")) # Set bayesplot color scheme bayesplot::color_scheme_set("viridis") ``` To fit EXNEX models, use the `fit_exnex()` function. We pass in as arguments the data: - `r`: the number of responses for each strata - `n`: the sample size for each strata as well as the parameters to specify the priors. The data `r` and `n` must be integers; the model will raise an error if not. For this model, we assume a 0.5 prior probability of exchangeability for each strata, and we use the prior parameters used in the original analysis by [Neuenschwander et al. (2015)](https://onlinelibrary.wiley.com/doi/10.1002/pst.1730). We set `adapt_delta = 0.99`, to improve the accuracy of the MCMC sampling in regions of high posterior curvature (see <https://mc-stan.org/rstanarm/reference/adapt_delta.html> for more details). ```{r} #| output: false # Fit the binary EXNEX model using the data from Neuenschwander et al. (2015) mod <- exnexstan::fit_exnex( r = c(2,0,1,6,7,3,5,1,0,3) |> as.integer(), n = c(15, 13, 12, 28, 29, 29, 26, 5, 2, 20) |> as.integer(), p_exch = rep(0.5, 10), mu_prior_mean = -1.73, mu_prior_sd = 2.616, tau_prior_sd = 1, nex_prior_mean = -1.73, nex_prior_sd = 2.801, adapt_delta = 0.99 ) # Save the posterior draws to a data.frame draws <- mod$draws(format = "df") ``` We fit the model and store the posterior draws as a data.frame. We can then take a look at the first 100 posterior draws from the `draws` data frame. ```{r} #| echo: false draws |> head(100) |> round(2) |> gt() |> opt_interactive() ``` The data frame contains posterior draws for: - `mu` and `tau`, the parameters specifying the distribution of the exchangeable strata - `theta[1]` to `theta[10]`, the response probability for each strata on the log-odds scale - `p[1]` to `p[10]`, the response probability for each strata - `p_mix[1]` to `p_mix[10]`, the posterior probability of being exchangeable (EX) for each strata ## MCMC diagnostics Before we interpret the results of the model, it is important to check the MCMC diagnostics. One of the advantages of using Stan and the Hamiltonian Monte Carlo sampler that it implements is that it provides automatic warnings and diagnostics when the sampler is not performing well, and the results may not be reliable. Because the functions in `exnexstan` return a `cmdstanr` model object, we can use all the methods already implemented in `cmdstanr`. See the `cmdstan` [documentation](https://mc-stan.org/cmdstanr/index.html) for more details. Note that `cmdstanr` uses the [R6](https://r6.r-lib.org/) object-oriented system in R, so the functions are called after the object using `$`. `cmdstan_diagnose()` summarizes the automatic checks that Stan runs when fitting the model. Everything looks good since we observe no divergences and E-BFMI /R-hat values are satisfactory. ```{r} mod$cmdstan_diagnose() ``` We should also visually observe the trace plot using the `bayesplot` package. We will check a few of the relevant parameters. ```{r} # MCMC Trace plot # `pars` select which variables to plot # `window` specifies that we look at a certain range of samples # `np` allows divergences to be plotted mcmc_trace( x = draws, facet_args = list(ncol = 1), pars = c("tau", "theta[1]", "p[1]"), window = c(2000, 3000), np = nuts_params(mod)) ``` More information on diagnosing MCMC sampling issues can be found here: <https://mc-stan.org/bayesplot/articles/visual-mcmc-diagnostics.html>. ## Summarizing results Now that we have checked the diagnostics, we can view the results of the model. ### Table summary The `summary_table()` function provides an HTML table of the posterior summaries for each strata. ```{r} # Create summary table of posterior draws exnexstan::summary_table(mod) ``` ### Graphics Using the `tidybayes` extension to the `ggplot` package, we can create informative visualizations of the posterior distribution for each strata. First, we examine the response probabilities for each strata. We show both a density plot of the posterior distributions (top) and the posterior intervals for each strata (bottom). ```{r} #| code-fold: true # Extract response probabilities from draws data.frame # and pivot into long format for plotting response_probabilities <- draws |> select(`p[1]`:`p[10]`) |> pivot_longer(cols = everything(), names_pattern = "p\\[(.*)\\]", names_to = "Strata", values_to = "Response Probability") |> mutate(Strata = Strata |> as.integer() |> as.factor()) ``` ```{r} #| code-fold: true # Density plots p1 <- response_probabilities |> ggplot() + aes(x = `Response Probability`, group = Strata) + geom_density(bounds = c(0, Inf)) + coord_cartesian(xlim=c(0,1)) + labs(y = "Posterior Density") # Credible intervals p2 <- response_probabilities |> ggplot() + aes(x = `Response Probability`, y = Strata) + stat_interval(.width = c(0.5, 0.8, 0.95, 0.99), linewidth=1.5) + coord_cartesian(xlim=c(0,1)) + labs(color = "Credible Interval") # Combine plots with patchwork p1 / p2 ``` We can also look at the posterior probability of being exchangeable, `p_mix[j]`, for each strata. ```{r} #| code-fold: true # Extract probability of exchangeability for each strata exch_probabilities <- draws |> select(`p_mix[1]`:`p_mix[10]`) |> pivot_longer(cols = everything(), names_pattern = "p_mix\\[(.*)\\]", names_to = "Strata", values_to = "Exchangeability Probability") |> mutate(Strata = Strata |> as.integer() |> as.factor()) ``` ```{r} #| code-fold: true # Density plots p1 <- exch_probabilities |> ggplot() + aes(x = `Exchangeability Probability`, group = Strata) + geom_density(bounds = c(0, Inf)) + coord_cartesian(xlim=c(0,1)) + labs(y = "Posterior Density") # Credible intervals p2 <- exch_probabilities |> ggplot() + aes(x = `Exchangeability Probability`, y = Strata) + stat_interval(.width = c(0.5, 0.8, 0.95, 0.99), linewidth=1.5) + coord_cartesian(xlim=c(0,1)) + labs(color = "Credible Interval") # Combine the plots with patchwork p1 / p2 ``` # Comparing different exchangeability assumptions In this section, we fit EX, NEX, and EXNEX models and compare the results. When fitting the EX model, there can be some MCMC convergence issues (e.g., divergences) because of the extreme posterior curvature that occurs when $\tau$ is close to zero. A common way to solve this issue is to use a non-centered parameterization of the model (see <https://mc-stan.org/docs/stan-users-guide/reparameterization.html>). The `exnexstan::fit_exch()` function implements a non-centered parameterization and can greatly improve the MCMC fitting of EX models. The model structure is equivalent to using `exnexstan::fit_exnex()` with `p_exch` equal to 1 for all strata -- only the parameterization is different for the purpose of MCMC sampling. ## Model fitting ```{r} #| output: false # Data is the same for all the models r = c(2,0,1,6,7,3,5,1,0,3) |> as.integer() n = c(15, 13, 12, 28, 29, 29, 26, 5, 2, 20) |> as.integer() # EXNEX (p_exch = 0.5 for all strata) exnex <- exnexstan::fit_exnex( r = r, n = n, p_exch = rep(0.5, 10), mu_prior_mean = -1.73, mu_prior_sd = 2.616, tau_prior_sd = 1, nex_prior_mean = -1.73, nex_prior_sd = 2.801, adapt_delta = 0.99 ) # NEX (p_exch = 0 for all strata) nex <- exnexstan::fit_exnex( r = r, n = n, p_exch = rep(0, 10), mu_prior_mean = -1.73, mu_prior_sd = 2.616, tau_prior_sd = 1, nex_prior_mean = -1.73, nex_prior_sd = 2.801, adapt_delta = 0.99 ) # EX (p_exch = 1 for all strata) ex <- exnexstan::fit_exch( r = r, n = n, mu_prior_mean = -1.73, mu_prior_sd = 2.616, tau_prior_sd = 1, adapt_delta = 0.99 ) ``` ```{r} # Extract the posterior draws from each model and add a column # to identify the model exnex_draws <- exnex$draws(format = "df") exnex_draws$model <- "EXNEX" nex_draws <- nex$draws(format = "df") nex_draws$model <- "Stratified (NEX)" ex_draws <- ex$draws(format = "df") ex_draws$model <- "Exchangeable (EX)" # Combine the draws together df <- bind_rows(exnex_draws, nex_draws, ex_draws) ``` ```{r} #| code-fold: true # Extract and pivot the response probabilities response_probabilities <- df |> select(model, `p[1]`:`p[10]`) |> pivot_longer(cols = -model, names_pattern = "p\\[(.*)\\]", names_to = "Strata", values_to = "Response Probability") |> mutate(Strata = Strata |> as.integer() |> as.factor()) ``` ## Plotting For each of the methods and each strata, we plot the posterior median along with the 50% and 95% credible intervals. Above each strata, we show the number of events out of the total, and the observed response probability. ```{r} #| code-fold: true # Data frame with information for plotting the horizontal lines # and the text at the top of the graph for each strata line_data <- tibble(r = c(2,0,1,6,7,3,5,1,0,3), n = c(15, 13, 12, 28, 29, 29, 26, 5, 2, 20), Strata = as.factor(1:10), p = r/n) # Create the labels (responses / total) for each strata line_data$label <- map2_chr(line_data$r, line_data$n, ~glue::glue("{.x}/{.y}")) response_probabilities |> ggplot() + aes(x = model, y = `Response Probability`) + stat_pointinterval(mapping = aes(color=model, shape=model), point_interval = "median_hdci", .width=c(0.5, 0.95)) + geom_hline(data=line_data, mapping = aes(yintercept = p), linetype=2, alpha=0.8) + geom_text(data=line_data |> mutate(model="EXNEX", `Response Probability` = 1), mapping = aes(label = label), nudge_x = 0.1) + geom_text(data=line_data |> mutate(model="EXNEX", `Response Probability` = 0.95), mapping = aes(label = round(p,2)), nudge_x = 0.1) + facet_wrap(~Strata, nrow = 1) + scale_y_continuous(breaks = seq(0, 1, by=0.2)) + coord_cartesian(ylim = c(0,1)) + labs(x = "", color = "") + theme(axis.title.x=element_blank(), axis.text.x=element_blank(), axis.ticks.x=element_blank(), axis.line.x = element_blank(), axis.line.y = element_blank(), panel.background = element_rect(fill = "#f2f3f7"), text=element_text(family="Fira Sans")) + guides(shape = "none") ```