Title: | Stan Model Adjustments and Sensitivity Analyses using Importance Sampling |
---|---|
Description: | Functions to help assess the sensitivity of a Bayesian model (fitted using the rstan pakcage) to the specification of its likelihood and priors. Users provide a series of alternate sampling specifications, and the package uses Pareto-smoothed importance sampling to estimate posterior quantities of interest under each specification. |
Authors: | Cory McCartan [aut, cre] |
Maintainer: | Cory McCartan <[email protected]> |
License: | BSD_3_clause + file LICENSE |
Version: | 0.1.2 |
Built: | 2024-11-01 11:26:19 UTC |
Source: | https://github.com/CoryMcCartan/adjustr |
Given a set of new sampling statements, which can be parametrized by a data frame or list, compute Pareto-smoothed importance weights and attach them to the specification object, for further calculation and plotting.
adjust_weights(spec, object, data = NULL, keep_bad = FALSE, incl_orig = TRUE)
adjust_weights(spec, object, data = NULL, keep_bad = FALSE, incl_orig = TRUE)
spec |
An object of class |
object |
A model object, either of type |
data |
The data that was used to fit the model in |
keep_bad |
When |
incl_orig |
When |
This function does the bulk of the sensitivity analysis work. It operates by parsing the model code from the provided Stan object, extracting the parameters and their sampling statements. It then uses R metaprogramming/tidy evaluation tools to flexibly evaluate the log density for each draw and each sampling statement, under the original and alternative specifications. From these, the function computes the overall importance weight for each draw and performs Pareto-smoothed importance sampling. All of the work is performed in R, without recompiling or refitting the Stan model.
A tibble, produced by converting the provided specs
to a
tibble (see as.data.frame.adjustr_spec
), and adding columns
.weights
, containing vectors of weights for each draw, and
.pareto_k
, containing the diagnostic Pareto shape parameters. Values
greater than 0.7 indicate that importance sampling is not reliable.
If incl_orig
is TRUE
, a row is added for the original model
specification. Weights can be extracted with the
pull.adjustr_weighted
method. The returned object also
includes the model sample draws, in the draws
attribute.
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., & Gabry, J. (2015). Pareto smoothed importance sampling. arXiv preprint arXiv:1507.02646.
make_spec
, summarize.adjustr_weighted
, spec_plot
## Not run: model_data = list( J = 8, y = c(28, 8, -3, 7, -1, 1, 18, 12), sigma = c(15, 10, 16, 11, 9, 11, 10, 18) ) spec = make_spec(eta ~ student_t(df, 0, 1), df=1:10) adjust_weights(spec, eightschools_m) adjust_weights(spec, eightschools_m, keep_bad=TRUE) spec = make_spec(y ~ student_t(df, theta, sigma), df=1:10) adjust_weights(spec, eightschools_m, data=model_data) # will throw an error because `y` and `sigma` aren't provided adjust_weights(spec, eightschools_m) ## End(Not run)
## Not run: model_data = list( J = 8, y = c(28, 8, -3, 7, -1, 1, 18, 12), sigma = c(15, 10, 16, 11, 9, 11, 10, 18) ) spec = make_spec(eta ~ student_t(df, 0, 1), df=1:10) adjust_weights(spec, eightschools_m) adjust_weights(spec, eightschools_m, keep_bad=TRUE) spec = make_spec(y ~ student_t(df, theta, sigma), df=1:10) adjust_weights(spec, eightschools_m, data=model_data) # will throw an error because `y` and `sigma` aren't provided adjust_weights(spec, eightschools_m) ## End(Not run)
Functions to help assess the sensitivity of a Bayesian model to the specification of its likelihood and priors, estimated using the rstan package. Users provide a series of alternate sampling specifications, and the package uses Pareto-smoothed importance sampling to estimate posterior quantities of interest under each specification.
See the list of key functions and the example below. Full package documentation available at https://corymccartan.github.io/adjustr/.
adjustr_spec
Object Into a Data FrameReturns the data frame of specification parameters, with added columns of
the form .samp_1
, .samp_2
, ... for each sampling statement
(or just .samp
if there is only one sampling statement).
## S3 method for class 'adjustr_spec' as.data.frame(x, ...)
## S3 method for class 'adjustr_spec' as.data.frame(x, ...)
x |
the |
... |
additional arguments to underlying method |
dplyr
Methods for adjustr_spec
ObjectsCore dplyr
verbs which don't involve grouping
(filter
, arrange
,
mutate
, select
,
rename
, and slice
) are
implemented and operate on the underlying table of specification parameters.
## S3 method for class 'adjustr_spec' filter(.data, ..., .preserve = FALSE) ## S3 method for class 'adjustr_spec' arrange(.data, ...) ## S3 method for class 'adjustr_spec' rename(.data, ...) ## S3 method for class 'adjustr_spec' select(.data, ...) ## S3 method for class 'adjustr_spec' slice(.data, ..., .preserve = FALSE)
## S3 method for class 'adjustr_spec' filter(.data, ..., .preserve = FALSE) ## S3 method for class 'adjustr_spec' arrange(.data, ...) ## S3 method for class 'adjustr_spec' rename(.data, ...) ## S3 method for class 'adjustr_spec' select(.data, ...) ## S3 method for class 'adjustr_spec' slice(.data, ..., .preserve = FALSE)
.data |
the |
... |
additional arguments to underlying method |
.preserve |
as in |
## Not run: spec = make_spec(eta ~ student_t(df, 0, 1), df=1:10) arrange(spec, desc(df)) slice(spec, 4:7) filter(spec, df == 2) ## End(Not run)
## Not run: spec = make_spec(eta ~ student_t(df, 0, 1), df=1:10) arrange(spec, desc(df)) slice(spec, 4:7) filter(spec, df == 2) ## End(Not run)
Prints a list of sampling statements extracted from the model
block of
a Stan program, with each labelled "parameter" or "data" depending on the
type of variable being sampled.
extract_samp_stmts(object)
extract_samp_stmts(object)
object |
A |
Invisbly returns a list of sampling formulas.
## Not run: extract_samp_stmts(eightschools_m) #> Sampling statements for model 2c8d1d8a30137533422c438f23b83428: #> parameter eta ~ std_normal() #> data y ~ normal(theta, sigma) ## End(Not run)
## Not run: extract_samp_stmts(eightschools_m) #> Sampling statements for model 2c8d1d8a30137533422c438f23b83428: #> parameter eta ~ std_normal() #> data y ~ normal(theta, sigma) ## End(Not run)
Takes a vector of weights, or data frame or list containing sets of weights, and resamples indices for use in later computation.
get_resampling_idxs(x, frac = 1, replace = T)
get_resampling_idxs(x, frac = 1, replace = T)
x |
A vector of weights, a list of weight vectors, or a data frame of
type |
frac |
A real number giving the fraction of draws to resample; the
default, 1, resamples all draws. Smaller values should be used when
|
replace |
Whether sampling should be with replacement. When weights
are extreme it may make sense to use |
A vector, list, or data frame, depending of the type of x
,
containing the sampled indices. If any weights are NA
, the indices
will also be NA
.
## Not run: spec = make_spec(eta ~ student_t(df, 0, 1), df=1:10) adjusted = adjust_weights(spec, eightschools_m) get_resampling_idxs(adjusted) get_resampling_idxs(adjusted, frac=0.1, replace=FALSE) ## End(Not run)
## Not run: spec = make_spec(eta ~ student_t(df, 0, 1), df=1:10) adjusted = adjust_weights(spec, eightschools_m) get_resampling_idxs(adjusted) get_resampling_idxs(adjusted, frac=0.1, replace=FALSE) ## End(Not run)
Takes a set of new sampling statements, which can be parametrized by other
arguments, data frames, or lists, and creates an adjustr_spec
object
suitable for use in adjust_weights
.
make_spec(...)
make_spec(...)
... |
Model specification. Each argument can either be a formula, a named vector, data frames, or lists. Formula arguments provide new sampling statements to replace their
counterparts in the original Stan model. All such formulas must be of the
form For named vector arguments, each entry of the vector will be substituted into the corresponding parameter in the sampling statements. For data frames, each entry in each column will be substituted into the corresponding parameter in the sampling statements. List arguments are coerced to data frames. They can either be lists of named vectors, or lists of lists of single-element named vectors. The lengths of all parameter arguments must be consistent. Named vectors can have length 1 or must have length equal to the number of rows in all data frame arguments and the length of list arguments. |
An object of class adjustr_spec
, which is essentially a list
with two elements: samp
, which is a list of sampling formulas, and
params
, which is a list of lists of parameters. Core
dplyr verbs which don't involve grouping
(filter
, arrange
,
mutate
, select
,
rename
, and slice
) are
supported and operate on the underlying table of specification parameters.
adjust_weights
, summarize.adjustr_weighted
, spec_plot
make_spec(eta ~ cauchy(0, 1)) make_spec(eta ~ student_t(df, 0, 1), df=1:10) params = tidyr::crossing(df=1:10, infl=c(1, 1.5, 2)) make_spec(eta ~ student_t(df, 0, 1), y ~ normal(theta, infl*sigma), params)
make_spec(eta ~ cauchy(0, 1)) make_spec(eta ~ student_t(df, 0, 1), df=1:10) params = tidyr::crossing(df=1:10, infl=c(1, 1.5, 2)) make_spec(eta ~ student_t(df, 0, 1), y ~ normal(theta, infl*sigma), params)
adjustr_weighted
ObjectThis function modifies the default behavior of dplyr::pull
to extract
the .weights
column.
## S3 method for class 'adjustr_weighted' pull(.data, var = ".weights", name = NULL, ...)
## S3 method for class 'adjustr_weighted' pull(.data, var = ".weights", name = NULL, ...)
.data |
A table of data |
var |
A variable, as in |
name |
Ignored |
... |
Ignored |
Uses weights computed in adjust_weights
to plot posterior
quantities of interest versus specification parameters
spec_plot( x, by, post, only_mean = FALSE, ci_level = 0.8, outer_level = 0.95, ... )
spec_plot( x, by, post, only_mean = FALSE, ci_level = 0.8, outer_level = 0.95, ... )
x |
An |
by |
The x-axis variable, which is usually one of the specification
parameters. Can be set to |
post |
The posterior quantity of interest, to be computed for each
resampled draw of each specificaiton. Should evaluate to a single number
for each draw. Automatically quoted and evaluated in the context of |
only_mean |
Whether to only plot the posterior mean. May be more stable. |
ci_level |
The inner credible interval to plot. Central 100*ci_level posterior draws. |
outer_level |
The outer credible interval to plot. |
... |
Ignored. |
A ggplot
object which can be further
customized with the ggplot2 package.
adjust_weights
, summarize.adjustr_weighted
## Not run: spec = make_spec(eta ~ student_t(df, 0, scale), df=1:10, scale=seq(2, 1, -1/9)) adjusted = adjust_weights(spec, eightschools_m) spec_plot(adjusted, df, theta[1]) spec_plot(adjusted, df, mu, only_mean=TRUE) spec_plot(adjusted, scale, tau) ## End(Not run)
## Not run: spec = make_spec(eta ~ student_t(df, 0, scale), df=1:10, scale=seq(2, 1, -1/9)) adjusted = adjust_weights(spec, eightschools_m) spec_plot(adjusted, df, theta[1]) spec_plot(adjusted, df, mu, only_mean=TRUE) spec_plot(adjusted, scale, tau) ## End(Not run)
Uses weights computed in adjust_weights
to compute posterior
summary statistics. These statistics can be compared against their reference
values to quantify the sensitivity of the model to aspects of its
specification.
## S3 method for class 'adjustr_weighted' summarise(.data, ..., .resampling = FALSE, .model_data = NULL) ## S3 method for class 'adjustr_weighted' summarize(.data, ..., .resampling = FALSE, .model_data = NULL)
## S3 method for class 'adjustr_weighted' summarise(.data, ..., .resampling = FALSE, .model_data = NULL) ## S3 method for class 'adjustr_weighted' summarize(.data, ..., .resampling = FALSE, .model_data = NULL)
.data |
An |
... |
Name-value pairs of expressions. The name of each argument will be
the name of a new variable, and the value will be computed for the
posterior distribution of eight alternative specification. For example,
a value of Also supported is the custom function The arguments in |
.resampling |
Whether to compute summary statistics by first resampling
the data according to the weights. Defaults to |
.model_data |
Stan model data, if not provided in the earlier call to
|
An adjustr_weighted
object, wth the new columns specified in
...
added.
## Not run: model_data = list( J = 8, y = c(28, 8, -3, 7, -1, 1, 18, 12), sigma = c(15, 10, 16, 11, 9, 11, 10, 18) ) spec = make_spec(eta ~ student_t(df, 0, 1), df=1:10) adjusted = adjust_weights(spec, eightschools_m) summarize(adjusted, mean(mu), var(mu)) summarize(adjusted, wasserstein(mu, p=2)) summarize(adjusted, diff_1 = mean(y[1] - theta[1]), .model_data=model_data) summarize(adjusted, quantile(tau, probs=c(0.05, 0.5, 0.95))) ## End(Not run)
## Not run: model_data = list( J = 8, y = c(28, 8, -3, 7, -1, 1, 18, 12), sigma = c(15, 10, 16, 11, 9, 11, 10, 18) ) spec = make_spec(eta ~ student_t(df, 0, 1), df=1:10) adjusted = adjust_weights(spec, eightschools_m) summarize(adjusted, mean(mu), var(mu)) summarize(adjusted, wasserstein(mu, p=2)) summarize(adjusted, diff_1 = mean(y[1] - theta[1]), .model_data=model_data) summarize(adjusted, quantile(tau, probs=c(0.05, 0.5, 0.95))) ## End(Not run)