Overview
SeverityEstimate fits a Bayesian model to estimate the
infection fatality ratio (IFR) and symptomatic infection rate (SIR) from
case line list data, accounting for the fact that active and passive
surveillance systems detect cases at different rates and with different
symptom profiles. The model is based on Lessler et al. (2016) doi:10.1093/aje/kwv452.
The lightest workflow is to format your data with time,
detection, and outcome columns, plus any
strata columns you want to model, then call default_model()
followed by fit(). For more customized models, you can
still build a SeverityEstimateModel manually and configure
it with the individual set_*() helpers.
old <- options(mc.cores = 1L)
on.exit(options(old))
library(SeverityEstimate)
#> Loading required package: checkmate
#> Loading required package: rstan
#> Loading required package: StanHeaders
#>
#> rstan version 2.32.7 (Stan version 2.32.2)
#> For execution on a local, multicore CPU with excess RAM we recommend calling
#> options(mc.cores = parallel::detectCores()).
#> To avoid recompilation of unchanged Stan programs, we recommend calling
#> rstan_options(auto_write = TRUE)
#> For within-chain threading using `reduce_sum()` or `map_rect()` Stan functions,
#> change `threads_per_chain` option:
#> rstan_options(threads_per_chain = 1)Sample data
We start by generating a synthetic line list with
create_sample_linelist(). This requires a
strata table describing each subgroup’s true underlying
parameters - the infection fatality ratio (ifr), the
symptomatic infection rate (sir), and the population size —
along with a sequence of observation times and detection probabilities
for active and passive surveillance.
Here we define three age groups observed weekly over a 13-week period, with a constant force of infection of 0.005 across all strata and time points.
strata <- do.call(rbind, lapply(list(
list(age = "youth", ifr = 0.25, sir = 0.50, population = 10000L),
list(age = "adult", ifr = 0.30, sir = 0.65, population = 20000L),
list(age = "senior", ifr = 0.35, sir = 0.80, population = 5000L)
), as.data.frame))
times <- seq(as.Date("2024-01-01"), as.Date("2024-3-31"), "+7 days")
force_of_infection <- matrix(
data = 0.005, nrow = length(times), ncol = nrow(strata)
)
linelist <- create_sample_linelist(
strata,
times,
0.15, # active detection probability
0.05, # passive detection probability for asymptomatic cases
0.95, # passive detection probability for symptomatic cases
force_of_infection = force_of_infection,
seed = 123L
)
head(linelist, n = 20L)
#> patient time age detection outcome
#> 1 UID1 2024-01-01 youth Active Asymptomatic
#> 2 UID2 2024-01-01 youth Active Asymptomatic
#> 3 UID3 2024-01-01 youth Active Asymptomatic
#> 4 UID4 2024-01-01 youth Active Asymptomatic
#> 5 UID5 2024-01-01 youth Active Symptomatic
#> 6 UID6 2024-01-01 youth Active Symptomatic
#> 7 UID7 2024-01-08 youth Active Asymptomatic
#> 8 UID8 2024-01-08 youth Active Asymptomatic
#> 9 UID9 2024-01-08 youth Active Asymptomatic
#> 10 UID10 2024-01-08 youth Active Asymptomatic
#> 11 UID11 2024-01-08 youth Active Asymptomatic
#> 12 UID12 2024-01-08 youth Active Asymptomatic
#> 13 UID13 2024-01-08 youth Active Symptomatic
#> 14 UID14 2024-01-08 youth Active Symptomatic
#> 15 UID15 2024-01-08 youth Active Symptomatic
#> 16 UID16 2024-01-08 youth Active Death
#> 17 UID17 2024-01-08 youth Active Death
#> 18 UID18 2024-01-08 youth Active Death
#> 19 UID19 2024-01-15 youth Active Asymptomatic
#> 20 UID20 2024-01-15 youth Active AsymptomaticThe resulting line list has one row per observed case, with columns
for the patient identifier, time step, age stratum, detection type
(Active or Passive), and outcome
(Asymptomatic, Symptomatic, or
Death).
summary(linelist)
#> patient time age detection
#> Length:1502 Min. :2024-01-01 Length:1502 Length:1502
#> Class :character 1st Qu.:2024-01-22 Class :character Class :character
#> Mode :character Median :2024-02-12 Mode :character Mode :character
#> Mean :2024-02-12
#> 3rd Qu.:2024-03-04
#> Max. :2024-03-25
#> outcome
#> Length:1502
#> Class :character
#> Mode :character
#>
#>
#> Building and fitting the model
default_model() is the shortest path from formatted data
to a fitted model. It expects the line list to contain
time, detection, and outcome, and
it treats every other line-list column as a strata column with
degrees_of_freedom = 0L. Because the sample line list also
contains a patient identifier, we first drop that column before
constructing the model.
default_model() also sets the weakly informative
detection priors used throughout this package:
- Active detection:
Beta(1, 1) - Passive asymptomatic detection:
Beta(1, 3) - Passive symptomatic detection:
Beta(3, 1)
Once constructed, fit() samples from the package’s
precompiled Stan model, passing any additional arguments
(e.g. chains, iter) through to
rstan::sampling(). The settings below are intentionally
small to keep this vignette reasonably quick to render. They are not
sufficient to assess convergence or support serious inference. In
practice you’d want to use more chains and iterations and review the
usual Stan diagnostics.
linelist_model_data <- linelist[, c("time", "age", "detection", "outcome")]
population <- strata[, c("age", "population")]
model <- default_model(linelist_model_data, population)
model
#> Severity Estimate Model:
#>
#> Data:
#> dataset rows columns
#> line_list 1502 4
#> population 3 2
#>
#> Detection Probability Priors:
#> active prior: beta(1.0, 1.0)
#> passive_asymptomatic prior: beta(1.0, 3.0)
#> passive_symptomatic prior: beta(3.0, 1.0)
#>
#> Timesteps:
#> time: 2024-01-01 to 2024-03-25 (13 timesteps)
#>
#> Detection:
#> column: detection
#> active: 350 cases (values: Active)
#> passive: 1152 cases (values: Passive)
#>
#> Outcome:
#> column: outcome
#> asymptomatic: 154 cases (values: Asymptomatic)
#> symptomatic: 790 cases (values: Symptomatic)
#> severe: 558 cases (values: Death)
#>
#> Strata:
#> age: 3 levels, df = 0 (adult, senior, youth)
estimate <- fit(
model,
chains = 1L,
iter = 250L,
seed = 123L,
refresh = 0
)
#> Warning: The largest R-hat is NA, indicating chains have not mixed.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#r-hat
#> Warning: Bulk Effective Samples Size (ESS) is too low, indicating posterior means and medians may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#bulk-ess
#> Warning: Tail Effective Samples Size (ESS) is too low, indicating posterior variances and tail quantiles may be unreliable.
#> Running the chains for more iterations may help. See
#> https://mc-stan.org/misc/warnings.html#tail-ess
summary(estimate)
#> Detection Rates:
#> Estimate
#> passive_asymptomatic 0.05704
#> passive_symptomatic 0.95175
#> active 0.15974
#>
#> Severity Estimates:
#> age IFR Estimate SIR Estimate
#> adult 0.3097 0.6554
#> senior 0.3005 0.7675
#> youth 0.2400 0.5198If you need more control, you can still start from
SeverityEstimateModel() and use set_strata(),
set_timesteps(), set_detection(),
set_outcome(), and the prior setters directly. That is
useful, for example, when you want a smoothed ordered strata effect with
degrees_of_freedom > 0L rather than the default
unsmoothed categorical effect.
Extracting results
Detection rate parameters
calculate_parameter_estimates() returns a summary of the
three detection rate parameters fitted by the model: the active
detection rate, the passive asymptomatic detection rate, and the passive
symptomatic detection rate. The alpha argument controls
which credible interval bounds are returned.
calculate_parameter_estimates(estimate, alpha = 0.05)
#> parameter parameter_description
#> 1 active_detection active detection rate
#> 2 passive_asymptomatic_detection mildly/asymptomatic passive detection rate
#> 3 passive_symptomatic_detection severe symptoms passive detection rate
#> mean_estimate median_estimate lower_05 upper_05
#> 1 0.15974402 0.15968138 0.1385260 0.18011611
#> 2 0.05703974 0.05604461 0.0369103 0.08813589
#> 3 0.95174877 0.96341735 0.8526623 0.99385044Fatality ratios
calculate_fatality_ratio() returns IFR and SIR estimates
broken down by stratum. Setting naive_estimate = TRUE also
includes a naive observed ratio computed directly from the raw counts,
which provides a useful point of comparison against the model-adjusted
estimates.
calculate_fatality_ratio(
estimate, median_estimate = FALSE, naive_estimate = TRUE
)
#> age ifr_mean_estimate ifr_lower_05 ifr_upper_05 sir_mean_estimate
#> 1 adult 0.3096621 0.2777563 0.3424366 0.6553673
#> 2 senior 0.3005075 0.2438216 0.3661525 0.7675303
#> 3 youth 0.2400402 0.1994504 0.2927146 0.5197518
#> sir_lower_05 sir_upper_05 naive_ifr naive_sir
#> 1 0.6052815 0.7026613 0.3890135 0.9047085
#> 2 0.6653631 0.8383285 0.3495935 0.9471545
#> 3 0.4442299 0.5791658 0.3434066 0.8461538