Getting Started¶

This tutorial will guide you through the basic steps to get started with the vaxflux package, which is designed for modeling vaccine uptake prevalence using various curve families. This particular tutorial introduces the core concepts of the package, generates a synthetic dataset, how to fit a model to this data, and steps for working with a fitted model.

Selecting A Curve Family¶

At the core of vaxflux is the concept of a curve family. A curve family is a set of curves that share a common structure and are used for modeling the uptake prevalence of a vaccine. Advanced users can also create their own curve families to model different types of vaccine uptake patterns.

from vaxflux import LogisticCurve

logistic_curve = LogisticCurve()

The logistic curve is a common choice for modeling vaccine uptake because it captures the typical S-shaped curve seen in many vaccination campaigns, where uptake starts slowly, accelerates, and then levels off as the population becomes saturated with the vaccine. However, vaxflux makes some modifications to the typical logistic curve to better suit the requirements of vaccine uptake modeling. For details on how the logistic curve is defined in vaxflux, refer to the vaxflux.LogisticCurve class documentation.

Season And Date Ranges¶

In vaxflux, a season is defined as a period of time during which vaccine uptake is measured. The package allows you to specify the start and end dates of a season, which is crucial for modeling vaccine uptake patterns over time. These are specified to the model using the vaxflux.SeasonRange class. Within each season, you can define a date range that represents the period during which vaccine uptake is observed, which is done using the vaxflux.DateRange class. For this example we will create a season that starts the 1st Monday of October and ends the 1st Sunday of the following February for the 2022/23, 2023/24, and 2024/25 seasons.

Then we'll define date ranges for each of these seasons that span a week using the vaxflux.daily_date_ranges function. The daily_date_ranges function generates a list of date ranges for each season, where each date range represents a period of time during which vaccine uptake is observed. The range_days argument specifies the number of days in each date range where ranges are defined start date inclusive end date exclusive.

from vaxflux import DateRange, SeasonRange, daily_date_ranges

seasons = [
    SeasonRange(season="2022/23", start_date="2022-10-03", end_date="2023-02-05"),
    SeasonRange(season="2023/24", start_date="2023-10-02", end_date="2024-02-04"),
    SeasonRange(season="2024/25", start_date="2024-10-07", end_date="2025-02-02"),
]
dates = daily_date_ranges(seasons, range_days=6)

Defining Covariates¶

In vaxflux, covariates are additional variables that can influence vaccine uptake patterns. These can include demographic information, geographic data, or any other relevant factors. Covariates and their categories are defined using the vaxflux.CovariateCategories class. In this example, we will create a single "age" covariate with three categories, "youth", "adult", and "elderly", which loosely correspond to 0-17 yrs, 18-65 yrs, and 65+ yrs, respectively. This covariate will be used to model how vaccine uptake varies across different age groups.

from vaxflux import CovariateCategories

age_covariate = CovariateCategories(
    covariate="age",
    categories=["youth", "adult", "elderly"],
)

Create A Sample Dataset¶

As a first step in using the package you can create a sample dataset with vaxflux.data.sample_dataset. This will create a dataset with the same data generating process that the model assumes in the format needed for vaxflux. The function requires a curve family, a list of seasons and dates, a list of covariates, a list of parameters for the curve family for each covariate category, an observational noise level, and a random seed for reproducibility. Many of these arguments have already been defined in the previous sections.

The parameters are defined as a list of tuples, where each tuple contains the curve family parameter name, season, covariate category/categories, and the value for that parameter. The parameters for a logistic curve can be found in the vaxflux.LogisticCurve documentation, but in brief they are:

\(m\): The max uptake prevalence, which is the maximum value of the logistic curve.
\(r\): The rate of change of the uptake prevalence, which determines how quickly the curve rises.
\(s\): The switch point of the curve, which is the inflection point of the prevalence curve.

The epsilon argument defines the observational noise level. By default, observations are drawn with gamma noise; you can switch to a normal noise model by passing noise="normal" to sample_dataset.

from vaxflux.data import sample_dataset

parameters = [
    ("m", "2022/23", "youth",   -0.5),
    ("m", "2022/23", "adult",    0.5),
    ("m", "2022/23", "elderly",  1.2),
    ("r", "2022/23", "youth",   -3.2),
    ("r", "2022/23", "adult",   -3.2),
    ("r", "2022/23", "elderly", -3.2),
    ("s", "2022/23", "youth",   40.0),
    ("s", "2022/23", "adult",   40.0),
    ("s", "2022/23", "elderly", 40.0),
    ("m", "2023/24", "youth",   -0.525),
    ("m", "2023/24", "adult",    0.53),
    ("m", "2023/24", "elderly",  1.235),
    ("r", "2023/24", "youth",   -3.1),
    ("r", "2023/24", "adult",   -3.1),
    ("r", "2023/24", "elderly", -3.1),
    ("s", "2023/24", "youth",   42.0),
    ("s", "2023/24", "adult",   42.0),
    ("s", "2023/24", "elderly", 42.0),
    ("m", "2024/25", "youth",   -0.51),
    ("m", "2024/25", "adult",    0.52),
    ("m", "2024/25", "elderly",  1.22),
    ("r", "2024/25", "youth",   -3.0),
    ("r", "2024/25", "adult",   -3.0),
    ("r", "2024/25", "elderly", -3.0),
    ("s", "2024/25", "youth",   44.0),
    ("s", "2024/25", "adult",   44.0),
    ("s", "2024/25", "elderly", 44.0),
]
sample_observations = sample_dataset(
    logistic_curve,
    seasons,
    dates,
    [age_covariate],
    parameters,
    0.0005,
    noise="gamma",
    random_seed=42,
)

Defining A Model¶

Now that many of the building blocks of the model have been defined we can create a model represented by the vaxflux.VaxfluxModel class. This class encapsulates the entire modeling process, including the curve family, seasons, dates, covariates, and the sample dataset. Many of the arguments to this class have already been defined in the previous sections, but we will also need to define the prior distributions for the covariates that will be used in the model.

In this example we will use partially pooled Gaussian covariates for seasonal effects and an age effect on the max uptake parameter. The vaxflux.PartiallyPooledGaussianCovariate class provides a convenient way to express these priors in the NumPyro-based model. Note that the priors are loosely centered around the values used to generate the sample dataset, but they are not exact. This is because the model will learn the parameters from the data, and the priors are used to inform the model about reasonable ranges for these parameters. For more information on model details and how to inform prior distributions please refer to the model-details section of the documentation.

from vaxflux import PartiallyPooledGaussianCovariate, VaxfluxModel

covariates = [
    PartiallyPooledGaussianCovariate(
        parameter="m",
        covariate=None,
        mu=(0.0, 0.75),
        sigma=0.75,
    ),
    PartiallyPooledGaussianCovariate(
        parameter="r",
        covariate=None,
        mu=(-2.5, 1.5),
        sigma=1.5,
    ),
    PartiallyPooledGaussianCovariate(
        parameter="s",
        covariate=None,
        mu=(35.0, 20.0),
        sigma=20.0,
    ),
    PartiallyPooledGaussianCovariate(
        parameter="m",
        covariate="age",
        mu=(1.0, 1.0),
        sigma=1.0,
    ),
]
model = (
    VaxfluxModel(curve=logistic_curve)
    .add_seasons(seasons)
    .add_dates(dates)
    .add_covariate_categories(age_covariate)
    .add_covariates(covariates)
    .add_observations(sample_observations)
    .add_observation_process(
        kind="normal",
        noise=0.005,
        partially_pool_by_season=False,
        prevalence_penalty=10.0,
    )
)

Fitting The Model¶

Finally, we can fit the model to the sample dataset using the vaxflux.VaxfluxModel.sample method. This method handles prior predictive sampling, MCMC fitting, and posterior predictive sampling in one call. In this example, we will run a small prior predictive sample and then fit with 2,000 warmup steps, 1,000 samples, and 2 chains. The return value is a VaxfluxInferenceData object compatible with ArviZ.

If you are following along with this tutorial with a REPL or notebook please note that the following code will take anywhere from a few seconds to a few minutes to run. Adjust the arguments to sample appropriately.

vaxflux_idata = model.sample(
    prior_samples=500,
    warmup=2_000,
    samples=1_000,
    chains=2,
)
vaxflux_idata

Conclusion¶

In this tutorial, we have covered the basic steps to get started with the vaxflux package and the key building blocks that are used to create a model. Like this tutorial, when using vaxflux the loose steps to construction a model are:

Select a curve family that represents the vaccine uptake pattern you want to model.
Define the seasons and date ranges for the model.
Define the covariates and their categories that will be used in the model.
Either loading or creating a dataset that contains the vaccine uptake data.
Define the model using the VaxfluxModel class, including the curve family, seasons, dates, covariates, and observations.
Fit the model to the data using the sample method.