2b: Introduction to panel data versioning and nowcasting • AMPHForecastSuite

Introduction

This vignette walks through a minimal baseline nowcasting workflow using the baselinenowcast and epinowcast packages. We focus on weekly confirmed hospital admissions attributed to influenza in Maryland and illustrate how to: - retrieve revision-aware surveillance data, - build a reporting triangle, - estimate a reporting delay distribution, and - generate a deterministic point nowcast that corrects for under-reporting.

We repeat these steps with data from the baselinenowcast documentation.

Setup

We assume you have followed the first guide of this course, and have an API key ready for the DELPHI Epidata API.

epidatr::get_api_key()

We will nowcast the confirmed hospital admission for influenza in Maryland. We specify the date at which we are nowcasting, and the date with the final data for evaluation.

state_name <- "Maryland"
geo_values <- "md"
forecast_disease <- "influenza"
nowcast_date = "2025-02-10"
eval_date = "2025-03-21"

Load the required packages.

Retrieve and Prepare Surveillance Data

Use epidatr::pub_covidcast() to pull National Healthcare Safety Network (NHSN) hospital admission data for the selected disease and location. The data are returned with revision tracking (issue) that we rely on for nowcasting.

  # Map disease names to NHSN signal names
  # Based on epidatr NHSN signals for respiratory diseases
  signal_map <- list(
    "influenza" = "confirmed_admissions_flu_ew",
    "covid" = "confirmed_admissions_covid_ew",
    "rsv" = "confirmed_admissions_rsv_ew"
  )
  signal <- signal_map[[forecast_disease]]

  # Call epidatr to get the data
  epidata <- epidatr::pub_covidcast(
    source = "nhsn",
    signals = signal,
    geo_type = "state",
    time_type = "week",
    geo_values = tolower(geo_values),
    issues = "*"
  )

## Warning: No API key found. You will be limited to non-complex queries and encounter rate
## limits if you proceed.
## ℹ See `?save_api_key()` for details on obtaining and setting API keys.
## This warning is displayed once every 8 hours.

  target_data <- epidata |>
    select(
      location = geo_value,
      reference_date = time_value,
      report_date = issue,
      confirm = value
    ) |>
    epinowcast::enw_filter_report_dates(latest_date = eval_date) |>
    epinowcast::enw_filter_reference_dates(latest_date = nowcast_date)
  
    target_data$report_date %>% unique()

##  [1] "2024-11-17" "2024-11-24" "2024-12-01" "2024-12-08" "2024-12-15"
##  [6] "2024-12-22" "2024-12-29" "2025-01-05" "2025-01-12" "2025-01-19"
## [11] "2025-01-26" "2025-02-02" "2025-02-09" "2025-02-16" "2025-02-23"
## [16] "2025-03-02" "2025-03-09" "2025-03-16"

Here we have transformed the data so it is in the format for epinowcast, with columns * reference_date the date of the observation, in this example the date of an incident influenza admission in a hospital * report_date: the date of report for a given set of observations by reference date * confirm: the total (i.e. cumulative) number of hospitalisations by reference date and report date.

target_data

##       location reference_date report_date confirm
##         <char>         <IDat>      <IDat>   <num>
##    1:       md     2020-08-02  2024-11-17      NA
##    2:       md     2020-08-02  2024-11-24      NA
##    3:       md     2020-08-02  2024-12-01      NA
##    4:       md     2020-08-02  2024-12-08      NA
##    5:       md     2020-08-02  2024-12-15      NA
##   ---                                            
## 4171:       md     2025-02-09  2025-02-16    1035
## 4172:       md     2025-02-09  2025-02-23    1081
## 4173:       md     2025-02-09  2025-03-02    1081
## 4174:       md     2025-02-09  2025-03-09    1081
## 4175:       md     2025-02-09  2025-03-16    1081

Create two views of the data: the real-time version available on the nowcast date and the final version used for evaluation. This use very convenient filtering functions in epinowcast. The epinowcast function epinowcast::enw_latest_data() filters observations to keep only the latest available reported total counts for each reference date. The epinowcast function epinowcast::enw_filter_report_dates() is used to create a truncated dataset for generating a retrospective nowcast, using the data that would have been available as of the nowcast date.

observed_data <- epinowcast::enw_filter_report_dates(target_data, latest_date = nowcast_date)
obs_by_reference <- epinowcast::enw_latest_data(observed_data)

latest_data <- epinowcast::enw_latest_data(target_data) |>
  epinowcast::enw_filter_reference_dates(latest_date = max(observed_data$reference_date))

Let’s plot our two datasets, the red line shows the total number of confirmed admissions on each reference date, across all delays, using the data available at the nowcast date, whereas the blackline show the final value of the data.

obs_data_by_reference_date <- epinowcast::enw_latest_data(observed_data)

plot_data <- ggplot() +
  geom_line(
    data = obs_data_by_reference_date,
    aes(x = reference_date, y = confirm), color = "darkred"
  ) +
  geom_line(
    data = latest_data,
    aes(x = reference_date, y = confirm), color = "black"
  ) +
  theme_bw() +
  xlab("Reference date") +
  ylab(signal) +
  #scale_y_continuous(trans = "log10") +
  xlim(as.Date("2024-11-01"), as.Date(nowcast_date))
  ggtitle("Comparing real-time and later observed cases")

## <ggplot2::labels> List of 1
##  $ title: chr "Comparing real-time and later observed cases"

plot_data

## Warning: Removed 222 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Removed 222 rows containing missing values or values outside the scale range
## (`geom_line()`).

We see that some revision occurred ! Let’s try to fix these.

Visualizing all versions on the same plot

Now let’s create a plot showing all different report date versions in different colors on the same graph. This helps us understand how the data evolved over time as new reports came in.

# Get all unique report dates to create snapshots
all_report_dates <- target_data %>%
  pull(report_date) %>%
  unique() %>%
  sort()

# Create a dataframe with the latest data available as of each report date
# Using epinowcast::enw_latest_data() to properly get the latest value for each reference date
all_versions <- map_dfr(all_report_dates, function(rd) {
  target_data %>%
    epinowcast::enw_filter_report_dates(latest_date = rd) %>%
    epinowcast::enw_latest_data() %>%
    mutate(version = rd)
})

# Create plot with all versions in different colors
ggplot(all_versions, aes(x = reference_date, y = confirm, color = as.factor(version))) +
  geom_line(linewidth = 0.8) +
  theme_bw() +
  xlab("Reference date") +
  ylab(signal) +
  labs(
    title = "All Data Versions by Report Date",
    subtitle = "Each color represents the data as it appeared on a different report date",
    color = "Report Date"
  ) +
  scale_color_viridis_d(option = "D", end = 0.9) +
  theme(legend.position = "right",
        legend.key.height = unit(0.4, "cm")) +
  xlim(as.Date("2024-11-01"), as.Date(nowcast_date))

## Warning: Removed 3996 rows containing missing values or values outside the scale range
## (`geom_line()`).

This plot shows how the reported values for each reference date changed over time as more complete data became available. Earlier versions (earlier colors in the legend) show lower values due to reporting delays, while later versions show progressively higher and more complete counts.

Build the Reporting Triangle

Restrict the data to a training window and construct the reporting triangle, which stores rows of reference weeks and columns of reporting delays.

# Empirical data outside this delay window will not be used for training
max_delay <- 4 # weeks
n_training_volume <- 8*7 # days (30 weeks)

Next we will use the epinowcast function, epinowcast::enw_filter_reference_dates() to filter to only include n_training_volume days of historical data, and the epinowcast function epinowcast::enw_latest_data() will be used to filter for the latest available reported total counts for each reference date. Finally to obtain the data we want to evaluate the forecasts against, we will use epinowcast::enw_filter_reference_dates() applied to the target_data, to filter it for only the n_training_volume days of historical data.

training_data <- epinowcast::enw_filter_reference_dates(observed_data, include_days = n_training_volume)

latest_training_data <- epinowcast::enw_latest_data(training_data)

eval_data <- epinowcast::enw_filter_reference_dates(latest_data, include_days = n_training_volume-1)

pobs <- epinowcast::enw_preprocess_data(
  obs = training_data,
  max_delay = max_delay+2,
  timestep="week",
  set_negatives_to_zero = TRUE
)

## Warning: The coverage of the specified maximum delay could not be reliably checked.
## • There are only very few (4) reference dates that are sufficiently far in the
##   past (more than 27 days) to compute coverage statistics for the maximum
##   delay.
## • You can test different maximum delays and obtain coverage statistics using
##   the function check_max_delay() (`?epinowcast::check_max_delay()`).

reporting_triangle_df <- select(
  pobs$new_confirm[[1]],
  reference_date,
  delay,
  new_confirm
)

reporting_triangle <- reporting_triangle_df |>
  pivot_wider(names_from = delay, values_from = new_confirm) |>
  select(-reference_date) |>
  as.matrix()

reporting_triangle[is.na(reporting_triangle)] <- 0

reporting_triangle

##        0  1 2 3 4
## [1,]  94 37 0 0 0
## [2,] 134 44 0 0 0
## [3,] 367 40 0 0 0
## [4,] 366  0 0 6 0
## [5,] 459  0 0 0 0
## [6,] 582  0 0 0 0
## [7,] 770 40 0 0 0
## [8,] 962  0 0 0 0

Look at the output of the triangle, what do you notice ?

Inspect the reporting triangle using a heat map.

#### code to plot:
triangle_df <- as.data.frame(reporting_triangle) |>
  mutate(time = row_number()) |>
  pivot_longer(!time,
    values_to = "count",
    names_prefix = "V",
    names_to = "delay"
  ) |>
  mutate(delay = as.numeric(delay))

plot_triangle <- ggplot(
  triangle_df,
  aes(x = delay, y = time, fill = count)
) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "red") +
  labs(title = "Reporting triangle", x = "Delay", y = "Time") +
  theme_bw() +
  scale_y_reverse()

plot_triangle

Here, the grey indicates matrix elements that are NA, which we would expect to be the case in the bottom right portion of the reporting triangle where the counts have yet to be observed.

Estimate Delay and Generate Point Nowcasts

We use half of the available reference weeks to estimate the delay distribution and reserve the remaining weeks for evaluation.

# most recent 50% of the reference times for delay estimation
n_history_delay <- as.integer(0.5 * n_training_volume) # days

delay_pmf <- baselinenowcast::estimate_delay(
  reporting_triangle = reporting_triangle,
  max_delay = max_delay,
  n = n_history_delay/7
)


delay_df <- data.frame(
  delay = 0:(length(delay_pmf) - 1),
  pmf = delay_pmf
)

delay_cdf_plot <- ggplot(delay_df) +
  geom_line(aes(x = delay, y = cumsum(pmf))) +
  xlab("Delay") +
  ylab("Cumulative proportion reported") +
  ggtitle("Empirical point estimate of cumulative proportion reported by delay") + # nolint
  theme_bw()

delay_pmf_plot <- ggplot(delay_df) +
  geom_line(aes(x = delay, y = pmf)) +
  xlab("Delay") +
  ylab("Proportion reported") +
  ggtitle("Empirical point estimate of proportion reported by delay") +
  theme_bw()

Let’s plot the distribution of delays

delay_cdf_plot

delay_pmf_plot

Apply the delay to generate a point nowcast

point_nowcast_matrix <- baselinenowcast::apply_delay(
  reporting_triangle = reporting_triangle,
  delay_pmf = delay_pmf
)

point_nowcast_df <- eval_data |>
  mutate(nowcast = rowSums(point_nowcast_matrix))

prep_latest_data <- latest_training_data |>
  mutate(type = "Real-time data") |>
  select(type, reference_date, count = confirm)
# Combine data into a single dataframe for plotting
plot_data <- point_nowcast_df |>
  pivot_longer(
    cols = c(confirm, nowcast),
    names_to = "type",
    values_to = "count"
  ) |>
  mutate(type = case_when(
    type == "confirm" ~ "Final observed data",
    type == "nowcast" ~ "Point nowcast",
    TRUE ~ type
  )) |>
  bind_rows(prep_latest_data)

# Create plot with data type as a variable
plot_pt_nowcast <- ggplot(plot_data, aes(
  x = reference_date,
  y = count,
  color = type,
  linetype = type
)) +
  geom_line() +
  scale_color_manual(values = c(
    "Real-time data" = "darkred",
    "Final observed data" = "black",
    "Point nowcast" = "darkblue"
  )) +
  scale_linetype_manual(values = c(
    "Real-time data" = "dashed",
    "Final observed data" = "dashed",
    "Point nowcast" = "solid"
  )) +
  theme_bw() +
  xlab("Reference date") +
  ylab("Confirmed admissions") +
  #scale_y_continuous(trans = "log10") +
  ggtitle("Comparing real-time, nowcasted, and later observed cases") +
  theme(legend.position = "bottom") +
  labs(color = "Type", linetype = "Type")
plot_pt_nowcast

As you observe, this example is disappointing, as in this example the point nowcast (blue) sits almost exactly on top of the real-time observations (red). That behaviour aligns with the estimated delay distribution, which places nearly all probability mass on a zero-week delay for the most recent reference times.

Unfortunately, the NHSN signal currently exposed via Epidata only stores revisions back to late 2024-11-17, so we cannot draw on a longer training history to expose weeks with notable late reports.

Indeed, nowcasting is beneficial with a longer history of data (unfortnuataly epidata does not store NHSN issues before 2024-11-17), better nowcasting nodels.

To see larger adjustments, experiment with other locations, earlier nowcast dates, or datasets that contain richer revision patterns (for example the Robert Koch Institute line list available through the Germany Nowcasting Hub).

Next Steps

Use baselinenowcast::estimate_and_apply_uncertainty() to obtain probabilistic nowcast intervals if you need calibrated uncertainty estimates.
Adjust the training window (n_training_weeks) or max_delay to match the reporting dynamics of other diseases or locations.
Explore model variants documented in the baselinenowcast vignette for stratified data or alternative delay models.

a more interesting example

This section follow closely the Getting started of the baselinenowcast package

Dataset

We will work on a more interesting dataset: daily level data from the Robert Koch Institute via the Germany Nowcasting hub. These data represent hospital admission counts by date of positive test and date of test report in Germany up to October 1, 2021. Filter the data to just look at the national-level data, for all age groups. We will pretend that we are making a nowcast as of August 1, 2021, therefore we will exclude all reference dates and report dates from before that date. germany_covid19_hosp is provided as package data from epinowcast

nowcast_date <- "2021-08-01"
eval_date <- "2021-10-01"

target_data <- epinowcast::germany_covid19_hosp[location == "DE"][age_group == "00+"] |>
  epinowcast::enw_filter_report_dates(latest_date = eval_date) |>
  epinowcast::enw_filter_reference_dates(
    latest_date = nowcast_date
  )
latest_data <- epinowcast::enw_latest_data(target_data)
observed_data <- epinowcast::enw_filter_report_dates(
  target_data,
  latest_date = nowcast_date
)

head(observed_data)

##    reference_date location age_group confirm report_date
##            <IDat>   <fctr>    <fctr>   <int>      <IDat>
## 1:     2021-04-06       DE       00+     149  2021-04-06
## 2:     2021-04-07       DE       00+     312  2021-04-07
## 3:     2021-04-08       DE       00+     424  2021-04-08
## 4:     2021-04-09       DE       00+     288  2021-04-09
## 5:     2021-04-10       DE       00+     273  2021-04-10
## 6:     2021-04-11       DE       00+     107  2021-04-11

Click to expand code to create the plot of the latest data

obs_data_by_reference_date <- epinowcast::enw_latest_data(observed_data)
plot_data <- ggplot() +
  geom_line(
    data = obs_data_by_reference_date,
    aes(x = reference_date, y = confirm), color = "darkred"
  ) +
  geom_line(
    data = latest_data,
    aes(x = reference_date, y = confirm), color = "black"
  ) +
  theme_bw() +
  xlab("Reference date") +
  ylab("Confirmed admissions") +
  scale_y_continuous(trans = "log10") +
  ggtitle("Comparing real-time and later observed cases")

plot_data

Visualizing all versions on the same plot (Germany data)

Let’s create the same visualization for the Germany dataset, showing all different report date versions in different colors.

# Get report dates - sample monthly to avoid too many versions
all_report_dates_germany <- target_data %>%
  pull(report_date) %>%
  unique() %>%
  sort()

# Select only the first day of each month (or closest available date)
monthly_report_dates <- all_report_dates_germany %>%
  as.Date() %>%
  tibble(date = .) %>%
  mutate(year_month = format(date, "%Y-%m")) %>%
  group_by(year_month) %>%
  slice_min(date, n = 1) %>%
  pull(date)

# Create a dataframe with the latest data available as of each monthly report date
# Using epinowcast::enw_latest_data() to properly get the latest value for each reference date
all_versions_germany <- map_dfr(monthly_report_dates, function(rd) {
  target_data %>%
    epinowcast::enw_filter_report_dates(latest_date = rd) %>%
    epinowcast::enw_latest_data() %>%
    mutate(version = rd)
})

# Create plot with all versions in different colors
ggplot(all_versions_germany, aes(x = reference_date, y = confirm, color = as.factor(version))) +
  geom_line(linewidth = 0.8) +
  theme_bw() +
  xlab("Reference date") +
  ylab("Confirmed admissions") +
  labs(
    title = "All Data Versions by Report Date (Germany COVID-19)",
    subtitle = "Monthly snapshots showing how the data evolved over time",
    color = "Report Date"
  ) +
  scale_color_viridis_d(option = "D", end = 0.9) +
  scale_y_continuous(trans = "log10") +
  theme(legend.position = "right",
        legend.key.height = unit(0.4, "cm"))

This visualization is much more interesting for the Germany dataset! You can clearly see how the reported values evolved significantly over time, with substantial revisions occurring as more complete data became available.

Let’s build the reporting triangle

max_delay <- 30
n_training_volume <- 3 * max_delay
n_history_delay <- as.integer(0.5 * n_training_volume)

training_data <- epinowcast::enw_filter_reference_dates(
  observed_data,
  include_days = n_training_volume - 1
)
latest_training_data <- epinowcast::enw_latest_data(training_data)
eval_data <- epinowcast::enw_filter_reference_dates(
  latest_data,
  include_days = n_training_volume - 1
)

pobs <- epinowcast::enw_preprocess_data(
  obs = training_data,
  max_delay = max_delay + 1
)
reporting_triangle_df <- select(
  pobs$new_confirm[[1]],
  reference_date,
  delay,
  new_confirm
)
reporting_triangle <- reporting_triangle_df |>
  pivot_wider(names_from = delay, values_from = new_confirm) |>
  select(-reference_date) |>
  as.matrix()

Click to expand code to create the plot of the reporting triangle

triangle_df <- as.data.frame(reporting_triangle) |>
  mutate(time = row_number()) |>
  pivot_longer(!time,
    values_to = "count",
    names_prefix = "V",
    names_to = "delay"
  ) |>
  mutate(delay = as.numeric(delay))

plot_triangle <- ggplot(
  triangle_df,
  aes(x = delay, y = time, fill = count)
) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "blue") +
  labs(title = "Reporting triangle", x = "Delay", y = "Time") +
  theme_bw() +
  scale_y_reverse()

plot_triangle

Much more interesting !!

Let’s estimate the delay

delay_pmf <- estimate_delay(
  reporting_triangle = reporting_triangle,
  max_delay = max_delay,
  n = n_history_delay
)

Click to expand code to create the plot of the delay distribution

delay_df <- data.frame(
  delay = 0:(length(delay_pmf) - 1),
  pmf = delay_pmf
)

delay_cdf_plot <- ggplot(delay_df) +
  geom_line(aes(x = delay, y = cumsum(pmf))) +
  xlab("Delay") +
  ylab("Cumulative proportion reported") +
  ggtitle("Empirical point estimate of cumulative proportion reported by delay") + # nolint
  theme_bw()

delay_pmf_plot <- ggplot(delay_df) +
  geom_line(aes(x = delay, y = pmf)) +
  xlab("Delay") +
  ylab("Proportion reported") +
  ggtitle("Empirical point estimate of proportion reported by delay") +
  theme_bw()

delay_cdf_plot

delay_pmf_plot

Generate nowcast with delay

point_nowcast_matrix <- baselinenowcast::apply_delay(
  reporting_triangle = reporting_triangle,
  delay_pmf = delay_pmf
)

Click to expand code to create the plot of the point nowcast

point_nowcast_df <- eval_data |>
  mutate(nowcast = rowSums(point_nowcast_matrix))

prep_latest_data <- latest_training_data |>
  mutate(type = "Real-time data") |>
  select(type, reference_date, count = confirm)
# Combine data into a single dataframe for plotting
plot_data <- point_nowcast_df |>
  pivot_longer(
    cols = c(confirm, nowcast),
    names_to = "type",
    values_to = "count"
  ) |>
  mutate(type = case_when(
    type == "confirm" ~ "Final observed data",
    type == "nowcast" ~ "Point nowcast",
    TRUE ~ type
  )) |>
  bind_rows(prep_latest_data)

# Create plot with data type as a variable
plot_pt_nowcast <- ggplot(plot_data, aes(
  x = reference_date,
  y = count,
  color = type
)) +
  geom_line() +
  scale_color_manual(values = c(
    "Real-time data" = "darkred",
    "Final observed data" = "black",
    "Point nowcast" = "darkblue"
  )) +
  theme_bw() +
  xlab("Reference date") +
  ylab("Confirmed admissions") +
  scale_y_continuous(trans = "log10") +
  ggtitle("Comparing real-time, nowcasted, and later observed cases") +
  theme(legend.position = "bottom") +
  labs(color = "Type")

plot_pt_nowcast

Estimate uncertainy

For uncertainty estimation, we will generate retrospective nowcast datasets with the most recent 50% of the reference times.

n_retrospective_nowcasts <- as.integer(0.5 * n_training_volume)


trunc_rep_tri_list <- baselinenowcast::truncate_triangles(reporting_triangle,
  n = n_retrospective_nowcasts
)
retro_rep_tri_list <- baselinenowcast::construct_triangles(trunc_rep_tri_list)

retro_pt_nowcast_mat_list <- baselinenowcast::fill_triangles(
  retro_reporting_triangles = retro_rep_tri_list,
  n = n_history_delay
)

disp_params <- baselinenowcast::estimate_uncertainty(
  point_nowcast_matrices = retro_pt_nowcast_mat_list,
  truncated_reporting_triangles = trunc_rep_tri_list,
  retro_reporting_triangles = retro_rep_tri_list,
  n = n_retrospective_nowcasts
)

nowcast_draws_df <- baselinenowcast::sample_nowcasts(
  point_nowcast_matrix, reporting_triangle,
  uncertainty_params = disp_params,
  draws = 100
)

head(nowcast_draws_df)

##   pred_count time draw
## 1        736    1    1
## 2        897    2    1
## 3        893    3    1
## 4        804    4    1
## 5        722    5    1
## 6        492    6    1

Vizualize probablistic nowcast

latest_data_prepped <- latest_training_data |>
  mutate(time = row_number()) |>
  rename(obs_confirm = confirm) |>
  mutate(reference_date = as.Date(reference_date))

# Prepare the final evaluation data so we can combine the datasets.

final_data_prepped <- eval_data |>
  select(reference_date, final_confirm = confirm) |>
  mutate(reference_date = as.Date(reference_date))
# Join the nowcasts, data as of the nowcast date, and the final data.
obs_with_nowcast_draws_df <- nowcast_draws_df |>
  left_join(latest_data_prepped, by = "time") |>
  left_join(final_data_prepped, by = "reference_date")
head(obs_with_nowcast_draws_df)

##   pred_count time draw reference_date location age_group obs_confirm
## 1        736    1    1     2021-05-04       DE       00+         823
## 2        897    2    1     2021-05-05       DE       00+        1028
## 3        893    3    1     2021-05-06       DE       00+        1016
## 4        804    4    1     2021-05-07       DE       00+         892
## 5        722    5    1     2021-05-08       DE       00+         822
## 6        492    6    1     2021-05-09       DE       00+         561
##   report_date final_confirm
## 1  2021-07-24           823
## 2  2021-07-25          1028
## 3  2021-07-26          1016
## 4  2021-07-27           892
## 5  2021-07-28           822
## 6  2021-07-29           561

Click to expand code to create the plot of the probabilistic nowcast

# Create a separate dataframe for only the observed and final data, to make plotting easier.
combined_data <- obs_with_nowcast_draws_df |>
  select(reference_date, obs_confirm, final_confirm) |>
  distinct() |>
  pivot_longer(
    cols = c(obs_confirm, final_confirm),
    names_to = "type",
    values_to = "count"
  ) |>
  mutate(type = case_when(
    type == "obs_confirm" ~ "Observed data",
    type == "final_confirm" ~ "Final observed data"
  ))


# Plot with draws for nowcast only
plot_prob_nowcast <- ggplot() +
  # Add nowcast draws as thin gray lines
  geom_line(
    data = obs_with_nowcast_draws_df,
    aes(
      x = reference_date, y = pred_count, group = draw,
      color = "Nowcast draw", linewidth = "Nowcast draw"
    )
  ) +
  # Add observed data and final data once
  geom_line(
    data = combined_data,
    aes(
      x = reference_date,
      y = count,
      color = type,
      linewidth = type
    )
  ) +
  theme_bw() +
  scale_color_manual(
    values = c(
      "Nowcast draw" = "gray",
      "Observed data" = "darkred",
      "Final observed data" = "black"
    ),
    name = ""
  ) +
  scale_linewidth_manual(
    values = c(
      "Nowcast draw" = 0.2,
      "Observed data" = 1,
      "Final observed data" = 1
    ),
    name = ""
  ) +
  scale_y_continuous(trans = "log10") +
  xlab("Reference date") +
  ylab("Hospital admissions") +
  theme(legend.position = "bottom") +
  ggtitle("Comparison of admissions as of the nowcast date, later observed counts, \n and probabilistic nowcasted counts") # nolint

plot_prob_nowcast

More ressources:

Paulo Ventura’s archived NHSN exports: https://paulocv.github.io/respiratory_archive/
baselinenowcast getting started guide: https://baselinenowcast.epinowcast.org/articles/baselinenowcast.html (and the accompanying paper: https://www.medrxiv.org/content/10.1101/2025.08.14.25333653v2)
epinowcast introduction for principled hierarchical nowcasting models: https://package.epinowcast.org/articles/epinowcast.html
Delphi Insight Net Workshop 2024 slide decks (see the first two sessions): https://cmu-delphi.github.io/insightnet-workshop-2024. Here the nowcasting is done using the scaling methods, but using another set of software tools
SISMID NFIDD course notes on nowcasting: https://nfidd.github.io/sismid/
Lison et al. review of the nowcasting literature: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1012021