
Disease Forecast Planning for Public Health
Source: vignettes/forecast_planning.Rmd
Introduction
Disease forecasting is an invaluable tool for the field of public
health. However, clearly defining the parameters of the problems you
want to solve is often more difficult than the forecasting itself. The
acciddasuite package provides a comprehensive toolkit for
generating forecasts, with the expectation that users already have a
concrete idea of what they would like to forecast. In this vignette, we
provide a series of questions to help forecasters discover the
“why”, “what”, “where” and “how” of their question. This package and
its documentation are still under active development, and we welcome contributions
and feedback from the community.
Step 1: Why are we interested in forecasting?
First, there needs to be a clearly defined project. Here is a set of questions to consider as you begin:
- What is the question that you are attempting to answer? Or what insights do you hope to gain?
  - Determine what you are trying to gain from the forecasting project. Specificity at this stage makes future stages easier!
- Who is the audience, or who will benefit from the insights?
  - Determine who will use the forecasts, and who will benefit from their interpretation.
- How far into the future are you interested in forecasting? How far ahead do these insights need to reach to be useful?
  - <7 days? 1-4 weeks? Full seasonal projections*?
    - *Full seasonal projections are known as “scenarios” and are different from forecasts.
  - The time frame associated with your question determines whether or not forecasting is the best tool. Read more in Step 2…
STOP! Have you defined your forecasting problem using the questions above?
- YES → Proceed to next step
- NO → Continue defining the approach
Step 2: Forecast vs. scenario?
Determine if your central question is strictly a forecast, or if it is more aligned with a scenario projection.
- Forecasts
  - Forecasts are concerned with what will happen in the future under current conditions, regardless of what interventions may take place. From this, we can decide on actions or deploy resources, but the starting point is “real life” rather than an assumption (i.e., an unconditional projection).
  - e.g., I want advance notice of the influenza hospitalization burden in the coming weeks.
- Scenarios
  - Scenarios are concerned with questions such as: what will happen if we take action X? Or, how will the future differ if X happens instead of Y? Often, with scenarios, you are comparing different outcomes on the basis of different assumptions (i.e., a conditional projection).
  - e.g., I want to know what the predicted burden will be if 50% of the population is vaccinated against flu vs. only 30%.
If you want to project more than a few weeks ahead, your question is likely better suited to a scenario projection than to a forecast: forecasts look 1-4 weeks ahead of a given start date, while scenarios cover entire seasons at a time. For more information on scenario projections, visit the Scenario Modeling Hub.
STOP! Have you confirmed that your question is best answered with a forecast?
- YES → Proceed to next step
- NO → Consider checking out scenario projections instead!
Step 3: Define Your Data
Next, define which pathogen, target (data stream), and spatial unit are involved in your forecasting project.
- What pathogen are you interested in forecasting?
  - e.g., influenza, COVID-19, RSV.
- What data stream are you interested in forecasting (also known as your target)?
  - Examples of common forecasting targets are “incidence of hospitalization” (of patients with a certain pathogen), “percent of emergency department visits” (attributable to a certain pathogen), “deaths” (due to a certain pathogen), and “hospital bed occupancy” (by patients with a certain pathogen). All of these targets are measures of disease burden that give public health professionals an idea of how much disease a population bears at a given time. The most common forecasting target for respiratory illnesses is “incidence of hospitalization”, which makes it the easiest to find data on. Presently, acciddasuite only forecasts the targets “incidence of hospitalization” and “death”.
- What spatial unit will provide the best insight? Are data available at that scale?
  - e.g., national, state, county, city, health jurisdiction, hospital system, or even facility (e.g., hospital). The more granular the spatial unit, the more difficult it is to find data, so there is often a trade-off between data specificity and availability.
Note that you can forecast multiple locations (e.g., multiple states or health jurisdictions) at once, but if you want to forecast multiple pathogens or targets, it is best to separate those into their own distinct forecasts.
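The choices from this step can be written down as a small specification before any data is collected. The sketch below is illustrative only: `forecast_spec` and `supported_targets` are hypothetical names chosen for this example, not objects exported by acciddasuite, and the location codes are placeholders.

```r
# Illustrative only: `forecast_spec` and `supported_targets` are hypothetical
# names for this sketch, not part of acciddasuite.
supported_targets <- c("incidence of hospitalization", "death")

forecast_spec <- list(
  pathogen  = "influenza",
  target    = "incidence of hospitalization",
  locations = c("NC", "SC")  # several locations can share one forecast
)

# One pathogen and one target per forecast; locations may be many.
stopifnot(
  length(forecast_spec$pathogen) == 1,
  length(forecast_spec$target) == 1,
  forecast_spec$target %in% supported_targets,
  length(forecast_spec$locations) >= 1
)
```

If a second pathogen or target is needed, a separate specification per forecast keeps each one distinct, in line with the note above.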
STOP! Have you defined your pathogen, target, and spatial unit?
- YES → Proceed to next step
- NO → Continue defining these data elements
Step 4: Data Availability & Limitations
In our context, forecasts are mathematical predictions a few weeks ahead, given a starting point of “ground truth” data (i.e., information on what has already happened). Because of this, you need to provide forecasting models with ground truth data that matches the resolution of your forecasting question (i.e., same pathogen, same target, same locations). Data availability is often a limiting factor when considering a forecasting question. In this step, we provide a decision tree approach to determine whether you can forecast with ground truth data that already exists, whether you need to provide specialized data to complete your forecast, or whether there is already forecasting infrastructure that answers your forecasting question.
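As a rough illustration of what "matching the resolution" means, the base R sketch below compares a forecasting question against a small ground truth table and lists the locations that have no matching rows. The column names, location codes, and values are all made up for this example; they are not a format required by acciddasuite.

```r
# Hypothetical ground truth table; column names and values are assumptions
# made for this sketch only.
truth <- data.frame(
  date     = as.Date(c("2024-01-06", "2024-01-13", "2024-01-06", "2024-01-13")),
  location = c("NC", "NC", "SC", "SC"),
  pathogen = "influenza",
  target   = "incidence of hospitalization",
  value    = c(120, 135, 80, 92)
)

# The forecasting question: same pathogen and target, three locations.
question <- list(
  pathogen  = "influenza",
  target    = "incidence of hospitalization",
  locations = c("NC", "SC", "VA")
)

# Keep only rows for the chosen pathogen and target, then find locations
# in the question that have no ground truth rows at all.
matching <- truth[truth$pathogen == question$pathogen &
                    truth$target == question$target, ]
missing_locations <- setdiff(question$locations, unique(matching$location))
missing_locations
```

Here the third location has no ground truth rows, so either the question needs to be narrowed or another data source is needed for that location.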
An easy way to find either ground truth data or forecasts is via a forecast hub. Forecasting hubs (organized by the hubverse) are standardized repositories for disease forecasts and ground truth data where all data follows structured guidelines. Ground truth data found in forecasting hubs is forecast-ready, and in fact, the forecasts themselves may answer your forecasting question(s) without any further action from you. For example, if you are forecasting RSV, COVID-19, or influenza at a U.S. national or state level*, your forecasting question is likely already answered by a forecasting hub:
- RSV: GitHub repository | RespiLens visualization
- COVID-19: GitHub repository | RespiLens visualization
- Influenza: GitHub repository | RespiLens visualization
- *There is also a hub for sub-state level influenza forecasts (GitHub repository | RespiLens visualization)
For a complete list of hubverse forecast hubs, see here.
Finding your data
After you decide to create your own forecasts and find suitable ground truth data, you must confirm that your ground truth data stream is stable enough to support a repeatable workflow.
If you are pulling ground truth data from a hubverse hub, this data
is likely released on a weekly cadence and is mostly complete. You can
find this data in the target-data/ directory of the hub’s
GitHub repository. Alternatively, acciddasuite has a
built-in function (get_data()) that will
handle the collection and formatting of state-level respiratory data. If
you want to use another ground truth data source, you will first have to
validate it with check_data(). Please see
the external data article for
information on external data source formatting.
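Before relying on any data stream, it can help to check the two properties mentioned above: a weekly cadence and reasonable completeness. The base R sketch below is one way to do that by hand. It is not a substitute for check_data(), and the `weekly_truth` table, its column names, and its values are made up for illustration.

```r
# Hypothetical weekly series for one location; dates and values are made up.
weekly_truth <- data.frame(
  date  = as.Date("2024-01-06") + 7 * c(0, 1, 2, 4, 5),  # one week is absent
  value = c(120, 135, NA, 150, 160)
)

weekly_truth <- weekly_truth[order(weekly_truth$date), ]

# Gaps between consecutive reports, in days; a weekly stream gives all 7s.
gaps <- as.numeric(diff(weekly_truth$date))
is_weekly <- all(gaps == 7)

# Share of reported weeks that carry a non-missing value.
completeness <- mean(!is.na(weekly_truth$value))

is_weekly     # FALSE: there is a 14-day gap where one report is absent
completeness  # 0.8: four of the five reported weeks have a value
```

A stream that fails either check may still be usable, but the gaps should be understood before building a repeatable workflow on top of it.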
If your data stream has reporting delays, or inconsistencies
that are usually corrected later but that you cannot wait for, use the
get_ncast() function to correct recent
weeks for reporting delays.
Next steps:
When you are ready to begin, visit the GET STARTED page to use
acciddasuite for your forecasting needs!