Inpainting Architecture

CoPaint Algorithm

InfluPaint uses the CoPaint algorithm (replacing earlier REpaint) to condition forecasts on observed data. Inpainting alters reverse diffusion iterations by sampling unmasked regions using ground-truth.

Atomic Inpainting Design

Each inpainting job is independent and atomic:

Atomic: Single scenario + single date + single config per job
Parallel: 20 dates × 3 configs = 60 parallel jobs
Independent: Each job can run on different nodes/GPUs
Fault-tolerant: Failed jobs don't affect others

Example: For scenario 5 with 20 dates and 3 configs: - Old way: 1 job × 60 sequential runs = slow - New way: 60 parallel jobs = fast

Model Loading

Inpainting uses dual model loading:

MLflow-first: Load trained model from MLflow using run ID
Filesystem fallback: Load from filesystem if MLflow unavailable

The get_mlflow_run_id.py utility maps scenario IDs to MLflow run IDs.

Mega-Array Submission

All scenarios submitted in one job array:

python generate_inpaint_jobs.py -e "paper-2025-06" --scenarios "0-31"
sbatch inpaint_array_paper-2025-06_all_scenarios.run

Example: 32 scenarios × 20 dates × 3 configs = 1,920 parallel jobs