GPU Acceleration

Roadmap for GPU Support in Elmfire.jl

Elmfire.jl is designed with GPU acceleration in mind. This page describes the current state, the planned approach, and the computational profile that motivates the work.

Status

Phase 0 (complete): Core types have been refactored to support GPU arrays.

  • FireState{T, A} is parameterized on array type A, so it can hold either Matrix{T} (CPU) or GPU arrays.
  • FuelModelArray{T} provides a dense struct-of-arrays fuel model lookup, replacing the Dict-based FuelModelTable which cannot run on GPU.
  • active_mask() converts the narrow band’s Set to a dense BitMatrix for GPU kernel masking.
  • surface_spread_rate_flat() computes Rothermel spread rates from FuelModelArray using array indexing instead of struct access.
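
The Phase 0 refactoring can be illustrated with a minimal sketch. All type, field, and function names below (SketchFireState, SketchFuelArray, sketch_active_mask, load, depth) are hypothetical stand-ins and are not Elmfire.jl's actual definitions; the point is only to show an array-type parameter, a struct-of-arrays lookup, and the Set-to-BitMatrix conversion.

```julia
# Hypothetical sketch; names and fields are assumptions, not Elmfire.jl's API.

# State parameterized on the array type A, so the same struct can wrap
# a Matrix{T} on the CPU or a device array on the GPU.
struct SketchFireState{T<:AbstractFloat, A<:AbstractMatrix{T}}
    phi::A                          # level set field
    band::Set{CartesianIndex{2}}    # narrow band, kept on the host
end

# Struct-of-arrays fuel lookup: one dense vector per parameter, indexed by
# an integer fuel model id (in place of a Dict-based table).
struct SketchFuelArray{T<:AbstractFloat}
    load::Vector{T}
    depth::Vector{T}
end

# Convert the narrow band Set into a dense BitMatrix mask.
function sketch_active_mask(state::SketchFireState)
    mask = falses(size(state.phi))
    for I in state.band
        mask[I] = true
    end
    return mask
end

band  = Set([CartesianIndex(2, 2), CartesianIndex(2, 3)])
state = SketchFireState(ones(Float32, 4, 4), band)
mask  = sketch_active_mask(state)
fuels = SketchFuelArray([0.1f0, 0.2f0], [1.0f0, 2.5f0])
```

Because the lookup is a set of plain vectors, fuels.depth[2] is ordinary array indexing, which is the kind of access that can be compiled into a GPU kernel.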

Phase 1 (complete): GPU kernels for velocity, CFL, and RK2, plus a simulation driver.

  • simulate_gpu! and simulate_gpu_uniform! run complete fire simulations on GPU.
  • Spread rate kernel computes Rothermel + elliptical spread + velocity components per cell.
  • CFL reduction kernel for adaptive timestep control.
  • RK2 level set update kernel with narrow band masking.
  • Supports accel_time_constant for fire acceleration (computed on host, passed as scalar).
  • Tested with KernelAbstractions.CPU() backend in CI.
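
As an illustration of the CFL reduction, here is a host-side sketch of a masked max-speed reduction followed by the timestep formula. The helper name sketch_cfl_dt, the cfl keyword, and the choice of the largest velocity component as the speed measure are assumptions made for this example; the actual kernel may differ.

```julia
# Hypothetical helper (not Elmfire.jl's API): CFL-limited timestep from a
# max-speed reduction over active cells only.
function sketch_cfl_dt(ux, uy, mask, dx; cfl = 0.5)
    # Map each cell to its largest velocity component (zero if inactive),
    # then reduce with max. On a GPU this is the reduction kernel's job.
    smax = mapreduce((u, v, m) -> m ? max(abs(u), abs(v)) : zero(u), max, ux, uy, mask)
    # No motion in the band means no CFL restriction.
    return smax > 0 ? cfl * dx / smax : Inf
end

ux   = [0.0 2.0; 0.0 4.0]
uy   = zeros(2, 2)
mask = Bool[1 1; 1 0]                      # the 4.0 cell is outside the band
dt   = sketch_cfl_dt(ux, uy, mask, 30.0)   # 0.5 * 30 / 2 = 7.5
```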

Phase 2+ (planned): GPU ensemble support and optimization. See Planned Phases below.

Computational Profile

On a 512x512 grid with ~1,000 active cells in the narrow band, each simulation timestep breaks down roughly as:

Component                                 Time      Parallelizable?
Velocity calculation (Rothermel + wind)   ~100 ms   Yes (per-cell, independent)
Crown fire                                ~70 ms    Yes (per-cell, independent)
Level set RK2                             ~10 ms    Yes (stencil, structured)
Weather interpolation                     ~10 ms    Yes (per-cell lookup)
Narrow band updates                       ~10 ms    No (set operations)

The first four components (~190 ms) are candidates for GPU offloading. The narrow band management (~10 ms) stays on the CPU.
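
The split between parallelizable and serial work bounds the achievable speedup, in the spirit of Amdahl's law. The short calculation below uses only the profile figures above; the factor s (how much faster the offloaded part runs) is a hypothetical input, not a measurement.

```julia
# Amdahl-style estimate from the profile above (times in ms).
parallel_ms = 100 + 70 + 10 + 10   # velocity, crown fire, RK2, weather
serial_ms   = 10                   # narrow band updates stay on the CPU

# Per-timestep speedup if the parallel part runs s times faster.
speedup(s) = (parallel_ms + serial_ms) / (parallel_ms / s + serial_ms)

bound = speedup(Inf)   # → 20.0, the ceiling set by the 10 ms serial part
mid   = speedup(50)    # a hypothetical 50x kernel speedup, roughly 14.5x overall
```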

Approach

KernelAbstractions.jl

GPU kernels are written with KernelAbstractions.jl (KA), which provides a vendor-agnostic @kernel macro. The same kernel code runs on:

  • NVIDIA GPUs (via CUDA.jl)
  • AMD GPUs (via AMDGPU.jl)
  • Apple Silicon (via Metal.jl)
  • CPU (for testing without GPU hardware)
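
A minimal sketch of what a masked KA kernel can look like, run here on the CPU backend. The kernel name sketch_masked_scale! and its arguments are hypothetical and are not taken from Elmfire.jl; it assumes a recent KernelAbstractions.jl (0.9 or later) where backends are passed as objects such as CPU().

```julia
using KernelAbstractions

# Hypothetical kernel: scale phi in place wherever the mask is set.
@kernel function sketch_masked_scale!(phi, mask, factor)
    I = @index(Global, Cartesian)   # one work item per grid cell
    if mask[I]
        phi[I] *= factor
    end
end

backend = CPU()   # a GPU backend object (e.g. CUDABackend() from CUDA.jl) goes here instead
phi  = ones(Float32, 8, 8)
mask = fill(false, 8, 8)
mask[3, 4] = true

# Instantiate the kernel for a backend and workgroup size, then launch it
# over the whole grid.
kernel! = sketch_masked_scale!(backend, (4, 4))
kernel!(phi, mask, 2f0; ndrange = size(phi))
KernelAbstractions.synchronize(backend)
```

On a real GPU the arrays would also need to live in device memory; only the backend object and the array types change, not the kernel body.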

Package Extension

GPU support is delivered as a package extension (a Julia 1.9+ feature), so users without GPUs pay no dependency cost:

# Project.toml
[weakdeps]
KernelAbstractions = "..."
Adapt = "..."

[extensions]
ElmfireKAExt = ["KernelAbstractions", "Adapt"]
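
The usual extension pattern is that the parent package declares a function without methods and the extension module adds them when the weak dependencies are loaded. The sketch below imitates that with two ordinary modules so it runs standalone; SketchParent, SketchExt, and sketch_simulate_gpu! are hypothetical names, and the method body is a placeholder rather than a simulation.

```julia
# Stand-in for the parent package: declares the entry point, defines no methods.
module SketchParent
    function sketch_simulate_gpu! end
end

# Stand-in for the extension module (by convention a file under ext/), which
# Julia loads automatically once all listed weak dependencies are loaded.
module SketchExt
    import ..SketchParent
    # The extension attaches the actual method to the parent's function.
    SketchParent.sketch_simulate_gpu!(phi::AbstractMatrix) = (phi .-= 1; phi)
end

has_methods = !isempty(methods(SketchParent.sketch_simulate_gpu!))
result = SketchParent.sketch_simulate_gpu!([1.0 2.0])
```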

CPU/GPU Boundary

The narrow band is managed on the CPU (it uses Set operations). Each timestep:

  1. CPU collects active cells and builds a mask
  2. Mask is uploaded to GPU
  3. GPU runs velocity, CFL, and RK2 kernels
  4. Newly burned cell indices are downloaded to CPU
  5. CPU updates narrow band and burned status

Grid arrays (phi, ux, uy) stay resident on the GPU between timesteps to minimize transfers.
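
The five steps can be sketched on the host alone, with a plain copy standing in for each transfer. Everything here is illustrative: sketch_step! is a hypothetical name, a constant decrement dphi replaces the velocity, CFL, and RK2 kernels, and the convention that phi <= 0 marks a burned cell is an assumption.

```julia
# Host-only sketch of one timestep; not Elmfire.jl's implementation.
function sketch_step!(phi, band::Set{CartesianIndex{2}}, burned::BitMatrix, dphi)
    # 1. CPU: build a dense mask from the narrow band.
    mask = falses(size(phi))
    for I in band
        mask[I] = true
    end
    # 2. Upload the mask (a host-to-device copy on a real GPU).
    dmask = copy(mask)
    # 3. "GPU" work: update phi only where masked.
    phi .= ifelse.(dmask, phi .- dphi, phi)
    # 4. Download the indices of newly burned cells.
    newly = findall(dmask .& (phi .<= 0) .& .!burned)
    # 5. CPU: mark cells burned and move the band to their unburned neighbors.
    offsets = (CartesianIndex(1, 0), CartesianIndex(-1, 0),
               CartesianIndex(0, 1), CartesianIndex(0, -1))
    for I in newly
        burned[I] = true
        delete!(band, I)
        for d in offsets
            J = I + d
            checkbounds(Bool, phi, J) && !burned[J] && push!(band, J)
        end
    end
    return newly
end

phi = fill(1.0, 3, 3)
phi[2, 2] = 0.5
band   = Set([CartesianIndex(2, 2)])
burned = falses(3, 3)
newly  = sketch_step!(phi, band, burned, 1.0)
```

Only the mask (step 2) and the list of newly burned indices (step 4) cross the boundary; phi itself is never copied back.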

Planned Phases

Phase 2: GPU Ensemble Support

In this phase, multiple ensemble members will run concurrently on separate GPU streams. Each stream executes an independent simulation, following the run_ensemble_threaded! pattern but with GPU-accelerated inner loops.

Phase 3: Optimization

  • Dual-mode launch: 1D kernel over an index list for small fires (<5K active cells); 2D grid kernel with mask for large fires.
  • Persistent GPU state: Eliminate per-timestep mask uploads by detecting newly burned cells on GPU.
  • Float32 fast path: Already supported by the type system; validate precision on GPU.
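
The dual-mode launch idea can be sketched as a dispatch on the number of active cells. This is a host-side illustration under assumptions: sketch_update! and its threshold keyword are hypothetical, plain loops stand in for the 1D and 2D kernel launches, and the default of 5,000 simply mirrors the figure quoted in the list above.

```julia
# Hypothetical dispatcher: pick a launch shape from the active cell count.
function sketch_update!(f, phi, band; threshold = 5_000)
    if length(band) < threshold
        # Small fire: iterate a compact index list (1D launch on a GPU).
        for I in collect(band)
            phi[I] = f(phi[I])
        end
        return :index_list
    else
        # Large fire: visit every cell and skip inactive ones (2D masked launch).
        mask = falses(size(phi))
        for I in band
            mask[I] = true
        end
        for I in CartesianIndices(phi)
            mask[I] && (phi[I] = f(phi[I]))
        end
        return :masked_grid
    end
end

phi  = zeros(4, 4)
band = Set([CartesianIndex(1, 1), CartesianIndex(2, 3)])
mode_small = sketch_update!(x -> x + 1, phi, band)                  # index-list path
mode_large = sketch_update!(x -> x + 1, phi, band; threshold = 1)   # force the masked path
```

Both paths must produce identical results; only the amount of idle work differs.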

Projected Performance

Estimates assume a mid-to-high-end GPU (e.g. NVIDIA A100) on a 512x512 grid:

Component              CPU       GPU (est.)   Speedup
Velocity calculation   ~170 ms   ~2-5 ms      30-80x
RK2 level set          ~10 ms    ~0.2 ms      50x
CFL reduction          ~1 ms     ~0.1 ms      10x
Narrow band (CPU)      ~10 ms    ~10 ms       1x
Total per timestep     ~200 ms   ~15-20 ms    10-15x

With the per-cell work offloaded, the CPU-side narrow band management (~10 ms) becomes the bottleneck of each timestep. For larger grids or more active cells, the GPU advantage grows because the CPU-side cost is proportionally smaller.

These are projections, not benchmarks. Actual performance will depend on hardware, grid size, and fire characteristics.