Course Project · DS 289 · IISc Bengaluru

Effect of Floating-Point Precision on PDE Solvers

How does precision — FP64 vs FP32 vs FP16 — affect the accuracy, stability, conservation, and cost of four finite-difference schemes on the Linear Advection-Diffusion and Viscous Burgers' equations?

🔢 4 Schemes · 🧪 6 Experiments · ⚡ FP64 / FP32 / FP16 · 🔥 PyTorch Solvers · IISc · DS 289 · 2024–25

Governing Equations

Two Canonical PDEs

🌊
Linear Advection-Diffusion
u_t + c·u_x = ν·u_xx
x ∈ [0,1], periodic BCs
Advection speed c = 1, diffusivity ν = 1/Re. The exact solution is known:
u(x,t) = A·exp(−νk²t)·sin(k(x−ct)), with k = 2πm. It serves as ground truth for the L2 and L∞ error measurements.
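The exact solution above can be evaluated directly and compared against a numerical field. The following is a minimal sketch, assuming NumPy rather than the project's PyTorch solvers; the function names, the default parameter values, and the root-mean-square normalisation of the L2 norm are illustrative choices, not taken from the project code.

```python
import numpy as np

def exact_advection_diffusion(x, t, c=1.0, nu=0.01, A=1.0, m=1):
    """u(x,t) = A exp(-nu k^2 t) sin(k (x - c t)), with k = 2 pi m."""
    k = 2.0 * np.pi * m
    return A * np.exp(-nu * k**2 * t) * np.sin(k * (x - c * t))

def error_norms(u_num, u_ref):
    """Discrete L2 (root-mean-square) and L-infinity errors, evaluated in FP64."""
    diff = np.asarray(u_num, dtype=np.float64) - np.asarray(u_ref, dtype=np.float64)
    return float(np.sqrt(np.mean(diff**2))), float(np.max(np.abs(diff)))

nx = 1024
x = np.arange(nx) / nx                      # periodic grid on [0, 1), endpoint excluded
u_ref = exact_advection_diffusion(x, t=0.1)

# Error introduced purely by storing the exact field at lower precision.
l2_32, linf_32 = error_norms(u_ref.astype(np.float32), u_ref)
l2_16, linf_16 = error_norms(u_ref.astype(np.float16), u_ref)
print(f"FP32 storage error: L2={l2_32:.3e}, Linf={linf_32:.3e}")
print(f"FP16 storage error: L2={l2_16:.3e}, Linf={linf_16:.3e}")
```

The storage-only errors give a floor for each precision: a solver running in that dtype cannot be expected to beat the error of merely representing the exact field.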
💥
Viscous Burgers' Equation
u_t + (u²/2)_x = ν·u_xx
u₀(x) = sin(2πx), periodic BCs
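No closed-form solution for general Re. The reference solution is computed with a Cole–Hopf spectral method on a dense N = 8192 grid. The time-step schedule from an FP64 pilot run is replayed verbatim at FP32 and FP16, so every precision takes identical steps and any difference in the result is attributable to precision alone.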
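The schedule-replay idea can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: it uses NumPy, drops the viscous term for brevity, and the helper names `rusanov_step`, `pilot_schedule`, and `replay` are hypothetical.

```python
import numpy as np

def rusanov_step(u, dt, dx):
    """One forward-Euler step of inviscid Burgers with a Rusanov flux (periodic)."""
    ur = np.roll(u, -1)                                   # right neighbour u[i+1]
    a = np.maximum(np.abs(u), np.abs(ur))                 # local wave-speed bound
    flux = 0.5 * (0.5 * u * u + 0.5 * ur * ur) - 0.5 * a * (ur - u)
    # dt and dx are Python floats, so the array dtype is preserved.
    return u - (dt / dx) * (flux - np.roll(flux, 1))

def pilot_schedule(u0, dx, t_end, cfl=0.8):
    """Run in FP64 with an adaptive CFL step and record every dt taken."""
    u, t, schedule = u0.astype(np.float64), 0.0, []
    while t < t_end:
        dt = min(cfl * dx / float(np.max(np.abs(u))), t_end - t)
        u = rusanov_step(u, dt, dx)
        schedule.append(dt)
        t += dt
    return u, schedule

def replay(u0, dx, schedule, dtype):
    """Repeat the run at another precision using the recorded steps verbatim."""
    u = u0.astype(dtype)
    for dt in schedule:
        u = rusanov_step(u, dt, dx)
    return u

nx = 128
dx = 1.0 / nx
u0 = np.sin(2.0 * np.pi * np.arange(nx) / nx)
u_pilot, schedule = pilot_schedule(u0, dx, t_end=0.1)
u32 = replay(u0, dx, schedule, np.float32)
u16 = replay(u0, dx, schedule, np.float16)
print(len(schedule), float(np.max(np.abs(u16.astype(np.float64) - u_pilot))))
```

Without the replay, a low-precision run would compute its own time steps from its own, already perturbed, maximum wave speed, so step count and step sizes would differ and the comparison would mix two effects.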

Numerical Schemes

Four Finite-Difference Methods

Scheme 1 · 1st Order
Upwind / Godunov + Forward Euler
First-order upwind for linear advection; Godunov exact Riemann solver for Burgers. Explicit Forward Euler time integration.
Godunov Flux · Forward Euler · O(Δx, Δt)
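For Burgers' flux f(u) = u²/2, the exact Riemann solution reduces to a compact closed form for the interface flux. The sketch below is one way Scheme 1 could be written, assuming NumPy instead of PyTorch; the central-difference viscous term and the combined time-step bound in `stable_dt` are assumptions of this example, since the page does not state how the project discretises diffusion or limits Δt.

```python
import numpy as np

def godunov_flux_burgers(uL, uR):
    """Godunov flux for f(u) = u^2/2: max(f(max(uL, 0)), f(min(uR, 0)))."""
    fL = 0.5 * np.maximum(uL, 0.0) ** 2
    fR = 0.5 * np.minimum(uR, 0.0) ** 2
    return np.maximum(fL, fR)

def stable_dt(u, dx, nu, cfl=0.8):
    """Assumed bound: advective plus diffusive numbers sum to at most cfl."""
    umax = float(np.max(np.abs(u)))
    return cfl / (umax / dx + 2.0 * nu / dx**2)

def godunov_euler_step(u, dt, dx, nu=0.0):
    """Forward Euler update on a periodic grid; F[i] is the flux at interface i+1/2."""
    F = godunov_flux_burgers(u, np.roll(u, -1))
    diffusion = nu * (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    return u - (dt / dx) * (F - np.roll(F, 1)) + dt * diffusion

nx, nu = 1024, 1.0 / 100.0
dx = 1.0 / nx
u = np.sin(2.0 * np.pi * np.arange(nx) / nx)
dt = stable_dt(u, dx, nu)
u_next = godunov_euler_step(u, dt, dx, nu)
print(dt, float(np.sum(u_next) - np.sum(u)))
```

Because the update is written in flux form, the sum of `u` changes only through rounding, which is what makes mass drift a clean precision diagnostic.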
Scheme 2 · 1st Order
Lax–Friedrichs / Rusanov + Forward Euler
Lax–Friedrichs flux for linear advection; Rusanov (local Lax–Friedrichs) for Burgers. More dissipative than Godunov.
Rusanov Flux · Forward Euler · O(Δx, Δt)
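Both fluxes in Scheme 2 are a central average plus a dissipation term; they differ only in the dissipation coefficient. A minimal sketch, assuming NumPy and hypothetical function names:

```python
import numpy as np

def lax_friedrichs_flux(uL, uR, c, dx, dt):
    """Lax-Friedrichs flux for f(u) = c u: global dissipation coefficient dx/dt."""
    return 0.5 * c * (uL + uR) - 0.5 * (dx / dt) * (uR - uL)

def rusanov_flux(uL, uR):
    """Rusanov (local Lax-Friedrichs) flux for f(u) = u^2/2."""
    a = np.maximum(np.abs(uL), np.abs(uR))      # local bound on |f'(u)| = |u|
    return 0.5 * (0.5 * uL**2 + 0.5 * uR**2) - 0.5 * a * (uR - uL)

# A shock-like pair of states: left state 1, right state 0.
flux_shock = rusanov_flux(1.0, 0.0)
print(flux_shock)
```

For the states uL = 1, uR = 0 the Godunov flux is f(1) = 0.5, while the Rusanov value is larger by the dissipation term 0.5·a·(uL − uR); that extra term is the source of the additional smearing noted above.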
Scheme 3 · 2nd Order
Lax–Wendroff / Richtmyer
Second-order Lax–Wendroff flux for linear advection; Richtmyer two-step predictor–corrector for Burgers. Dispersive errors near shocks.
Lax–Wendroff · Richtmyer · O(Δx², Δt²)
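The two variants of Scheme 3 can be sketched in flux form as below. This assumes NumPy, periodic boundaries, and omits the viscous term for brevity; the function names are illustrative.

```python
import numpy as np

def lax_wendroff_flux_linear(uL, uR, c, dt, dx):
    """Lax-Wendroff flux for f(u) = c u at the interface between uL and uR."""
    return 0.5 * c * (uL + uR) - 0.5 * c * c * (dt / dx) * (uR - uL)

def lax_wendroff_step(u, c, dt, dx):
    F = lax_wendroff_flux_linear(u, np.roll(u, -1), c, dt, dx)   # F[i] at i+1/2
    return u - (dt / dx) * (F - np.roll(F, 1))

def richtmyer_step(u, dt, dx):
    """Richtmyer two-step Lax-Wendroff for f(u) = u^2/2."""
    f = 0.5 * u * u
    # Predictor: half-step values at the interfaces i+1/2.
    u_half = 0.5 * (u + np.roll(u, -1)) - 0.5 * (dt / dx) * (np.roll(f, -1) - f)
    f_half = 0.5 * u_half * u_half
    # Corrector: full step using the interface fluxes.
    return u - (dt / dx) * (f_half - np.roll(f_half, 1))

nx = 256
dx = 1.0 / nx
u = np.sin(2.0 * np.pi * np.arange(nx) / nx)
shifted = lax_wendroff_step(u, c=1.0, dt=dx, dx=dx)    # Courant number exactly 1
burgers_next = richtmyer_step(u, dt=0.5 * dx, dx=dx)
```

At a Courant number of exactly 1 the linear scheme reduces to a pure shift by one cell, which is a convenient sanity check for an implementation.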
Scheme 4 · 2nd Order
MUSCL + Rusanov + SSP-RK2
MUSCL slope-limited reconstruction (minmod/mc/vanleer) + Rusanov flux. Strong Stability Preserving RK2 (Shu–Osher) time integration. TVD and ~3× costlier per step than Scheme 1.
MUSCL · SSP-RK2 · TVD · O(Δx², Δt²)
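A compact sketch of the Scheme 4 pipeline (limited reconstruction, Rusanov flux, two-stage SSP Runge–Kutta) is given below. It assumes NumPy, the minmod limiter only, periodic boundaries, and a central-difference viscous term; none of these details are confirmed by the page beyond the scheme names.

```python
import numpy as np

def minmod(a, b):
    """Smaller-magnitude argument where signs agree, zero otherwise."""
    return np.where(a * b > 0.0,
                    np.sign(a) * np.minimum(np.abs(a), np.abs(b)),
                    np.zeros_like(a))

def rhs(u, dx, nu):
    """Semi-discrete right-hand side for viscous Burgers on a periodic grid."""
    slope = minmod(u - np.roll(u, 1), np.roll(u, -1) - u)
    uL = u + 0.5 * slope                        # left state at interface i+1/2
    uR = np.roll(u - 0.5 * slope, -1)           # right state, taken from cell i+1
    a = np.maximum(np.abs(uL), np.abs(uR))
    F = 0.25 * (uL**2 + uR**2) - 0.5 * a * (uR - uL)     # Rusanov flux
    diffusion = nu * (np.roll(u, -1) - 2.0 * u + np.roll(u, 1)) / dx**2
    return -(F - np.roll(F, 1)) / dx + diffusion

def ssp_rk2_step(u, dt, dx, nu=0.0):
    """Shu-Osher SSP-RK2: one Euler stage, then an average with a second stage."""
    u1 = u + dt * rhs(u, dx, nu)
    return 0.5 * u + 0.5 * (u1 + dt * rhs(u1, dx, nu))

def total_variation(u):
    return float(np.sum(np.abs(np.roll(u, -1) - u)))

nx = 128
dx = 1.0 / nx
u = np.sin(2.0 * np.pi * np.arange(nx) / nx)
tv_start, mass_start = total_variation(u), float(np.sum(u)) * dx
for _ in range(50):
    u = ssp_rk2_step(u, dt=0.4 * dx, dx=dx)     # inviscid run, Courant number 0.4
tv_end, mass_end = total_variation(u), float(np.sum(u)) * dx
print(tv_start, tv_end)
```

Each step evaluates the reconstruction and flux twice, which is consistent with the higher per-step cost quoted for this scheme.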

Experiments

Six Stress Tests per Scheme

Every scheme runs all six experiments × three precision variants (FP64 / FP32 / FP16), producing a uniform diagnostics dataset for direct cross-scheme comparison.

| # | Experiment | Purpose | Grid | CFL |
|---|---|---|---|---|
| 01 | Baseline Regime Sweep | Accuracy across Re = 10, 100, 1000 | nx = 1024 | 0.80 |
| 02 | Ultra-High Re Shock Stress | Shock sharpening at Re = 100,000; precision sensitivity | nx = 1024 | 0.80 |
| 03 | Under-Resolved High Re | Coarse grid (nx = 128); FP16 breakdown & instability | nx = 128 | 0.80 |
| 04 | Tiny Amplitude Quantization | Amplitude 1e-6; precision floor & lost updates (linear only) | nx = 1024 | 0.80 |
| 05 | Long-Horizon Drift | Extended time; conservation error & TV drift accumulation | nx = 1024 | 0.50 |
| 06 | CFL Overdrive Failure | Intentional CFL = 1.2; method-level failure mode | nx = 256 | 1.20 |
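Experiment 06 is a method-level failure, independent of precision. The sketch below illustrates the mechanism on the simplest case, first-order upwind for pure linear advection in FP64 with NumPy; the top-hat initial profile and the step count are assumptions of this example, not the project's configuration.

```python
import numpy as np

def upwind_advect(u0, cfl, n_steps):
    """First-order upwind for u_t + c u_x = 0 with c > 0 and cfl = c dt / dx."""
    u = u0.copy()
    for _ in range(n_steps):
        u = u - cfl * (u - np.roll(u, 1))
    return u

nx = 256
x = np.arange(nx) / nx
u0 = np.where((x >= 0.25) & (x < 0.5), 1.0, 0.0)    # top-hat profile

stable = upwind_advect(u0, cfl=0.8, n_steps=200)
unstable = upwind_advect(u0, cfl=1.2, n_steps=200)

print("max |u| at CFL 0.8:", float(np.max(np.abs(stable))))
print("max |u| at CFL 1.2:", float(np.max(np.abs(unstable))))
```

At CFL ≤ 1 each new value is a convex combination of two old values, so the solution stays within its initial bounds. At CFL = 1.2 the weight on the local cell becomes negative, and grid-scale modes are amplified by up to |1 − 2·CFL| = 1.4 per step, even in FP64.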

Precision

FP64 / FP32 / FP16 Variants

| Precision | torch.dtype | Bits | Mantissa | Stability (CFL=0.8) | High Re Accuracy | Long Horizon |
|---|---|---|---|---|---|---|
| FP64 | torch.float64 | 64 | 52 bits | Stable ✓ | Reference | No drift |
| FP32 | torch.float32 | 32 | 23 bits | Stable ✓ | Small error | Slow drift |
| FP16 | torch.float16 | 16 | 10 bits | Unstable (exp03) | Catastrophic | Mass drift |
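The mantissa widths in the table translate directly into machine epsilon and representable range. The following sketch queries them with NumPy's `finfo` (PyTorch's `torch.finfo` exposes similar fields); the dictionary layout is just an illustrative choice.

```python
import numpy as np

# Collect the format limits that matter most for an explicit PDE solver.
limits = {}
for name, dtype in [("FP64", np.float64), ("FP32", np.float32), ("FP16", np.float16)]:
    info = np.finfo(dtype)
    limits[name] = {
        "mantissa_bits": info.nmant,            # explicitly stored fraction bits
        "eps": float(info.eps),                 # gap between 1.0 and the next value
        "smallest_normal": float(info.tiny),    # below this, precision degrades
        "max": float(info.max),                 # above this, values overflow to inf
    }

for name, row in limits.items():
    print(name, row)
```

Range matters as much as precision for FP16: a factor such as 1/Δx² at nx = 1024 is about 1.05e6, which exceeds the FP16 maximum of 65504, so the order in which a solver forms such coefficients can decide whether an intermediate overflows.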

Key Findings

What the Experiments Showed

FP64 — Reference
Consistently Lowest Errors
FP64 achieves the lowest L2 and L∞ errors across all four schemes and all six experiments. No stability issues at any tested CFL or Reynolds number.
FP32 — Practical Sweet Spot
Adequate for Most Regimes
FP32 introduces small but measurable errors at Re ≥ 1000 where gradients are steep. Stable at CFL = 0.80 across all schemes. Shows slow mass drift over long time horizons.
FP16 — Dangerous
Catastrophic Cancellation at High Re
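FP16's 10-bit mantissa causes catastrophic cancellation in the high-Re experiments and lost updates in the tiny-amplitude experiment, where an amplitude of 1e-6 lies below FP16's smallest normal number (about 6.1e-5). FP16 also loses stability on under-resolved grids (nx=128, Re=100,000). Not suitable for production solvers.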
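Two of the FP16 failure mechanisms can be reproduced without running a solver. The sketch below, assuming NumPy, stores a sine wave of amplitude 1e-6 (the amplitude of experiment 04) in FP16 and then adds a small increment to 1.0; the variable names and the increment value 1e-4 are illustrative.

```python
import numpy as np

nx, amplitude = 1024, 1e-6
x = np.arange(nx) / nx
u64 = amplitude * np.sin(2.0 * np.pi * x)
u16 = u64.astype(np.float16)

# Below 2**-14 FP16 values are subnormal and evenly spaced by 2**-24.
subnormal_spacing = 2.0 ** -24
levels_used = int(np.unique(u16).size)
rel_error = float(np.max(np.abs(u16.astype(np.float64) - u64))) / amplitude

# Lost update: an increment smaller than half the local spacing vanishes.
one = np.float16(1.0)
increment = np.float16(1e-4)
update_lost = bool(one + increment == one)

print("distinct FP16 values used by the wave:", levels_used)
print("max storage error relative to amplitude:", rel_error)
print("1.0 + 1e-4 == 1.0 in FP16:", update_lost)
```

The whole 1024-point wave collapses onto a few dozen representable values, so the field is quantised before the first time step is taken. The lost-update check shows the second mechanism: near 1.0 the FP16 spacing is about 9.8e-4, so a flux increment of 1e-4 is rounded away entirely.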
Scheme 4 (MUSCL + SSP-RK2)
Best Accuracy, TVD Preserved
MUSCL with minmod limiter maintains Total Variation Diminishing (TVD) behaviour even at FP32. ~3× costlier per cell per step than Scheme 1, but substantially higher accuracy per grid point.
Scheme 3 (Lax–Wendroff)
Dispersive Errors Near Shocks
Lax–Wendroff exhibits slight TV growth near shocks due to its dispersive error character — even at FP64. FP16 amplifies these oscillations significantly.
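Total variation is cheap to monitor and separates the schemes clearly. As a rough illustration, assuming NumPy, pure linear advection in FP64, and a top-hat profile (none of which is claimed to match the project's setup), the sketch compares first-order upwind with Lax–Wendroff:

```python
import numpy as np

def total_variation(u):
    """Sum of absolute jumps between neighbouring cells on a periodic grid."""
    return float(np.sum(np.abs(np.roll(u, -1) - u)))

def upwind_step(u, sigma):
    return u - sigma * (u - np.roll(u, 1))

def lax_wendroff_step(u, sigma):
    up, um = np.roll(u, -1), np.roll(u, 1)
    return u - 0.5 * sigma * (up - um) + 0.5 * sigma**2 * (up - 2.0 * u + um)

nx, sigma = 256, 0.8                     # sigma = c dt / dx
x = np.arange(nx) / nx
u_up = u_lw = np.where((x >= 0.25) & (x < 0.5), 1.0, 0.0)
tv0 = total_variation(u_up)

for _ in range(100):
    u_up = upwind_step(u_up, sigma)
    u_lw = lax_wendroff_step(u_lw, sigma)

tv_up, tv_lw = total_variation(u_up), total_variation(u_lw)
print(tv0, tv_up, tv_lw)
```

The upwind result never exceeds the initial total variation, while Lax–Wendroff develops over- and undershoots next to each jump that raise it, in line with the behaviour described above at FP64.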
Conservation
FP16 Mass Drift Over Time
Schemes 1 & 2 are dissipative, so TV decreases monotonically. FP16 amplifies mass drift noticeably over long time horizons (exp05). Combining a first-order scheme with FP16 gives the worst conservation.

Run Locally

Getting Started

# 1. Clone and install
git clone https://github.com/rajneeshbabu/precision-pde-solvers.git
cd precision-pde-solvers
pip install -r requirements.txt

# 2. Open any scheme notebook (each is self-contained)
jupyter notebook "scheme1/scheme1(Godunov+Forward_Euler).ipynb"
jupyter notebook "scheme2/scheme2(Lax-Friedrichs_Rusanov+Forward_Euler).ipynb"
jupyter notebook "scheme3/scheme3(Lax-Wendroff).ipynb"
jupyter notebook "scheme4/scheme4_MUSCL_Rusanov_SSPRK2.ipynb"

# 3. Run all cells top-to-bottom; the last cell executes all 6 experiments
# Results (CSVs + figures) are saved in the scheme folder automatically

Tech Stack

Libraries & Tools

Python 3.10+ · PyTorch 2.0+ · NumPy · SciPy · Pandas · Matplotlib · Jupyter Notebook

Results

Precision Impact — Visual Summary

L2 Error by Precision — Baseline Regime (Re=1000, Advection-Diffusion)

[Figure: bar chart of L2 error (axis ticks 0, 1e-5, 1e-4, 1e-3) for Schemes 1 to 4, with series FP64, FP32, and FP16]

FP64 consistently achieves the lowest L2 error. FP16 degrades sharply at high Reynolds numbers.

Stability Window — FP16 Breakdown at High Re (MUSCL Scheme)

[Figure: stability across Re = 10, 100, 1k, 100k. FP64: stable. FP32: marginal drift. FP16: diverges (NaN).]

FP16 loses stability on under-resolved, high-Re grids (exp03: nx=128, Re=100,000). FP64 and FP32 remain stable at CFL=0.80 across all schemes.

Mass Conservation Drift — Long-Horizon Experiment (exp05, Scheme 1)

[Figure: mass drift (axis ticks 0, 1e-8, 1e-6, 1e-4) at t = 0, 0.25, 0.50, 1.0, with series FP64, FP32, and FP16 (catastrophic drift)]

FP16 accumulates significant mass error over time due to rounding in the flux accumulation loop. FP64 mass drift stays below machine epsilon throughout.
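The rounding mechanism behind the drift can be illustrated with a few lines. This is a sketch under assumptions, not the project's exp05: it uses NumPy, first-order upwind for pure linear advection at a Courant number of 0.5, an offset sine profile, and measures mass in FP64 regardless of the solver dtype.

```python
import numpy as np

def max_mass_drift(dtype, nx=128, sigma=0.5, n_steps=400):
    """Largest |mass(t) - mass(0)| seen while advecting in the given dtype."""
    x = np.arange(nx) / nx
    u = (1.0 + 0.5 * np.sin(2.0 * np.pi * x)).astype(dtype)
    mass0 = float(np.sum(u, dtype=np.float64)) / nx       # measured in FP64
    drift = 0.0
    for _ in range(n_steps):
        # sigma is a Python float, so the update is carried out in `dtype`.
        u = u - sigma * (u - np.roll(u, 1))
        mass = float(np.sum(u, dtype=np.float64)) / nx
        drift = max(drift, abs(mass - mass0))
    return drift

drift64 = max_mass_drift(np.float64)
drift32 = max_mass_drift(np.float32)
drift16 = max_mass_drift(np.float16)
print(f"FP64: {drift64:.3e}  FP32: {drift32:.3e}  FP16: {drift16:.3e}")
```

In exact arithmetic the periodic update conserves the sum of `u`. In floating point each cell's update is rounded independently, the rounding errors no longer cancel between neighbours, and the residual accumulates step after step at a rate set by the dtype's spacing.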

Team

Authors

Rajneesh Babu
SR No. 26058 · IISc CDS
Nishchith C. Bharadwaj
SR No. 26650 · IISc CDS
Dhruv Yadav
SR No. 26641 · IISc CDS

M.Tech Computational and Data Sciences · Indian Institute of Science · Bengaluru · 2024–25