Course Project · DS 289 · IISc Bengaluru

Effect of Floating-Point Precision on PDE Solvers

How does precision — FP64 vs FP32 vs FP16 — affect the accuracy, stability, conservation, and cost of four finite-difference schemes on the Linear Advection-Diffusion and Viscous Burgers' equations?

🔢 4 Schemes · 🧪 6 Experiments · ⚡ FP64 / FP32 / FP16 · 🔥 PyTorch Solvers · IISc · DS 289 · 2024–25

Governing Equations

Two Canonical PDEs

Linear Advection-Diffusion

u_t + c·u_x = ν·u_xx
x ∈ [0,1], periodic BCs

Advection speed c = 1, diffusivity ν = 1/Re. The exact solution is known in closed form:
u(x,t) = A·exp(−νk²t)·sin(k(x−ct)), k = 2πm with integer mode number m. It serves as the ground truth for the L2 and L∞ error measurements.
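As a rough illustration of how the exact solution can serve as ground truth, the sketch below evaluates it on a periodic grid and computes discrete error norms. The function names, the default values (A = 1, m = 1, ν = 0.01) and the root-mean-square normalisation of the L2 norm are assumptions made for this example, not taken from the project code, and plain Python lists stand in for PyTorch tensors.

```python
import math

def exact_solution(x, t, A=1.0, m=1, c=1.0, nu=0.01):
    """u(x,t) = A*exp(-nu*k^2*t)*sin(k*(x - c*t)) with k = 2*pi*m."""
    k = 2.0 * math.pi * m
    return A * math.exp(-nu * k * k * t) * math.sin(k * (x - c * t))

def error_norms(u_num, u_ref):
    """Discrete L2 (root-mean-square, an assumed normalisation) and L-infinity errors."""
    diffs = [a - b for a, b in zip(u_num, u_ref)]
    l2 = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    linf = max(abs(d) for d in diffs)
    return l2, linf

# Usage: a "numerical" field that is off by a constant 1e-3 everywhere.
nx = 64
xs = [i / nx for i in range(nx)]
u_ref = [exact_solution(x, 0.1) for x in xs]
u_num = [v + 1e-3 for v in u_ref]
l2, linf = error_norms(u_num, u_ref)
```

With a constant offset both norms coincide with the offset itself, which is a convenient sanity check before comparing real solver output.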

Viscous Burgers' Equation

u_t + (u²/2)_x = ν·u_xx
u₀(x) = sin(2πx), periodic BCs

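No closed-form solution is available for general Re. The reference solution is computed with a Cole–Hopf spectral method on a dense N = 8192 grid. The time-step schedule from an FP64 pilot run is replayed verbatim at FP32 and FP16, so all precisions take identical steps and any remaining difference isolates the effect of precision.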
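The schedule-replay idea can be sketched as follows. This is a simplified stand-in, not the project's solver: it uses a Rusanov step for inviscid Burgers (the viscous term is omitted for brevity), emulates reduced precision by rounding stored values with the standard-library `struct` module, and all function names are hypothetical.

```python
import math
import struct

def round_to(x, fmt):
    """Round a Python float to IEEE 754 storage: 'd' = FP64, 'f' = FP32, 'e' = FP16."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def rusanov_step(u, dt, dx, fmt):
    """One Forward Euler step; fluxes and updated cells are rounded to `fmt`."""
    n = len(u)
    flux = []  # flux[i] sits on the face between cell i and cell i+1
    for i in range(n):
        ul, ur = u[i], u[(i + 1) % n]
        a = max(abs(ul), abs(ur))
        f = 0.5 * (0.5 * ul * ul + 0.5 * ur * ur) - 0.5 * a * (ur - ul)
        flux.append(round_to(f, fmt))
    return [round_to(u[i] - dt / dx * (flux[i] - flux[i - 1]), fmt) for i in range(n)]

def pilot_schedule(u0, dx, t_end, cfl=0.8):
    """FP64 pilot run: choose dt from the CFL condition and record every step."""
    u, t, schedule = list(u0), 0.0, []
    while t < t_end - 1e-15:
        dt = min(cfl * dx / max(abs(v) for v in u), t_end - t)
        u = rusanov_step(u, dt, dx, 'd')
        schedule.append(dt)
        t += dt
    return schedule, u

def replay(u0, dx, schedule, fmt):
    """Re-run with the recorded steps, so only the arithmetic precision differs."""
    u = [round_to(v, fmt) for v in u0]
    for dt in schedule:
        u = rusanov_step(u, dt, dx, fmt)
    return u

nx = 32
dx = 1.0 / nx
u0 = [math.sin(2.0 * math.pi * i * dx) for i in range(nx)]
schedule, u64 = pilot_schedule(u0, dx, 0.1)
u16 = replay(u0, dx, schedule, 'e')
gap = max(abs(a - b) for a, b in zip(u16, u64))  # precision-only discrepancy
```

Because the low-precision run never computes its own time step, a rounding-induced change in max|u| cannot alter the step sequence, which keeps the comparison between precisions like-for-like.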

Numerical Schemes

Four Finite-Difference Methods

Scheme 1 · 1st Order

Upwind / Godunov + Forward Euler

First-order upwind for linear advection; Godunov exact Riemann solver for Burgers. Explicit Forward Euler time integration.

Godunov Flux · Forward Euler · O(Δx, Δt)
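For reference, the textbook exact Riemann (Godunov) flux for the Burgers flux f(u) = u²/2 can be written in a few lines. The sketch below is a generic version with hypothetical function names, covers only the convective part (no viscous term), and uses plain Python lists rather than the project's PyTorch tensors.

```python
def burgers_flux(u):
    return 0.5 * u * u

def godunov_flux(ul, ur):
    """Exact Riemann flux for Burgers at a face with left state ul and right state ur."""
    if ul <= ur:
        # Rarefaction: minimum of f over [ul, ur]; zero if the fan is transonic.
        if ul <= 0.0 <= ur:
            return 0.0
        return min(burgers_flux(ul), burgers_flux(ur))
    # Shock: maximum of f over [ur, ul].
    return max(burgers_flux(ul), burgers_flux(ur))

def forward_euler_step(u, dt, dx):
    """Conservative update u_i -= dt/dx * (F_{i+1/2} - F_{i-1/2}) with periodic BCs."""
    n = len(u)
    flux = [godunov_flux(u[i], u[(i + 1) % n]) for i in range(n)]
    return [u[i] - dt / dx * (flux[i] - flux[i - 1]) for i in range(n)]

u = [1.0, 0.5, -0.5, -1.0, 0.0, 0.5, 1.0, 1.0]
u_next = forward_euler_step(u, dt=0.05, dx=0.125)
```

For positive states the flux reduces to f(ul), which matches first-order upwinding in the linear case.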

Scheme 2 · 1st Order

Lax–Friedrichs / Rusanov + Forward Euler

Lax–Friedrichs flux for linear advection; Rusanov (local Lax–Friedrichs) for Burgers. More dissipative than Godunov.

Rusanov Flux · Forward Euler · O(Δx, Δt)
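A minimal sketch of the two fluxes named above, in their standard textbook forms. The function names and signatures are illustrative assumptions. The extra dissipation comes from the second term of each flux, which acts like a numerical viscosity proportional to the jump across the face.

```python
def lax_friedrichs_flux(ul, ur, c, dx, dt):
    """Lax-Friedrichs flux for linear advection f(u) = c*u (global speed dx/dt)."""
    return 0.5 * c * (ul + ur) - 0.5 * (dx / dt) * (ur - ul)

def rusanov_flux(ul, ur):
    """Rusanov (local Lax-Friedrichs) flux for Burgers, f(u) = u^2/2.

    The dissipation coefficient is the local maximum wave speed max(|ul|, |ur|).
    """
    a = max(abs(ul), abs(ur))
    return 0.5 * (0.5 * ul * ul + 0.5 * ur * ur) - 0.5 * a * (ur - ul)

smooth = rusanov_flux(1.0, 1.0)   # no jump: reduces to f(u)
jump = rusanov_flux(2.0, -1.0)    # shock-like jump: dissipation term dominates
```

For the state pair (2, −1) the Godunov flux equals f(2) = 2, while the Rusanov value is larger because of the added dissipation term.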

Scheme 3 · 2nd Order

Lax–Wendroff / Richtmyer

Second-order Lax–Wendroff flux for linear advection; Richtmyer two-step predictor–corrector for Burgers. Dispersive errors near shocks.

Lax-Wendroff · Richtmyer · O(Δx², Δt²)
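The Richtmyer two-step form can be sketched as a half-step predictor at the cell faces followed by a full-step corrector. The version below is the generic textbook scheme for the convective part of Burgers only; the function name is hypothetical and the viscous term is left out.

```python
import math

def richtmyer_step(u, dt, dx):
    """Two-step Lax-Wendroff (Richtmyer) update for u_t + (u^2/2)_x = 0, periodic BCs."""
    n = len(u)

    def f(v):
        return 0.5 * v * v

    # Predictor: provisional values at faces i+1/2 and time level n+1/2.
    half = [
        0.5 * (u[i] + u[(i + 1) % n]) - 0.5 * dt / dx * (f(u[(i + 1) % n]) - f(u[i]))
        for i in range(n)
    ]
    # Corrector: conservative update using fluxes of the predicted face values.
    return [u[i] - dt / dx * (f(half[i]) - f(half[i - 1])) for i in range(n)]

nx = 64
dx = 1.0 / nx
u0 = [math.sin(2.0 * math.pi * i * dx) for i in range(nx)]
u1 = richtmyer_step(u0, 0.5 * dx, dx)
```

The scheme has no built-in limiter, which is why oscillations can appear once the profile steepens into a shock.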

Scheme 4 · 2nd Order

MUSCL + Rusanov + SSP-RK2

MUSCL slope-limited reconstruction (minmod/mc/vanleer) + Rusanov flux. Strong Stability Preserving RK2 (Shu–Osher) time integration. TVD and ~3× costlier per step than Scheme 1.

MUSCL · SSP-RK2 · TVD · O(Δx², Δt²)
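To make the ingredients concrete, here is a compact sketch of minmod-limited MUSCL reconstruction, a Rusanov flux and the Shu–Osher SSP-RK2 update for the convective part of Burgers. It is a generic illustration under stated assumptions (hypothetical function names, no viscous term, only the minmod limiter), not the notebook's implementation.

```python
import math

def minmod(a, b):
    """Return the smaller-magnitude slope, or zero at a local extremum."""
    if a * b <= 0.0:
        return 0.0
    return a if abs(a) < abs(b) else b

def rhs(u, dx):
    """Semi-discrete right-hand side -(F_{i+1/2} - F_{i-1/2}) / dx."""
    n = len(u)
    slope = [minmod(u[i] - u[i - 1], u[(i + 1) % n] - u[i]) for i in range(n)]
    flux = []
    for i in range(n):
        j = (i + 1) % n
        ul = u[i] + 0.5 * slope[i]   # left state at face i+1/2
        ur = u[j] - 0.5 * slope[j]   # right state at face i+1/2
        a = max(abs(ul), abs(ur))
        flux.append(0.5 * (0.5 * ul * ul + 0.5 * ur * ur) - 0.5 * a * (ur - ul))
    return [-(flux[i] - flux[i - 1]) / dx for i in range(n)]

def ssp_rk2_step(u, dt, dx):
    """Shu-Osher SSP-RK2: one Euler stage, then average with a second Euler stage."""
    u1 = [a + dt * k for a, k in zip(u, rhs(u, dx))]
    return [0.5 * a + 0.5 * (b + dt * k) for a, b, k in zip(u, u1, rhs(u1, dx))]

def total_variation(u):
    return sum(abs(u[(i + 1) % len(u)] - u[i]) for i in range(len(u)))

nx = 64
dx = 1.0 / nx
u0 = [math.sin(2.0 * math.pi * i * dx) for i in range(nx)]
u = list(u0)
for _ in range(20):
    u = ssp_rk2_step(u, 0.4 * dx, dx)   # CFL = 0.4 since max|u| <= 1
tv_before, tv_after = total_variation(u0), total_variation(u)
```

Each SSP-RK2 step evaluates the right-hand side twice and adds a reconstruction pass, which is consistent with the higher per-step cost relative to a single-stage first-order scheme.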

Experiments

Six Stress Tests per Scheme

Every scheme runs all six experiments × three precision variants (FP64 / FP32 / FP16), producing a uniform diagnostics dataset for direct cross-scheme comparison.

#	Experiment	Purpose	Grid	CFL
01	Baseline Regime Sweep	Accuracy across Re = 10, 100, 1000	nx = 1024	0.80
02	Ultra-High Re Shock Stress	Shock sharpening at Re = 100,000 — precision sensitivity	nx = 1024	0.80
03	Under-Resolved High Re	Coarse grid (nx=128) — FP16 breakdown & instability	nx = 128	0.80
04	Tiny Amplitude Quantization	Amplitude 1e-6 — precision floor & lost updates (linear only)	nx = 1024	0.80
05	Long-Horizon Drift	Extended time — conservation error & TV drift accumulation	nx = 1024	0.50
06	CFL Overdrive Failure	Intentional CFL = 1.2 — method-level failure mode	nx = 256	1.20
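Why CFL = 1.2 is a method-level failure rather than a precision effect can be seen from a von Neumann analysis of first-order upwind on pure advection (diffusion is ignored here, which is a simplifying assumption). The amplification factor is g(θ) = 1 − σ(1 − e^(−iθ)) with σ = c·Δt/Δx, and the worst mode θ = π gives |g| = |1 − 2σ|. The helper names below are hypothetical.

```python
import cmath
import math

def upwind_amplification(sigma, theta):
    """Amplification factor of first-order upwind for u_t + c*u_x = 0."""
    return 1.0 - sigma * (1.0 - cmath.exp(-1j * theta))

def worst_case_growth(sigma, samples=721):
    """Largest |g| over sampled wavenumbers theta in [0, 2*pi]."""
    thetas = [2.0 * math.pi * j / (samples - 1) for j in range(samples)]
    return max(abs(upwind_amplification(sigma, th)) for th in thetas)

stable = worst_case_growth(0.8)     # never exceeds 1: no mode is amplified
unstable = worst_case_growth(1.2)   # about 1.4 per step at the grid-scale mode
```

A per-step growth factor above one compounds every step, so the blow-up appears at any precision, including FP64.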

Precision

FP64 / FP32 / FP16 Variants

Precision	torch.dtype	Bits	Mantissa	Stability (CFL=0.8)	High Re Accuracy	Long Horizon
FP64	torch.float64	64	52 bits	Stable ✓	Reference	No drift
FP32	torch.float32	32	23 bits	Stable ✓	Small error	Slow drift
FP16	torch.float16	16	10 bits	Unstable (exp03)	Catastrophic	Mass drift
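The mantissa widths in the table translate directly into machine epsilon, the gap between 1.0 and the next representable number. The sketch below measures it empirically for each format using the standard-library `struct` module to emulate IEEE 754 rounding; the helper names are assumptions for this example, and in PyTorch the same values are available through `torch.finfo(dtype).eps`.

```python
import struct

def round_to(x, fmt):
    """Round to IEEE 754 storage: 'd' = FP64, 'f' = FP32, 'e' = FP16."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def machine_epsilon(fmt):
    """Halve eps until adding eps/2 to 1.0 no longer changes the stored value."""
    eps = 1.0
    while round_to(1.0 + eps / 2.0, fmt) > 1.0:
        eps /= 2.0
    return eps

eps64 = machine_epsilon('d')   # 2**-52
eps32 = machine_epsilon('f')   # 2**-23
eps16 = machine_epsilon('e')   # 2**-10
```

FP16 therefore resolves relative differences of about one part in a thousand, compared with roughly one part in 10^7 for FP32 and 10^16 for FP64.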

Key Findings

What the Experiments Showed

FP64 — Reference

Consistently Lowest Errors

FP64 achieves the lowest L2 and L∞ errors across all four schemes and all six experiments. It shows no precision-related stability issues at any tested Reynolds number; the CFL = 1.2 run (exp06) is an intentional method-level failure test.

FP32 — Practical Sweet Spot

Adequate for Most Regimes

FP32 introduces small but measurable errors at Re ≥ 1000 where gradients are steep. Stable at CFL = 0.80 across all schemes. Shows slow mass drift over long time horizons.

FP16 — Dangerous

Catastrophic Cancellation at High Re

FP16's 10-bit mantissa (roughly three decimal digits) causes catastrophic cancellation in the high-Re experiments and lost updates in the tiny-amplitude experiment (exp04). It loses stability on under-resolved grids (nx=128, Re=100,000). Not suitable for production solvers.
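Both failure mechanisms can be reproduced with single numbers. In the sketch below, FP16 and FP32 storage is emulated with the standard-library `struct` module, and the specific values (an amplitude of 1e-6, an update of 1e-8, a perturbation of 1e-4) are chosen for illustration rather than taken from the experiments.

```python
import struct

def to_fp16(x):
    return struct.unpack('e', struct.pack('e', x))[0]

def to_fp32(x):
    return struct.unpack('f', struct.pack('f', x))[0]

# Quantization: 1e-6 is subnormal in FP16, where the grid spacing is 2**-24.
amp16 = to_fp16(1e-6)                      # stored as 17 * 2**-24, about 1.013e-6

# Lost update: an increment smaller than half the spacing vanishes in FP16 ...
updated16 = to_fp16(amp16 + 1e-8)
# ... but survives in FP32.
amp32 = to_fp32(1e-6)
updated32 = to_fp32(amp32 + 1e-8)

# Cancellation: neighbouring values that differ by 1e-4 near 1.0 become
# identical in FP16, so their finite difference is exactly zero.
diff16 = to_fp16(1.0001) - to_fp16(1.0)
```

In a solver the same effect means that small flux differences contribute nothing to the update, so the field can stall or develop step-like artefacts.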

Scheme 4 (MUSCL + SSP-RK2)

Best Accuracy, TVD Preserved

MUSCL with minmod limiter maintains Total Variation Diminishing (TVD) behaviour even at FP32. ~3× costlier per cell per step than Scheme 1, but substantially higher accuracy per grid point.

Scheme 3 (Lax–Wendroff)

Dispersive Errors Near Shocks

Lax–Wendroff exhibits slight TV growth near shocks due to its dispersive error character — even at FP64. FP16 amplifies these oscillations significantly.
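Total variation here means the sum of absolute jumps between neighbouring cells. A minimal periodic version, with a hypothetical function name, might look like this:

```python
import math

def total_variation(u):
    """TV(u) = sum_i |u_{i+1} - u_i| with periodic wrap-around."""
    n = len(u)
    return sum(abs(u[(i + 1) % n] - u[i]) for i in range(n))

nx = 64
sine = [math.sin(2.0 * math.pi * i / nx) for i in range(nx)]
tv_sine = total_variation(sine)                      # one full period: close to 4
tv_zigzag = total_variation([0.0, 1.0, 0.0, 1.0])    # grid-scale oscillation
```

Tracking this quantity step by step shows whether a scheme creates new oscillations (TV grows) or only smooths the profile (TV stays flat or decreases).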

Conservation

FP16 Mass Drift Over Time

Schemes 1 and 2 are dissipative, so their total variation decreases monotonically. FP16 noticeably amplifies mass drift over long time horizons (exp05), and the first-order schemes at FP16 show the worst conservation.

Run Locally

Getting Started

# 1. Clone and install
git clone https://github.com/rajneeshbabu/precision-pde-solvers.git
cd precision-pde-solvers
pip install -r requirements.txt

# 2. Open any scheme notebook (each is self-contained)
jupyter notebook "scheme1/scheme1(Godunov+Forward_Euler).ipynb"
jupyter notebook "scheme2/scheme2(Lax-Friedrichs_Rusanov+Forward_Euler).ipynb"
jupyter notebook "scheme3/scheme3(Lax-Wendroff).ipynb"
jupyter notebook "scheme4/scheme4_MUSCL_Rusanov_SSPRK2.ipynb"

# 3. Run all cells top-to-bottom — last cell executes all 6 experiments
# Results (CSVs + figures) saved in the scheme folder automatically
    

Tech Stack

Libraries & Tools

Python 3.10+ · PyTorch 2.0+ · NumPy · SciPy · Pandas · Matplotlib · Jupyter Notebook

Results

Precision Impact — Visual Summary

L2 Error by Precision — Baseline Regime (Re=1000, Advection-Diffusion)

FP64 consistently achieves the lowest L2 error. FP16 degrades sharply at high Reynolds numbers.

Stability Window — FP16 Breakdown at High Re (MUSCL Scheme)

FP16 loses stability on under-resolved, high-Re grids (exp03: nx=128, Re=100,000). FP64 and FP32 remain stable at CFL=0.80 across all schemes.

Mass Conservation Drift — Long-Horizon Experiment (exp05, Scheme 1)

FP16 accumulates significant mass error over time due to rounding in the flux accumulation loop. FP64 mass drift stays below machine epsilon throughout.
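The accumulation mechanism can be illustrated in isolation: once a running FP16 sum grows large enough, adding a small term no longer changes it. The sketch below emulates FP16 rounding with the standard-library `struct` module and uses made-up data (4096 ones); it demonstrates the mechanism only and does not reproduce the measured drift.

```python
import math
import struct

def to_fp16(x):
    return struct.unpack('e', struct.pack('e', x))[0]

def naive_sum_fp16(values):
    """Running sum with the accumulator rounded to FP16 after every addition."""
    total = 0.0
    for v in values:
        total = to_fp16(total + to_fp16(v))
    return total

ones = [1.0] * 4096
stalled = naive_sum_fp16(ones)   # stops growing at 2048, where the FP16 spacing is 2
exact = math.fsum(ones)          # 4096.0
```

Accumulating in a wider type, or using a compensated summation, avoids the stall. Whether the project applies either remedy is not stated in the source.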

Team

Authors

Rajneesh Babu

SR No. 26058 · IISc CDS

Nishchith C. Bharadwaj

SR No. 26650 · IISc CDS

Dhruv Yadav

SR No. 26641 · IISc CDS

M.Tech Computational and Data Sciences · Indian Institute of Science · Bengaluru · 2024–25
