Time-Masked Autoencoders for Fluid Dynamics

Background

Fluid dynamics simulations (Navier-Stokes, Shallow Water, etc.) produce spatiotemporal fields that evolve under known physical laws. Working with them raises two distinct challenges:

Forecasting — given the first N timesteps, predict the next M
Reconstruction under partial observation — fill in missing frames from sparse observations (think: satellite imagery with cloud cover, or sensor dropouts)

Both reduce to “learn the dynamics well enough to interpolate/extrapolate.” Standard video prediction models (ConvLSTM, transformer baselines) struggle here because the structure of fluid dynamics is highly regular (governed by PDEs) while the appearance is chaotic in detail, so a model that fits pixel-level appearance can miss the dynamics that actually generate it.

The question

Masked Autoencoders (MAE, VideoMAE) work shockingly well for image and video representation learning: randomly hide 75-90% of patches, train the model to reconstruct the hidden patches from the visible ones, and somehow the resulting features generalize beautifully.
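To make the masking step concrete, here is a minimal NumPy sketch of MAE-style random patch masking on a single 2D field. It is an illustration, not the code from the MAE or VideoMAE papers: the function name random_patch_mask, the 4x4 patch size, and the 16x16 toy image are all assumptions chosen for the example.

```python
import numpy as np

def random_patch_mask(image, patch=4, mask_ratio=0.75, seed=0):
    """Split a 2D image into non-overlapping patches and hide a random subset.

    Returns (patches, hidden): patches has shape (num_patches, patch * patch),
    hidden is a boolean vector that is True for patches the encoder never sees.
    """
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    # (h/p, p, w/p, p) -> (h/p, w/p, p, p) -> (num_patches, p*p)
    patches = (image.reshape(h // patch, patch, w // patch, patch)
                    .transpose(0, 2, 1, 3)
                    .reshape(-1, patch * patch))
    n = patches.shape[0]
    n_hidden = int(round(mask_ratio * n))
    rng = np.random.default_rng(seed)
    hidden = np.zeros(n, dtype=bool)
    hidden[rng.choice(n, size=n_hidden, replace=False)] = True
    return patches, hidden

# Toy 16x16 field (hypothetical data, just to exercise the function).
image = np.arange(16 * 16, dtype=np.float32).reshape(16, 16)
patches, hidden = random_patch_mask(image, patch=4, mask_ratio=0.75)

visible = patches[~hidden]  # what the encoder would see
targets = patches[hidden]   # what the decoder would be trained to reconstruct
```

With a 75% ratio the encoder only ever processes a quarter of the patches, which is also where much of the training-time saving in this family of models comes from.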

What if we apply this to fluid dynamics? Specifically, if we mask entire temporal frames rather than random spatial patches, does this force the model to learn the underlying dynamical structure?

Approach

Architecture: ViT-style autoencoder operating on stacks of Shallow Water simulation frames
Time masking: randomly remove k frames from the input sequence; train to reconstruct the full sequence
Masking ratios tested: 25%, 50%, 80% of input frames
Forecasting variant: predict up to 20 future frames from a short history
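The time-masking step above can be sketched in a few lines of NumPy. This is a sketch under stated assumptions, not the project code: the helper names random_time_mask and forecast_mask, the 20-frame sequence, the 32x32 grid, and the history length of 5 are all hypothetical. Treating forecasting as "hide everything after the history" is likewise one convenient way to frame it, not necessarily how the forecasting variant was implemented.

```python
import numpy as np

def random_time_mask(num_frames, mask_ratio, rng):
    """Boolean vector of length num_frames; True marks a frame hidden from the encoder."""
    k = int(round(mask_ratio * num_frames))
    hidden = np.zeros(num_frames, dtype=bool)
    hidden[rng.choice(num_frames, size=k, replace=False)] = True
    return hidden

def forecast_mask(num_frames, history):
    """Forecasting viewed as masking: keep the first `history` frames, hide the rest."""
    hidden = np.ones(num_frames, dtype=bool)
    hidden[:history] = False
    return hidden

rng = np.random.default_rng(0)
# Random stand-in for a simulation rollout with shape (T, H, W).
frames = rng.standard_normal((20, 32, 32)).astype(np.float32)

hidden = random_time_mask(len(frames), mask_ratio=0.8, rng=rng)
visible_frames = frames[~hidden]         # encoder input: 4 of 20 frames
visible_times = np.flatnonzero(~hidden)  # time indices, e.g. for temporal position embeddings

# Hypothetical forecasting split: 5 frames of history, 20 frames to predict.
future_hidden = forecast_mask(num_frames=25, history=5)
```

Keeping the time indices of the visible frames matters: once whole frames are dropped, the model can only infer how far apart two inputs are if their positions are passed along with them.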

Results

Setup	Mask %	Target frames	SSIM
Reconstruction	50%	(filled in)	~0.92
Reconstruction	80%	(filled in)	~0.85
Forecasting	n/a	20 future	0.80+
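For readers unfamiliar with the metric in the table, SSIM compares two images through their means, variances, and covariance, and equals 1 only for identical inputs. The sketch below is a simplified single-window version computed over the whole frame. Standard SSIM averages the same expression over local windows (as in scikit-image's structural_similarity), and this write-up does not record which implementation produced the numbers above, so treat the function names and the data_range default here as assumptions.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two frames of equal shape."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def mean_ssim(pred, target, data_range=1.0):
    """Average per-frame SSIM over two (T, H, W) stacks."""
    return float(np.mean([global_ssim(p, t, data_range)
                          for p, t in zip(pred, target)]))

rng = np.random.default_rng(0)
target = rng.random((5, 32, 32))                          # stand-in ground-truth frames in [0, 1)
noisy = target + 0.1 * rng.standard_normal(target.shape)  # stand-in model output

score_same = mean_ssim(target, target)   # identical stacks
score_noisy = mean_ssim(noisy, target)   # degraded stack
```

Identical stacks score 1, and any mismatch pulls the score below that, so values of 0.85 to 0.92 indicate reconstructions that are close to the ground truth but not exact.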

What we learned

High temporal masking is a strong regularizer — forces the model to learn what the dynamics must look like, not memorize pixel-level shortcuts
Generalization improved on out-of-distribution initial conditions compared to a forecaster trained without masking
Spatial-only masking underperformed, which suggests that temporal masking specifically is what pushes the model to capture the PDE structure

What came of it

This was a focused collaboration with Imperial College London researchers working on physics-informed ML. The findings fed into their larger program on scalable PDE surrogates. For me personally, it was the project that hooked me on predictive world models — the same intuition (predict in latent space, throw away noise) shows up in NavJEPA today.

Stack: Python · PyTorch · TensorFlow · NumPy

Period: Nov 2023 – Jan 2024