⚡ Profiler: Optimize LBM Solver (~20 MLUPS)#616
Conversation
…and scalar replacement

- Introduce `equilibrium_component` to the `Lattice2D` trait to avoid stack array allocation in collision.
- Optimize `LatticeBoltzmannD2Q9::step` using a 3-pass strategy (Stream, Macroscopic, Collision) for better vectorization.
- Implement precomputed stencil offsets in `stream` to reduce integer multiplication overhead.
- Benchmarks show ~20 MLUPS (up from an unoptimized 16.5 MLUPS in hybrid mode).
- Verified correctness with `cargo test` and mass conservation checks.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
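The first bullet — computing equilibrium values per-component instead of filling a temporary `[f64; 9]` — can be sketched as a standalone function. This is a minimal illustration of the technique, not the PR's actual `Lattice2D` trait; weights and velocity sets are the standard D2Q9 values.

```rust
// Sketch: computing one D2Q9 equilibrium component on the fly,
// avoiding a temporary `[f64; 9]` stack array in the collision loop.
// Names are illustrative, not the actual trait method from the PR.
const W: [f64; 9] = [
    4.0 / 9.0,
    1.0 / 9.0, 1.0 / 9.0, 1.0 / 9.0, 1.0 / 9.0,
    1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0,
];
const EX: [f64; 9] = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, -1.0, -1.0, 1.0];
const EY: [f64; 9] = [0.0, 0.0, 1.0, 0.0, -1.0, 1.0, 1.0, -1.0, -1.0];

fn equilibrium_component(k: usize, rho: f64, ux: f64, uy: f64) -> f64 {
    let eu = EX[k] * ux + EY[k] * uy; // e_k . u
    let usq = ux * ux + uy * uy;      // |u|^2
    W[k] * rho * (1.0 + 3.0 * eu + 4.5 * eu * eu - 1.5 * usq)
}

fn main() {
    // At zero velocity each component reduces to w_k * rho,
    // and the nine components sum back to the density.
    let rho = 1.0;
    let total: f64 = (0..9)
        .map(|k| equilibrium_component(k, rho, 0.0, 0.0))
        .sum();
    assert!((total - rho).abs() < 1e-12);
}
```

Because each component is computed independently, the collision loop can consume the value immediately instead of materializing all nine in a temporary array.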
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request. When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down. I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job! For more direct control, you can switch me to Reactive Mode: when this mode is on, I will only act on comments where you specifically mention me. New to Jules? Learn more at jules.google/docs. For security, I will only act on instructions from the user who triggered this task.
- Apply `cargo fmt` to `bench_lbm.rs` and `lattice_boltzmann/mod.rs`.
- Fix CI build failure due to style check.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
- Update loops in `LatticeBoltzmannD2Q9` to use `.iter_mut().enumerate()` instead of index-based range loops.
- Resolves `clippy::needless_range_loop` lint failures in CI.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
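The lint fix above follows the usual pattern for `clippy::needless_range_loop`: replace an index-based `for i in 0..len` loop with `.iter_mut().enumerate()`. A minimal sketch with an illustrative function (not the PR's actual loop):

```rust
// Sketch of the clippy::needless_range_loop fix: iterating with
// `.iter_mut().enumerate()` instead of indexing over a range.
fn scale_in_place(values: &mut [f64], factor: f64) -> usize {
    let mut touched = 0;
    // Instead of: for i in 0..values.len() { values[i] *= factor; }
    for (i, v) in values.iter_mut().enumerate() {
        *v *= factor;
        touched = i + 1;
    }
    touched
}

fn main() {
    let mut data = vec![1.0, 2.0, 3.0];
    let n = scale_in_place(&mut data, 2.0);
    assert_eq!(n, 3);
    assert_eq!(data, vec![2.0, 4.0, 6.0]);
}
```

Beyond satisfying the lint, the iterator form lets the compiler elide per-element bounds checks, which also helps the vectorization goals mentioned elsewhere in this PR.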
⚡ Profiler: LBM Loop Optimization
📉 The Bottleneck:
The Lattice Boltzmann Method (LBM) solver was identified as a compute-intensive kernel limited by memory bandwidth and scalar execution overhead. The original implementation (and intermediate attempts) suffered from inefficient memory access patterns (gather/scatter) and stack allocation of temporary `[f64; 9]` arrays during the collision step.
🚀 The Boost:
Optimized the `step` function to achieve approximately 20 MLUPS in the benchmark environment. (Note: a 26 MLUPS baseline could not be reproduced or verified in the current environment; the optimized 3-pass strategy beat the Fused strategy, 19.9 vs 18.6 MLUPS.)
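For context, MLUPS (million lattice updates per second) is typically derived from the grid size, step count, and wall-clock time of a benchmark run. A small sketch with made-up grid dimensions and timing, purely for illustration:

```rust
// Sketch: deriving an MLUPS figure from benchmark parameters.
// MLUPS = (cells updated per step * steps) / elapsed seconds / 1e6.
fn mlups(width: usize, height: usize, steps: usize, elapsed_secs: f64) -> f64 {
    (width * height * steps) as f64 / elapsed_secs / 1.0e6
}

fn main() {
    // e.g. a 512x512 grid stepped 1000 times in 13.1 s is ~20 MLUPS
    let rate = mlups(512, 512, 1000, 13.1);
    println!("{rate:.1} MLUPS");
}
```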
💻 Technical Detail:
- Split `step` into separate `Stream`, `Macroscopic`, and `Collision` loops. This allows the compiler to auto-vectorize the compute-heavy Macroscopic and Collision steps (linear memory access).
- Precomputed stencil offsets in the `Stream` loop, eliminating `(y*width + x)` re-calculation for every neighbor.
- Added `equilibrium_component` to `Lattice2D` to compute equilibrium values on the fly, preventing the allocation of `[f64; 9]` arrays on the stack during collision.

🧪 Verification:
- `cargo test physics::fluid_dynamics` passed (including `security_lbm_validation`).
- `examples/bench_lbm.rs` benchmark added and verified mass conservation.

PR created automatically by Jules for task 98323151705483289 started by @fderuiter
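The mass-conservation check mentioned above rests on a basic LBM invariant: a pure (periodic) streaming pass only moves distribution values between cells, so their total sum — the total mass — must be unchanged. A toy sketch of such a check, with illustrative names and data, not the actual benchmark code:

```rust
// Sketch of a mass-conservation check: after a periodic streaming
// pass, the sum of all distribution values must be unchanged.
fn stream_periodic(src: &[f64], width: usize, height: usize, dx: isize, dy: isize) -> Vec<f64> {
    let (w, h) = (width as isize, height as isize);
    let mut dst = vec![0.0; src.len()];
    for y in 0..h {
        for x in 0..w {
            // Pull from the periodic upstream neighbor.
            let sx = (x - dx).rem_euclid(w);
            let sy = (y - dy).rem_euclid(h);
            dst[(y * w + x) as usize] = src[(sy * w + sx) as usize];
        }
    }
    dst
}

fn main() {
    let (w, h) = (8, 8);
    let src: Vec<f64> = (0..w * h).map(|i| (i % 7) as f64 * 0.1).collect();
    let before: f64 = src.iter().sum();
    let after: f64 = stream_periodic(&src, w, h, 1, 1).iter().sum::<f64>();
    assert!((before - after).abs() < 1e-12);
}
```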