
⚡ Profiler: Optimize LBM Solver (~20 MLUPS)#616

Closed
fderuiter wants to merge 3 commits into main from profiler-lbm-optimize-98323151705483289

@fderuiter
Owner

⚡ Profiler: LBM Loop Optimization

📉 The Bottleneck:
The Lattice Boltzmann Method (LBM) solver was identified as a compute-intensive kernel limited by memory bandwidth and scalar execution overhead. The original implementation (or intermediate attempts) suffered from inefficient memory access patterns (gather/scatter) and stack allocation of temporary arrays ([f64; 9]) during the collision step.

🚀 The Boost:
Optimized the step function to reach approximately 20 MLUPS (million lattice updates per second) in the benchmark environment.
(Note: an earlier baseline of 26 MLUPS could not be reproduced or verified in the current environment. Of the variants measured here, the optimized 3-pass strategy beat the fused strategy, 19.9 vs 18.6 MLUPS.)

💻 Technical Detail:

  1. 3-Pass Structure: Split the simulation step into Stream, Macroscopic, and Collision loops. This allows the compiler to auto-vectorize the compute-heavy Macroscopic and Collision steps (linear memory access).
  2. Stencil Offsets: Precomputed integer offsets for neighbor access in the Stream loop, eliminating (y*width + x) re-calculation for every neighbor.
  3. Scalar Replacement: Added equilibrium_component to Lattice2D to compute equilibrium values on-the-fly, preventing the allocation of [f64; 9] arrays on the stack during collision.
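A minimal sketch of the scalar-replacement idea from item 3, assuming the standard D2Q9 weights, velocity set, and BGK relaxation. The free function `equilibrium_component`, the helper `collide_cell`, and the direction ordering are illustrative assumptions; the actual `Lattice2D` trait method in this PR may have a different signature.

```rust
/// Standard D2Q9 weights: rest, four axis directions, four diagonals.
/// The direction ordering here is an assumption for illustration.
const W: [f64; 9] = [
    4.0 / 9.0,
    1.0 / 9.0, 1.0 / 9.0, 1.0 / 9.0, 1.0 / 9.0,
    1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0, 1.0 / 36.0,
];
const CX: [f64; 9] = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, -1.0, -1.0, 1.0];
const CY: [f64; 9] = [0.0, 0.0, 1.0, 0.0, -1.0, 1.0, 1.0, -1.0, -1.0];

/// Hypothetical stand-in for the PR's `equilibrium_component`:
/// returns the equilibrium value for a single direction `i`
/// instead of filling a temporary `[f64; 9]`.
#[inline(always)]
pub fn equilibrium_component(i: usize, rho: f64, ux: f64, uy: f64) -> f64 {
    let cu = CX[i] * ux + CY[i] * uy;
    let uu = ux * ux + uy * uy;
    W[i] * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * uu)
}

/// BGK collision for one cell, relaxing toward equilibrium in place.
/// Assumes the cell density is non-zero.
pub fn collide_cell(f: &mut [f64; 9], omega: f64) {
    // Macroscopic moments of this cell.
    let (mut rho, mut mx, mut my) = (0.0, 0.0, 0.0);
    for i in 0..9 {
        rho += f[i];
        mx += CX[i] * f[i];
        my += CY[i] * f[i];
    }
    let (ux, uy) = (mx / rho, my / rho);

    // Each equilibrium value is a scalar consumed immediately,
    // so no equilibrium array is materialised.
    for i in 0..9 {
        let feq = equilibrium_component(i, rho, ux, uy);
        f[i] += omega * (feq - f[i]);
    }
}
```

Because the equilibrium distribution sums to `rho` over the nine directions, this collision leaves the cell's mass unchanged up to floating-point rounding.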

🧪 Verification:

  • cargo test physics::fluid_dynamics passed (including security_lbm_validation).
  • Added the examples/bench_lbm.rs benchmark and verified mass conservation.
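For reference, the MLUPS figure and a mass-conservation check can be computed roughly as follows. This is a sketch, not the contents of `examples/bench_lbm.rs`; the helper names `mlups`, `total_mass`, and `relative_mass_drift` are hypothetical.

```rust
use std::time::Duration;

/// Million lattice updates per second: cells * steps / seconds / 1e6.
pub fn mlups(width: usize, height: usize, steps: usize, elapsed: Duration) -> f64 {
    let updates = (width * height * steps) as f64;
    updates / elapsed.as_secs_f64() / 1.0e6
}

/// Total mass is the sum of every distribution value on the lattice.
pub fn total_mass(f: &[f64]) -> f64 {
    f.iter().sum()
}

/// Relative change in total mass between two snapshots.
/// Assumes `before` is non-zero.
pub fn relative_mass_drift(before: f64, after: f64) -> f64 {
    ((after - before) / before).abs()
}
```

In a benchmark, one would record `total_mass` before and after the timed loop and check that the drift stays within floating-point tolerance.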

PR created automatically by Jules for task 98323151705483289 started by @fderuiter

…and scalar replacement

- Introduce `equilibrium_component` to `Lattice2D` trait to avoid stack array allocation in collision.
- Optimize `LatticeBoltzmannD2Q9::step` using a 3-pass strategy (Stream, Macroscopic, Collision) for better vectorization.
- Implement precomputed stencil offsets in `stream` to reduce integer multiplication overhead.
- Benchmarks show ~20 MLUPS (up from unoptimized 16.5 MLUPS in hybrid mode).
- Verified correctness with `cargo test` and mass conservation checks.
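The stencil-offset idea can be sketched as below. This assumes a structure-of-arrays layout (`f[q * n + idx]`), a pull scheme, and interior cells only; the layout, the function names, and the boundary handling are assumptions and may differ from the actual `stream` implementation.

```rust
/// D2Q9 velocity set; the direction ordering is an assumption.
const CX: [isize; 9] = [0, 1, 0, -1, 0, 1, -1, -1, 1];
const CY: [isize; 9] = [0, 0, 1, 0, -1, 1, 1, -1, -1];

/// Flat-index offset to the upstream neighbour for each direction.
/// Computed once per grid, not once per cell per neighbour.
pub fn stencil_offsets(width: usize) -> [isize; 9] {
    let w = width as isize;
    let mut off = [0isize; 9];
    for q in 0..9 {
        off[q] = -(CY[q] * w + CX[q]);
    }
    off
}

/// Pull-stream the interior cells from `src` into `dst`.
/// Both buffers hold 9 * width * height values laid out as [q * n + idx].
/// Boundary cells are left untouched and need separate handling.
pub fn stream_interior(src: &[f64], dst: &mut [f64], width: usize, height: usize) {
    let n = width * height;
    assert!(src.len() >= 9 * n && dst.len() >= 9 * n);
    let off = stencil_offsets(width);
    for q in 0..9 {
        let s = &src[q * n..(q + 1) * n];
        let d = &mut dst[q * n..(q + 1) * n];
        for y in 1..height.saturating_sub(1) {
            let row = y * width;
            for x in 1..width.saturating_sub(1) {
                let idx = row + x;
                // One add per neighbour; no y * width + x recomputation.
                d[idx] = s[(idx as isize + off[q]) as usize];
            }
        }
    }
}
```

Restricting the offset-based loop to interior cells keeps the index arithmetic branch-free; periodic or bounce-back boundaries would be applied in a separate pass.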

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@google-labs-jules
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

fderuiter and others added 2 commits February 25, 2026 16:51
- Apply `cargo fmt` to `bench_lbm.rs` and `lattice_boltzmann/mod.rs`.
- Fix CI build failure due to style check.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>

- Update loops in `LatticeBoltzmannD2Q9` to use `.iter_mut().enumerate()` instead of index-based range loops.
- Resolves `clippy::needless_range_loop` lint failures in CI.
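As an illustration of the loop style this commit describes, a macroscopic density pass written with `.iter_mut().enumerate()` might look like the sketch below. The function name `compute_density` and the `f[q * n + idx]` layout are assumptions, not code taken from `LatticeBoltzmannD2Q9`.

```rust
/// Density pass: rho[idx] is the sum of the nine distributions at that cell.
/// `f` is assumed to hold 9 * rho.len() values laid out as [q * n + idx].
pub fn compute_density(f: &[f64], rho: &mut [f64]) {
    let n = rho.len();
    assert!(f.len() >= 9 * n);
    // Iterating the output slice directly avoids `for idx in 0..n { rho[idx] = ... }`,
    // which is the pattern clippy::needless_range_loop flags.
    for (idx, r) in rho.iter_mut().enumerate() {
        let mut sum = 0.0;
        for q in 0..9 {
            sum += f[q * n + idx];
        }
        *r = sum;
    }
}
```

The index is still available for addressing `f`, while the write to `rho` goes through the iterator and needs no bounds check.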

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
@fderuiter fderuiter closed this Feb 27, 2026
@fderuiter fderuiter deleted the profiler-lbm-optimize-98323151705483289 branch February 27, 2026 19:02