Conversation

@phermosomore

PhysicsNeMo Pull Request

Description

This PR fixes an out-of-memory (OOM) issue when running GeoTransolver inference on large meshes (10M+ cells) that was causing the process to be killed.

During inference on full car meshes, the broadcast_global_features: true setting caused fx (global features: air density, stream velocity) to be replicated to every mesh point before sub-batching.

This, combined with downstream processing in the ContextProjector, exceeded GPU memory before even the first forward pass.

The GeoTransolver model uses a global_tokenizer (ContextProjector) that processes the global features through linear projections and multi-head attention. When fx is broadcast to 2M+ tokens upfront, the intermediate activations and attention computations scale linearly with mesh size, causing OOM.
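To see why the upfront broadcast is fatal at this scale, a back-of-envelope estimate helps. The numbers below are illustrative assumptions (the hidden width is a guess, not taken from the PhysicsNeMo config), but the scaling argument holds regardless:

```python
# Illustrative memory estimate: broadcasting fx to every mesh point
# materializes an (N, C) tensor before the model ever runs, while the
# un-broadcast form stays at a single token.
n_points = 10_000_000      # full car mesh, per the PR description
hidden_dim = 256           # assumed projection width (not from the config)
bytes_per_el = 4           # float32

broadcast_bytes = n_points * hidden_dim * bytes_per_el
single_token_bytes = 1 * hidden_dim * bytes_per_el

print(f"broadcast fx activations: {broadcast_bytes / 1e9:.1f} GB")
print(f"single-token fx:          {single_token_bytes / 1e3:.1f} KB")
```

At these assumed sizes a single broadcast activation tensor is already on the order of 10 GB, before attention intermediates are counted.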

Solution:

inference_on_vtk.py: Force broadcast_global_features: false in the datapipe for inference, regardless of the training config. This keeps fx as a single token (B, 1, 2).

inference_on_zarr.py: Modified batched_inference_loop to broadcast fx per sub-batch dynamically:
  • If fx is single-token → expand it to match the sub-batch size
  • If fx is full-mesh (legacy path) → slice it for the sub-batch
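The two branches above can be sketched as follows. This is a minimal, hypothetical helper (the name `fx_for_sub_batch` and its signature are not from the PR), assuming fx has shape (B, 1, C) in the single-token case and (B, N, C) in the legacy pre-broadcast case:

```python
import torch

def fx_for_sub_batch(fx: torch.Tensor, n_sub: int, start: int) -> torch.Tensor:
    """Select or expand global features for one sub-batch.

    fx: (B, 1, C) single-token global features, or (B, N, C) pre-broadcast.
    """
    if fx.shape[1] == 1:
        # Single token: expand() returns a broadcasted view, so no
        # extra memory is allocated for the repeated rows.
        return fx.expand(-1, n_sub, -1)
    # Legacy pre-broadcast path: slice out this sub-batch's rows.
    return fx[:, start:start + n_sub, :]

fx_single = torch.randn(1, 1, 2)
fx_full = torch.randn(1, 10_000, 2)
print(fx_for_sub_batch(fx_single, 512, 0).shape)   # (1, 512, 2)
print(fx_for_sub_batch(fx_full, 512, 1024).shape)  # (1, 512, 2)
```

The key design choice is `expand` rather than `repeat`: `expand` produces a zero-copy view, so the per-sub-batch broadcast costs no additional memory.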

Why This Doesn't Affect Inference Quality
Since all tokens in broadcast fx have identical values, the aggregation result is mathematically equivalent. The model sees the same sub-batch size it was trained on, just processed sequentially instead of all at once.
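The equivalence claim can be checked on a toy example. Because every broadcast row carries the same values, any per-token projection followed by an aggregation over the token axis (a mean, here, as a stand-in for the model's actual aggregation) reproduces the single-token result exactly:

```python
import torch

torch.manual_seed(0)
proj = torch.nn.Linear(2, 8)          # stand-in per-token projection

fx = torch.randn(1, 1, 2)             # single global token (B, 1, C)
fx_bcast = fx.expand(1, 1000, 2)      # 1000 identical broadcast copies

# Aggregating identical projected tokens equals projecting the single token.
out_single = proj(fx).mean(dim=1)
out_bcast = proj(fx_bcast).mean(dim=1)
print(torch.allclose(out_single, out_bcast, atol=1e-6))  # True
```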

Checklist

Dependencies

None

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an
indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.


Signed-off-by: Pablo Hermoso Moreno <phermosomore@nvidia.com>
@greptile-apps
Contributor

greptile-apps bot commented Jan 29, 2026

Greptile Overview

Greptile Summary

Fixed OOM issue during GeoTransolver inference on large meshes by disabling broadcast_global_features in the datapipe and implementing per-sub-batch broadcasting of global features (fx) in batched_inference_loop.

Key Changes:

  • inference_on_vtk.py: Forces broadcast_global_features: false to keep fx as single token (B, 1, 2) instead of broadcasting to 2M+ tokens upfront
  • inference_on_zarr.py: Dynamically broadcasts fx per sub-batch using expand() (memory-efficient view) or slices pre-broadcast fx for legacy compatibility
  • The approach maintains mathematical equivalence since all broadcast tokens have identical values

Issue Found:

  • The squeeze(1) operation in dimension normalization could incorrectly remove the batch dimension when batch size is 1, potentially causing shape mismatches
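The general hazard the bot is pointing at can be demonstrated in isolation. Whether the flagged code actually hits it depends on which axis the normalization targets, but the distinction between a bare `squeeze()` and an axis-specific `squeeze(dim)` is the crux:

```python
import torch

x = torch.randn(1, 1, 2)    # (B=1, tokens=1, C=2)

# Bare squeeze() drops *every* size-1 axis, including the batch axis:
print(x.squeeze().shape)    # torch.Size([2])  -- batch dim is gone

# Axis-specific squeeze(1) only drops the token axis:
print(x.squeeze(1).shape)   # torch.Size([1, 2])  -- batch dim preserved
```

Pinning the axis (or using `view`/`reshape` with explicit shapes) avoids shape mismatches that only surface when batch size happens to be 1.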

Important Files Changed

Filename | Overview
--- | ---
CHANGELOG.md | Added entry documenting the memory fix for GeoTransolver inference
examples/cfd/external_aerodynamics/transformer_models/src/inference_on_vtk.py | Forces broadcast_global_features: false in the datapipe to prevent pre-broadcasting fx to full mesh size
examples/cfd/external_aerodynamics/transformer_models/src/inference_on_zarr.py | Implements per-sub-batch fx broadcasting for GeoTransolver with dimension normalization; potential issue with squeeze logic

@greptile-apps greptile-apps bot left a comment


3 files reviewed, 1 comment



Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@coreyjadams
Collaborator

[reposting from slack for posterity]

Hi @phermosomore - thanks for sharing! 

For the record, the intention in GeoTransolver is to never broadcast global features.  It's something to be set in the config file.  Here's the logic:

  • Transolver takes fx and embeddings and concatenates them at every point.  The only path to put global features into the model is to broadcast to every point, which is a bit wasteful.
  • GeoTransolver specifically avoids this by treating global features in a manner similar to geometry: leave them as un-broadcasted features (shape [B, N_features, C] etc) and encode them into the latent space via the ContextProjector.  In this way, the huge matrix should never be realized.
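The intended data flow can be sketched as follows. This is not the actual PhysicsNeMo ContextProjector, just a minimal stand-in (a single linear projection in place of the real linear-plus-attention stack) showing that the latent encoding of global features never depends on the mesh size N:

```python
import torch

B, n_feat, C, d = 2, 2, 8, 16
global_feats = torch.randn(B, n_feat, C)   # un-broadcast: (B, N_features, C)

# Encode global features into the latent space; the mesh size N never
# appears in this path, so the huge (N, C) matrix is never realized.
proj = torch.nn.Linear(C, d)
latent = proj(global_feats)                # (B, n_feat, d)

n_points = 10_000_000                      # mesh size; irrelevant here
print(latent.shape)                        # torch.Size([2, 2, 16])
```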

Additionally, in most of the inference scripts we usually aren't running 10M points at one time but batching them. Is that what you are doing when you see the OOM?

