
Conversation


@weilr weilr commented Feb 5, 2026

PhysicsNeMo Pull Request

Description

FIGConvNet: fixed split_by_node_equal so that it supports multi-GPU execution.

Checklist

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI's assessment of merge readiness; it is not a qualitative judgment of your work, nor an
indication that the PR will be accepted or rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.


greptile-apps bot commented Feb 5, 2026

Greptile Summary

Refactored split_by_node_equal to properly support multi-GPU execution by incorporating DataLoader worker processes alongside distributed training processes.

Key changes:

  • Now properly unpacks all 4 values from pytorch_worker_info (rank, world_size, worker, num_workers) instead of ignoring the worker-related values
  • Calculates global worker ID across both distributed nodes and DataLoader workers: g_worker = rank * num_workers + worker
  • Adjusts chunk size from world_size to g_world = world_size * num_workers to account for all workers
  • Simplifies tail handling logic with clearer control flow

The implementation correctly handles the case where PyTorch DataLoaders spawn multiple workers per GPU, ensuring data is properly sharded across all workers in a multi-GPU setup.
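The sharding scheme described above can be sketched as follows. This is a simplified illustration, not the PR's actual code: the real split_by_node_equal lives in webdataset_utils.py, and the drop_last parameter and tail-reuse behavior here are assumptions based on the review excerpt below.

```python
import itertools

def split_by_node_equal(src, rank, world_size, worker, num_workers,
                        drop_last=False):
    """Illustrative sketch: shard an iterable across all global workers."""
    # Global worker ID across distributed ranks and DataLoader workers.
    g_worker = rank * num_workers + worker
    g_world = world_size * num_workers
    it = iter(src)
    while True:
        chunk = list(itertools.islice(it, g_world))
        if len(chunk) == g_world:
            # Full chunk: every global worker takes exactly one item.
            yield chunk[g_worker]
        else:
            tail_size = len(chunk)
            if not drop_last and tail_size > 0:
                # Partial tail: reuse items so each worker still yields one.
                yield chunk[g_worker % tail_size]
            return
```

For example, with world_size=2 and num_workers=2 there are four global workers, so each worker takes every fourth item; the tail of a source that is not evenly divisible is either reused across workers or dropped, depending on drop_last.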

Important Files Changed

Filename: examples/cfd/external_aerodynamics/figconvnet/src/data/components/webdataset_utils.py
Overview: Refactored split_by_node_equal to support multi-GPU execution by handling DataLoader worker processes alongside distributed processes

@greptile-apps greptile-apps bot left a comment


1 file reviewed, no comments


@coreyjadams coreyjadams self-requested a review February 10, 2026 14:56

@coreyjadams coreyjadams left a comment


Hi @weilr - Thanks for submitting this. I have one small question but otherwise looks reasonable to me.

```python
if not drop_last and tail_size > 0:
    yield next_items[rank % tail_size]
it = iter(src)
while True:
```

I'm not super familiar with this data tool. Is there an alternative to an infinite loop here?
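For reference, when a `while True` loop only pulls fixed-size chunks from an iterator, a common alternative is to drive the loop with a plain `for` over the iterator, which terminates cleanly when the source is exhausted. A minimal sketch (not the PR's actual code; the name `iter_chunks` is illustrative):

```python
import itertools

def iter_chunks(src, size):
    """Yield successive chunks of up to `size` items from `src`."""
    it = iter(src)
    # The for-loop ends when `it` is exhausted, with no explicit
    # `while True` / StopIteration handling needed.
    for first in it:
        yield [first, *itertools.islice(it, size - 1)]
```

Whether this fits here depends on whether the tail chunk needs the special handling shown in the diff above.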
